Avoiding Polluted Data Streams in Enterprise IoT

The Clear Value of Clean Data

Throughout our series, Four Critical Design Factors for IoT Project Success, we have shown both the importance and challenge of capturing clean, trustworthy data for deriving the insights that produce the immense value of IoT for the Enterprise.

Software expert Hollis Tibbetts has estimated that duplicate data and bad data combined cost the U.S. economy over $3 trillion every year.

Gartner estimates that poor data quality costs the average business $14.2 million annually. Larry P. English, creator of the Total Information Quality Management (TIQM) methodology, finds that “as much as 40 to 50 percent or more of a typical IT budget is really ‘information scrap and rework.’” He continues, “the direct costs of poor data quality can be as high as 15 to 25+ percent of a large organization’s (operating) revenue or budget.”

There is no shortage of statistics on the negative impact of poor data quality on businesses. Dirty data affects business processes, investment decisions and overall productivity—time and money. Add to this the proliferation of IoT connected devices and the corresponding increase in corporate data volume, and the impact of bad data intensifies.

Treat Your Data Like Water

In a municipal water supply system, central treatment facilities purify the water before releasing it to individual neighborhoods. The alternative – piping dirty water around and cleaning it locally – is both more expensive, because the cost of treatment is duplicated at every endpoint, and inevitably results in inconsistent water quality.

Similarly, a centralized data cleaning delivery model is the most reliable and cost-effective way to ensure dirty data is detected, investigated, and either removed or corrected before it is fed into any downstream systems that depend on clean data.

The alternative – streaming live data directly into analytics engines and other systems without centralized cleaning – means you are either drinking dirty water (with unpredictable results) or implementing independent cleaning logic inside every enterprise system that consumes data from the IoT pipeline. If the IoT system provides dirty data, your ERP system must do its own cleaning; so must your CRM system, and every other point of integration. The ERP system will catch what it cares about but miss what the CRM system cares about, and vice versa. Once each system cleans data to satisfy its own needs, you can no longer reconcile the data between systems. This approach is not only cost-prohibitive, it is error-prone.
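A centralized cleaning stage can be as simple as one validation function applied once, before any consumer sees the data. The sketch below illustrates the idea; the field names, checks, and functions are illustrative assumptions, not rules from this article.

```python
from datetime import datetime, timezone

# Hypothetical quality checks for a sensor reading: the reading must
# carry identity (device_id), time (a parseable, timezone-aware,
# non-future timestamp), and a numeric value.
def is_clean(reading: dict) -> bool:
    """Return True if a reading passes basic identity/time/value checks."""
    required = {"device_id", "timestamp", "value"}
    if not required.issubset(reading):
        return False                      # missing identity or time fields
    try:
        ts = datetime.fromisoformat(reading["timestamp"])
    except (TypeError, ValueError):
        return False                      # unparseable timestamp
    if ts.tzinfo is None:
        return False                      # ambiguous local time
    if ts > datetime.now(timezone.utc):
        return False                      # claims to come from the future
    return isinstance(reading["value"], (int, float))

def clean_stream(readings):
    """One filter, applied once, before any consumer sees the data."""
    return [r for r in readings if is_clean(r)]
```

Because the same filter runs for every consumer, ERP, CRM, and analytics all receive an identical, reconcilable feed instead of each applying its own rules.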

The IoT system is the hub communicating with every device, and provides the data feed to the various enterprise systems. An effective IoT system stores the raw data, but provides only clean data to other connected systems. The raw data – dirty records included – contains important information about chain of custody, time, and identity. This information holds great value for finding and analyzing issues in the IoT system and the connected devices, and for improving the system and the services that rely upon it.
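The hub pattern above – archive every raw reading, forward only clean readings – can be sketched as follows. The class and method names are hypothetical, chosen only to illustrate the pattern.

```python
class IoTHub:
    """Sketch of the hub: store raw data for diagnostics, serve clean data."""

    def __init__(self, is_clean):
        self.raw_archive = []     # everything, dirty data included -- it still
                                  # carries the chain-of-custody, time, and
                                  # identity evidence needed to diagnose issues
        self.is_clean = is_clean  # the single centralized cleaning check

    def feed(self, readings, consumers):
        """Archive each reading; fan only clean readings out to consumers."""
        for reading in readings:
            self.raw_archive.append(reading)
            if self.is_clean(reading):
                for consumer in consumers:
                    consumer(reading)
```

Every connected system (ERP, CRM, analytics) is just another consumer of the same cleaned feed, while the raw archive preserves the full record for investigating device or pipeline problems.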

Clean Data Starts at the Source: The Four Design Factors

Data quality can have a huge impact on business decisions, product and market development, productivity and the bottom line. Doing the upfront work to ensure a centralized data delivery model and your IoT system’s integrity in terms of Trust, Identity, Time, and Chain of Custody will help you avoid the roadblocks between prototype and production – and ensure your IoT deployment and the data it produces provide lasting enterprise value.

Looking Forward

Stay tuned for more about techniques, design styles, and lessons learned. If you’ve got any questions or a particular challenge in mind, contact us and let us know how we can help.
