We’ve talked previously about the importance of centralized IoT data management and data flow, and how analytics tools and teams require clean, trusted data to produce value from the enterprise system. Ensuring reports and recommendations are based on reliable data should be a hallmark of any industrial IoT data platform. But the IoT world is a messy world, and sometimes data that is initially trusted and used by the system is later discovered to be bad. It can pollute months’ worth of collected data, eliminating the value of expensive analytics tools. Let’s look at two simple examples showing how the right IoT system architecture enables retroactive cleaning of historical data. In each case, the reprocessed data is made useful to the entire organization with minimal effort.
The Big Demo
You’re the manager in charge of an IoT pilot program, and finally have enough data from your initial group of sensors to run a set through the expensive analytics tools your company has invested in. The resulting graph looks like this:
While the system overall appears to be working (data is getting from your devices to the cloud and into your existing enterprise systems), the values reported by several of your sensors contain errors that make for a very poor demo.
It may be an option to simply dump the raw data to a spreadsheet, remove the readings from the problematic sensors, and create a graph from the manually cleaned data set, but this won’t help the other systems and team members trying to make sense of the data. Showing the VP a useful-looking chart won’t go well if another team member arrives at the same meeting complaining that your system is full of useless noise.
What can you do to enable any team member or tool in your organization to generate a chart like this?
With many “drag and drop” IoT platforms your options are likely limited to postponing the meeting, replacing the sensors, and hoping the next round of data is better; or deleting the problematic raw data from the database. Neither is likely acceptable if you’re looking for approval to continue and expand the project throughout the enterprise.
In systems that follow a pull-based model and provide a chain of custody for each bit of data that enters the system, you can retroactively flag the data, now identified as dirty, as untrusted so that it is no longer included in query results for downstream systems that request data for analysis or reporting. Now everyone viewing the initial round of demo data will see the same thing (the second graph), no matter what tools they use. Approval to move forward, granted!
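One way to picture the pattern: the central store keeps a trusted flag alongside every event, and all downstream consumers pull through a single query path that filters on it. A minimal sketch, assuming a simple relational layout; the table schema, the `pull_events` query path, and the `flag_sensor` helper are illustrative, not taken from any particular platform:

```python
import sqlite3

# In-memory store standing in for the centralized data platform.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    sensor_id TEXT, ts INTEGER, value REAL,
    trusted INTEGER DEFAULT 1)""")

# Initial demo readings; sensor s3 turns out to be faulty.
readings = [("s1", 1, 20.1), ("s2", 1, 19.8), ("s3", 1, 987.0)]
db.executemany(
    "INSERT INTO events (sensor_id, ts, value) VALUES (?, ?, ?)", readings)

def pull_events():
    """The single query path every downstream tool uses: trusted data only."""
    return db.execute(
        "SELECT sensor_id, value FROM events WHERE trusted = 1").fetchall()

def flag_sensor(sensor_id):
    """Retroactively mark every event from a bad sensor as untrusted."""
    db.execute("UPDATE events SET trusted = 0 WHERE sensor_id = ?",
               (sensor_id,))

flag_sensor("s3")
print(pull_events())  # s3's noise is gone from every downstream view
```

Because no consumer reads the raw table directly, flipping one flag cleans every report at once; the raw data is retained, not deleted, so the decision can be audited or reversed.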
The Billing Disaster
Now let’s look at a production system where real customers depend on your services on a daily basis, and your organization’s reputation and revenue are at stake.
You’re running a large production IoT system, with thousands of customers using your devices and applications to meet their goals each day. One of your services provides mileage tracking for a customer’s fleet of rental construction vehicles. You provide a dashboard with a map showing where each piece of equipment is located at any given time, and calculations behind the scenes determine how far each renter has traveled in each vehicle.
You notice that whenever vehicles travel through a recently completed mountain tunnel, their location is briefly reported as being out in the middle of the Atlantic Ocean, then reappears correctly a few miles further down the road. The most likely cause is that, after losing GPS and cellular signal inside the tunnel, the device regains network connectivity before obtaining a GPS lock and sends a default location value that happens to fall in the middle of the ocean. It takes only about three minutes to get from one end of the tunnel to the other, so from a real-time tracking perspective the issue can be largely ignored. Changing the firmware to wait for a GPS fix before sending location data could be expensive, and not all devices are updatable. You let your administrators and customers know about the issue with the tunnel, and assume that will prevent curious phone calls. Problem solved.
About a month later (lining up nicely with the billing cycle), a different set of calls begins. All of your customers whose vehicles regularly traveled routes through the tunnel have received outrageous invoices, thousands of dollars above their normal monthly charges.
As noted previously, a portion of their service fees is for total miles traveled. While dashboard operators may have ignored the sporadic oceanic location reports, the system itself used each location event to calculate distance traveled. So every time a vehicle was reported to have gone for a quick dip, it was counted as a very long round trip, with charges applied for each of those phantom miles between the mountain and the sea.
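The arithmetic behind those invoices is easy to reproduce. A hedged sketch, using the standard haversine great-circle formula with made-up coordinates: a short tunnel route plus an assumed (0.0, 0.0) sentinel point out in the Atlantic (the actual default value a device emits would vary by firmware):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 3956 * 2 * asin(sqrt(h))  # 3956 mi ~ Earth's radius

SENTINEL = (0.0, 0.0)  # assumed "no GPS lock yet" default, in the Atlantic

# Illustrative trip: west tunnel portal, bogus sentinel report, east portal.
route = [(35.00, -82.00), SENTINEL, (35.00, -81.95)]

def total_miles(points):
    """Sum leg-by-leg distance, the way a naive billing job would."""
    return sum(haversine_miles(p, q) for p, q in zip(points, points[1:]))

billed = total_miles(route)                              # phantom round trip
actual = total_miles([p for p in route if p != SENTINEL])
print(f"billed {billed:.0f} mi vs actual {actual:.1f} mi")
```

One bogus point per tunnel pass turns a trip of a few miles into thousands, which is exactly how a month of normal routes produces an invoice thousands of dollars too high.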
Similarly, other reports tracking tire performance, oil change schedules, and other events outside of customer billing were also corrupted. Forget about predictive maintenance or machine learning.
Now, if your system had pushed data as it came in to each independent component and service, you would have quite a disaster on your hands. There would be little chance of finding and cleaning every instance of the bogus location events. With a pull-based, centralized data model, however, you can simply flag data received with the erroneous location value (both going forward and retroactively). Once flagged, the events disappear from all downstream reports and analytics exercises. Customers, satisfied.
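Unlike the demo scenario, here the flag is applied by value rather than by sensor: one rule identifies the bogus events, a single retroactive pass fixes history, and the same rule screens new arrivals. A minimal sketch, again assuming a (0.0, 0.0) sentinel and an illustrative schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE locations (
    vehicle_id TEXT, ts INTEGER, lat REAL, lon REAL,
    trusted INTEGER DEFAULT 1)""")
db.executemany(
    "INSERT INTO locations (vehicle_id, ts, lat, lon) VALUES (?, ?, ?, ?)",
    [("v1", 100, 35.00, -82.00),
     ("v1", 101, 0.0, 0.0),        # phantom dip in the Atlantic
     ("v1", 102, 35.00, -81.95)])

# One retroactive pass cleans everything already collected...
db.execute("UPDATE locations SET trusted = 0 WHERE lat = 0.0 AND lon = 0.0")

# ...and the same rule is applied to new events as they arrive.
def ingest(vehicle_id, ts, lat, lon):
    trusted = 0 if (lat, lon) == (0.0, 0.0) else 1
    db.execute("INSERT INTO locations VALUES (?, ?, ?, ?, ?)",
               (vehicle_id, ts, lat, lon, trusted))

ingest("v1", 103, 0.0, 0.0)        # future bogus report, flagged on arrival
rows = db.execute(
    "SELECT ts FROM locations WHERE trusted = 1 ORDER BY ts").fetchall()
print(rows)  # only real roadway positions remain visible downstream
```

Because billing, maintenance, and analytics all pull through the same trusted filter, re-running last month’s mileage job against this view regenerates correct invoices with no per-system cleanup.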
Mastering IoT Data Flow
For more details on what a pull-based architecture looks like, take a look at the Bright Wolf Strandz data flow. We’ve built large-scale production systems across a variety of industries where incoming data arrives with high volume, velocity, and variety. Inevitably this inflow contains errors, caused either by corruption somewhere along the network or by issues with the devices themselves. For help with your particular project, let’s talk and see what we can do together.