From Prototype to Production: Chain of Custody

Following on with the Four Critical Design Factors to consider when building an IoT project for the enterprise, this article addresses the third pillar of Chain of Custody. The first pillar of Identity and second pillar of Time have been discussed previously. As a reminder, these three pillars must be built atop a solid Foundation of Trust for the project to succeed.

The need to process and analyze large amounts of data is neither new nor unique to Enterprise IoT. What is significantly more challenging to handle within an IoT system though is the wide variety and sheer number of devices sending data as well as the myriad paths the data may traverse on its way to the system of record. Reliable Chain of Custody – where did the data come from, who handled it, who altered it (and what changed), and when exactly did all of this occur – is critical to ensuring the Enterprise system of record contains clean data for driving insights and actions.

Furthermore, without Chain of Custody, you can’t trace bugs back to specific versions of software or hardware, so it is more difficult to do the forensic work when things go wrong. A proper Chain of Custody lets you look at populations of devices and figure out statistical probabilities to decide whether you should do a physical recall, send a recall notice to existing owners, or handle cases individually as you receive support calls. Support for a reliable Chain of Custody often makes the difference between an IoT system that generates significant value for an organization and one that simply generates increased operational complexity.

From Purpose-built to Integrated Systems

In any complex system, IoT or otherwise, errors will occur and you will end up with two sets of data: data you can trust, and data that must be ignored. The first question is how to determine which data belongs to which set. The second is what to do about it. Designers of old power plant controls, for example, endowed their systems with rudimentary “detect and toss” functionality. General parameters were monitored, events that were missing key attributes like timestamps or that carried wildly out-of-range values were ignored, and only data meeting expectations was flagged as “trusted” and processed down the line. No finer-grained categories, no recording of patterns or common sources – data was either judged to be “good” or tossed out as “bad” by a single-shot gatekeeper and that was that. While this proved to be (marginally) sufficient for a monolithic old-school power plant, it is grossly insufficient for modern distributed enterprise systems.
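To make the limitation concrete, here is a minimal sketch of a "detect and toss" gatekeeper. The field names (`timestamp`, `temp_c`) and the range limits are assumptions made up for illustration, not values from any real plant control system.

```python
from typing import Any, List, Mapping

# Hypothetical limits for an illustrative temperature reading.
MIN_TEMP_C = -40.0
MAX_TEMP_C = 150.0


def is_trusted(event: Mapping[str, Any]) -> bool:
    """Single-shot check: True keeps the event, False tosses it."""
    if event.get("timestamp") is None:
        return False  # missing key attribute
    value = event.get("temp_c")
    if not isinstance(value, (int, float)):
        return False  # missing or non-numeric reading
    return MIN_TEMP_C <= value <= MAX_TEMP_C  # reject wildly out-of-range values


def gatekeeper(events: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Keep trusted events and discard the rest, with no record of why."""
    return [e for e in events if is_trusted(e)]
```

Notice what the sketch throws away: the rejected events themselves, the reason each one failed, and any hint of which source produced them. That lost context is exactly what makes later forensic work impossible.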

You Must Understand Everything Along the Chain

Ultimately, you need the three pillars working together to support clean data. Reliable tracking of Identity – who sent, received, and altered the data (and were they authorized to do so), and Time – when did they do it – are the two most common structural pieces that engineers new to IoT get wrong. This leads to dirty data entering the system of record. Even worse, it may also prevent you from knowing which data is dirty and which data can be trusted. That’s bad. Chain of Custody provides the tools to let you locate potentially suspect data after the fact so you can take another look at it. Without this ability, you are permanently stuck with a polluted data lake and any insights derived from it may be incorrect. By planning ahead and getting Chain of Custody right, you can quickly separate good data from bad (with several shades of gray in between) and properly cleanse the data set for use by analytics and other components downstream.
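As a rough illustration of "shades of gray", the sketch below tags each event with a trust grade instead of discarding it. The three grades, the field names (`device_id`, `timestamp`), and the grading rule are all assumptions chosen for the example; a real system would likely use richer checks on Identity and Time.

```python
from enum import Enum
from typing import Any, Dict, List, Set


class Trust(Enum):
    TRUSTED = "trusted"      # Identity and Time both check out
    SUSPECT = "suspect"      # only one of the two checks out
    UNTRUSTED = "untrusted"  # neither checks out


def grade(event: Dict[str, Any], known_devices: Set[str]) -> Trust:
    """Grade an event by whether its Identity and Time can be established."""
    has_identity = event.get("device_id") in known_devices
    has_time = event.get("timestamp") is not None
    if has_identity and has_time:
        return Trust.TRUSTED
    if has_identity or has_time:
        return Trust.SUSPECT
    return Trust.UNTRUSTED


def tag(events: List[Dict[str, Any]], known_devices: Set[str]) -> List[Dict[str, Any]]:
    """Return copies of the events with a 'trust' field added; nothing is dropped."""
    return [{**e, "trust": grade(e, known_devices).value} for e in events]
```

Because every event is kept and labeled, suspect data can be found and re-examined later rather than being lost or silently mixed in with trusted data.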

While clean data enables analytics engines and visualization tools to work their magic, Chain of Custody is also what enables you to quickly find the needle that is poking you through the haystack. The ability to flag data as dirty is important. A method of restoring bad data to useful information is even better. Chain of Custody also gives you the tools to zero in on the source of the corrupted data and shed light on the most cost-effective plan for removal or reconciliation while the overall system continues to deliver corporate value. Without Chain of Custody, you could end up flying blind through a storm of lingering problems, significantly reducing the return on your investment.

Errors, Errors, Everywhere

As mentioned previously, there are many sources of bad (or partially bad) data within an enterprise IoT system. These can range from spurious hiccups that drop occasional event attributes (mitigated by catching and tossing out just a few events here and there) to full-blown bogus information from a sensor gone wild in the field (requiring tagging and dropping of anything sent from that particular source). Sometimes the data can be reconstructed, as in cases where a particular version of firmware is known to consistently report a device’s location as 25 miles east of where the device is actually running. With a solid Chain of Custody you can identify all traffic from devices containing the particular firmware version and deploy a software fix that cleans the data after it has arrived from the device (resetting the location value 25 miles west). This can be done either before the data is allowed into the data lake (once the problem is known), or run retroactively on historic events already in the system. That is a huge leap forward from the single-shot gatekeeper of the past.
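The location fix described above might look something like the following sketch. The firmware version string `2.1.0`, the event field names, and the `cleaned_by` marker are hypothetical. The math assumes a spherical Earth and ignores wrap-around at the ±180° meridian and behavior near the poles.

```python
import math
from typing import Any, Dict

EARTH_RADIUS_MILES = 3958.8  # mean radius, spherical approximation
BAD_FIRMWARE = "2.1.0"       # hypothetical affected firmware version
OFFSET_MILES = 25.0          # known eastward error described in the text


def shift_west(lat: float, lon: float, miles: float) -> float:
    """Return the longitude moved `miles` west along the parallel at `lat`."""
    delta_rad = miles / (EARTH_RADIUS_MILES * math.cos(math.radians(lat)))
    return lon - math.degrees(delta_rad)


def clean_location(event: Dict[str, Any]) -> Dict[str, Any]:
    """Correct the location only on events from the affected firmware."""
    if event.get("firmware") != BAD_FIRMWARE:
        return event  # other firmware versions pass through untouched
    fixed = dict(event)  # copy so the raw event is left unmodified
    fixed["lon"] = shift_west(event["lat"], event["lon"], OFFSET_MILES)
    fixed["cleaned_by"] = "location-offset-fix"  # record that the data was altered
    return fixed
```

The same function can sit in the ingest path for new events or be mapped over historic events already in the data lake. Recording `cleaned_by` keeps the correction itself part of the Chain of Custody.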

Other sources of problems range from BIOS power management causing application-level data corruption, to firmware bugs inside the Wi-Fi module itself introducing noise in the stream. Errors can, and will, come from throughout the system. Fortunately, each piece of software has a version number and is subject to revisions, both of which can be tracked. There can also be issues caused by the software that ingests the data, commonly referred to as a bridge, which parses and stores the raw bytes, preparing them for processing. By engineering the ability to track both the hardware and software components through which each piece of data is processed, it becomes possible not only to quickly track down sources of dirty data, but also to insert fixes, or “cleaners”, into the stream to address known issues before the data reaches the system of record. While there may be three versions of firmware deployed globally for a particular model of embedded board, only one was running on each unit at the specific time the data was collected. Therefore, if your system can reliably track Identity and Time, you can build and take advantage of the Chain of Custody to produce clean data.
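One way to picture how Identity and Time combine is a lookup that answers "which firmware was this unit running when this event was produced?" The sketch below assumes a per-device install history, sorted by time, is available. The `FIRMWARE_HISTORY` structure, the device ID, and the version strings are invented for the example.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical install history per device: (installed_at_epoch_seconds, version),
# sorted by install time.
FIRMWARE_HISTORY: Dict[str, List[Tuple[int, str]]] = {
    "dev-1": [(1000, "1.0.0"), (2000, "2.1.0"), (3000, "2.1.1")],
}


def firmware_at(
    device_id: str,
    timestamp: int,
    history: Dict[str, List[Tuple[int, str]]] = FIRMWARE_HISTORY,
) -> Optional[str]:
    """Return the firmware version running on the device at `timestamp`.

    Returns None for an unknown device or a time before the first install.
    """
    installs = history.get(device_id, [])
    install_times = [t for t, _ in installs]
    # Index of the last install at or before the event time.
    idx = bisect_right(install_times, timestamp) - 1
    return installs[idx][1] if idx >= 0 else None
```

With a lookup like this, a cleaner can be targeted at exactly the events produced under a known-bad version, even when the same unit has since been upgraded.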

Chain of Custody allows you to address all the data and metadata along that processing chain so that you have traceability for each data point. It provides the tools you need to clean structural noise, as well as the ability to compensate later, either by marking data as untrusted or by reprocessing the data and taking a remediating action. Chain of Custody isn’t just a record of who the data passed through on the way to the ultimate system of record, but also of who manipulated that data along the way and which software version and configuration they were running.
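A simple way to carry this metadata is to wrap each payload in an envelope that every handler appends to. The sketch below is one possible shape under that assumption, not a prescribed format. The class names (`Hop`, `Envelope`) and all of their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Hop:
    """One handler the data passed through (device, gateway, bridge, cleaner)."""
    handler: str                          # identity of the component
    software_version: str                 # version running at handling time
    config: str                           # configuration identifier in effect
    handled_at: float                     # when the handling occurred (epoch seconds)
    altered_fields: Tuple[str, ...] = ()  # which payload fields were changed, if any


@dataclass
class Envelope:
    """A payload plus the ordered record of everything that touched it."""
    payload: Dict[str, Any]
    custody: List[Hop] = field(default_factory=list)

    def record(self, hop: Hop) -> "Envelope":
        """Append a hop to the custody trail and return self for chaining."""
        self.custody.append(hop)
        return self

    def touched_by(self, handler: str, software_version: str) -> bool:
        """True if the given handler and version appears anywhere in the trail."""
        return any(
            h.handler == handler and h.software_version == software_version
            for h in self.custody
        )
```

For example, after a device and a bridge have each called `record`, a query such as `envelope.touched_by("bridge", "0.9.3")` identifies every data point that passed through a suspect bridge release, so those points can be marked untrusted or reprocessed.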

A Neverending Story

IoT deployments in the enterprise are long-lasting, constantly evolving systems. New features, new devices, and new integrations with changing interfaces mean a continuing flow of data with new and different problems. Even experienced enterprise engineers who are new to IoT aren’t generally aware of this “living system” challenge in the context of device data. While big data analytics people are aware of data quality issues, they generally push data responsibility onto whoever is producing the data.

The inevitable outcome of this failure to produce a reliable Chain of Custody is the unfortunate state of many IoT projects today – very expensive, very complex systems that enterprises simply cannot rely on to produce valuable insights and grow their business. The promise of IoT in the enterprise is real, but the journey is fraught with peril for organizations who lack expertise or partners to guide them along the way.

Think of the End From the Beginning

The challenge of detecting and repairing errors in data from devices and networks makes it critical to build an end-to-end, well-integrated IoT system right from the start. Our experience at Bright Wolf spans the entire design, development, and deployment lifecycle. By preventing the introduction of problems up front, you avoid taking a revenue hit in terms of rework and higher support costs down the road.

Looking Forward

Stay tuned for more about techniques, design styles, and lessons learned. If you’ve got any questions or a particular challenge in mind, contact us and let us know how we can help.
