Episode 80: Dirty Data – Preventing the Pollution of Your IoT Data Lake
We’ve all heard it before: garbage in, garbage out. This is especially problematic with IoT data, where dirty data (data that is well formed but wrong) can set off a butterfly-effect chain reaction that pollutes the data lake and leads to bad decisions and bad business. In this episode of the IoT Business Show, I speak with James Branigan about what bad data is, how to identify it, how to stop it and how to deal with the aftermath.
Here’s What We’ll Cover in this Episode
- The key to avoiding HIPAA regulation problems.
- What dirty data is and how to avoid it.
- The difference between cleaning IoT big data and non-IoT big data.
- The four sources of dirty data.
- The three different categories of dirty data.
- The gotchas with OTA systems in real IoT deployments.
- The three ways of finding out whether there are any data issues.
- How to fix data problems once they’ve happened.
- The differences between discrete product data issues and those found in IoT systems and processes.
In every long-term IoT deployment, chances are that something is going to go wrong with the integrity of your data. As software, firmware and hardware are updated over time, especially at the edge, bugs will creep in and data will get corrupted. By following the best practices discussed in this episode, most dirty data can be prevented or spotted en route to the data lake, but not all of it. For the rest, you need to go into forensic mode, which is only possible with upfront planning. Like cyberattacks, dirty data must be planned for in advance. The best way to fight dirty data is with more data, or contextual data to be exact: the type you store in your application protocol.
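To make the idea of contextual data a little more concrete, here is a minimal Python sketch. It is only an illustration, not something taken from the episode: the `Reading` fields (`firmware_version`, `sensor_model`), the `split_by_firmware` helper and the sample values are all hypothetical. The assumption is that each message carries context about the device that produced it, so that once a buggy firmware release is identified, the readings it produced can be separated out after the fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One IoT message: a measurement plus hypothetical contextual fields."""
    device_id: str
    firmware_version: str  # contextual data (assumed field)
    sensor_model: str      # contextual data (assumed field)
    temperature_c: float   # the measurement itself


def split_by_firmware(readings, bad_versions):
    """Split readings into (trusted, suspect) using the firmware context."""
    trusted, suspect = [], []
    for reading in readings:
        if reading.firmware_version in bad_versions:
            suspect.append(reading)
        else:
            trusted.append(reading)
    return trusted, suspect


# Made-up sample data: every value is well formed, but firmware 1.1.0
# is assumed to have shipped with a bug that skews the temperature.
readings = [
    Reading("dev-1", "1.0.0", "T100", 21.5),
    Reading("dev-2", "1.1.0", "T100", 71.3),
    Reading("dev-3", "1.1.0", "T100", 69.8),
    Reading("dev-4", "1.0.0", "T100", 22.1),
]

trusted, suspect = split_by_firmware(readings, bad_versions={"1.1.0"})
```

Without the firmware field, the two suspect readings would look like any other valid temperature value, which is why this kind of context has to be captured before the problem occurs.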