Bright Wolf Blog

Why Your Data Scientist Can’t Spin Your IoT Data Into Gold

Data science, as described at the first international workshop on Data Science for Internet of Things, part of the IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS) in October 2016, is “an interdisciplinary field that involves techniques to acquire, store, analyze, manage, and publish data. Data can be analyzed using machine learning, data analysis, and statistics – optimizing processes and maximizing their power in larger scenarios.” A key finding of the workshop’s organizers was the importance of well-planned IoT data management. The success of any IoT project depends on how well its data management lets researchers reproduce scenarios and optimize the acquisition, analysis, and visualization of the data acquired by IoT devices.

Good Data Management Enables Better Data Science

Moving forward, more data cleaning, pre-processing, and exploratory data analysis will be automated. As a result, the future value of the insights and analytics that data scientists and their tools produce will depend increasingly on the data management capabilities and architecture of the IoT system itself.

“The monetary value of an IoT system increases at a rate directly proportional to the system’s ability to enable data scientists to learn from incoming data and then rapidly operationalize those learnings.”

In many organizations, there is a one-way street from the IoT system to the data science team: here’s a mountain of raw data for you to clean and learn something from. See you again tomorrow. Same place, same time, same messy data flow.

In a proper IoT data management architecture, cleaning methods are iteratively automated and incorporated into the body of the system. This frees data scientists to dive into learning from each new batch and deriving actionable insights, rather than spending their time on manual pre-processing. Critically, the insights and analytics themselves are operationalized as well, so the next round of challenges comes to the forefront automatically.

It’s Not Going to Get Easier

Adding to the challenge, data scientist, AI designer, and author Ajit Jaokar calls out the importance of iterative design not just in the cloud, but at the edge as well. The volume, variety, and velocity of IoT data pouring in from sensors and devices, combined with the challenges of latency and connectivity, encourage a design where trained models (rules, recommendations, scores, etc.) are created in one location and deployed at multiple points.
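To make the “train in one place, deploy at many points” pattern concrete, here is a minimal Python sketch. It assumes a deliberately simple z-score threshold rule standing in for a trained model. `train_threshold_rule`, `EdgeNode`, and the plant names are hypothetical illustrations, not part of any Bright Wolf or third-party API. The point is that only a small serialized artifact travels to the edge, and each node then scores readings locally without a round trip to the cloud.

```python
import json
import statistics
from dataclasses import dataclass
from typing import Optional


def train_threshold_rule(history: list[float], k: float = 3.0) -> dict:
    """Centrally 'train' a simple anomaly rule (hypothetical stand-in for a model):
    flag values more than k standard deviations from the historical mean."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return {"type": "zscore", "mean": mean, "stdev": stdev, "k": k, "version": 1}


@dataclass
class EdgeNode:
    """Hypothetical edge gateway that scores readings locally using
    whatever rule the cloud last pushed to it."""
    name: str
    rule: Optional[dict] = None

    def deploy(self, payload: str) -> None:
        # The rule travels as a small serialized artifact, not as raw data.
        self.rule = json.loads(payload)

    def score(self, value: float) -> bool:
        """Return True if the reading looks anomalous under the current rule."""
        if self.rule is None:
            return False  # no rule deployed yet
        r = self.rule
        if r["stdev"] == 0:
            return value != r["mean"]
        return abs(value - r["mean"]) / r["stdev"] > r["k"]


# Train once in the cloud...
rule = train_threshold_rule([20.1, 20.4, 19.8, 20.0, 20.2, 19.9])
payload = json.dumps(rule)

# ...and deploy the same artifact to many edge locations.
nodes = [EdgeNode("plant-a"), EdgeNode("plant-b"), EdgeNode("plant-c")]
for node in nodes:
    node.deploy(payload)

print([node.score(35.0) for node in nodes])  # → [True, True, True]
```

When the rule is retrained centrally, redeploying is just another `deploy` call with a new payload. That call is where the iterative loop between cloud and edge would close in a real system.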

In their paper Data Science and Machine Learning in the Internet of Things and Predictive Maintenance, the Data Science Group at SAP makes it clear that data science faces “many new challenges in the domain of IoT while at the same time the traditional challenges have not gone away.” They also note that the major activity in the data science process is “identifying, accessing, and preparing data for analysis.” So what separates an IoT system that serves as a flywheel for business innovation and growing revenue from a Rumpelstiltskin-esque nightmare of failed promises to spin data from your devices into gold? It is the ability to feed each day’s learnings back into the system in a way that lets the team pursue the next higher order of value, rather than repeating the same process ad infinitum.

How can this be accomplished?

The 3 Keys to Operationalizing IoT Data Cleaning & Learning Algorithms

First, as mentioned, the ability to operationalize cleaning algorithms and learned insights is paramount to building a system that becomes more valuable over time as the volume of collected data grows. This applies both to inputs (raw sensor data and information from third-party sources such as tide levels and fuel prices) and to the outputs of the current generation of analytics and machine learning tools. A continuous loop means iterative improvement.
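A minimal Python sketch of that loop, assuming cleaning logic can be expressed as small functions that either return a (possibly corrected) reading or reject it. `CleaningPipeline`, `drop_missing`, and `fahrenheit_to_celsius` are hypothetical names, and the Fahrenheit discovery is an invented example. Once a data scientist’s finding is registered as a rule, every later batch is cleaned automatically instead of by hand.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Reading = dict  # e.g. {"sensor": "temp-1", "value": 21.5}
CleaningRule = Callable[[Reading], Optional[Reading]]


@dataclass
class CleaningPipeline:
    """Hypothetical pipeline: rules learned by data scientists are registered
    once and then run automatically on every new batch."""
    rules: list[CleaningRule]

    def register(self, rule: CleaningRule) -> None:
        self.rules.append(rule)

    def run(self, batch: list[Reading]) -> list[Reading]:
        cleaned = []
        for reading in batch:
            current: Optional[Reading] = reading
            for rule in self.rules:
                current = rule(current)
                if current is None:  # the rule rejected this reading
                    break
            if current is not None:
                cleaned.append(current)
        return cleaned


def drop_missing(r: Reading) -> Optional[Reading]:
    """Reject readings with no value."""
    return r if r.get("value") is not None else None


def fahrenheit_to_celsius(r: Reading) -> Optional[Reading]:
    """Assumed learning: sensor "temp-2" reports Fahrenheit, so normalize it.
    Returns a new dict; the incoming reading is not mutated."""
    if r["sensor"] == "temp-2":
        return {**r, "value": round((r["value"] - 32) * 5 / 9, 2), "unit": "C"}
    return r


pipeline = CleaningPipeline(rules=[drop_missing])
batch = [
    {"sensor": "temp-1", "value": 21.5},
    {"sensor": "temp-2", "value": 71.6},
    {"sensor": "temp-1", "value": None},
]

print(pipeline.run(batch))                 # day 1: only the missing value is dropped
pipeline.register(fahrenheit_to_celsius)   # today's insight becomes part of the system
print(pipeline.run(batch))                 # from now on, normalization is automatic
```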

Second, storing incoming events as immutable data points (including metadata) in a single source of truth not only ensures a complete audit and debugging trail, but also lets the data be modeled for analysis from many angles and across different axes. Normalization and transformation become dramatically easier: clean, “synthetic” values are provided to downstream systems while the original raw inputs are preserved for deeper learning, analysis, and troubleshooting down the road.
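The following is a minimal Python sketch of an append-only event store that keeps raw inputs intact and derives normalized values on read. It is an in-memory illustration only. `Event`, `EventStore`, the `pump-7` device, and the Celsius normalization are assumptions for the example, not a prescribed schema or product API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """One immutable data point: the raw reading plus the metadata needed
    to interpret and audit it later. frozen=True blocks field reassignment."""
    device_id: str
    timestamp: datetime
    raw_value: float
    unit: str                                   # unit as reported by the device
    metadata: tuple[tuple[str, str], ...] = ()  # e.g. firmware version


class EventStore:
    """Hypothetical append-only store acting as the single source of truth.
    Events are only ever appended; derived views are computed on read."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def raw(self, device_id: str) -> list[Event]:
        """Original inputs, kept for audit, debugging, and deeper analysis."""
        return [e for e in self._events if e.device_id == device_id]

    def normalized(self, device_id: str) -> list[tuple[datetime, float]]:
        """'Synthetic' view for downstream systems, with every value in Celsius.
        Because raw events are untouched, this conversion can be revised later
        and recomputed over the full history."""
        view = []
        for e in self.raw(device_id):
            celsius = (e.raw_value - 32) * 5 / 9 if e.unit == "F" else e.raw_value
            view.append((e.timestamp, round(celsius, 2)))
        return view


store = EventStore()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store.append(Event("pump-7", t0, 68.0, "F", (("firmware", "2.1"),)))
store.append(Event("pump-7", t0 + timedelta(minutes=1), 20.5, "C", (("firmware", "2.2"),)))

print(store.normalized("pump-7"))        # clean values for downstream consumers
print(store.raw("pump-7")[0].raw_value)  # → 68.0, the original reading is preserved
```

Keeping the device’s own unit and firmware version on each event is what makes later reinterpretation possible. If a unit mix-up is discovered, only the derived view changes.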

Lastly (for the purposes of this article), the notion of trust must be foundational to the IoT system architecture, covering both security/provenance (is the data from an authorized source?) and reliability (is the data accurate?). The ability to flag data from your devices as trusted or untrusted at the system level, protecting downstream analytics and other enterprise systems, is critical to driving actions and deriving insights that benefit the business. Furthermore, the IoT system must be able to edit the trust flag of each data point for any particular point in time. Suppose you learn that an air temperature sensor on your machine became detached, sat immersed in water for 7 weeks, and was then repaired. You need to be able to exclude that sensor’s data from historical reports for only that 7-week period, while keeping all other periods intact, for every user and system that queries the data. Without the ability to clean data retroactively, much of the hard work of data scientists never makes it back into the main body of the system.
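A minimal Python sketch of retroactive, time-ranged trust flags, applied to the submerged-sensor example. It assumes trust is recorded as interval overrides consulted at query time rather than by deleting data. `TrustOverride`, `TrustRegistry`, `query`, the sensor IDs, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

Reading = tuple[str, datetime, float]  # (sensor_id, timestamp, value)


@dataclass(frozen=True)
class TrustOverride:
    """Marks one sensor's data as untrusted for the half-open interval [start, end)."""
    sensor_id: str
    start: datetime
    end: datetime
    reason: str


class TrustRegistry:
    """Hypothetical system-level trust flags that can be set retroactively.
    Raw readings are never deleted; every query consults the registry instead."""

    def __init__(self) -> None:
        self._overrides: list[TrustOverride] = []

    def mark_untrusted(self, override: TrustOverride) -> None:
        self._overrides.append(override)

    def is_trusted(self, sensor_id: str, ts: datetime) -> bool:
        return not any(
            o.sensor_id == sensor_id and o.start <= ts < o.end
            for o in self._overrides
        )


def query(readings: list[Reading], registry: TrustRegistry) -> list[Reading]:
    """Shared read path, so every user and system sees the same trusted view."""
    return [r for r in readings if registry.is_trusted(r[0], r[1])]


readings: list[Reading] = [
    ("air-temp-3", datetime(2024, 3, 1), 18.2),
    ("air-temp-3", datetime(2024, 4, 1), 4.0),   # sensor was underwater here
    ("air-temp-3", datetime(2024, 6, 1), 21.7),
    ("oil-temp-1", datetime(2024, 4, 1), 80.0),  # other sensors are unaffected
]

# Learned weeks later: the sensor was detached and submerged for 7 weeks.
registry = TrustRegistry()
start = datetime(2024, 3, 10)
registry.mark_untrusted(
    TrustOverride("air-temp-3", start, start + timedelta(weeks=7), "detached, submerged")
)

print(query(readings, registry))  # the 2024-04-01 air-temp-3 reading is excluded
```

Because the override lives beside the data rather than replacing it, the flag can itself be corrected later. The `reason` field also leaves an audit trail of why the data was excluded.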

Putting It All Together

At a high level, such a system will look like this:
[Figure: Industrial IoT Systems to Promote Effective Data Science]

To learn more about designing industrial IoT systems that promote effective data science and become more valuable over time, get in touch with our team and let us know how we can help.

About Bright Wolf

Bright Wolf helps industrial enterprises increase business value by transforming operations and organizations with digital strategy, technology, solution delivery, and team enablement.
