This Azure IoT reference implementation guide provides industrial equipment manufacturers with an accelerated, flexible path for delivering differentiated connected products and gaining a competitive advantage through digital transformation with Microsoft Azure. The guide follows best practices and patterns outlined in the official Microsoft Azure IoT Reference Architecture, with additional domain-specific recommendations and in-depth walkthroughs based on Bright Wolf’s decade of experience designing and delivering connected products for some of the largest companies in the world.
Collecting and transforming data from pumps, motors, filters, chillers, and other industrial equipment across manufacturing, oil & gas, cold chain transportation, healthcare, agriculture, and other verticals – and integrating this data into enterprise systems for generating insights and actions embedded inside your customers’ operations – presents a unique set of challenges for equipment manufacturers that don’t apply to other kinds of IoT projects. These include incorporating legacy devices already deployed in the field, achieving and maintaining regulatory compliance, and integrating securely with a wide variety of applications and business tools across multiple business units and product lines.
Bright Wolf works directly with industrial OEMs from prototype to production to help you design and commercialize connected versions of your products. We refer to this “Thing Maker” sub-category of IoT as Connected Product Systems. This document is intended for product management and technical teams in engineering who are considering developing an industrial Connected Product System using an Azure platform.
Azure IoT Reference for Industrial Connected Product Systems
Over the last decade, Bright Wolf has built production enterprise IoT systems deployed globally across a variety of industries. Our experiences (both successes and failures) have taught us that three foundational architectural areas are especially critical to connected product system success: asset and data modeling, access control, and an enterprise API. When designed correctly, these fundamental components enable the agility and control required to succeed in an evolving IoT space. When done incorrectly or deferred, the results can prove fatal to the project and inflict lasting damage on the organization responsible.
Things, Insights, and Actions
Microsoft describes IoT applications as Things (machines or devices), sending data or events that are used to generate Insights, which are used to generate Actions to help improve an enterprise or process. An example is an engine (a thing), sending pressure and temperature data used to evaluate whether the engine is performing as expected (an insight), which is used to proactively prioritize the maintenance schedule for the engine (an action). This document provides a reference implementation for accelerating delivery of Azure IoT solutions that take action on business insights found through gathering data from industrial assets, enterprise systems, and 3rd party data to generate improved business outcomes and customer value.
Architecture Overview and Subsystems
The Microsoft Azure IoT Reference Architecture document explains how IoT applications consist of the following core subsystems:
1. Devices (and/or on-premise edge gateways) that can securely register with the cloud, with connectivity options for sending and receiving data.
2. A cloud gateway service, or hub, to securely accept that data and provide device management capabilities.
3. Stream processors that consume the data, integrate with enterprise processes, and place the data into storage.
4. A user interface to visualize telemetry data and facilitate device management.
IIoT solutions also require:
5. Telemetry data transformation, which allows restructuring, combination, or transformation of telemetry data sent from devices.
6. Machine learning, which allows predictive algorithms to be executed over historical telemetry data, enabling scenarios such as predictive maintenance.
7. User management, which allows splitting of functionality among different roles and users.
Bright Wolf solution implementations include the Azure services outlined in the Reference Architecture, with the addition of our data management services, GearBox edge suite, and SpringBoard application blocks for accelerating delivery of Azure IoT solutions that meet the rigorous enterprise requirements and unique complexities of industrial connected product systems.
Connected Product Specific Design Considerations
There are many design considerations that a Connected Product architect must address when planning development of the solution. This section highlights the key ones.
Enterprise REST API
When creating a Connected Product, it is crucial to establish the digital interface to your product, just as you would establish an interface for your physical product. Your physical product will have mounting brackets and other interfaces that are managed over the life cycle of that product line. In the same way, the digital interface to the connected version of your product will have data formats, protocols, and other digital interfaces that require the same degree of care, attention, and lifecycle management. Integrations with other business operations are required for delivering the outcomes promised by your Connected Product initiative. These integrations should be created through a managed interface, not by allowing ad-hoc connections to the inner workings of your Connected Product System. Establishing a clean boundary between the Connected Product System and other company systems enables ongoing evolution and avoids significant rework as requirements change. An ounce of prevention is worth many pounds of cure in this case.
User Management
User Management is an area where the wrong implementation can hobble the business and go-to-market strategies and result in cross-cutting rework of the system. It is also critical to the security and trustworthiness of the operation. The roles assigned to each user need to protect both the data in the system and the operations that can be performed.
In many B2B IoT solutions, the end customer will want to use their own Identity Provider (IdP) to manage the authentication of their own users, leaving authorization of those users as a concern for the IoT system. When multi-tenant capabilities are required by enterprises seeking to serve multiple customers from a unified system, this raises the challenge of integrating with a separate IdP for each end customer.
A common mistake when implementing authorization capabilities is to rely on the cloud IAM service to provide the enforcement capabilities for your system. This approach may work for an internal operation with relatively few users, but it becomes untenable for IT departments to allow your customers and other non-organizational members of your value delivery chain to have IAM users in your Azure subscription. Ideally, user role descriptions are made in a declarative form that can be easily audited by your CISO and CIO to ensure that the roles don’t leak data or allow access to a portion of the Azure architecture that they shouldn’t. It is also critical to plan for extensibility; for example, consider what it would take to add new role types to the system later.
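One way to keep authorization inside the IoT system while federating authentication out to each customer's IdP is to treat both the tenant-to-IdP mapping and the role definitions as plain data. A minimal sketch (the tenant names, URLs, roles, and operations below are invented for illustration):

```python
# Each tenant authenticates against its own identity provider.
# These tenants and endpoints are hypothetical.
TENANT_IDP = {
    "acme":   "https://login.acme.example/oidc",
    "globex": "https://sso.globex.example/oidc",
}

# Declarative role definitions: auditable as data, and extensible by
# adding a new entry rather than changing application code.
ROLES = {
    "operator": {"read_telemetry", "ack_alarm"},
    "viewer":   {"read_telemetry"},
}

def idp_for_tenant(tenant: str) -> str:
    """Authentication is delegated to the tenant's own IdP."""
    return TENANT_IDP[tenant]

def authorize(role: str, operation: str) -> bool:
    """Authorization remains a concern of the IoT system itself."""
    return operation in ROLES.get(role, set())
```

Because the role table is data, a CISO or CIO can audit exactly what each role permits without reading code, and adding a new role type later is a one-line change.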
Data Layer Considerations
Additional key design factors for production Connected Product Systems include how the architecture handles Identity, Time, and Chain of Custody. While it’s possible to deliver a system that does not account for each in depth, data quality will progressively degrade and make extraction of business value all but impossible.
Authorization Enforcement Points
To simplify management and auditing, implementers should consider centralizing the policy decision points so that enforcement isn’t spread throughout the code-base of the system. Using a declarative policy description language also simplifies auditing of the policies and ensures that policy authors work with an already-vetted suite of policy choices.
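A minimal sketch of such a centralized decision point, with policies expressed declaratively as data (the roles, actions, and resources shown are hypothetical):

```python
# Policies are plain data, so they can be audited without reading
# application code, and authors choose from a vetted set of fields.
POLICIES = [
    {"role": "technician", "action": "read", "resource": "telemetry"},
    {"role": "technician", "action": "ack",  "resource": "alarm"},
    {"role": "viewer",     "action": "read", "resource": "telemetry"},
]

def is_allowed(policies, role, action, resource):
    """Single policy decision point: every enforcement check in the
    system routes through this one function, default-deny."""
    return any(
        p["role"] == role and p["action"] == action and p["resource"] == resource
        for p in policies
    )
```

Application code then asks `is_allowed(...)` at each enforcement point rather than embedding its own authorization logic, so an audit of the policy list is an audit of the whole system's access rules.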
Device and Data Models
Delivering enterprise outcomes requires taking data sources, metadata, and context, and allowing external systems to access, reason about, and compute over those inputs. How you model the data and where you model and describe the metadata determine the level of value your system can ultimately deliver. A well-architected operation should be able to provide context and meaning for all consumers of data from the system, including unknown future users and external processes needed to drive evolving business models. Bright Wolf’s experience is that keeping data and metadata external to the application code or business logic provides far greater flexibility as your business case evolves. Doing so enables you to integrate your system with other data stores and value extraction points, such as Power BI, that need access to the metadata, not just the raw data streams.
Data Records and Streams
The Azure IoT Reference Architecture provides a good overview of the design considerations for Data Records and Streams. There are a few additional items that are important to industrial manufacturing equipment makers seeking digital transformation.
Many devices persist data locally and transmit it to the cloud at a later time, for example after a device loses and then regains cellular connectivity. It is often desirable to transmit current state as soon as communications are re-established so that any active alarms are reported immediately; a request to begin sending the cached data then follows. This pattern results in both re-transmission of previously received data and out-of-time-order arrival of telemetry. When cached data is involved, it is critical that the data ingestion pipeline handle it properly without firing spurious notifications.
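A sketch of an ingestion step that handles both problems, assuming a simple deduplication key and an alert horizon (the field names and threshold are invented for illustration): duplicates from a cache replay are dropped, and readings older than the horizon are stored but never raise alerts.

```python
from datetime import datetime, timedelta, timezone

class IngestionPipeline:
    """Dedupe re-transmitted readings and suppress alerts for backfilled data."""

    def __init__(self, alert_horizon=timedelta(minutes=5)):
        self._seen = set()   # (device_id, timestamp, metric) keys already ingested
        self.stored = []     # everything kept, including backfill
        self.alerts = []     # only fresh readings may alert
        self._horizon = alert_horizon

    def ingest(self, device_id, timestamp, metric, value, threshold=100.0):
        key = (device_id, timestamp, metric)
        if key in self._seen:
            return  # duplicate from a cache replay; already stored
        self._seen.add(key)
        self.stored.append({"device": device_id, "ts": timestamp, metric: value})
        # Alert only on recent data: cached history above the threshold
        # represents a past condition, not a live one.
        is_recent = datetime.now(timezone.utc) - timestamp < self._horizon
        if is_recent and value > threshold:
            self.alerts.append(key)
```

In production the deduplication set would live in a durable store with an eviction policy, but the shape of the logic is the same.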
Depending on the device’s capabilities, several issues can come up regarding timestamp resolution. In some cases the device has a higher-resolution timestamp than the cloud process is designed to support; here it can be critical to preserve the order of events from the same machine that map to the same timestamp at the cloud’s resolution. In other cases, devices may not have a valid clock timestamp when the event occurs. For these, there are various techniques for deriving the timestamp of the event based on a synchronization of the server time with the device clock.
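One simple derivation technique is to record a clock offset when the device connects (server receipt time minus the device-reported time) and apply it to subsequent event timestamps. This sketch ignores network latency, which a production system would bound with round-trip measurements as NTP does:

```python
from datetime import datetime, timedelta, timezone

def clock_offset(server_receipt_time, device_reported_time):
    """Offset to add to device timestamps to map them onto server time.

    Computed once at connection time; network latency is ignored here,
    so the result carries an error up to one round trip.
    """
    return server_receipt_time - device_reported_time

def correct_timestamp(device_event_time, offset):
    """Apply the connection-time offset to an event's device timestamp."""
    return device_event_time + offset
```

For example, a device whose clock reads five seconds behind the server yields an offset of +5s, and every cached event it later uploads is shifted forward by that amount.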
Over the lifespan of the system, some sensor data will inevitably be determined to be bad, often because of a broken sensor. This data needs to be marked as untrusted in the data record and stream so that consumers of the data know not to rely on the reading.
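Marking rather than deleting can be as simple as attaching a quality field to the record (the field names here are illustrative, not a standard):

```python
def mark_untrusted(record, reason):
    """Flag a reading as untrusted without deleting it.

    The original record is left unchanged; downstream consumers must
    check the quality field before relying on the value.
    """
    flagged = dict(record)  # copy, so the stored original is preserved
    flagged["quality"] = {"trusted": False, "reason": reason}
    return flagged
```

Keeping the raw value alongside the flag preserves the audit trail while steering analytics and alerting away from the bad reading.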
Data Records Format
One additional best practice to consider is sending the firmware version along with the message version in the payload. We have encountered many bugs where a device maker intended for new firmware to support a given protocol version but in fact introduced a bug that wasn’t caught before the firmware was fielded. By including the firmware version in the payload, rules can be configured in your data processing pipeline, for example Azure Stream Analytics, to handle messages differently based on the combination of firmware version and protocol version.
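In code form, routing on the combination of the two versions might look like the following sketch (the version strings and the quarantine rule are invented for illustration):

```python
# Known-bad combinations discovered after fielding, keyed by
# (firmware_version, protocol_version). Hypothetical example: firmware
# 2.1.0 claims protocol v3 but actually emits v2-shaped fields.
QUARANTINE = {("2.1.0", "v3")}

def route_message(payload):
    """Route by the *combination* of firmware and protocol version."""
    fw = payload.get("firmware_version")
    proto = payload.get("protocol_version")
    if fw is None or proto is None:
        return "reject"        # versions are mandatory in every payload
    if (fw, proto) in QUARANTINE:
        return "quarantine"    # hold for repair/replay instead of corrupting storage
    return f"parse_{proto}"    # normal path: parse per protocol version
```

The same branching can be expressed as a `CASE` clause in an Azure Stream Analytics query; the key point is that the routing decision consumes both fields, not the protocol version alone.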
There is a complex set of considerations regarding pending commands that are held on the cloud side, waiting for the device to connect to the platform again. Sometimes these commands may cancel each other out and shouldn’t be performed at all. Other times the commands may have a time to live after which they are no longer valid. The state machine that manages this logic is often custom, based on the specific control model of the device. Building this control capability directly from cloud primitives and verifying the control loop for correctness can be challenging.
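A minimal sketch of such a queue, assuming a hypothetical device with opposing `valve_open`/`valve_close` commands: opposing commands collapse, and anything past its time-to-live is discarded when the device reconnects.

```python
from datetime import datetime, timedelta, timezone

class PendingCommands:
    """Hold commands for an offline device; expire by TTL, collapse opposites.

    The cancellation pairs are device-specific; these two are invented
    for illustration.
    """
    CANCELS = {("valve_open", "valve_close"), ("valve_close", "valve_open")}

    def __init__(self):
        self._queue = []  # list of (command, enqueued_at, ttl)

    def enqueue(self, command, ttl=timedelta(hours=1), now=None):
        now = now or datetime.now(timezone.utc)
        # An opposing command cancels any still-pending counterpart.
        self._queue = [
            (c, t, l) for (c, t, l) in self._queue
            if (c, command) not in self.CANCELS
        ]
        self._queue.append((command, now, ttl))

    def drain(self, now=None):
        """Called when the device reconnects; deliver only still-valid commands."""
        now = now or datetime.now(timezone.utc)
        valid = [c for (c, t, l) in self._queue if now - t < l]
        self._queue = []
        return valid
```

Real device control models add ordering constraints and acknowledgement handling on top of this, which is where verifying the control loop gets hard.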
Devices, Connectivity, Field Gateways (Edge Device), Cloud Gateway
One additional consideration on this topic is the situation where devices exist outside of the synchronized time domain. In terms of the Network Time Protocol (NTP), these are stratum 16 devices. In these cases, it is critical for the first device that has access to a synchronized time source to timestamp the events and telemetry data to the best of its ability. These timestamps frequently carry some degree of ambiguity, and it can be useful to pass that variability along to upstream consumers, together with information about the order of arrival.
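One way to preserve that ambiguity rather than discard it is to carry an uncertainty bound and arrival sequence alongside the estimated timestamp. A sketch of such an envelope (the field names are invented, not part of any protocol):

```python
from dataclasses import dataclass

@dataclass
class StampedEvent:
    """Event envelope assigned by the first hop with a synchronized clock.

    Carries the best-effort timestamp, its uncertainty, and the order of
    arrival, so upstream consumers can reason about the ambiguity
    instead of losing it.
    """
    payload: dict
    est_time_s: float      # best-effort epoch seconds
    uncertainty_s: float   # half-width of the plausible time window
    arrival_seq: int       # monotonically increasing arrival order
```

Consumers can then, for example, refuse to compare the order of two events whose uncertainty windows overlap, and fall back to arrival sequence instead.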
Topology and Entity Store
The Azure IoT Reference Architecture describes the topology and entity store as a foundational component of your IoT system. Implementing this part of the system is time-consuming, difficult work that is central to the success and longevity of the overall operation. In an effort to move quickly to a prototype, most system design projects shortcut these important aspects and later regret that decision as retrofitting foundational architecture proves prohibitive. This is further exacerbated as the initial system grows or is replicated across multiple product lines or business units within a company or organization. Similar to other enterprise infrastructure components leveraged by development teams for IoT projects, Bright Wolf provides production-ready services in this area, enabling development projects to proceed quickly without sacrificing these critical design factors. This foundation layer provides a flexible IoT Data/Metadata model and Data Management API layer for your topology and entity store. The data itself is still stored in Azure managed data stores in your Azure subscription, ensuring that you stay in control of your data.
Rather than implementing access control checks in all of your application code, declarative policy descriptions enable enforcement of access control over the data in your topology and entity store. You can have one topology and entity store for all your customers and be confident that they can’t see each other’s data, with a hierarchical and tag-based organization by Customer, Customer Site, Sub-Site, and Groups, granting role assignments in various contexts such as Device Type, Customer, Customer Site, and Sub-Site.
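A sketch of how hierarchical grants can resolve against such a topology (the customer and site names below are invented): a grant at any node confers access to that node's entire subtree.

```python
# Hypothetical topology: each entity maps to its parent (None = root).
TOPOLOGY = {
    "acme":                None,
    "acme/plant-1":        "acme",
    "acme/plant-1/line-a": "acme/plant-1",
    "globex":              None,
    "globex/depot":        "globex",
}

def ancestors(entity):
    """Yield the entity itself, then each parent up to the root."""
    while entity is not None:
        yield entity
        entity = TOPOLOGY[entity]

def can_read(grants, user, entity):
    """A grant at any ancestor of the entity gives read access to it."""
    return any(node in grants.get(user, set()) for node in ancestors(entity))
```

Because the check walks the hierarchy, granting a customer's admin access at their root node covers every site and sub-site beneath it, while tenants at other roots remain invisible.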
Business Systems Integration and Backend Application Processing
For Connected Product Systems, the enterprise API is one of the most critical aspects of the entire system, requiring diligent design for the solution to deliver the required enterprise outcomes. The API should sit between the system and all other consumers of its data, providing a stable boundary that can be managed as the product evolves. Web applications, mobile and voice applications, and enterprise systems (including those implemented with Azure Logic Apps) should interact with the Connected Product System through its API. Implementing these integrations through an API prevents tight coupling between systems, provides flexibility in evolving the system over time, and prevents cascading failure modes between systems. It also ensures proper data and user policy enforcement for all consumers of the API, as described in the User Management section above.
Machine Learning (At-rest data analytics)
Predictive models require significant volumes of data to train. These models must then be deployed, and the required input data must be fed to the model to generate predictions. These predictions are then made available to the system applications and end users. Perhaps the most important aspect of an ML-enabled industrial IoT system is the concept of learning loops: feedback loops from humans in the field on the accuracy of the prediction. This feedback mechanism is critical in two ways:
- Providing a feedback mechanism gives human beings a voice in the operation and has been found to greatly aid adoption rates for relying on predictions.
- Data scientists desperately need feedback on the performance of their models in the real-world.
As a best practice, the feedback on the prediction accuracy should be collected and stored in the entity and topology store and accessed through the IoT System API.
Industrial Learning Loops and Predictive Maintenance
Connected Product makers are uniquely positioned at the right point in the value chain to develop a large and varied enough training set to train predictive maintenance models for their products. To improve the accuracy of their models over time, feedback should be collected from the consumers of each prediction, such as a “yes/no” response from service technicians as to whether or not each machine reported as needing service did in fact have issues requiring attention. Such learning loops are critical to the evolution of your predictive model over time as well as for gaining acceptance of the value of the prediction by field service staff.
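The feedback itself can be as simple as a structured yes/no record keyed to the prediction, stored for later retraining. A sketch (the field names are illustrative; in practice this record would live in the entity and topology store and be written through the IoT System API):

```python
def record_feedback(store, prediction_id, technician, confirmed):
    """Append a technician's yes/no confirmation of a prediction.

    These records close the learning loop: they feed retraining and let
    data scientists measure model performance in the field.
    """
    store.append({
        "prediction_id": prediction_id,  # ties feedback back to the prediction
        "source": technician,
        "confirmed": confirmed,          # True: machine did need service
    })
    return store
```

Aggregating `confirmed` over time gives a direct field-measured precision for the model, which is the evidence both data scientists and service staff need.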
Asset vs Device Lifecycle
In IoT, while devices may be sending the data, it is the assets that are the logical things being monitored. There are several cases where this distinction matters over time:
- Asset Sale – Some of the time stream data should be available to the new owner and some should not. For example, maintenance data should transfer to the new owner, but prior location or performance history should not.
- Device Malfunction – A device may cease to function for a variety of reasons. A new device should be logically associated with the asset without interrupting the history of the asset’s data and metadata.
- Device Return – Devices may be returned by a customer, reconditioned, and sent out to another customer. In this case the device needs to associate its data with a new asset, and the new asset owner should not be able to see any of the data of the prior asset.
Here is a more detailed look at an Azure IoT cloud architecture following best practices outlined in the Microsoft Azure IoT Reference Architecture.
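The cases above all reduce to one rule: attribute telemetry to the asset the device is bound to at ingest time, and let that binding change over the lifecycle. A sketch (class and field names are hypothetical):

```python
class AssetRegistry:
    """Track which device is attached to which asset over time.

    Telemetry is attributed to the asset a device is bound to at ingest
    time, so swapping a failed device preserves the asset's history, and
    a reconditioned device never leaks a prior asset's data.
    """

    def __init__(self):
        self._device_to_asset = {}  # current binding: device_id -> asset_id
        self.asset_history = {}     # asset_id -> list of readings

    def bind(self, device_id, asset_id):
        """Rebind a device; earlier data stays with the previous asset."""
        self._device_to_asset[device_id] = asset_id
        self.asset_history.setdefault(asset_id, [])

    def ingest(self, device_id, reading):
        asset_id = self._device_to_asset[device_id]
        self.asset_history[asset_id].append(reading)
```

Walking the three lifecycle cases: a replacement device bound to the same asset continues its history seamlessly, and a returned device rebound to a new asset writes only to that asset, so the new owner sees nothing of the old one.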
Bright Wolf components provide critical industrial connected product system capabilities such as asset and data modeling, access control, and an Enterprise API. For low power devices, Azure Sphere and the Azure IoT Client SDKs are used to communicate with Azure IoT Hub. For more powerful devices and Gateways, Azure IoT Edge is used to manage containers on the edge. Bright Wolf GearBox edge technology provides a variety of additional edge capabilities including configurable protocol adapter runtimes, onboard historian, local HMI, and integration with various Azure IoT Edge components.
Industrial Protocol Clients
Configurable protocol adapter runtimes are able to collect, process, and transform machine data from your control systems and programmable logic controllers (PLCs). These include popular protocols like Modbus, OPC-UA, J1939, and other industrial PLC protocols. Toolkits are available for adding additional protocol support as needed.
Onboard Historian
A local, onboard historian keeps a history of the local sensor data and various other computed outputs that need to either be shared locally or sent to the cloud. This historian has a configurable database option and can make use of a variety of local datastores.
Web HMI Framework
GearBox also contains a local HMI framework and library, built in HTML5 and CSS, enabling rapid creation of local user interfaces for the Connected Product. In cases where an HMI refresh is desired, this toolset enables rapid development of a new brandable user interface. These HMI components support touch interfaces as well as physical hard-button input.
To enable edge developers to easily specify the payloads that they want sent to the cloud and the conditions and rules for transmitting those payloads, GearBox also contains a configurable Gateway Client that interfaces with Azure IoT Edge or the Azure IoT Client SDK. Together with Azure services at the edge, these components enable development teams to produce industry solutions running in the field with the power to execute local logic, make decisions, and take action locally with or without connectivity to the cloud. Additionally, this flexible architecture provides gateway hardware independence for preventing hardware lock-in.
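The conditions and rules for transmitting payloads can themselves be declarative. As an illustrative sketch (the metric names, rule fields, and modes below are invented, not GearBox configuration syntax), an on-change rule and a threshold rule might be evaluated like this:

```python
# Declarative transmit rules: what to send, and under what condition.
RULES = [
    {"metric": "temperature", "mode": "on_change", "min_delta": 0.5},
    {"metric": "pressure",    "mode": "threshold", "above": 80.0},
]

def should_transmit(rule, last_sent, value):
    """Decide locally whether a reading is worth sending to the cloud."""
    if rule["mode"] == "on_change":
        # Send on first reading, then only when the value has moved enough.
        return last_sent is None or abs(value - last_sent) >= rule["min_delta"]
    if rule["mode"] == "threshold":
        return value > rule["above"]
    return False  # unknown modes default to not transmitting
```

Evaluating rules like these at the edge is what lets a gateway keep making local decisions, and keep filtering traffic, with or without cloud connectivity.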
Azure IoT Solution Acceleration for Industrial Manufacturing Equipment Makers
Using this Azure IoT reference architecture at the edge and in the cloud, industrial equipment manufacturers are accelerating their digital transformation with Microsoft Azure IoT solutions, avoiding commoditization, and gaining a competitive advantage in the market. To learn more or get started, let us know your requirements and goals and we’ll help you put together a plan for rapid delivery of your connected product systems.