Useful IoT prototypes can be built from a variety of visual and step-by-step tools, and are instrumental in showing the vision and promise of bringing an Azure or Amazon Web Services IoT platform into your enterprise. When it’s time to design the production system architecture that will support your production business cases, new aspects must be considered and accounted for that were not needed during the original conceptual phase. Your system architecture must be designed upfront with these in mind or your project will stall out before reaching production, leaving your organization vulnerable to more connected competitors.
1) Plan for an Evolving Data Model
Every IoT application has a data model that includes data from the devices themselves, as well as user-generated data and data from outside systems. The data model must support the use cases for which the system architecture is being designed.
Modeling this data and allowing the data model to evolve gracefully over time both require a great deal of attention in the design phase. Model everything out of primitives that track history. This allows you to keep not just time series data, but also the evolution of the schema and other data over the lifespan of the system. An API-accessible, graph-based time series data store is critical to the proper operation of an enterprise-grade IoT system architecture. Model the data elements as well as the associated metadata to provide a rich, contextualized graph that algorithms can use to process the data into information.
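As a rough illustration of a "primitive that tracks history", the sketch below keeps every value an attribute has held along with the time it took effect. The class name `HistoryValue` and its methods are hypothetical, chosen only for this example; a production data store would add persistence, indexing and graph relationships.

```python
from datetime import datetime, timezone
from typing import Any, List, Optional, Tuple


class HistoryValue:
    """Hypothetical primitive: remembers every value and when it took effect."""

    def __init__(self) -> None:
        self._versions: List[Tuple[datetime, Any]] = []

    def set(self, value: Any, at: datetime) -> None:
        self._versions.append((at, value))
        # Sort on the timestamp only, so late-arriving data lands in order.
        self._versions.sort(key=lambda version: version[0])

    def at(self, when: datetime) -> Optional[Any]:
        """Return the value in effect at `when`, or None if nothing was set yet."""
        current = None
        for effective, value in self._versions:
            if effective > when:
                break
            current = value
        return current


def hour(h: int) -> datetime:
    return datetime(2024, 1, 1, h, tzinfo=timezone.utc)


# The same primitive holds both schema metadata (the unit) and readings.
unit = HistoryValue()
unit.set("fahrenheit", hour(0))
unit.set("celsius", hour(12))   # schema change at noon

reading = HistoryValue()
reading.set(71.6, hour(1))
reading.set(22.0, hour(13))

# A reading can be interpreted with the unit that applied when it was taken.
print(reading.at(hour(2)), unit.at(hour(2)))
print(reading.at(hour(14)), unit.at(hour(14)))
```

Because the unit is modeled with the same history-tracking primitive as the readings, a schema change does not invalidate older data.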
2) Start at the End (User)
Most IoT systems are meant for consumption by an end user who may not be an employee of the provider of the system. The user interface can be a mobile or web-based application, a voice interface, or even a data-driven interface consumed through yet another application. There are several common elements that should be considered for an end-user interface to any IoT application.
- User Authentication and Authorization
- Sales Demo vs. Manufacturing vs. Individual vs. Corporate Use Cases (and transitions between them)
- Implementation, Interaction and Visualization of User Story capabilities (including reports)
- Alert Condition Notification Mechanism
- User Generated/Supplied Data
Within each specific type of interface there are other dimensions that need to be considered. A non-exhaustive list includes:
- Mobile – Distribution and Periodic Updates; Minimum Supported OS Versions
- Web – Browser Level support; Page Load Time
- Enterprise Connectors – Authentication Mechanisms, Inter-networking Configuration, Data Formats
- Voice – Command/Query Structure; Device Support (Alexa, Siri, Cortana, etc.)
3) Learn the Fundamentals of Trusted Communication
Enterprise IoT systems rely on a foundation of trust. For every connection, the following 3 questions must be answered with confidence at each tier of communication:
- Is the device communicating with the system that it should be?
- Is the device really who it claims to be?
- Can the system validate that the device has not been compromised?
These can only be answered in production when the right architecture design is used from the start.
Device validating Servers
The best practice recommendation is for a device to validate the security of a server by communicating using TLS. Both the HTTP and MQTT protocols support TLS. For legacy protocols that don’t cleanly map into either HTTP or MQTT, other mechanisms are used.
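A minimal sketch of the device side, using only the Python standard library: the function below builds a TLS client context that requires a valid server certificate and a matching hostname. The function name and the optional `ca_file` parameter (for a private certificate authority) are assumptions made for this example.

```python
import ssl
from typing import Optional


def make_device_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side context that verifies the server before any data is sent."""
    # create_default_context() enables certificate verification and hostname
    # checking; ca_file can point at a private CA bundle if one is used.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_device_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

The resulting context can be passed to an HTTPS client, for example `http.client.HTTPSConnection(host, context=ctx)`, or used to wrap a raw socket with `ctx.wrap_socket(sock, server_hostname=host)`.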
Servers authenticating Devices
In most IoT topologies, a device initiates communication with a server. In other topologies, a device and server both communicate with a trusted intermediary, which relays communication between them. In rare cases, the server initiates communication with the device.
The Use of Challenges
Challenges are a technique for verifying with a high degree of confidence and security that your communications partner is who they claim to be.
The best practice recommendation for IoT applications is for a server to challenge a device. In an ideal world, the device would have a hardware security module, such as a Trusted Platform Module (TPM), in its hardware design, with a unique private encryption key stored inside. The challenge, along with other inputs, would be passed to the module to generate a response. Using this technique, servers can detect whether someone is spoofing a device (by using compromised credentials) or whether the security of a device has been corrupted by some form of malware.
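The challenge flow can be sketched as follows. Note the simplification: instead of a private key held in secure hardware, this example uses a per-device shared secret and an HMAC, which is easier to show with the standard library but is not equivalent to a hardware-backed asymmetric key. The "other inputs" are represented by a firmware digest; that choice, and all function names, are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh random nonce, never reused, so responses cannot be replayed."""
    return secrets.token_bytes(32)


def device_response(device_key: bytes, challenge: bytes, firmware_digest: bytes) -> bytes:
    """Device side: bind the challenge to the device secret and the running firmware."""
    return hmac.new(device_key, challenge + firmware_digest, hashlib.sha256).digest()


def server_verify(device_key: bytes, challenge: bytes,
                  expected_firmware_digest: bytes, response: bytes) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(device_key, challenge + expected_firmware_digest,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = secrets.token_bytes(32)                  # provisioned at manufacture (assumed)
good_fw = hashlib.sha256(b"firmware-1.0").digest()
bad_fw = hashlib.sha256(b"firmware-1.0+malware").digest()

challenge = new_challenge()
print(server_verify(key, challenge, good_fw, device_response(key, challenge, good_fw)))
print(server_verify(key, challenge, good_fw, device_response(key, challenge, bad_fw)))
```

The first check passes and the second fails, because a response computed over modified firmware no longer matches what the server expects.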
Expiring Tokens are a technique where a device accesses a provisioning service to acquire a token and then uses the token for communication with the IoT application on the server. This technique isn’t as robust as a full challenge mechanism, but it does limit the window of time available for exploits in the event the token is compromised.
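A rough sketch of an expiring token, assuming the provisioning service and the IoT application share a signing key. The token layout (a JSON payload plus an HMAC signature) and the function names are invented for this example; real deployments typically use an established format such as JWT or the token scheme of their cloud platform.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def issue_token(signing_key: bytes, device_id: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Provisioning service: sign the device id together with an expiry time."""
    now = time.time() if now is None else now
    payload = json.dumps({"dev": device_id, "exp": int(now) + ttl_seconds}).encode()
    sig = hmac.new(signing_key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(signing_key: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """IoT application: return the device id, or None if forged, malformed or expired."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(signing_key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if now >= claims["exp"]:
        return None
    return claims["dev"]


token = issue_token(b"demo-signing-key", "device-42", ttl_seconds=60, now=1000)
print(verify_token(b"demo-signing-key", token, now=1030))  # still valid
print(verify_token(b"demo-signing-key", token, now=1061))  # expired
```

A stolen token stops working once `exp` passes, which is the limited window described above; a shorter `ttl_seconds` narrows that window at the cost of more frequent provisioning calls.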
Another common technique is to generate a shared certificate that is loaded onto the device and the server. This technique can work well, but introduces issues around managing the lifecycle of the certificates.
There are often requirements to authenticate with a specific communication channel in order to establish an internet connection. In the cellular world, this is usually dictated by the network provider and handled by the cellular module. In the WiFi world, this is typically dictated by the router infrastructure (WPA2, WEP, etc.) and has varying support depending on the WiFi module. It is critical to understand the variation in the end customer’s WiFi environment in order to ensure that network authentication is possible. For example, many businesses require WPA2 Enterprise to connect to their WiFi, but not all WiFi modules support this standard.
4) Understand the Complexity of Identity
Identity is a fluid concept in IoT systems architecture. There are multiple notions of identity, which can vary depending on your frame of reference. More importantly, each sub-type can change over time with impacts that ripple throughout the system.
Take a look at 7 examples of different identities that a single device can have in an enterprise IoT platform.
- Communication Network Identity – the uniqueness value for the communication channel such as a MAC address for WiFi devices and IMEI or MEID for cellular devices.
- Manufacturing Identity – the uniqueness value for the manufacturing system used to track the serialized and unserialized (by batch number) sub-component parts that went into the BOM of a specific device.
- Enterprise Identity – the uniqueness value for the Enterprise IT system used by software that needs to know about a device.
- IoT Identity – the uniqueness value for the IoT system used to associate incoming communications with a specific device and to address outgoing commands to a specific device.
- Customer Identity – how the customer refers to the device, often not specific to the device but rather the thing/asset that the device is monitoring and controlling, or a location in a process.
- Device Identity – what the device knows and can report to an IoT system about its own identity.
- Financially Responsible Entity Identity – the party that is ultimately the “owner” of the device and responsible for paying any service fees.
Identity Changes and Impacts
Further complicating the matter, these identities often change over time. It is critical to have an identity mapping early in a project and to ensure the wider team from manufacturing to enterprise is aware of the impacts of any changes to identity. Ideally, there would be a 1:1 mapping between the Manufacturing, Enterprise, IoT and Device identity, but this is rarely the case in retro-fit scenarios or when multiple vendors are involved. There are at least 5 key events that trigger changes to at least one type of identity.
- Complete Device Replacement – Customer Identity may stay the same, but the other identities may or may not change. If the prior device is going out of service permanently, it should be marked inactive in all systems, and if it is sold it needs to be moved into the purchaser’s list of devices. Historical data must be properly segmented and permissions set depending on the particular terms of sale.
- Subcomponent Replacement – a serialized sub-component may be replaced as a part of servicing the device. Data about the replacement component (and its history) and the one going out of service often need to be recorded, and tasks such as resetting of timers for maintenance alerts must be addressed.
- Sale of Device – the sale of a smart motorcycle or other product changes the Customer Identity, but the other device identities may not change. The sale affects sensor histories in various ways and may raise legal data privacy concerns for some geographies and data types.
- Communications Module Replacement – when better data rates become available and SIM cards are swapped out, the Communication Network Identity changes, which can be indistinguishable from a complete device replacement or a programmable serial number scenario.
- Programmable Serial Numbers – when the identity is programmable in the device, the programming process can contain errors or omit key steps, leading to duplicate identities. Incoming data may be associated with the wrong device history or orphaned altogether.
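One way to picture an identity mapping is a registry that ties each identity type to a stable internal key and records every change with its reason. The sketch below is hypothetical throughout: the class names, the short kind labels and the choice of which identity types must be unique per device are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Short labels for the seven identity types; which ones are unique is an assumption.
ALL_KINDS = {"network", "manufacturing", "enterprise", "iot",
             "customer", "device", "financial"}
UNIQUE_KINDS = {"network", "manufacturing", "enterprise", "iot", "device"}


@dataclass
class DeviceRecord:
    current: Dict[str, str] = field(default_factory=dict)
    # Each entry: (kind, old value, new value, reason for the change)
    history: List[Tuple[str, Optional[str], str, str]] = field(default_factory=list)


class IdentityRegistry:
    def __init__(self) -> None:
        self._records: Dict[int, DeviceRecord] = {}
        self._index: Dict[Tuple[str, str], int] = {}  # (kind, value) -> internal key
        self._next_key = 1

    def register(self, **identities: str) -> int:
        key = self._next_key
        self._next_key += 1
        self._records[key] = DeviceRecord()
        for kind, value in identities.items():
            self.change(key, kind, value, reason="initial registration")
        return key

    def change(self, key: int, kind: str, value: str, reason: str) -> None:
        if kind not in ALL_KINDS:
            raise ValueError(f"unknown identity kind {kind!r}")
        record = self._records[key]
        old = record.current.get(kind)
        if kind in UNIQUE_KINDS:
            owner = self._index.get((kind, value))
            if owner is not None and owner != key:
                # e.g. a serial number programmed into two devices by mistake
                raise ValueError(f"duplicate {kind} identity {value!r}")
            if old is not None:
                del self._index[(kind, old)]
            self._index[(kind, value)] = key
        record.current[kind] = value
        record.history.append((kind, old, value, reason))

    def resolve(self, kind: str, value: str) -> Optional[int]:
        """Map an incoming identity to the internal key, or None if unknown."""
        return self._index.get((kind, value))

    def history(self, key: int) -> List[Tuple[str, Optional[str], str, str]]:
        return list(self._records[key].history)


registry = IdentityRegistry()
pump = registry.register(network="IMEI-111", iot="iot-0001", customer="Pump 7")
registry.change(pump, "network", "IMEI-222", reason="communications module replacement")
print(registry.resolve("network", "IMEI-222") == pump)
print(registry.resolve("network", "IMEI-111"))
```

Because data is keyed on the internal identifier rather than on any single external identity, a module swap updates one mapping and leaves the device history intact, while a duplicate serial number is rejected instead of silently merging two histories.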
5) Listen to the Ticking Bomb of Time
For each datapoint in an IoT system, there are multiple notions of time. Event Time, aka ‘actualized at’ time, describes when the physical event happened. Server Time, aka ‘created at’ time, describes when the record of that event arrived at the IoT system.
Many events occur in a domain without reliable access to a reference clock. In NTP terminology, the events happen in Stratum 16, which is the unsynchronized time domain. Depending on topology, events and their unsynchronized times may be handed off across a network with multiple hops before reaching a system with access to a reference clock. Each of these handoff points introduces ambiguity into the time of the event, such that Event Time is actually a probability distribution, rather than a discrete point.
Event Time Formats
Event Times are tricky due to the fact that events may occur on systems without reliable access to a reference clock. To enable analytics and machine learning from real-world data, time must be properly accounted for. Here are 5 scenarios where time formats vary for events that happen:
- Prior to gaining a GPS lock or reference time from a cellular network
- In a device that is intermittently connected to a network
- Without knowing the device location to determine a time zone offset
- In a system that only contains an incrementing counter from boot time
- When the real-time clock power source is removed or replaced
As a best practice, time should be represented in ISO 8601 format from the point that the event is first handled by a system with access to a reference clock. If known and trusted, the timezone offset of the event should be preserved to the extent possible. In data handoffs between Stratum 16 devices, using clock ticks is a useful technique for synchronizing Event Times.
In cases where a device is operating temporarily without access to a reference clock, a best practice is to use clock ticks for the Event Times and then apply ISO 8601 timestamps retroactively once a reference clock is established. This is most common when a GPS lock is acquired long after startup.
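The retroactive conversion can be sketched as below, assuming the device records a tick counter since boot and later captures one pair of (tick count, reference time) within the same boot cycle, for example at the moment of GPS lock. The function name, the parameter names and the 1000 ticks-per-second default are assumptions for illustration, and the sketch ignores clock drift.

```python
from datetime import datetime, timedelta, timezone


def ticks_to_iso8601(event_ticks: int, ref_ticks: int, ref_time: datetime,
                     ticks_per_second: int = 1000) -> str:
    """Convert a boot-relative tick count to an ISO 8601 UTC timestamp.

    ref_ticks and ref_time are the tick counter and the reference clock reading
    captured at the same instant in the same boot cycle as the event.
    """
    if ref_time.tzinfo is None:
        raise ValueError("ref_time must be timezone-aware")
    offset = timedelta(seconds=(event_ticks - ref_ticks) / ticks_per_second)
    return (ref_time + offset).astimezone(timezone.utc).isoformat()


# GPS lock acquired 90 s after boot; an event was recorded 30 s after boot.
lock_time = datetime(2024, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
print(ticks_to_iso8601(30_000, 90_000, lock_time))  # 2024-03-01T11:59:00+00:00
```

Events recorded before the lock get a negative offset and events after it a positive one, so a single reference pair is enough to timestamp the whole boot cycle.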
6) Use Cached Data to Store Events on the Device
Cached data is a buffer of events that can be requested and reported at a later time. This technique is critical for devices that go in and out of coverage areas. It is also a best practice for any WiFi device that needs to keep recording data even if the WiFi connection is disrupted.
Devices must be capable of associating a timestamp with each event, and of responding to requests for data that occurred between time X and time Y, between X seconds ago and Y seconds ago, and between X seconds and Y seconds from the end of bootcycle Z.
It is best practice for devices to keep a rolling circular buffer containing the timestamps, clock ticks and bootcycle for the events. If events are not formatted until transmittal, special care needs to be taken when writing new firmware versions to ensure that cached events are rendered faithfully to the firmware that was in play at the time of the event.
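A minimal sketch of such a buffer, using a bounded `deque` so the oldest events are overwritten when the cache is full. The names `CachedEvent` and `EventCache` are hypothetical, and the queries are simplified: this version filters by absolute time or by tick range within a boot cycle, and does not implement the "seconds from the end of bootcycle Z" form.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class CachedEvent:
    bootcycle: int              # which boot the event belongs to
    ticks: int                  # clock ticks since the start of that boot
    timestamp: Optional[float]  # epoch seconds, None if no reference clock yet
    firmware: str               # firmware version in play when the event occurred
    payload: bytes              # raw, unformatted event data


class EventCache:
    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._events: Deque[CachedEvent] = deque(maxlen=capacity)

    def record(self, event: CachedEvent) -> None:
        self._events.append(event)

    def between_times(self, start: float, end: float) -> List[CachedEvent]:
        return [e for e in self._events
                if e.timestamp is not None and start <= e.timestamp <= end]

    def between_ticks(self, bootcycle: int, start: int, end: int) -> List[CachedEvent]:
        return [e for e in self._events
                if e.bootcycle == bootcycle and start <= e.ticks <= end]


cache = EventCache(capacity=3)
cache.record(CachedEvent(1, 100, None, "1.0", b"a"))     # before any clock sync
cache.record(CachedEvent(1, 200, 1000.0, "1.0", b"b"))
cache.record(CachedEvent(2, 50, 2000.0, "1.1", b"c"))
cache.record(CachedEvent(2, 150, 2100.0, "1.1", b"d"))   # evicts the first event

print([e.payload for e in cache.between_times(900.0, 2050.0)])
print([e.payload for e in cache.between_ticks(2, 0, 200)])
```

Storing the firmware version with each raw event is one way to address the rendering concern: whichever firmware eventually transmits the event can tell which format the payload was recorded in.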
Getting to Production
While lengthy, this list includes just a few of the architectural challenges you will face when building a production IoT system and best practices to follow. For organizations without in-house enterprise IoT system and platform design expertise, it may be wise to seek advice from a partner like Bright Wolf. We’ve helped several Fortune 1000 businesses worldwide with Azure and Amazon Web Services (AWS) IoT solutions and are happy to chat about how we could help you too.