Bright Wolf Blog, Partners

5 Best Practices for Running Scalable IoT Architecture Solutions on AWS

Whether you’re building a connected solution for asset management, predictive maintenance, yield optimization, or another business goal, the ultimate value returned to your organization will be determined by how well the solution meets your customers’ domain-specific requirements and individual needs.

By properly leveraging cloud infrastructure and services as outlined here, the IoT solution you build today can continue to “just work,” no matter how big or fast your business grows. Because if it does develop the way you hope it will, you’re in for a data tsunami.

Whether the flood of incoming data comes from a sudden surge in business, consistent monthly growth, or even a malicious attack, your system should be prepared to handle it right from the start. Fortunately, our architects bring expertise in the vast array of cloud infrastructure components needed to build a system where all your data is reliably processed.

By implementing solid IoT architecture up-front, you free up your team to focus on building application intelligence, where the value comes from the data. The insights you can get from that data let you move ahead confidently with your market strategy, no matter how ambitious it may be.

1. Pay attention to the critical link

The ability to ingest data quickly and reliably from your IoT devices is crucial to the success of your IoT initiative. AWS IoT Rules allow you to trigger many different actions upon receipt of a message. While many cloud services adapt to different data flow properties, not every service was designed to be used as the single point of entry into the system. Some have behaviors that could prevent some or all of your data from being processed the way you would expect.

For example, in high-volume applications, you should consider buffering or queueing incoming data before invoking downstream services like AWS Lambda. Buffering and queueing each can help ensure an application can still recover in the event of subsequent failures.
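As a minimal sketch of how a queue helps with recovery, the Lambda handler below consumes a batch of SQS messages and reports only the ones that failed, so those return to the queue for another attempt. It assumes the `ReportBatchItemFailures` option is enabled on the SQS event source mapping; `process_reading` and the `device_id` field are hypothetical stand-ins for your own logic and message format.

```python
import json


def process_reading(reading):
    """Hypothetical business logic: reject readings without a device id."""
    if "device_id" not in reading:
        raise ValueError("reading has no device_id")
    return reading["device_id"]


def handler(event, context=None):
    """Process one SQS batch and report only the failed messages.

    Returning `batchItemFailures` tells Lambda which messages to leave on
    the queue for a retry (assumes ReportBatchItemFailures is enabled on
    the event source mapping). Messages not listed count as processed.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            process_reading(json.loads(record["body"]))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a missing field: ask for this message again.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Pairing the queue with a dead-letter queue is a common way to keep repeatedly failing messages from being retried forever.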

With unpredictable spikes in data load, there could already be hundreds of thousands of messages in the pipeline in the time it takes for a Lambda function to spin up. Going to Lambda directly, as in the following chart, could result in message loss — but if our data is buffered in Amazon Simple Queue Service (SQS) first, downstream systems have more processing time for any large spikes in incoming data.

[Figure: IoT data tsunami]
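One way to put SQS in front of downstream processing is an AWS IoT topic rule with an SQS action. The sketch below only builds the rule payload; the topic filter, queue URL, role ARN, and rule name are placeholders, and the commented-out `boto3` call shows where the payload would be used.

```python
def build_sqs_rule_payload(topic_filter, queue_url, role_arn):
    """Build a topic rule payload that forwards matching messages to SQS.

    All three arguments are placeholders supplied by the caller; the role
    must allow AWS IoT to send messages to the queue.
    """
    return {
        "sql": f"SELECT * FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "sqs": {
                    "queueUrl": queue_url,
                    "roleArn": role_arn,
                    "useBase64": False,
                }
            }
        ],
    }


if __name__ == "__main__":
    payload = build_sqs_rule_payload(
        "sensors/+/telemetry",
        "https://sqs.us-east-1.amazonaws.com/123456789012/telemetry-buffer",
        "arn:aws:iam::123456789012:role/iot-to-sqs",
    )
    print(payload)
    # To create the rule (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("iot").create_topic_rule(
    #     ruleName="telemetry_to_sqs", topicRulePayload=payload
    # )
```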

2. Get a life(cycle)

IoT devices are notoriously difficult to manage over the lifespan of initiatives that often take several years to make it into production. Once deployed, devices need to be trackable and maintainable while facing challenges from varying business needs, staff turnover, and hardware/software degradation. Luckily, Amazon recently added features to AWS IoT Core that address these concerns head-on.

Configurable endpoints, in conjunction with multi-account registration, provide some powerful abstractions that enable you to communicate with IoT devices using environment-specific accounts and device-specific endpoints. These abstractions free developers and operators from the daunting challenge of organizing disparate AWS IoT resources from the same AWS account with the same IoT endpoint.

Real-world IoT deployments often face networking challenges in the form of device users’ legitimate concerns about enterprise security. IoT devices need regular maintenance to ensure that they remain reliable and secure over long periods. Providing access to devices that tend to be hundreds, if not thousands, of miles from the home base of operations teams is challenging. The addition of secure tunneling to AWS IoT Core gives teams the powerful capability to access devices protected by private networks.
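As a rough illustration, opening a tunnel to a device comes down to one API request. The helper below only assembles the request arguments; the thing name is hypothetical, and the 720-minute ceiling reflects our understanding of the service's maximum tunnel lifetime, which you should confirm against the current documentation.

```python
MAX_TUNNEL_MINUTES = 720  # assumed service maximum (12 hours); verify before relying on it


def build_open_tunnel_request(thing_name, lifetime_minutes=60, services=("SSH",)):
    """Assemble keyword arguments for a secure tunneling OpenTunnel call."""
    if not 1 <= lifetime_minutes <= MAX_TUNNEL_MINUTES:
        raise ValueError(
            f"lifetime_minutes must be between 1 and {MAX_TUNNEL_MINUTES}"
        )
    return {
        "description": f"Maintenance session for {thing_name}",
        "destinationConfig": {
            "thingName": thing_name,
            "services": list(services),
        },
        "timeoutConfig": {"maxLifetimeTimeoutMinutes": lifetime_minutes},
    }


if __name__ == "__main__":
    request = build_open_tunnel_request("pump-station-17", lifetime_minutes=30)
    print(request)
    # To open the tunnel (requires boto3 and AWS credentials):
    # import boto3
    # response = boto3.client("iotsecuretunneling").open_tunnel(**request)
    # The response carries the access tokens for the two ends of the tunnel.
```

Keeping the lifetime short limits how long a maintenance path into the private network stays open.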


3. Sometimes the best way to scale is to not scale at all

Sometimes you don’t want to process all of your machine data in the cloud, as it can get expensive. Once you’ve had some growth and are collecting a robust amount of data, you might be better off running a significant amount of your computations and algorithms elsewhere.

One way to reduce traffic to the cloud is to run AWS IoT Greengrass at the edge. Greengrass can intelligently process and filter data locally, eliminating the need to send all device data upstream. Greengrass also allows you to harness the power of machine learning through the ML Inference engine. Machine learning models are developed in the cloud, in Amazon SageMaker or independently, and are then run on the device.

This approach allows you to leverage the volume and complexity of cloud-side data with the flexibility of device-side execution.
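To make local filtering concrete, here is a minimal sketch of a deadband filter, one simple technique an edge component might apply before publishing upstream. It is plain Python rather than Greengrass-specific code, and the threshold value is an assumption you would tune per sensor.

```python
def deadband_filter(readings, threshold):
    """Yield only readings that moved more than `threshold` since the last
    value that was forwarded; everything in between stays on the device."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield value


if __name__ == "__main__":
    temperatures = [20.0, 20.1, 20.2, 21.0, 21.05, 19.5]
    print(list(deadband_filter(temperatures, threshold=0.5)))
    # Only 20.0, 21.0, and 19.5 would be sent to the cloud.
```

In this example half of the readings never leave the device, which is where the savings come from when the same rule runs across a large fleet.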

While cloud infrastructure allows your system to automatically scale when it needs to, it also provides the ability to process data from your devices whenever and however you want to.

4. Choose the ingest that is best

Just because you are building an IoT system doesn’t mean all your different types of data have to go through AWS IoT Core. AWS IoT supports both MQTT and WebSockets, and there are appropriate settings for each service based on recommended use cases. If you ever find yourself bumping up against AWS IoT service limits, there’s a good chance you’re just not using the right service for the task.

MQTT is perfect for the small payloads of sensor data that are typical in IoT systems. In a legacy system architecture with log files or XML file dumps, however, it may make more sense to parse the files and send selective values up, in small chunks, via MQTT. If you want more than a few values out of the file, it could be more appropriate to send the file data via Kinesis or directly to S3 and process it later. MQTT exists to enable communications on tiny, embedded devices. Why use your device’s limited power for parsing when you can offload it to the cloud?
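The choice described above can be sketched as a simple routing decision based on payload size. Size is only a proxy for "a few values versus a whole file", and the thresholds are illustrative defaults: 128 KB matches the AWS IoT Core message size quota as we understand it, while the streaming limit is an assumption to check against current service quotas.

```python
MQTT_MAX_BYTES = 128 * 1024      # AWS IoT Core message size quota (verify for your account)
STREAM_MAX_BYTES = 1024 * 1024   # illustrative per-record limit for a streaming service


def choose_ingest_path(payload_size_bytes,
                       mqtt_max=MQTT_MAX_BYTES,
                       stream_max=STREAM_MAX_BYTES):
    """Pick an ingest path for a payload based on its size in bytes."""
    if payload_size_bytes < 0:
        raise ValueError("payload size cannot be negative")
    if payload_size_bytes <= mqtt_max:
        return "mqtt"       # small sensor readings
    if payload_size_bytes <= stream_max:
        return "kinesis"    # larger records, processed as a stream
    return "s3"             # whole files, uploaded and processed later


if __name__ == "__main__":
    for size in (512, 300 * 1024, 50 * 1024 * 1024):
        print(size, choose_ingest_path(size))
```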

The story is the same for historical and third-party data. Many legacy devices can’t run newer SDKs that enable MQTT and WebSocket protocols. However, many of these systems can make HTTP requests or use plain TCP sockets for communication. By leveraging services such as Amazon API Gateway and Lambda, you can build custom REST APIs that allow you to ingest data into an SQS queue, an S3 bucket, or an RDS database. By proactively choosing the right tool for the right job, you can maximize the scalability your overall solution provides and realize more value from IoT on AWS.
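A custom REST ingest path of this kind can be quite small. The sketch below is a Lambda handler for an API Gateway proxy integration that validates a JSON body and hands it to a queue. The `device_id` field and the injected `send_message` callable are assumptions made so the example runs without AWS access; the commented lines show how it might be wired to SQS.

```python
import json


def make_handler(send_message):
    """Build a Lambda handler that queues valid readings.

    `send_message` is any callable taking the message body as a string,
    for example a thin wrapper around an SQS client.
    """
    def handler(event, context=None):
        try:
            reading = json.loads(event.get("body") or "")
        except ValueError:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "body must be JSON"})}
        if not isinstance(reading, dict) or "device_id" not in reading:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "device_id is required"})}
        send_message(json.dumps(reading))
        # 202: accepted for processing, which happens downstream of the queue.
        return {"statusCode": 202, "body": json.dumps({"status": "queued"})}
    return handler


# Possible wiring in a deployed function (requires boto3; QUEUE_URL is a placeholder):
# import boto3
# sqs = boto3.client("sqs")
# handler = make_handler(
#     lambda body: sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
# )
```

Returning as soon as the message is queued keeps the legacy device's HTTP request short, regardless of how long downstream processing takes.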


5. Don’t Drown in Your Data Lake

While it’s essential to make sure your data ingestion solution is scalable, keep in mind that other aspects of your IoT operation need the same critical early attention.

For example, let’s say you’re collecting data from a range of sensor devices. If you’re working with an AWS solution, you might host a data lake in S3 and use Kinesis to send data from your Things there. Then you might use Amazon Redshift to run some specific queries against a certain set of data points.

But there are insights you might miss if you aren’t looking to the future, says Tim Hambourger, a senior architect at Bright Wolf.

“It’s an art, not a science,” Hambourger says of data collection. “It depends so much on what kind of data you’re talking about and what you might hope to someday do with it.”

What should you do if you’re not sure which of the million data points you collect will become important in the future? It’s OK not to be clairvoyant. Just apply metadata regularly. AWS Glue can be a great solution for this. AWS Glue crawlers scan multiple AWS data stores, infer schemas, and write the resulting metadata to the Glue Data Catalog. You can use the information in the Glue Data Catalog to create and monitor ETL jobs and to query the data with Amazon Athena, Amazon EMR, and Amazon Redshift.

If you’ve applied metadata in a consistent way, you might reap an enormous benefit later, too. Often, the assumption is that once the data is in the data lake, the analysis can begin; all that remains is to make the data accessible to the tools you already have.
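One lightweight way to apply metadata consistently is to encode it in the S3 object key itself. The sketch below writes keys in the `key=value` partition layout that Glue crawlers can pick up as partition columns. The prefix, partition names, and file naming are assumptions for illustration, and the function expects a timezone-aware timestamp.

```python
from datetime import datetime, timezone


def partitioned_key(prefix, device_type, device_id, timestamp):
    """Build an S3 key whose path segments double as partition metadata.

    `timestamp` must be timezone-aware; it is normalized to UTC so that
    every writer produces the same partition for the same instant.
    """
    ts = timestamp.astimezone(timezone.utc)
    return (
        f"{prefix}/device_type={device_type}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}"
        f"/{device_id}-{ts:%Y%m%dT%H%M%SZ}.json"
    )


if __name__ == "__main__":
    when = datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)
    print(partitioned_key("telemetry", "pump", "pump-1", when))
```

Because device type and date live in the path, later queries can narrow to the relevant partitions instead of scanning the whole lake, even for questions nobody anticipated when the data was written.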

Putting it All Together

Bright Wolf has used these best practices to deliver IoT solutions on AWS across manufacturing, oil & gas, cold chain transportation, health care, agriculture, smart building, heavy equipment, and other industries. If you’re looking to get started on a rapid prototype or are facing challenges with the architecture of an existing system, we can help. Let’s talk about your project, and we’ll share more guides and case studies to accelerate your digital strategy and implementation initiatives.


About Bright Wolf

Bright Wolf helps industrial enterprises increase business value by transforming operations and organizations with digital strategy, technology, solution delivery, and team enablement.
