Solution Pattern: Edge to Core Data Pipelines for AI/ML
Architecture
1. Common Challenges
To explain the reasons for this solution pattern, let's look at some common needs and challenges among organizations that already have production systems and seek innovation and modernization.
AI/ML production deployment is an iterative process that goes beyond simply generating AI/ML models or even MLOps.
The primary milestones in the AI/ML life cycle, depending on an organization's business goals for its AI/ML effort, are as follows:
- Collect and prepare the data needed for your AI/ML project. It is critical to be deliberate about which functions take place at the edge. Compute- or storage-intensive operations belong in the core or the cloud, not at the edge. What the edge should concentrate on is data collection and inference, where AI/ML is used to make real-time decisions.
- Create ML models based on business objectives. MLOps should take place in the core or in the cloud to retrain models as needed.
- Integrate the ML models with the intelligent applications that you will create to serve the model and assist in making critical business choices.
- Monitor and manage model accuracy over time.
As organizations continue to grapple with massive amounts of data generated from sources ranging from the device edge to off-site facilities and public and private cloud environments, data managers face a new challenge: how to properly ingest and process that data to derive actionable intelligence in a timely manner.
With fresh, relevant data, businesses can learn effectively and adapt to changing customer behavior. However, managing vast amounts of ingested data and making that data ready as soon as possible, preferably in real time, for analytics and artificial intelligence and machine learning (AI/ML) is extremely challenging for data engineers.
2. Technology Stack
Red Hat® OpenShift® Container Platform, Red Hat OpenShift Data Foundation, Red Hat Application Foundations, and other tools can help automate the workflow of data ingestion, preparation, and management by building data pipelines for hybrid cloud deployments that process data automatically upon acquisition. The sketch after the list below illustrates how some of these components can fit together.
- Red Hat Application Foundations
  - AMQ Broker: A pure-Java, multiprotocol message broker. It is built on an efficient, asynchronous core with a fast native journal for message persistence and the option of shared-nothing state replication for high availability.
  - AMQ Streams: Based on the Apache Kafka project, AMQ Streams offers a distributed backbone that allows microservices and other applications to share data with extremely high throughput and extremely low latency.
  - Red Hat build of Apache Camel: Utilize the integration capabilities of Apache Camel and its vast component library in the Quarkus runtime, optimizing for peak application performance with fast startup times.
  - Red Hat build of Quarkus: Utilizes an innovative compile-time boot process that moves typical runtime steps to compile time. The result is an application that can consume as little as tens of megabytes of memory and start in tens of milliseconds.
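As a minimal sketch of how these pieces can combine, the Camel route below (runnable on the Quarkus runtime) forwards sensor readings from an AMQ Broker queue to an AMQ Streams (Kafka) topic. The queue name, topic name, and bootstrap-servers property are illustrative assumptions, not part of the Solution Pattern's code.

```java
import org.apache.camel.builder.RouteBuilder;

public class EdgeBridgeRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Forward sensor readings arriving on an AMQ Broker queue to an
        // AMQ Streams (Kafka) topic; all names here are illustrative.
        from("jms:queue:sensor-readings")
            .log("Forwarding reading: ${body}")
            .to("kafka:sensor-metrics?brokers={{kafka.bootstrap.servers}}");
    }
}
```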
3. An in-depth look at the solution’s architecture
This solution pattern implements an architecture that allows large organizations to have a core data center perform all the CPU-hungry processing (AI/ML training), with different branches of the company connected to push training data and obtain upgraded model versions.
In the retail space, for instance, you may find a company that runs multiple stores with different needs. One branch (Edge1, for example) runs a large-surface store with a big product catalogue aimed at the everyday customer. Another branch (Edge2), a small store, may be located in a prominent area and offer luxury products to its customers. Both require different product catalogues and run at different speeds.
Each Near Edge zone in the diagram above represents a branch (store), and the zones run independently of each other. They all rely on the main data center to produce iterations of the AI/ML models they run for their stores.
The demo provided in this Solution Pattern can be provisioned with multiple edge zones, as per the diagram above, to showcase zone isolation and scalability (a growing number of branches).
In the sections that follow, the focus is on the interactions between one of the near-edge zones (a single branch) and the core data center.
3.1. Data Acquisition
Ingesting and preparing data helps organizations deploy automated data pipelines in an event-driven architecture. Where continuous data is produced, such as sensor metrics, temperature, and vibration readings, it can be imported directly into an Apache Kafka topic in AMQ Streams, for example (see the sketch after the list below).
Data engineers can create pipelines to help automate workflows and business processes.
- Discrete data: Images, video, and documents are types of content data you can store in object buckets.
- Multiprotocol: Data can be acquired using different protocols. Databases can connect to Kafka through a Debezium connector, for example.
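As a sketch of the continuous-data case mentioned above, the plain Kafka producer below publishes a temperature reading to a topic in AMQ Streams. The bootstrap address, topic name, and message shape are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SensorIngest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap address of the AMQ Streams cluster (assumed service name).
        props.put("bootstrap.servers", "my-cluster-kafka-bootstrap:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One temperature reading; a real device would loop over its sensor feed.
            String reading = "{\"sensor\":\"temp-01\",\"celsius\":21.7}";
            producer.send(new ProducerRecord<>("sensor-metrics", "temp-01", reading));
        }
    }
}
```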
The Data Acquisition stage in the solution pattern is presented in the diagram below:
In a first phase, when a new product needs to be added to the catalogue, its imagery is pushed to the system (Device to Edge1). The data is kept in local object storage (S3) while waiting for the image collection to complete.
In a second phase, when the imagery has been collected in full, an edge integration application (Camel) packages all the data and sends it over the wire to the main data center (Central), where a second integration (Camel) unpacks the images and pushes them to a local S3 bucket.
When the training data upload to S3 completes, a coordination signal is sent via Kafka (Streams) to announce that new training data is available, allowing other systems to react.
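A rough sketch of these two integrations as Camel routes follows. All bucket, endpoint, and topic names are illustrative assumptions, and the zip-based packaging is just one plausible way to bundle the imagery; the pattern does not prescribe it.

```java
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.zipfile.ZipSplitter;
import org.apache.camel.processor.aggregate.zipfile.ZipAggregationStrategy;

public class DataAcquisitionRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // Edge side: read the collected product images from the local bucket,
        // bundle them into a single zip archive, and ship it to the core.
        from("aws2-s3://edge-product-images")
            .aggregate(constant(true), new ZipAggregationStrategy())
                .completionTimeout(30_000) // assume collection is complete after 30s of quiet
            .to("http://central-ingest:8080/training-data");

        // Core side: receive the archive, unpack each image into the core
        // bucket, then announce the new training data on a Kafka topic.
        from("platform-http:/training-data")
            .split(new ZipSplitter()).streaming()
                .setHeader("CamelAwsS3Key", header(Exchange.FILE_NAME)) // object key = entry name
                .to("aws2-s3://core-training-images")
            .end()
            .setBody(constant("new-training-data"))
            .to("kafka:training-signals?brokers={{kafka.bootstrap.servers}}");
    }
}
```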
Edge/Core connectivity is enabled with Service Interconnect. More details below.
3.2. Data Preparation & Modelling
The second stage is focused on two important tasks:
- Design and implement the AI/ML model that meets the business requirements.
- Prepare/pre-process the raw data to be used as input training data.
There is considerable manual labour involved in the activities listed above. However, once the team of data scientists produces the desired implementation to generate the AI/ML models, it can be injected into an automated pipeline, along with other automated steps.
The same is true for the pre-processing tasks that refine the raw training data. More and more organizations seek to implement autonomous refinement and labelling processes, which can be added to the chain of automated steps in the pipeline.
The solution pattern implements this stage as a fully automated workflow, initiated by the Kafka signal (Streams) triggered upon the reception of new training data, as illustrated in the diagram below:
The Kafka (Streams) event (signal) announces new training data is available. The Camel integration consumes the event and triggers the automated pipeline.
The pipeline obtains all the training data from the S3 bucket and starts training the new AI/ML model. When the training completes, it uploads the new version to a model repository.
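As a sketch, the trigger integration could look like the Camel route below, which consumes the Kafka signal and starts the training pipeline, here modeled as an HTTP POST to a Tekton EventListener. The topic name, listener URL, and payload are assumptions for illustration.

```java
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;

public class TrainingTriggerRoute extends RouteBuilder {
    @Override
    public void configure() {
        // React to the "new training data" signal by posting a trigger
        // event to the pipeline's EventListener (illustrative endpoint).
        from("kafka:training-signals?brokers={{kafka.bootstrap.servers}}")
            .log("New training data announced: ${body}")
            .setBody(constant("{\"trigger\":\"train-and-publish-model\"}"))
            .setHeader(Exchange.CONTENT_TYPE, constant("application/json"))
            .setHeader(Exchange.HTTP_METHOD, constant("POST"))
            .to("http://el-training-pipeline:8080");
    }
}
```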
3.3. Application Development and Delivery
The third stage primarily focuses on the DevOps and MLOps processes necessary to deliver the applications and upgrades the near edge needs to continuously evolve.
Here, all the best cloud-native practices apply. To learn more, follow the links to the resources listed below:
Smart applications, powered by the AI/ML models, need to be designed and implemented along with their automated delivery mechanisms.
The demo provided in this Solution Pattern already includes all the applications and integration systems. The focus is on showcasing the automated pipeline responsible for producing new model versions, which are uploaded to the platform's object storage system (S3), keeping duplicates in a model repository.
The diagram below illustrates the stage:
When the pipeline produces a new model version, it pushes a copy into an edge-dedicated S3 bucket. An integration system (Camel) on the near edge environment monitors the bucket, and when a new model version is available, it pulls it and hot-deploys it to the model server.
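A minimal sketch of such a monitoring route is shown below. The bucket name, polling interval, and the model server's upload endpoint are illustrative assumptions; real model servers differ in how they accept new model artifacts.

```java
import org.apache.camel.builder.RouteBuilder;

public class ModelSyncRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Poll the edge-dedicated bucket for new model versions and push
        // each one to the model server (hypothetical upload endpoint).
        from("aws2-s3://edge1-models?delay=30000") // poll every 30 seconds
            .log("New model version detected: ${header.CamelAwsS3Key}")
            .to("http://model-server:8080/v2/models/upload");
    }
}
```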
Edge/Core connectivity is enabled with Service Interconnect. More details below.
3.4. Edge ML Inference
The last stage involves customers and users generating live traffic that interacts with the platform and invokes AI/ML inferences against the model server.
Other systems are also at play in orchestrated workflows to deliver end-to-end business use cases and provide responses to customers.
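For illustration, a minimal inference call against the edge model server might look like the sketch below, assuming a KServe-style v2 REST endpoint. The host, model name, and payload shape are assumptions; adapt them to the model server actually deployed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class InferenceClient {
    public static void main(String[] args) throws Exception {
        // A small tensor posted to an assumed v2 inference endpoint.
        String payload = """
            {"inputs": [{"name": "input-0", "shape": [1, 4],
                         "datatype": "FP32", "data": [0.1, 0.2, 0.3, 0.4]}]}""";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://model-server:8080/v2/models/product-classifier/infer"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // model prediction
    }
}
```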
The demo also includes a monitoring view that provides more in-depth insight into the systems involved in each stage.
4. Additional notes about the Technologies
4.1. Red Hat Service Interconnect
The Solution Pattern connects the near edges to the core data center with Red Hat Service Interconnect.
Red Hat Service Interconnect simplifies application connectivity across the hybrid cloud. Unlike traditional means of interconnectivity (such as VPNs combined with complex firewall rules), development teams can easily create interconnections without elevated privileges and deliver protected links without compromising the organization’s security or data.
Applications and services across your environments can communicate with each other using Red Hat Service Interconnect as if they were all running in the same site. This connectivity can be maintained even as applications are migrated between environments.
In the Solution Pattern, when Edge systems interact with Core systems, they don’t need to configure remote endpoints. Service Interconnect bridges the connectivity between the two sites and exposes remote Core services as local ones to Edge, as illustrated in the image below.
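The sketch below illustrates the effect from the edge application's point of view: the core service is addressed by a plain local service name ("core-catalog" here is hypothetical), with no remote host, VPN, or firewall configuration appearing in the code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CoreCatalogClient {
    public static void main(String[] args) throws Exception {
        // The core's service appears inside the edge cluster under a local
        // name, so the edge application calls it as a cluster-local endpoint.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://core-catalog:8080/products"))
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```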
4.2. Edge Computing
Although not explicitly implemented in the Solution Pattern's demonstration, the following sections discuss topics very relevant to this Solution Pattern.
Edge computing shifts computing power away from core data-centers and distributes it closer to users and data sources—often across a large number of locations, providing faster response times, more reliable services, and a better application experience back to users.
4.2.1. What is Red Hat Device Edge?
Red Hat® Device Edge extends operational consistency across edge and hybrid cloud environments, no matter where devices are deployed in the field. Red Hat Device Edge combines enterprise-ready, lightweight Kubernetes container orchestration using MicroShift with Red Hat Enterprise Linux® to support different use cases and workloads on small, resource-constrained devices at the farthest edge.
MicroShift comes as an RPM software package that you can add to the blueprint of your system images when needed. Include your Kubernetes workloads, too, if you want. They will be deployed the next time you roll out updates to your devices. Red Hat Device Edge with MicroShift runs on Intel and Arm systems as small as 2 CPU cores and 2GB RAM.
MicroShift also provides OpenShift’s APIs for security context constraints and routes, but to reduce footprint we’ve removed APIs that are only useful on build clusters or clusters with multi-user interactive access. We’ve also removed Operators responsible for managing the operating system updates and configuration or orchestrating control plane components, as they are not needed in the MicroShift model.
Learn more about Red Hat Device Edge collaborations with Lockheed Martin and ABB.
4.3. Single Node Apache Kafka Broker
The Red Hat® AMQ Streams component is a massively scalable, distributed, and high-performance data streaming platform based on the Apache Kafka project. It offers a distributed backbone that allows microservices and other applications to share data with high throughput and low latency.
The latest AMQ Streams release introduces the new UseKRaft feature gate. This feature gate provides a way to deploy a Kafka cluster in KRaft (Kafka Raft metadata) mode without ZooKeeper. The feature gate is currently in an experimental stage, but it can be used for development and testing of AMQ Streams and Apache Kafka.