Artificial Intelligence of Things: Bringing AI into our physical environments

Ivana Podnar Žarko / November 5, 2025

The true potential of the Internet of Things (IoT) lies in its convergence with artificial intelligence (AI), giving rise to the Artificial Intelligence of Things (AIoT), which integrates AI models and concepts into Cloud-Edge-IoT (CEI) environments (the computing continuum). AIoT essentially brings AI into our physical environments, removing the boundaries between the physical and digital worlds. These environments can learn, reason and make autonomous decisions based on the data continuously generated by the many IoT devices sensing the physical environment. Consequently, AIoT allows devices to operate autonomously at the edge while remaining aware of their surroundings, and it opens new possibilities for decentralized intelligence close to the data sources in the physical environment. 

Case Study: Real-Time Traffic Monitoring 

To understand the core concept of AIoT, consider a real-time traffic monitoring solution implemented with cameras on urban roads. This solution requires adequate ML models -- Convolutional Neural Networks (CNNs), a class of deep neural networks widely used for image analysis -- to identify vehicles in video frames and output details like the number and type of vehicles. The open question is: Where should we host these ML models? A model can be hosted on a powerful cloud server, but also on a GPU-based device located at the edge, near or even collocated with the camera, as shown in Figure 1. The answer determines whether the raw video stream from the camera (or any other IoT device) is transmitted to a nearby edge device or all the way to the cloud for processing. 
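To make the inference step concrete, here is a minimal sketch that counts vehicles in a single frame. It uses a pretrained object detector from torchvision as a stand-in for the traffic-monitoring CNN; the model choice, score threshold and the dummy frame are illustrative assumptions, since a real deployment would use a model tuned for the camera and the target hardware.

```python
# Minimal sketch: vehicle detection on one video frame with a pretrained
# torchvision detector (illustrative stand-in, not the deployed model).
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# COCO class ids that correspond to vehicles (car, motorcycle, bus, truck)
VEHICLE_CLASSES = {3: "car", 4: "motorcycle", 6: "bus", 8: "truck"}

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()

def count_vehicles(frame: torch.Tensor, score_threshold: float = 0.5) -> dict:
    """Return per-type vehicle counts for one RGB frame (C x H x W, uint8)."""
    with torch.no_grad():
        prediction = model([preprocess(frame)])[0]
    counts: dict = {}
    for label, score in zip(prediction["labels"].tolist(),
                            prediction["scores"].tolist()):
        if score >= score_threshold and label in VEHICLE_CLASSES:
            name = VEHICLE_CLASSES[label]
            counts[name] = counts.get(name, 0) + 1
    return counts

# Stand-in for a real camera frame; in practice frames are decoded from the stream.
dummy_frame = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)
print(count_vehicles(dummy_frame))
```

The output of this step (vehicle counts and types) is exactly the small payload that an edge deployment would forward instead of the raw video.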


Figure 1. AIoT solutions for real-time traffic monitoring: a) cloud-based and b) edge-based 

 

Cloud-Based vs. Edge-Driven AIoT Solutions 

Each approach comes with distinct advantages and disadvantages that influence the deployment strategy. First, the two ML models will inevitably differ in inference accuracy, processing latency and response time, since an edge device has limited resources compared to the cloud. The model hosted in the cloud will achieve higher inference accuracy than the one deployed on an edge device, but the cloud introduces higher latency because of the larger network distance compared to a nearby edge device. Second, we need to compare the network traffic and energy footprint of the two solutions: transmitting camera streams to the cloud generates high network traffic, whereas an edge solution sends only the output of the ML model to the cloud. The energy footprints of the two solutions will also differ. Finally, if we want to deploy a privacy-preserving, GDPR-compliant solution, the video stream must be processed by an edge device to remove private information. 
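A rough back-of-envelope comparison illustrates the network-traffic difference. The bit rates below are illustrative assumptions (a compressed 1080p stream at about 4 Mbit/s versus a small JSON result message per second), not measurements from the case study.

```python
# Back-of-envelope comparison of daily network traffic per camera.
# Assumed, illustrative values -- not measurements from the case study.
STREAM_BITRATE_MBPS = 4.0     # compressed 1080p video stream sent to the cloud
RESULT_MESSAGE_BYTES = 200    # JSON with vehicle counts, sent once per second

SECONDS_PER_DAY = 24 * 60 * 60

cloud_traffic_gb = STREAM_BITRATE_MBPS / 8 * SECONDS_PER_DAY / 1000  # MB/s * s -> GB
edge_traffic_gb = RESULT_MESSAGE_BYTES * SECONDS_PER_DAY / 1e9       # bytes -> GB

print(f"Cloud-based (raw stream):  ~{cloud_traffic_gb:.1f} GB/day per camera")
print(f"Edge-based (model output): ~{edge_traffic_gb:.3f} GB/day per camera")
```

Under these assumptions, streaming raw video to the cloud costs on the order of tens of gigabytes per camera per day, while an edge deployment sends only a few tens of megabytes of results.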

Thus, where should these ML models be hosted for a given AIoT solution? The question shifts from "Cloud or Edge?" to "Why not both?" This mixed approach with multiple ML models deployed in the computing continuum provides the flexibility needed to deploy large-scale AIoT solutions, but also significantly increases their complexity. 

The Complexity of AIoT Deployments in the Computing Continuum 

Deploying scalable AIoT solutions across the heterogeneous and dynamic computing continuum raises fundamental technical challenges that require a novel, AI- and CEI-aware service orchestration infrastructure. The orchestrator must constantly adapt to events and infrastructure changes as well as monitor ML performance at runtime to enable the needed reconfiguration. Key questions in managing AIoT deployments in the CEI environment include: 

  • Optimal Service Placement: Where should ML inference models (service instances) be deployed? The goal is to provision enough instances to process client requests (e.g., video frames) while meeting their Quality of Service (QoS) requirements. Improper placement of service instances can cause request losses for some clients, while over-provisioning wastes valuable resources (a simplified placement sketch follows this list). 

  • Dynamic Request Routing: How are video frames from a camera routed to the optimal service (cloud or edge) at runtime? This decision must simultaneously consider multiple factors: the service's current workload, model accuracy, inference latency, and energy consumption, all while adhering to QoS objectives. 

  • Fault Tolerance: How should the deployment react to node failures and other dynamic infrastructure changes? 

  • Model Updates: How do we efficiently update ML services on many diverse edge devices with a new version of an ML model? ML models need to be tailored and configured to use the available hardware on edge devices. 

  • Continual Learning: How should we deploy and organize federated learning pipelines across the entire infrastructure (cameras, edge devices, and the cloud) for continual learning tasks? This is essential for improving deployed model accuracy in line with specific operational contexts (e.g., angle of sight, surrounding environment). 
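Returning to the first question above, optimal service placement, the sketch below shows one simplified, greedy placement strategy under hypothetical node capacities and latency budgets. It illustrates the problem and the "why not both" outcome (some cameras served at the edge, others in the cloud); it is not the algorithm used by the AIoTwin orchestrator.

```python
# Minimal sketch of greedy service placement in the computing continuum.
# All node names, capacities and latencies are hypothetical illustration values.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capacity_fps: float   # frames per second the node can process
    latency_ms: float     # network + inference latency seen by the cameras
    free_fps: float = 0.0

    def __post_init__(self):
        self.free_fps = self.capacity_fps

@dataclass
class Camera:
    name: str
    demand_fps: float      # frames per second the camera produces
    max_latency_ms: float  # QoS requirement

def place(cameras: list[Camera], nodes: list[Node]) -> dict[str, str]:
    """Greedily assign each camera to the lowest-latency node that still has
    spare capacity and satisfies the camera's latency requirement."""
    placement: dict[str, str] = {}
    for cam in sorted(cameras, key=lambda c: c.max_latency_ms):
        candidates = [n for n in nodes
                      if n.latency_ms <= cam.max_latency_ms
                      and n.free_fps >= cam.demand_fps]
        if not candidates:
            placement[cam.name] = "UNPLACED"  # would trigger re-planning
            continue
        best = min(candidates, key=lambda n: n.latency_ms)
        best.free_fps -= cam.demand_fps
        placement[cam.name] = best.name
    return placement

nodes = [Node("edge-1", capacity_fps=30, latency_ms=15),
         Node("edge-2", capacity_fps=30, latency_ms=20),
         Node("cloud", capacity_fps=500, latency_ms=90)]
cameras = [Camera("cam-A", demand_fps=25, max_latency_ms=50),
           Camera("cam-B", demand_fps=25, max_latency_ms=120),
           Camera("cam-C", demand_fps=25, max_latency_ms=50)]
print(place(cameras, nodes))  # cam-A and cam-C land on edge nodes, cam-B on the cloud
```

Even in this toy setting, the placement must weigh latency budgets against node capacity; a real orchestrator additionally has to react to failures, workload shifts and changes in model performance at runtime.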

Introducing the AIoTwin Orchestration Middleware 

To address the orchestration challenges of dynamic AIoT environments, we introduce the AIoTwin orchestration middleware. It is designed for adaptive orchestration in CEI environments (Figure 2), reacting dynamically to infrastructure changes, like edge node failures, and shifts in ML performance within streaming and real-time IoT contexts. The AIoTwin middleware tackles three specific service orchestration problems for AIoT: 

  • Adaptive ML Pipelines: Managing the organization of hierarchical federated learning and adaptive ML pipelines. 

  • QoS-Driven Inference: Ensuring inference meets predefined Quality of Service (QoS) requirements. 

  • Online Integration: Combining online federated learning and inference tasks. 

The open-source middleware is published in the GitHub repository of the AIoTwin project (https://github.com/AIoTwin). Its primary users are ML engineers managing Federated Learning (FL) experiments and DevOps engineers/integrators setting up and integrating ML-driven technical solutions for specific AIoT use cases. 

Figure 2. AIoTwin orchestration middleware for CEI environments 

 

The AIoTwin orchestration middleware includes the following components: 

  1. Framework for Adaptive Orchestration of Federated Learning Pipelines provides dynamic and intelligent mechanisms for deploying, monitoring and reconfiguring FL pipelines at runtime. It uses both predictive and reactive strategies to adapt the pipeline in response to changing environmental conditions and unexpected events in the continuum. A key feature is support for Hierarchical Federated Learning (HFL) pipelines, which introduce multiple local aggregators between clients and a global aggregator to reduce training latency and communication costs while balancing the computational load through local aggregation happening closer to the edge.  
  2. Extension of the Flower Framework for Hierarchical Federated Learning offers an original and generic implementation of HFL services (client, local aggregator, global aggregator) based on the popular Flower framework; it is used by the orchestration framework described in item 1. HFL services rely on the task module specified on each client and on the global aggregator. The task module defines the local/global model architecture in PyTorch as well as functions for retrieving and setting model weights. It also includes functions for training and evaluating the model and for loading a local client dataset or creating a distinct partition of a shared dataset, which is then split into training and test sets (a simplified task-module sketch follows this list).  
  3. QEdgeProxy: QoS-aware Load Balancer for the Computing Continuum. QEdgeProxy serves as an intelligent intermediary between IoT clients and inference services distributed across the continuum. Unlike traditional edge-aware proxies that aim to optimize QoS at all costs, QEdgeProxy focuses on meeting a predefined QoS requirement (e.g., a latency threshold) while actively balancing the load across service instances to prevent node overload. This guarantees stable performance in resource-constrained, dynamic environments, for example ensuring that 95% of each client's requests have a latency below a specified threshold (e.g., 80 ms). A simplified illustration of this QoS-aware routing logic is also sketched below. 
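As referenced in item 2, here is a minimal sketch of what such a task module might contain: a small PyTorch model, weight get/set helpers in the list-of-NumPy-arrays form commonly used with Flower, and train/evaluate functions. Class and function names are illustrative assumptions, not the AIoTwin module's actual interface.

```python
# Minimal sketch of an HFL task module: model architecture, weight helpers,
# and train/evaluate functions. Names are illustrative, not the real interface.
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Small CNN standing in for the local/global model architecture (32x32 RGB input)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        return self.fc(torch.flatten(x, 1))

def get_weights(net: nn.Module):
    """Serialize model weights as a list of NumPy arrays (Flower-style)."""
    return [val.cpu().numpy() for _, val in net.state_dict().items()]

def set_weights(net: nn.Module, weights):
    """Load weights received from an aggregator back into the model."""
    params = zip(net.state_dict().keys(), weights)
    state_dict = OrderedDict({k: torch.tensor(v) for k, v in params})
    net.load_state_dict(state_dict, strict=True)

def train(net: nn.Module, loader, epochs: int = 1, lr: float = 0.01):
    """Run local training for a few epochs on the client's data partition."""
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    net.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            F.cross_entropy(net(images), labels).backward()
            optimizer.step()

def evaluate(net: nn.Module, loader):
    """Return average loss and accuracy on the client's test partition."""
    net.eval()
    loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in loader:
            outputs = net(images)
            loss += F.cross_entropy(outputs, labels, reduction="sum").item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return loss / total, correct / total
```

In an HFL pipeline, clients call train() on their local partitions and exchange the weight lists with a local aggregator, which in turn forwards aggregated weights towards the global aggregator.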

 
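To illustrate the kind of per-request decision QEdgeProxy makes (item 3 above), here is a minimal routing sketch. It assumes each candidate instance reports an estimated response time and its current load; the field names and values are illustrative assumptions and this is not QEdgeProxy's actual implementation, only the "meet the QoS threshold, then balance load" idea.

```python
# Minimal sketch of QoS-aware routing: among instances expected to meet the
# latency threshold, pick the least-loaded one. Illustrative data model.
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    est_latency_ms: float  # estimated end-to-end response time for this client
    load: float            # current utilization in [0, 1]

def route(instances: list[Instance], latency_threshold_ms: float = 80.0) -> Instance:
    """Return the instance to forward the next request to."""
    qualified = [i for i in instances if i.est_latency_ms <= latency_threshold_ms]
    if qualified:
        # QoS can be met: balance load across the qualifying instances.
        return min(qualified, key=lambda i: i.load)
    # No instance meets the threshold: degrade gracefully to the fastest one.
    return min(instances, key=lambda i: i.est_latency_ms)

instances = [Instance("edge-1", est_latency_ms=25, load=0.9),
             Instance("edge-2", est_latency_ms=40, load=0.3),
             Instance("cloud", est_latency_ms=95, load=0.1)]
print(route(instances).name)  # -> edge-2 (meets the 80 ms threshold and is least loaded)
```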

Stay tuned for more details about each component in our future blog posts!