Katarina Vuknić / October 9, 2025
A promising approach for distributed learning is federated learning (FL), introduced in 2016 [1]. In this paradigm, a single server and multiple clients exchange model updates instead of raw data to collaboratively learn a global model that generalizes well across all clients. This enhances privacy and reduces bandwidth usage, since clients transmit compact, structured updates and train locally between uploads rather than streaming raw data. However, communication cost in FL remains a major challenge, largely driven by model size, client participation rate, and the number of training rounds. Additionally, relying on a single aggregator node (i.e., a cloud server) creates both a bottleneck and a single point of failure, limiting the scalability and robustness of the FL system. To address these issues, an extension known as hierarchical federated learning (HFL) has been proposed.
Concept and Working Principles
In contrast to traditional flat FL, HFL introduces an intermediate layer of edge servers between the central cloud server and the clients. Edge servers aggregate updates from their local clients and, in turn, act as clients to the central cloud server, enabling a more scalable, resilient, and communication-efficient learning architecture.

The learning process is explained in the following steps:
- Model initialization and distribution: The global model is initialized at the cloud server and distributed to all edge servers, which propagate the received model to their respective clients.
- Local training on the clients: Each client trains the received model on its local data for a predefined number of iterations or epochs.
- Model aggregation on the edge servers: Clients send their updated model parameters to the corresponding edge server. The edge server aggregates all received local updates using a chosen strategy (e.g., FedAvg) and shares the new model with its clients. Steps 2 and 3 are repeated for a predefined number of local rounds.
- Model aggregation on the cloud server: Edge servers send their updated models to the cloud server. The cloud server aggregates all received edge models using a chosen strategy (e.g., FedAvg) and shares the new model with all edge servers. Steps 1–4 are repeated for a predefined number of global rounds, until the target model accuracy is reached or the model has converged. (See the video of the learning process at [2].)
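The steps above can be condensed into a minimal simulation. This is an illustrative toy, not a production implementation: models are plain NumPy vectors, each client's "training" is a gradient step on a toy least-squares objective, and both aggregation levels use size-weighted FedAvg; all names and constants are assumptions for demonstration.

```python
import numpy as np

def fedavg(updates, weights):
    """Weighted average of model parameter vectors (FedAvg)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * u for w, u in zip(weights, updates))

def local_train(model, data, lr=0.1):
    """One gradient step on a toy objective ||model - data_mean||^2."""
    grad = 2.0 * (model - data.mean(axis=0))
    return model - lr * grad

def hfl_round(global_model, edge_groups, local_rounds=2):
    """One global round: each edge server runs several local rounds
    with its clients, then the cloud averages the edge models."""
    edge_models, edge_sizes = [], []
    for clients in edge_groups:                  # one entry per edge server
        edge_model = global_model.copy()
        for _ in range(local_rounds):            # steps 2-3 repeated
            updates = [local_train(edge_model, d) for d in clients]
            sizes = [len(d) for d in clients]
            edge_model = fedavg(updates, sizes)  # edge aggregation
        edge_models.append(edge_model)
        edge_sizes.append(sum(len(d) for d in clients))
    return fedavg(edge_models, edge_sizes)       # cloud aggregation

# Two edge servers with two clients each; a 1-D "model" for illustration
rng = np.random.default_rng(0)
groups = [[rng.normal(1.0, 0.1, (20, 1)), rng.normal(1.2, 0.1, (20, 1))],
          [rng.normal(3.0, 0.1, (20, 1)), rng.normal(2.8, 0.1, (20, 1))]]
model = np.zeros(1)
for _ in range(50):                              # global rounds (step 4)
    model = hfl_round(model, groups)
print(model)  # approaches the overall data mean (~2.0)
```

Even in this toy, the structure of the loop mirrors the protocol: the inner loop never touches the cloud, which is where the communication savings of HFL come from.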
The roles of worker nodes (clients) and aggregators (edge servers) can be assigned based on various criteria, such as available resources, network proximity, data or model similarity, or performance over time. In resource-based assignment, nodes with greater computational power, memory, and network bandwidth are selected as aggregators, while less capable nodes serve as worker nodes. When relying on network proximity, aggregators are nodes with low-latency connections to other nodes. Another approach groups clients based on data or model similarity, then selects one client per group to act as an aggregator.
More advanced systems use dynamic role assignment, where nodes adaptively switch roles based on real-time performance, availability, and other factors. Alternatively, roles may be statically defined according to the deployment plan or configuration, especially in simulations. The design of assignment logic depends on system utilization, architecture, and constraints.
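A resource- and proximity-based assignment can be sketched as a simple scoring function. The node attributes, weights, and score formula below are illustrative assumptions, not a prescribed policy:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_ghz: float        # available compute (assumed attribute)
    bandwidth_mbps: float # network bandwidth (assumed attribute)
    latency_ms: float     # average latency to peers (assumed attribute)

def assign_roles(nodes, n_aggregators=2, w_cpu=0.4, w_bw=0.4, w_lat=0.2):
    """Score nodes on resources and proximity; the top scorers become
    aggregators, the rest remain worker nodes. Weights are illustrative."""
    def score(n):
        return (w_cpu * n.cpu_ghz + w_bw * n.bandwidth_mbps / 100
                - w_lat * n.latency_ms / 10)
    ranked = sorted(nodes, key=score, reverse=True)
    return ranked[:n_aggregators], ranked[n_aggregators:]

nodes = [Node("edge-a", 3.2, 900, 4), Node("pi-1", 1.4, 80, 25),
         Node("edge-b", 2.8, 600, 6), Node("pi-2", 1.2, 60, 30)]
aggs, workers = assign_roles(nodes)
print([n.name for n in aggs])  # → ['edge-a', 'edge-b']
```

In a dynamic system, such a function would be re-evaluated periodically so that roles track current resource availability rather than a one-time snapshot.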
Handling Dynamics in Data and Network Conditions
Implementations of HFL in domains that operate under highly dynamic conditions must include mechanisms for continuous tracking of network states and node availability, as well as mechanisms for monitoring data distribution shifts and resource drifts over time. To ensure robust and efficient performance under such variability, the HFL pipeline must support runtime reconfiguration, allowing the system to adapt dynamically to changing environments. A promising approach is to integrate an orchestrator, a dedicated control component responsible for monitoring system metrics, making informed decisions, and coordinating reconfiguration actions across the hierarchy. This mechanism enables adaptive role assignments, load balancing, and communication optimization.
The orchestration objective should be tailored to the specific deployment scenario. For example, it might aim to maximize model performance within a predefined communication cost budget, or minimize communication cost while satisfying a target performance metric (e.g., accuracy or loss) [3, 4].
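As an illustration of the second objective (minimize communication cost subject to a performance target), an orchestrator's outer loop might look like the following sketch. The cost model, the toy accuracy curve, and all parameter names are assumptions for demonstration:

```python
def orchestrate(train_round, cost_per_round, budget, target_acc, max_rounds=100):
    """Run global rounds until the accuracy target is met or the
    communication budget would be exceeded, and report the cost spent."""
    spent, acc, rounds = 0.0, 0.0, 0
    while (acc < target_acc and rounds < max_rounds
           and spent + cost_per_round <= budget):
        acc = train_round()          # one global HFL round
        spent += cost_per_round      # charge its communication cost
        rounds += 1
    return acc, spent, rounds

# Toy training curve: accuracy approaches 0.95 asymptotically
state = {"acc": 0.0}
def toy_round():
    state["acc"] += 0.5 * (0.95 - state["acc"])
    return state["acc"]

acc, spent, rounds = orchestrate(toy_round, cost_per_round=1.0,
                                 budget=20.0, target_acc=0.9)
print(rounds, round(acc, 3))  # → 5 0.92
```

A real orchestrator would additionally fold in the monitoring signals mentioned above (node availability, bandwidth, data drift) when deciding whether to continue, reconfigure, or stop.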
Challenges and Strengths of HFL Compared to Traditional FL
The convergence rate in HFL is typically slightly slower than in flat FL due to several additional complexities. First, the multi-level aggregation structure introduces latency and can amplify gradient variance. Second, data heterogeneity, both within client groups and across regions, leads to model divergence and slower alignment. Third, stale or delayed updates caused by asynchronous communication further affect training stability. Finally, the dynamic nature of real-world deployments (reassigning clients to different edge servers, role changes between worker and aggregator nodes, fluctuating bandwidth, and device heterogeneity) requires frequent reconfiguration, which can interrupt or delay convergence. However, several algorithms address these issues by incorporating variance reduction techniques, model regularization, adaptive aggregation, or orchestrator-based coordination to improve convergence.
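As a small illustration of one mitigation named above, model regularization, the following sketch adds a FedProx-style proximal term to local training, which pulls each client's model back toward the global model and thereby limits divergence under heterogeneous data. The toy least-squares objective and all constants are illustrative assumptions:

```python
import numpy as np

def local_train_prox(model, global_model, data, mu=0.1, lr=0.1, steps=5):
    """Local SGD with a FedProx-style proximal term: the extra
    mu * (model - global_model) gradient penalizes drift away
    from the current global model."""
    model = model.copy()
    for _ in range(steps):
        grad = 2.0 * (model - data.mean(axis=0))  # toy local objective
        grad += mu * (model - global_model)       # proximal term
        model -= lr * grad
    return model

# A client whose local data mean (~5.0) is far from the global model (0.0)
rng = np.random.default_rng(1)
g = np.zeros(1)
data = rng.normal(5.0, 0.1, (20, 1))
drift_plain = local_train_prox(g, g, data, mu=0.0)  # no regularization
drift_prox = local_train_prox(g, g, data, mu=1.0)   # proximal term active
print(abs(drift_prox[0]) < abs(drift_plain[0]))  # → True
```

The trade-off is the usual one for regularization: a larger mu stabilizes training under heterogeneity but slows adaptation to genuinely local patterns.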
By introducing an intermediate layer of edge servers, HFL significantly reduces uplink communication costs compared to traditional flat FL. Instead of every client communicating with the cloud, local updates are first aggregated at nearby edge servers, lowering bandwidth usage and enabling parallel processing. However, communication efficiency in HFL is influenced by several factors, including the number and placement of edge servers, client-to-aggregator ratios, and synchronization protocols. Techniques such as update compression and orchestrator-based coordination help maintain communication efficiency, even under dynamic network conditions or shifting client participation.
Applying HFL in Practice
HFL can be particularly useful for applications deployed over distributed infrastructure comprising sensors and edge computing units. In such systems, sensor readings may be used locally for model training at the units that collected them or offloaded to nearby edge nodes with greater computational capacity. For example, in a smart farming system deployed across geographically distributed farms, local devices at each farm act as worker nodes, collecting data such as soil moisture, temperature, and crop health. Regional aggregator nodes, strategically placed based on network proximity or resource availability, aggregate updates from these local workers, capturing regional patterns. During global aggregation, the cloud server synthesizes the regional models into a unified global model, enabling system-wide learning while preserving data locality and reducing communication overhead. This hierarchical structure allows efficient scaling and adaptation to diverse environmental conditions across different regions. Another example is a smart traffic light control system, where HFL can be used to learn optimal traffic signal timings adapted to current traffic conditions [5].
References
[2] Hierarchical federated learning, https://www.youtube.com/watch?v=-1nGPu_Jh2M