For years, the most powerful artificial intelligence systems have been trained behind closed doors–inside massive data centers owned by a select few technology giants. These facilities concentrate thousands of GPUs in a single location, connected by ultra-fast internal networks that allow models to be trained as one tightly synchronized system.
This setup has long been treated as a technical necessity. However, it is increasingly clear that centralised data centers are not only expensive and risky, but also reaching physical limits. Large language models are growing rapidly, and systems trained just months ago quickly become outdated, pushing each new training cycle toward significantly larger scales. The question is no longer only about the concentration of power, but whether centralised infrastructure can physically scale fast enough to keep up.
Today’s frontier models already consume the full capacity of top-tier data centers. Training a meaningfully larger model increasingly requires building an entirely new facility or fundamentally upgrading an existing one, at a time when co-located data centers are approaching limits on how much energy can be concentrated in a single location. Much of that energy is spent not only on raw silicon but on the cooling systems required to keep it operational. As a result, the ability to train frontier AI models remains concentrated among a handful of companies, primarily in the United States and China.
This concentration has consequences far beyond engineering. Access to AI capabilities is shaped by geopolitics, export controls, energy constraints, and corporate priorities. As AI becomes foundational to economic productivity, scientific research, and national competitiveness, reliance on a small number of centralised hubs turns infrastructure decisions into strategic vulnerabilities.
What if this concentration is not inevitable – but instead a side effect of the algorithms we use to train AI?
The Hidden Limits of Centralised AI Training
Modern AI models are simply too large to be trained on a single machine. Foundation models with billions of parameters require many GPUs working in parallel, synchronizing their work after extremely small increments of progress, often every few seconds, millions of times over the course of training.
The industry’s default solution has been centralised, co-located training: thousands of GPUs placed together in purpose-built data centers, connected by specialized networking hardware capable of transferring data at extreme speeds. These networks allow every processor to constantly synchronize with the others, ensuring that each copy of the model remains perfectly aligned during training.
This approach works well – but only under very specific conditions. It assumes ultra-fast internal networks, physical proximity between machines, reliable energy supply, and centralised operational control. Once training needs to scale beyond a single facility – across cities, countries, or continents – the system begins to break down.
Standard internet connections are orders of magnitude slower than the specialized links inside data centers. As a result, powerful GPUs spend most of their time stalled, waiting for synchronization rather than doing useful work. In practice, this doesn’t make training slower–it makes it infeasible. Estimates suggest that attempting to train modern large language models over standard internet links would stretch training timelines from months into centuries, which is why such setups are rarely even attempted.
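A rough back-of-envelope comparison makes the gap concrete. The figures below are illustrative assumptions, not measurements: a 7-billion-parameter model with fp16 gradients, a 400 Gb/s datacenter interconnect, and a 1 Gb/s internet link.

```python
# Back-of-envelope comparison (illustrative assumptions, not measurements):
# time to move one full set of gradients for a 7B-parameter model over a
# datacenter interconnect versus a typical internet connection.

PARAMS = 7e9                  # assumed model size: 7 billion parameters
BYTES_PER_PARAM = 2           # fp16 gradients
GRAD_BYTES = PARAMS * BYTES_PER_PARAM   # ~14 GB per synchronization

DATACENTER_BPS = 400e9 / 8    # assumed 400 Gb/s interconnect, in bytes/s
INTERNET_BPS = 1e9 / 8        # assumed 1 Gb/s internet link, in bytes/s

def sync_seconds(link_bytes_per_s: float) -> float:
    """Time to transfer one full gradient over the given link."""
    return GRAD_BYTES / link_bytes_per_s

dc = sync_seconds(DATACENTER_BPS)    # well under a second in a data center
net = sync_seconds(INTERNET_BPS)     # minutes over a consumer link
print(f"datacenter: {dc:.2f} s per sync, internet: {net:.0f} s per sync")
```

Repeated millions of times per training run, a gap of this size is what turns months into centuries.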
Over time, this technical constraint has shaped the entire AI ecosystem. Only organizations with access to massive capital and privileged infrastructure can afford to train large-scale models. Researchers, startups, and institutions outside these hubs are effectively locked out–not due to a lack of expertise, but because the training process itself is designed for centralisation.
When Communication Becomes the Bottleneck
At the heart of traditional AI training lies a simple assumption: machines must communicate after every small step of learning.
After processing a batch of data, each processing unit pauses, exchanges information with all others, and agrees on the next update. This process repeats millions of times during training. Inside a data center, this level of synchronization is manageable. Over long distances, it becomes a bottleneck that stalls progress.
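The pattern can be sketched as a toy simulation: after every batch, the workers' gradients are averaged (an "all-reduce") before anyone proceeds. The objective, worker count, and noise below are purely illustrative; only the communication pattern matters.

```python
import numpy as np

# Toy simulation of synchronous data-parallel training: after every batch,
# each worker's gradient is averaged with all others (an "all-reduce")
# before any worker is allowed to take the next step.

rng = np.random.default_rng(0)
n_workers = 4
params = np.zeros(3)      # identical model replica on every worker
lr = 0.1

for step in range(1000):
    # each worker computes a gradient on its own batch of data
    local_grads = [params - rng.normal(1.0, 0.5, size=3)
                   for _ in range(n_workers)]
    # synchronization point: everyone blocks until the average is ready
    avg_grad = np.mean(local_grads, axis=0)
    params -= lr * avg_grad           # identical update on every replica
```

One such blocking exchange per step, millions of steps per run: over slow links, the waiting dominates the computing.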
For years, the response was to build faster networks and larger facilities. But a different solution emerged from an unexpected direction–federated learning.
Originally developed to train models across geographically distributed devices, typically orchestrated by a central coordinator, federated learning introduced a powerful idea: machines do not need to communicate constantly. They can work independently for longer periods and synchronize only occasionally.
This insight evolved beyond federated learning into a broader set of techniques often referred to as federated optimization–methods designed to make distributed training practical under real-world constraints. Among these, low-communication approaches stand out by dramatically reducing how often machines must exchange information. By allowing more local computation between synchronization rounds, they make it possible to train models across slower, geographically distributed networks.
DiLoCo and the Proof That Global Training Is Possible
This shift became tangible with the development of DiLoCo, short for Distributed Low-Communication training.
Instead of forcing constant synchronization, DiLoCo allows each machine to train locally for extended periods before sharing updates. These updates are then aggregated through a lightweight global coordination step.
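The inner/outer structure can be sketched as follows. This is a toy simulation on a synthetic objective: plain averaging stands in for the global coordination step (the published DiLoCo recipe uses an outer optimizer with Nesterov momentum), and all sizes and step counts are illustrative.

```python
import numpy as np

# Toy DiLoCo-style loop: each worker trains independently for H local
# steps, then only parameter deltas are exchanged and combined.

rng = np.random.default_rng(0)
n_workers, H, outer_rounds, lr = 4, 50, 20, 0.1
global_params = np.zeros(3)

for _ in range(outer_rounds):
    deltas = []
    for w in range(n_workers):
        local = global_params.copy()
        for _ in range(H):                       # H steps, zero communication
            grad = local - rng.normal(1.0, 0.5, size=3)
            local -= lr * grad
        deltas.append(local - global_params)     # only the delta is shared
    # lightweight global coordination step (plain averaging in this sketch)
    global_params += np.mean(deltas, axis=0)

# Synchronization happens outer_rounds times instead of outer_rounds * H
# times: a 50x reduction in communication rounds in this sketch.
```

The key design choice is that communication frequency is now a tunable parameter (H) rather than a fixed consequence of the batch size.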
The results were striking. Experiments showed that models trained with DiLoCo could match the performance of traditional, tightly synchronized approaches while reducing communication requirements by hundreds of times.
Crucially, this made training viable outside controlled data-center environments. Open-source implementations demonstrated that large language models could be trained over standard internet connections in peer-to-peer setups, without relying on centralised infrastructure.
First developed by researchers at DeepMind, low-communication training methods such as DiLoCo have since been adopted by organizations including Prime Intellect, Nous Research, and Flower Labs to pre-train billion-parameter models. What began as a research idea has become a practical alternative for building advanced AI systems.
At the same time, DiLoCo is not a final solution: it requires each participant to hold a full copy of the model, which sets practical limits on who can participate. As a result, recent systems often combine low-communication techniques with other approaches, including splitting models across nodes–a sign that research in distributed and decentralised training continues to evolve.
What This Changes for the AI Industry
The implications of this shift extend far beyond efficiency gains.
If large models can be trained across the internet, AI development no longer needs to be confined to a handful of centralised facilities. Compute can be contributed from many locations, by different participants, under diverse conditions.
This opens the door to:
- broader collaboration across borders and institutions;
- reduced dependence on a small number of infrastructure providers;
- greater resilience to supply-chain disruptions and geopolitical constraints;
- and wider participation in building foundational AI technologies.
In this model, the balance of power in AI shifts away from those who control the largest data centers toward those who can coordinate computation most effectively.
Building Open and Verifiable AI Infrastructure
As training becomes more distributed, new challenges emerge–particularly around trust and verification. In open networks, participants must be able to verify that computational contributions are legitimate and that models are not being manipulated.
This has driven growing interest in robust aggregation techniques and cryptographic methods that can validate work without relying on centralised authority.
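A minimal example shows why the choice of aggregation rule matters in an open network: a single corrupted contribution can dominate a plain average, while a coordinate-wise median (one common robust aggregator) largely ignores it. The participants and values below are purely illustrative.

```python
import numpy as np

# One malicious participant submits a wildly wrong update. A plain mean
# is dragged far off course; a coordinate-wise median (a simple robust
# aggregator) stays close to the honest contributions.

honest = [np.array([1.0, 1.0, 1.0]) + 0.01 * i for i in range(4)]
malicious = [np.array([100.0, -100.0, 100.0])]
updates = honest + malicious

mean_agg = np.mean(updates, axis=0)      # skewed far from the honest values
median_agg = np.median(updates, axis=0)  # stays near the honest values
print("mean:  ", mean_agg)
print("median:", median_agg)
```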
Several emerging infrastructure projects are exploring how these ideas can be implemented in practice. One example is Gonka, a decentralised network designed around AI inference, training and verification. Rather than depending on centralised data centers, Gonka coordinates compute across independent participants and uses algorithmic validation to ensure that contributions are real and trustworthy.
The network reflects the same shift seen in low-communication training: reduced reliance on high-speed private infrastructure, and greater emphasis on efficiency, openness, and resilience. In this context, decentralisation is not an ideological goal, but an engineering outcome–enabled by algorithms that no longer require constant synchronization.
A Different Path Forward
The history of AI training has been shaped by the limits of communication. For years, progress depended on placing machines ever closer together, inside increasingly complex data centers.
Recent research suggests that this path is not inevitable. By changing how machines coordinate — by communicating less, not more — it becomes possible to train powerful models across the global internet.
As algorithms evolve, the future of AI may depend less on where compute is located, and more on how intelligently it is connected. That shift has the potential to make AI development more open, more resilient, and less dependent on centralised control.

Egor Shulgin is the Co-creator of Gonka Protocol. He is currently a PhD candidate at KAUST and previously worked with Apple and Samsung AI.