A new technique developed by MIT researchers could speed up a privacy-preserving approach to training artificial intelligence models by about 81 percent. The advance could enable a wider range of resource-constrained edge devices, such as sensors and smartwatches, to run more accurate AI models while keeping user data protected.
MIT researchers have improved the efficiency of a technique called federated learning, in which a network of connected devices collaborates to train a shared artificial intelligence model.
In federated learning, the model is sent from a central server to the wireless devices. Each device trains the model using its local data, then sends the model updates back to the server. Because the raw data never leave each device, user privacy is protected.
However, not all devices on the network have the memory, computing power, and connectivity needed to store, train, and exchange the model with the server in a timely manner. The resulting delays impair training performance.
The MIT researchers developed a technique that overcomes these memory limitations and communication bottlenecks. Their method is designed to support a heterogeneous network of wireless devices with differing constraints.
The new approach could make AI models more feasible for high-stakes applications with strict security and privacy requirements, such as health care and finance.
“This work is about bringing AI to small devices that currently can’t run these kinds of powerful models. We carry these devices with us in our daily lives. We need AI to be able to run on these devices, not just on giant servers and GPUs, and this work is an important step toward making that possible,” says Irene Tenison, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on the technique.
Her co-authors are Anna Murphy ’25, a machine learning engineer at Lincoln Laboratory; Charles Beauville, a visiting student from the Ecole Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and a machine learning engineer at Flower Labs; and senior author Lalana Kagal, a principal research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the IEEE International Joint Conference on Neural Networks.
Reducing delays
Many federated learning approaches assume that every device on the network has enough memory to train the full AI model, and a connection stable enough to quickly send updates back to the server.
These assumptions often fail for networks of heterogeneous devices such as smartwatches, wireless sensors, and mobile phones. Such edge devices have limited memory and processing power and frequently suffer from intermittent network connections.
The central server typically waits until it has received model updates from every device, then averages them to complete a training round. The process repeats until training is finished.
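To make this baseline concrete, here is a minimal sketch of one such synchronous scheme, assuming a toy linear model trained with least squares; the model, data, and function names are illustrative, not the researchers’ code.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(weights, X, y, lr=0.1, steps=5):
    """One device's local training: a few gradient steps on least squares."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # raw data never leave the device
        w -= lr * grad
    return w - weights  # only the model update is sent back to the server

# Server side: simulate a small network of devices, each with private data.
dim, num_devices = 4, 8
global_w = np.zeros(dim)
local_data = [(rng.normal(size=(20, dim)), rng.normal(size=20))
              for _ in range(num_devices)]

for round_idx in range(10):
    # The server blocks until every device replies, then averages the updates.
    updates = [local_train(global_w, X, y) for X, y in local_data]
    global_w += np.mean(updates, axis=0)
```

The weakness described above is visible in the loop: each round runs only as fast as the slowest device.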
“This delay can slow down the learning procedure or even cause it to fail,” Tenison says.
To overcome these limitations, the MIT researchers developed a new framework called FTTE (Federated Tiny Training Engine) that reduces the memory and communication overhead required of each device.
Their framework includes three main innovations.
First, instead of broadcasting the entire model to every device, FTTE sends a smaller subset of the model’s parameters, reducing each device’s memory requirements. (Parameters are the internal variables a model adjusts during training.)
FTTE uses a special search procedure to identify the parameters that will maximize model accuracy while staying within a given memory budget. That budget is set by the most memory-constrained device on the network.
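The article does not detail the search procedure itself, so the sketch below substitutes a simple stand-in: rank parameters by an importance score and keep only as many as the tightest memory budget allows. The magnitude heuristic, names, and sizes are all assumptions.

```python
import numpy as np

def select_trainable_subset(importance_scores, memory_budget_params):
    """Return indices of the parameters to train and transmit this round."""
    k = min(memory_budget_params, importance_scores.size)  # weakest device's budget
    return np.argsort(np.abs(importance_scores))[-k:]      # top-k by magnitude

# Example: score 1,000 parameters, keep the 100 the smallest device can hold.
scores = np.random.default_rng(1).normal(size=1000)
subset = select_trainable_subset(scores, memory_budget_params=100)
# The server would then send, and devices would update, only params[subset],
# cutting per-device memory and communication roughly in proportion.
```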
Second, the server updates the model asynchronously. Rather than waiting for responses from every device, it accumulates incoming updates until a set buffer capacity is reached, then proceeds with the training round.
Third, the server weights each update based on when it arrives, so that older updates contribute less to training. Otherwise, stale updates could hold the model back, slowing training and reducing accuracy.
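A minimal sketch of such a semi-asynchronous, staleness-weighted aggregator appears below. The exponential decay, buffer capacity, and class structure are illustrative assumptions; the actual system may weight staleness differently.

```python
import numpy as np

class SemiAsyncServer:
    """Buffers incoming updates; aggregates when full, down-weighting stale ones."""

    def __init__(self, dim, buffer_capacity=4, decay=0.5):
        self.w = np.zeros(dim)          # global model parameters
        self.round = 0                  # current server round
        self.capacity = buffer_capacity
        self.decay = decay              # per-round staleness discount (assumed)
        self.buffer = []                # pending (update, round_it_was_based_on)

    def receive(self, update, based_on_round):
        self.buffer.append((update, based_on_round))
        if len(self.buffer) >= self.capacity:   # don't wait for stragglers
            self._aggregate()

    def _aggregate(self):
        # Updates computed against older versions of the model count for less.
        weights = np.array([self.decay ** (self.round - r)
                            for _, r in self.buffer])
        updates = np.stack([u for u, _ in self.buffer])
        self.w += (weights[:, None] * updates).sum(axis=0) / weights.sum()
        self.buffer.clear()
        self.round += 1

# Two fast devices fill the buffer; a straggler's update arrives afterward and
# will be discounted when the buffer next fills, since it is one round stale.
server = SemiAsyncServer(dim=4, buffer_capacity=2)
server.receive(np.ones(4), based_on_round=0)
server.receive(np.ones(4), based_on_round=0)   # buffer full -> round becomes 1
server.receive(np.ones(4), based_on_round=0)   # stale: would be weighted by 0.5
```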
“We use this semi-asynchronous approach because we want to involve the least powerful devices in the training process so they can contribute their data to the model, but we don’t want the more powerful devices on the network to sit idle for long periods of time and waste resources,” Tenison says.
Achieving speedups
The researchers tested their framework in simulations involving hundreds of heterogeneous devices and a variety of models and datasets. On average, FTTE completed training 81 percent faster than standard federated learning methods.
Their method also reduced per-device memory load by 80 percent and communication load by 69 percent, while achieving accuracy comparable to that of other techniques.
“Because we want the model to learn as fast as possible, to extend the battery life of these resource-constrained devices, we have to make a trade-off in accuracy. However, in some applications a small loss in accuracy may be acceptable, especially since our method runs much faster,” she says.
FTTE also demonstrated effective scalability and provided greater performance gains for larger groups of devices.
In addition to these simulations, the researchers tested FTTE on a small testbed of real devices with varying computing capabilities.
“Not everyone has the latest Apple iPhone. In many developing countries, for example, users may have less capable phones. With our technique, we can bring the benefits of federated learning to these settings,” she says.
In the future, the researchers want to explore using their method to improve the personalized performance of the model on each device, rather than focusing only on its average performance. They also want to run larger experiments on real hardware.
This work was supported in part by a Takeda Doctoral Fellowship.
