Year by year, cyberattacks are becoming more frequent and data breaches more expensive. Whether companies are trying to protect their AI systems during development or using algorithms to improve their security posture, they must mitigate cybersecurity risks. Federated learning can help with both.
What is federated learning?
Federated learning is an approach to artificial intelligence development in which multiple parties train a single model separately. Each downloads the current base algorithm from a central cloud server. They train their configuration independently on local servers, uploading it when finished. This way, they can collaborate remotely without exposing raw data or model parameters.
The centralized algorithm weights each differently trained configuration by the number of samples it was trained on, aggregating them to create a single global model. All information remains on each participant's local servers or devices – the centralized repository weighs the updates instead of processing raw data.
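The weighted aggregation described above is commonly known as federated averaging (FedAvg). The sketch below illustrates the idea in plain Python; the function and variable names are illustrative, not a specific framework's API.

```python
# Minimal sketch of federated averaging: each participant's locally trained
# weights are combined, weighted by that participant's sample count, so the
# parties holding more data influence the global model proportionally more.

def federated_average(updates):
    """updates: list of (weights, num_samples) pairs, where weights is a
    list of floats representing one participant's trained parameters."""
    total_samples = sum(n for _, n in updates)
    num_params = len(updates[0][0])
    global_weights = [0.0] * num_params
    for weights, n in updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * (n / total_samples)
    return global_weights

# Three hypothetical participants with different dataset sizes
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300), ([5.0, 6.0], 100)]
print(federated_average(updates))
```

In a real deployment the "weights" would be full neural network tensors and the updates would arrive over an encrypted channel, but the aggregation rule is this simple at its core.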
Federated learning is rapidly growing in popularity because it addresses common development-related security concerns. It is also highly sought after for its performance benefits. Research shows this technique can improve an image classification model's accuracy by up to 20% — a substantial increase.
Horizontal federated learning
There are two types of federated learning. The conventional option is horizontal federated learning. In this approach, data is divided among different devices. The datasets share common feature spaces but have different samples. This allows edge nodes to train a machine learning (ML) model together without sharing information.
Vertical federated learning
In vertical federated learning, the opposite is true – the features are different, but the samples are the same. Features are distributed vertically among participants, each of which holds different attributes about the same set of entities. Since just one party holds the full set of sample labels, this approach preserves privacy.
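The contrast between the two partitioning schemes can be shown with a small sketch. The records and feature names below are hypothetical security telemetry, used purely for illustration.

```python
# Horizontal vs. vertical partitioning of the same dataset.
# Each record: entity_id -> {feature: value}. All names are illustrative.

records = {
    "alice": {"logins": 4, "failed_auth": 1, "bytes_sent": 900},
    "bob":   {"logins": 7, "failed_auth": 0, "bytes_sent": 150},
    "carol": {"logins": 2, "failed_auth": 5, "bytes_sent": 4200},
}

# Horizontal: participants share the same feature space but hold
# different samples (rows).
site_a = {k: v for k, v in records.items() if k in ("alice",)}
site_b = {k: v for k, v in records.items() if k in ("bob", "carol")}

# Vertical: participants hold the same samples but different
# features (columns) about them.
idp    = {k: {"logins": v["logins"], "failed_auth": v["failed_auth"]}
          for k, v in records.items()}
netmon = {k: {"bytes_sent": v["bytes_sent"]} for k, v in records.items()}

assert set(site_a) | set(site_b) == set(records)  # horizontal: split by rows
assert set(idp) == set(netmon) == set(records)    # vertical: split by columns
```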
How federated learning strengthens cybersecurity
Conventional development is prone to security vulnerabilities. Because algorithms need extensive, relevant datasets to remain accurate, involving multiple departments or vendors creates openings for threat actors. They can exploit the lack of visibility and broad attack surface to inject bias, conduct prompt engineering or exfiltrate sensitive training data.
When algorithms are deployed in cybersecurity roles, their performance can affect an organization's security posture. Research shows that model accuracy can suddenly drop when processing novel data. AI systems may appear accurate yet fail when tested elsewhere because they learned to take false shortcuts to produce convincing results.
Because AI cannot think critically or genuinely consider context, its accuracy diminishes over time. Even though machine learning models evolve as they absorb new information, their performance will stagnate if their decision-making relies on shortcuts. This is where federated learning comes in.
Other notable benefits of training a centralized model via disparate updates include privacy and security. Since every participant works independently, no one has to share proprietary or sensitive information to progress training. Moreover, the fewer data transfers there are, the lower the risk of a man-in-the-middle (MITM) attack.
All updates are encrypted for secure aggregation. Secure multi-party computation conceals them behind various encryption schemes, lowering the chances of a breach or MITM attack. Doing so enhances collaboration while minimizing risk, which ultimately improves security.
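One common way secure aggregation works is pairwise additive masking: each pair of participants agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server only ever sees the aggregate. The toy sketch below shows just the cancellation idea; real protocols add cryptographic key agreement and dropout handling, which are omitted here.

```python
import random

# Toy sketch of secure aggregation via pairwise additive masking. Each pair
# (i, j) shares a random mask m: participant i adds m to its update and
# participant j subtracts it, so every mask cancels in the total.

def mask_updates(raw_updates, seed=0):
    rng = random.Random(seed)  # stands in for pairwise-agreed randomness
    n = len(raw_updates)
    masked = list(raw_updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.uniform(-1e6, 1e6)  # pairwise shared mask
            masked[i] += m              # participant i adds it
            masked[j] -= m              # participant j subtracts it
    return masked

raw = [0.5, 1.25, -0.75]
masked = mask_updates(raw)
# Individual masked values look like noise, but the sums agree:
print(sum(masked), sum(raw))
```

The server aggregating `masked` learns the total (which is all it needs for averaging) without learning any single participant's contribution.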
One of the overlooked benefits of federated learning is speed. It has significantly lower latency than its centralized counterpart. Because training happens locally instead of on a central server, the algorithm can detect, classify and respond to threats much faster. Minimal latency and rapid data transmission allow cybersecurity professionals to counter bad actors more easily.
Notes for cybersecurity professionals
Before using this training technique, AI engineers and cybersecurity teams should consider several technical, security, and operational factors.
Resource utilization
Developing artificial intelligence is expensive. Teams building their own model should expect to spend anywhere from $5 million to $200 million upfront, and upward of $5 million annually in maintenance. The financial commitment is significant even with costs spread out among multiple parties. Business leaders should account for cloud and edge computing costs.
Federated learning is also computationally intensive, which may introduce bandwidth, disk space, or compute constraints. While the cloud enables on-demand scalability, cybersecurity teams risk vendor lock-in if they’re not careful. Strategic selection of equipment and suppliers is of paramount importance.
Participant trust
While decentralized training is secure, it lacks transparency, making intentional bias and malicious injection a concern. A consensus mechanism for approving model updates before the centralized algorithm aggregates them is essential. This way, teams can minimize the risk of tampering without sacrificing confidentiality or exposing sensitive information.
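A consensus gate like the one described above could be sketched as follows. The norm-based check, peer count and quorum threshold are all hypothetical choices for illustration, not a standard protocol.

```python
# Hypothetical consensus gate: peers vote on each proposed update (here by
# flagging suspiciously large parameter vectors), and only updates approved
# by a quorum ever reach the aggregator.

def norm(update):
    return sum(w * w for w in update) ** 0.5

def vote(update, max_norm=10.0):
    # Each peer independently rejects updates with an outsized magnitude,
    # a common symptom of poisoning or injection attempts.
    return norm(update) <= max_norm

def approved(update, peers=5, quorum=3):
    votes = sum(vote(update) for _ in range(peers))
    return votes >= quorum

print(approved([0.1, -0.3, 0.2]))  # normal update -> True
print(approved([500.0, -800.0]))   # outsized, likely injected -> False
```

In practice, each peer would run its own independent check (possibly on held-out validation data) rather than the identical function, but the approve-before-aggregate flow is the same.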
Training data security
While this machine learning technique can improve a company's security posture, there is no such thing as 100% security. Developing a model in the cloud carries the risk of insider threats, human error and data loss. Redundancy is key. Teams should create backups to prevent disruption and roll back updates when necessary.
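A minimal rollback mechanism might look like the sketch below, where each aggregation round's weights are checkpointed so a bad round can be reverted. The class and storage layout are hypothetical; production systems would persist checkpoints durably rather than in memory.

```python
import copy

# Minimal checkpoint store: save the global model after each aggregation
# round, and roll back when a corrupted or poisoned round slips through.

class CheckpointStore:
    def __init__(self):
        self.history = []

    def save(self, weights):
        self.history.append(copy.deepcopy(weights))

    def rollback(self, steps=1):
        # Discard the most recent `steps` checkpoints, return the prior one.
        del self.history[-steps:]
        return copy.deepcopy(self.history[-1])

store = CheckpointStore()
store.save([1.0, 1.0])       # round 1
store.save([1.1, 0.9])       # round 2
store.save([9.9, -8.0])      # round 3: bad update slipped through
restored = store.rollback()  # revert to round 2
print(restored)              # -> [1.1, 0.9]
```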
Decision-makers should also revisit the sources of their training datasets. Dataset borrowing is common practice in ML communities, raising legitimate concerns about model misalignment. On Papers With Code, more than 50% of task communities use borrowed datasets at least 57.8% of the time. Moreover, 50% of those datasets come from just 12 universities.
Applications of federated learning in cybersecurity
Once the primary algorithm has aggregated and weighed participants' updates, it can be redeployed to whatever application it was trained for. Cybersecurity teams can use it for threat detection. The benefit here is twofold – while threat actors are left guessing because they cannot easily exfiltrate data, professionals pool insights for highly accurate output.
Federated learning is ideal for adjacent applications such as threat classification or indicator of compromise detection. The AI's large dataset and extensive training build its knowledge base, curating expansive expertise. Cybersecurity professionals can use the model as a unified defense mechanism to protect broad attack surfaces.
Machine learning models – especially those that make predictions – are prone to drift over time as concepts evolve or variables lose relevance. With federated learning, teams can periodically update their model with varied features or data samples, resulting in more accurate, timely insights.
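A simple way to decide when such a periodic update is due is to monitor recent accuracy against a baseline and trigger a retraining round once it degrades past a threshold. The function below is an illustrative sketch; the tolerance value is an assumption, not a recommended setting.

```python
# Illustrative drift check: trigger a federated retraining round when
# recent accuracy falls more than `tolerance` below the baseline.

def needs_retraining(baseline_acc, recent_accs, tolerance=0.05):
    recent = sum(recent_accs) / len(recent_accs)
    return (baseline_acc - recent) > tolerance

print(needs_retraining(0.92, [0.91, 0.90, 0.89]))  # -> False, still close
print(needs_retraining(0.92, [0.84, 0.82, 0.80]))  # -> True, has drifted
```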
Leveraging federated learning for cybersecurity
Whether companies want to secure their training datasets or use artificial intelligence for threat detection, they should consider federated learning. This technique could improve accuracy and efficiency and strengthen their security posture, as long as they strategically navigate potential insider threats and breach risks.
Zac Amos is the features editor at ReHack.