Our approach to analyzing and mitigating future risks posed by advanced AI models
Google DeepMind consistently pushes the boundaries of artificial intelligence, developing models that have transformed our understanding of what is possible. We believe the AI technology on the horizon will provide society with invaluable tools to help address critical global challenges such as climate change, drug discovery and economic productivity. At the same time, we recognize that as we continue to advance the frontier of AI capabilities, these breakthroughs may eventually pose new risks beyond those posed by today’s models.
Today we present our Frontier Safety Framework – a set of protocols to proactively identify future AI capabilities with the potential to cause severe harm and to put in place mechanisms to detect and mitigate them. Our Framework focuses on severe risks arising from powerful model-level capabilities, such as exceptional agency or advanced cyber capabilities. It is designed to complement our alignment research, which trains models to act in line with human values and societal goals, as well as Google’s existing suite of AI responsibility and safety practices.
The Framework is exploratory in nature and we expect it to evolve significantly as we learn from its implementation, deepen our understanding of AI risks and evaluations, and engage with industry, academia and government. Although these risks are beyond the reach of today’s models, we hope that implementing and refining the Framework will help us prepare to address them. Our goal is to have this initial framework fully implemented by early 2025.
The Framework
The first version of the Framework announced today builds on our research on evaluating critical capabilities in frontier models, and follows the emerging approach of Responsible Capability Scaling. The Framework consists of three key components:
- Identifying capabilities a model may have with the potential to cause severe harm. To do this, we research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm. We call these “Critical Capability Levels” (CCLs), and they guide our evaluation and mitigation approach.
- Evaluating our frontier models periodically to detect when they reach these Critical Capability Levels. To do this, we will develop suites of model evaluations, called “early warning evaluations,” that will alert us when a model is approaching a CCL, and run them frequently enough that we have notice before that threshold is reached (a simplified sketch of this evaluate-and-check loop follows this list).
- Applying a mitigation plan when a model passes our early warning evaluations. This should take into account the overall balance of benefits and risks and the intended deployment contexts. These mitigations will focus primarily on security (preventing the exfiltration of model weights) and deployment (preventing the misuse of critical capabilities).
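The sketch below is a minimal, purely illustrative view of how early warning evaluations might be compared against capability thresholds; the class, domain labels, threshold values and scores are hypothetical assumptions for exposition and are not part of the Framework itself.

```python
from dataclasses import dataclass

# Hypothetical illustration of the evaluate-then-mitigate loop described above.
# All names, scores, and thresholds are invented for exposition only.

@dataclass
class CapabilityEvaluation:
    risk_domain: str       # e.g. "autonomy" or "cybersecurity"
    ccl_threshold: float   # score at which the Critical Capability Level is reached
    warning_margin: float  # buffer that triggers an early warning before the CCL

    def assess(self, score: float) -> str:
        """Map an evaluation score to a status for this risk domain."""
        if score >= self.ccl_threshold:
            return "CCL reached: apply security and deployment mitigations"
        if score >= self.ccl_threshold - self.warning_margin:
            return "Early warning: prepare a mitigation plan before the threshold is crossed"
        return "Below early warning level: continue periodic evaluation"


if __name__ == "__main__":
    # Example: one periodic evaluation pass across two illustrative risk domains.
    evaluations = [
        CapabilityEvaluation("autonomy", ccl_threshold=0.8, warning_margin=0.2),
        CapabilityEvaluation("cybersecurity", ccl_threshold=0.7, warning_margin=0.15),
    ]
    latest_scores = {"autonomy": 0.65, "cybersecurity": 0.4}  # made-up scores

    for evaluation in evaluations:
        print(evaluation.risk_domain, "->", evaluation.assess(latest_scores[evaluation.risk_domain]))
```

In this toy run the hypothetical "autonomy" score trips the early warning while "cybersecurity" stays below it, mirroring the intent that evaluations give notice before a threshold is actually reached.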
Risk domains and mitigation levels
Our initial set of Critical Capability Levels is based on an investigation of four domains: autonomy, biosecurity, cybersecurity, and machine learning research and development (R&D). Our initial research suggests that the capabilities of future foundation models are most likely to pose severe risks in these domains.
For autonomy, cybersecurity and biosecurity, our primary goal is to assess the degree to which threat actors could use a model with advanced capabilities to carry out harmful activities with severe consequences. For machine learning R&D, the focus is on whether models with such capabilities would enable the proliferation of models with other critical capabilities, or enable rapid and unmanageable escalation of AI capabilities. As we conduct further research into these and other risk domains, we expect these CCLs to evolve, with several CCLs added at higher levels or in other risk domains.
To enable us to tailor the strength of the mitigations to each CCL, we have also outlined a set of security and deployment mitigations. Higher-level security mitigations provide greater protection against the exfiltration of model weights, and higher-level deployment mitigations enable tighter management of critical capabilities. However, these measures can also slow the pace of innovation and reduce the broad accessibility of capabilities. Striking the optimal balance between mitigating risk and supporting access and innovation is paramount to the responsible development of AI. By weighing the overall benefits against the risks and considering the context of model development and deployment, we aim to ensure responsible AI progress that unlocks transformative potential while guarding against unintended consequences.
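As a purely hypothetical illustration of how mitigation strength could be tiered against capability levels, the sketch below pairs invented tiers with increasingly strict security and deployment measures; the tier numbers and descriptions are illustrative assumptions, not the Framework’s actual mitigation levels.

```python
# Hypothetical tiering of mitigations; the tiers and descriptions are invented examples.
SECURITY_MITIGATIONS = {
    1: "Standard access controls on model weights",
    2: "Hardened storage and restricted internal access to weights",
    3: "Strict controls designed to prevent exfiltration of model weights",
}

DEPLOYMENT_MITIGATIONS = {
    1: "Standard safety filters and monitoring",
    2: "Access to the critical capability restricted to vetted use cases",
    3: "Capability withheld from deployment until risks are addressed",
}


def select_mitigations(ccl_tier: int) -> tuple[str, str]:
    """Pick the security and deployment measures for a given (illustrative) CCL tier."""
    tier = max(1, min(ccl_tier, max(SECURITY_MITIGATIONS)))
    return SECURITY_MITIGATIONS[tier], DEPLOYMENT_MITIGATIONS[tier]


if __name__ == "__main__":
    security, deployment = select_mitigations(2)
    print("Security:", security)
    print("Deployment:", deployment)
```

Higher tiers buy more protection but, as noted above, also constrain innovation and accessibility, which is why the choice of tier has to weigh benefits against risks in the intended deployment context.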
Investment in science
The research underlying the Framework is nascent and advancing rapidly. We have invested significantly in our Frontier Safety Team, which coordinated the cross-functional effort behind our Framework. Their role is to advance the science of frontier risk assessment and to refine our Framework based on our improved knowledge.
The team developed a suite of evaluations to assess risks from critical capabilities, with a particular focus on autonomous LLM agents, and field-tested it on our state-of-the-art models. Their recent paper describing these evaluations also explores mechanisms that could form a future “early warning system.” It describes technical approaches for assessing how close a model is to succeeding at a task it currently fails to do, and includes predictions about future capabilities from a team of expert forecasters.
Staying true to our AI Principles
We will review and evolve the Framework periodically. In particular, as we pilot the Framework and deepen our understanding of risk domains, CCLs, and deployment contexts, we will continue our work on calibrating specific mitigations to each CCL.
At the heart of our work are Google’s AI Principles, which commit us to pursuing widespread benefit while mitigating risks. As we improve our systems and enhance their capabilities, measures such as the Frontier Safety Framework will ensure that our practices continue to meet these commitments.
We look forward to working with peers in industry, academia and government to develop and refine the Framework. We hope that sharing our approaches will facilitate collaboration with others to agree on standards and best practices for evaluating the safety of future generations of AI models.