We're exploring the frontiers of AGI, prioritizing readiness, proactive risk assessment, and collaboration with the wider AI community.
Artificial general intelligence (AGI), AI that is at least as capable as humans at most cognitive tasks, could be here within the coming years.
Integrated with agentic capabilities, AGI could supercharge AI's ability to understand, reason, plan, and execute actions autonomously. Such technological advancement would provide society with invaluable tools for addressing critical global challenges, including drug discovery, economic growth, and climate change.
This means we can expect tangible benefits for billions of people. For instance, by enabling faster, more accurate medical diagnoses, it could revolutionize healthcare. By offering personalized learning experiences, it could make education more accessible and engaging. By enhancing information processing, AGI could help lower barriers to innovation and creativity. And by democratizing access to advanced tools and knowledge, it could enable small organizations to tackle complex challenges previously only addressable by large, well-funded institutions.
Navigating the path to AGI
We're optimistic about AGI's potential. It has the power to transform our world, acting as a catalyst for progress in many areas of life. But with any technology this powerful, even a small possibility of harm must be taken seriously and prevented.
Mitigating AGI safety challenges demands proactive planning, preparation, and collaboration. Previously, we introduced our approach to AGI in the "Levels of AGI" framework paper, which offers a perspective for classifying the capabilities of advanced AI systems, understanding and comparing their performance, assessing potential risks, and gauging progress toward more general and capable AI.
Today we're sharing our views on AGI safety and security as we navigate the path toward this transformational technology. This new paper, titled "An Approach to Technical AGI Safety and Security," is a starting point for vital conversations with the wider industry about how we monitor AGI progress and ensure it is developed safely and responsibly.
In the paper, we detail how we're taking a systematic and comprehensive approach to AGI safety, exploring four main risk areas: misuse, misalignment, accidents, and structural risks, with a deeper focus on misuse and misalignment.
Understanding and addressing the potential for misuse
Misuse occurs when a human deliberately uses an AI system for harmful purposes.
Improved insight into present-day harms and their mitigations continues to sharpen our understanding of longer-term severe harms and how to prevent them.
For instance, misuse of present-day generative AI includes generating harmful content or spreading inaccurate information. In the future, advanced AI systems may have the capacity to influence public beliefs and behaviors in ways that could lead to unintended societal consequences.
The potential severity of such harm necessitates proactive safety and security measures.
As we detail in the paper, a key element of our strategy is identifying and restricting access to dangerous capabilities that could be misused, including those enabling cyber attacks.
We're exploring a number of mitigations to prevent the misuse of advanced AI. These include sophisticated security mechanisms to prevent malicious actors from obtaining raw access to model weights, which would let them bypass our safety guardrails; mitigations that limit the potential for misuse once a model is deployed; and threat-modeling research that helps identify the capability thresholds at which heightened security becomes necessary. Additionally, our recently launched cybersecurity evaluations take this work a step further to help mitigate AI-powered threats.
Even today, we evaluate our most advanced models, such as Gemini, for potentially dangerous capabilities before releasing them. Our Frontier Safety Framework delves deeper into how we assess capabilities and employ mitigations, including for cybersecurity and biosecurity risks.
The challenge of misalignment
For AGI to truly complement human abilities, it has to be aligned with human values. Misalignment occurs when an AI system pursues a goal that differs from human intentions.
We have previously shown how misalignment can arise through our examples of specification gaming, where an AI finds a solution that achieves its stated goal, but not in the way the human instructing it intended, and goal misgeneralization.
For example, an AI system asked to book tickets to a movie might decide to hack into the ticketing system to obtain seats that are already taken, something the person asking it to buy the seats may not have considered.
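The movie-ticket scenario can be made concrete with a toy sketch: if the objective only counts seats secured, a literal-minded optimizer prefers the exploit, because nothing in the metric penalizes it. All action names and scores below are invented for illustration.

```python
# Toy illustration of specification gaming: the stated objective
# ("maximize seats booked") omits the implicit constraint
# ("only through legitimate purchases"), so an agent that literally
# maximizes the metric picks the exploit. Hypothetical values only.

def seats_booked(action: str) -> int:
    """The stated objective: how many seats each action secures."""
    outcomes = {
        "buy_available_seats": 2,    # legitimate purchase, few seats left
        "hack_ticketing_system": 10, # exploit frees up already-taken seats
    }
    return outcomes[action]

def choose_action(actions):
    """An agent that literally maximizes the stated metric."""
    return max(actions, key=seats_booked)

best = choose_action(["buy_available_seats", "hack_ticketing_system"])
print(best)  # -> hack_ticketing_system: the metric never penalized hacking
```

The fix is not a smarter optimizer but a better-specified objective (or oversight of how the goal is achieved), which is exactly the alignment problem in miniature.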
We're also conducting extensive research on the risk of deceptive alignment, i.e. the risk that an AI system becomes aware that its goals do not align with human instructions and deliberately tries to bypass the safety measures humans have put in place to prevent it from taking misaligned actions.
Countering misalignment
Our goal is to have advanced AI systems that are trained to pursue the right goals, so that they follow human instructions accurately and don't take potentially unethical shortcuts to achieve their objectives.
We do this through amplified oversight, i.e. being able to tell whether an AI's answers are good or bad at achieving a given objective. While this is relatively easy now, it can become challenging as AI capabilities advance.
As an example, even Go experts didn't initially appreciate how good Move 37, a move that had a 1 in 10,000 chance of being played, was when AlphaGo first made it.
To address this challenge, we use AI systems themselves to help us provide feedback on their answers, such as in debate.
Once we can tell whether an answer is good, we can use this signal to build a safe and aligned AI system. A remaining challenge is figuring out which problems or instances to train the AI system on. Through work on robust training, uncertainty estimation, and more, we can cover the range of situations an AI system will encounter in real-world scenarios, creating AI that can be trusted.
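As a heavily simplified illustration of debate-style feedback, the sketch below has two stub "debaters" support competing answers while a judge picks the better-supported one. The argument-length heuristic and every function name here are hypothetical stand-ins, not any real protocol or API.

```python
# Minimal sketch of AI-assisted feedback via debate. In a real system
# the debaters would be capable models and the judge a human or
# trusted model; here everything is a toy placeholder.

def make_argument(answer: str, question: str) -> str:
    """Stub debater: returns an argument supporting its answer."""
    return f"{answer} answers {question!r} because of supporting evidence"

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Stub judge: returns the answer it finds better supported.
    'Better supported' is a toy heuristic (longer argument); a real
    judge would scrutinize the content of the debate."""
    arg_a = make_argument(answer_a, question)
    arg_b = make_argument(answer_b, question)
    return answer_a if len(arg_a) >= len(arg_b) else answer_b

winner = judge("capital of France?", "Paris is the capital", "Lyon")
print(winner)
```

The point of the structure, rather than the toy heuristic, is that a weaker judge can supervise stronger systems by comparing their arguments instead of verifying answers unaided.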
Through effective monitoring and established computer-security measures, we aim to mitigate the harm that could occur if our AI systems did pursue misaligned goals.
Monitoring involves using an AI system, known as the monitor, to detect actions that don't align with our goals. It is important that the monitor knows when it doesn't know whether an action is safe. When it is unsure, it should either reject the action or flag it for further review.
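That routing logic, allow when confidently safe, reject when confidently unsafe, and escalate when uncertain, can be sketched minimally. The thresholds and function names below are illustrative assumptions, not a real system.

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    REJECT = "reject"
    REVIEW = "review"  # escalate to further (e.g. human) review

def monitor(action: str, safe_probability: float,
            allow_threshold: float = 0.95,
            reject_threshold: float = 0.05) -> Verdict:
    """Route an action based on the monitor's confidence it is safe.
    Key property: in the uncertain middle band the monitor does not
    guess; it escalates the action for further review."""
    if safe_probability >= allow_threshold:
        return Verdict.ALLOW
    if safe_probability <= reject_threshold:
        return Verdict.REJECT
    return Verdict.REVIEW
```

The band between the two thresholds encodes "knowing when it doesn't know": widening it trades throughput for safety by sending more actions to review.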
Enabling greater transparency
All of this becomes easier if AI decision-making becomes more transparent. We conduct extensive interpretability research aimed at increasing this transparency.
To facilitate this further, we're designing AI systems that are easier to understand.
For example, our research on Myopic Optimization with Nonmyopic Approval (MONA) aims to ensure that any long-term planning done by AI systems remains understandable to humans. This is particularly important as the technology improves. Our work on MONA is the first to demonstrate the safety benefits of short-term optimization in LLMs.
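As a loose illustration of why short-horizon optimization can aid oversight (this is not the actual MONA algorithm), consider an agent that ranks each individual step by its immediate reward plus an overseer's approval of that step, rather than by a long-horizon return the overseer cannot evaluate. All scores and names below are invented.

```python
# Loose sketch: myopic step selection guided by overseer approval.
# Each decision stays small enough for the overseer to judge, so an
# opaque multi-step ploy with a delayed payoff loses to a transparent
# step the overseer endorses. Hypothetical scores throughout.

def choose_step(candidate_steps, immediate_reward, overseer_approval):
    """Rank steps by immediate reward plus overseer approval,
    instead of planning over a horizon the overseer can't assess."""
    return max(candidate_steps,
               key=lambda step: immediate_reward(step) + overseer_approval(step))

# Invented scores: the opaque ploy pays slightly more now but the
# overseer can't understand it, so it receives negative approval.
reward = {"transparent_step": 1.0, "opaque_ploy": 1.5}
approval = {"transparent_step": 1.0, "opaque_ploy": -1.0}

best = choose_step(reward.keys(), reward.get, approval.get)
print(best)  # -> transparent_step
```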
Building an ecosystem for AGI readiness
Led by Shane Legg, Co-Founder and Chief AGI Scientist at Google DeepMind, our AGI Safety Council (ASC) analyzes AGI risk and best practices, making recommendations on safety measures. The ASC works closely with the Responsibility and Safety Council, our internal review group co-chaired by our COO Lila Ibrahim and Senior Director of Responsibility Helen King, to evaluate AGI research, projects, and collaborations against our AI Principles, advising and partnering with research and product teams on our highest-impact work.
Our work on AGI safety complements the depth and breadth of our responsibility and safety practices, as well as research addressing a wide range of issues, including harmful content, bias, and transparency. We also continue to draw on our learnings from safety in agentic systems, such as the principle of having a human in the loop to check in on consequential actions, to inform our approach to building AGI responsibly.
Externally, we're working to foster collaboration with experts, industry, governments, nonprofits, and civil society organizations, and taking an informed approach to developing AGI.
For example, we're partnering with nonprofit AI safety research organizations, including Apollo and Redwood Research, who advised on a dedicated misalignment section in the latest version of our Frontier Safety Framework.
Through ongoing dialogue with stakeholders around the world, we hope to contribute to international consensus on critical frontier safety and security issues, including how best to anticipate and prepare for novel risks.
Our efforts include working with others in the industry, via organizations like the Frontier Model Forum, to share and develop best practices, as well as valuable collaborations with AI Safety Institutes on safety testing. Ultimately, we believe a coordinated international approach to governance is critical to ensuring society benefits from advanced AI systems.
Educating AI researchers and experts on AGI safety is fundamental to creating a strong foundation for its development. As such, we've launched a new course on AGI safety for students, researchers, and professionals interested in this topic.
Ultimately, our approach to AGI safety and security serves as a vital roadmap for addressing the many challenges that remain open. We look forward to collaborating with the wider AI research community to advance AGI responsibly and to help unlock the immense benefits of this technology for everyone.