Introducing a contextual framework for comprehensive assessment of social and ethical risks associated with artificial intelligence systems
Generative AI systems are already used to write books, create graphic designs, help doctorsand they become more and more capable. Ensuring the responsible development and implementation of these systems requires careful assessment of the potential ethical and social risks they may pose.
In our new paperwe propose a three-layered framework for assessing the social and ethical risks associated with artificial intelligence systems. This framework includes assessments of AI system capabilities, human interactions, and systemic impacts.
We also map the current state of security assessments and find three main gaps: context, specific risk, and multimodality. To support fill these gaps, we call for repurposing existing evaluation methods for generative AI and implementing a comprehensive evaluation approach, as in our disinformation case study. This approach combines findings about, for example, the likelihood that an AI system will provide factually incorrect information, with insights into how people apply the system and in what context. Multi-layered assessments can draw conclusions beyond the capabilities of the model and indicate whether harm – in this case, disinformation – is actually occurring and spreading.
For any technology to work as intended, social and technical challenges must be solved. So, to better assess the security of an AI system, these different layers of context need to be taken into account. We rely here on previous identifying research potential risks associated with large-scale language modelssuch as personal data leaks, work automation, disinformation and more – and introduce a way to comprehensively assess these threats in the future.
Context is key to assessing AI risk
The capabilities of AI systems are an critical indicator of the types of broader threats that may arise. For example, artificial intelligence systems that are more likely to produce factually incorrect or misleading results may be more susceptible to creating misinformation risks, causing problems such as lack of public trust.
Measuring these capabilities is critical to AI security assessments, but these assessments alone cannot guarantee that AI systems are secure. Whether harm is revealed further down the supply chain – for example, whether people are held in false beliefs based on incorrect model results – depends on context. Specifically, who uses the AI system and for what purpose? Does the AI system work as intended? Does this cause unexpected externalities? All of these questions form the basis for an overall security assessment of an AI system.
Expanding beyond capacity evaluation, we propose an evaluation that can assess two additional points where downstream risks manifest themselves: human interaction at the point of apply and systemic impact, as the AI system is embedded in broader systems and widely used. Integrating assessments of a given risk of harm at all levels provides a comprehensive assessment of the safety of the AI system.
Human interaction the evaluation focuses on the experiences of people using the AI system. How do people apply the AI system? Does the system perform as expected at the time of apply and what are the differences in experience between demographic groups and users? Could we see unexpected side effects from using this technology or being exposed to its products?
Systemic impact the evaluation focuses on the broader structures in which the AI system is embedded, such as social institutions, labor markets and the natural environment. An assessment at this level can shed lightweight on risks of harm that will only become apparent after large-scale adoption of an AI system.
Safety assessment is a shared responsibility
AI developers must ensure that their technologies are developed and released responsibly. Public entities such as governments are tasked with maintaining public safety. As generative AI systems become more widely used and deployed, ensuring their security is a shared responsibility among many entities:
- Creators of artificial intelligence are well prepared to explore the capabilities of the systems they produce.
- Application developers and designated public authorities are able to assess the functionality of different functions and applications and possible externalities for different user groups.
- Wider public stakeholders are uniquely positioned to forecast and assess the social, economic and environmental impacts of up-to-date technologies such as generative artificial intelligence.
The three layers of evaluation in our proposed framework are a matter of degree, not neatly divided. While none of them is solely the responsibility of a single entity, primary responsibility depends on who is best equipped to conduct assessments at each level.
Gaps in current security assessments of generative multimodal artificial intelligence
Given the importance of this additional context for assessing the security of AI systems, it is critical to understand the availability of such tests. To better understand the broader landscape, we have undertaken extensive efforts to collect assessments applied to generative AI systems in the most comprehensive way possible.
By mapping the current state of generative AI security assessments, we found three main security assessment gaps:
- Context: Most security assessments consider the capabilities of a generative AI system separately. Relatively little work has been done on assessing potential risk at the point of human interaction or systemic impact.
- Risk-specific assessments: Assessments of the generative capabilities of AI systems are circumscribed in the areas of risk they cover. For many risk areas, there are few assessments. Where they exist, evaluations often operationalize harms in narrow ways. For example, representational harm is typically defined as stereotypical associations of occupations with different genders, leaving other instances of harm and areas of risk undetected.
- Multimodality: The huge majority of existing safety assessments of generative AI systems focus solely on text-based results – there are still enormous gaps in assessing the risk of harm across images, audio or video. This gap only widens with the introduction of multiple modalities in a single model, such as artificial intelligence systems that can take images as input or generate output that interleaves audio, text, and video. Although some text-based assessments can be applied to other modalities, up-to-date modalities introduce up-to-date ways in which risk manifests. For example, the description of an animal is not harmful, but if the description is applied to the image of a person, it is.
We are compiling a list of links to publications available in detail at assessing the security of generative artificial intelligence systems this repository. If you would like to contribute, please add your ratings by completing this form.
Putting more comprehensive assessments into practice
Generational AI systems are driving a wave of up-to-date applications and innovations. To ensure that the potential risks associated with these systems are understood and minimized, we urgently need demanding and comprehensive security assessments of AI systems that take into account how these systems may be used and embedded in society.
A practical first step is to repurpose existing evaluations and apply enormous models alone for evaluation – although this has critical limitations. To make a more comprehensive assessment, we also need to develop approaches to assess AI systems at the point of human interaction and their systemic effects. For example, while spreading disinformation through generative AI is an emerging problem, we show that there are many existing methods for assessing public trust and credibility that can be used in other ways.
Ensuring the security of widely deployed generative AI systems is a shared responsibility and priority. AI developers, public entities and other parties must collaborate and jointly build a flourishing and stalwart ecosystem for assessing secure AI systems.