The United States Should Seize Global AI Stage in California to Shift Gears to Post-Deployment Safety

October 2, 2024

Governor Gavin Newsom’s veto of California’s AI safety bill (SB 1047) was a sensible move, but a deeper issue remains unresolved. A fundamental flaw of the bill lies in its approach to AI safety: it attempts to identify risky systems based solely on assessments made before deployment. Even with the most rigorous pre-deployment evaluations, fully anticipating the harm these AI models might cause is impossible because their impact greatly depends on how they’re applied in real-world settings.

Legislators might be tempted to explore and adopt new metrics to better predict harmful models, but this will not be enough to effectively shape regulations that target AI’s most pressing and evolving risks. California may not be the state to impose sweeping AI regulations, but it could be the birthplace of a more effective approach. The upcoming meeting of the International Network of AI Safety Institutes (AISIs) in San Francisco offers a prime opportunity to shift focus toward post-deployment evaluations, providing policymakers with the insights they need to manage risks while fostering innovation.

Post-deployment evaluations periodically assess systems that are already in use and focus on how they perform in context. For instance, the U.S. National Institute of Standards and Technology (NIST) has routinely evaluated the capabilities of commercially available facial recognition algorithms since 2000. Most recently, NIST evaluated how well facial recognition systems on the market identify people wearing masks, in light of societal changes since the COVID-19 pandemic. The agency found that some algorithms that were good at identifying unmasked faces did not adapt well to identifying masked faces, demonstrating that systems that perform well in controlled, pre-deployment tests can face unexpected difficulties in real-world scenarios.
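To make the idea concrete, here is a minimal sketch of the kind of condition-by-condition comparison a post-deployment evaluation performs; the probe records and the match_face function are hypothetical placeholders, not NIST's actual methodology.

```python
# Minimal sketch of a post-deployment accuracy comparison across two
# real-world conditions (masked vs. unmasked faces). The match_face()
# callable and the probe records are hypothetical placeholders, not
# NIST's actual evaluation methodology.

def evaluate_condition(probes, match_face):
    """Return identification accuracy over a list of probe records."""
    correct = sum(1 for probe in probes
                  if match_face(probe["image"]) == probe["identity"])
    return correct / len(probes)

def compare_conditions(unmasked_probes, masked_probes, match_face):
    """Report how accuracy shifts once the deployment condition changes."""
    unmasked_acc = evaluate_condition(unmasked_probes, match_face)
    masked_acc = evaluate_condition(masked_probes, match_face)
    return {
        "unmasked_accuracy": unmasked_acc,
        "masked_accuracy": masked_acc,
        "accuracy_drop": unmasked_acc - masked_acc,  # the gap pre-deployment tests missed
    }
```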

Unfortunately, AISIs are not working on expanding these types of evaluations to assess how well foundation models perform once they are deployed in high-risk areas, even though doing so would better capture the nuances of real-world AI usage and risks. For instance, AISIs could develop vendor tests to evaluate how well large language models (LLMs) summarize complex health information into plain language for public consumption—a test NIST has already developed for noncommercial systems. In fact, NIST has spent many years developing AI measurement and evaluation projects for many of the functions foundation models perform, such as information retrieval, natural language processing, and speech processing; it just has not developed these into post-deployment evaluations the way it has for facial recognition.
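As an illustration, a post-deployment check on plain-language health summaries might track a readability signal like the one sketched below; the summarize callable and the Flesch-style scoring proxy are assumptions for illustration, not NIST's evaluation protocol.

```python
import re

# Illustrative sketch of one signal a post-deployment summarization
# evaluation might track: whether a deployed model's health summaries are
# readable for a lay audience. The summarize() callable and the
# Flesch-style readability proxy are assumptions, not NIST's protocol.

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Approximate Flesch reading ease; higher scores are easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

def evaluate_summaries(documents, summarize, target_score=60.0):
    """Score each deployed-model summary and flag those below a plain-language threshold."""
    results = []
    for doc in documents:
        summary = summarize(doc)  # call the deployed model under test
        score = flesch_reading_ease(summary)
        results.append({"score": score, "plain_enough": score >= target_score})
    return results
```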

Post-deployment evaluations perform an important function by providing assurances that AI models are functioning safely and effectively in real-world conditions—an essential aspect of fostering AI accountability. AISIs are well-positioned to develop post-deployment evaluations that assess and communicate how AI systems perform in practice, identifying any limitations or risks that emerge once deployed. These evaluations provide the technical insights necessary for policymakers to decide the appropriate and responsible ways to use AI.

While some lament that AISIs do not have enforcement powers, their primary role is to advance the science of AI safety; articulate and disseminate practices of AI safety; and support institutions, communities, and coordination around AI safety. For example, NIST doesn’t dictate where or when facial recognition technologies should be used; instead, it focuses on assessing how well these systems perform under various conditions. The decision of where and how these technologies are deployed should be left to regulatory bodies and policymakers.

To make post-deployment evaluations effective for foundation models and thereby support more targeted regulation, AISIs will need to go beyond simply applying the types of existing evaluation projects NIST has for AI systems.

First, AISIs should establish a global framework for tracking AI incidents—monitoring when and how AI systems fail in real-world conditions. Unlike facial recognition technology, where use cases are more defined, foundation models can be applied in a wide range of unpredictable areas—from summarizing health information to aiding customer service. Even if some of the ways foundation models are used are known, it is not always clear which applications pose the highest risk. For instance, a model might summarize COVID health information, which could be used to misinform people, and summarize scientific data, which could be used to enable illicit activities; it is unclear which harm is more likely or more serious.

Trying to track every possible use case and guess which ones matter most is not only impractical but also ineffective for prioritizing risks. Instead, by tracking actual incidents and failures as they occur, AISIs can gather data to guide more precise, targeted evaluations, ensuring their focus remains on the most pressing risks. This would make post-deployment evaluations more relevant and responsive to real-world challenges.
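The sketch below illustrates, under an assumed record format and an assumed severity scale, how tracked incidents could be aggregated to steer evaluation priorities; it does not reflect any existing standard.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch of the kind of incident record and aggregation an
# AISI-run tracking framework might use. The fields and the severity scale
# are assumptions for illustration, not an existing standard.

@dataclass
class AIIncident:
    model: str        # foundation model involved
    application: str  # real-world use (health summaries, customer service, ...)
    harm_type: str    # misinformation, privacy breach, unsafe advice, ...
    severity: int     # 1 (minor) to 5 (severe), per the assumed scale

def prioritize_evaluations(incidents, top_n=3):
    """Rank application areas by cumulative severity to target future evaluations."""
    weight = Counter()
    for incident in incidents:
        weight[incident.application] += incident.severity
    return weight.most_common(top_n)

# Example: reported failures point evaluators toward the riskiest deployments.
reports = [
    AIIncident("model-a", "health summaries", "misinformation", 4),
    AIIncident("model-b", "customer service", "unsafe advice", 2),
    AIIncident("model-a", "health summaries", "misinformation", 5),
]
print(prioritize_evaluations(reports))  # [('health summaries', 9), ('customer service', 2)]
```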

Second, AISIs should expand post-deployment evaluations to include the broader societal impacts of foundation models—impacts that lie at the heart of many public concerns. At present, post-deployment evaluations focus primarily on technical performance, even when applied in real-world contexts. For example, even though NIST's facial recognition evaluations were adapted to cover masked faces during the COVID-19 pandemic, they remain technical assessments, measuring a system's ability to recognize facial features.

Foundation models, however, differ because the technical accuracy of their outputs—even when accompanied by confidence levels—cannot by itself determine potential harm. While systems that use facial recognition provide results with associated confidence scores (e.g., a face is recognized with 95 percent certainty), foundation model outputs—such as an LLM summarizing health information—are more complex and open to human interpretation. The risks stem not just from technical errors or confidence thresholds but from how the public understands, trusts, or misuses these summaries. For instance, even if an LLM provides a highly accurate summary with a stated confidence level, different users may interpret that summary differently, leading to misinformation or misunderstandings with real-world consequences. This variability makes it difficult to rely solely on technical metrics or confidence levels.

Therefore, AISIs should explore how to assess not only how well these models perform technically but also how they shape public behavior, influence trust, and affect societal outcomes. Additionally, concerns extend beyond user interaction. For example, foundation models’ energy consumption and environmental footprint are growing areas of concern. AISI evaluations could include post-deployment assessments of energy efficiency, measuring how different models perform across various applications. By taking a sociotechnical approach, which considers not just technical performance but also human interpretation and systemic impacts, AISIs can deliver a fuller picture of how foundation models affect public trust, behavior, and broader societal outcomes.
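One way an energy-efficiency assessment could normalize results across deployed models is sketched below; the energy readings are hypothetical inputs that would, in practice, come from hardware power meters or datacenter telemetry.

```python
# Minimal sketch of an energy-efficiency metric a post-deployment assessment
# might report. The measured joules and query counts are hypothetical inputs;
# in practice they would come from power meters or datacenter telemetry.

def energy_per_thousand_queries(total_joules, query_count):
    """Normalize measured energy use to a comparable per-1,000-query figure."""
    return (total_joules / query_count) * 1000

# Example: comparing two deployed models on the same workload (assumed readings).
measurements = {
    "model-a": {"total_joules": 5.4e6, "query_count": 12000},
    "model-b": {"total_joules": 2.1e6, "query_count": 9000},
}
for name, m in measurements.items():
    kj = energy_per_thousand_queries(m["total_joules"], m["query_count"]) / 1000
    print(f"{name}: {kj:.0f} kJ per 1,000 queries")
```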

The United States has a lot at stake at November’s convening. With more frontier AI companies than the rest of the world combined, U.S. policymakers have the most to gain and the most to lose if guardrails for AI are built on shaky ground. Policymakers should seize this opportunity on the global stage to cement an approach to AI safety that better accounts for the evolving risks of AI models in the real world.

Image Credit: AP Photo/Manuel Balce Ceneta, File
