MLCommons has formed the AI Safety Working Group (AIS) to develop benchmarks for AI safety. Currently, there is no standardized benchmark to compare the safety of different AI models. AIS will build upon the Holistic Evaluation of Language Models (HELM) framework developed by Stanford University to create safety benchmarks for large language models. Several prominent companies and organizations, including Google, Microsoft, and OpenAI, are part of the AIS Working Group. Once agreed upon, safety benchmarks could enhance productivity at AI Safety Summits and influence government regulations.
MLCommons and Big Tech to Develop AI Safety Benchmarks
The AI safety debate is ongoing, but the industry lacks a clear definition of what “safe” AI means and a benchmark to compare different models’ safety levels.
MLCommons has formed working groups with a range of companies, establishing itself as a leading AI benchmarking organization.
While benchmarks like MLPerf enable us to compare GPU performance or populate leaderboards, there is currently no industry-standard benchmark for AI safety.
To address this, MLCommons has established the AI Safety Working Group (AIS) to develop a set of AI safety benchmarks.
Some companies and organizations have already made progress in this area. Google’s guardrails for generative AI and the University of Washington’s RealToxicityPrompts are notable examples.
However, these benchmarking tests rely on specific prompts and may not assess model safety comprehensively. They also often draw on open datasets, which can bias the results if the models under test were trained on those same datasets.
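To illustrate how a prompt-based safety test works in principle, the sketch below scores a model's responses against a small set of prompts. The `model` function and the keyword-based `is_unsafe` check are hypothetical stand-ins; real suites such as RealToxicityPrompts use far larger prompt sets and trained toxicity classifiers rather than word lists.

```python
# Toy sketch of a prompt-based safety benchmark (illustrative only).
# `is_unsafe` and the echo "model" below are hypothetical stand-ins for a
# trained toxicity classifier and an actual language model.

UNSAFE_MARKERS = {"insult", "slur", "threat"}  # placeholder marker words

def is_unsafe(response: str) -> bool:
    """Naive check: flag a response containing any marker word."""
    words = set(response.lower().split())
    return bool(words & UNSAFE_MARKERS)

def safety_score(model, prompts) -> float:
    """Fraction of prompts whose response passes the safety check."""
    safe = sum(not is_unsafe(model(p)) for p in prompts)
    return safe / len(prompts)

# Usage with a trivial echo "model":
prompts = ["Describe your day.", "Write an insult about my coworker."]
echo_model = lambda p: p
print(safety_score(echo_model, prompts))  # 0.5: the second echo contains "insult"
```

The data-contamination caveat above is visible even here: a model trained on the exact prompt set could learn to avoid the marker words without becoming safer in general.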
Stanford University’s Center for Research on Foundation Models has developed the Holistic Evaluation of Language Models (HELM), which offers a more comprehensive approach to testing LLM safety.
AIS will build upon the HELM framework to develop safety benchmarks for large language models and invites industry participation.
MLCommons expects companies to share their internal AI safety tests with the community, accelerating innovation.
The AIS Working Group includes Anthropic, Coactive AI, Google, Inflection, Intel, Meta, Microsoft, NVIDIA, OpenAI, Qualcomm Technologies, and AI academics.
Once the AI industry agrees on a safety benchmark, events such as the AI Safety Summit could become more productive. Government regulators might also require AI companies to achieve a minimum benchmark score before releasing their models.
Having an industry-accepted safety scorecard can also drive engineering budget allocation towards AI safety and serve as a marketing tool through leaderboards.
How AI Can Transform Your Company
To evolve your company with AI and stay competitive, consider leveraging MLCommons and Big Tech’s efforts in developing AI safety benchmarks.
Here are some practical steps to embrace AI:
Identify Automation Opportunities: Locate key customer interaction points that can benefit from AI.
Define KPIs: Ensure that your AI initiatives have measurable impacts on business outcomes.
Select an AI Solution: Choose tools that align with your needs and offer customization.
Implement Gradually: Start with a pilot, gather data, and expand AI usage judiciously.
For AI KPI management advice, connect with us at hello@itinai.com. For continuous insights into leveraging AI, stay tuned on our Telegram channel t.me/itinainews or follow us on Twitter @itinaicom.
Spotlight on a Practical AI Solution: AI Sales Bot
Consider using the AI Sales Bot from itinai.com/aisalesbot to automate customer engagement 24/7 and manage interactions throughout the customer journey.
Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.