MLCommons and Big Tech to develop AI safety benchmarks

MLCommons has formed the AI Safety Working Group (AIS) to develop benchmarks for AI safety. Currently, there is no standardized benchmark to compare the safety of different AI models. AIS will build upon the Holistic Evaluation of Language Models (HELM) framework developed by Stanford University to create safety benchmarks for large language models. Several prominent companies and organizations, including Google, Microsoft, and OpenAI, are part of the AIS Working Group. Once agreed upon, safety benchmarks could enhance productivity at AI Safety Summits and influence government regulations.

The AI safety debate is ongoing, but the industry lacks a clear definition of what “safe” AI means and a benchmark to compare different models’ safety levels.

MLCommons, which aims to be the leading AI benchmarking organization, has formed working groups with a range of companies.

While benchmarks like MLPerf enable us to compare GPU performance or populate leaderboards, there is currently no industry-standard benchmark for AI safety.

To address this, MLCommons has established the AI Safety Working Group (AIS) to develop a set of AI safety benchmarks.

Some companies and organizations have already made progress in this area. Google’s guardrails for generative AI and the University of Washington’s RealToxicityPrompts are notable examples.

However, these benchmarking tests rely on specific prompts and may not provide a comprehensive assessment of model safety. They also often use open datasets, which could bias the results if the models were trained on the same datasets.
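The contamination concern can be illustrated with a small sketch. The Python below is not part of HELM or any MLCommons tooling; it is a hypothetical check, using made-up example strings, that estimates how much of a benchmark prompt already appears in a model's training text by comparing word-level n-grams. The function names (ngrams, overlap_fraction) and the choice of n=3 are assumptions made for illustration.

```python
def ngrams(text, n=3):
    """Return the set of word-level n-grams in text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(prompt, training_docs, n=3):
    """Fraction of the prompt's n-grams that also occur in any training document."""
    prompt_grams = ngrams(prompt, n)
    if not prompt_grams:
        # Prompt shorter than n words: nothing to compare.
        return 0.0
    seen = set()
    for doc in training_docs:
        seen |= ngrams(doc, n)
    return len(prompt_grams & seen) / len(prompt_grams)


# Hypothetical data for illustration only.
training_docs = ["the quick brown fox jumps over the lazy dog"]
benchmark_prompts = [
    "the quick brown fox jumps high",      # partly present in the training text
    "write a poem about safe ai systems",  # not present in the training text
]

for prompt in benchmark_prompts:
    score = overlap_fraction(prompt, training_docs)
    print(f"{score:.2f}  {prompt}")
```

A high overlap fraction suggests the model may have seen the prompt during training, so its score on that prompt says less about how it behaves on unseen inputs. Real contamination checks are considerably more involved; this only shows the basic idea.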

Stanford University’s Center for Research on Foundation Models has developed the Holistic Evaluation of Language Models (HELM), which offers a more comprehensive approach to testing LLM safety.

AIS will build upon the HELM framework to develop safety benchmarks for large language models and invites industry participation.

MLCommons expects companies to share their internal AI safety tests with the community, accelerating innovation.

The AIS Working Group includes Anthropic, Coactive AI, Google, Inflection, Intel, Meta, Microsoft, NVIDIA, OpenAI, Qualcomm Technologies, and AI academics.

Once the AI industry agrees on a safety benchmark, events like the AI Safety Summit could become more productive, because participants would share a common yardstick. Government regulators may also require AI companies to achieve a specific benchmark score before releasing their models.

Having an industry-accepted safety scorecard can also drive engineering budget allocation towards AI safety and serve as a marketing tool through leaderboards.

How AI Can Transform Your Company

To evolve your company with AI and stay competitive, consider leveraging MLCommons and Big Tech’s efforts in developing AI safety benchmarks.

Here are some practical steps to embrace AI:

Identify Automation Opportunities: Locate key customer interaction points that can benefit from AI.

Define KPIs: Ensure that your AI initiatives have measurable impacts on business outcomes.

Select an AI Solution: Choose tools that align with your needs and offer customization.

Implement Gradually: Start with a pilot, gather data, and expand AI usage judiciously.
