NVIDIA Open Sources Canary 1B and 180M Flash Multilingual Speech Models

Enhancing Global Communication Through AI: NVIDIA’s Multilingual Speech Models

Introduction to Multilingual Speech Recognition

In today’s interconnected world, the ability to communicate across languages is essential for businesses, and multilingual speech recognition and translation tools help break down language barriers. Building models that accurately transcribe and translate multiple languages in real time remains difficult, however. The key issues are handling linguistic variation, maintaining high accuracy, and keeping latency low.

NVIDIA’s Solution: Open-Source Models

NVIDIA has addressed these challenges by open-sourcing two models: Canary 1B Flash and Canary 180M Flash. Both are designed for multilingual speech recognition and translation and support English, German, French, and Spanish. They are released under the permissive CC-BY-4.0 license, which allows commercial use.

Technical Overview

Both models use an encoder-decoder architecture. The encoder, based on FastConformer, processes audio features, while a Transformer decoder generates text. Task-specific tokens tell the decoder which task to perform, such as transcription or translation, so a single model can handle both. The Canary 1B Flash model has 32 encoder layers and 4 decoder layers, totaling 883 million parameters, while the Canary 180M Flash model has 17 encoder layers and 4 decoder layers, totaling 182 million parameters.
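The parameter counts above translate fairly directly into storage for the weights. As a rough back-of-the-envelope sketch, the footprint at two common numeric precisions could be estimated as follows. The helper name `weight_memory_gb` and the bytes-per-parameter table are illustrative assumptions, and the estimate ignores activations, decoding state, and framework overhead.

```python
# Rough weight-memory estimate from a parameter count.
# Assumption: every parameter is stored at the same precision; activations,
# decoding state, and runtime overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Parameter counts as reported in the article.
MODELS = {
    "Canary 1B Flash": 883_000_000,
    "Canary 180M Flash": 182_000_000,
}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        sizes = ", ".join(
            f"{precision}: {weight_memory_gb(params, precision):.2f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{name} -> {sizes}")
```

Under these assumptions the smaller model's weights fit in well under 1 GB even at full precision, which is the kind of budget that matters for on-device use.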

Performance Metrics

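Reported results for the two models are listed below. RTFx is the inverse real-time factor (seconds of audio processed per second of compute), WER is word error rate, and AST is automatic speech-to-text translation: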

  • Canary 1B Flash:
    • Inference speed: Over 1000 RTFx
    • WER: 1.48% on LibriSpeech Clean
    • Multilingual WER: 4.36% (German), 2.69% (Spanish), 4.47% (French)
    • BLEU scores for AST: 32.27 (English to German), 22.6 (English to Spanish), 41.22 (English to French)
  • Canary 180M Flash:
    • Inference speed: Over 1200 RTFx
    • WER: 1.87% on LibriSpeech Clean
    • Multilingual WER: 4.81% (German), 3.17% (Spanish), 4.75% (French)
    • BLEU scores for AST: 28.18 (English to German), 20.47 (English to Spanish), 36.66 (English to French)
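To make two of the metrics above concrete, here is a minimal sketch of how word error rate and RTFx are commonly computed. The function names are illustrative, and the WER helper splits on whitespace without any text normalization. Published benchmark figures usually normalize casing and punctuation first, so this sketch should not be expected to reproduce the numbers in the list.

```python
# Illustrative implementations of two metrics from the list above.
# WER  = word-level edit distance / number of reference words.
# RTFx = seconds of audio processed / seconds of wall-clock compute.


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev is the previous row of the edit-distance table;
    # it starts as the cost of building hyp[:j] from an empty reference.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # One hour of audio processed in 3 seconds.
    print(rtfx(3600.0, 3.0))
```

Read this way, a speed of 1000 RTFx means one hour of audio is processed in about 3.6 seconds of compute.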

Advantages for Businesses

Both models support word-level and segment-level timestamps, which are essential for applications that need precise synchronization between audio and text, such as subtitling. Their compact size makes them well suited to on-device deployment, which enables offline processing and reduces reliance on cloud services. In addition, their robustness reduces errors during translation tasks, leading to more reliable outputs.
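As an illustration of why word-level timestamps matter for synchronization, the sketch below groups timestamped words into subtitle cues in the SubRip (SRT) format. The input format, a list of `(word, start_seconds, end_seconds)` tuples, is a hypothetical stand-in and not the models' actual output structure. The helper names `srt_time` and `words_to_srt` are likewise assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical input format: (word, start_seconds, end_seconds).
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        text = " ".join(word for word, _, _ in chunk)
        # Each cue spans from its first word's start to its last word's end.
        start, end = chunk[0][1], chunk[-1][2]
        cues.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [("hello", 0.0, 0.4), ("world", 0.5, 1.0), ("again", 1.2, 1.75)]
    print(words_to_srt(sample, max_words=2))
```

Grouping by a fixed word count keeps the sketch short. A real subtitling pipeline would more likely break cues at pauses or at the segment-level timestamps.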

Conclusion

NVIDIA’s open-sourcing of the Canary 1B and 180M Flash models marks a significant milestone in multilingual speech recognition and translation. With their high accuracy, real-time processing capabilities, and suitability for on-device deployment, these models effectively address many existing challenges in the field. By making these technologies publicly accessible, NVIDIA is not only advancing AI research but also empowering developers and organizations to create more inclusive and efficient communication tools.

For further insights, explore the Canary 1B Model and Canary 180M Flash. All credit for this research goes to the researchers involved in this project.

