CMU Researchers Propose XEUS: A Cross-lingual Encoder for Universal Speech Trained on 4,000+ Languages

Practical Solutions for Multilingual Speech Processing

Introducing XEUS: A Cross-lingual Encoder for Universal Speech

Self-supervised learning (SSL) has expanded the reach of speech technologies to many languages by minimizing the need for labeled data. However, most current models support only 100-150 of the world’s 7,000+ languages. This limitation is largely due to the scarcity of transcribed speech: only about half of these languages have formal writing systems, and even fewer have the resources to generate the extensive annotated data needed for training. SSL models can learn from unlabeled data, yet they typically cover a narrow range of languages. Projects like MMS have extended coverage to over 1,000 languages but still struggle with data noise and a lack of diverse recording conditions.

Researchers from Carnegie Mellon University, Shanghai Jiao Tong University, and Toyota Technological Institute at Chicago have developed XEUS, a Cross-lingual Encoder for Universal Speech. XEUS is trained on over 1 million hours of data from 4,057 languages, significantly increasing the language coverage of SSL models. This includes a new corpus of 7,413 hours from 4,057 languages, which will be publicly released. XEUS incorporates a novel dereverberation objective for enhanced robustness. It outperforms state-of-the-art models on various benchmarks, including ML-SUPERB. To support further research, the researchers will release XEUS, its code, training configurations, checkpoints, and training logs.

SSL has advanced speech processing by enabling neural networks to learn from large amounts of unlabeled data and then be fine-tuned for various tasks. Multilingual SSL models can leverage cross-lingual transfer learning, but most cover only a small fraction of the world’s languages. XEUS, however, scales to 4,057 languages, surpassing models like Meta’s MMS. XEUS includes a novel dereverberation objective during training to handle noisy and diverse speech. Unlike state-of-the-art models that often use closed datasets and lack transparency, XEUS is fully open, with publicly available data, training code, and extensive documentation, facilitating further research into large-scale multilingual SSL.

XEUS is pre-trained on a dataset of 1.081 million hours across 4,057 languages, compiled from 37 public speech datasets and additional sources like Global Recordings Network, WikiTongues, and Jesus Dramas. Unique data types, such as accented speech and code-switching, enhance its robustness. XEUS incorporates new objectives, including dereverberation and noise reduction, during training. The model architecture is based on HuBERT but includes enhancements like E-Branchformer layers and a simplified loss function. Training runs on 64 NVIDIA A100 GPUs, uses advanced augmentation techniques, and spans significantly more data than previous models.
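To make the idea of a dereverberation objective more concrete, here is a minimal sketch of how a reverberant training input could be paired with a clean target. It is an illustration only, not the XEUS training pipeline: the function names, the synthetic room impulse response (decaying white noise), and the `rt60` value are all assumptions made for this example.

```python
import numpy as np

def synthetic_rir(sample_rate=16000, rt60=0.4, seed=0):
    """Hypothetical room impulse response: white noise with exponential decay."""
    rng = np.random.default_rng(seed)
    length = int(sample_rate * rt60)
    t = np.arange(length) / sample_rate
    # Amplitude falls by 60 dB (a factor of 1000) over rt60 seconds.
    decay = np.exp(-np.log(1000.0) * t / rt60)
    rir = rng.standard_normal(length) * decay
    rir[0] = 1.0  # direct path from speaker to microphone
    return rir / np.max(np.abs(rir))

def add_reverb(clean, rir):
    """Convolve clean speech with the RIR and trim to the original length."""
    wet = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

def make_training_pair(clean, rir):
    """Return (model input, target source): reverberant audio in, clean audio as reference."""
    return add_reverb(clean, rir), clean

# Usage: a 1-second 220 Hz tone stands in for a clean speech clip.
sr = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noisy, target = make_training_pair(clean, synthetic_rir(sr))
```

In a setup like this, the encoder would see `noisy` while its training targets are derived from `target`, which pushes the learned representations to ignore room acoustics.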

XEUS is evaluated across various downstream tasks to assess its multilingual and acoustic representation capabilities. It excels in multilingual speech tasks, outperforming state-of-the-art models like XLS-R, MMS, and w2v-BERT on benchmarks such as ML-SUPERB and FLEURS, especially in low-resource language settings. XEUS also demonstrates task universality by matching or exceeding leading models in English-only tasks like emotion recognition and speaker diarization. In acoustic representation, XEUS surpasses models like WavLM and w2v-BERT in generating high-quality speech, as measured by mean opinion score (MOS) and word error rate (WER).
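For readers unfamiliar with WER, the sketch below shows one standard way to compute it: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name is hypothetical and this is not the evaluation code used for XEUS; real toolkits usually also normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)

# Usage: one substituted word out of three reference words.
wer = word_error_rate("the cat sat", "the dog sat")
```

Lower is better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.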

XEUS is a robust SSL speech encoder trained on over 1 million hours of data spanning 4,057 languages, demonstrating superior performance across a wide range of multilingual and low-resource tasks. XEUS’s dereverberation task enhances its robustness, and despite the limited data for many languages, it still provides valuable results. XEUS advances multilingual research by offering open access to its data and model. However, ethical considerations are crucial, especially in handling speech data from indigenous communities and preventing misuse, such as generating audio deepfakes. XEUS’s integration with accessible platforms aims to democratize speech model development.
