This AI Paper Introduces BEST-STD (Spoken Term Detection): A Novel Bidirectional Mamba-Enhanced Speech Tokenization Framework for Efficient Spoken Term Detection

Spoken Term Detection (STD) Overview

Spoken Term Detection (STD) helps identify specific phrases in large audio collections. It’s used in voice searches, transcription services, and multimedia indexing, making audio data easier to access and use. This is particularly valuable for podcasts, lectures, and broadcast media.

Challenges in Spoken Term Detection

One major challenge is managing out-of-vocabulary (OOV) terms and the heavy computing required by traditional systems. Current methods often rely on automatic speech recognition (ASR), which can be inaccurate and resource-intensive, especially with shorter audio clips or varying sound conditions. They struggle to accurately segment continuous speech, which hampers term identification.

Current Approaches to STD

Current techniques include ASR-based methods, dynamic time warping (DTW), and acoustic word embeddings. However, these methods face issues such as speaker variability, inefficiency, and limitations in adapting to different datasets, especially for unknown terms.

Introducing BEST-STD

A team from the Indian Institute of Technology Kanpur and imec – Ghent University has created a new framework called BEST-STD. This innovative approach converts speech into speaker-neutral semantic tokens, making it easier to retrieve spoken content using text-based algorithms.

How BEST-STD Works

The BEST-STD system uses a bidirectional Mamba encoder that processes audio from both directions, capturing long-range dependencies. It converts audio data into token sequences via a vector quantizer, employing self-supervised learning to ensure consistency in token representations regardless of the speaker.

Performance Benefits of BEST-STD

BEST-STD has shown impressive results in tests, scoring higher in token consistency compared to traditional methods and other tokenization models. For spoken content retrieval, it achieved high mean average precision (MAP) and mean reciprocal rank (MRR) scores, demonstrating its effectiveness with both in-vocabulary and out-of-vocabulary terms.

Advantages of the Framework

BEST-STD stands out for its fast retrieval speeds and efficiency, utilizing an inverted index for tokenized sequences. It reduces the need for heavy computational matching techniques, making it viable for large datasets. The bidirectional encoder also outperforms traditional methods due to its ability to handle fine-grained temporal information.

Conclusion

The BEST-STD framework is a significant improvement in spoken term detection, offering a robust solution for audio retrieval tasks. Its speaker-neutral tokens and advanced encoding technique enhance performance and adaptability, promising better accessibility and searchability in audio processing.

Learn More

Check out the Paper for more insights, and stay connected with us on Twitter, join our Telegram Channel, and be part of our LinkedIn Group. Don’t forget to subscribe to our newsletter and join our 55k+ ML SubReddit.

Stay Competitive with AI

Explore how AI can transform your business. Identify Automation Opportunities, Define KPIs, Select AI Solutions, and Implement Gradually. For consultation, contact us at hello@itinai.com and follow us for ongoing AI insights.

List of Useful Links:

Unleash Your Creative Potential with AI Agents

Competitors are already using AI Agents

Business Problems We Solve

Automation of internal processes.
Optimizing AI costs without huge budgets.
Training staff, developing custom courses for business needs
Integrating AI into client work, automating first lines of contact

Large and Medium Businesses

Startups

Offline Business

Get a plan to reduce routine and improve metrics

100% of clients report increased productivity and reduced operati

AI Agents

Localization Project Manager – Coordinating translation workflows, answering vendor or process-related questions.

Job Title: Localization Project Manager Overview The Localization Project Manager plays a vital role in coordinating translation workflows while addressing vendor and process-related queries. This position is crucial for ensuring that translation projects are executed efficiently…
AI Agents

Environmental Health & Safety Officer – Answering compliance-related questions, retrieving safety protocols or audit histories.

Professional Summary The AI-driven Environmental Health & Safety Officer is a reliable and effective digital team member that performs repetitive and time-consuming tasks with remarkable speed, accuracy, and stability. By automating these tasks, it frees up…
AI Agents

Legal Contract Reviewer – Auto-flagging clause inconsistencies or retrieving precedent cases for review.

Job Title: Legal Contract Reviewer – Auto-flagging Clause Inconsistencies or Retrieving Precedent Cases for Review The AI functions as a reliable and effective digital team member that excels in performing repetitive and time-consuming tasks. With remarkable…
AI Agents

Customer Retention Analyst – Creating customer summaries, identifying churn risk patterns, and suggesting retention steps.

Customer Retention Analyst Professional Summary A highly analytical and detail-oriented Customer Retention Analyst with a proven track record in creating comprehensive customer summaries, identifying churn risk patterns, and suggesting effective retention strategies. Adept at leveraging data-driven…

Itinai.com httpss.mj.runmrqch2uvtvo russian handsome charisma 9fdbb2d5 a55b 425d 8f3b 76d26f86710f 2

AI Business Accelerator

Start Your AI Business in Just a Week with itinai.com

You’re a great fit if you:

Have an audience (even 500+ followers in Instagram, email, etc.)
Have an idea, service, or product you want to scale
Can invest 2–3 hours a day
You’re motivated to earn with AI but don’t want to handle technical setup

AI news and solutions

This AI Paper from Google Research Introduces Speculative Knowledge Distillation: A Novel AI Approach to Bridging the Gap Between Teacher and Student Models

Understanding Knowledge Distillation (KD) Knowledge Distillation (KD) is a machine learning method that transfers knowledge from a large, complex model (the teacher) to a smaller, more efficient model (the student). This technique helps reduce the computational…

AI Tech News
Researchers from the University of Washington Developed a Deep Learning Method for Protein Sequence Design that Explicitly Models the Full Non-Protein Atomic Context

University of Washington researchers developed LigandMPNN, a deep learning-based protein sequence design method targeting enzymes and small molecule interactions. It explicitly models non-protein atoms and molecules, outperforming existing methods like Rosetta and ProteinMPNN in accuracy, speed,…

AI Tech News
This AI Paper Introduces Llama-3-8B-Instruct-80K-QLoRA: New Horizons in AI Contextual Understanding

Natural Language Processing Advancements Natural language processing (NLP) focuses on enabling computers to understand and generate human language, making interactions more intuitive and efficient. Recent developments in this field have significantly impacted machine translation, chatbots, and…

AI Tech News
Optimizing Computational Resources for Machine Learning and Data Science Projects: A Practical Approach

Optimizing Computational Resources for Machine Learning and Data Science Projects: A Practical Approach Every computation requires computing resources. In machine learning, powerful computing resources are necessary for feeding massive amounts of data to the model, performing…

AI Tech News
How Facebook went all in on AI

Facebook’s introduction of the News Feed in 2006 revolutionized the platform, providing users with a constantly updating stream of posts and status changes. Despite user complaints, engagement doubled. The company then implemented an algorithm called EdgeRank…

AI Tech News
The Inflation of AI: Is More Always Better?

Hypothesis-driven development can mitigate the drawbacks of the rapid emergence of new ML models, as new models are being developed hourly.

AI Tech News
FedFixer: A Machine Learning Algorithm with the Dual Model Structure to Mitigate the Impact of Heterogeneous Noisy Label Samples in Federated Learning

AI Tech News
Optimizing Large Language Models for Concise and Accurate Responses through Constrained Chain-of-Thought Prompting

Optimizing Large Language Models for Concise and Accurate Responses through Constrained Chain-of-Thought Prompting Practical Solutions and Value Recent advancements in Large Language Models (LLMs) have led to impressive abilities in handling complex question-answering tasks. However, challenges…

AI Tech News
Google Pours $2 Billion into AI Firm Anthropic and Inks Cloud Deal

Google has agreed to invest $2 billion in Anthropic, a rising star in the AI industry. The investment will be made in the form of a convertible note, similar to a deal Amazon made earlier this…

AI Tech News
OpenAI considers in-house chip manufacturing amid global shortage

OpenAI is reportedly exploring the possibility of manufacturing its own processing chips to address the global shortage of these components. The company is considering options including acquiring a chip-making company and increasing its collaboration with primary…

AI Tech News
GitHub Spark: Revolutionizing App Development for Developers and Business Managers

Understanding the Target Audience The launch of GitHub Spark presents a game-changing opportunity for various groups in the tech landscape. The primary audience includes: Developers: From novices to seasoned experts, they seek efficient tools to enhance…

AI Tech News
UGround: A Universal GUI Visual Grounding Model Developed with Large-Scale Web-based Synthetic Data

Understanding GUI Agents and Their Importance Graphical User Interface (GUI) agents play a vital role in automating how we interact with software, just like humans do with keyboards and touchscreens. These agents make complex tasks easier…

AI Tech News
Self-Rewarding Reasoning in LLMs for Enhanced Mathematical Error Correction

Enhancing Reasoning in Language Models Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini have shown impressive reasoning abilities, particularly in mathematics and coding. The introduction of GPT-4 has further increased interest in improving these…

AI Tech News
RARE: A Scalable AI Framework for Enhanced Domain-Specific Reasoning

RARE: Enhancing Domain-Specific Reasoning in AI RARE: A Scalable AI Framework for Domain-Specific Reasoning Introduction Recent advancements in Large Language Models (LLMs) have shown impressive capabilities across various tasks, including mathematical reasoning and automation. However, these…

AI Tech News
Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

AWS’s suite of low-code and no-code ML tools, such as Amazon SageMaker Canvas, enables rapid, cost-effective machine learning model development without requiring coding expertise. Deloitte uses these tools to expedite project delivery and take on more…

AI Tech News
CS-Bench: A Bilingual (Chinese-English) Benchmark Dedicated to Evaluating the Performance of LLMs in Computer Science

The Value of CS-Bench in Evaluating LLMs in Computer Science Introduction The emergence of large language models (LLMs) has shown significant potential across various fields. However, effectively utilizing computer science knowledge and enhancing LLMs’ performance remains…

AI Tech News
Top 25 AI Tools to Increase Productivity in 2025

Transforming Daily Tasks with AI Artificial Intelligence (AI) is changing how we handle daily tasks by making processes easier and more efficient. AI tools boost productivity and provide creative solutions for various challenges, such as managing…

AI Tech News
Meet OpenCoder: A Completely Open-Source Code LLM Built on the Transparent Data Process Pipeline and Reproducible Dataset

Meet OpenCoder OpenCoder is a fully open-source code language model designed to enhance transparency and reproducibility in AI code development. What Makes OpenCoder Valuable? Transparency: OpenCoder offers clear insights into its training data and processes, enabling…

AI Tech News
Meta has updated policies to require labeling of AI-generated ads

Meta has implemented new policies regarding political advertising. Advertisers must now disclose the use of third-party AI software in ads featuring synthetic depictions of people and events that could impact politics or social issues. Meta itself…

AI Tech News
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference

On-device machine learning moves computation to personal devices, enhancing user privacy and experiences. However, optimizing models on limited resources poses challenges. To address this, Talaria, a model visualization and optimization system, aids in compiling models to…

AI Tech News