Samsung Introduces ANSE: Enhancing Text-to-Video Diffusion Models with Active Noise Selection

Samsung researchers have introduced ANSE (Active Noise Selection for Generation), a framework for improving text-to-video (T2V) diffusion models. These models generate video from text prompts, but the quality and consistency of their outputs vary from run to run. ANSE addresses this by choosing the initial noise with a model-aware criterion, which improves both video quality and alignment with the text prompt.

The Challenge of Video Generation

Text-to-video diffusion models turn random noise into coherent video frames through iterative denoising. The quality of the generated video can vary significantly with the initial noise seed, so the same prompt may produce very different results and compute is wasted on poor seeds. Existing methods for noise selection often involve complex adjustments that are expensive and not reliably effective. A more systematic approach is needed.

Introducing ANSE

ANSE is built on BANSA (Bayesian Active Noise Selection via Attention), a criterion that uses the model's internal signals to choose among candidate noise seeds according to their potential to yield high-quality outputs. BANSA quantifies how confident the model's attention maps are during the initial denoising steps, and ANSE uses that measurement to pick the seed.
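The selection step described above can be sketched as a loop over candidate seeds. This is only an illustration under stated assumptions: `sample_noise`, `select_seed`, and `toy_uncertainty` are hypothetical names, the noise is a short list of floats rather than a video latent, the scoring function is a placeholder for an attention-based score, and "lower score is better" is an assumed convention rather than something the article specifies.

```python
import random
from typing import Callable, List, Tuple

Noise = List[float]


def sample_noise(seed: int, size: int = 8) -> Noise:
    """Draw a Gaussian noise vector from a fixed seed (stand-in for a video latent)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def select_seed(seeds: List[int], score_fn: Callable[[Noise], float]) -> Tuple[int, float]:
    """Score every candidate seed and return the (seed, score) pair with the lowest score.

    Assumption: a lower score means the model is more confident about this noise.
    """
    scored = [(seed, score_fn(sample_noise(seed))) for seed in seeds]
    return min(scored, key=lambda pair: pair[1])


def toy_uncertainty(noise: Noise) -> float:
    """Hypothetical placeholder score; a real one would be derived from attention maps."""
    return sum(abs(x) for x in noise) / len(noise)


# Pick the best of ten candidate seeds, then generate the video from that seed only.
best_seed, best_score = select_seed(list(range(10)), toy_uncertainty)
print(best_seed, round(best_score, 3))
```

The point of the design is that scoring happens before full generation, so the expensive denoising run is spent only on the chosen seed.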

How BANSA Works

BANSA evaluates the entropy of attention maps produced during the early steps of video generation. The researchers found that scores from certain layers of the model correlate well with its overall uncertainty, so the computation can be restricted to those layers to reduce cost. The BANSA score compares the average entropy of the individual attention maps to the entropy of their combined average, and the resulting value is used to select the most promising noise seed for final video production.
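The comparison between "average entropy of the maps" and "entropy of the averaged map" can be made concrete with a small sketch. Several things here are assumptions rather than details from the article: the function names are hypothetical, each attention map is a list of rows that each sum to 1, the sign convention (entropy of the average minus average entropy) is chosen so the score is never negative, and how the individual maps are obtained for one noise seed is left open.

```python
import math
from typing import List, Sequence

Row = Sequence[float]          # one attention row: non-negative, sums to 1
AttentionMap = Sequence[Row]   # one map: rows of attention weights


def row_entropy(row: Row) -> float:
    """Shannon entropy (in nats) of one attention row; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def map_entropy(attn: AttentionMap) -> float:
    """Mean row entropy of a single attention map."""
    return sum(row_entropy(r) for r in attn) / len(attn)


def mean_map(maps: Sequence[AttentionMap]) -> List[List[float]]:
    """Element-wise average of several attention maps of identical shape."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(cols)] for i in range(rows)]


def bansa_like_score(maps: Sequence[AttentionMap]) -> float:
    """Entropy of the averaged map minus the average per-map entropy.

    Assumed sign convention: the value is >= 0 because entropy is concave,
    and it is 0 when all maps are identical.
    """
    avg_entropy = sum(map_entropy(m) for m in maps) / len(maps)
    return map_entropy(mean_map(maps)) - avg_entropy


# Two toy cases, each with two 1x2 attention maps.
agree = [[[0.9, 0.1]], [[0.9, 0.1]]]      # maps agree
disagree = [[[0.9, 0.1]], [[0.1, 0.9]]]   # maps attend to different keys
print(bansa_like_score(agree), bansa_like_score(disagree))
```

Under this convention the score grows as the maps disagree with one another, so a seed whose maps agree scores near zero.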

Performance Improvements

ANSE improved VBench scores on both models tested:

  • On the CogVideoX-2B model, the total VBench score increased from 81.03 to 81.66 (+0.63), with quality and semantic alignment gains of +0.48 and +1.23, respectively.
  • For the larger CogVideoX-5B model, the score improved from 81.52 to 81.71 (+0.25), achieving quality and semantic alignment gains of +0.17 and +0.60.
  • These gains came with modest increases in inference time (8.68% for CogVideoX-2B and 13.78% for CogVideoX-5B), significantly lower than previous methods.

Advantages of ANSE

ANSE stands out due to several key advantages:

  • Higher total VBench scores on both CogVideoX-2B and CogVideoX-5B.
  • Enhanced quality and semantic alignment without substantial increases in processing time.
  • More efficient noise selection compared to random and entropy-based methods.
  • Reduced computational load through targeted layer selection.

Conclusion

ANSE uses the model's internal attention signals to guide noise selection, which reduces the unpredictability of text-to-video outputs and improves both quality and alignment with textual prompts. Because the selection adds only a modest amount of inference time, it also makes better use of computational resources than generating from arbitrary seeds.

For further insights into this research, refer to the paper and project page.

Vladimir Dyachkov, Ph.D
Editor-in-Chief itinai.com