SongGen: A Fully Open-Source Single-Stage Auto-Regressive Transformer Designed for Controllable Song Generation

Challenges in Song Generation

Generating a song from text means producing vocals and instrumental music together. This is harder than generating speech or instrumental music alone, because the lyrics must be sung to a melody and the two must combine to convey emotion. A further barrier is the shortage of high-quality open-source data, which slows research and development.

Current Approaches and Limitations

Most existing text-to-music models struggle to generate realistic vocals. Transformer-based and diffusion models produce high-quality instrumental music, but singing remains difficult for them. Multi-stage methods such as MelodyLM generate vocals and accompaniment in separate stages, which complicates training and inference and reduces control over the final song. Jukebox models vocals and accompaniment together as raw audio, but it is conditioned on artist, genre, and lyrics rather than on free-form text descriptions.

Introducing SongGen

To address these challenges, researchers developed SongGen, a single-stage auto-regressive transformer decoder paired with a neural audio codec. The transformer predicts a sequence of discrete audio tokens, and the codec decodes those tokens into the waveform of a complete song. SongGen offers two generation modes: Mixed Mode and Dual-Track Mode.
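
The token-by-token prediction described above can be illustrated with a minimal sketch. This is not SongGen's code: `generate_tokens` and `toy_model` are hypothetical names, the four-token vocabulary is invented, and a real system would pass the resulting tokens to the codec's decoder to obtain audio.

```python
import random

def generate_tokens(next_token_probs, prompt, max_new_tokens, eos_id, seed=0):
    """Sample tokens one at a time, feeding each one back as context."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)  # distribution over the vocabulary
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if token == eos_id:
            break  # the model signalled the end of the sequence
        tokens.append(token)
    return tokens

def toy_model(context):
    """Hypothetical stand-in for the transformer: vocabulary {0, 1, 2}, id 3 = end."""
    if len(context) >= 8:
        return [0.0, 0.0, 0.0, 1.0]  # force end-of-sequence after 8 tokens
    probs = [0.1, 0.1, 0.1, 0.0]
    probs[(context[-1] + 1) % 3] = 0.8  # favour the "next" token in a cycle
    return probs

tokens = generate_tokens(toy_model, prompt=[0], max_new_tokens=20, eos_id=3)
print(tokens)  # audio tokens; a codec decoder would turn these into a waveform
```

The key property is that every new token is conditioned on all tokens generated so far, which is what "auto-regressive" refers to.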

Mixed Mode

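In Mixed Mode, the model predicts tokens for the mixed audio, with vocals and accompaniment combined in a single track. X-Codec encodes the raw audio into discrete tokens across several codebooks, and training emphasizes the earlier codebooks to improve vocal clarity. The Mixed Pro variant adds an auxiliary loss computed on the vocals alone, which further improves vocal quality.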
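
One way to picture the emphasis on earlier codebooks and the auxiliary vocal loss is as a weighted sum of per-codebook losses. The sketch below is an assumption-laden illustration, not the paper's formulation: the function names, the weights, and `aux_weight` are all hypothetical.

```python
def weighted_codebook_loss(per_codebook_losses, weights):
    """Weighted average of per-codebook losses (larger weight = more emphasis)."""
    if len(per_codebook_losses) != len(weights):
        raise ValueError("need one weight per codebook")
    total = sum(w * l for w, l in zip(weights, per_codebook_losses))
    return total / sum(weights)

def mixed_pro_loss(mixed_losses, vocal_losses, weights, aux_weight):
    """Mixed-audio loss plus an auxiliary term computed on vocal tokens only."""
    main = weighted_codebook_loss(mixed_losses, weights)
    aux = weighted_codebook_loss(vocal_losses, weights)
    return main + aux_weight * aux

# Hypothetical numbers: two codebooks, the first weighted three times as much.
weights = [3.0, 1.0]
print(weighted_codebook_loss([1.0, 2.0], weights))
print(mixed_pro_loss([1.0, 2.0], [2.0, 4.0], weights, aux_weight=0.5))
```

With these weights, an error in the first codebook costs three times as much as the same error in the second, so optimization favors getting the early codebooks right.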

Dual-Track Mode

Dual-Track Mode generates vocals and accompaniment as two separate tracks and keeps them synchronized through one of two token patterns, Parallel or Interleaving. The Parallel pattern aligns the two tracks' tokens frame by frame, while the Interleaving pattern alternates them within a single sequence, which increases interaction between vocals and accompaniment.
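
The difference between the two patterns is easiest to see on a toy example. The sketch below assumes one token per frame per track (a real codec emits several codebooks per frame) and uses hypothetical function names; it shows only the token layout, not the model.

```python
def parallel_pattern(vocal, accomp):
    """One step per frame; each step carries both tracks' tokens side by side."""
    if len(vocal) != len(accomp):
        raise ValueError("tracks must have the same number of frames")
    return list(zip(vocal, accomp))

def interleaving_pattern(vocal, accomp, vocal_first=True):
    """Alternate the two tracks' tokens frame by frame in a single sequence."""
    if len(vocal) != len(accomp):
        raise ValueError("tracks must have the same number of frames")
    sequence = []
    for v, a in zip(vocal, accomp):
        sequence.extend((v, a) if vocal_first else (a, v))
    return sequence

def split_interleaved(sequence, vocal_first=True):
    """Recover the two tracks from an interleaved sequence."""
    first, second = sequence[0::2], sequence[1::2]
    return (first, second) if vocal_first else (second, first)

vocal = ["v1", "v2", "v3"]
accomp = ["a1", "a2", "a3"]
print(parallel_pattern(vocal, accomp))
print(interleaving_pattern(vocal, accomp))
```

In the parallel layout the sequence length equals the number of frames, while the interleaved layout is twice as long but lets every position attend to earlier tokens of both tracks.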

Data Processing and Evaluation

Because public text-to-song datasets are scarce, the researchers built an automated pipeline that processes 8,000 hours of audio from various sources and applies filtering strategies to ensure quality. SongGen was evaluated against models such as Stable Audio Open and MusicGen and showed better text relevance and vocal control.
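
The article does not name the specific filters, so the following is only a generic sketch of what a quality-filtering step in such a pipeline might look like. The `Clip` fields, the thresholds, and the idea of a single quality score are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    duration_s: float
    has_lyrics: bool
    quality_score: float  # hypothetical 0..1 score from some quality check

def filter_clips(clips, min_duration_s=5.0, min_quality=0.5):
    """Keep clips that are long enough, have lyrics, and pass the quality bar."""
    return [
        c for c in clips
        if c.duration_s >= min_duration_s
        and c.has_lyrics
        and c.quality_score >= min_quality
    ]

def total_hours(clips):
    """Total duration of a list of clips, in hours."""
    return sum(c.duration_s for c in clips) / 3600.0

clips = [
    Clip("a", 30.0, True, 0.9),   # kept
    Clip("b", 2.0, True, 0.9),    # too short
    Clip("c", 30.0, False, 0.9),  # no lyrics
    Clip("d", 30.0, True, 0.2),   # low quality
]
kept = filter_clips(clips)
print([c.clip_id for c in kept], total_hours(kept))
```

Filters of this kind usually discard a large share of the raw audio, so the usable dataset is smaller than the amount processed.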

Conclusion and Future Directions

SongGen simplifies text-to-song generation with a single-stage, auto-regressive transformer and performs strongly in both Mixed and Dual-Track modes. Because it is fully open source, it is accessible to beginners and experts alike, and its generation modes give users control over the vocal and instrumental components. However, ethical concerns about voice mimicry must be addressed. As a foundational model for controllable text-to-song generation, SongGen paves the way for future advances in audio quality and expressive singing synthesis.
