Qilin: A Multimodal Dataset for Enhanced Search and Recommendation Systems

Importance of Search Engines and Recommender Systems

Search engines and recommender systems play a crucial role in online content platforms today. Traditional search methods primarily focus on text, leaving a significant gap in effectively handling images and videos, which are vital in User-Generated Content (UGC) communities.

Challenges in Current Search and Recommendation Systems

Most current datasets for search and recommendation tasks are limited to textual information or dense statistical features, which hinders the development of effective multimodal services. They also rarely capture session-level signals, even though these carry contextual information that can improve user satisfaction and retention.

Existing Solutions

Some existing approaches have attempted to tackle multimodal retrieval challenges. Representation learning methods map images into binary spaces (hashing) or encode them using deep neural networks. Hash-aware methods offer efficient real-time performance, while semantic-based approaches focus on understanding the different modalities and matching them effectively.
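To make the hashing idea concrete, here is a minimal sketch, assuming random-hyperplane hashing as a stand-in for the learned hash functions used in practice. The item names, the 4-dimensional "embeddings", and the 16-bit code length are all hypothetical and chosen only for illustration; the sketch is not taken from Qilin or from any specific prior method.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumption: a stand-in for a learned hash)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


# Hypothetical dense embeddings for three notes and one query.
items = {
    "note_a": [0.9, 0.1, 0.0, 0.2],
    "note_b": [0.8, 0.2, 0.1, 0.1],
    "note_c": [-0.7, 0.9, -0.3, 0.0],
}
query = [1.0, 0.0, 0.0, 0.1]

planes = random_hyperplanes(dim=4, n_bits=16)
codes = {name: hash_code(vec, planes) for name, vec in items.items()}
query_code = hash_code(query, planes)

# Binary codes are compared with cheap Hamming distance;
# dense vectors are compared with cosine similarity.
by_hamming = sorted(items, key=lambda name: hamming(query_code, codes[name]))
by_cosine = sorted(items, key=lambda name: -cosine(query, items[name]))
```

The trade-off the paragraph describes shows up directly: the binary codes are small and fast to compare, but they only approximate the ordering that the dense vectors give.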

The Qilin Dataset

Researchers from Xiaohongshu Inc. and Tsinghua University have introduced Qilin, a multimodal information retrieval dataset aimed at improving search and recommendation services. Collected from Xiaohongshu, a popular social platform with over 300 million monthly active users, this dataset includes diverse user sessions with image-text notes, video notes, and commercial notes.

Key Features of Qilin

Qilin provides extensive APP-level contextual signals and genuine user feedback, which are essential for modeling user satisfaction and analyzing various user behaviors. The dataset construction involves user sampling, log joining, feature collection, and data filtering, resulting in a comprehensive resource for research.
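The four construction stages named above can be sketched on toy data as follows. This is only an illustration under stated assumptions: every field name (`request_id`, `note_id`, `clicked`, and so on), the sample records, and the filtering rule (dropping rows with an empty query) are hypothetical and do not reflect the actual Qilin schema or the authors' pipeline.

```python
import random

# Hypothetical toy logs; the field names are illustrative only.
users = ["u1", "u2", "u3", "u4"]
requests = [
    {"request_id": "r1", "user_id": "u1", "query": "hiking boots"},
    {"request_id": "r2", "user_id": "u2", "query": "ramen recipe"},
    {"request_id": "r3", "user_id": "u3", "query": ""},
]
impressions = [
    {"request_id": "r1", "note_id": "n1", "clicked": True},
    {"request_id": "r1", "note_id": "n2", "clicked": False},
    {"request_id": "r2", "note_id": "n3", "clicked": False},
    {"request_id": "r3", "note_id": "n1", "clicked": True},
]
note_features = {
    "n1": {"type": "image_text", "title": "Trail gear review"},
    "n2": {"type": "video", "title": "Boot lacing tips"},
    "n3": {"type": "image_text", "title": "Tonkotsu at home"},
}


def sample_users(user_ids, k, seed=0):
    """Stage 1: draw a random subset of users."""
    rng = random.Random(seed)
    return set(rng.sample(user_ids, k))


def join_logs(request_log, impression_log, sampled):
    """Stage 2: attach each impression to the request that produced it."""
    by_request = {
        r["request_id"]: r for r in request_log if r["user_id"] in sampled
    }
    joined = []
    for imp in impression_log:
        req = by_request.get(imp["request_id"])
        if req is not None:
            joined.append({**req, **imp})
    return joined


def collect_features(rows, features):
    """Stage 3: enrich each row with the features of the note it refers to."""
    return [
        {**row, "note": features[row["note_id"]]}
        for row in rows
        if row["note_id"] in features
    ]


def filter_rows(rows):
    """Stage 4: drop rows that fail a quality check (here: an empty query)."""
    return [row for row in rows if row["query"].strip()]


def build_dataset(user_ids, request_log, impression_log, features, k, seed=0):
    """Run the four stages in order and return the filtered rows."""
    sampled = sample_users(user_ids, k, seed)
    joined = join_logs(request_log, impression_log, sampled)
    enriched = collect_features(joined, features)
    return filter_rows(enriched)


dataset = build_dataset(users, requests, impressions, note_features, k=3)
```

Keeping each stage as a separate function makes it easy to swap in a different sampling strategy or filtering rule without touching the join logic.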

Performance Insights

Initial results indicate that a BERT cross-encoder outperforms a bi-encoder in relevance matching, and Vision-Language Models (VLMs) yield even better results by incorporating visual information. However, the performance gap in recommendation tasks highlights the need for greater model robustness.
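For readers unfamiliar with the two architectures, the structural difference can be shown with a toy sketch. The scoring functions below are deliberately simple stand-ins (bag-of-words cosine and a bigram overlap score), not BERT and not the models evaluated on Qilin; the function names and example texts are hypothetical. The point is only the interface: a bi-encoder encodes the query and the document separately, so document vectors can be precomputed, while a cross-encoder scores the pair jointly and can use interactions between the two texts.

```python
import math
from collections import Counter


def encode(text):
    """Bi-encoder stand-in: encode one text without seeing the other side."""
    return Counter(text.lower().split())


def bi_encoder_score(query_vec, doc_vec):
    """Compare two independently computed vectors with cosine similarity."""
    dot = sum(count * doc_vec[token] for token, count in query_vec.items())
    norm_q = math.sqrt(sum(c * c for c in query_vec.values()))
    norm_d = math.sqrt(sum(c * c for c in doc_vec.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def cross_encoder_score(query, doc):
    """Cross-encoder stand-in: score the pair jointly.

    Because both texts are visible at once, the score can depend on
    interactions such as shared word order (approximated here by bigrams).
    """
    q_tokens = query.lower().split()
    d_tokens = doc.lower().split()
    q_bigrams = set(zip(q_tokens, q_tokens[1:]))
    d_bigrams = set(zip(d_tokens, d_tokens[1:]))
    unigram = len(set(q_tokens) & set(d_tokens)) / len(set(q_tokens)) if q_tokens else 0.0
    bigram = len(q_bigrams & d_bigrams) / len(q_bigrams) if q_bigrams else 0.0
    return 0.5 * unigram + 0.5 * bigram


query = "man bites dog"
docs = ["man bites dog", "dog bites man"]

# Bi-encoder: document vectors can be computed once, ahead of any query.
doc_vecs = [encode(d) for d in docs]
bi_scores = [bi_encoder_score(encode(query), v) for v in doc_vecs]

# Cross-encoder: one joint scoring call per (query, document) pair.
cross_scores = [cross_encoder_score(query, d) for d in docs]
```

In this toy setup the bag-of-words bi-encoder cannot tell the two documents apart, while the joint scorer prefers the one whose word order matches the query. Real bi-encoders are far more expressive than a bag of words, so this only illustrates why joint scoring tends to be more accurate and more expensive.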

Conclusion

The introduction of the Qilin dataset represents a significant advancement in multimodal information retrieval research. It addresses critical gaps in existing datasets and provides a rich framework for exploring various information retrieval tasks. Preliminary experiments demonstrate its versatility and potential applications.
