Training large language models (LLMs) for natural language processing (NLP) is now commonplace, yet flexible and scalable vision models are still needed. A team from EPFL and Apple introduces 4M, a multimodal masked modeling approach. It is designed to handle varied input types efficiently, from images to text, with an emphasis on scalability and shared representations. The framework shows promise for vision tasks and future multimodal work. Read more at https://t.co/usE17pnXf9.
EPFL and Apple Researchers Open-Source 4M: An Artificial Intelligence Framework for Training Multimodal Foundation Models Across Tens of Modalities and Tasks
Introduction
Training large language models (LLMs) for natural language processing (NLP) has become widespread. However, vision still lacks models that are equally flexible and scalable. A vision model of this kind must handle many kinds of sensory input and perform many different tasks.
Scalability Factors
Data, architecture, and training objective are the three critical scalability factors. Data scalability means the model can benefit from more training samples. Architectural scalability means performance improves as model size increases. A scalable training objective should handle a growing number of modalities efficiently, without a sharp increase in computational cost.
4M Approach
The 4M approach trains a single unified Transformer encoder-decoder with a multimodal masked modeling objective. It combines the strengths of masked modeling and multimodal learning: strong cross-modal predictive coding abilities and shared scene representations. It retains these advantages while staying efficient through several design choices, including the input and target masking described under Efficient Training.
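To make the masked modeling objective more concrete, here is a minimal sketch. It is not the 4M implementation. It assumes each modality has already been converted to a list of discrete tokens, which this article does not describe; the modality names, token values, `keep_ratio`, and the `split_visible_masked` helper are all hypothetical. Some tokens stay visible as encoder input, and the hidden ones become prediction targets, so a hidden token from one modality may have to be predicted from visible tokens of another.

```python
import random

def split_visible_masked(sample, keep_ratio, rng):
    """Split a multimodal sample into visible tokens and masked targets.

    sample: dict mapping modality name -> list of tokens (hypothetical format).
    Returns two lists of (modality, position, token) triples.
    """
    # Flatten every modality into one pool so masking is applied jointly,
    # not per modality.
    pool = [(name, pos, tok)
            for name, tokens in sample.items()
            for pos, tok in enumerate(tokens)]
    rng.shuffle(pool)
    n_visible = max(1, int(len(pool) * keep_ratio))
    visible = sorted(pool[:n_visible])   # what the encoder would see
    masked = sorted(pool[n_visible:])    # what the decoder would predict
    return visible, masked

# Toy sample with three hypothetical modalities.
sample = {
    "rgb": [11, 12, 13, 14],
    "depth": [21, 22, 23, 24],
    "caption": [31, 32],
}
visible, masked = split_visible_masked(sample, keep_ratio=0.3, rng=random.Random(0))
print("encoder sees:", visible)
print("decoder predicts:", masked)
```

Because the pool is shuffled jointly, the visible set can be dominated by one modality while the targets come from another, which is the cross-modal prediction setting the paragraph above refers to.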
Efficient Training
Although 4M operates on a large collection of modalities, it trains efficiently by using input and target masking: each training step uses only a small random subset of tokens as input and another small random subset as targets. This keeps the computational cost from growing rapidly as the number of modalities increases.
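The effect on cost can be illustrated with a rough sketch, under stated assumptions. It assumes every modality is a list of tokens and that a fixed number of input tokens and target tokens is drawn per sample. The budget of 128, the 196 tokens per modality, and the helper names `sample_budgeted` and `make_sample` are hypothetical values chosen for illustration, not settings taken from 4M.

```python
import random

def sample_budgeted(sample, input_budget, target_budget, rng):
    """Draw disjoint input and target token positions under fixed budgets."""
    # Pool of (modality, position) pairs across all modalities.
    pool = [(name, pos)
            for name, tokens in sample.items()
            for pos in range(len(tokens))]
    rng.shuffle(pool)
    inputs = pool[:input_budget]
    targets = pool[input_budget:input_budget + target_budget]
    return inputs, targets

def make_sample(num_modalities, tokens_per_modality=196):
    """Build a toy sample with the given number of hypothetical modalities."""
    return {f"modality_{i}": list(range(tokens_per_modality))
            for i in range(num_modalities)}

rng = random.Random(0)
results = []
for num_modalities in (2, 4, 8):
    sample = make_sample(num_modalities)
    inputs, targets = sample_budgeted(sample, 128, 128, rng)
    total = sum(len(tokens) for tokens in sample.values())
    results.append((num_modalities, total, len(inputs), len(targets)))
    print(num_modalities, total, len(inputs), len(targets))
```

In this toy setup the total number of available tokens grows from 392 to 1568 as modalities are added, while the number of tokens actually fed to the model stays at 128 inputs and 128 targets, so the per-step sequence length does not grow with the number of modalities.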
Practical Applications
4M models can be fine-tuned to achieve strong results on unseen downstream tasks and input modalities. They also support diverse ways of expressing user intent and a variety of multimodal editing tasks.