Is ConvNet Making a Comeback? Unraveling Their Performance on Web-Scale Datasets and Matching Vision Transformers

Researchers challenge the belief that Vision Transformers (ViTs) outperform Convolutional Neural Networks (ConvNets) on large datasets. They pre-train NFNet, a ConvNet architecture, on the JFT-4B dataset and find that it performs comparably to ViTs trained with similar compute, which points to computational resources as a crucial factor in model performance. The study encourages fair evaluation of architectures that accounts for both performance and computational requirements.

A recent study challenges the prevailing belief that Vision Transformers (ViTs) outperform Convolutional Neural Networks (ConvNets) when given access to large web-scale datasets. The researchers take NFNet, a ConvNet architecture, and pre-train it on JFT-4B, a dataset of approximately 4 billion labeled images from 30,000 classes. The aim is to evaluate how NFNet models scale and how they compare with ViTs trained on similar computational budgets.

The Rise of ViTs and the Need for Evidence

In recent years, ViTs have gained popularity, and there is a widespread belief that they surpass ConvNets in performance, especially when dealing with large datasets. However, this belief lacks substantial evidence, as most studies have compared ViTs to weak ConvNet baselines. Additionally, ViTs have been pre-trained with significantly larger computational budgets, raising questions about the actual performance differences between these architectures.

Introducing NFNet and Evaluating Performance

ConvNets, specifically ResNets, have been the go-to choice for computer vision tasks for years. However, the rise of ViTs has led to a shift in the way performance is evaluated, with a focus on models pre-trained on large, web-scale datasets.

The researchers pre-train NFNet, a ConvNet architecture, on the JFT-4B dataset without significant modifications. They examine how its performance scales across pre-training compute budgets ranging from 0.4k to 110k TPU-v4 core hours. Their goal is to determine whether NFNet can match ViTs when given similar computational resources.

Results and Findings

The research team trains NFNet models of varying depth and width on the JFT-4B dataset and then fine-tunes them on ImageNet. They observe a log-log scaling law between held-out loss and compute budget: on log-log axes the relationship is approximately linear, so larger computational budgets lead to steadily better performance. They also find that the optimal model size and the optimal epoch budget increase in tandem as compute grows.
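To make the idea of a log-log scaling law concrete, here is a minimal sketch in Python. It assumes the loss follows a power law in compute, loss ≈ a · compute^b, which appears as a straight line on log-log axes. The function name `fit_power_law`, the budgets, and the coefficients are illustrative assumptions; the numbers are synthetic and are not results from the paper.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**b via a straight-line fit in log-log space."""
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A degree-1 polynomial fit returns (slope, intercept).
    b, log_a = np.polyfit(log_c, log_l, deg=1)
    return float(np.exp(log_a)), float(b)


# Synthetic example only: budgets span the 0.4k to 110k core-hour range
# mentioned in the article, but the losses are generated, not measured.
budgets = np.array([0.4e3, 1.6e3, 6.4e3, 25.6e3, 110e3])
true_a, true_b = 5.0, -0.1  # hypothetical coefficients
losses = true_a * budgets ** true_b

a, b = fit_power_law(budgets, losses)
print(f"fitted a={a:.3f}, exponent b={b:.3f}")
```

A negative exponent `b` means each multiplicative increase in compute buys a roughly constant multiplicative reduction in loss, which is why such trends are usually plotted on log-log axes.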

The most expensive pre-trained model, an NFNet-F7+, achieves an ImageNet Top-1 accuracy of 90.3% after 110k TPU-v4 core hours of pre-training and 1.6k TPU-v4 core hours of fine-tuning. Adding repeated augmentation during fine-tuning raises this to 90.4% Top-1 accuracy. ViT models, which have often been given larger pre-training budgets, achieve similar performance.
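Repeated augmentation places several independently augmented copies of each image in the same batch. The sketch below illustrates only the batching idea with a toy horizontal flip; the function names, the choice of augmentation, and the repeat count are assumptions made for illustration, since the article does not describe the fine-tuning pipeline in detail.

```python
import numpy as np


def random_flip(image, rng):
    """Toy augmentation: flip the image horizontally with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image


def repeated_augmentation_batch(images, labels, repeats, rng):
    """Build a batch in which every image appears `repeats` times,
    each copy drawn through the augmentation independently."""
    batch_images, batch_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(repeats):
            batch_images.append(random_flip(image, rng))
            batch_labels.append(label)
    return np.stack(batch_images), np.array(batch_labels)


rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))   # four toy 8x8 grayscale images
labels = np.array([0, 1, 2, 3])
batch_x, batch_y = repeated_augmentation_batch(images, labels, repeats=3, rng=rng)
print(batch_x.shape, batch_y)
```

In a real pipeline the flip would be replaced by the full augmentation stack, and the repeated copies are typically spread across devices rather than stored contiguously.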

Implications and Conclusion

This research challenges the prevailing belief that ViTs significantly outperform ConvNets when trained with similar computational budgets. It demonstrates that NFNet models can achieve competitive results on ImageNet, matching the performance of ViTs. The study emphasizes that compute and data availability are critical factors in model performance. While ViTs have their merits, ConvNets like NFNet remain formidable contenders, especially when trained at a large scale. This work encourages a fair and balanced evaluation of different architectures, considering both their performance and computational requirements.

For more information, you can check out the paper.
