Is ConvNet Making a Comeback? Unraveling Their Performance on Web-Scale Datasets and Matching Vision Transformers

Researchers challenge the belief that Vision Transformers (ViTs) outperform Convolutional Neural Networks (ConvNets) with large datasets. They introduce NFNet, a ConvNet architecture pre-trained on the JFT-4B dataset. NFNet performs comparably to ViTs, showing that computational resources are crucial for model performance. The study encourages fair evaluation of different architectures considering performance and computational requirements.

Is ConvNet Making a Comeback? Unraveling Their Performance on Web-Scale Datasets and Matching Vision Transformers

Researchers have conducted a study challenging the prevailing belief that Vision Transformers (ViTs) outperform Convolutional Neural Networks (ConvNets) when given access to large web-scale datasets. They introduce a ConvNet architecture called NFNet, which is pre-trained on a massive dataset called JFT-4B, containing approximately 4 billion labeled images from 30,000 classes. The aim of the study is to evaluate the scaling properties of NFNet models and determine how they perform in comparison to ViTs with similar computational budgets.

The Rise of ViTs and the Need for Evidence

In recent years, ViTs have gained popularity, and there is a widespread belief that they surpass ConvNets in performance, especially when dealing with large datasets. However, this belief lacks substantial evidence, as most studies have compared ViTs to weak ConvNet baselines. Additionally, ViTs have been pre-trained with significantly larger computational budgets, raising questions about the actual performance differences between these architectures.

Introducing NFNet and Evaluating Performance

ConvNets, specifically ResNets, have been the go-to choice for computer vision tasks for years. However, the rise of ViTs has led to a shift in the way performance is evaluated, with a focus on models pre-trained on large, web-scale datasets.

The researchers introduce NFNet, a ConvNet architecture, and pre-train it on the vast JFT-4B dataset without significant modifications. They examine how the performance of NFNet scales with varying computational budgets, ranging from 0.4k to 110k TPU-v4 core compute hours. Their goal is to determine if NFNet can match ViTs in terms of performance with similar computational resources.

Results and Findings

The research team trains different NFNet models with varying depths and widths on the JFT-4B dataset. They fine-tune these pre-trained models on ImageNet and observe a log-log scaling law, finding that larger computational budgets lead to better performance. They also notice that the optimal model size and epoch budget increase in tandem.

The most expensive pre-trained NFNet model, an NFNet-F7+, achieves an ImageNet Top-1 accuracy of 90.3% with 110k TPU-v4 core hours for pre-training and 1.6k TPU-v4 core hours for fine-tuning. By introducing repeated augmentation during fine-tuning, they achieve a remarkable 90.4% Top-1 accuracy. Comparatively, ViT models, which often require more substantial pre-training budgets, achieve similar performance.

Implications and Conclusion

This research challenges the prevailing belief that ViTs significantly outperform ConvNets when trained with similar computational budgets. It demonstrates that NFNet models can achieve competitive results on ImageNet, matching the performance of ViTs. The study emphasizes that compute and data availability are critical factors in model performance. While ViTs have their merits, ConvNets like NFNet remain formidable contenders, especially when trained at a large scale. This work encourages a fair and balanced evaluation of different architectures, considering both their performance and computational requirements.

For more information, you can check out the paper.

If you want to evolve your company with AI, stay competitive, and use it to your advantage, consider the practical solutions offered by Is ConvNet Making a Comeback? Unraveling Their Performance on Web-Scale Datasets and Matching Vision Transformers.

Practical AI Solutions for Middle Managers

Discover how AI can redefine your way of work with the following steps:

1. Identify Automation Opportunities: Locate key customer interaction points that can benefit from AI.
2. Define KPIs: Ensure your AI endeavors have measurable impacts on business outcomes.
3. Select an AI Solution: Choose tools that align with your needs and provide customization.
4. Implement Gradually: Start with a pilot, gather data, and expand AI usage judiciously.

For AI KPI management advice, connect with us at hello@itinai.com. And for continuous insights into leveraging AI, stay tuned on our Telegram or Twitter.

Spotlight on a Practical AI Solution

Consider the AI Sales Bot from itinai.com/aisalesbot. It is designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.

Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.

List of Useful Links:

AI Lab in Telegram @aiscrumbot – free consultation

Is ConvNet Making a Comeback? Unraveling Their Performance on Web-Scale Datasets and Matching Vision Transformers

MarkTechPost

Twitter – @itinaicom

Unleash Your Creative Potential with AI Agents

Competitors are already using AI Agents

Business Problems We Solve

Automation of internal processes.
Optimizing AI costs without huge budgets.
Training staff, developing custom courses for business needs
Integrating AI into client work, automating first lines of contact

Large and Medium Businesses

Startups

Offline Business

Get a plan to reduce routine and improve metrics

100% of clients report increased productivity and reduced operati

AI Agents

Localization Project Manager – Coordinating translation workflows, answering vendor or process-related questions.

Job Title: Localization Project Manager Overview The Localization Project Manager plays a vital role in coordinating translation workflows while addressing vendor and process-related queries. This position is crucial for ensuring that translation projects are executed efficiently…
AI Agents

Environmental Health & Safety Officer – Answering compliance-related questions, retrieving safety protocols or audit histories.

Professional Summary The AI-driven Environmental Health & Safety Officer is a reliable and effective digital team member that performs repetitive and time-consuming tasks with remarkable speed, accuracy, and stability. By automating these tasks, it frees up…
AI Agents

Legal Contract Reviewer – Auto-flagging clause inconsistencies or retrieving precedent cases for review.

Job Title: Legal Contract Reviewer – Auto-flagging Clause Inconsistencies or Retrieving Precedent Cases for Review The AI functions as a reliable and effective digital team member that excels in performing repetitive and time-consuming tasks. With remarkable…
AI Agents

Customer Retention Analyst – Creating customer summaries, identifying churn risk patterns, and suggesting retention steps.

Customer Retention Analyst Professional Summary A highly analytical and detail-oriented Customer Retention Analyst with a proven track record in creating comprehensive customer summaries, identifying churn risk patterns, and suggesting effective retention strategies. Adept at leveraging data-driven…

Itinai.com httpss.mj.runmrqch2uvtvo russian handsome charisma 9fdbb2d5 a55b 425d 8f3b 76d26f86710f 2

AI Business Accelerator

Start Your AI Business in Just a Week with itinai.com

You’re a great fit if you:

Have an audience (even 500+ followers in Instagram, email, etc.)
Have an idea, service, or product you want to scale
Can invest 2–3 hours a day
You’re motivated to earn with AI but don’t want to handle technical setup

AI news and solutions

Meet MFLES: A Python Library Designed to Enhance Forecasting Accuracy in the Face of Multiple Seasonality Challenges

The MFLES Python library enhances forecasting accuracy by recognizing and decomposing multiple seasonal patterns in data, providing conformal prediction intervals and optimizing parameters. Its superiority in benchmarks suggests it as a sophisticated and reliable tool for…

AI Tech News
Cache-Augmented Generation: Leveraging Extended Context Windows in Large Language Models for Retrieval-Free Response Generation

Enhancing Large Language Models with Cache-Augmented Generation Overview of Cache-Augmented Generation (CAG) Large language models (LLMs) have improved with a method called retrieval-augmented generation (RAG), which uses external knowledge to enhance responses. However, RAG has challenges…

AI Tech News
Agile Alliance New Zealand: Who we are and where we’re going

Agile Alliance New Zealand, established in 2016, is a volunteer-led society aimed at promoting Agility across industries and assisting local Agile communities in adapting to changing practices. The organization’s focus is on fostering Agility and supporting…

Scrum Agile News
NASA and IBM Researchers Introduce INDUS: A Suite of Domain-Specific Large Language Models (LLMs) for Advanced Scientific Research

Introducing INDUS: Domain-Specific Large Language Models (LLMs) for Advanced Scientific Research Practical Solutions and Value Large Language Models (LLMs) like INDUS, trained on specialized corpora, excel in natural language understanding and generation for scientific domains such…

AI Tech News
Google reveals Lumiere, a text-to-video diffusion model

Google Research has introduced Lumiere, a revolutionary text-to-video diffusion model. It can generate realistic videos from text or image inputs, outperforming other models in motion coherence and visual consistency. Lumiere offers various features including text-to-video, image-to-video,…

AI Tech News
ProteinZen: An All-Atom Protein Structure Generation Method Using Machine Learning

ProteinZen: A New Approach to All-Atom Protein Structure Generation The Challenge Generating accurate all-atom protein structures is a complex task in protein design. While current models have improved in creating backbone structures, they struggle to achieve…

AI Tech News
Microsoft AI Research Introduces OLA-VLM: A Vision-Centric Approach to Optimizing Multimodal Large Language Models

Advancements in Multimodal Large Language Models (MLLMs) Understanding MLLMs Multimodal large language models (MLLMs) are rapidly evolving technology that allows machines to understand both text and images at the same time. This capability is transforming fields…

AI Tech News
Understanding Deep Learning Optimizers: Momentum, AdaGrad, RMSProp & Adam

Accelerating training techniques in neural networks is crucial due to the complex nature of deep learning models with millions of parameters. Optimization algorithms such as Momentum, AdaGrad, RMSProp, and Adam address slow convergence and varying gradients,…

AI Tech News
Prometheus-Eval and Prometheus 2: Setting New Standards in LLM Evaluation and Open-Source Innovation with State-of-the-art Evaluator Language Model

Prometheus-Eval & Prometheus 2: Advancing NLP Evaluation Overview In natural language processing (NLP), the need to enhance language models’ capabilities for text generation, translation, and sentiment analysis is crucial. Prometheus-Eval and Prometheus 2 provide advanced evaluation…

AI Tech News
Efficient Context Management for LLMs: A Coding Tutorial on Model Context Protocol

Model Context Protocol: Enhancing AI Interactions Model Context Protocol: Enhancing AI Interactions Introduction Effectively managing context is essential when utilizing large language models (LLMs), particularly in resource-constrained environments like Google Colab. This guide presents a practical…

AI Tech News
This AI Paper from Harvard Introduces Q-Probing: A New Frontier in Machine Learning for Adapting Pre-Trained Language Models

Q-Probe, a new method from Harvard, efficiently adapts pre-trained language models for specific tasks. It balances between extensive finetuning and simple prompting, reducing computational overhead while maintaining model adaptability. Showing promise in various domains, it outperforms…

AI Tech News
Google’s Gemini AI is going to surpass ChatGPT

Gemini AI, an advanced NLP model, is designed to exceed current benchmarks due to its multimodal capabilities, scalability, and potential for integration with Google’s ecosystem, marking a substantial advancement in AI technology.

AI Tech News
ByteDance Introduces VGR: A Groundbreaking MLLM for Enhanced Visual Reasoning

Understanding the Target Audience The research on the Visual Grounded Reasoning (VGR) model primarily targets AI researchers, technology business leaders, data scientists, and machine learning professionals. These individuals are keen on advancing AI capabilities, particularly in…

AI Tech News
Apple is Planning a Revolutionary AI Leap: In Talks to Integrate Google’s Gemini Engine into iPhones

Apple is exploring a partnership with Google to bring Gemini AI to the iPhone, potentially revolutionizing smartphone capabilities. This move signals Apple’s commitment to staying at the forefront of the AI revolution, with a focus on…

AI Tech News
What If Game Engines Could Run on Neural Networks? This AI Paper from Google Unveils GameNGen and Explores How Diffusion Models Are Revolutionizing Real-Time Gaming

Revolutionizing Real-Time Gaming with GameNGen A significant challenge in AI-driven game simulation is the ability to accurately simulate complex, real-time interactive environments using neural models. Traditional game engines rely on manually crafted loops that gather user…

AI Tech News
LMMS-EVAL: A Unified and Standardized Multimodal AI Benchmark Framework for Transparent and Reproducible Evaluations

Practical AI Solutions for Your Business LMMS-EVAL: A Unified and Standardized Multimodal AI Benchmark Framework Fundamental Large Language Models (LLMs) like GPT-4, Gemini, and Claude have shown remarkable capabilities, rivaling or surpassing human performance. To address…

AI Tech News
SAS Viya vs H2O.ai: Accelerate Data-Driven Product Decisions

Technical Relevance: Why SAS Viya is Important for Modern Development Workflows In today’s fast-paced business environment, industries such as finance and healthcare are increasingly relying on data-driven decisions to enhance operational efficiency and profitability. SAS Viya…

Tools
Microsoft Introduces Multilingual E5 Text Embedding: A Step Towards Multilingual Processing Excellence

Microsoft has introduced the multilingual E5 text embedding models, addressing the challenge of developing NLP models that can perform well across different languages. They utilize a two-stage training process and show exceptional performance across multiple languages…

AI Tech News
G-Retriever: Advancing Real-World Graph Question Answering with RAG and LLMs

Advancing Real-World Graph Question Answering with G-Retriever Practical Solutions and Value Large Language Models (LLMs) have made significant strides in artificial intelligence, but their ability to process complex structured data, particularly graphs, remains challenging. In our…

AI Tech News
Roadmap for Transitioning to Data Analytics

To transition to data analytics from another field, pursue relevant education or training, gain practical experience, and engage with the data science community through platforms like Towards Data Science.

AI Tech News