Researchers from the University of Washington and Google Have Developed Distilling Step-by-Step Technology to Train a Dedicated Small Machine Learning Model with Less Data

Researchers from the University of Washington and Google have developed a method called “Distilling Step-by-Step” for training small machine learning models with less data. The approach extracts informative natural language rationales from large language models and uses them as additional supervision when training a smaller model. In the reported experiments, the method outperformed both standard fine-tuning and few-shot prompted LLMs while using less training data and much smaller models, making advanced language models more accessible for various applications.

Review: Distilling Step-by-Step Technology for Training Small Machine Learning Models

In recent years, large language models (LLMs) have revolutionized the field of natural language processing, enabling unprecedented zero-shot and few-shot learning capabilities. However, their deployment in real-world applications has been hindered by their immense computational demands. Serving a single 175-billion-parameter LLM requires about 350GB of GPU memory and specialized infrastructure. With today’s state-of-the-art models exceeding 500 billion parameters, these requirements put LLMs out of reach for many research teams, particularly those whose applications need low-latency responses.
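The 350GB figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes each parameter is stored as a 16-bit float (2 bytes), which the article does not state, and counts only the weights, not activations or any other runtime overhead. The function name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the model weights alone, in GB (10**9 bytes).

    bytes_per_param=2 is an assumption (16-bit floats); it is not taken
    from the article.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 175 billion parameters at 2 bytes each
    print(f"{weight_memory_gb(175e9):.0f} GB")
    # 540 billion parameters under the same assumption
    print(f"{weight_memory_gb(540e9):.0f} GB")
```

Under the same assumption, a 540B-parameter model would need roughly three times as much memory for its weights alone.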

To address this deployment challenge, researchers have turned to smaller specialized models, trained through either fine-tuning or distillation. Fine-tuning, while effective, relies on costly and time-consuming human-generated labels. Distillation, on the other hand, demands copious amounts of unlabeled data, which can be difficult to obtain.

In a study presented at ACL 2023, a research team from Google and the University of Washington introduced “Distilling Step-by-Step,” a mechanism designed to mitigate the trade-off between model size and the cost of data collection. The approach hinges on extracting informative natural language rationales, or intermediate reasoning steps, from LLMs. These rationales serve as additional, richer supervision for training smaller task-specific models alongside the standard task labels.

The researchers outline a two-stage process for implementing Distilling Step-by-Step. First, they use few-shot chain-of-thought (CoT) prompting to extract rationales from an LLM: a handful of worked demonstrations in the prompt lead the model to generate a rationale along with its answer for each unseen input. Second, these rationales are integrated into the training of small models using a multi-task learning framework, in which a task prefix on each input tells the model whether to predict the label or generate the rationale.
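The two stages can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout, the "So the answer is" marker, the prefix strings `[label]` and `[rationale]`, the weighting factor `lam`, and all function names are hypothetical choices made for this example. The LLM call itself is left out; `completion` stands in for whatever text the LLM returns.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # text fed to the small model
    target: str  # text the small model should produce


ANSWER_MARKER = "So the answer is"  # assumed convention for this sketch


def build_cot_prompt(demos, new_input):
    """Stage 1: few-shot CoT prompt. Each demo is (input, rationale, label)."""
    parts = [
        f"Q: {q}\nA: {rationale} {ANSWER_MARKER} {label}."
        for q, rationale, label in demos
    ]
    parts.append(f"Q: {new_input}\nA:")
    return "\n\n".join(parts)


def split_rationale_and_label(completion):
    """Separate the LLM's reasoning from its final answer."""
    if ANSWER_MARKER not in completion:
        raise ValueError("completion does not contain the answer marker")
    rationale, _, label = completion.rpartition(ANSWER_MARKER)
    return rationale.strip(), label.strip().rstrip(".")


def make_multitask_examples(x, rationale, label):
    """Stage 2: one input becomes two training examples, told apart by prefix."""
    return [
        Example(source=f"[label] {x}", target=label),
        Example(source=f"[rationale] {x}", target=rationale),
    ]


def combined_loss(label_loss, rationale_loss, lam=1.0):
    """Multi-task objective: label loss plus a weighted rationale loss."""
    return label_loss + lam * rationale_loss


if __name__ == "__main__":
    demos = [("Is a whale a fish?", "Whales are mammals that breathe air.", "no")]
    question = "Is a bat a bird?"
    print(build_cot_prompt(demos, question))

    # Stand-in for the text an LLM might return for the prompt above.
    completion = "Bats are mammals, not birds. So the answer is no."
    rationale, label = split_rationale_and_label(completion)
    for ex in make_multitask_examples(question, rationale, label):
        print(ex)
```

In an actual training loop, the two losses would come from the small model's outputs on the `[label]` and `[rationale]` examples respectively. At inference time only the `[label]` prefix is needed, so under this design generating rationales adds no cost at deployment.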

In a series of experiments, the 540B-parameter PaLM served as the LLM, and T5 models were used as the small task-specific models. Distilling Step-by-Step delivered notable performance gains with significantly less data. For instance, on the e-SNLI dataset, the method outperformed standard fine-tuning on the full dataset while using just 12.5% of it. Similar reductions in dataset size were observed across other NLP tasks, including ANLI, CQA, and SVAMP.

Furthermore, Distilling Step-by-Step achieved superior performance using considerably smaller model sizes compared to few-shot CoT-prompted LLMs. For instance, on the e-SNLI dataset, a 220M T5 model surpassed the performance of the 540B PaLM. On ANLI, a 770M T5 model outperformed the 540B PaLM despite being over 700 times smaller, demonstrating the potential for efficiency gains.
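The size comparison is simple arithmetic on the parameter counts quoted above. The short sketch below only divides those counts; the helper name `size_ratio` is made up for this illustration.

```python
def size_ratio(large_params: float, small_params: float) -> float:
    """How many times more parameters the large model has."""
    return large_params / small_params


PALM_PARAMS = 540e9  # 540B PaLM, as quoted in the article
T5_MODELS = {"220M T5": 220e6, "770M T5": 770e6}

if __name__ == "__main__":
    for name, params in T5_MODELS.items():
        ratio = size_ratio(PALM_PARAMS, params)
        print(f"PaLM is about {ratio:.0f}x larger than the {name}")
```

The 770M case works out to a little over 700, which matches the "over 700 times smaller" figure; the 220M model used on e-SNLI is smaller still.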

Notably, Distilling Step-by-Step showcased its ability to outperform few-shot LLMs using significantly smaller models and less data. For instance, on ANLI, a 770M T5 model surpassed the performance of a 540B PaLM using only 80% of the full dataset, a feat unattainable through standard fine-tuning.

In conclusion, Distilling Step-by-Step presents a groundbreaking paradigm for training small, task-specific models. By extracting rationales from LLMs, this approach not only reduces the data required for model training but also enables the use of significantly smaller models. This innovative technique stands to revolutionize the field of natural language processing, making advanced language models more accessible and practical for a broader range of applications.


Action Items:

1. Research and analyze the “Distilling Step-by-Step” technology developed by researchers from the University of Washington and Google.
2. Identify potential applications and benefits of the “Distilling Step-by-Step” approach in real-world scenarios.
3. Explore the feasibility of implementing the “Distilling Step-by-Step” technique within our organization.
4. Investigate the requirements and resources needed for training small task-specific models using the distillation approach.
5. Compare the performance and efficiency of the “Distilling Step-by-Step” technique with other existing methods in the field of natural language processing.
6. Share the research findings and insights with relevant stakeholders within the organization.
7. Consider the potential collaboration with the research team at the University of Washington and Google to further explore the application of the “Distilling Step-by-Step” technique.

Please assign owners to these action items based on the relevant individuals or teams within our organization.
