Researchers from the University of Washington and Google have developed a technique called “Distilling Step-by-Step” for training small machine learning models with less data. The approach extracts informative natural language rationales from large language models and uses them as additional supervision during training. The method showed significant performance gains with reduced data requirements, making the capabilities of advanced language models more accessible for a range of applications.
Review: Distilling Step-by-Step Technology for Training Small Machine Learning Models
In recent years, large language models (LLMs) have revolutionized the field of natural language processing, enabling unprecedented zero-shot and few-shot learning capabilities. However, their deployment in real-world applications has been hindered by their immense computational demands. Serving a single 175-billion-parameter LLM requires a staggering 350GB of GPU memory and specialized infrastructure. With today’s state-of-the-art models exceeding 500 billion parameters, these requirements put LLMs out of reach for many research teams, particularly those building applications with low-latency requirements.
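The 350GB figure follows from simple arithmetic: 175 billion parameters stored at 16-bit (2-byte) precision, before accounting for activations or serving overhead. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the memory figure cited above.
# Assumes weights are held in half precision (fp16/bf16, 2 bytes each);
# activations, KV caches, and serving overhead would add to this.
params = 175e9          # 175 billion parameters
bytes_per_param = 2     # 16-bit precision
gpu_memory_gb = params * bytes_per_param / 1e9
print(gpu_memory_gb)    # 350.0
```

This is why such models typically require sharding across multiple high-memory accelerators rather than fitting on any single device.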
To address this deployment challenge, researchers have turned to smaller specialized models, trained through either fine-tuning or distillation. Fine-tuning, while effective, relies on costly and time-consuming human-generated labels. Distillation, on the other hand, demands copious amounts of unlabeled data, which can be difficult to obtain.
In a study by a research team from Google and the University of Washington presented at ACL 2023, the authors introduced “Distilling Step-by-Step,” a novel mechanism designed to mitigate the trade-off between model size and the cost of data collection. The approach hinges on extracting informative natural language rationales, or intermediate reasoning steps, from LLMs. These rationales then serve as additional, richer supervision when training smaller task-specific models alongside standard task labels.
The researchers outline a two-stage process for implementing Distilling Step-by-Step. First, they employ chain-of-thought (CoT) prompting to extract rationales from an LLM, enabling the model to generate rationales for unseen inputs. Second, these rationales are integrated into the training of small models via a multi-task learning framework, with task prefixes guiding the model to distinguish between label prediction and rationale generation.
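The multi-task setup above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the prefix strings, field names, and the `lam` weight are assumptions chosen for clarity, and the loss combination stands in for whatever weighting the paper actually uses.

```python
# Sketch of the multi-task training setup described above.
# Prefix strings, field names, and the lambda weight are illustrative
# assumptions, not taken from the paper's implementation.

def make_training_examples(x, label, rationale):
    """Expand one annotated input into the two multi-task examples:
    a label-prediction example and a rationale-generation example,
    distinguished by a task prefix on the input text."""
    return [
        {"input": f"[label] {x}", "target": label},
        {"input": f"[rationale] {x}", "target": rationale},
    ]

def combined_loss(label_loss, rationale_loss, lam=1.0):
    """Multi-task objective: the standard label-prediction loss plus
    a weighted rationale-generation loss."""
    return label_loss + lam * rationale_loss

# Example: an NLI-style input with an LLM-generated rationale.
examples = make_training_examples(
    x="premise: A dog runs. hypothesis: An animal moves.",
    label="entailment",
    rationale="A dog is an animal and running is a form of moving.",
)
```

At inference time only the label-prediction prefix is used, so rationale generation adds no serving cost; the rationales act purely as extra training signal.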
In a series of experiments, the researchers used a 540B-parameter LLM to generate rationales and T5 models as the smaller task-specific models. Distilling Step-by-Step exhibited remarkable performance gains with significantly reduced data requirements. For instance, on the e-SNLI dataset, the method outperformed standard fine-tuning with just 12.5% of the full dataset. Similar reductions in dataset size were observed across various NLP tasks, including ANLI, CQA, and SVAMP.
Furthermore, Distilling Step-by-Step achieved superior performance using considerably smaller model sizes compared to few-shot CoT-prompted LLMs. For instance, on the e-SNLI dataset, a 220M T5 model surpassed the performance of a 540B PaLM. On ANLI, a 770M T5 model, over 700 times smaller, outperformed the 540B PaLM, demonstrating the immense potential for efficiency gains.
Notably, these gains compound: Distilling Step-by-Step can beat few-shot LLMs with both a smaller model and less data. On ANLI, for instance, a 770M T5 model surpassed the 540B PaLM while using only 80% of the full dataset, a result unattainable through standard fine-tuning.
In conclusion, Distilling Step-by-Step presents a compelling paradigm for training small, task-specific models. By extracting rationales from LLMs, the approach both reduces the data required for training and enables the use of significantly smaller models, making the capabilities of advanced language models practical for a broader range of applications.