Merge Large Language Models with mergekit

This tutorial covers four methods for merging large language models with mergekit: SLERP, TIES, DARE, and Passthrough. It describes how each method works, shows how to create a new combined model without a GPU, and explains how to upload the merged model to the Hugging Face Hub for further evaluation and integration.


Create your own models easily, no GPU required!

Model merging is a technique that combines two or more LLMs into a single model, without the need for a GPU. This method has proven to be effective and has produced state-of-the-art models on the Open LLM Leaderboard.

Implementing Model Merging

In this tutorial, we will implement model merging using the mergekit library by Charles Goddard. We will review four merge methods and provide examples of configurations. Then, we will use mergekit to create our own model, Marcoro14-7B-slerp, which became the best-performing 7B model on the Open LLM Leaderboard.

Merge Algorithms

We will focus on four methods currently implemented in mergekit: SLERP, TIES, DARE, and Passthrough.

SLERP

Spherical Linear Interpolation (SLERP) is a method used to smoothly interpolate between two vectors. It maintains a constant rate of change and preserves the geometric properties of the spherical space in which the vectors reside.
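To make the interpolation concrete, here is a minimal sketch of SLERP applied to two weight vectors. It works on flat Python lists for readability, whereas a real merge operates on tensors, layer by layer. The function name `slerp` and the fallback threshold are illustrative assumptions, not mergekit's actual code.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 (t=0) and v1 (t=1)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1 + eps)
    dot = max(-1.0, min(1.0, dot))

    if abs(dot) > 1.0 - 1e-6:
        # Nearly colinear vectors: the spherical formula divides by
        # sin(theta) ~ 0, so fall back to plain linear interpolation.
        w0, w1 = 1.0 - t, t
    else:
        theta = math.acos(dot)
        w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        w1 = math.sin(t * theta) / math.sin(theta)

    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, which would give a midpoint of length about 0.71 here, the interpolated vector keeps unit length. This is the geometric property the paragraph above refers to.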

TIES

TIES-Merging is designed to efficiently merge multiple task-specific models into a single multitask model. It addresses redundancy in model parameters and disagreement between parameter signs.
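The two problems named above map onto three steps: trim small parameter changes (redundancy), elect one sign per parameter (sign disagreement), and average only the changes that agree with the elected sign. The sketch below illustrates those steps on flat lists of floats. The function name `ties_merge` and the parameters `density` and `scale` are illustrative assumptions; this is a simplified reading of the method, not mergekit's implementation.

```python
def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Merge several fine-tuned weight lists that share one base model."""
    # Task vectors: what each fine-tune changed relative to the base.
    task_vectors = [
        [f - b for f, b in zip(model, base)] for model in finetuned_models
    ]

    # 1. Trim: keep only the largest-magnitude changes in each task vector.
    #    (Ties at the threshold may keep slightly more than `density`.)
    trimmed = []
    for tv in task_vectors:
        k = max(1, min(len(tv), int(round(density * len(tv)))))
        threshold = sorted((abs(x) for x in tv), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in tv])

    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in trimmed]
        # 2. Elect sign: the direction with the larger total magnitude wins.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # 3. Disjoint merge: average only the changes that agree with it.
        agreeing = [x for x in column if x * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * delta)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.1, 0.5, 0.0]
model_b = [0.8, 0.6, -2.0, 0.05]
merged = ties_merge(base, [model_a, model_b], density=0.5)
```

In this toy example the third parameter has conflicting updates (+0.5 and -2.0). The negative direction carries more weight, so only -2.0 survives instead of the two values partly cancelling each other.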

DARE

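DARE uses an approach similar to TIES, with two main differences. Pruning: it randomly resets fine-tuned weights to their original base-model values. Rescaling: it rescales the remaining weights so that the expected model output stays approximately unchanged.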
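As a rough illustration of DARE's pruning and rescaling, the sketch below drops each weight change with probability `drop_rate` and scales the survivors by `1 / (1 - drop_rate)`. The function name `dare_prune`, its parameters, and the use of flat lists are assumptions made for this example.

```python
import random


def dare_prune(base, finetuned, drop_rate=0.9, seed=0):
    """Return a randomly pruned and rescaled delta (finetuned - base)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    # Survivors are scaled up so the expected delta is unchanged.
    keep_scale = 1.0 / (1.0 - drop_rate)

    pruned = []
    for b, f in zip(base, finetuned):
        delta = f - b
        if rng.random() < drop_rate:
            pruned.append(0.0)  # reset this weight to its base value
        else:
            pruned.append(delta * keep_scale)
    return pruned


base = [0.0, 1.0, 2.0, 3.0]
finetuned = [0.5, 1.5, 1.0, 3.25]
sparse_delta = dare_prune(base, finetuned, drop_rate=0.5)
# Adding the sparse delta back to the base gives the pruned model.
pruned_model = [b + d for b, d in zip(base, sparse_delta)]
```

With several fine-tuned models, the pruned deltas still have to be combined, for example with a TIES-style sign election or a weighted sum.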

Passthrough

The passthrough method differs significantly from the previous ones. By concatenating layers from different LLMs, it can produce models with an exotic number of parameters.
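The following toy sketch shows why concatenating layers yields unusual model sizes. It only tracks which layer comes from which model and does not touch real weights. The model names and layer ranges are hypothetical, and the ranges are treated as end-exclusive.

```python
def passthrough_stack(slices):
    """Concatenate layer ranges given as (model_name, start, end) tuples."""
    stacked = []
    for model_name, start, end in slices:
        # `end` is exclusive: (name, 0, 32) selects layers 0 through 31.
        stacked.extend((model_name, index) for index in range(start, end))
    return stacked


# Two hypothetical 32-layer models: all of model-a, then the last
# eight layers of model-b stacked on top.
layers = passthrough_stack([
    ("model-a", 0, 32),
    ("model-b", 24, 32),
])
print(len(layers))  # 40
```

The merged model has 40 layers instead of 32, so its parameter count falls between the sizes of standard model families.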

Merge Your Own Models

We will use mergekit to load a merge configuration, run it, and upload the resulting model to the Hugging Face Hub.
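The workflow could be sketched as follows. The model names are placeholders, and the layer range of 32 assumes a Mistral-7B-style architecture. The `mergekit-yaml` command and the `huggingface_hub` calls reflect our understanding of those tools' public interfaces, so check them against the current documentation before relying on them.

```python
import subprocess
from pathlib import Path

# Hypothetical repository names; replace with the models you want to merge.
MODEL_A = "your-org/model-a-7b"
MODEL_B = "your-org/model-b-7b"

# SLERP configuration: interpolate all 32 layers halfway (t = 0.5).
SLERP_CONFIG = f"""\
slices:
  - sources:
      - model: {MODEL_A}
        layer_range: [0, 32]
      - model: {MODEL_B}
        layer_range: [0, 32]
merge_method: slerp
base_model: {MODEL_A}
parameters:
  t: 0.5
dtype: bfloat16
"""


def build_merge_command(config_path, output_dir):
    """Build the mergekit command line without running it."""
    return ["mergekit-yaml", str(config_path), str(output_dir),
            "--copy-tokenizer"]


def run_merge(config_path="config.yaml", output_dir="merge"):
    """Write the config to disk and run the merge (needs mergekit)."""
    Path(config_path).write_text(SLERP_CONFIG, encoding="utf-8")
    subprocess.run(build_merge_command(config_path, output_dir), check=True)


def upload_merge(output_dir, repo_id):
    """Upload the merged model folder to the Hugging Face Hub."""
    # Imported here so the rest of the sketch runs without the package.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(folder_path=output_dir, repo_id=repo_id,
                      repo_type="model")
```

Calling `run_merge()` and then `upload_merge("merge", "your-username/your-merged-model")` would perform the merge and the upload. The upload step requires being logged in to the Hub with a token that has write access.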

Conclusion

In this article, we introduced the concept of merging LLMs with four different methods. We detailed how SLERP, TIES, DARE, and passthrough work and provided examples of configurations. Finally, we ran SLERP with mergekit to create Marcoro14-7B-slerp and uploaded it to the Hugging Face Hub. We obtained excellent performance on two benchmark suites: Open LLM Leaderboard (best-performing 7B model) and NousResearch. If you want to create your own merges, we recommend using the automated notebook 🥱 LazyMergekit.

If you want to learn more about machine learning and AI, follow us on Medium and Twitter @mlabonne.
