LLMs for Everyone: Running the HuggingFace Text Generation Inference in Google Colab

The text discusses using the HuggingFace Text Generation Inference (TGI) toolkit to run large language models in a free Google Colab instance. It details the challenges of system requirements and installation, along with examples of running TGI as a web service and using different clients for interaction. Overall, the article demonstrates the feasibility and benefits of testing large language models on a budget GPU or in a free Colab instance.

Experimenting with Large Language Models for free (Part 3)

Image by Markus Spiske, Unsplash

Text Generation Inference

Text Generation Inference (TGI) is a production-ready toolkit for deploying and serving large language models (LLMs). Running an LLM as a service allows us to use it with different clients, from Python notebooks to mobile apps.

Install

Before running Text Generation Inference, we need to install it, and the first step is to install Rust:

    # Force UTF-8 as the preferred encoding so shell commands in Colab work correctly
    import locale
    locale.getpreferredencoding = lambda: "UTF-8"

    # Install Rust non-interactively and make cargo visible to the Colab instance
    !curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    !/root/.cargo/bin/rustup component add rust-src
    !cp /root/.cargo/bin/cargo /usr/local/sbin

The commands themselves are self-explanatory; the tricky parts are specifying the “-y” flag so the installation starts without a confirmation prompt, and copying the “cargo” binary into /usr/local/sbin so the Colab instance can find it.
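
As a quick sanity check (not shown in the original article), we can verify that Colab now finds the copied binary:

    # Should print the installed toolchain version, e.g. "cargo 1.x.y"
    !cargo --version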

After that, we can download and compile TGI itself. I will be using version 1.3.4, which is the latest at the time of writing this article:

    # Python dependencies: AWQ quantization support and the vLLM backend
    !pip install accelerate autoawq vllm==0.2.2 -U

    # Download the TGI 1.3.4 source code and build it
    !wget https://github.com/huggingface/text-generation-inference/archive/refs/tags/v1.3.4.tar.gz
    !tar -xf v1.3.4.tar.gz
    !source ~/.cargo/env && cd text-generation-inference-1.3.4 && BUILD_EXTENSIONS=False make install

The process is not fast and takes about 10 minutes, but at least the Google Colab instance is free, and we are not charged for every minute of access.

TGI Test
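
The launch command is not reproduced in this text, but starting the server might look like the sketch below. The model id here is only a hypothetical example (any model supported by TGI 1.3.4 can be used), and the port matches the 5000 that the clients below connect to:

    # A sketch, assuming an AWQ-quantized model; "make install" places the
    # launcher in ~/.cargo/bin, and we run the server in the background
    !nohup /root/.cargo/bin/text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-AWQ --quantize awq --port 5000 > tgi.log 2>&1 &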

When the Text Generation Inference server is running, we can try it out. Because TGI runs as a web service, we can connect to it using different clients. For example, let’s use the Python requests library:

    import requests

    # The /generate endpoint returns the whole completion in a single response
    data = {
        'inputs': 'What is the distance to the Moon?',
        'parameters': {'max_new_tokens': 512}
    }

    response = requests.post('http://127.0.0.1:5000/generate', json=data)
    print(response.json())  # {'generated_text': '...'}

We can also use the InferenceClient class, provided by HuggingFace in the huggingface_hub library:

    from huggingface_hub import InferenceClient

    # Point the client at the local TGI server instead of the HuggingFace Hub
    client = InferenceClient(model="http://127.0.0.1:5000")
    client.text_generation(prompt="What is the distance to the Moon?",
                           max_new_tokens=512)

With this client, we can also use streaming, so new tokens appear one by one:

    from huggingface_hub import InferenceClient

    client = InferenceClient(model="http://127.0.0.1:5000")
    # stream=True yields each token as soon as it is generated
    for token in client.text_generation(prompt="What is the distance to the Moon?",
                                        max_new_tokens=512,
                                        stream=True):
        print(token, end="")

This displays the answer in the “ChatGPT” style, with the text appearing token by token.

Conclusion

In this article, we ran the Text Generation Inference toolkit from 🤗 in a free Google Colab instance. This toolkit is designed to deploy and serve large language models. It was originally made for high-end hardware, and running it on a budget GPU or in a free Google Colab instance can be tricky. But as we can see, it is doable, and it is great for testing and self-education.

Those who are interested in using language models and natural language processing are also welcome to read other articles:

  • LLMs for Everyone: Running LangChain and a MistralAI 7B Model in Google Colab
  • Natural Language Processing For Absolute Beginners
  • 16, 8, and 4-bit Floating Point Formats — How Does it Work?
  • Python Data Analysis: What Do We Know About Pop Songs?

If you enjoyed this story, feel free to subscribe to Medium, and you will get notified when my new articles are published, as well as full access to thousands of stories from other authors. If you want to get the full source code for this and my next posts, feel free to visit my Patreon page.

