
LLMs for Everyone: Running the HuggingFace Text Generation Inference in Google Colab

The text discusses using the HuggingFace Text Generation Inference (TGI) toolkit to run large language models in a free Google Colab instance. It details the challenges of system requirements and installation, along with examples of running TGI as a web service and using different clients for interaction. Overall, the article demonstrates the feasibility and benefits of testing large language models on a budget GPU or in a free Colab instance.



Experimenting with Large Language Models for free (Part 3)

Image by Markus Spiske, Unsplash

Text Generation Inference

Text Generation Inference (TGI) is a production-ready toolkit for deploying and serving large language models (LLMs). Running an LLM as a service allows us to use it from different clients, from Python notebooks to mobile apps.

Install

Before running Text Generation Inference, we need to install it; the first step is to install Rust:

      # Force UTF-8 so the Rust installer output does not break in Colab
      import locale
      locale.getpreferredencoding = lambda: "UTF-8"

      # Install Rust non-interactively ("-y") and make cargo visible to the shell
      !curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
      !/root/.cargo/bin/rustup component add rust-src
      !cp /root/.cargo/bin/cargo /usr/local/sbin

The command itself is self-explanatory; the tricky parts are specifying the “-y” flag so the installation runs without prompting, and copying the “cargo” binary into /usr/local/sbin so the Colab shell can find it.
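As a quick sanity check (an optional addition, not part of the original steps), we can confirm that the toolchain is reachable from the copied location:

      # Optional: print the cargo version to confirm the copy worked
      !/usr/local/sbin/cargo --version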

After that, we can download and compile TGI itself. I will be using version 1.3.4, the latest at the time of writing:

      # Python dependencies for quantized models and serving
      !pip install accelerate autoawq vllm==0.2.2 -U
      # Download the TGI 1.3.4 sources and build them (custom kernels disabled)
      !wget https://github.com/huggingface/text-generation-inference/archive/refs/tags/v1.3.4.tar.gz
      !tar -xf v1.3.4.tar.gz
      !source ~/.cargo/env && cd text-generation-inference-1.3.4 && BUILD_EXTENSIONS=False make install

The build is not fast; it takes about 10 minutes. At least the Google Colab instance is free, so we are not charged for every minute of access.
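Run

Once the build finishes, the server can be started. As a minimal sketch (the model id, quantization mode, and port below are assumptions chosen for illustration, not the only valid values), an AWQ-quantized 7B model matches the autoawq dependency installed above and fits a free Colab GPU:

      # Launch sketch; the model id and flag values are example assumptions.
      # The trailing "&" runs the server in the background so the cell does
      # not block; the log file lets us watch for the "ready" message.
      !source ~/.cargo/env && text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-AWQ --quantize awq --port 5000 > server.log 2>&1 &

Startup takes a while because the model weights are downloaded first; check server.log to see when the server reports that it is ready on port 5000.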

TGI Test

Once the Text Generation Inference server is running, we can try it out. Because TGI runs as a web service, we can connect to it with different clients. For example, let’s use the Python requests library:

      import requests

      # "inputs" is the prompt; "parameters" controls generation
      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {'max_new_tokens': 512}
      }

      # The response body is JSON with a "generated_text" field
      response = requests.post('http://127.0.0.1:5000/generate', json=data)
      print(response.json())

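The “parameters” dictionary accepts more fields than “max_new_tokens”. As a sketch (the exact set of supported fields depends on the TGI version), sampling can be enabled like this:

      import requests

      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {
              'max_new_tokens': 512,
              'do_sample': True,    # sample instead of greedy decoding
              'temperature': 0.7,   # lower values give more deterministic output
              'top_p': 0.95,        # nucleus sampling threshold
          }
      }

      response = requests.post('http://127.0.0.1:5000/generate', json=data)
      print(response.json()['generated_text'])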
We can also use the InferenceClient class provided by HuggingFace:

      from huggingface_hub import InferenceClient

      # Point the client at the local TGI server
      client = InferenceClient(model="http://127.0.0.1:5000")
      response = client.text_generation(prompt="What is the distance to the Moon?",
                                        max_new_tokens=512)
      print(response)

With this client, we can use streaming, so new tokens will appear one by one:

      from huggingface_hub import InferenceClient

      client = InferenceClient(model="http://127.0.0.1:5000")
      # stream=True yields tokens one by one as they are generated
      for token in client.text_generation(prompt="What is the distance to the Moon?",
                                          max_new_tokens=512,
                                          stream=True):
          print(token, end="")  # no newline, so the answer builds up inline

This prints the answer in the “ChatGPT” style, with tokens appearing one by one as they are generated.
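For clients that cannot use huggingface_hub, TGI also streams over plain HTTP via server-sent events. A rough sketch with requests (the endpoint and event format follow TGI’s REST API, but details may vary by version):

      import json
      import requests

      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {'max_new_tokens': 512}
      }

      # /generate_stream emits one "data: {...}" line per generated token
      with requests.post('http://127.0.0.1:5000/generate_stream',
                         json=data, stream=True) as r:
          for line in r.iter_lines():
              if line.startswith(b'data:'):
                  event = json.loads(line[len(b'data:'):])
                  print(event['token']['text'], end='')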

Conclusion

In this article, we were able to run the Text Generation Inference toolkit from 🤗 in a free Google Colab instance. This toolkit is designed to deploy and serve large language models. It was originally made for high-end hardware, and running it on a budget GPU or in a free Google Colab instance can be tricky, but as we can see, it is doable and well suited for testing and self-education.

Those interested in language models and natural language processing are also welcome to read other articles:

  • LLMs for Everyone: Running LangChain and a MistralAI 7B Model in Google Colab
  • Natural Language Processing For Absolute Beginners
  • 16, 8, and 4-bit Floating Point Formats — How Does it Work?
  • Python Data Analysis: What Do We Know About Pop Songs?

If you enjoyed this story, feel free to subscribe to Medium, and you will get notifications when my new articles are published, as well as full access to thousands of stories from other authors. If you want the full source code for this and my next posts, feel free to visit my Patreon page.

Spotlight on a Practical AI Solution

Consider the AI Sales Bot from itinai.com/aisalesbot, designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.

Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.

Connect with Us

For AI KPI management advice, connect with us at hello@itinai.com. And for continuous insights into leveraging AI, stay tuned on our Telegram or Twitter.

