The text discusses using the HuggingFace Text Generation Inference (TGI) toolkit to run large language models in a free Google Colab instance. It details the challenges of system requirements and installation, along with examples of running TGI as a web service and using different clients for interaction. Overall, the article demonstrates the feasibility and benefits of testing large language models on a budget GPU or in a free Colab instance.
Experimenting with Large Language Models for free (Part 3)
Text Generation Inference
Text Generation Inference (TGI) is a production-ready toolkit for deploying and serving large language models (LLMs). Running an LLM as a service allows us to use it with different clients, from Python notebooks to mobile apps.
Install
Before running Text Generation Inference, we need to install it, and the first step is to install Rust:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
!curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
!/root/.cargo/bin/rustup component add rust-src
!cp /root/.cargo/bin/cargo /usr/local/sbin
The commands are mostly self-explanatory; the tricky parts are specifying the "-y" flag so the installer runs non-interactively, and copying the "cargo" binary into /usr/local/sbin so the Colab instance can find it.
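Before starting the long build, it is worth a quick sanity check that the toolchain is actually reachable. A minimal sketch, assuming the copy step above succeeded:

```shell
# Verify that cargo is reachable before building TGI. Falls back to
# the /usr/local/sbin copy made above if cargo is not on PATH yet.
CARGO_BIN=$(command -v cargo || echo /usr/local/sbin/cargo)
echo "using cargo at: $CARGO_BIN"
```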
After that, we can download and compile TGI itself. I will be using version 1.3.4, which is the latest at the time of writing:
!pip install accelerate autoawq vllm==0.2.2 -U
!wget https://github.com/huggingface/text-generation-inference/archive/refs/tags/v1.3.4.tar.gz
!tar -xf v1.3.4.tar.gz
!source ~/.cargo/env && cd text-generation-inference-1.3.4 && BUILD_EXTENSIONS=False make install
The build is not fast and takes about 10 minutes. At least the Google Colab instance is free, and it does not charge us for every minute of access.
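Once the build finishes, the server can be started with the text-generation-launcher command. A sketch of the launch, where the model id is an assumption: any AWQ-quantized model that fits the Colab GPU will do, and port 5000 matches the client examples that follow:

```shell
# Build the launch command for the TGI server. The model id below is
# an assumption; substitute any AWQ-quantized model that fits the GPU.
MODEL_ID="TheBloke/Mistral-7B-OpenOrca-AWQ"
PORT=5000
LAUNCH_CMD="text-generation-launcher --model-id $MODEL_ID --quantize awq --port $PORT"
echo "$LAUNCH_CMD"
# In Colab, run the launcher in the background, then poll the health
# endpoint until the model is loaded:
#   text-generation-launcher --model-id $MODEL_ID --quantize awq --port $PORT &
#   until curl -s http://127.0.0.1:$PORT/health > /dev/null; do sleep 5; done
```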
TGI Test
When the Text Generation Inference server is running, we can try to use it. Because TGI runs as a web service, we can connect to it using different clients. For example, let’s use the Python requests library:
import requests
data = {
    'inputs': 'What is the distance to the Moon?',
    'parameters': {'max_new_tokens': 512}
}
response = requests.post('http://127.0.0.1:5000/generate', json=data)
print(response.json())
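The /generate endpoint returns JSON of the form {"generated_text": "..."}. A small helper makes repeated calls cleaner; this is a sketch, and the URL default assumes the local TGI server started earlier:

```python
import requests

def build_payload(prompt, max_new_tokens=512):
    """Build the request body expected by TGI's /generate endpoint."""
    return {'inputs': prompt, 'parameters': {'max_new_tokens': max_new_tokens}}

def generate(prompt, url="http://127.0.0.1:5000", max_new_tokens=512):
    """Send a prompt to a running TGI server and return only the text."""
    response = requests.post(f"{url}/generate",
                             json=build_payload(prompt, max_new_tokens))
    response.raise_for_status()  # surface HTTP errors instead of bad JSON
    return response.json()["generated_text"]
```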
We can also use the InferenceClient from the huggingface_hub library:
from huggingface_hub import InferenceClient
client = InferenceClient(model="http://127.0.0.1:5000")
client.text_generation(prompt="What is the distance to the Moon?",
                       max_new_tokens=512)
With this client, we can use streaming, so new tokens will appear one by one:
from huggingface_hub import InferenceClient
client = InferenceClient(model="http://127.0.0.1:5000")
for token in client.text_generation(prompt="What is the distance to the Moon?",
                                    max_new_tokens=512,
                                    stream=True):
    print(token, end="")
This will provide the answer in the “ChatGPT” style.
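Printing tokens without a trailing newline reproduces the typing effect, but often we also want the full answer afterwards. A sketch of a loop that does both, using a plain list of tokens as a stand-in for the iterator returned by client.text_generation(..., stream=True):

```python
def stream_and_collect(tokens):
    """Print tokens as they arrive and return the assembled answer."""
    parts = []
    for token in tokens:
        print(token, end="", flush=True)  # no newline: "ChatGPT" effect
        parts.append(token)
    print()  # final newline once the stream ends
    return "".join(parts)

# Stand-in token stream; with TGI this would be the stream=True iterator.
answer = stream_and_collect(["The ", "Moon ", "is ", "about ", "384,400 km away."])
```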
Conclusion
In this article, we were able to run the Text Generation Inference toolkit from 🤗 in a free Google Colab instance. This toolkit is designed to deploy and serve large language models. It was originally made for high-end hardware, and running it on a budget GPU or in a free Google Colab instance can be tricky. But as we can see, it is doable, and it is great for testing and self-education.
Those who are interested in using language models and natural language processing are also welcome to read other articles:
- LLMs for Everyone: Running LangChain and a MistralAI 7B Model in Google Colab
- Natural Language Processing For Absolute Beginners
- 16, 8, and 4-bit Floating Point Formats — How Does it Work?
- Python Data Analysis: What Do We Know About Pop Songs?
If you enjoyed this story, feel free to subscribe to Medium, and you will get notifications when my new articles are published, as well as full access to thousands of stories from other authors. If you want to get the full source code for this and my next posts, feel free to visit my Patreon page.
Spotlight on a Practical AI Solution
Consider the AI Sales Bot from itinai.com/aisalesbot, designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.
Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.
Connect with Us
For AI KPI management advice, connect with us at hello@itinai.com. And for continuous insights into leveraging AI, stay tuned on our Telegram or Twitter.