
LLMs for Everyone: Running the HuggingFace Text Generation Inference in Google Colab

The text discusses using the HuggingFace Text Generation Inference (TGI) toolkit to run large language models in a free Google Colab instance. It details the challenges of system requirements and installation, along with examples of running TGI as a web service and using different clients for interaction. Overall, the article demonstrates the feasibility and benefits of testing large language models on a budget GPU or in a free Colab instance.



Experimenting with Large Language Models for free (Part 3)

Image by Markus Spiske, Unsplash

Text Generation Inference

Text Generation Inference (TGI) is a production-ready toolkit for deploying and serving large language models (LLMs). Running an LLM as a service allows us to use it from different clients, from Python notebooks to mobile apps.

Install

Before running Text Generation Inference, we need to install it; the first step is to install Rust:

      # Force UTF-8 so the Rust installer output does not break in Colab
      import locale
      locale.getpreferredencoding = lambda: "UTF-8"

      # Install Rust non-interactively ("-y") and make cargo visible to the shell
      !curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
      !/root/.cargo/bin/rustup component add rust-src
      !cp /root/.cargo/bin/cargo /usr/local/sbin

The command itself is self-explanatory; the tricky parts are specifying the “-y” flag so the installation runs without prompting, and copying the “cargo” binary into /usr/local/sbin so the Colab shell can find it.
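As a quick sanity check (an optional addition, not part of the original steps), we can confirm that the toolchain is reachable from the copied location:

      # Optional: print the cargo version to confirm the copy worked
      !/usr/local/sbin/cargo --version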

After that, we can download and compile TGI itself. I will be using version 1.3.4, the latest at the time of writing:

      # Python dependencies for quantized models and serving
      !pip install accelerate autoawq vllm==0.2.2 -U
      # Download the TGI 1.3.4 sources and build them (custom kernels disabled)
      !wget https://github.com/huggingface/text-generation-inference/archive/refs/tags/v1.3.4.tar.gz
      !tar -xf v1.3.4.tar.gz
      !source ~/.cargo/env && cd text-generation-inference-1.3.4 && BUILD_EXTENSIONS=False make install

The build is not fast; it takes about 10 minutes. At least the Google Colab instance is free, so we are not charged for every minute of access.
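Run

Once the build finishes, the server can be started. As a minimal sketch (the model id, quantization mode, and port below are assumptions chosen for illustration, not the only valid values), an AWQ-quantized 7B model matches the autoawq dependency installed above and fits a free Colab GPU:

      # Launch sketch; the model id and flag values are example assumptions.
      # The trailing "&" runs the server in the background so the cell does
      # not block; the log file lets us watch for the "ready" message.
      !source ~/.cargo/env && text-generation-launcher --model-id TheBloke/Llama-2-7B-Chat-AWQ --quantize awq --port 5000 > server.log 2>&1 &

Startup takes a while because the model weights are downloaded first; check server.log to see when the server reports that it is ready on port 5000.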

TGI Test

Once the Text Generation Inference server is running, we can try it out. Because TGI runs as a web service, we can connect to it with different clients. For example, let’s use the Python requests library:

      import requests

      # "inputs" is the prompt; "parameters" controls generation
      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {'max_new_tokens': 512}
      }

      # The response body is JSON with a "generated_text" field
      response = requests.post('http://127.0.0.1:5000/generate', json=data)
      print(response.json())

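The “parameters” dictionary accepts more fields than “max_new_tokens”. As a sketch (the exact set of supported fields depends on the TGI version), sampling can be enabled like this:

      import requests

      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {
              'max_new_tokens': 512,
              'do_sample': True,    # sample instead of greedy decoding
              'temperature': 0.7,   # lower values give more deterministic output
              'top_p': 0.95,        # nucleus sampling threshold
          }
      }

      response = requests.post('http://127.0.0.1:5000/generate', json=data)
      print(response.json()['generated_text'])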
We can also use the InferenceClient class provided by HuggingFace:

      from huggingface_hub import InferenceClient

      # Point the client at the local TGI server
      client = InferenceClient(model="http://127.0.0.1:5000")
      response = client.text_generation(prompt="What is the distance to the Moon?",
                                        max_new_tokens=512)
      print(response)

With this client, we can use streaming, so new tokens will appear one by one:

      from huggingface_hub import InferenceClient

      client = InferenceClient(model="http://127.0.0.1:5000")
      # stream=True yields tokens one by one as they are generated
      for token in client.text_generation(prompt="What is the distance to the Moon?",
                                          max_new_tokens=512,
                                          stream=True):
          print(token, end="")  # no newline, so the answer builds up inline

This prints the answer in the “ChatGPT” style, with tokens appearing one by one as they are generated.
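For clients that cannot use huggingface_hub, TGI also streams over plain HTTP via server-sent events. A rough sketch with requests (the endpoint and event format follow TGI’s REST API, but details may vary by version):

      import json
      import requests

      data = {
          'inputs': 'What is the distance to the Moon?',
          'parameters': {'max_new_tokens': 512}
      }

      # /generate_stream emits one "data: {...}" line per generated token
      with requests.post('http://127.0.0.1:5000/generate_stream',
                         json=data, stream=True) as r:
          for line in r.iter_lines():
              if line.startswith(b'data:'):
                  event = json.loads(line[len(b'data:'):])
                  print(event['token']['text'], end='')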

Conclusion

In this article, we were able to run the Text Generation Inference toolkit from 🤗 in a free Google Colab instance. This toolkit is designed to deploy and serve large language models. It was originally made for high-end hardware, and running it on a budget GPU or in a free Google Colab instance can be tricky, but as we can see, it is doable and well suited for testing and self-education.

Those interested in language models and natural language processing are also welcome to read other articles:

  • LLMs for Everyone: Running LangChain and a MistralAI 7B Model in Google Colab
  • Natural Language Processing For Absolute Beginners
  • 16, 8, and 4-bit Floating Point Formats — How Does it Work?
  • Python Data Analysis: What Do We Know About Pop Songs?

If you enjoyed this story, feel free to subscribe to Medium, and you will get notifications when my new articles are published, as well as full access to thousands of stories from other authors. If you want the full source code for this and my next posts, feel free to visit my Patreon page.

Spotlight on a Practical AI Solution

Consider the AI Sales Bot from itinai.com/aisalesbot, designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.

Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.

Connect with Us

For AI KPI management advice, connect with us at hello@itinai.com. And for continuous insights into leveraging AI, stay tuned on our Telegram or Twitter.

