The text outlines the LangChain framework, demonstrating the ability to query SQL databases using human language. It describes how LangChain allows the integration of Large Language Models (LLMs) with other tools, enabling the creation of interactive applications. The sample application, a simple Q&A agent, exemplifies LangChain’s potential for complex business analytics with LLMs.
“`html
GENERATIVE AI
A step-by-step tutorial on query SQL databases with human language
Many businesses have a lot of proprietary data stored in their databases. If there’s a virtual agent that understands human language and can query these databases, it opens up big opportunities for these businesses. Think of customer service chatbots, they’re a common example. These agents can take customer requests, ask the database for information, and give the customer what they need.
LLM-backed Applications
LangChain is an open-source framework for building interactive applications using Large Language Models (LLMs). It’s a tool that helps LLMs connect with other sources of information and lets them talk to the world around them. One important concept in such frameworks is the Chain. Let’s take a look at this concept.
What are Chains?
Chains are advanced tools in this framework that combine LLMs with other tools to perform more complicated tasks. Specifically, chains are interfaces that use a sequence of LLMs along with other tools, such as SQL databases, API calls, bash operators, or math calculators, to complete a complex job.
Getting Our Hands Dirty
Now it’s time to get our hands dirty and start coding a simple LLM-backed application. For this application, we are going to make a simple Q&A agent that takes our question and queries a SQL database to find the answer for us.
Set Up A PostgreSQL Sample Database
We use DVD Rental Sample Database from postgresqltutorial.com (License Information). To install the database on your local system, you need to have PostgreSQL installed. Simply type psql in your terminal and see if it runs.
Then you should follow the instructions here. Here I walk through the database installation with you quickly.
In your terminal launch PostgreSQL via:
psql -U <USERNAME>
After creating the database, you can check it by typing \list
command, You should see your dvdrental database in the returned list. Then simply exit Postgres (via \q
).
Now we need to download the tables and data. You can download everything as a tar file here: Download DVD Rental Sample Database
Go to the folder that you downloaded the tar file and using pg_restore, we can load the tables into our Postgres Database.
Before going further, let’s check if the tables are loaded. Launch Postgres via psql -U <USERNAME>
and enter your password.
Then type the following commands to list tables inside the dvdrental database.
postgres=# \c dvdrental
dvdrental=# \dt
You must see a list of tables like this:
Set Up .env File
After building the sample database, we need to create a .env file. We use this file to store our secrets and keys as well as environmental variables.
Creating a LangChain Application
After saving your .env file in your project folder, we can start the actual code. First, we import the required libraries into our Python code.
For this tutorial, I recommend using a Jupyter Notebook to test it step by step.
We need to import the required libraries at the beginning.
Using load_dotenv(), we load the environment variables that we defined in the .env file. Now, safely we can access them in the code.
In addition to a model, we need a SQL Database connection. That connection enables our chain to query against the database and get the results back.
In case you are wondering about sample_rows_in_table_info parameter that we used in our SQL database connection, Rajkumar et. al showed in their paper (https://arxiv.org/abs/2204.00498) that including a few sample rows from the table increases the performance of the model in creating mode affecting querying the data. In LangChain, simply you can set sample_rows_in_table_info and determine the number of sample rows from each table that will be appended to each table description.
To test the SQL database connection, I printed the list of available tables using db.get_usable_table_names(). It should return the following list of tables for you too.
The next step is the most important one. With our LLM model and a SQL database connection, now we should be able to instantiate our chain. Before showing the instantiation, let’s get a little bit more sense about a chain.
Finally, we use the run() method to pass our input/question to the chain and get the final response back.
Now it is time to test our code. We start with a simple query that requires querying a single table.
The second question is a little bit more complicated and requires joining three tables.
Summary
In this article, we introduced a powerful open-source tool called LangChain that enables us to build LLM-based applications.
Then we used SQLDatabaseChain to build an application that queries a SQL database based on user questions and returns the results.
This simple Q&A application can be expanded into a more complex business analytics assistant that anyone inside a business can use daily to get the latest insights from the proprietary data.
Spotlight on a Practical AI Solution
Consider the AI Sales Bot from itinai.com/aisalesbot designed to automate customer engagement 24/7 and manage interactions across all customer journey stages.
Discover how AI can redefine your sales processes and customer engagement. Explore solutions at itinai.com.
“`