In today’s fast-paced world of artificial intelligence, performance is key. When working with Large Language Models (LLMs), developers often find themselves waiting on API responses, and when several calls must run one after another, that waiting adds up. This is where asyncio comes in: many developers use LLMs without realizing that asynchronous programming can significantly speed up their applications.
What is Asyncio?
Python’s asyncio library allows developers to write concurrent code using the async/await syntax, meaning multiple I/O-bound tasks can make progress within a single thread. Essentially, while synchronous code processes tasks one after another, like standing in a single line at the grocery store, asynchronous code interleaves tasks, switching to another whenever the current one is waiting, akin to using multiple self-checkout machines. This is especially beneficial for API calls, which mostly consist of waiting for responses.
Getting Started with Asynchronous Python
Example: Running Tasks With and Without Asyncio
Consider a simple function that prints a greeting, waits for 2 seconds, and then completes. In a synchronous setup, running this function three times results in a total wait time of 6 seconds. However, by using asyncio, all three greetings can be printed almost simultaneously, significantly reducing the total wait time.
import time

def say_hello():
    print("Hello...")
    time.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

def main():
    say_hello()
    say_hello()
    say_hello()

if __name__ == "__main__":
    start = time.time()
    main()
    print(f"Finished in {time.time() - start:.2f} seconds")
In contrast, the asynchronous version starts all three calls almost at the same time; their two-second waits overlap, so the total runtime stays close to 2 seconds instead of 6.
import asyncio
import time

import nest_asyncio
nest_asyncio.apply()  # only needed inside notebooks, which already run an event loop

async def say_hello():
    print("Hello...")
    await asyncio.sleep(2)  # simulate waiting (like an API call)
    print("...World!")

async def main():
    await asyncio.gather(
        say_hello(),
        say_hello(),
        say_hello()
    )

if __name__ == "__main__":
    start = time.time()
    asyncio.run(main())
    print(f"Finished in {time.time() - start:.2f} seconds")
Example: Download Simulation
Imagine needing to download several files. In a synchronous approach, each download would block the next one until it completes. However, with asyncio, your program can handle multiple downloads at once, making better use of time.
import asyncio
import random
import time

async def download_file(file_id: int):
    print(f"Start downloading file {file_id}")
    download_time = random.uniform(1, 3)  # simulate variable download time
    await asyncio.sleep(download_time)    # non-blocking wait
    print(f"Finished downloading file {file_id} in {download_time:.2f} seconds")
    return f"File {file_id} content"

async def main():
    files = [1, 2, 3, 4, 5]
    start_time = time.time()
    results = await asyncio.gather(*(download_file(f) for f in files))
    end_time = time.time()
    print("\nAll downloads completed.")
    print(f"Total time taken: {end_time - start_time:.2f} seconds")
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())
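In practice, unbounded concurrency is rarely what you want: real servers cap connections and APIs enforce rate limits. A common remedy is asyncio.Semaphore, which limits how many coroutines can enter a section of code at once. Below is a minimal sketch that reuses download_file from the example above; limited_download is an illustrative name, not a library function:

async def limited_download(semaphore: asyncio.Semaphore, file_id: int):
    # wait for a free slot before starting the download
    async with semaphore:
        return await download_file(file_id)

async def main():
    semaphore = asyncio.Semaphore(2)  # at most 2 downloads in flight at once
    files = [1, 2, 3, 4, 5]
    results = await asyncio.gather(*(limited_download(semaphore, f) for f in files))
    print("Results:", results)

if __name__ == "__main__":
    asyncio.run(main())

With the cap set to 2, the five downloads proceed in overlapping pairs rather than all at once.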
Using Asyncio in an AI Application with an LLM
Now, let’s see how to apply asyncio in a real-world AI context. Applications built on LLMs like OpenAI’s GPT models often make many API calls. If these calls run sequentially, each request sits idle while the previous one waits on the network. Let’s compare the performance of running multiple prompts with and without asyncio.
!pip install openai

import asyncio
import os
import time
from getpass import getpass
from openai import OpenAI

os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')

client = OpenAI()
def ask_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = []
    for prompt in prompts:
        results.append(ask_llm(prompt))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Synchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    main()
The synchronous version took significantly longer, processing all 15 prompts sequentially. In contrast, the asynchronous version processes all prompts concurrently, drastically reducing the total runtime.
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def ask_llm(prompt: str):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
        "Summarize Inception in 2 sentences.",
        "Explain blockchain in 2 sentences.",
        "Write a 3-line story about a robot.",
        "List 5 ways AI helps healthcare.",
        "Explain Higgs boson in simple terms.",
        "Describe neural networks in 2 sentences.",
        "List 5 blog post ideas on renewable energy.",
        "Give a short metaphor for time.",
        "List 3 emerging trends in ML.",
        "Write a short limerick about programming.",
        "Explain supervised vs unsupervised learning in one sentence.",
        "List 3 ways to reduce urban traffic."
    ]
    start = time.time()
    results = await asyncio.gather(*(ask_llm(p) for p in prompts))
    end = time.time()
    for i, res in enumerate(results, 1):
        print(f"\n--- Response {i} ---")
        print(res)
    print(f"\n[Asynchronous] Finished in {end - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
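One caveat: gather holds every result until the slowest prompt finishes. If you would rather handle each response the moment it arrives, for example to show users partial progress, asyncio.as_completed is a handy alternative. A short sketch reusing the async ask_llm above (main_as_completed is just an illustrative name):

async def main_as_completed():
    prompts = [
        "Briefly explain quantum computing.",
        "Write a 3-line haiku about AI.",
        "List 3 startup ideas in agri-tech.",
    ]
    # as_completed yields awaitables in the order they finish,
    # not the order they were submitted
    for coro in asyncio.as_completed([ask_llm(p) for p in prompts]):
        result = await coro
        print("\n--- Response ready ---")
        print(result)

asyncio.run(main_as_completed())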
Why This Matters in AI Applications
In real-world AI applications, waiting for each request to finish can become a significant bottleneck, especially when dealing with multiple queries or data sources. This is particularly common in:
- Generating content for multiple users simultaneously—like chatbots or recommendation engines.
- Calling the LLM several times in one workflow, for tasks like summarization or multi-step reasoning (see the sketch after this list).
- Fetching data from multiple APIs—combining LLM output with external information.
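To make the workflow pattern concrete, here is a minimal sketch of a fan-out/fan-in summarization step. It assumes the async ask_llm coroutine from the previous section; summarize_documents is a hypothetical helper, not part of any library:

async def summarize_documents(documents):
    # fan out: summarize each document concurrently
    summaries = await asyncio.gather(
        *(ask_llm(f"Summarize in 2 sentences:\n{doc}") for doc in documents)
    )
    # fan in: merge the partial summaries with one final call
    combined = "\n".join(summaries)
    return await ask_llm(f"Merge these summaries into one paragraph:\n{combined}")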
Using asyncio can lead to:
- Improved performance: Parallel API calls reduce overall execution time.
- Cost efficiency: Faster execution can lower operational costs.
- Better user experience: Concurrency enhances responsiveness in real-time systems.
- Scalability: Asynchronous patterns allow handling more simultaneous requests without a proportional increase in resource consumption.
In conclusion, integrating asyncio into your AI applications can significantly enhance performance, efficiency, and user experience. By leveraging asynchronous programming, developers can make the most of their resources and build more responsive applications.
FAQ
- What is asyncio?
  Asyncio is a Python library for writing concurrent code using the async/await syntax, which allows efficient handling of I/O-bound tasks.
- How does asyncio improve performance?
  By allowing multiple tasks to run concurrently, asyncio reduces the total waiting time for I/O operations, making applications faster.
- When should I use asyncio?
  Use asyncio when your application involves many I/O-bound tasks, such as API calls or file downloads, where waiting time can be overlapped.
- Can asyncio be used with LLMs?
  Yes, asyncio can significantly improve the performance of applications that make multiple API calls to LLMs by processing requests concurrently.
- What are some common mistakes when using asyncio?
  Common mistakes include not using await with async functions, blocking the event loop with synchronous code, and not handling exceptions properly.
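To make those last pitfalls concrete, here is a small sketch of two standard remedies: asyncio.to_thread moves a blocking call onto a worker thread so it cannot stall the event loop, and return_exceptions=True keeps one failed task from discarding everyone else's results. blocking_work and might_fail are illustrative names:

import asyncio
import time

def blocking_work():
    time.sleep(1)  # a synchronous call that would otherwise freeze the event loop
    return "done"

async def might_fail(i: int):
    if i == 2:
        raise ValueError(f"task {i} failed")
    await asyncio.sleep(0.1)
    return f"task {i} ok"

async def main():
    # run blocking code in a worker thread instead of on the event loop
    print(await asyncio.to_thread(blocking_work))
    # return_exceptions=True returns exceptions as results instead of
    # cancelling the whole batch on the first failure
    results = await asyncio.gather(*(might_fail(i) for i in range(4)),
                                   return_exceptions=True)
    for r in results:
        print(repr(r))

if __name__ == "__main__":
    asyncio.run(main())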