Build an OCR App in Google Colab with OpenCV and Tesseract-OCR

Introduction to Optical Character Recognition (OCR)

Optical Character Recognition (OCR) is a technology that transforms images of text into machine-readable data. As the demand for automated data extraction increases, OCR tools have become vital for various applications, including document digitization and information extraction from scanned images.

Building an OCR Application

This guide walks through building an OCR application in Google Colab. We will use OpenCV for image processing, Tesseract-OCR for text recognition, NumPy for the image arrays that OpenCV returns, and Matplotlib for visualization. By the end, you will be able to upload an image, preprocess it, extract its text, and download the result as a text file.

Setting Up the OCR Environment

To set up the OCR environment in Google Colab, install Tesseract-OCR and essential Python libraries:

  !apt-get install -y tesseract-ocr
  !pip install pytesseract opencv-python numpy matplotlib
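
Before moving on, it can help to confirm that the Tesseract binary is actually reachable from Python. The following is a minimal sketch using only the standard library; the helper name `tesseract_version` is an assumption made for illustration and is not part of pytesseract.

```python
import shutil
import subprocess

def tesseract_version():
    """Return the first line of `tesseract --version`, or None if the binary is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr rather than stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None

version = tesseract_version()
print(version if version else "Tesseract not found on PATH")
```

If this prints that Tesseract was not found, rerun the install cell before continuing.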
  

Importing Necessary Libraries

Next, import the required libraries for image processing and OCR:

  import cv2
  import pytesseract
  import numpy as np
  import matplotlib.pyplot as plt
  from google.colab import files
  from PIL import Image
  

Uploading an Image

To process an image, upload it to Google Colab using the following code:

  uploaded = files.upload()
  filename = list(uploaded.keys())[0]
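
In Colab, `files.upload()` returns a dictionary that maps each uploaded filename to its contents as bytes, so taking the first key assumes the first file is the image you want. If several files may be uploaded at once, a small filter such as the sketch below can pick the first image by extension. The helper `pick_first_image` and the extension list are illustrative assumptions, and the dictionary here is a stand-in for a real upload.

```python
import os

# Extensions treated as images here; an illustrative list, not exhaustive
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def pick_first_image(uploaded):
    """Return the first filename in `uploaded` with an image extension."""
    for name in uploaded:
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
            return name
    raise ValueError("No image file found in the upload")

# Stand-in for the dict returned by files.upload() (filename -> file bytes)
example_upload = {"notes.txt": b"hello", "Scan.PNG": b"\x89PNG"}
print(pick_first_image(example_upload))
```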
  

Image Preprocessing

Convert the image to grayscale and binarize it with Otsu's thresholding, which usually improves OCR accuracy on clean scans:

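  def preprocess_image(image_path):
      # cv2.imread returns None (it does not raise) if the file cannot be read
      image = cv2.imread(image_path)
      if image is None:
          raise FileNotFoundError(f"Could not read image: {image_path}")
      gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
      # With THRESH_OTSU the threshold is computed automatically; the value passed (0) is ignored
      _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
      return thresh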

  processed_image = preprocess_image(filename)
  plt.imshow(processed_image, cmap='gray')
  plt.axis('off')
  plt.show()
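
To make the `cv2.THRESH_OTSU` flag less of a black box, here is a NumPy-only sketch of Otsu's method, which picks the threshold that maximizes the variance between the dark and bright pixel classes. It is an illustration under simple assumptions (a `uint8` grayscale array and a synthetic two-tone image), not OpenCV's actual implementation, and the function name `otsu_threshold` is hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for a uint8 grayscale array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # weight of the class with values <= t
    mu = np.cumsum(prob * levels)      # cumulative mean up to t
    mu_total = mu[-1]
    # Between-class variance for each t; undefined where one class is empty
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))

# Synthetic image data: dark pixels around 50 and bright pixels around 200
rng = np.random.default_rng(0)
dark = rng.normal(50, 10, 5000)
bright = rng.normal(200, 10, 5000)
gray = np.clip(np.concatenate([dark, bright]), 0, 255).astype(np.uint8)

t = otsu_threshold(gray)
# Same rule as THRESH_BINARY: pixels above the threshold become 255, the rest 0
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print("Otsu threshold:", t)
```

On this synthetic data the chosen threshold falls between the two clusters, which is the behavior that makes Otsu's method a reasonable default for scans with dark text on a light background.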
  

Extracting Text

Perform OCR on the preprocessed image using the following function:

  def extract_text(image):
      pil_image = Image.fromarray(image)
      text = pytesseract.image_to_string(pil_image)
      return text

  extracted_text = extract_text(processed_image)
  print("Extracted Text:")
  print(extracted_text)
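
Raw OCR output often contains trailing spaces, runs of blank lines, and sometimes a form-feed character used as a page separator. A light cleanup pass like the sketch below can make the text easier to use downstream. The helper `clean_ocr_text` is a hypothetical name, and the sample string is made up for illustration rather than taken from a real Tesseract run.

```python
import re

def clean_ocr_text(text):
    """Normalize whitespace in raw OCR output (a hypothetical helper)."""
    # Drop form-feed characters, which Tesseract may emit as page separators
    text = text.replace("\x0c", "")
    # Strip trailing spaces on each line
    lines = [line.rstrip() for line in text.splitlines()]
    # Collapse runs of blank lines into a single blank line
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return cleaned.strip()

raw = "Invoice  #123   \n\n\n\nTotal: 42.00 \n\x0c"
print(clean_ocr_text(raw))
```

Spacing inside a line is left untouched on purpose, since it can carry layout information such as table columns.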
  

Saving and Downloading Extracted Text

To make the extracted text easily accessible, save it as a text file:

  # Write as UTF-8 so non-ASCII characters in the OCR output are saved correctly
  with open("extracted_text.txt", "w", encoding="utf-8") as f:
      f.write(extracted_text)

  files.download("extracted_text.txt")
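
When processing more than one image, a fixed output name gets overwritten on each run. One option, sketched below with the standard library only, is to derive the text filename from the image filename. The helper `save_text` and the sample filename `scan.png` are assumptions made for illustration.

```python
from pathlib import Path

def save_text(text, image_filename, out_dir="."):
    """Write `text` to a file named after the image, e.g. scan.png -> scan.txt."""
    out_path = Path(out_dir) / Path(image_filename).with_suffix(".txt").name
    out_path.write_text(text, encoding="utf-8")
    return out_path

saved = save_text("Sample OCR output\n", "scan.png")
print(saved)
```

In Colab, the returned path can then be passed to `files.download(str(saved))`.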
  

Conclusion

By integrating OpenCV, Tesseract-OCR, NumPy, and Matplotlib, we have successfully created an OCR application in Google Colab. This workflow provides an efficient method to convert scanned documents and printed text into digital formats. The preprocessing steps enhance accuracy, while the ability to save and download results facilitates further analysis.

Next Steps

Explore how artificial intelligence can transform your business processes. Identify areas where automation can add value, select appropriate tools, and start with small projects to measure effectiveness before scaling up.
