Thursday, 5 September 2019

Image Recognition: Text Detection (Optical Character Recognition) using Google Cloud Vision API

The Google Cloud Vision API supports label detection, face detection, logo detection, landmark detection and text detection (OCR: Optical Character Recognition). In this article, we will see how to use the Google Cloud Vision API to extract text from an image. This is a step-by-step guide to text detection (OCR) using the Google Cloud Vision API. Let's follow it.

I will start directly from step 5. The first 4 steps are the same as in my previous post on label detection using the Google Cloud Vision API.

You can download my Jupyter notebook containing the code below from here.
 
Step 5: Import required libraries

from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
from base64 import b64encode

You may get an import error ("No module named ...") if you have not already installed the Google API Python client. Use the following command to install it.

pip install --upgrade google-api-python-client

If you also get an import error for oauth2client, install it using the following command:

pip3 install --upgrade oauth2client

Step 6: Load credentials file

Load the credentials file (which we created in step 3 of my previous article) and create a service object using it.

CREDENTIAL_FILE = 'credentials.json'
credentials = GoogleCredentials.from_stream(CREDENTIAL_FILE)
service = build('vision', 'v1', credentials=credentials)

Step 7: Load image file (from which we need to extract the text)

I will load an image of the cover page of my deep learning book and Base64-encode it so that it can be sent to the Cloud Vision API.

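IMAGE_FILE = 'book_cover_page.jpg'
# Read the raw bytes of the image file.
with open(IMAGE_FILE, 'rb') as file:
    image_data = file.read()
# The request carries the image as a Base64 string, not as raw bytes.
encoded_image_data = b64encode(image_data).decode('UTF-8')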

Step 8: Create a batch request

We will create a batch request to send to the Cloud Vision API. The batch request contains the encoded image from above and the feature type TEXT_DETECTION.

batch_request = [{
    'image':{'content':encoded_image_data},
    'features':[{'type':'TEXT_DETECTION'}],
}]
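Since the batch request is a list, the same structure can carry more than one image. Here is a minimal sketch of that idea. The helper name build_text_detection_requests is my own (it is not part of the Google client library), and the byte strings are placeholders rather than real images.

```python
from base64 import b64encode


def build_text_detection_requests(images):
    """Build one TEXT_DETECTION request entry per image (given as raw bytes)."""
    return [
        {
            'image': {'content': b64encode(data).decode('UTF-8')},
            'features': [{'type': 'TEXT_DETECTION'}],
        }
        for data in images
    ]


# Placeholder bytes only, to show the shape of the result.
# In practice, pass the bytes read from real image files.
sample_requests = build_text_detection_requests([b'abc', b'xyz'])
print(len(sample_requests), sample_requests[0]['features'])
```

The returned list can be passed as the value of 'requests' in the body, exactly like batch_request.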

Step 9: Create a request

request = service.images().annotate(body={'requests':batch_request})

Step 10: Execute the request

response = request.execute()

This step will throw an error if you have not enabled billing (as mentioned in step 4 of my previous article), so you must enable billing in order to use the Google Cloud Vision API. The charges are very reasonable. So, don't think too much and provide your credit card details. For me, Google charged INR 1 and then refunded it.

Step 11: Process the response

For error handling, include this code:

if 'error' in response:
    raise RuntimeError(response['error'])
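As far as I understand the response format, a failure for an individual image is reported inside that image's entry in response['responses'] rather than at the top level. Here is a minimal sketch of checking every entry. The helper name raise_on_image_errors is hypothetical, and the sample response is written by hand for illustration (it is not real API output).

```python
def raise_on_image_errors(response):
    """Raise RuntimeError if any per-image entry in the response has an 'error'."""
    for index, image_response in enumerate(response.get('responses', [])):
        if 'error' in image_response:
            message = image_response['error'].get('message', 'unknown error')
            raise RuntimeError(f'Image {index} failed: {message}')


# Hand-written sample, for illustration only.
sample_response = {'responses': [{'error': {'code': 3, 'message': 'Bad image data.'}}]}
try:
    raise_on_image_errors(sample_response)
except RuntimeError as exc:
    print(exc)
```

Calling raise_on_image_errors(response) right after request.execute() would stop the notebook before any missing annotations cause a confusing KeyError.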

We are interested in the text annotations here. So, fetch them from the response and display the results.

extracted_texts = response['responses'][0]['textAnnotations']

# The first annotation holds the complete detected text.
extracted_text = extracted_texts[0]
print(extracted_text['description'], extracted_text['boundingPoly'])

Output:

Objective Type Questions and Answers in Deep Learning
Deep
Learning
ARTIFICIAL
INTELLIGENCE
MACHINE
LEARNING
DEEP
LEARNING
NARESH KUMAR
 {'vertices': [{'x': 42, 'y': 77}, {'x': 2365, 'y': 77}, {'x': 2365, 'y': 3523}, {'x': 42, 'y': 3523}]}
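
The first annotation printed above holds the whole detected text together with its bounding polygon. To my knowledge, the remaining annotations describe the individual words. Here is a small sketch under that assumption. The helper bounding_box_size is hypothetical, and the sample list is shortened and written by hand; only the first bounding polygon is copied from the output above.

```python
def bounding_box_size(bounding_poly):
    """Return (width, height) of the axis-aligned box around the vertices."""
    # Assumption: a vertex may omit 'x' or 'y' when the value is 0, so default to 0.
    xs = [vertex.get('x', 0) for vertex in bounding_poly['vertices']]
    ys = [vertex.get('y', 0) for vertex in bounding_poly['vertices']]
    return max(xs) - min(xs), max(ys) - min(ys)


# Hand-written sample; the word-level coordinates are made up for illustration.
sample_annotations = [
    {'description': 'Deep\nLearning\n',
     'boundingPoly': {'vertices': [{'x': 42, 'y': 77}, {'x': 2365, 'y': 77},
                                   {'x': 2365, 'y': 3523}, {'x': 42, 'y': 3523}]}},
    {'description': 'Deep',
     'boundingPoly': {'vertices': [{'x': 50, 'y': 80}, {'x': 400, 'y': 80},
                                   {'x': 400, 'y': 200}, {'x': 50, 'y': 200}]}},
    {'description': 'Learning',
     'boundingPoly': {'vertices': [{'x': 50, 'y': 220}, {'x': 700, 'y': 220},
                                   {'x': 700, 'y': 340}, {'x': 50, 'y': 340}]}},
]

full_text_size = bounding_box_size(sample_annotations[0]['boundingPoly'])
words = [annotation['description'] for annotation in sample_annotations[1:]]
print(full_text_size, words)
```

With a real response, pass response['responses'][0]['textAnnotations'] in place of sample_annotations.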

You can test the above code using different images and check the accuracy of the API.
