OCR Translation

#python #deeplearning

Abstract

Smartphones are among the most commonly used electronic devices in daily life. Because the hardware in recent smartphones can handle far more intensive tasks than that of traditional phones, a smartphone is no longer just a communication device but also a powerful computing device. It is now possible, for example, to run text detection and translation directly on the phone, so an application can capture an image, extract the text from it, and translate that text into other languages.

In this study, we developed a model that extracts text from an image. The final deliverable was tested on many end devices with English and Hindi backgrounds, and we concluded that the application benefits many users. With this app, travelers visiting India will be able to understand messages written in Hindi.

In this project, we build a model that extracts the text from an image and then translates it into other languages. We use the googletrans package for translation and Tesseract for text recognition.
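As a rough illustration of how these two packages fit together, here is a minimal sketch. It assumes the `pytesseract` wrapper and Pillow are used to call Tesseract (the post only names Tesseract itself), that the Hindi language data (`hin`) is installed, and that the installed googletrans release exposes the synchronous `Translator.translate` call; newer releases make that call asynchronous. The function names are hypothetical and not taken from the project.

```python
def extract_text(image_path, lang="hin"):
    """Run Tesseract OCR on an image file and return the recognized text."""
    # Imported lazily so this module still loads where the OCR stack is missing.
    from PIL import Image
    import pytesseract  # assumption: Tesseract is called through pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang).strip()


def translate_text(text, src="hi", dest="en"):
    """Translate text with googletrans (assumes the synchronous API)."""
    from googletrans import Translator

    return Translator().translate(text, src=src, dest=dest).text


def image_to_translation(image_path, ocr=extract_text, translate=translate_text):
    """OCR an image, then translate whatever text was found."""
    text = ocr(image_path)
    if not text:
        # Nothing recognized: skip the network call entirely.
        return ""
    return translate(text)
```

Passing `ocr` and `translate` as parameters keeps the two steps swappable, which also makes the flow easy to try out with stand-in functions before Tesseract or network access is set up.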

Introduction

Current methods offer only isolated implementations of either OCR or translation, and directly combining these two independent models often leads to inefficient pipelines that take a long time to translate the text in a given image.

This project implements an efficient model that uses Google’s Tesseract engine for optical character recognition, the EAST text detector for segmentation, and the googletrans module for translating the recognized text into other languages.
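The three stages can be sketched as a small pipeline. This is an illustration under stated assumptions, not the project's actual code: `detect`, `recognize`, and `translate` are hypothetical callables standing in for the EAST detector, Tesseract, and googletrans, and the padding and row-banding values are arbitrary defaults.

```python
def pad_box(box, pad, width, height):
    """Grow a (x0, y0, x1, y1) box by `pad` pixels per side, clamped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - pad), max(0, y0 - pad),
            min(width, x1 + pad), min(height, y1 + pad))


def reading_order(boxes, line_height=20):
    """Sort boxes top-to-bottom by coarse row band, then left-to-right."""
    return sorted(boxes, key=lambda b: (b[1] // line_height, b[0]))


def translate_image(image, size, detect, recognize, translate, pad=4):
    """Detect text regions, OCR each one, and translate the joined text.

    detect(image)            -> iterable of (x0, y0, x1, y1) boxes
    recognize(image, region) -> text found inside that region
    translate(text)          -> translated text
    """
    width, height = size
    regions = reading_order(
        [pad_box(box, pad, width, height) for box in detect(image)]
    )
    words = [recognize(image, region).strip() for region in regions]
    text = " ".join(word for word in words if word)
    # A single translate call for the whole text, not one per region.
    return translate(text) if text else ""
```

Padding each box slightly gives the recognizer some margin around the glyphs, and joining the regions before translating means the translator sees whole phrases and is called only once per image.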

This achieves the end goal within 3-4 seconds, an appreciable decrease in processing time that speeds up the entire process and enables applications such as real-time image translation and video transcripts.
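One way to check an end-to-end figure like this is to time each stage separately. The helper below is a hypothetical sketch (the name `run_timed` and the stage functions are placeholders, not part of the project): it runs the stages in order, feeds each output to the next stage, and records the elapsed wall-clock time per stage.

```python
import time


def run_timed(stages, value):
    """Run (name, function) stages in sequence and time each one.

    Returns the final value and a dict of seconds spent per stage.
    """
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings


# Placeholder stages; in practice these would be detection, OCR and translation.
result, timings = run_timed(
    [("detect", lambda x: x), ("recognize", str), ("translate", str.upper)],
    "sample input",
)
total_seconds = sum(timings.values())
```

A per-stage breakdown shows whether detection, recognition, or the translation request dominates the total, which is where any further optimization would need to focus.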

Model Architecture

Architecture Diagram

Conclusion

The model detects the text in images and translates it with appreciable speed while maintaining good average accuracy.