Text Detection and Inpainting

April 17, 2017 • 3 min read

computer vision

For my final project in ECE 590: Image and Video Processing (also a popular MOOC), I worked with three teammates to build a MATLAB application that detects text in images and removes it with inpainting.

The application consists of four phases:

1. Input the image
2. Text detection
3. Perform inpainting
4. Post-processing

1. Input the image

The input can be any image containing obscuring text to detect and remove. The text may be of any font, weight, size, and color, and may appear anywhere in the image.

2. Text detection

The application allows the user to select from four different methods of detecting text:

Method 1: Edge Detection (Prewitt)

The Prewitt operator approximates the gradient of the image intensity at each pixel, and edges are detected where that gradient is largest, that is, where the intensity changes most abruptly. The edges are then thickened to generate the mask used for inpainting.
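To make the two steps concrete (gradient-based edge detection, then thickening), here is a minimal sketch in plain Python. It is not the project's MATLAB code: the function name `prewitt_mask`, the `threshold` and `thicken` parameters, and the square thickening window are all assumptions chosen for illustration.

```python
# Illustrative sketch only; the original application was written in MATLAB.
# prewitt_mask, threshold and thicken are hypothetical names and parameters.

def prewitt_mask(gray, threshold=100, thicken=1):
    """Return a 0/1 mask marking thickened Prewitt edges.

    gray: 2-D list of grayscale intensities (e.g. 0-255).
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    # Interior pixels only; border pixels are left as non-edges.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = sum(gray[y + d][x + 1] - gray[y + d][x - 1] for d in (-1, 0, 1))
            # Vertical gradient: bottom row minus top row.
            gy = sum(gray[y + 1][x + d] - gray[y - 1][x + d] for d in (-1, 0, 1))
            # Mark the pixel as an edge when the gradient magnitude is large.
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    # Thicken: mark every pixel within `thicken` pixels (square window)
    # of an edge pixel, so the mask covers the strokes and their surroundings.
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - thicken), min(h, y + thicken + 1)):
                    for xx in range(max(0, x - thicken), min(w, x + thicken + 1)):
                        mask[yy][xx] = 1
    return mask
```

A larger `thicken` value trades precision for coverage: the mask is more likely to contain every text pixel, but the inpainting step then has to fill in more of the image.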

Method 2: Otsu's Method

In the special case where the image has primarily two colors, with the text a different color from the underlying image, Otsu's method can automatically pick a threshold from the intensity histogram and segment the image, yielding the desired binary text mask.
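As a rough illustration of how the threshold is chosen, the sketch below implements Otsu's method in plain Python: it tries every candidate threshold and keeps the one that maximizes the between-class variance of the two resulting pixel groups. It is not taken from the project; the names `otsu_threshold` and `otsu_mask` and the `text_is_bright` flag are assumptions, and the input is assumed to be a grayscale image with integer intensities.

```python
# Illustrative sketch only; the original application was written in MATLAB.
# otsu_threshold, otsu_mask and text_is_bright are hypothetical names.

def otsu_threshold(gray, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # intensity sum of the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def otsu_mask(gray, text_is_bright=True):
    """Return a 0/1 mask of the pixels on the text side of the threshold."""
    t = otsu_threshold(gray)
    return [[int((v > t) == text_is_bright) for v in row] for row in gray]
```

The `text_is_bright` flag is needed because the threshold alone only splits the pixels into two groups; it does not say which group is the text.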

Method 3: Color-Select Mask

This algorithm takes the text color as user input and creates a mask around areas where that color is concentrated.
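One plausible way to realize this idea is sketched below in plain Python. The post does not spell out the details, so everything here is an assumption: the function name `color_select_mask`, the Euclidean RGB distance with tolerance `tol`, and the rule that a pixel joins the mask when its 3x3 neighborhood contains at least `min_count` pixels close to the chosen color.

```python
# Illustrative sketch only; the original application was written in MATLAB.
# color_select_mask, tol and min_count are hypothetical names and parameters.

def color_select_mask(rgb, target, tol=30, min_count=2):
    """Return a 0/1 mask around areas where `target` is concentrated.

    rgb: 2-D list of (r, g, b) tuples; target: the (r, g, b) text color.
    """
    h, w = len(rgb), len(rgb[0])
    # Step 1: flag pixels whose color lies within `tol` of the target
    # (Euclidean distance in RGB space).
    close = [[sum((a - b) ** 2 for a, b in zip(px, target)) <= tol * tol
              for px in row] for row in rgb]
    # Step 2: a pixel joins the mask when its 3x3 neighborhood holds at
    # least `min_count` flagged pixels. This drops isolated matches and
    # extends the mask slightly around dense ones.
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            count = sum(close[yy][xx]
                        for yy in range(max(0, y - 1), min(h, y + 2))
                        for xx in range(max(0, x - 1), min(w, x + 2)))
            mask[y][x] = int(count >= min_count)
    return mask
```

With these defaults, a single stray pixel that happens to match the text color is ignored, while two or more adjacent matches produce a mask that also covers their immediate neighbors.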

Method 4: User Generated Mask

As a last resort, if the program has difficulty isolating the text, the user can supply a premade mask.

3. Perform inpainting

The Mumford-Shah functional is defined as follows, where $I$ is the particular image, $D$ is some domain of the image, $J$ is the image’s model, and $B$ is the set of boundaries associated with that model: