OCR on iPhone demo

Update: Source code for demo project released.


i finally got around to building a proof of concept implementation of tesseract-ocr for the iPhone. months ago, i documented the steps which helped to get the library cross-compiled for the iPhone’s ARM processor, and how to build a fat library for use with the simulator as well. several folks have helped immensely in noting how to actually run the engine in obj-c++. thanks to everyone who has commented so far.

anyway, below is a short video of the POC in action. the basic workflow is: select image from photo library or camera, crop tightly on the box of text you’d like to convert, wait while it processes, select / copy or email text.

there are loads of improvements which could be implemented (image histogram adjustment, rotation / perspective correction, automatic text box/layout detection, content detection – dates, links, contact information…) but this is a nice point to stop and document.
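
as a rough illustration of the first item on that list, here is a minimal sketch of one very simple form of histogram adjustment: a linear contrast stretch on 8-bit grayscale pixels. it is not part of the demo app; the function name `stretchContrast` and the assumption that the image has already been converted to a flat grayscale buffer are mine.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper, not from the demo app.
// Linearly rescales 8-bit grayscale pixels so the darkest value maps to 0
// and the brightest to 255. Empty or flat images (min == max) are left unchanged.
void stretchContrast(std::vector<std::uint8_t>& gray) {
    if (gray.empty()) return;

    const auto range = std::minmax_element(gray.begin(), gray.end());
    const int minV = *range.first;
    const int maxV = *range.second;
    if (minV == maxV) return;

    for (auto& p : gray) {
        // Integer arithmetic; the result always fits in 0..255.
        p = static_cast<std::uint8_t>((p - minV) * 255 / (maxV - minV));
    }
}
```

a stretch like this only helps when the text and background occupy a narrow band of grey levels; a proper histogram equalization or adaptive threshold would be the next step up.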

i realize that there are several OCR applications available for the iPhone, including a few which also run the engine on the device rather than handing it off to a web service. this started as an educational project on cross-compiling, and to fill a personal want for a handheld OCR app of my own. for these reasons, i’m going to open-source the entire app. look for it after this semester ends when i’ll have some more time to properly document the code. in the meantime, enjoy these code snippets demonstrating how to initialize the engine and process an image.

Initialize the engine:

    NSString *dataPath = [[self applicationDocumentsDirectory] stringByAppendingPathComponent:@"tessdata"];
    // Set up the data in the docs dir:
    // copy the data to the documents folder if it doesn't already exist.
    NSFileManager *fileManager = [NSFileManager defaultManager];
    // If the expected store doesn't exist, copy the default store.
    if (![fileManager fileExistsAtPath:dataPath]) {
        // get the path to the app bundle (with the tessdata dir)
        NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
        NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@"tessdata"];
        if (tessdataPath) {
            [fileManager copyItemAtPath:tessdataPath toPath:dataPath error:NULL];
        }
    }

    NSString *dataPathWithSlash = [[self applicationDocumentsDirectory] stringByAppendingString:@"/"];
    setenv("TESSDATA_PREFIX", [dataPathWithSlash UTF8String], 1);

    // init the tesseract engine.
    tess = new TessBaseAPI();

    tess->SimpleInit([dataPath cStringUsingEncoding:NSUTF8StringEncoding],  // Path to tessdata-no ending /.
                     "eng",  // ISO 639-3 string or NULL.
                     false); // numeric mode off (third argument assumed from the Tesseract 2.x SimpleInit signature).
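
one detail in the snippet above is easy to trip over: `TESSDATA_PREFIX` is set to the parent directory with a trailing slash, while the path handed to the engine has none. a small sketch of that convention in plain C++ follows; the helper names `withTrailingSlash` and `setTessdataPrefix` are hypothetical and only illustrate the idea.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns `dir` unchanged if it already ends in '/',
// otherwise appends one. An empty input becomes "/".
std::string withTrailingSlash(const std::string& dir) {
    if (!dir.empty() && dir.back() == '/') return dir;
    return dir + "/";
}

// Hypothetical helper: points TESSDATA_PREFIX at the directory that
// contains the tessdata folder. Returns true if setenv (POSIX) succeeded.
bool setTessdataPrefix(const std::string& parentDir) {
    const std::string prefix = withTrailingSlash(parentDir);
    return setenv("TESSDATA_PREFIX", prefix.c_str(), 1) == 0;
}
```

calling `setTessdataPrefix` with the documents directory would be the C++ equivalent of the `dataPathWithSlash` lines above.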

Process an image. This should run on a background thread, since recognition is slow enough to block the UI:

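    CGImageRef cgImage = [uiImage CGImage];
    // use the pixel dimensions of the underlying bitmap, not the UIImage size.
    int width  = (int)CGImageGetWidth(cgImage);
    int height = (int)CGImageGetHeight(cgImage);
    int bytes_per_line  = (int)CGImageGetBytesPerRow(cgImage);
    int bytes_per_pixel = (int)(CGImageGetBitsPerPixel(cgImage) / 8);

    // copy the raw pixel data out of the image.
    CFDataRef data = CGDataProviderCopyData(CGImageGetDataProvider(cgImage));
    const UInt8 *imageData = CFDataGetBytePtr(data);

    // this could take a while. maybe needs to happen asynchronously.
    // TesseractRect takes the pixel layout first, then the rectangle to recognize.
    char* text = tess->TesseractRect(imageData,
                                     bytes_per_pixel, bytes_per_line,
                                     0, 0,
                                     width, height);

    // Do something useful with the text!
    NSLog(@"Converted text: %@",[NSString stringWithCString:text encoding:NSUTF8StringEncoding]);

    delete[] text;    // the engine allocates the result with new[]
    CFRelease(data);  // release the copied pixel data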
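
since the recognition call blocks, here is a minimal sketch of pushing it onto a worker thread with standard C++. `recognize` is a hypothetical stand-in for the `tess->TesseractRect` call so the example runs without the library, and `recognizeAsync` is likewise my own name. it assumes the engine instance is only ever used by one thread at a time.

```cpp
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the blocking OCR call. It only reports what it
// was given, so the sketch runs without Tesseract linked in.
std::string recognize(const std::vector<unsigned char>& pixels, int width, int height) {
    return "recognized " + std::to_string(pixels.size()) + " bytes from a "
         + std::to_string(width) + "x" + std::to_string(height) + " image";
}

// Starts recognition on a separate thread and returns immediately.
// The pixel buffer is taken by value so the worker owns its own copy.
std::future<std::string> recognizeAsync(std::vector<unsigned char> pixels, int width, int height) {
    return std::async(std::launch::async,
                      [pixels = std::move(pixels), width, height] {
                          return recognize(pixels, width, height);
                      });
}
```

the caller keeps the returned future and calls `get()` once it actually needs the text. in an iPhone app the same shape would more likely be written with NSOperation or a similar platform facility, with the result handed back to the main thread for display.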

Enjoy the video!
