Wednesday, January 20, 2010

OCR with Microsoft Office

If Microsoft Office is available on the platform you're developing on (or for), you can get OCR functionality by taking advantage of the Office DLLs. All you need for the following code to work is a reference to the Microsoft Office Document Imaging Library, which you add from the COM tab of the Add Reference dialog.

Be aware that this library may not have been installed along with the Office apps. If it is missing, you can modify your Office installation from the Control Panel to include it.

All you have to do is create a document from the image file you wish to analyze, then run OCR on it with the language of your choice. After that you can loop through the recognized words in the layout object of the image.

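// Requires a COM reference to the Microsoft Office Document Imaging Library (MODI).
MODI.Document doc = new MODI.Document();
doc.Create(filename);

// Run OCR in English; the two flags ask MODI to auto-orient and straighten the image.
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

// Take the first page of the document and its recognized layout.
MODI.Image image = (MODI.Image)doc.Images[0];
MODI.Layout layout = image.Layout;

// Join the recognized words with single spaces (no leading space).
string text = "";
for (int j = 0; j < layout.Words.Count; j++)
{
    MODI.Word word = (MODI.Word)layout.Words[j];
    if (text.Length > 0) text += " ";
    text += word.Text;
}

// Close the document without saving changes.
doc.Close(false);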
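
The snippet above only reads the first page and leaves the document open if something goes wrong along the way. As a rough sketch of how you might wrap it up, the helper below loops over every page in doc.Images and closes the document in a finally block. The OcrHelper class and the RecognizeAllPages method are names I made up for illustration, not part of MODI, and releasing the COM object explicitly is my own assumption about good housekeeping rather than something the library requires.

```csharp
using System.Runtime.InteropServices;
using System.Text;

static class OcrHelper
{
    // Hypothetical helper (not part of MODI): returns the recognized
    // text of every page in the image file, words separated by spaces.
    public static string RecognizeAllPages(string filename)
    {
        MODI.Document doc = new MODI.Document();
        StringBuilder text = new StringBuilder();
        bool created = false;
        try
        {
            doc.Create(filename);
            created = true;
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);

            // A multi-page file (e.g. a TIFF) exposes one Image per page.
            for (int i = 0; i < doc.Images.Count; i++)
            {
                MODI.Image image = (MODI.Image)doc.Images[i];
                MODI.Layout layout = image.Layout;
                for (int j = 0; j < layout.Words.Count; j++)
                {
                    MODI.Word word = (MODI.Word)layout.Words[j];
                    if (text.Length > 0) text.Append(' ');
                    text.Append(word.Text);
                }
            }
        }
        finally
        {
            // Close without saving, even if Create or OCR threw.
            if (created) doc.Close(false);
            // Assumption: release the COM wrapper eagerly instead of
            // waiting for the garbage collector.
            Marshal.ReleaseComObject(doc);
        }
        return text.ToString();
    }
}
```

Calling it would then look like string text = OcrHelper.RecognizeAllPages(filename); with any exception from the recognition step left for the caller to handle.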
