Screen scraping from an image (not web scraping)

I am very new to C#, so please excuse my ignorance.

I want to write a program to help automate some tasks where I work. Specifically, it involves a program that displays architectural drawings (similar to Bluebeam, but a different product), and that program has no API.

What I’d like is for a user to draw a rectangle around a portion of the drawing, then have an OCR engine read the simple text in that region and copy it to the clipboard. From there, I think I can manage the rest.

So, I guess my questions are:

  1. What do you recommend for OCR in C#? Are there good resources on getting Tesseract working in Visual Studio 2017? It seems like you have to install a package with NuGet (which package?) and then install the language data files… somewhere? The best package I’ve found so far is the .NET wrapper by charlesw (located here: https://github.com/charlesw/tesseract).

  2. Most of what I see about OCR involves grabbing all the text from an image file on the hard drive. Is there a good way to grab it directly from the screen? Or will I need an intermediate step: performing a screen capture, saving it to some temporary location, and then passing that path to the OCR engine?

I feel like these are probably really basic questions, and for that I’m sorry. But, if any of y’all have any resources you’d recommend about how to go about doing this, I’m all ears.

Thanks for any advice!

submitted by /u/dedroia