I've got a PDF file that was OCR'd by Mobi PDF. To double-check the OCR, I closed Mobi PDF, re-opened the file, selected a phrase, and copy/pasted it into Notepad. The pasted text was correct, so the OCR is good.
The challenge is that I'd now like to read the OCR output using PdfSharp, but I can't find the text anywhere. All I can see in the page contents returned by
Code:
ContentReader.ReadContent(Page);
is a Dictionary operator "/Part <</MCID 0 >>".
I've been reading up on marked-content identifiers but it's all new to me and I can't figure out how to find the content that the MCID is referring to.
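For context, as far as I understand the PDF spec, "/Part <</MCID 0 >>" are the operands of a BDC operator, which opens a marked-content sequence that a later EMC operator closes, so the content the MCID refers to should be whatever operators sit between the two. Below is a minimal sketch of how the parsed page content could be dumped to check this. It assumes PDFsharp 1.5 (the PdfSharp.Pdf.Content namespaces), takes the PDF path as the first command-line argument, and only looks at the first page; MarkedContentDump and Dump are names made up for this example.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

// Hypothetical helper, not part of PDFsharp.
static class MarkedContentDump
{
    // Walks the parsed content stream and prints every operator with its
    // operands, indented by the number of enclosing BDC/BMC ... EMC pairs.
    static void Dump(CObject obj, ref int depth)
    {
        if (obj is COperator op)
        {
            string name = op.OpCode.Name;

            // EMC closes a marked-content sequence, so unindent before printing it.
            if (name == "EMC" && depth > 0)
                depth--;

            // How operands are formatted depends on each object's ToString().
            string operands = string.Join(" ", op.Operands);
            Console.WriteLine(new string(' ', depth * 2) + operands + " " + name);

            // BDC and BMC open a marked-content sequence.
            if (name == "BDC" || name == "BMC")
                depth++;
        }
        else if (obj is CSequence seq)
        {
            foreach (CObject child in seq)
                Dump(child, ref depth);
        }
    }

    static void Main(string[] args)
    {
        // Assumption: args[0] is the path to the OCR'd PDF.
        using (PdfDocument doc = PdfReader.Open(args[0], PdfDocumentOpenMode.ReadOnly))
        {
            PdfPage page = doc.Pages[0];
            CSequence content = ContentReader.ReadContent(page);
            int depth = 0;
            Dump(content, ref depth);
        }
    }
}
```

If the OCR layer is in the page's own content stream, I'd expect Tj or TJ operators to show up indented between BDC and EMC. If nothing shows up there, the text may sit inside a form XObject drawn with the Do operator, which this sketch does not follow. The strings may also be font-encoded rather than directly readable.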
How can I find the actual content in the PDF file? Or am I barking up the wrong tree, and the OCR text is actually stored somewhere completely different?
Thanks,
Chris