Copying text from a PDF gives me gobbledygook. Is there a way to fix it with OCR?

I have some documents that produce complete gibberish when I select the text and copy it. If I open them in Acrobat Pro, select text, copy, and choose "Show Clipboard" in the Finder, I see a lot of 'skull characters'; if I open them in Preview and do the same thing, I see strings of dots. The text on the clipboard is unintelligible when pasted into any other program, and I can't search the document.

Some of these documents were downloaded from commercial sites. One of them came (I think) from OCR'ing a scanned document using ClearScan. An example of such a document is at https://public.me.com/ix/alanterra/Reynoso%202006%20p%201.pdf?disposition=download+1317001 233647 (small, 55 K).

It seems to me the only way to do this is to convert the document into a "scanned" PDF and then OCR it. But the only way I can see to do that is to rasterize each page separately in Photoshop and then assemble the pages into a new document.

There must be a way to solve this problem.

Any thoughts?

A

PS - If you look at the document linked above, you will notice that the text in the footer copies correctly, but the text in the body of the document does not.

You will be able to use OCR in Acrobat after you convert the type to outlines. To do that, add transparency to the document, then use the Flattener Preview to outline the type. Here are the steps (for Acrobat 9):

1. Document > Watermark > Add (add a text watermark; press the SPACEBAR once so the watermark is a single space).

2. Advanced > Print Production > Flattener Preview > Convert All Text to Outlines (check the box). Save.

3. Document > OCR Text Recognition > Recognize Text Using OCR. Select all the text with the Select tool and copy.

This method is not perfect; you will need to proofread the copied text for errors.
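
If Acrobat is not an option, the "turn it into a scanned PDF, then OCR it" idea from the question can probably be scripted as well. The following is only a minimal sketch, assuming the open-source OCRmyPDF command-line tool is installed; its `--force-ocr` option rasterizes each page before running OCR, and `--sidecar` writes the recognized text to a separate file. The function names and the file names in.pdf, out.pdf and out.txt are hypothetical, chosen for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_force_ocr_command(src: Path, dst: Path, text_out: Path) -> List[str]:
    """Build an OCRmyPDF command line (assumes OCRmyPDF is the tool used)."""
    return [
        "ocrmypdf",
        "--force-ocr",               # rasterize existing text, then OCR the page images
        "--sidecar", str(text_out),  # also write the recognized text to a plain-text file
        str(src),
        str(dst),
    ]


def force_ocr(src: Path, dst: Path, text_out: Path) -> bool:
    """Run the command if OCRmyPDF is on the PATH; return True on success."""
    if shutil.which("ocrmypdf") is None:
        return False  # tool not installed, so nothing is run
    result = subprocess.run(build_force_ocr_command(src, dst, text_out))
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own document.
    ok = force_ocr(Path("in.pdf"), Path("out.pdf"), Path("out.txt"))
    print("OCR finished" if ok else "OCR did not run or failed")
```

As with the Acrobat steps, the recognized text still needs proofreading, and rasterizing the pages discards the original vector type.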

Tags: Acrobat