Resurrecting old PDF files into usable documents

Hi all

I have a few dozen PDF files that were apparently created by scanning old documents printed around 1990 (full of staple holes and visual noise, as if other pages had stuck to them over time) to PDF without the benefit of OCR. I don't know how long ago the PDFs were created, but Ctrl+D shows Acrobat 5.x as the latest version in the batch. The files contain line drawings. Some have tables created the old typewriter/WordPerfect way: using special characters to delimit columns and lines. Fast forward about 20 years to now... These decades-old relics are the only things that show how something important works, and I have to put them into a useful format for today's programmers and new product developers. Unfortunately, there is no admin on staff, and there is far too little time to retype such things, even at 120 WPM.

I use Acrobat Pro Extended v.9 on XP SP3. (I have access to Acrobat 9 on Win7, but I'm just an XP die-hard.) I tried running 'Recognize Text in Multiple Files Using OCR' with the settings ClearScan and Word output, and it really gummed up the layout, to the point that you literally cannot follow one sentence to the next well enough to tell where the sentences and paragraphs are. I've seen some other posts about the differences between the 3 OCR options (I'm still a little foggy on them) and thought maybe ClearScan would be a good choice, followed by saving as Word, plain text, or RTF, but ClearScan 'saw' the visual noise as punctuation marks and special characters. Maybe it is oversensitive?

What am I missing about the settings? Or is there something I could/should do to clean up this visual noise before running OCR? I need ideas from the many people who know more than I do, maybe you! :-)

Thanks in advance,

Wintenberger

PS-

I forgot to mention, on some pages there are also handwritten notes. Is there a way to get the OCR to understand handwriting? Or, if I use the Typewriter tool to transcribe the handwritten stuff, is it possible to remove the handwriting from the PDF before running OCR?

PPS-

These are not legal documents, and I'm working on copies, so the original PDF files are stored away and archived for posterity.

As mentioned, some of the third-party packages are supposed to do a better job, as OCR is their main purpose. When I did my job, I OCRed the scan, then moved it to my word processor. Of course, it was not perfect, but it was much better than typing from scratch. My copy was pretty good, at about 600 dpi as I recall. I probably spent no more than about 10 minutes per page on average. Maybe it was even more. I had to retype each equation. The alternative is to copy the equations in as graphics. In some cases, you may want to do that anyway, so that you can see the image of the equation as you retype it. I would just try a few pages and see what you get. Be sure to keep a copy of your originals and work in parts.
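
On the question of cleaning up visual noise before OCR: the idea can be sketched outside Acrobat. This is only a minimal sketch, assuming each page has already been converted to a black-and-white bitmap held as a list of rows of 0/1 values. The function name `remove_specks` and the `min_size` threshold are hypothetical and are not part of Acrobat or any OCR product. It erases connected groups of black pixels smaller than a threshold, which is one common way to drop dust and staple-hole debris.

```python
from collections import deque


def remove_specks(bitmap, min_size=4):
    """Return a copy of bitmap with small black blobs erased.

    bitmap is a list of rows; 1 = black (ink), 0 = white (paper).
    Blobs are 8-connected groups of black pixels. Any blob with fewer
    than min_size pixels is treated as noise and set to white.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]          # leave the input untouched
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search to collect one connected blob.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs below the threshold are assumed to be specks.
            if len(blob) < min_size:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned


if __name__ == "__main__":
    # Hypothetical tiny "page": one stray pixel and one 6-pixel mark.
    page = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    for row in remove_specks(page, min_size=4):
        print("".join("#" if v else "." for v in row))
```

Periods, commas, and the dots on letters are small blobs too, so a threshold that is too high will eat real punctuation. As with the OCR itself, it is probably worth trying this on a page or two, and only on copies, before running a whole batch.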

Tags: Acrobat
