Throughout college, one thought crossed my mind every semester: “What if I could copy the text from that scanned PDF or image file?” It’s an appealing thought, since many instruction manuals and booklets are saved as PDFs, and their content is an ideal supplement for my presentations. However, I never thought it was practical, so I just did the tedious work of retyping the whole document to get a soft copy.
One day I realized that it is very easy to make a PDF document text-searchable and editable using optical character recognition (OCR). OCR uses the outlines of letters and numbers to recognize individual characters. Once this has been done, you can work with the file as you would with any other text document.
Introduction to Optical Character Recognition (OCR)
OCR is short for Optical Character Recognition. It is a technology that extracts text from scanned PDF or image files, enabling users to edit, copy and search the text of a scanned PDF document.
Definition from Wikipedia:
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. For a more detailed definition of OCR, refer to Wikipedia.
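To make the idea of turning letter shapes into machine-encoded text more concrete, here is a toy sketch of template matching in Python. It is only an illustration: the 3x5 bitmaps, the `TEMPLATES` table and the `score` and `recognize` functions are all made up for this example, and real OCR engines use far more sophisticated methods.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a hypothetical 3x5 bitmap, "#" for ink and "." for blank.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that best matches the glyph."""
    return max(TEMPLATES, key=lambda ch: score(glyph, TEMPLATES[ch]))


# A slightly damaged "T": one pixel is missing from the top bar.
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

Even with a pixel missing, the damaged glyph still agrees with the "T" template on more pixels than with any other, which is roughly why OCR can cope with imperfect scans.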
What is the best way to OCR a Scanned PDF?
Several free and paid OCR programs are available for OCRing a scanned PDF on Mac OS X. There are also websites that perform OCR on small files. Here are the most notable options.
Each service or desktop program helps you convert PDFs into searchable, editable files for editing or other uses. Which one is best depends on your needs. If you only need to OCR a PDF occasionally, a free online service is enough, because converting one file at a time won’t matter. If you want a powerful, professional OCR tool, you may need to consider a paid one.
Adobe Acrobat has OCR built in, which makes it one of the most efficient ways to OCR a PDF.
PDF Converter Programs
There are many programs that can convert a PDF document to other file formats such as Word, Excel, HTML, or Pages.
Google Drive also provides OCR functionality: opening a scanned PDF with Google Docs converts it to editable text, which you can then download as a Microsoft Word file.
Online OCR PDF sites
There are also online OCR PDF sites such as onlineocr.net and newocr.com, so you don’t even need to install any software to OCR your PDF files. Keep in mind that when you use an online OCR site, you are uploading the PDF file to a third party, so consider the privacy implications before converting confidential or sensitive documents. The same applies to any other online service.
The Mac’s built-in Preview application can also do the trick. Open the PDF with Preview, click the “Text Tool”, select the content you want, then copy and paste it into Word. Note, however, that only the text is carried over to Word: the original layout, graphics, hyperlinks and other elements are lost. It is also a laborious task if there is a lot of text to copy.
Easy steps to OCR a Scanned PDF on Mac
Open the scanned PDF in your OCR program. The exact steps vary depending on which program you are using, but generally you just need to add the scanned PDF files, let the program convert them, and then save the results as new files.
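If you prefer to script these steps, the same add, convert and save flow can be sketched in Python. This is a minimal sketch, not a recommendation from the programs above: it assumes the open-source `ocrmypdf` command-line tool is installed on your Mac, and the helper names `searchable_name`, `build_command` and `ocr_pdf`, as well as the `_ocr.pdf` suffix, are hypothetical choices made for this example.

```python
import shutil
import subprocess
from pathlib import Path


def searchable_name(scanned_pdf):
    """Name the output next to the input, e.g. manual.pdf -> manual_ocr.pdf."""
    path = Path(scanned_pdf)
    return path.with_name(path.stem + "_ocr.pdf")


def build_command(scanned_pdf, language="eng"):
    """Build the ocrmypdf command line: language, input file, output file."""
    return [
        "ocrmypdf",
        "-l", language,                      # language of the scanned text
        str(scanned_pdf),                    # the scanned PDF to read
        str(searchable_name(scanned_pdf)),   # the new searchable PDF to write
    ]


def ocr_pdf(scanned_pdf, language="eng"):
    """Run OCR and return the new file's path, or None if ocrmypdf is missing."""
    if shutil.which("ocrmypdf") is None:
        return None  # assumption not met: the tool is not installed
    subprocess.run(build_command(scanned_pdf, language), check=True)
    return searchable_name(scanned_pdf)
```

Calling `ocr_pdf("manual.pdf")` would leave the original scan untouched and write the searchable copy to `manual_ocr.pdf` in the same folder, mirroring the "save the results as a new file" step.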