Character Recognition and Electronic Document Management System: The Connection Established
A Vice-Chancellor of a university in Seattle is faced with a peculiar problem. He has received applications for various courses in the curriculum. They are all handwritten forms containing a lot of detail. To process the applications, however, he needs a mechanism to capture a few relevant fields from the forms and maintain a database of them. He gets the documents scanned at a nearby scan shop, which also indexes them by tagging each image file with a syntax that includes the name of the course applied for and the applicant’s initials. How does he capture the data trapped in those forms?
One answer to the above problem would be to enter the relevant details into a spreadsheet manually. This is a very labour-intensive activity that is also expensive and time-consuming. The data entry operator has to go through each scanned image and type the details into the spreadsheet software. Accuracy is another problem: because the task is manual, errors are to be expected.
The best way would be to adopt an electronic document management system that uses character recognition techniques. There are two types of character recognition.
OCR – Optical Character Recognition:
The software contains “templates” of possible characters. When the document scanner sees a letter, the software compares it to its library of pattern templates. If the letter matches a template precisely, the software translates it to the corresponding text character and sends the ASCII equivalent of the letter to the output file. Character matching is a very accurate OCR method if a scanning expert carefully controls the input. It is frequently used for automatic indexing of electronic documents.
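The template comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3x3 bitmaps, the `TEMPLATES` table and the function names are hypothetical, and real OCR software works with far larger templates and some tolerance for noise.

```python
# Hypothetical 3x3 binary templates ("1" = ink, "0" = blank).
# Real OCR template libraries are much larger and more detailed.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def match_glyph(glyph, templates=TEMPLATES):
    """Return the character whose template matches the glyph exactly, else None."""
    glyph = tuple(glyph)
    for char, template in templates.items():
        if glyph == template:
            return char
    return None


def recognize(glyphs):
    """Translate a sequence of glyph bitmaps to ASCII bytes; '?' marks no match."""
    text = "".join(match_glyph(g) or "?" for g in glyphs)
    return text.encode("ascii")


scanned = [
    ("111", "010", "010"),  # matches the "T" template
    ("010", "010", "010"),  # matches the "I" template
    ("110", "010", "010"),  # a smudged glyph that matches nothing
]
print(recognize(scanned))
```

Because the match must be exact, a single stray pixel is enough to make a glyph unreadable, which is why this method depends so heavily on well-controlled input.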
ICR – Intelligent Character Recognition:
ICR is an advanced form of optical character recognition (OCR), or more specifically a handwriting recognition system, that allows a computer to learn fonts and different styles of handwriting during processing in order to improve accuracy and recognition levels.
Most ICR software has a self-learning system, referred to as a neural network, which automatically updates the identification database with new handwriting patterns. It extends the usefulness of scanning devices for document processing from printed character recognition (a function of OCR) to the recognition of handwritten matter.
Once the documents are indexed and added to your systems, OCR (optical character recognition) facilitates content text searching. The accuracy of this process also depends on the reliability of the indexing process. OCR is a very effective technique, even if you are searching through thousands of files.
Recent improvements in OCR make the process very accurate (up to 99%), depending on the quality of the original document and of its scanned version. Most companies are happy to enjoy the benefits of OCR and content text search even with its imperfections.
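To give a feel for how content text searching can work once OCR has produced text for each scanned file, here is a minimal sketch of a word index in Python. The file names, the sample text and the functions `build_index` and `search` are assumptions made for illustration; a real document management system would use its own search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of file names whose OCR text contains it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output, keyed by scanned file name.
ocr_text = {
    "physics_jd.tif": "Application for admission to Physics",
    "history_ab.tif": "Application for admission to History",
}
index = build_index(ocr_text)
print(search(index, "physics application"))
```

The index is built once, so each search only looks up a few words instead of rereading every file, which is what keeps searching practical across thousands of documents.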
ICR is a more advanced version of OCR. Using artificial intelligence based on neural networks, ICR processes handwritten text using cues from the text itself. So, for instance, if a particular word is written in a certain way in more than one place, the system tries to understand the context and translate the text into Unicode or ASCII.
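One simple way to picture the use of context is to compare a recognized word against the values a form field is expected to contain. The Python sketch below does this with the standard library's `difflib`; it is a plain dictionary lookup and not a neural network, and the `COURSES` list and `correct_field` function are hypothetical names chosen for the example.

```python
import difflib

# Hypothetical list of values the "course applied for" field may contain.
COURSES = ["Physics", "History", "Chemistry", "Mathematics"]


def correct_field(raw, vocabulary=COURSES, cutoff=0.6):
    """Return the closest known value for a recognized word, or the word unchanged."""
    matches = difflib.get_close_matches(raw, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw


# A handwritten "i" misread as the digit "1" is still close to a known course.
print(correct_field("Phys1cs"))
```

A word that resembles nothing in the list is returned as it was read, so that an operator can review it rather than have the software guess.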
“Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers.”
“ASCII (American Standard Code for Information Interchange) is a character encoding based on the English alphabet.”
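The difference between the two encodings is easy to see in practice. The short Python sketch below reports the Unicode code point of a character, whether it falls in the ASCII range (code points 0 to 127), and its UTF-8 bytes; the `describe` function is a hypothetical helper written for this illustration.

```python
def describe(char):
    """Report the code point, ASCII membership and UTF-8 bytes of one character."""
    code_point = ord(char)
    return {
        "code_point": f"U+{code_point:04X}",
        "is_ascii": code_point < 128,
        "utf8_bytes": char.encode("utf-8"),
    }


print(describe("A"))  # a plain English letter, part of ASCII
print(describe("é"))  # an accented letter, outside ASCII
```

An applicant's name containing an accented letter therefore needs Unicode output, while plain English text fits in ASCII.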
We at Print2eforms offer the best electronic document management system using either type of character recognition, and we are adept at processing files of any complexity and converting them into electronic records. Please visit Print2eforms Electronic Document Management System for more info.