Getting the most out of an investment in scanning hardware and software should always be a priority. After all, these are typically not cheap investments, although the ROI can be outstanding if they are implemented properly.
In this post I would like to share some little-known, yet extremely useful, features that can dramatically improve forms processing automation and accuracy. I am occasionally asked about these features, and I believe that if more people knew they were available, it would improve efficiency in the capture process tremendously.
Multistream – Multiple versions of one captured image
The first feature I would like to explain is “Multistream”. As the name indicates, this means that for each image captured, the scanner can output two or more versions of the image. Why in the world would anyone want to do this, you ask? Good question. The answer is to improve Forms Processing data extraction accuracy. Typically, when people use Multistream, they output a color version of the image and a bitonal (black and white) version. The color version is stored to retain an electronic copy of the original document; this is the version humans retrieve and view. The bitonal version, however, is the one that capture technology such as OCR processes. Bitonal images are preferred for OCR because color is unnecessary for a computer to interpret pixels and might actually decrease accuracy.
As you can see in the image below, the OMR (Optical Mark Recognition – checkboxes), ICR (Intelligent Character Recognition – handwriting) and OCR (Optical Character Recognition – machine characters) fields are much cleaner on the bitonal image on the left, while the color image on the right is good for human viewing but not as good for capture and data extraction.
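To make the idea concrete, here is a minimal software sketch of Multistream in Python. It is only an illustration: real scanners and drivers produce the streams themselves, and the function names (`to_bitonal`, `multistream`), the fixed threshold of 128 and the tiny in-memory "scan" are all assumptions made for this example rather than any vendor's API.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def to_bitonal(image: Image, threshold: int = 128) -> List[List[int]]:
    """Return a 1-bit version of the image: 0 = black, 1 = white."""
    bitonal = []
    for row in image:
        out_row = []
        for r, g, b in row:
            # Perceived brightness using the ITU-R BT.601 luma weights.
            luminance = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(1 if luminance >= threshold else 0)
        bitonal.append(out_row)
    return bitonal


def multistream(image: Image, threshold: int = 128) -> Dict[str, object]:
    """Emit two streams from one capture (hypothetical helper)."""
    return {
        "color": [list(row) for row in image],    # archival copy for people
        "bitonal": to_bitonal(image, threshold),  # recognition copy for OCR
    }


# A tiny 1x3 "scan": white paper, black ink, pale yellow highlighter.
scan = [[(250, 250, 250), (20, 20, 20), (255, 255, 150)]]
streams = multistream(scan)
print(streams["bitonal"])  # [[1, 0, 1]]
```

The color stream is kept untouched for retrieval, while the bitonal stream keeps only the ink. Note that the pale highlighter disappears in the bitonal copy, which is the kind of cleanup that tends to help recognition.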
Dropout Color – Remove form background color
Another useful feature, either in conjunction with Multistream or on its own for certain types of forms, is called “Dropout Color”. This means that the scanning hardware, sometimes the scanner driver or even the capture application, can remove the form’s background color. In the image below, the Healthcare form is printed in red. This red color is a good way to guide the people completing the form to the areas where they should fill in information. However, the color is unnecessary for a computer reading this information via OCR, ICR or OMR. Therefore, we can “drop out” the color to expose only the information on the form that we really care about.
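One common way to approximate red dropout in software is to look only at the red channel before thresholding: red form ink reflects red light, so it appears almost as bright as the paper, while black or blue pen ink stays dark. The Python sketch below shows that idea under stated assumptions. The `dropout_red` name, the threshold of 128 and the sample pixel values are hypothetical, and how a given scanner or driver actually implements dropout will vary.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels


def dropout_red(image: Image, threshold: int = 128) -> List[List[int]]:
    """Bitonal image with red dropped out: 0 = black, 1 = white.

    Only the red channel is examined, so red form lines read as
    bright (white) and vanish, while dark ink stays black.
    """
    return [
        [1 if r >= threshold else 0 for r, _g, _b in row]
        for row in image
    ]


# One row of a form: white paper, red form line, black ink, blue ink.
form = [[(250, 250, 250), (220, 40, 40), (20, 20, 20), (30, 30, 160)]]
cleaned = dropout_red(form)
print(cleaned)  # [[1, 1, 0, 0]]
```

The red form line becomes white like the paper, and only the filled-in data (black and blue ink) survives for OCR, ICR or OMR. A form printed in a different color would need the matching channel instead.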
Forms Processing – Automatically extracting data from forms
After using Multistream and/or Dropout Color, as you can see in the image below, all the data you wish to capture is exposed in a neat manner that a computer can better understand and interpret. Combining these advanced features can certainly help improve your data capture automation and accuracy levels.
Gaining value by using tools available to you
Enabling these features is quite simple, so I encourage everyone to check whether these, or other features, are available in your document capture solution and might help improve productivity. These are just a few examples of using available functions to enhance the process. Within the entire capture process there are many techniques, functions and features that can be incorporated to make capture much more efficient.
What do you think? Are you getting the most out of your capture solution, or are there areas you could have improved had you known about capabilities such as Multistream or Dropout Color?