Getting the most out of your document capture solution – Multistream, color dropout for forms processing

Leveraging an investment in scanning hardware and software should always be a priority.  After all, these are typically not cheap investments, although the ROI can be outstanding if implemented properly.

In this blog I would like to share some little known, yet extremely useful, features that can dramatically improve forms processing automation and accuracy.  I am occasionally asked about these features and I believe if more people knew these were available then it would help improve efficiency in the capture process tremendously.

Multistream – Multiple versions of one captured image

The first feature I would like to explain is “Multistream”.  As the word would indicate, this means that for each image captured, the scanner can output two or more versions of the image.  Why in the world would anyone want to do this, you ask?  Good question, and the answer is to improve Forms Processing data extraction accuracy.  Typically when people use Multistream they will output a color version of the image and a bitonal (black and white) version of the image.  The color version is stored for the purpose of retaining an electronic version of the original document.  This is the version that humans retrieve and view.  The bitonal version, however, is the one that capture technology such as OCR processes.  Bitonal images are preferred for OCR because color is unnecessary for a computer to interpret pixels and might actually decrease the level of accuracy.

As you can see in the image below, the OMR (Optical Mark Recognition – checkboxes), ICR (Intelligent Character Recognition – Handwritten) and OCR (Optical Character Recognition – Machine characters) fields are much cleaner on the bitonal image on the left, while the color image on the right is good for human viewing but not as good for capture and data extraction.
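To make the idea concrete, here is a minimal sketch of what Multistream does conceptually: one capture goes in, a color version and a bitonal version come out.  Real scanners and drivers do this in hardware or firmware with far more sophisticated thresholding; the function names, the tiny sample page and the threshold of 128 are all assumptions made up for illustration.

```python
# Hypothetical illustration only: a "page" is a list of rows of (R, G, B) pixels.

def to_bitonal(gray_rows, threshold=128):
    """Return a bitonal copy: 0 (black) for dark pixels, 1 (white) otherwise."""
    return [[0 if value < threshold else 1 for value in row] for row in gray_rows]


def multistream(color_rows, threshold=128):
    """Produce two outputs from one capture: the color image and a bitonal version."""
    # Convert each pixel to a gray level using common luminance weights.
    gray = [
        [(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
        for row in color_rows
    ]
    return {"color": color_rows, "bitonal": to_bitonal(gray, threshold)}


page = [
    [(255, 255, 255), (20, 20, 20)],   # white paper, dark ink
    [(250, 200, 200), (0, 0, 90)],     # light pink form tint, dark blue pen
]
streams = multistream(page)
print(streams["bitonal"])  # the version intended for OCR, ICR and OMR
```

The color stream is kept untouched for people to view, while the bitonal stream keeps only the ink-versus-paper decision that recognition engines care about.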

Dropout Color – Remove form background color

Another useful feature, whether used in conjunction with Multistream or on its own for certain types of forms, is called “Dropout Color”.  This means that the scanning hardware, the scanner driver or even the capture application can remove the form’s background color.  In the image below, the form color for the Healthcare form is red.  This red color is a good way to guide the people completing these forms to the areas where information should be filled in.  However, this color is unnecessary for a computer to read the information via OCR, ICR or OMR.  Therefore, we can “dropout” the color to expose only the information on the form that we really care about.
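As a rough sketch of the principle, the snippet below turns every clearly red pixel white and leaves everything else alone.  This is not how any particular scanner or driver implements Dropout Color; the function name, the sample pixels and the `margin` value are assumptions chosen purely for illustration.

```python
# Hypothetical illustration only: pixels are (R, G, B) tuples with values 0-255.

def drop_out_red(color_rows, margin=60):
    """Replace pixels whose red channel clearly dominates with white."""
    white = (255, 255, 255)
    cleaned = []
    for row in color_rows:
        cleaned.append([
            # A pixel counts as "form red" when red exceeds both other channels by the margin.
            white if (r - max(g, b)) > margin else (r, g, b)
            for (r, g, b) in row
        ])
    return cleaned


form_row = [[(230, 60, 60), (15, 15, 15), (255, 255, 255)]]  # red box line, black ink, paper
print(drop_out_red(form_row))
```

The red form line disappears into the background, while the dark ink that was written on the form survives for recognition.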

Forms Processing – Automatically extracting data from forms

After using Multistream and/or Color Dropout, as you can see in the image below, all the data you wish to capture is exposed in a neat manner that a computer can better understand and interpret.  Combining these advanced features can certainly help improve your data capture automation and accuracy levels.

Gaining value by using tools available to you

Enabling these features is quite simple, so I encourage everyone to consider whether these, or other features, might be available in your document capture solution and might help improve productivity.  These are just a few examples of using available functions to enhance the process.  Within the entire capture process there are many techniques, functions or features that can be incorporated to make capture much more efficient.

What do you think?  Are you getting the most out of your capture solution or do you think that there are possibly areas of improvement had you known about capabilities such as Multistream or Color Dropout?

The logic of document capture

Indexing, Metadata, Keyword, SharePoint, Capture, Scanner, Documents, ECM, Content Management

What is wrong with the collection of words above?  Well, it’s a collection of terms that are closely related but have no logical structure, so they are of little value to anyone reading them.  For these words to be readable in context they need to be logically organized into a sentence.  The logic of document capture and Enterprise Content Management is much the same.  In this blog post, instead of going into the nuts and bolts of document capture, I thought it more important to discuss two components that are critical to the overall success, or failure, of your content management strategy.  These two critical components are taxonomy and metadata.  This is philosophy and not technology.

To break down document capture in its simplest form, just think of this as the process of extracting information from a document and making that information available in the future.  The future could be immediate where a scanned invoice, for example, immediately kicks-off a payment process.  Or it could be two weeks from now where a customer service agent needs to retrieve a signed airbill for a proof of delivery.  The point is that document retrieval is based on some unique keyword or a set of keywords related to a particular document.  In the case of the invoice it could have been the invoice number and in the case of the airbill it could have been the shipping tracking number.

Without a well thought-out strategy, your organization may simply take an organized paper mess and convert it into an electronic mess.

Establish a well thought-out taxonomy

Taxonomy is defined as classifying organisms into groups based on similarities; in document capture, the same idea applies to classifying documents.  Why is taxonomy relevant for document capture?  For several reasons, including security, quicker access to information and retention policies.  If you work backwards when deciding how to implement your document capture solution and what technology to use, a solid consensus on the end result is of paramount importance.  The end result is typically a high-quality scanned image conducive to data capture (OCR, ICR, OMR, bar code, etc.) and the metadata itself.  So if your taxonomy is methodically organized, it should make your document capture strategy fairly obvious.

Let’s take security as a benefit of a well thought-out taxonomy strategy.  By segregating documents based on a logical taxonomy, organizations are afforded an additional level of comfort knowing that a set of security policies can be applied to, for example, Human Resources documents, while everyone is allowed access to a general set of scanned documents such as the café menu, which is clearly not a sensitive document.

Another benefit of a well thought-out taxonomy is quicker access to information for users.  Many content management software applications and search engines use a ‘crawl’ method to check newly added content and add it to an index (database) which is then searchable.  Common sense dictates that ‘crawling’ a narrower scope keeps the database up-to-date more quickly, and access times could also be considerably shorter because a search covers only the relevant indexed data rather than the entire database.

Lastly, in regard to retention policies, having your data well organized is a major benefit.  Imagine that an organization has all of its tax documents properly stored electronically via a well thought-out taxonomy in its content management system.  The organization can then easily, and within corporate governance standards and policies, remove these images from its repository based on a retention schedule.  So, as illustrated, investing the time to develop a strong taxonomy is important for many reasons including security, searchability and retention.

It is extremely important not to overlook this concept when planning out a document capture strategy.  A simple taxonomy might be organized like below:

  • Accounting
    • Accounts Receivable
      • Check
      • Statement
    • Accounts Payable
      • Invoice
      • Receipt
  • Human Resources
    • Applications
    • Resumes
    • W2 Forms
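For readers who think in code, the taxonomy above could be sketched as a nested structure, with security and retention policies attached to branches and inherited by everything beneath them.  The retention periods and access groups below are invented placeholders, not recommendations, and the helper names are hypothetical.

```python
# The taxonomy from the list above, as a nested structure.
TAXONOMY = {
    "Accounting": {
        "Accounts Receivable": ["Check", "Statement"],
        "Accounts Payable": ["Invoice", "Receipt"],
    },
    "Human Resources": ["Applications", "Resumes", "W2 Forms"],
}

# Hypothetical policies, attached at the department level for illustration.
POLICIES = {
    ("Accounting",): {"retention_years": 7, "access": "finance"},
    ("Human Resources",): {"retention_years": 5, "access": "hr"},
}


def document_paths(node, prefix=()):
    """Yield the full path of every document type (leaf) in the taxonomy."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from document_paths(child, prefix + (name,))
    else:
        for leaf in node:
            yield prefix + (leaf,)


def policy_for(path):
    """Return the policy of the closest ancestor that defines one, or None."""
    for depth in range(len(path), 0, -1):
        policy = POLICIES.get(path[:depth])
        if policy:
            return policy
    return None


paths = list(document_paths(TAXONOMY))
for path in paths:
    print(" > ".join(path), policy_for(path))
```

Because policies hang off branches rather than individual documents, a newly introduced document type only needs to be placed in the right branch to pick up the right security and retention rules.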


Considering a well thought-out strategy might seem cumbersome in the initial stages of establishing your document capture strategy, but it can save organizations significant time, money and aggravation in the long-run.  As a best document capture practice it is important to establish a solid taxonomy for scanned documents and also re-evaluate the strategy as it relates to taxonomy as any new documents are introduced within your organization.

 

Consider what information is important, and what is not

Creating Searchable PDFs is one form of document capture; however, it is not always an ideal document capture strategy.  While in certain situations creating Searchable PDF images of your scanned documents is the right approach for an organization, this technique often creates inefficiencies.  You might be asking yourself how creating a fully Searchable PDF with all the words of the document indexed could be construed as inefficient.  Let me elaborate.  When creating a Searchable PDF the scanning software does its best to recognize every single character and every single word on a page.  This might sound appealing but let’s consider the possible results in real-world applications.  Imagine that an organization in the insurance business scans as few as 100 single-page documents and creates Searchable PDF documents.  Then a user wants to retrieve a document based on a keyword, so they use the word “claim” as the search criterion.  As you can imagine, the user would most likely be presented with a long set of links to possible documents, but only one is the document they are looking for and the rest are “irrelevant search” results.  This is because the entire page was indexed via the Searchable PDF method.  Alternatively, if your data capture strategy had extracted only the “relevant search” terms that apply to a particular document, the organization would be much more efficient because users find the data they have requested much more quickly, with the first search.
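The difference between the two approaches can be sketched in a few lines.  The three sample documents, their text and the `claim_number` field are all invented for illustration; a real repository would use a proper index rather than scanning a dictionary.

```python
# Hypothetical mini-repository: each document has full OCR text plus extracted fields.
documents = {
    "doc-001": {
        "text": "claim form for policy holder claim number 4471",
        "fields": {"claim_number": "4471"},
    },
    "doc-002": {
        "text": "letter regarding your recent claim and claim status",
        "fields": {"claim_number": "9020"},
    },
    "doc-003": {
        "text": "how to file a claim brochure",
        "fields": {},
    },
}


def full_text_search(term):
    """Searchable-PDF style: match any document whose text contains the word."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if term in doc["text"].split()
    )


def field_search(field, value):
    """Metadata style: match only documents whose extracted field equals the value."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if doc["fields"].get(field) == value
    )


print(full_text_search("claim"))             # every document mentions "claim"
print(field_search("claim_number", "4471"))  # only the document being looked for
```

With only three documents the full-text search already returns everything; at 100 or 100,000 documents the gap between the two result lists only grows.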

One of the other significant benefits of an integrated document capture/content management strategy is that metadata fields created, and rules applied, in the content management system can often be brought forward and applied in the document capture system itself.  For example, if an organization’s policy dictates that on a healthcare insurance form the social security number metadata field is required and must be exactly nine numeric characters long, then these rules can be enforced directly in the document capture system.  This allows for great business continuity and consistency in your data capture process.
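The social security number rule described above might look something like the following at capture time.  This is a sketch under assumptions: the `FIELD_RULES` structure and the `validate` function are hypothetical and do not reflect the configuration format of any specific capture or content management product.

```python
import re

# Hypothetical rule set, as it might be brought forward from the content management system.
FIELD_RULES = {
    "ssn": {"required": True, "pattern": r"[0-9]{9}"},  # exactly nine numeric characters
}


def validate(metadata):
    """Return a list of rule violations for one captured document's metadata."""
    errors = []
    for field, rule in FIELD_RULES.items():
        value = metadata.get(field, "")
        if not value:
            if rule["required"]:
                errors.append(f"{field}: required")
            continue
        # fullmatch ensures the whole value matches, not just a part of it.
        if not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: invalid format")
    return errors


print(validate({"ssn": "123456789"}))  # passes
print(validate({"ssn": "12345-789"}))  # wrong format
print(validate({}))                    # missing required field
```

Catching a bad value at the point of capture means the operator can correct it while the paper is still in hand, instead of discovering the problem later in the repository.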

An analogy I like to use is an internet search.  Go to your favorite internet search engine and enter a vague term such as “taxonomy for document capture” and you will get a long list of ‘hits’ that probably are not of interest, because you might be looking for a specific piece of information, or a scanned image.  On the contrary, if you enter a more specific term such as “aim document taxonomy”, the search is narrowed down to a more relevant list of the information you are searching for.  This is an example of relevant search versus irrelevant search and it’s all related to applying metadata to web pages, electronic documents and, yes, especially scanned images.

Summary: Organized taxonomy + relevant metadata = Efficient process

In summary, my point is to carefully plan out your document capture process.  Pay close attention to developing an effective taxonomy for your documents.  Determine what information is important on a particular document and what is not.  Document capture technology has evolved to nearly magical proportions, but the truth is that organizations can still greatly improve their efficiency and content management effectiveness through careful planning; after all, there still is logic to document capture.

Do you have thoughts on the topic of document capture, taxonomy or classification?  Please share your comments.

Capture begins with process


As a prelude to an upcoming series of blog posts on the topic of “Building an effective capture solution”, I want to focus on the question of ‘where do I start if I want to build an effective capture solution?’.

More education, less self promotion

With information capture being such an obvious way to decrease operational costs, increase efficiency, reduce risk and assist with compliance, it raises the question of why everyone isn’t using capture.  I think the answer lies in the fact that as an industry we have done a disservice to our community.  Every vendor’s product is the best *sarcasm*.  Everyone can offer the complete solution *eyeroll*.  Vendors compete for business on a list of features instead of a genuine desire to help their customers become more productive *disgust*.  Of course this is a generalization and not every vendor, or person, is so self-centered, but my point is that resources such as the AIIM community, which is rich in educational information and maintains a genuine vendor-neutral stance, are too few and far between.  We need to break down the components of a capture solution to their lowest common denominator and share with others how to achieve an effective capture solution so that everyone can benefit from a technology that has a proven track record of success.  Breaking down the components of a capture solution involves three basic parts:  User Interface, Processing and Storage.  It’s really that simple.  Of course this is an oversimplification but those are the basic three components.

Eating my own dog food

Having spent nearly my entire professional career in the document capture/ECM industry you would think that someone like me might suggest that a ‘solution’ starts with consideration of capture hardware or capture software.  Not true.  An effective capture solution, to the contrary, does not start with capturing information from an image.  Rather it starts with a well-defined process.  Capture is an extension of a process that makes things more efficient.

To give some specific examples, I would like to take four different business processes and break down the ‘Activity’, as it might happen in a manual process, and the ‘Benefit’, which is the result we are trying to achieve.  You will notice, while it’s pretty obvious, that the ‘Activity’ in each case can be slow, costly and inefficient, yet many organizations continue to operate in this fashion because it’s the traditional way of doing business.  However, if you truly consider the ‘Benefit’ and know that for each ‘Process’ example below there are well established document capture solutions that can drastically improve these processes, then hopefully this will drive more adoption of such a fantastic technology:

Process | Activity | Benefit
Contact Management | Typing the information from a Business Card into Contact Relationship database | You want to be able to organize and retrieve contact details
Expense Management | Entering the information from a receipt into an Accounts Payable system | You want to get reimbursed for your expense
Invoice Management | Manual Data Entry of vendor, terms and total information into ERP application | The organization would like to realize pre-pay discounts
Inventory Management | Keying the line item details from a Packing List into inventory system | The business can be more efficient by making product available for sale quicker


Building an effective capture solution:

Part 1 of 3 (User Experience/Device/Interface)
Part 2 of 3 (Capture/Processing/Transformation)
Part 3 of 3 (Storage/Business Policy/Workflow)

 

Your killer SaaS app

Is your SaaS value proposition convincing enough without automatic data entry? 
Imagine you’ve just created the next ‘killer’ Software as a Service (SaaS) app and you are absolutely convinced your new software service is going to revolutionize a particular industry or solve a significant pain point for organizations all over the world.  You create some compelling sales and marketing materials with a heavy emphasis on Return on Investment.  After all, you have conviction that your service is going to help businesses decrease operational costs, improve worker productivity and provide much better access to information which all translates to achieving tangible payback on your customer’s technology investment.
So you’ve done your research, you’ve developed the software application; you created awesome marketing materials, assembled a sales team and created a terrific support structure but for some reason your totally revolutionary SaaS application just isn’t selling as well as you had hoped.  Do you think that you might be overlooking a feature or function that is so fundamental to providing tangible Return on Investment that customers simply cannot say “No” to immediately deploying your innovative solution?
Time is money
I might really be overstating the obvious but employers pay employees to work, not do data entry.  Whether your core expertise is in accounting, customer service or mechanical work, your employer pays you to spend a majority of your time focusing on your respective skills.  However, organizations often overlook the total amount of time that is consumed by tedious activities such as manually entering data from a bank statement into an accounting system, or how many total hours field service technicians spend collecting and entering work order data into an ERP system.  These are real, tangible costs that the organization is paying.  This directly relates to unrealized business productivity and affects the financial bottom line significantly.  Time is money and time spent manually entering data into systems is, quite frankly, a waste.
Use cases for Information Capture
Let’s take a look at a few use case scenarios and focus on Mobile Information Capture, specifically, since there is a lot of interest in this area and there is an abundance of data to support that this is one of the greatest opportunities to achieve quick return on investment.
First, consider the Field Service industry.  In a November 2011 study entitled “A Study of the Mobile Capture Marketing in the United States”, Dave Wood of Harvey Spencer Associates (HSA) cites DF Blumberg Associates as sizing the Field Service market at $225 billion in 2011 and growing to $500 billion by 2018, with nearly half of the 3 million workers using mobile productivity solutions by then.  Since a good majority of these mobile devices will most likely be equipped with a camera, this translates directly into a great opportunity to provide these workers with the ability to nearly effortlessly snap pictures of items such as work order signatures, checks for payment, assessment photos or even invoices and then automatically have the data extracted from these images to populate database fields in a Field Service SaaS application.  To name just a few of the Field Service benefits of Mobile Capture: enhanced customer service, the ability to realize payments quicker and, of course, improved overall worker efficiency.
In a second use case scenario, also taking data from the same Mobile Capture Market survey, consider the Transportation industry.  For the survey, they focused on Long Haul Trucking.  They found that this particular market featured 1.9 million trucks and 1.7 million deliveries daily.  The research showed that each delivery generated a packet of documents that must be captured for invoicing, with an average of 5 pages per packet.  This translated into a total capture volume of this market of 8.5 million documents PER business day.  The types of items that needed to be captured will slightly vary depending on the particular trucking organization, yet generally documents such as Bills of Lading, Trip Sheets, Scale Tickets and Vehicle Expense Receipts were common amongst most organizations.  After some calculation of the projected number of drivers that will have access to dedicated scanners or multifunction devices, the survey predicted that approximately 400,000 drivers will have only smart phones as their primary capture device.  This presents a terrific opportunity to capture all these documents DURING the trip instead of waiting until the trip is complete which could be days, or even weeks later.
The last use case scenario shared by the HSA survey was general Capture to Cloud.  This was predicted to be, by far, the largest growth opportunity for Mobile Capture and anyone would be hard pressed to argue with this prediction.  With the prediction of 2 billion smart phones by 2018 and cloud storage vendors competing like crazy for market share, it only stands to reason that these factors are going to contribute to huge growth for Capture to Cloud applications using mobile devices.
Bringing easy to use, yet highly-effective Ubiquitous Information Capture into the mix
Now you have your killer SaaS app ready for prime time.  Your story is polished and you are earning business because your SaaS application is addressing customer pain points such as decreasing operational costs, improving worker productivity and providing better access to information.  You can prove, without a doubt, a tangible Return on Investment with reduced labor costs associated with manual data entry and you recognize the unbelievable potential in the Mobile Capture market, so it raises the question, ‘what do you do to make your SaaS application even more appealing to potential customers?’
‘Add Data Capture to your SaaS’ is the answer.  It’s really that simple.  Over the past couple of years the technology has evolved to offer extremely advanced features and functions that are completely transparent to the users themselves.  This helps achieve a pleasant user experience, which helps drive adoption of the solution among users.  Additionally, the behind-the-scenes technology is performing tasks traditionally done by humans so the processing is highly effective from an automation standpoint.  The user simply snaps a picture and this technology can automatically recognize the type of document and intelligently extract all the information from the image.
With this new Data Capture capability not only will your SaaS application provide a much more elegant user experience but you can absolutely guarantee cost savings to your customers with the quantifiable amount of time that is recouped by not having users do manual data entry.  The benefits of your SaaS can be incrementally increased with this new Data Capture capability.  Overall you can offer a truly appealing ROI story before you even begin to discuss all the wonderful capabilities of your particular application.  The additional features are just like icing on the cake to solidify the sale.

Total Hours x Dollars per Hour = Tangible Cost Savings
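As a quick worked example of the formula above, with made-up numbers: assume 10 employees who each recoup 2 hours of manual data entry per week over 50 working weeks, at a loaded cost of $25 per hour.  Every figure here is a hypothetical placeholder to be replaced with your customer's own data.

```python
def tangible_cost_savings(total_hours, dollars_per_hour):
    """Total Hours x Dollars per Hour = Tangible Cost Savings."""
    return total_hours * dollars_per_hour


# Hypothetical inputs for illustration only.
employees = 10
hours_saved_per_week = 2
working_weeks = 50
dollars_per_hour = 25

total_hours = employees * hours_saved_per_week * working_weeks
savings = tangible_cost_savings(total_hours, dollars_per_hour)
print(f"{total_hours} hours recouped, ${savings:,} saved per year")
```
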

A proven ROI story achieves a few things in your favor as the software vendor of choice:
* Encourages your customers to make a quicker decision on purchase and implementation of your solution because every day they choose not to make a decision they are squandering money and resources
* Helps differentiate your application from competitors with valuable business functionality that makes the user experience much more enjoyable and helps drive higher adoption rates
* The likelihood of selling more subscriptions to your customers is higher because they can justify adding more licenses due to the fact that they have proven ROI
So, are you ready to take your killer SaaS app to the next level with Ubiquitous Information Capture?