EMC/Documentum ApplicationXtender presentation/webinar

This is a presentation I created and delivered in conjunction with our software technology partner, EMC Corporation, on the topic of document scanning, capture and the subsequent use of captured documents in their document management system, ApplicationXtender.

Click here to view or download the PDF file.

The logic of document capture

Indexing, Metadata, Keyword, SharePoint, Capture, Scanner, Documents, ECM, Content Management

What is wrong with the collection of words above?  They are closely related terms, but without a logical structure they are of little value to anyone reading them.  To be readable and to convey context, the words need to be logically organized into a sentence.  The logic of document capture and Enterprise Content Management is much the same.  In this blog post, instead of going into the nuts and bolts of document capture, I thought it more important to discuss two components that are critical to the success, or failure, of your content management strategy: taxonomy and metadata.  This is philosophy and not technology.

To break down document capture into its simplest form, think of it as the process of extracting information from a document and making that information available in the future.  The future could be immediate, where a scanned invoice, for example, immediately kicks off a payment process.  Or it could be two weeks from now, when a customer service agent needs to retrieve a signed airbill as proof of delivery.  The point is that document retrieval is based on a unique keyword, or a set of keywords, related to a particular document.  In the case of the invoice it could be the invoice number, and in the case of the airbill it could be the shipping tracking number.

Without a well thought-out strategy, your organization may simply take a paper mess and convert it into an electronic mess.

Establish a well thought-out taxonomy

In biology, taxonomy is defined as classifying organisms into groups based on similarities.  Why is taxonomy relevant for document capture?  For several reasons, including security, quicker access to information and retention policies.  If you work backwards when deciding how to capture documents, and what technology to implement, a solid consensus on the end result is of paramount importance.  The end result is typically a high-quality scanned image conducive to data capture (OCR, ICR, OMR, bar code, etc.) and the metadata itself.  If your taxonomy follows an organized methodology, it should make your document capture strategy fairly obvious.

Let’s take security as the first benefit of a well thought-out taxonomy.  By segregating documents based on a logical taxonomy, organizations are afforded an additional level of comfort, knowing that a set of security policies can be applied to, for example, Human Resources documents, while everyone is still allowed access to a general set of scanned documents such as the café menu, which is clearly not a sensitive document.

Another benefit of a well thought-out taxonomy is quicker access to information for users.  Many content management software applications and search engines use a ‘crawl’ method to check newly added content and add it to an index (database), which is then searchable.  Common sense dictates that ‘crawling’ a narrower scope keeps the index up to date more quickly, and access times can also be considerably shorter because a search covers only the relevant indexed data rather than the entire database.

Lastly, having your data well organized is a major benefit for retention policies.  Imagine that an organization has all of its tax documents properly stored electronically via a well thought-out taxonomy in its content management system.  The organization can then easily, and within corporate governance standards and policies, remove these images from its repository based on a retention schedule.  So, as illustrated, investing the time to develop a strong taxonomy is important for many reasons, including security, searchability and retention.

It is extremely important not to overlook this concept when planning a document capture strategy.  A simple taxonomy might be organized like the one below:

  • Accounting
    • Accounts Receivable
      • Check
      • Statement
    • Accounts Payable
      • Invoice
      • Receipt
  • Human Resources
    • Applications
    • Resumes
    • W2 Forms
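To make the link between taxonomy, security and retention concrete, here is a minimal Python sketch of the taxonomy above as a nested structure.  The category names come from the list; the access groups, the retention periods and the helper name `find_policy` are purely illustrative assumptions, not settings from any particular content management product.

```python
# Category names mirror the sample taxonomy; the "policy" values
# (access groups, retention years) are hypothetical examples.
TAXONOMY = {
    "Accounting": {
        "policy": {"access": ["finance"], "retention_years": 7},
        "children": {
            "Accounts Receivable": {"children": {"Check": {}, "Statement": {}}},
            "Accounts Payable": {"children": {"Invoice": {}, "Receipt": {}}},
        },
    },
    "Human Resources": {
        "policy": {"access": ["hr"], "retention_years": 5},
        "children": {"Applications": {}, "Resumes": {}, "W2 Forms": {}},
    },
}


def find_policy(doc_type, nodes=TAXONOMY, path=(), inherited=None):
    """Return (path, policy) for a document type, or None if it is unknown.

    A node without its own policy inherits the nearest policy above it.
    """
    for name, node in nodes.items():
        policy = node.get("policy", inherited)
        here = path + (name,)
        if name == doc_type:
            return here, policy
        found = find_policy(doc_type, node.get("children", {}), here, policy)
        if found is not None:
            return found
    return None


path, policy = find_policy("Invoice")
print(" / ".join(path))  # Accounting / Accounts Payable / Invoice
print(policy)
```

Because the policy is attached once at the top-level category and inherited below it, a new document type placed under an existing branch picks up the right security and retention rules without any extra work.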


Developing a well thought-out taxonomy might seem cumbersome in the initial stages of establishing your document capture strategy, but it can save organizations significant time, money and aggravation in the long run.  As a document capture best practice, establish a solid taxonomy for scanned documents and re-evaluate it whenever new document types are introduced within your organization.


Consider what information is important, and what is not

Creating Searchable PDFs is one form of document capture; however, it is not always the ideal strategy.  In certain situations, creating Searchable PDF images of your scanned documents is the right approach, but the technique often creates inefficiencies.  You might be wondering how a fully Searchable PDF, with every word of the document indexed, could be construed as inefficient.  Let me elaborate.  When creating a Searchable PDF, the scanning software does its best to recognize every single character and every single word on a page.  This might sound appealing, but let’s consider the results in real-world applications.  Imagine that an organization in the insurance business scans as few as 100 single-page documents and creates Searchable PDF documents.  Later, a user wants to retrieve one of those documents by keyword and enters the word “claim” as the search criterion.  As you can imagine, the user would most likely be presented with a long list of links to possible documents, of which only one is the document they are looking for; the rest are “irrelevant search” results.  This is because the entire page was indexed via the Searchable PDF method.  Alternatively, if your data capture strategy had extracted only the “relevant search” terms that apply to a particular document, the organization would be much more efficient, because users could find the data they requested with the first search.
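The difference between the two approaches can be sketched in a few lines of Python.  Everything here is a made-up illustration: the three sample documents, the `claim_number` field and both function names are assumptions, with `text` standing in for full-page OCR output and `metadata` for the few fields chosen as relevant at capture time.

```python
# Hypothetical scanned insurance documents.
DOCUMENTS = [
    {"id": 1,
     "text": "Claim form submitted by the policy holder after a collision.",
     "metadata": {"claim_number": "C-1001"}},
    {"id": 2,
     "text": "Letter acknowledging receipt of your claim.",
     "metadata": {"claim_number": "C-1002"}},
    {"id": 3,
     "text": "Adjuster notes: claim approved pending repair estimate.",
     "metadata": {"claim_number": "C-1003"}},
]


def full_text_search(term):
    """Match the term anywhere in the recognized page text."""
    return [d["id"] for d in DOCUMENTS if term.lower() in d["text"].lower()]


def metadata_search(field, value):
    """Match only on a specific captured index field."""
    return [d["id"] for d in DOCUMENTS if d["metadata"].get(field) == value]


print(full_text_search("claim"))                   # [1, 2, 3]
print(metadata_search("claim_number", "C-1002"))   # [2]
```

With only three documents the full-text search already returns every one of them, while the search on the captured field returns the single document the user wanted.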

Another significant benefit of an integrated document capture/content management strategy is that metadata fields created, and rules applied, in the content management system can often be brought forward and applied in the document capture system itself.  For example, if an organization’s policy dictates that, on a healthcare insurance form, the social security number metadata field is required and must be exactly nine numeric characters long, then these rules can be enforced directly in the document capture system.  This allows for great business continuity and consistency in your data capture process.
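As a rough illustration of the kind of rule described above, the following Python sketch checks a social security number field at capture time.  The function name and the error messages are assumptions made for this example; real capture products expose such rules through their own configuration rather than through code like this.

```python
import re


def validate_ssn_field(value):
    """Return a list of rule violations; an empty list means the value passes."""
    # Rule 1: the field is required.
    if value is None or value.strip() == "":
        return ["social security number is required"]
    # Rule 2: exactly nine characters, all of them digits 0-9.
    if not re.fullmatch(r"[0-9]{9}", value):
        return ["social security number must be exactly nine numeric characters"]
    return []


print(validate_ssn_field("123456789"))    # []
print(validate_ssn_field("123-45-6789"))  # one violation: not nine digits
print(validate_ssn_field(""))             # one violation: required
```

Rejecting a bad value at the scanner or indexing station means the operator can correct it while the paper is still in hand, instead of discovering the problem later in the repository.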

An analogy I like to use: go to your favorite internet search engine and enter a vague term such as “taxonomy for document capture”.  You will get a long list of ‘hits’ that are probably not of interest, because you are looking for a specific piece of information or a scanned image.  In contrast, if you enter a more specific term such as “aim document taxonomy”, the search is narrowed down to a more relevant list of the information you are looking for.  This is an example of relevant search versus irrelevant search, and it all comes down to applying metadata to web pages, electronic documents and, yes, especially scanned images.

Summary: Organized taxonomy + relevant metadata = Efficient process

In summary, my point is to carefully plan out your document capture process.  Pay close attention to developing an effective taxonomy for your documents.  Determine what information is important on a particular document and what is not.  Document capture technology has evolved to nearly magical proportions, but the truth is that organizations can still greatly improve their efficiency and content management effectiveness through careful planning; after all, there is still logic to document capture.

Do you have thoughts on the topic of document capture, taxonomy or classification?  Please share your comments.

Capture begins with process

As a prelude to an upcoming series of blog posts on the topic of “Building an effective capture solution”, I wanted to focus on one question: ‘Where do I start if I want to build an effective capture solution?’

More education, less self promotion

With information capture being such an obvious way to decrease operational costs, increase efficiency, reduce risk and assist with compliance, it raises the question: why wouldn’t everyone be using capture?  I think the answer lies in the fact that, as an industry, we have done a disservice to our community.  Every vendor’s product is the best *sarcasm*.  Everyone can offer the complete solution *eyeroll*.  Vendors compete for business on a list of features instead of a genuine desire to help their customers become more productive *disgust*.  Of course this is a generalization, and not every vendor, or person, is so self-centered, but my point is that resources such as the AIIM community, which is rich in educational information and maintains a genuinely vendor-neutral stance, are too few and far between.  We need to break down the components of a capture solution to their lowest common denominator and share with others how to achieve an effective capture solution, so that everyone can benefit from a technology that has a proven track record of success.  Breaking down a capture solution yields three basic parts:  User Interface, Processing and Storage.  It’s really that simple.  Of course this is an oversimplification, but those are the three basic components.

Eating my own dog food

Since I have spent nearly my entire professional career in the document capture/ECM industry, you might think that I would suggest that a ‘solution’ starts with consideration of capture hardware or capture software.  Not true.  An effective capture solution, to the contrary, does not start with capturing information from an image.  Rather, it starts with a well-defined process.  Capture is an extension of a process that makes things more efficient.

To give some specific examples, I would like to take four different business processes and break down the ‘Activity’, as it might happen in a manual process, and the ‘Benefit’, which is the result we are trying to achieve.  You will notice, although it is fairly obvious, that the ‘Activity’ in each case can be slow, costly and inefficient, yet many organizations continue to operate this way because it is the traditional way of doing business.  However, if you truly consider the ‘Benefit’, and know that for each ‘Process’ below there are well-established document capture solutions that can drastically improve it, then hopefully this will drive more adoption of such a fantastic technology:

Process | Activity | Benefit
Contact Management | Typing the information from a Business Card into Contact Relationship database | You want to be able to organize and retrieve contact details
Expense Management | Entering the information from a receipt into an Accounts Payable system | You want to get reimbursed for your expense
Invoice Management | Manual Data Entry of vendor, terms and total information into ERP application | The organization would like to realize pre-pay discounts
Inventory Management | Keying the line item details from a Packing List into inventory system | The business can be more efficient by making product available for sale quicker

Building an effective capture solution:

Part 1 of 3 (User Experience/Device/Interface)
Part 2 of 3 (Capture/Processing/Transformation)
Part 3 of 3 (Storage/Business Policy/Workflow)


The Rise Of Networked Scanning

Business Solutions, September 2008

Written by: Vicki Amendola

The adoption of networked scanning is on the rise, and document imaging VARs should prepare to cash in on the opportunity.

Converting paper documents into digital data isn’t an earth-shattering phenomenon anymore. Instead, document imaging can finally claim a firm foothold as a proven strategy for VARs to use with customers struggling to improve operational efficiency and productivity, reduce administrative burdens and costs, and even achieve compliance with governmental regulations. The trend that continues to enfold the document scanner market is a migration that draws the technology from a centralized, backroom process to points much closer to document creation in distributed, or workgroup, scanning solutions.

Most analysts and research firms that cover the document imaging market agree that distributed scanning applications have become — and are predicted to remain — the dominating segment of the scanner market. Network scanners are a subcategory of this segment and, although not yet recognized as a stand-alone hardware segment, network scanning is showing significant growth year over year. A recent report from InfoTrends, a research firm that provides in-depth analysis of the document scanner market, supports the premise that network scanning is on the rise, making it fertile ground for imaging VARs. The group’s U.S. Document Imaging Scanner Survey Report: 2007 illustrates a 112% increase in network scanning use over the last three years, from a starting point of 16% in 2004 to 34% in 2007.

Now Is The Time To Sell Network-Enabled Hardware
Network scanning hardware has imaging specifications nearly identical to the dedicated scanner models found in the desktop or workgroup segments. However, the trend in imaging is bringing network connectivity into the mix, with additional network-capable scanner models being released each year. These scanners reside directly on a company’s network, rather than being attached to a dedicated PC. “Network scanning provides obvious advantages, such as those we’ve grown accustomed to with network-attached printers,” says Kevin Neal, product manager at Fujitsu.

Neal’s example of a networked printer highlights the ability for VARs to integrate a vital piece of productivity equipment directly into a customer’s network, enabling the device to be shared and accessed by multiple individuals as part of that network. Shared devices reduce the cost of the solution, a primary sales objection, by reducing the total number of devices needed. In addition, deploying fewer devices can lead to reduced maintenance requirements and can even help to land sales in cases where conserving valuable office space is a primary concern. “While networked printing has become commonplace and has become very beneficial as an efficient output device, this connectivity is now being leveraged to input information into a company’s computer systems via scanning/imaging technology,” says Neal.

For some companies, high-end digital copiers and MFPs (multifunction peripherals) have provided an introduction to the basic concept of network scanning. According to a recent IDC report, 1.54 million scan-enabled MFPs shipped in 2007. The trend has not gone unnoticed by the ISVs (independent software vendors) in the document imaging arena. Many ISVs have recognized these devices as another source of capture and, as the corporate office environment embraced the MFP, these ISVs developed solutions to capitalize on the opportunity.

Satisfy Ease Of Use And Security With Networked Scanners
Despite the applicability of the MFP as a networked scanner, it still can’t compete with a dedicated networked scanner in most cases where document imaging is the primary emphasis of a reseller’s solution. “Frequency, complexity, and larger scanning jobs tend to drive more dedicated scanning equipment for individuals or workgroups,” says John Capurso, VP of marketing at Visioneer. A dedicated network scanner eliminates the competition that can be experienced with an MFP-based solution, such as waiting for a large print job to finish before being able to scan a document to e-mail or file. In addition, despite all the advances being made on higher-end MFPs, a dedicated device can still be easier to use.

“Ease of use is a critical selling point for customers that have multiple users with different levels of technical expertise using the scanner,” says Jackie Horn, director of worldwide marketing at BÖWE BELL + HOWELL. “VARs are leveraging user-friendly touch screens and built-in features [such as one-button scanning] to make life easier for end users to simply walk up to the scanner and scan.” Many network scanners available today are incorporating much bigger touch screens than earlier models — some as large as 8 inches across — to promote ease of use. These larger screens provide a GUI (graphical user interface) on which the user can not only select scanning options, but also preview the scanned image and even enter basic indexing information.

Security is also a driving force behind the adoption of networked scanning, and it is occurring at both the device and document level. At the document creation level, network scanning is beginning to incorporate encryption capabilities to enable the creation of secure image files. For example, scanning to encrypted PDF can prevent unauthorized individuals from viewing the document. At the device level, user authentication can take many forms, including user password or even fingerprint and other biometric technologies. These options can satisfy access control by restricting device usage and can also provide audit trails by recording which authorized users have accessed the scanner and which company information was created or viewed on the device.

Networked Scanners Can Support ECM Solutions
Another trend in the network scanning market is the growing availability of SDKs (software development kits) that can be used to run customized document management systems right from the network scanner. “Although well-suited for ad hoc scanning, one-touch scan-to-job buttons on the network scanners enable VARs to establish dedicated buttons that can trigger specific workflow processes, delivering the combination of more scanning power and functionality with simpler operation,” says Michael Oliva, manager of product marketing, Canon USA. “Incorporating various connectors to third-party applications, such as SharePoint or RightFax, can simplify integration and enhance interface options between the network scanners and various document management systems.”

In some cases, network scanning has become a way for VARs to enhance existing document management systems or even form the nucleus of brand new ones. “VARs have the ability to bring the entire system architecture together: network scanner, connectivity, servers, ECM (enterprise content management) applications, workflow, access rights, and document life cycle,” says Visioneer’s Capurso. “And since every organization has different requirements, the opportunity is there to make all the components come together and function reliably.” Just as with distributed capture implementations, VARs should leverage network scanning to continue pushing the point of capture even closer to the point of document creation. Doing so will help customers realize the benefits of increased ease of use, increased information security, increased productivity and efficiency, and perhaps what is at the top of most customers’ minds today, reduced costs.

– See more at: http://www.bsminfo.com/doc/The-Rise-Of-Networked-Scanning-0001