A “cloudy” future for document capture

Hearing a phrase such as “cloudy future” immediately conjures up bad thoughts and gloom-and-doom scenarios. However, in the case of document capture, “cloud computing” is bringing extremely positive change. In this post I would like to break down the basic components of “cloud computing” and explain why document capture into “the cloud” is appealing for several reasons, including scalability, interoperability and usability. Simply put, the “cloud” = Infrastructure + Content + Users. Using cloud computing is not magical or mysterious, yet it is a topic of great discussion and, might I say, confusion. Accessing data “in the cloud” is not too different from what most of us do every day: sending e-mail, accessing web sites or even contributing scanned images to an Enterprise Content Management (ECM) system. While I don’t want to dive too deep into the general benefits and appeal of cloud computing, in each of the sections below I hope to describe a unique way in which utilizing the cloud for document capture and ECM can be beneficial for organizations of all sizes.

Existing Internet Infrastructure

Probably the most easily understood component of “Cloud Computing” is the existing infrastructure that most of us are familiar with using, whether we consciously know it or not. The fact of the matter is that data still needs to reside on a computer server somewhere. In other words, it’s not technically stored in some magical cloud. It is hosted on high-powered servers, typically in a data center with climate control, backup generators in case of power outage and high security.

Ever use Hotmail.com for e-mail? Or browse to www.KevinNeal.com using your internet browser? Access your Blackberry messages on your handheld device? These are all examples of hosted applications. What is somewhat unique about hosted “cloud” applications, as opposed to traditionally hosted applications, is that at their core most cloud applications offer industry-standard communication protocols to enable a wide range of open interoperability. Basically, it’s two completely different systems talking the same language.

To illustrate my point, let’s use the HTTP protocol as an example. What was probably the single biggest reason for the explosive growth of the internet over the past few decades? Most likely it was the fact that two systems, your computer and a web site (a hosted/server application), had a common language with which to communicate, by means of an internet browser such as Internet Explorer, Firefox, Safari or Chrome. Look at the top of this web page you are viewing now. See the “http://” prefix before the KevinNeal.com address? This is an example of you accessing hosted information via the HTTP protocol, using advanced technology that is completely transparent to you as the user.
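
For readers who like to see things in code, here is a minimal sketch of “two systems talking the same language” over HTTP. It uses only Python’s standard library. The `HelloHandler` class, the message text and the `fetch_from_local_server` function are made up for illustration; the “hosted application” is simply a tiny server started on your own machine, so nothing here touches a real web site.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a hosted application."""

    def do_GET(self):
        body = b"Hello from the hosted application"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


def fetch_from_local_server():
    """Start a local server, request one page over HTTP, return (status, text)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        # An empty ProxyHandler makes the client talk to the local server directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_from_local_server())
```

The client knows nothing about how the server is written, and the server knows nothing about the client. The only thing they share is the protocol, which is the whole point.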

To oversimplify things, my point is that cloud computing is really nothing more than a collection of many hundreds of thousands, if not millions, of applications available on the internet. The truly powerful concept of cloud computing, and what has piqued the interest of users and vendors alike, is the opportunity to “mash up” or bring together best-of-breed technologies from various sources to build powerful applications. As it relates to document capture, many organizations are considering the “cloud” for their Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) portals or even their Enterprise Content Management (ECM) repositories. Scanning documents into these various systems, with relevant metadata extracted using document capture technology, helps drastically improve efficiency.

Content Creation

There is an unbelievable amount of content available in the cloud. Believe it? Anything you can access over the internet, whether it be public content or private content, should be considered part of the available cloud content. What information an organization chooses to include as its available content is certainly up to its specific requirements, but do not underestimate the value of these resources. From a document capture and ECM perspective, the most valuable content to businesses and organizations is, of course, their intellectual property and not just random data found doing an internet search. Specifically, this could be their internal customer contacts, an accounts receivable database or their inventory management system. All of this data is unique to the organization, and sharing it among employees and departments helps to greatly improve processes. The “cloud”, over the internet, represents a low-cost means to share this information efficiently.

When organizations embark on a cloud strategy, content is created in a wide variety of ways. The content could be electronic files such as spreadsheets, word processing documents, presentations, video or even e-mail. Additionally, the content could consist of scanned images and the metadata extracted from these scanned images. Regardless, the challenge is to make this content available via search so that users can find exactly what they are looking for as quickly as possible. This is the reason organizations should carefully consider a well thought-out taxonomy and metadata strategy for all of their content. After all, just dumping a bunch of scanned images and other content into the cloud is not an effective strategy; making that content easily accessible to users is.
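
To make the taxonomy and metadata point a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `ContentItem` structure, the document types, the field names and the sample file names are invented, and a real ECM system would have its own schema and search engine.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    """One piece of content plus the metadata that makes it findable."""
    title: str
    doc_type: str  # the taxonomy category, e.g. "invoice"
    metadata: dict = field(default_factory=dict)


def search(items, doc_type=None, **criteria):
    """Return items that match the document type and every metadata criterion."""
    results = []
    for item in items:
        if doc_type is not None and item.doc_type != doc_type:
            continue
        if all(item.metadata.get(key) == value for key, value in criteria.items()):
            results.append(item)
    return results


# Hypothetical sample content: two scanned invoices and one spreadsheet.
library = [
    ContentItem("scan_0001.pdf", "invoice", {"customer": "Acme", "year": 2010}),
    ContentItem("scan_0002.pdf", "invoice", {"customer": "Globex", "year": 2010}),
    ContentItem("q3_forecast.xls", "spreadsheet", {"department": "Finance"}),
]

if __name__ == "__main__":
    for hit in search(library, doc_type="invoice", customer="Acme"):
        print(hit.title)
```

Without the `doc_type` and `metadata` values, the only thing left to search on would be file names like `scan_0001.pdf`, which is exactly the “dumping a bunch of scanned images” problem.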

User interaction with data in the cloud can be a significant benefit of cloud applications. Anyone who has any level of computing experience can use a web browser, and this is the means (user interface) that most cloud applications utilize to deliver content to users. Not having to install software or do any special configuration, and the quick user adoption and acceptance that follow, are all major benefits.

For users who need to create content to be utilized within cloud applications, there are several document capture methods, including Manual Indexing, Automatic Indexing and Network Scanning, which can be deployed depending on an organization’s specific requirements.

Cloud computing can offer extremely powerful and innovative applications to users, and there is a lot of advanced technology behind the scenes. However, whether users are consuming information within a web browser or contributing scanned documents and relevant metadata, this advanced technology should be completely transparent to them in order to be effective.

Emerging Cloud Applications & Services

Hopefully I’ve done a decent job of demystifying the “cloud” and breaking it down into its core components in an easy-to-understand way in this quick cloud overview. Now I would like to briefly elaborate on the opportunity for document capture in Emerging Cloud Applications & Services. In essence, everything described above is logical, has structure and is familiar to most people. Internet applications and services such as e-mail, browsers and social networking sites all make sense and are easily understood. What is not easily understood or defined by most is how to implement an effective cloud strategy. I can appreciate this struggle because the cloud is new, emerging and dynamic. What a cloud application is today can be drastically different in just weeks in the case of sophisticated integration, or in literally minutes in the case of a simple expansion or added functionality. This is because adding new functionality or capability to an open cloud platform is far easier than in the past, thanks to standard communication protocols such as those described above in the HTTP example. Most cloud applications utilize HTTP, Web Services, XML, SOAP, REST and other common standards to reduce development time, decrease costs and eliminate unnecessary complication.

Cloud applications and services are developing quickly and will become exponentially more powerful as different technologies are combined. As more and more organizations rely on the cloud to reduce on-premise IT infrastructure, there will still be a need for scanning hardware to digitize documents into the cloud. Therefore, the near-term future for document capture and scanning into cloud applications is extremely bright.

If I was vague about what a “cloud application” is and you are looking for a definition, well, I would suggest there are many opinions that can be found with a simple internet search. I, however, once read an article about how an industry expert was asked to define “the cloud”. After he pondered the question for a bit, he finally came to the most appropriate definition he could think of, and it was just one powerful word: Innovation.
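
As a small aside on the common standards mentioned in this section, here is a rough sketch of why they reduce development time. The same capture record is written once as JSON (the format many REST-style services exchange) and once as XML, and either form can be read back by any system that understands the format. The record contents and the `document`/`field` element names are my own assumptions for illustration, not any particular vendor’s format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata for one scanned document (values kept as strings).
record = {"doc_type": "invoice", "customer": "Acme", "pages": "3"}


def to_json(fields):
    """Serialize the record as JSON text."""
    return json.dumps(fields, sort_keys=True)


def from_json(text):
    """Read a JSON record back into a dictionary."""
    return json.loads(text)


def to_xml(fields):
    """Serialize the record as a small XML document."""
    root = ET.Element("document")
    for name, value in sorted(fields.items()):
        ET.SubElement(root, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the XML form back into a dictionary."""
    root = ET.fromstring(text)
    return {element.get("name"): element.text for element in root.findall("field")}


if __name__ == "__main__":
    print(to_json(record))
    print(to_xml(record))
```

Neither side needs the other’s software installed. Agreeing on the format is enough, which is what makes “mash-ups” of different cloud services practical.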

Putting it all together

Cloud Computing presents a great opportunity for document capture. Organizations that are convinced a cloud approach is in their best interest will hopefully realize that, in order to maximize their investment, all the important information still trapped on paper documents in file cabinets and desk drawers must be added to their cloud applications’ available content.

The most important and relevant data in the cloud is your organization’s intellectual property, and an effective document capture strategy can contribute greatly to providing quick and accurate access to that information.

I’m predicting a “cloudy” forecast for document capture... and this is a really good thing. As always, I encourage any constructive feedback or comments.

Sincerely,
Kevin

Document Capture from the user’s perspective

Sometimes it is not the technology itself that dictates the success or failure of a particular product. I believe that the “user experience” either helps drive adoption of a particular technology or ultimately brings about its demise. Let me give you a few examples.

Microsoft Windows: Ask yourself this question: was Windows the most robust and feature-rich operating system when it rose to prominence in the early 1990s? Probably not, but what Microsoft clearly understood was that the Windows Graphical User Interface (GUI) and ease of use from the user perspective were going to be key to its success. Microsoft Windows now dominates market share among operating system software.

The next example is the iPhone and iPad, unquestionably two extremely successful products released by Apple in recent years. Most people will agree that the elegant User Interface and ease of use are among the driving factors for the success of the iPhone and iPad.

My point is that Document Capture vendors, both hardware and software, as well as system integrators, should carefully consider how users themselves interact with scanning applications and Enterprise Content Management (ECM) systems. True adoption of a technology only happens when users embrace the technology wholeheartedly.


Businesses and organizations scan documents to capture information, not because it’s a fun activity like playing World of Warcraft on a Windows operating system, updating one’s Facebook status on an iPhone or even watching a hi-def movie like Avatar on an iPad. Document Capture is implemented for several reasons, including reduced operating costs, improved efficiencies or adherence to compliance. “Fun”, however, is clearly not near the top of the list. We must take this into account when presenting users with various methods of document capture. Therefore, I would like to share some of the common techniques that are used to scan documents into ECM/ERP/CRM/EMR systems. These three general methods of capture (manual indexing, automatic indexing and network scanning) are intended to illustrate various ways to capture scanned documents into these systems; however, the specific techniques utilized will vary depending on an individual organization’s requirements.


Three methods of document capture 


Manual Indexing offers a simple and cost-effective way for scanned images and associated search words to be imported into document management systems, or simply to make access to these scanned images easier. In order to provide ECM users with relevant search results instead of vague results, metadata must be associated with documents. Adding metadata to documents is a critical step in making an ECM system effective and not just an electronic replication of a previously paper-based system of disorganization.

With Manual Indexing, a user scans a document, chooses a destination directly within the ECM Library, manually (as opposed to through computer processing) types metadata for that particular document type and then releases the document into a back-end system. This is drastically different from scanning to a folder and then importing, which is not an integrated approach. The direct communication between the ECM back-end system (server) and the scanning application software (workstation) allows real-time changes within the ECM system to be applied immediately to the scanning application software. Once the destination/document type has been selected by the user, any associated metadata or search terms are dynamically presented to the user for indexing purposes. These index fields are specific to each document type, and business rules that establish continuity in your document capture process can be transparently delivered to scanner users without any disruption whatsoever.

A manual indexing approach to document capture is best for ad-hoc use or low-volume scanning requirements, such as a knowledge worker scanning an occasional document, where the number of index fields is limited to under 50 total fields per day as a best practice. Anything more than 50 total fields per day becomes quite tedious and should prompt consideration of some level of automation within a document capture strategy.
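
The manual indexing flow described above can be sketched in a few lines of Python. Please treat this as an illustration only: the document types, the field names and the function names are all hypothetical, and a real scanning application would fetch the field definitions from the ECM server rather than from a hard-coded table.

```python
# Hypothetical document types and their index fields, standing in for
# what an ECM server might publish to the scanning application.
INDEX_FIELDS = {
    "invoice": ["vendor", "invoice_number", "amount"],
    "contract": ["party", "effective_date"],
}

# The "under 50 total fields per day" best practice mentioned above.
MANUAL_FIELD_LIMIT_PER_DAY = 50


def fields_for(doc_type):
    """Return the index fields to present once a document type is chosen."""
    if doc_type not in INDEX_FIELDS:
        raise ValueError(f"unknown document type: {doc_type}")
    return list(INDEX_FIELDS[doc_type])


def build_release_record(image_path, doc_type, typed_values):
    """Check the values the user typed and bundle them with the scanned image."""
    required = fields_for(doc_type)
    missing = [name for name in required
               if not str(typed_values.get(name, "")).strip()]
    if missing:
        raise ValueError(f"missing index fields: {', '.join(missing)}")
    return {
        "image": image_path,
        "doc_type": doc_type,
        "metadata": {name: typed_values[name] for name in required},
    }


def exceeds_manual_guideline(documents_per_day, fields_per_document):
    """True when the daily typing workload passes the 50-field guideline."""
    return documents_per_day * fields_per_document > MANUAL_FIELD_LIMIT_PER_DAY


if __name__ == "__main__":
    typed = {"vendor": "Acme", "invoice_number": "INV-1", "amount": "100.00"}
    print(build_release_record("scan_0001.tif", "invoice", typed))
    print(exceeds_manual_guideline(documents_per_day=20, fields_per_document=3))
```

Because the field list is looked up at the moment the user picks a document type, a change made on the server side shows up for the scanner user without reinstalling or reconfiguring anything.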

Use scenario: 

  • Ad-hoc
  • Low volume
  • Desktop environments


Benefits of Manual Indexing: 

  • Easy to learn
  • Simple to deploy
  • Inexpensive

Scanner requirements: 

  • Paper handling
  • Image enhancement
  • Reliability



Automatic Indexing into ECM systems provides a way for organizations to gain additional productivity by scanning large quantities of documents at one time without interrupting the scanning process. With this approach the scanning, indexing and release into the ECM system are more automated and highly efficient, which is ideal. However, it typically requires some level of technical expertise to install, configure and use these software packages.


In the case of automatic indexing, image quality is typically much more important than with the manual indexing approach. This is because the system often utilizes advanced technology such as Intelligent Document Recognition (IDR), Optical Character Recognition (OCR) or Enhanced Bar Code (EBC) Recognition to allow a computer to make decisions based on the accuracy of a collection of dots, or pixels, on a scanned image. If you truly break document capture down to its core, an image is nothing more than a collection of dots. Collections of dots compose characters, and characters in turn form words. Eventually, you have a document containing many of these elements. The entire capture process is directly affected by the quality of the scanned image and, therefore, excellent image quality is essential to the success of an automatic indexing strategy for capturing scanned documents.
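
To show why a few bad dots matter, here is a toy sketch in Python. It is emphatically not how commercial OCR or IDR engines work; it just compares a tiny 3x5 grid of dots against stored character shapes and reports how well the best one matches. The glyph shapes, the 0.95 confidence threshold and the function names are all assumptions chosen for illustration.

```python
# Toy 3x5 dot patterns ("1" is a black dot, "0" is white). Hypothetical shapes.
GLYPHS = {
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
}


def match_score(image, template):
    """Fraction of dots in the image that agree with the template."""
    total = 0
    same = 0
    for image_row, template_row in zip(image, template):
        for image_dot, template_dot in zip(image_row, template_row):
            total += 1
            if image_dot == template_dot:
                same += 1
    return same / total


def recognize(image):
    """Return the best-matching character and its score."""
    scores = {char: match_score(image, shape) for char, shape in GLYPHS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def route(image, threshold=0.95):
    """Index automatically only when the match is confident enough."""
    _, score = recognize(image)
    return "auto-index" if score >= threshold else "manual review"


# A clean scan of "7", and the same character with one dot lost to poor image quality.
clean_seven = ["111", "001", "010", "010", "010"]
noisy_seven = ["111", "001", "000", "010", "010"]

if __name__ == "__main__":
    print(recognize(clean_seven), route(clean_seven))
    print(recognize(noisy_seven), route(noisy_seven))
```

In this toy setup a single missing dot is enough to push the character below the confidence threshold and send it to a person for review. Real engines are far more forgiving, but the principle is the same: poorer dots mean more exceptions and less automation.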

Use scenario: 

  • Centralized capture
  • Moderate to high volumes of paper
  • Process control

Benefits of Automatic Indexing: 

  • Enhance productivity
  • Immediate access to information
  • Reduce labor costs

Scanner requirements: 

  • Excellent image quality
  • Rated speeds for OCR
  • Hardware-based image processing



A Network Scanning approach to capturing scanned documents into ECM systems can use either Manual Indexing or Automatic Indexing, so the indexing method itself is not necessarily the main appeal of a network scanning capture strategy. Some of the many appeal points of network scanning, in contrast to USB-attached scanners, include the flexibility of integration options, effective device management and, of course, ease of use. Integration options using communication standards such as HTTP and Web Services, and possibly even Cloud Computing infrastructure, can greatly benefit organizations by limiting their reliance on a proprietary vendor application or platform. With a well-constructed network scanning platform, organizations are presented with a nearly limitless list of integration options with complementary or even drastically disjointed systems, all presented to the user through an easy-to-use, consistent touch screen interface. Does this sound all that different from the iPhone interacting with different sorts of data? As I mentioned earlier and would like to reiterate, true adoption of technology happens when users have a comfortable and pleasant experience.


The flexibility of using a network scanning solution as a platform for each of a company’s or organization’s scanning requirements is a key appeal point for this method. Most network scanners offer many useful features, including scan to e-mail, folder, FTP, network fax and network printers. Additionally, some network scanner platforms offer Software Developer’s Kits (SDKs) which enable third-party integration software to operate directly on the device, offering another level of tight integration with other complementary systems and/or additional functionality. Probably one of the most appealing attributes of network scanners is the large, high-resolution color touch screen interface. Letting users interact with data directly in the ECM system via the touch of a screen is truly innovative. These devices offer users an easy-to-operate and highly functional scanning experience that allows workers to get their scanning done quickly and efficiently.

Use scenario: 

  • Shared environments
  • Remote locations
  • Multifunctional purpose
  • Platform for emerging technology

Benefits of Network Scanning: 

  • Consistent process
  • Limited learning curve
  • Easy deployment
  • Effective device management

Scanner requirements: 

  • Intelligent scanning
  • Large touch screen
  • Central Administration
  • Third-Party integrations and connectivity

In summary, I hope that you can appreciate the importance of carefully considering the user experience when developing your document capture strategy. The behind-the-scenes technology can be the best in the industry, but when users resist it, true adoption suffers and terrible inefficiencies follow. Or, you might still be able to find a copy of the OS/2 operating system for those fancy cell phones…