A “cloudy” future for document capture

June 12 / October 20 | Kevin

The most important and relevant data in the cloud is your organization’s intellectual property, and an effective document capture strategy can contribute greatly to providing quick and accurate access to that information.

Hearing a phrase such as “cloudy future” immediately conjures up bad thoughts and gloom-and-doom scenarios. However, in the case of document capture, “cloud computing” is bringing extremely positive change. In this post I would like to break down the basic components of “cloud computing” and explain why document capture into “the cloud” is appealing for several reasons, including scalability, interoperability and usability. Simply put, the “cloud” = Infrastructure + Content + Users. Using cloud computing is not magical or mysterious, yet it is a topic of great discussion and, might I say, confusion. Accessing data “in the cloud” is not too different from what most of us do every day: sending e-mail, accessing web sites or even contributing scanned images to an ECM system. While I don’t want to dive too deep into the general benefits and appeal of cloud computing, in each of the sections below I hope to describe a unique way in which utilizing the cloud, as it relates to document capture and ECM, can be beneficial for organizations of all sizes.

Existing Internet Infrastructure
Probably the most easily understood component of “cloud computing” is the existing infrastructure that most of us are familiar with using, whether we consciously know it or not. The fact of the matter is that data still needs to reside on a computer server somewhere. In other words, it’s not technically stored in some magical cloud. It is hosted on high-powered servers, typically in a data center with climate control, backup generators in case of power outage and high security. Ever use Hotmail.com for e-mail? Browse to www.KevinNeal.com/blog using your internet browser? Access e-mail messages on your smartphone? These are all examples of hosted applications.

What is somewhat unique about hosted “cloud” applications, as opposed to traditionally hosted applications, is that at their core most cloud applications offer industry-standard communication protocols to enable a wide range of open interoperability. Basically, it’s two completely different systems talking the same language. To illustrate my point let’s use the HTTP protocol as an example. What was probably the single biggest reason for the explosive growth of the internet over the past few decades? Most likely it was the fact that two systems, your computer and a web site (a hosted server application), had a common language in which to communicate, by means of an internet browser such as Internet Explorer, Firefox, Safari or Chrome. Look at the top of this web page you are viewing now. See the “http://” prefix before the kevinneal.com address? This is an example of you accessing hosted information via the HTTP protocol, using advanced technology that was completely transparent to you as the user.

To oversimplify things, my point is that cloud computing is really nothing more than a collection of many hundreds of thousands, if not millions, of applications available on the internet. The truly powerful concept of cloud computing, and what has piqued the interest of users and vendors alike, is the opportunity to “mash up” or bring together best-of-breed technologies from various sources to build powerful applications. As it relates to document capture, many organizations are considering the “cloud” for their Enterprise Resource Planning (ERP) systems, Customer Relationship Management (CRM) portals or even their Enterprise Content Management (ECM) repositories. Scanning documents into these various systems, with relevant metadata extracted using document capture technology, helps drastically improve efficiency.
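
To make the “common language” point above a little more concrete, here is a minimal Python sketch of what an HTTP/1.1 exchange looks like as plain text. It does not touch the network; the function names `build_request` and `parse_response` and the sample response are my own illustrations, not part of any particular product.

```python
def build_request(host, path):
    """Return the text a client would send for a simple HTTP/1.1 GET."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_response(raw):
    """Split a raw HTTP/1.1 response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


if __name__ == "__main__":
    print(build_request("www.kevinneal.com", "/blog"))
    # A made-up response, standing in for what a hosted application might return.
    sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    print(parse_response(sample))
```

Because both sides agree on this text format, the client and the hosted application can be written by different vendors, in different languages, and still understand each other.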

Content Creation
There is an unbelievable amount of content available in the cloud. Believe it? Anything you can access over the internet, whether it be public content or private content, should be considered part of the available cloud content. What information an organization chooses to include as its available content is certainly up to its specific requirements, but do not underestimate the value of these resources.

From a document capture and ECM perspective, the most valuable content to businesses and organizations is, of course, their intellectual property and not just random data found doing an internet search. Specifically, this could be their internal customer contacts, an accounts receivable database or their inventory management system. All of this data is unique to the organization, and sharing it among employees and departments helps to greatly improve processes. The “cloud”, over the internet, represents a low-cost means to efficiently share this information.

When organizations embark on a cloud strategy, content is created in a wide variety of ways. The content could be electronic files such as spreadsheets, word processing documents, presentations, video or even e-mail. Additionally, the content could consist of scanned images and metadata extracted from these scanned images. Regardless, the challenge is to make this content available via search so that users find exactly what they are looking for as quickly as possible. This is the reason organizations should carefully consider a well thought-out taxonomy and metadata strategy for all of their content. After all, just dumping a bunch of scanned images and other content into the cloud is not an effective strategy, whereas making that content easily accessible to users is tremendously effective.

Users
User interaction with data in the cloud can be a significant benefit for cloud applications. Anyone who has any level of computing experience can use a web browser, and this is the means (user interface) that most cloud applications utilize to deliver content to users. Not having to install software or do any special configuration, along with quick user adoption and acceptance of this new technology, are all major benefits.

For users that need to create content to be utilized within cloud applications, there are several document capture methods, including Manual Indexing, Automatic Indexing and Network Scanning, which can be deployed depending on an organization’s specific requirements.

Cloud computing can offer extremely powerful and innovative applications to users, and there is a lot of advanced technology behind the scenes. However, whether users are consuming information within a web browser or contributing scanned documents and relevant metadata, this advanced technology should be completely transparent to them in order to be effective.

Emerging Cloud Applications & Services
Hopefully I’ve done a decent job of demystifying the “cloud” and breaking it down into its core components in an easy-to-understand way in this quick cloud overview. Now I would like to briefly elaborate on the opportunity of document capture for Emerging Cloud Applications & Services. In essence, everything described above was logical, had structure and is something most people are familiar with using. Internet applications and services such as e-mail, browsers and social networking sites all make sense and are easily understood. What is not easily understood or defined by most is how to implement an effective cloud strategy. I can appreciate this struggle because the cloud is new, emerging and dynamic. What a cloud application is today can be drastically different in just weeks for sophisticated integration or functionality, or in literally minutes for simple expansion or additional functionality. This is because adding new functionality or capability to an open cloud platform is far easier than in the past, thanks to standard communication protocols such as those described above in the HTTP example. Most cloud applications utilize HTTP, Web Services, XML, SOAP, REST and other common standards to reduce development time, decrease costs and eliminate unnecessary complication.

Cloud applications and services are developing quickly and will become exponentially more powerful as different technologies are combined. As more and more organizations rely on the cloud to reduce on-premise IT infrastructure, there will still be a need for scanning hardware to digitize documents into the cloud. Therefore, the near-term future for document capture and scanning into cloud applications is extremely bright.

If I was vague about what a “cloud application” is and you are looking for a definition, well, I would suggest there are many opinions that can be found with a simple internet search. I, however, once read an article about how an industry expert was asked to define “the cloud”. After he pondered the question for a bit he finally came to the most appropriate definition he could think of, and it was just one powerful word: Innovation.

Putting it all together

Cloud computing presents a great opportunity for document capture. Organizations that are convinced a cloud approach is in their best interest will hopefully realize that, to get the most from their investment, all the important information still trapped on paper documents in file cabinets and desk drawers must be added to the content available in their cloud applications.

I’m predicting a “cloudy” forecast for document capture… and this is a really good thing. As always, I encourage any constructive feedback or comments.

Network Scanner extravaganza! AIIM 2009

July 12 / August 1 | admin

As product manager for the Fujitsu fi-6010N network document scanner I was extremely passionate about my product, and never was this more apparent than at our industry’s largest trade show exposition every year, AIIM.

About AIIM

AIIM (Association for Information and Image Management) is the global community of information professionals. We provide the education, research and certification that information professionals need to manage and share information assets in an era of mobile, social, cloud and big data.

Fujitsu 2009 Tech Suite

For the AIIM 2009 event we had rented a “Technology Suite”, which was basically a private, upstairs meeting room away from the busyness of the exposition floor itself, where you could host quiet meetings and display technology solutions in a more relaxed environment. Our network scanner was still a relatively new device at the time and was also a new market segment for us, so it was decided that I could utilize the Tech Suite for whatever type of presentation I wanted to do. So with the great help of my fellow employees and the fantastic cooperation of our software technology partners, we did it up big!

The overall concept was extremely ambitious indeed. What we planned was to set up a live network showing each one of our existing partner integrations. There was nothing hocus-pocus, hypothetical or fake about these demonstrations. Everything on display was a production scan, capture, index and store into a repository. Below are the nine solutions demonstrated:

1. Drivve | Image
2. Marex FileBound
3. One Touch Global Integration Server (OTIS)
4. ABBYY TouchTo
5. ImageTek Inofile
6. Hyland OnBase
7. Notable Solutions (NSi) AutoStore
8. Kofax Document Exchange Server (DES)
9. KnowledgeLake Capture

Were you at AIIM 2009 to see the display? Do you have a comment on any of these solutions? Which is your favorite?


Perceptive Software “Best Practices in the Hospital Admissions Process” presentation/webinar

July 12 | admin

This was a presentation I created and presented in conjunction with our software technology partner, Perceptive Software, on the topic of “Best Practices in the Hospital Admissions Process”. Perceptive Software provides a Document Management/Enterprise Content Management (ECM) system called ImageNow, and they have great experience and expertise in many verticals, the Healthcare market in particular.

It was a great pleasure to collaborate with them, and the research I did, as well as the information I learned from Perceptive, really gave me an interesting perspective. This made me extremely passionate about the critically important need for better management of information in Healthcare to improve care and even save lives.

Click here to view or download the PDF file.

EMC/Documentum ApplicationXtender presentation/webinar

July 12 / August 1 | admin

This was a presentation I created and presented in conjunction with our software technology partner, EMC Corporation, on the topic of document scanning, capture and then utilization in their Document Management system called ApplicationXtender.

Click here to view or download the PDF file.

The logic of document capture

November 22 / August 1 | Kevin

Indexing, Metadata, Keyword, SharePoint, Capture, Scanner, Documents, ECM, Content Management

What is wrong with the collection of words above? Well, it’s a collection of terms that are closely related but have no logical structure, so they are of little value to anyone reading them. To be readable in context, they need to be logically organized into a sentence. The logic of document capture and Enterprise Content Management is much the same. In this blog post, instead of going into the nuts and bolts of document capture, I think it is more important to discuss two components that are critical to the overall success, or failure, of your content management strategy. These two critical components are taxonomy and metadata. This is philosophy and not technology.

To break down document capture into its simplest form, just think of it as the process of extracting information from a document and making that information available in the future. The future could be immediate where a scanned invoice, for example, immediately kicks off a payment process. Or it could be two weeks from now where a customer service agent needs to retrieve a signed airbill as proof of delivery. The point is that document retrieval is based on some unique keyword or a set of keywords related to a particular document. In the case of the invoice it could be the invoice number, and in the case of the airbill it could be the shipping tracking number.
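
As a rough illustration of retrieval by keyword, the Python sketch below stores each captured document under its metadata values and looks it up again later. The `CapturedDocument` and `Repository` names, the field names and the sample numbers are all hypothetical; a real content management system would do this with its own index or database.

```python
from dataclasses import dataclass, field


@dataclass
class CapturedDocument:
    image_file: str                      # the scanned image
    doc_type: str                        # e.g. "Invoice" or "Airbill"
    metadata: dict = field(default_factory=dict)


class Repository:
    """Toy in-memory repository keyed by (metadata field, value)."""

    def __init__(self):
        self._index = {}

    def store(self, document):
        # Index the document under every metadata value extracted at capture time.
        for name, value in document.metadata.items():
            self._index.setdefault((name, value), []).append(document)

    def find(self, name, value):
        # Return every document whose metadata field `name` equals `value`.
        return list(self._index.get((name, value), []))


if __name__ == "__main__":
    repo = Repository()
    repo.store(CapturedDocument("invoice_0001.tif", "Invoice",
                                {"invoice_number": "INV-1001"}))
    repo.store(CapturedDocument("airbill_0001.tif", "Airbill",
                                {"tracking_number": "TRK-555"}))
    print(repo.find("tracking_number", "TRK-555"))
```

The customer service agent in the airbill example only needs the tracking number; nothing else about the document has to be remembered.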

If you do not think this strategy through, your organization may accomplish nothing more than taking an unorganized paper mess and converting it into an electronic mess.

Establish a well thought-out taxonomy

Taxonomy, in its original biological sense, means classifying organisms into groups based on similarities; here it means classifying documents the same way. Why is taxonomy relevant for document capture? For several reasons, including security, quicker access to information and retention policies. If you work backwards when deciding how to implement your document capture solution, and with what technology, a solid consensus on the end result is of paramount importance. The end result is typically a high-quality scanned image conducive to data capture (OCR, ICR, OMR, bar code, etc.) and the metadata itself. If your taxonomy follows an organized methodology, it should make your document capture strategy fairly obvious.

Let’s take security as the first benefit of a well thought-out taxonomy. By segregating documents based on a logical taxonomy, organizations are afforded an additional level of comfort, knowing that a restrictive set of security policies can be applied to, for example, Human Resources documents, while everyone is allowed access to a general set of scanned documents such as the café menu, which is clearly not a sensitive document.

Another benefit of a well thought-out taxonomy is quicker access to information for users. Many content management software applications and search engines use a ‘crawl’ method to check for newly added content and add it to an index (database), which is then searchable. As you can imagine, common sense and logic dictate that ‘crawling’ a narrower scope keeps the index up-to-date more quickly, and access times can also be considerably shorter because a search covers only the relevant indexed data rather than the entire database.

Lastly, having your data well organized is a major benefit for retention policies. Imagine that an organization has all of its tax documents properly stored electronically via a well thought-out taxonomy in its content management system. It can then easily, and within corporate governance standards and policies, remove these images from its repository based on a retention schedule. So, as illustrated, investing the time to develop a strong taxonomy is important for many reasons, including security, searchability and retention.
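
Here is a small Python sketch of how security and retention rules might hang off taxonomy paths. Everything in it is an assumption for illustration: the group names, the “General/Cafe Menu” and “Accounting/Tax” paths, and the seven-year retention period are invented, not taken from any real policy or product.

```python
from datetime import date

# Hypothetical policies keyed by taxonomy path (all values invented).
ACCESS_POLICY = {
    "Human Resources": {"hr_staff"},
    "General/Cafe Menu": {"everyone"},
}
RETENTION_YEARS = {"Accounting/Tax": 7}


def policy_for(path, policies):
    """Return the policy of the most specific taxonomy node covering `path`."""
    parts = path.split("/")
    while parts:
        key = "/".join(parts)
        if key in policies:
            return policies[key]
        parts.pop()  # fall back to the parent category
    return None


def can_view(user_groups, path):
    allowed = policy_for(path, ACCESS_POLICY)
    if allowed is None:
        return False  # no policy found: deny by default
    return "everyone" in allowed or bool(allowed & set(user_groups))


def is_past_retention(path, stored_on, today):
    years = policy_for(path, RETENTION_YEARS)
    if years is None:
        return False  # no schedule defined: keep the document
    try:
        expiry = stored_on.replace(year=stored_on.year + years)
    except ValueError:
        # stored on Feb 29 and the expiry year is not a leap year
        expiry = stored_on.replace(year=stored_on.year + years, day=28)
    return today >= expiry


if __name__ == "__main__":
    print(can_view({"sales"}, "Human Resources/W2 Forms"))
    print(is_past_retention("Accounting/Tax/2010 Return",
                            date(2011, 4, 15), date(2018, 4, 15)))
```

The point of the sketch is that a rule is written once, at a category, and every document filed beneath that category inherits it.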

It is extremely important not to overlook this concept when planning out a document capture strategy. A simple taxonomy might be organized like this:

Accounting
- Accounts Receivable
  - Check
  - Statement
- Accounts Payable
  - Invoice
  - Receipt
Human Resources
- Applications
- Resumes
- W2 Forms
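
The same taxonomy can be written down as a nested structure that software can check documents against. This is only a sketch in Python; the helper names `leaf_paths` and `is_valid_path` are my own.

```python
# The example taxonomy above as nested dictionaries; an empty dict marks a document type.
TAXONOMY = {
    "Accounting": {
        "Accounts Receivable": {"Check": {}, "Statement": {}},
        "Accounts Payable": {"Invoice": {}, "Receipt": {}},
    },
    "Human Resources": {"Applications": {}, "Resumes": {}, "W2 Forms": {}},
}


def leaf_paths(tree, prefix=()):
    """Yield every document type as a tuple path from the top level down."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def is_valid_path(tree, path):
    """Return True if `path` names a node that exists in the taxonomy."""
    node = tree
    for name in path:
        if name not in node:
            return False
        node = node[name]
    return True


if __name__ == "__main__":
    for path in leaf_paths(TAXONOMY):
        print(" / ".join(path))
```

A capture application could use a check like `is_valid_path` to refuse a scan that an operator tries to file under a category that does not exist.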

Considering a well thought-out strategy might seem cumbersome in the initial stages of establishing your document capture strategy, but it can save organizations significant time, money and aggravation in the long run. As a document capture best practice, it is important to establish a solid taxonomy for scanned documents and also to re-evaluate that taxonomy as new document types are introduced within your organization.

Consider what information is important, and what is not

Creating Searchable PDFs is one form of document capture; however, it is not always an ideal document capture strategy. In certain situations, creating Searchable PDF images of your scanned documents is the right approach for an organization, but this technique often creates inefficiencies. You might be thinking to yourself: how could creating a fully Searchable PDF, with all the words of the document indexed, be construed as inefficient? Let me elaborate. When creating a Searchable PDF the scanning software does its best to recognize every single character and every single word on a page. This might sound appealing, but let’s consider the possible results in real-world applications. Imagine that an organization in the insurance business scans as few as 100 single-page documents and creates Searchable PDF documents. Then a user wants to retrieve a document based on a keyword and uses the word “claim” as the search criterion. As you can imagine, the user would most likely be presented with a long set of links to possible documents, but only one is the document they are looking for and the rest is “irrelevant search”. This is because the entire page was indexed via the Searchable PDF method. Alternatively, if your data capture strategy had extracted only the “relevant search” terms that apply to a particular document, you would make the organization much more efficient, because users could find the data they requested much more quickly, with the first search.
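
The insurance example can be sketched in a few lines of Python. The 100 documents below are synthetic, and the `claim_number` field and its `CLM-0001` style values are made up for illustration; the only point is the difference in result counts between searching the full text and searching one extracted field.

```python
def make_documents(count=100):
    """Build synthetic insurance documents with full text and one metadata field."""
    docs = []
    for n in range(1, count + 1):
        number = f"CLM-{n:04d}"
        docs.append({
            "file": f"scan_{n:03d}.pdf",
            "full_text": f"Insurance claim form. Claim number {number}. "
                         "Please process this claim.",
            "metadata": {"claim_number": number},
        })
    return docs


def full_text_search(docs, word):
    """Searchable-PDF style: match the word anywhere on the page."""
    word = word.lower()
    return [d for d in docs if word in d["full_text"].lower()]


def metadata_search(docs, field_name, value):
    """Relevant-search style: match one extracted field exactly."""
    return [d for d in docs if d["metadata"].get(field_name) == value]


if __name__ == "__main__":
    docs = make_documents()
    print(len(full_text_search(docs, "claim")))
    print(len(metadata_search(docs, "claim_number", "CLM-0042")))
```

With this synthetic data the full-text search for “claim” matches every document, while the field search matches exactly one.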

One of the other significant benefits of an integrated document capture/content management strategy is that, oftentimes, any metadata fields created and rules applied in the content management system can be brought forward and applied in the document capture system itself. For example, if an organization’s policy dictates that on a healthcare insurance form the social security number field is required and must be exactly nine numeric characters long, then these rules can be enforced directly in the document capture system. This allows for great business continuity and consistency in your data capture process.
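
A rule like the one just described is simple to express in code. The following Python sketch is only an illustration of the idea; the function name `validate_ssn_field` and the error messages are hypothetical, and a real capture product would define such rules in its own configuration.

```python
import re

# Exactly nine ASCII digits, nothing else (no dashes or spaces).
NINE_DIGITS = re.compile(r"[0-9]{9}")


def validate_ssn_field(value):
    """Return a list of problems; an empty list means the value passes."""
    if value is None or value == "":
        return ["social security number is required"]
    if not NINE_DIGITS.fullmatch(value):
        return ["social security number must be exactly nine digits"]
    return []


if __name__ == "__main__":
    for candidate in ["123456789", "", "12345678", "123-45-6789"]:
        print(repr(candidate), validate_ssn_field(candidate))
```

Note that under this strict reading of the rule a value keyed with dashes is rejected; whether the capture system should strip dashes first is a policy decision for the organization.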

An analogy I like to use is this: go to your favorite internet search engine and enter a vague term such as “taxonomy for document capture”, and you will get a long list of ‘hits’ that are probably not of interest because you are looking for a specific piece of information, or a scanned image. On the contrary, if the user enters a more specific term such as “AIIM document taxonomy”, then the focus of the search is narrowed down to a more relevant list of the information the user is searching for. This is an example of relevant search versus irrelevant search, and it’s all related to applying metadata to web pages, electronic documents and, yes, especially scanned images.

Summary: Organized taxonomy + relevant metadata = Efficient process

In summary, my point is to carefully plan out your document capture process. Pay close attention to developing an effective taxonomy for your documents. Determine what information is important on a particular document and what is not. Document capture technology has evolved to nearly magical proportions, but the truth is that organizations can still greatly improve their efficiency and content management effectiveness through careful planning; after all, there is still logic to document capture.

Do you have thoughts on the topic of document capture, taxonomy or classification? Please share your comments.

Kevin's Barnhouse

Technology, Movies, Life and Family!

Category: Imaging
