EMC/Documentum ApplicationXtender presentation/webinar

This was a presentation I created and presented in conjunction with our software technology partner, EMC Corporation, on the topic of document scanning, capture and then utilization in their Document Management system called ApplicationXtender.

Click here to view or download the PDF file.

The logic of document capture

Indexing, Metadata, Keyword, SharePoint, Capture, Scanner, Documents, ECM, Content Management

What is wrong with the collection of words above?  It is a set of closely related terms with no logical structure, so it is of little value to anyone reading it.  For these words to carry meaning, they need to be logically organized into a sentence.  The logic of document capture and Enterprise Content Management is much the same.  In this blog post, instead of going into the nuts and bolts of document capture, I think it is more important to discuss two components that are critical to the success, or failure, of your content management strategy: taxonomy and metadata.  This is philosophy and not technology.

In its simplest form, document capture is the process of extracting information from a document and making that information available in the future.  The future could be immediate, where a scanned invoice, for example, immediately kicks off a payment process.  Or it could be two weeks from now, when a customer service agent needs to retrieve a signed airbill as proof of delivery.  The point is that document retrieval is based on a unique keyword, or a set of keywords, related to a particular document.  In the case of the invoice it could be the invoice number, and in the case of the airbill it could be the shipping tracking number.
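To make the retrieval idea concrete, here is a minimal sketch in Python of an index keyed on one unique captured value per document.  The class names, field names, sample values and file paths are all hypothetical, invented for illustration; a real capture or ECM product would provide its own index.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ScannedDocument:
    doc_type: str    # e.g. "invoice" or "airbill"
    key_field: str   # name of the unique keyword field
    key_value: str   # value captured from the page
    image_path: str  # where the scanned image is stored (hypothetical path)


class CaptureIndex:
    """Maps a (field, value) pair to the scanned document it identifies."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], ScannedDocument] = {}

    def add(self, doc: ScannedDocument) -> None:
        self._by_key[(doc.key_field, doc.key_value)] = doc

    def find(self, field: str, value: str) -> Optional[ScannedDocument]:
        # Returns None when nothing was captured under that keyword.
        return self._by_key.get((field, value))


index = CaptureIndex()
index.add(ScannedDocument("invoice", "invoice_number", "INV-1001", "scans/inv-1001.tif"))
index.add(ScannedDocument("airbill", "tracking_number", "TRK-55501", "scans/trk-55501.tif"))

# Two weeks later, a customer service agent looks up the proof of delivery.
proof_of_delivery = index.find("tracking_number", "TRK-55501")
```

The lookup only works because the tracking number was captured as metadata at scan time, which is the whole point of planning what to extract.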

Without a well thought-out strategy, your organization may simply take an organized paper mess and convert it into an electronic mess.

Establish a well thought-out taxonomy

Taxonomy is defined as classifying organisms into groups based on similarities.  Why is taxonomy relevant for document capture?  For several reasons, including security, quicker access to information and retention policies.  If you work backwards to decide how to implement your document capture solution, and with what technology, a solid consensus on the end result is of paramount importance.  The end result is typically a high-quality scanned image conducive to data capture (OCR, ICR, OMR, bar code, etc.) and the metadata itself.  If your taxonomy is methodically organized, it should make your document capture strategy fairly obvious.

Let’s take security as a benefit of a well thought-out taxonomy.  By segregating documents based on a logical taxonomy, organizations are afforded an additional level of comfort knowing that a restrictive set of security policies can be applied to, for example, Human Resources documents, while everyone is still allowed access to a general set of scanned documents such as the café menu, which is clearly not a sensitive document.

Another benefit of a well thought-out taxonomy is quicker access to information for users.  Many content management software applications and search engines use a ‘crawl’ method to check for newly added content and add it to an index (database), which is then searchable.  Common sense dictates that ‘crawling’ a narrower scope keeps the index up to date more quickly, and access times can be considerably shorter because a search covers only the relevant indexed data rather than the entire database.

Lastly, having your data well organized is a major benefit for retention policies.  Imagine that an organization has all of its tax documents properly stored electronically via a well thought-out taxonomy in its content management system.  The organization can then easily, and within corporate governance standards and policies, remove these images from its repository based on a retention schedule.  So, as illustrated, investing the time to develop a strong taxonomy is important for many reasons, including security, searchability and retention.

It is extremely important not to overlook this concept when planning a document capture strategy.  A simple taxonomy might be organized as below:

  • Accounting
    • Accounts Receivable
      • Check
      • Statement
    • Accounts Payable
      • Invoice
      • Receipt
  • Human Resources
    • Applications
    • Resumes
    • W2 Forms
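
As a rough illustration of how security and retention can hang off a taxonomy, the outline above can be sketched as a nested Python structure where each top-level branch carries one policy that every document beneath it inherits.  The access group names and retention periods below are made-up placeholders, not recommendations, and `policy_for` is a hypothetical helper.

```python
# Hypothetical taxonomy mirroring the outline above. The access groups and
# retention periods are placeholders chosen only for illustration.
TAXONOMY = {
    "Accounting": {
        "policy": {"access": "finance-staff", "retention_years": 7},
        "children": {
            "Accounts Receivable": ["Check", "Statement"],
            "Accounts Payable": ["Invoice", "Receipt"],
        },
    },
    "Human Resources": {
        "policy": {"access": "hr-staff", "retention_years": 5},
        "children": {
            "Applications": [],
            "Resumes": [],
            "W2 Forms": [],
        },
    },
}


def policy_for(path):
    """Return the policy inherited by a document filed at `path`.

    `path` is a tuple with at least two elements, such as
    ("Accounting", "Accounts Payable", "Invoice").
    Raises KeyError if the path is not part of the taxonomy.
    """
    department, category, *rest = path
    node = TAXONOMY[department]
    doc_types = node["children"][category]
    if rest and rest[0] not in doc_types:
        raise KeyError(f"unknown document type: {rest[0]}")
    return node["policy"]


invoice_policy = policy_for(("Accounting", "Accounts Payable", "Invoice"))
```

Because the policy lives on the branch rather than on each document, filing a scanned invoice in the right place is enough to give it the right access rules and retention period.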

Developing a well thought-out strategy might seem cumbersome in the initial stages of establishing your document capture process, but it can save organizations significant time, money and aggravation in the long run.  As a document capture best practice, it is important to establish a solid taxonomy for scanned documents and to re-evaluate that taxonomy whenever new document types are introduced within your organization.

 

Consider what information is important, and what is not

Creating Searchable PDFs is one form of document capture; however, it is not always the ideal strategy.  In certain situations creating Searchable PDF images of your scanned documents is the right approach, but this technique often creates inefficiencies.  You might be asking yourself how creating a fully Searchable PDF, with all the words of the document indexed, could be construed as inefficient.  Let me elaborate.  When creating a Searchable PDF, the scanning software does its best to recognize every single character and every single word on a page.  This might sound appealing, but let’s consider the results in real-world applications.  Imagine that an organization in the insurance business scans as few as 100 single-page documents and creates Searchable PDF documents.  Later, a user wants to retrieve one document and searches on the keyword “claim”.  The user would most likely be presented with a long list of links to possible documents, only one of which is the document they are looking for; the rest is “irrelevant search”.  This is because the entire page was indexed via the Searchable PDF method.  Alternatively, if your data capture strategy had extracted only the “relevant search” terms that apply to a particular document, the organization would be much more efficient, because users could find the data they requested with the first search.
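
The difference between the two kinds of search can be shown with a toy example.  This is only a sketch in Python: the three sample documents, the `claim_number` field and both search functions are invented for illustration and do not represent how any particular product indexes content.

```python
# Made-up sample: recognized page text and captured metadata for three scans.
documents = [
    {"id": 1, "full_text": "claim form for water damage, policy holder copy",
     "metadata": {"claim_number": "1001"}},
    {"id": 2, "full_text": "letter regarding your recent claim inquiry",
     "metadata": {"claim_number": "1002"}},
    {"id": 3, "full_text": "instructions on how to file a claim",
     "metadata": {"claim_number": "1003"}},
]


def full_text_search(docs, word):
    """Searchable-PDF style: match the word anywhere on the page."""
    return [d["id"] for d in docs if word.lower() in d["full_text"].lower()]


def metadata_search(docs, field, value):
    """Capture style: match only the field that was deliberately extracted."""
    return [d["id"] for d in docs if d["metadata"].get(field) == value]


broad_hits = full_text_search(documents, "claim")                   # every page mentions "claim"
narrow_hits = metadata_search(documents, "claim_number", "1002")    # exactly one document
```

With only three pages the broad search already returns everything; at 100 pages or 100,000 the gap between the two result lists is what users experience as irrelevant versus relevant search.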

Another significant benefit of an integrated document capture/content management strategy is that metadata fields created in the content management system, and the rules applied to them, can often be carried forward into the document capture system itself.  For example, if an organization’s policy dictates that the social security number field on a healthcare insurance form is required and must be exactly nine numeric characters, then these rules can be enforced directly in the document capture system.  This allows for great continuity and consistency in your data capture process.
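
A rule like the social security number example is simple to express in code.  The following is a minimal Python sketch under my own assumptions: the function name and error messages are hypothetical, and a real capture product would let you configure such a rule rather than write it by hand.

```python
import re

# Exactly nine ASCII digits, nothing else (no dashes or spaces).
SSN_PATTERN = re.compile(r"[0-9]{9}")


def validate_ssn_field(value):
    """Return a list of problems with the field; an empty list means it passes."""
    if value is None or value == "":
        return ["social security number is required"]
    if not SSN_PATTERN.fullmatch(value):
        return ["social security number must be exactly nine numeric characters"]
    return []


good = validate_ssn_field("123456789")
missing = validate_ssn_field("")
too_short = validate_ssn_field("12345678")
```

Enforcing the rule at the scanner means a bad value is caught while the paper is still in the operator's hands, rather than weeks later in the content management system.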

An analogy I like to use: go to your favorite internet search engine and enter a vague term such as “taxonomy for document capture”.  You will get a long list of ‘hits’ that probably are not of interest, because you might be looking for a specific piece of information or a scanned image.  In contrast, if you enter a more specific term such as “aim document taxonomy”, the search is narrowed to a more relevant list of potential results.  This is an example of relevant search versus irrelevant search, and it is all related to applying metadata to web pages, electronic documents and, yes, especially scanned images.

Summary: Organized taxonomy + relevant metadata = Efficient process

In summary, my point is to carefully plan out your document capture process.  Pay close attention to developing an effective taxonomy for your documents.  Determine what information is important on a particular document and what is not.  Document capture technology has evolved to nearly magical proportions, but the truth is that organizations can still greatly improve their efficiency and content management effectiveness through careful planning; after all, there still is logic to document capture.

Do you have thoughts on the topic of document capture, taxonomy or classification?  Please share your comments.

Increase ECM Automation Processes With Higher Resolution Scanning

Source: Business Solutions Magazine

Written by: Kevin Neal, product manager – production scanners, Fujitsu Computer Products of America

When we talk about software automation, it’s safe to say that we truly live in remarkable times. Automation, as it will be referred to in this article, can be defined as allowing a computer to accomplish tasks that traditionally took human intervention and/or action to complete. The rapid adoption of automation via software is driven by several basic technical factors, including high-powered, affordable CPUs (more cycles and lines of code executed per second), drastic increases in memory capacity in conjunction with reduced prices, as well as the ever-evolving intelligence within software packages. The computing resources behind all of the advancements are helping to reduce costs, improve efficiencies, and assist with compliance and regulation.

Software automation is becoming more pervasive among ECM (enterprise content management) and document scanning solutions. The virtue of implementing ECM solutions has historically been cost reduction, which could have meant decreased headcount or reallocating employee resources to other business units. The savings may also have been tangible, such as reduced mailing and shipping charges, the elimination of expensive fax transmissions, or physical space freed up by removing cabinets and file drawers.

Because of computing advancements, businesses and organizations are no longer asking whether ECM systems are truly viable. Instead, they are asking more pointed questions about how much the return on investment is and how quickly they will realize the ROI. In fact, according to Gartner, Inc. the worldwide ECM software market is expected to grow more than 12% per year through 2010, from $2.6 billion in 2006 to more than $4.2 billion in 2010. These days, it’s more about which hardware, software, and services best fit the needs rather than whether or not to put a solution in place.

With most of the pain points of the DIP (document image processing), DIM (document image management), and/or ECM solutions behind us, we now have an opportunity to do more remarkable automation tasks with software. But the success or failure of the entire system is closely tied to the ‘on-ramp’ of electronic document automation and your document scanner, in particular. In the next few paragraphs, I’ll examine several important software automation solutions from some of the premier forms processing and capture software companies in the industry.

High Resolution Maximizes Recognition Results (Contributed by ABBYY)
When scanning for OCR (optical character recognition) or data capture, start with an excellent quality original. This may be the single most important consideration to achieve optimal results for recognition and capture, as well as for the purposes of long-term preservation. In fact, using a high-quality image takes on increasing importance as more users depend on electronic documents to take the place of paper-based originals because of the searchability and cost savings. On the downside, once scanned, the paper document is often no longer available — so it is important to retain maximum quality from the outset.

Today, 300 dpi (dots per inch) color remains the gold standard for scanning. However, high-quality grayscale is an option when color is not achievable (since color scanning often results in 32-bit files). Whenever possible, maintain color images. Color provides additional depth, which enhances the ability of recognition software to gather additional information about the scanned document in order to maximize accuracy. In short, consider quality first when scanning for recognition and archiving.

Classification Of Forms (Contributed by ReadSoft)
Organizations are turning to one portal for all incoming documents — no matter if they arrive on paper or in electronic form. Technology is available to automatically sort incoming documents and classify them according to case. This enables the simple inputting of all incoming mail into a scanner (without any separator sheets) and lets the computer sort the documents. If documents arrive in electronic form, they are also easily incorporated into the flow. By digitizing paper documents through high resolution scanning, users can easily search and retrieve all incoming mail. What will this do for an organization? Efficiency increases when each and every document is distributed correctly. Fast access to status reports and audit trails gives users better control over information flow. In addition, a smooth integration with back end systems such as customer management applications, databases, and archives boosts the performance of IT systems. The overall result of high resolution scanning is automated classification and sorting — less need for document preparation, one portal for all incoming documents, (paper and electronic), electronic distribution to authorized staff, and control of information flows.

300 dpi — Friend Not Foe For Automated Document And Data Capture (contributed by AnyDoc Software, Inc.)
The idea that scanning documents at 300 dpi will create backlogs and bottlenecks within automated document and data capture solutions is an outdated myth. In fact, within many solutions, product settings default to 300 dpi to maximize character recognition, with little or no adverse impact on processing speed, transmission speed or storage, and with a great positive impact on recognition accuracy. When processing healthcare forms such as explanation of benefits (EOB), Health Care Financing Administration (HCFA) and Uniform Bill (UB04) forms, which are known for their notoriously small fonts and extremely high character density per page, proper resolution is critical. At a 300 dpi setting, recognition engines are optimized and file size is still very manageable. Because the average size of a 300 dpi 8.5” x 11” bi-tonal TIFF image is 40 KB, approximately 3,000,000 document images can be stored on a standard 120 GB hard drive.
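
The storage estimate is easy to check. Here is the arithmetic as a short Python sketch, assuming decimal units (1 GB = 1,000,000 KB), which is how drive capacity is usually quoted; the constant names are mine.

```python
IMAGE_SIZE_KB = 40      # average 300 dpi bi-tonal TIFF, per the figure quoted above
DRIVE_SIZE_GB = 120     # "standard" hard drive from the example
KB_PER_GB = 1_000_000   # decimal units (assumption)

# Whole images that fit on the drive, ignoring file system overhead.
images_per_drive = DRIVE_SIZE_GB * KB_PER_GB // IMAGE_SIZE_KB
print(f"{images_per_drive:,}")  # 3,000,000
```

Swapping in your own average image size and drive capacity gives a quick first estimate before committing to a scanning resolution.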

In decades past, files competed for space that was limited and expensive, but no more. Now, a 40 KB file travels on today’s fast networks at what can be conversationally considered to be the speed of light. A lower scanning resolution can negatively impact data recognition, which is not offset by the saving of space — no longer the limited commodity it once was.

Some of the better document processing packages will process at 300 dpi but output at a lower resolution (e.g., 200 dpi), giving you the best of both worlds. Scanning at a higher resolution can dramatically improve data recognition, decrease the need for human intervention, and increase the efficiency of all downstream applications without negatively impacting electronic transmission or storage space.

More dots per inch (dpi) for increased automation
So, maybe now you’re thinking, “Of course I want everything automated, so I’ll scan everything at 300 dots per inch, in color, or both.” Well, not so fast. First, we must consider the risks versus the rewards of this type of decision, as we address in an upcoming article entitled “Trends Towards Higher Resolution Scanning.”

To quote Gartner, “The quality, performance, and ease of use of software products will improve.” This will help drive adoption; however, considering the pros and cons of higher resolution scanning, an inefficient document capture solution that settles for anything less than the most software automation available should be unacceptable these days.

In a day and age where no two ECM solutions are built alike, and organizations have choices for software automation components, it’s important to implement the best-of-breed solutions that garner optimal automation results. Whether it is OCR, ICR (intelligent character recognition), forms processing, separation, classification, unstructured forms, bar code recognition, etc., each step in the automation workflow depends directly on the step before it, and it all starts with document scanning. As more desktop scanners are deployed throughout organizations, there is certain to be an ever increasing demand for ease-of-use and automation. Give your ECM solution the best chance for automation success and don’t underestimate the trends towards higher resolution scanning.

For more information on topics covered in this article or more information in general please visit:

Fujitsu – http://us.fujitsu.com/fcpa

ABBYY – www.abbyyusa.com

AnyDoc Software – www.anydocsoftware.com

ReadSoft – www.readsoft.com

Kevin Neal, product manager – production scanners, with Fujitsu Computer Products of America has been involved in the document scanning/enterprise content management industry for over 18 years. He has held various customer service, sales and management positions for many hardware and software products during his career. In addition, he has years of experience installing, configuring, and troubleshooting networking components as a consultant and network administrator. Currently he handles product management responsibilities for Fujitsu’s complete line of production scanners.

– See more at: http://www.bsminfo.com/doc/Increase-ECM-Automation-Processes-With-Higher-0001

Crossing the ECM/Capture Chasm – ‘This is the Renaissance’

Marc Benioff, Salesforce.com CEO, has been famously quoted on his opinion of cloud computing in terms of saturation-point, as well as technology innovation, for a viable business model.

“This is the heyday of the Cloud. This is the Renaissance. We are in the Great Time.”

…and he continues…

“So we’re still at the very, very beginning.

We are in the first innings of Cloud Computing.

This is still the Renaissance.”

While this is just one man’s opinion, I personally happen to think he is absolutely correct.  We truly are in the first innings, particularly as it relates to Capture and ECM moving to the cloud.  Future innings have yet to be played.  In this baseball analogy, old-school “traditional – behind the firewall” technology and new “innovative – cloud collaboration/mobile” technology are on a collision course of epic proportions.

Then, on 9/6/2012, as Jeff Bezos, Amazon.com CEO, proudly introduced his company’s new Kindle Fire tablet device, he was quoted as saying the following:

“We want to make money when people use our devices, not when they buy our devices.”

 

Salesforce.com reinventing themselves

Let’s take a high-level look at how Salesforce.com’s business has changed over the years since the company started business in 1999.  They started with their (1) core Customer Relationship Management (CRM) service and then they (2) offered a development platform.  Next, they (3) built an ecosystem of development partners, and then they created sales and marketing programs to (4) resell third-party as well as additional Salesforce.com-branded services.  All along, they have been strong advocates of (5) using mobile devices, so they have provided pre-built applications and also development tools for integrators to create mobile applications for Salesforce.com.

 

Amazon.com reinventing themselves

Like Salesforce.com, Amazon.com has done a great job of continually enhancing their business, and the formula for success, at a high level, is amazingly similar.  First, Amazon.com had their (1) core business of electronic commerce, selling books and music items.  Next, they (2) built a platform and exposed their product information via Web Services.  Once they offered these Web Services, third-party web sites could integrate and (3) sell products directly from the Amazon.com online catalog with Amazon Affiliates.  Amazon realized that their world-class Web Services and data center infrastructure could be additional sources of revenue, so they started offering Amazon Web Services (AWS) for software developers to (4) create new applications other than just e-commerce.  And, of course, with the recent aggressive announcements around Kindle Fire, Amazon has made a huge investment in the future of (5) delivering content to mobile devices, with a business model that earns money over the long term as customers use the device, not when they purchase the hardware itself.

 

Cloud Capture Convergence

This is not to say that this convergence of Traditional technology and Cloud technology is necessarily a bad thing; in fact, it can be quite good.  For example, ECM systems (or Systems of Record [2]) have a long history of positive results if implemented and governed properly.  There really is no question about this; however, the truth of the matter is that with this legacy comes baggage, which slows down technology innovation.  Baggage just means that there is an existing customer base that you must support and a feature improvement list, gathered from customer feedback, that is probably quite extensive.  Also, from a software architecture standpoint, the software was not engineered with modern capabilities such as multitenancy, web services connectivity or thin client design.

 

However, on the complete other end of the technology spectrum you have a whole host of cloud-based, Software as a Service (SaaS) applications (or Systems of Engagement) which are highly collaborative and have these modern capabilities, yet most of them lack the most basic enterprise-type features that have proven their ROI over the years.  One of the most basic productivity-enhancing and cost-reducing capabilities missing, of course, is automatic Data Capture.  The return on this investment is really easy to calculate from the number of labor hours that can be recouped simply by eliminating manual data entry.  I admire these companies for being forward-thinking, but they are so forward-thinking that they overlook the obvious.

 

The formula for success is rather obvious

So what’s the point of highlighting these bold comments by the CEOs of some of the more successful cloud companies?  The point is that Amazon.com and Salesforce.com have quite similar business models now, yet they started out with very different core businesses.  These companies are quickly transforming into “services” companies.  Both have fully embraced cloud as a business model, not just as a casual interest or a fad that will fade away.  Both have built amazing technology and integration platforms for developers to quickly and easily create powerful applications like never before.  Each has created one of the most thriving and robust ecosystems in computing history, with partners gladly and enthusiastically promoting solutions built on these respective platforms.  And one of the newest similarities between these two successful cloud companies is their absolute focus on using mobile devices as a delivery method for their content and services.

 

The application of the future

So now for my own bold prediction.  As these cloud applications evolve, they too will start to incorporate core functionality such as automatic Data Capture directly into their applications, or mash-up software applications will be created that deliver best-of-breed solutions.  Let’s use two famous companies to describe, with specific details, the future of a best-of-breed business productivity application.  First, in the “traditional/behind-the-firewall” ECM business, let’s take Microsoft SharePoint Server.  It is unquestionably one of the most popular ECM systems in the industry, and it has been very ‘disruptive’ since Microsoft started sincerely promoting SharePoint as a true ECM solution instead of just a collaboration tool.  Secondly, in the “cloud/collaboration-mobile” business, let’s take a look at Box.  Box is also a leader in its market space: cloud storage with high security and content that is easily accessible via mobile devices.  (Admittedly, Box is a much smaller, newer start-up company, but a leader nonetheless.)  ‘Where am I going with this vision?’ you might be asking yourself, since you might be aware of Box’s infamous bashing of SharePoint, as seen below in this billboard advertisement.  Since those early days the rhetoric has been tempered quite a lot, in my opinion, and might I even dare to say that using each product’s respective strengths can help achieve the ultimate in business efficiency?

[Image: Box billboard advertisement]

From a pure data capture and ECM standpoint, SharePoint has features that Box simply does not offer.  These include a robust metadata framework, enterprise search and managed metadata, to name a few, and their absence keeps Box out of serious contention if an organization requires these traditional ECM capabilities.  However, SharePoint has its own deficiencies, and right now one of them is poor support for mobile devices.  Box absolutely excels in the area of mobile application development because its service was built with a “mobile first” mentality.  So what if we could blend the positive qualities of both to provide users with the functionality they desire on mobile, yet still adhere to traditional ECM policy and governance with metadata support?

The answer is “you can”.  Through the beauty of modern integration techniques, users can now view, manage and edit documents stored in Microsoft SharePoint through the Box user interface on mobile devices.  Just imagine the enhanced productivity that can be achieved through a highly usable experience for the users themselves, along with the peace of mind that your organization is not sacrificing critical features necessary to run an effective business.

 

This is the vision of the application of the future.  Remember, “we are in the first innings – This is the Renaissance.  We are in the Great Time.”

 

More information:

[1] Geoffrey Moore:  “Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers” (amazon.com paperback)

[2] John Mancini:  “A future history of content management” (slideshare.net presentation)

Microsoft SharePoint – FAQs

 

1. What are the benefits of utilizing SharePoint for document imaging/ECM?

One of the main benefits of utilizing SharePoint for document imaging/ECM is the limited learning curve for both users and systems administrators. With Microsoft operating systems and office applications being the primary graphical user interfaces most people use in their organizations, it makes sense that the commonality between SharePoint and an application such as Outlook gives users a comfort level that typically does not require complicated training. This shorter learning curve leads to quicker adoption of the technology, allowing organizations to focus on building out SharePoint sites for actual use, which leads to tangible enhanced productivity. This is important because an organization can sometimes get burdened with months of installation, configuration and training before a system ever goes “live”, which is not only time-consuming but also costly, and leaves a bad impression that affects the ultimate success or failure of the system. Quickly demonstrating enhanced productivity through user adoption of a familiar graphical user interface within a departmental process such as invoice processing, for example, leads stakeholders within organizations to sponsor additional departmental process improvement projects or even complete enterprise roll-outs of document imaging/ECM systems.

 

2. What is one of the most common misconceptions about scanning into SharePoint?

The idea that scanning into SharePoint is difficult or expensive seems to be a common misconception. There are more options than ever to scan a document into SharePoint and we think breaking down the high-level techniques for scanning and applying them to how organizations typically may scan into SharePoint is important.

There are three basic ways to get a scanned image with corresponding metadata, or search terms, into SharePoint. The first scanning option is Manual Indexing, where users scan a document, then connect to SharePoint where a SharePoint Document Type has some associated metadata. The user types in the metadata for this particular scanned document, then simply uploads the document directly into SharePoint. The second scanning option for SharePoint is Automatic Indexing. This is a more automated, but also more costly, option that is typically used to process higher volumes of documents. With the Automatic Indexing option, information from the scanned pages, such as bar code values or printed characters such as invoice numbers, social security numbers or other data, can be automatically extracted and sent directly into SharePoint. Lastly, a network scanning approach can involve either Manual Indexing or Automatic Indexing; however, the important point about this method of scanning into SharePoint is its ease of use for users and its effective device management for network administrators. Network scanners typically are dedicated-use devices where scanning into ECM systems such as SharePoint is their sole purpose; therefore, making scanning easy was a priority in their design. Features such as bright, colorful touch screens make image preview simple and easy. Integrated hardware keyboards make indexing documents quick and efficient as well. So, as we have illustrated, there are several high-level methods for scanning into SharePoint, and the right method really depends on your organizational requirements.

 

3. How do you add ‘scanning to SharePoint’ functionality to a SharePoint server? And, is it expensive?

Adding ‘Scan to SharePoint’ functionality is surprisingly simple. The wonderful thing about adding document scanning capabilities to SharePoint is that it involves no additional software installed on the server itself. There is optional third-party Imaging software that can be installed on the server to optimize performance, improve scalability and enhance search, but this is not a requirement to scan documents.

Simply install the Fujitsu scanning software application on a workstation. Then once the Document Libraries have been created in SharePoint with the corresponding metadata, or search terms, all that needs to be done is to connect to the SharePoint site and supply login credentials. After this simple configuration is completed users will never have to configure the software again. When new Document Types are added to SharePoint, or if metadata fields change, then the user will dynamically see these changes without ever having to change the scanning application.

The expense to add ‘scanning to SharePoint’ can literally be as inexpensive or expensive as an organization’s scanning volumes and/or requirements dictate. Many scanner hardware vendors provide some simple options for scanning to SharePoint in-the-box with the scanner, so the expense is just the scanner itself and not additional software. However, if an organization requires a higher level of automation to do sophisticated data extraction, such as automatic document recognition and document separation, and then capture the data and automatically release it to SharePoint, this could be a more expensive proposition. It’s important to remember that this expense can be easily justified by reduced human labor. Other examples of savings include the ability to take advantage of more pre-pay discounts on invoices, or better customer service through immediate access to information.

Example of simply connecting to SharePoint once. Supply some basic information once then all SharePoint updates are instantly available dynamically and visible in your document scanning software application.

4. How important is document capture software compatibility with SharePoint?

It can be, but the truth is that the capabilities are limited. Also, it is important to note that no matter what version of SharePoint you have (Microsoft Windows SharePoint Services 3.0, Microsoft Office SharePoint Server 2007 or SharePoint 2010) the software must be configured before it is usable. Microsoft Windows SharePoint Services 3.0 is the free component of SharePoint that is included with Windows Server 2003 or it is a free download for customers who have Windows Server 2008.

Also, it is important to note that while SharePoint has its strengths, just like any other product, it also has its weaknesses. We have seen many situations where SharePoint is being used in conjunction with other complementary document management systems. The right solution truly depends on an organization’s business requirements and all options should be thoroughly investigated.

 

5. What capabilities are needed to help end users have a better ‘Scan to SharePoint’ experience?

There are a few capabilities that are needed to help end users have a better experience when scanning documents to SharePoint. First, image enhancement is an absolute must. Anyone that has ever scanned documents must have felt the pain of having to rescan documents for various reasons. Maybe the image quality was poor? Maybe the page was scanned upside down? Maybe only the front side was scanned on a double-sided document? These are just a few examples of situations where the user would have to stop their process to rescan these documents, which not only is a waste of time but also is costly in lost productivity terms. Image enhancement technology, which can dynamically adjust for perfect image quality and perform automatic tasks such as automatic page orientation, intelligent blank page removal, automatic color detection, automatic cropping and automatic deskew, is key to helping users have a pleasant experience. If scanning documents is a chore then users will resist using technology that is difficult to use.

Secondly, the method of capture is another critical consideration. There are several methods for capturing documents into SharePoint. Some of the common approaches are manual indexing, automatic indexing and network scanning. Manual indexing is ideal for ad hoc or low-volume scanning. With manual indexing the scanning application captures an image, then the image is presented and the user types the metadata into the fields configured on the SharePoint Server. This approach is the most cost effective, yet still adds the important step of capturing metadata to be associated with the scanned images. Alternatively, automatic indexing is ideal for large volumes and/or when the documents have some sort of fixed content structure. For example, the Census 2010 forms have a fixed structure where a particular field, such as a person’s date of birth, is always in the same place on the document. It’s easy to design document scanning templates that can automatically and quickly extract this information and place both the scanned image and the associated metadata directly into SharePoint. Lastly, the network scanning approach is one of the newest methods of capturing scanned images into SharePoint. The benefits of network scanning are typically ease of use, with simple touch screen operation for the user, and ease of deployment and ongoing maintenance for the administrator. A network scanner can be configured to use either the manual indexing or automatic indexing approach as described above. So, as you can see, there are several methods for capturing scanned documents into SharePoint, and the right approach, or combination of approaches, really depends on an organization’s requirements.

Example of Manual Indexing into SharePoint.
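For the automatic indexing of fixed-structure forms described in this answer, the idea of a scanning template can be sketched roughly as follows. This is a simplified illustration in Python under stated assumptions: the template format, the field names and the sample page are hypothetical, and the zones here are line and column positions in recognized text, whereas commercial capture software usually defines zones as regions of the page image.

```python
# Hypothetical template: each field sits on a fixed line and column range
# of the recognized page text, written as (line, start_column, end_column).
FORM_TEMPLATE = {
    "last_name": (0, 11, 31),
    "date_of_birth": (1, 15, 25),
}


def extract_by_zone(page_lines, template):
    """Read each field from its fixed zone; missing lines yield empty values."""
    metadata = {}
    for field, (line_no, start, end) in template.items():
        line = page_lines[line_no] if line_no < len(page_lines) else ""
        metadata[field] = line[start:end].strip()
    return metadata


# Made-up recognized text for one page of a fixed-layout form.
page = [
    "Last name: Rivera",
    "Date of birth: 04/12/1975",
]
print(extract_by_zone(page, FORM_TEMPLATE))
```

Because every form of the same type shares the same layout, one template can be reused for the whole batch, which is what makes this approach attractive at high volumes.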

 

6. How are hardware vendors addressing the vigorous adoption of SharePoint?

Scanner hardware vendors are clearly trying to address the vigorous adoption of SharePoint by including some level of SharePoint integration in-the-box or even embedded into devices such as network scanners. Scanning to SharePoint has to be easy to set up and easy to use. Oftentimes SharePoint is deployed as a document management system where this may be the system administrator’s first experience with this type of software. To help reduce the burden on system administrators, many scanner hardware vendors offer simple solutions for configuring and using the scanning software. This allows the system administrator to focus on learning server-side functionality such as creating Document Libraries, creating Columns for metadata or establishing document workflow.

Example of embedded SharePoint connectivity using the Fujitsu network scanner.

7. What makes scanning to SharePoint different than scanning to any other content management repository or platform on the market today?

The user experience of scanning to SharePoint is not unlike that of other content management repositories or platforms available on the market today. Most scanning applications can connect directly to a repository and show index fields based on document types. Also, most scanning applications can utilize either the manual indexing or automatic indexing techniques described in question 5 above. With SharePoint, the main appeal is the ability for users to manage the overall SharePoint experience. SharePoint offers users the ability to create their own ‘sites’ without the involvement of the Information Systems department. This is basically the equivalent of your own web site where you can store all your electronic content, including scanned images. Within these sites, users can create a custom page using different ‘web parts.’ For example, a user can have a news feed in the top-left portion of the page, a business intelligence chart of daily sales activity in the bottom-left, a spreadsheet of current stock prices in the top-right, and finally, a web part with point-and-click access directly to scanned images in the bottom-right. Some people might refer to this as a ‘dashboard’ specifically tailored to the information and applications a user feels are most relevant to them.

 

8. What trends are hardware vendors and solutions providers seeing in terms of SharePoint customization?

There are two trends that hardware vendors and solution providers are seeing in terms of SharePoint customization. Both of these trends involve careful planning of the SharePoint system. We recommend that you do not rush to simply begin scanning and importing high volumes of documents into SharePoint without a well thought-out strategy. First, the ability to more effectively manage SharePoint is a big trend. Some organizations that have migrated to SharePoint from simple shared network drives have found that, while they achieved the intended reduction in paperwork, they now have nothing more than another electronic mess of content. There are several SharePoint Solution Providers that are successful in helping organizations get a better handle on their SharePoint system even after it has been deployed. Secondly, and somewhat related to the manageability of SharePoint, is the importance of metadata and a well thought-out document taxonomy. Metadata refers to the key search words used to retrieve documents stored in SharePoint. If an organization is not capturing the right, or accurate, metadata on its documents then it could mean a complete failure to gain any meaningful benefit from a SharePoint system. A taxonomy provides a formal structure for information, based on the individual needs of a business. Categorization tools automate the placement of content (document images, email, text documents and any other electronic content) for future retrieval based on the taxonomy. Users can also manually categorize documents. Categorization is a critical step to ensure that content is properly stored.

 

9. How does the implementation of SharePoint impact your current document management system?

Without a doubt, the implementation of SharePoint is going to drastically improve productivity or is going to become a burden to your organization. It will affect your organization either positively or negatively, but the SharePoint Effect will certainly be felt. Let me be specific.

Only a few short years ago, we think that many organizations were under the false impression that SharePoint Server contained all the same capabilities as traditional Enterprise Content Management (ECM) systems, which was not the case. For example, auditing is a very important concept for ECM systems, and until SharePoint 2010 added full-featured auditing with the Compliance Details screen, Microsoft lacked this functionality that is pretty standard with most ECM systems. Another important ECM concept missing from previous generations of SharePoint was Managed Metadata (which is also a new feature of SharePoint 2010). Managed Metadata allows organizations to define a set of terms to be used in a consistent manner when applying searchable terms to scanned documents. The point is that certain deficiencies within the suite of SharePoint capabilities left organizations unable to adhere to compliance regulations, due to the lack of auditing for example, or without control over a consistent metadata strategy. These are a few examples where a SharePoint implementation might have been perceived as a failure due to a lack of understanding of the critical ECM features organizations require.

Consequently, a solid understanding of SharePoint’s true capabilities helps organizations benefit greatly from the ability to leverage SharePoint’s core strengths. These core strengths have traditionally been focused around collaboration and portal – in other words, the sharing of electronic items such as Excel spreadsheets, Word documents, PowerPoint presentations and now, of course, scanned images. It should be noted that with some of the new features of SharePoint 2010, Microsoft is incorporating specific ECM capabilities to make SharePoint more appealing as a complete ECM solution. Many customers have shared with us that SharePoint in conjunction with other ECM software seems to be a solution that works well for them. To illustrate this point we will use a Records Management application as an example. Prior to SharePoint 2010, SharePoint lacked true Records Management capabilities such as ‘holds’ or ‘document retention periods’. Therefore, organizations could use SharePoint for their ‘active documents’ such as an Excel price list that needs updating by a team of people. These people could access the same document, check-out this document, edit it and then check it back into SharePoint for the next person to check-out and edit. However, once the spreadsheet is finalized and defined as a permanent final ‘record’ then this document would be committed into the traditional ECM Records Management system.
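As a rough illustration of the Managed Metadata idea mentioned in this answer, the sketch below shows what “a set of terms used in a consistent manner” can look like in practice. It is plain Python, not the SharePoint API, and the term set, the synonyms and the function name are all hypothetical. Whatever a user types is mapped to one approved term, and anything outside the term set is rejected rather than stored.

```python
# Hypothetical term set: approved term -> spellings and synonyms accepted for it.
TERM_SET = {
    "Invoice": {"invoice", "inv", "bill"},
    "Purchase Order": {"purchase order", "po", "p.o."},
    "Airbill": {"airbill", "air bill", "waybill"},
}


def normalize_term(raw_value, term_set=TERM_SET):
    """Map a typed value to its approved term, or raise if it is not allowed."""
    # Lowercase and collapse whitespace so "  Air  Bill " matches "air bill".
    key = " ".join(raw_value.lower().split())
    for approved, synonyms in term_set.items():
        if key == approved.lower() or key in synonyms:
            return approved
    raise ValueError(f"'{raw_value}' is not in the managed term set")


print(normalize_term("  PO "))
print(normalize_term("Air  Bill"))
```

With a check like this in front of the repository, two users indexing the same kind of document end up with the same search term, which is the consistency a metadata strategy depends on.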

 

10. Can you scan to SharePoint without using another application?

No, Microsoft SharePoint does not offer any native support for document scanning. There are some creative ways to import images into SharePoint via e-mail or shared folders; however, this is not ideal because there is no way to apply metadata, or search words, to those particular scanned documents. The true power of SharePoint, or any other Enterprise Content Management (ECM) system, is the ability to keep your information organized and searchable. Adding relevant metadata, and not simply full-text OCR, to scanned images makes the system much more usable. For example, imagine you have a collection of one thousand images in your SharePoint repository, you have run full-text OCR on each document, and you then search for the term ‘scanner documents.’ If you are in the scanner business, it is very likely that nearly every one of those one thousand documents will be returned as a possible match. However, if as a business rule or policy your organization decided on a logical taxonomy to classify your documents and applied only relevant metadata, then your search results would be much more pertinent to your query.
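The difference between the two kinds of search can be shown with a small sketch. This is an illustration only, using an in-memory list rather than a real repository; the documents, field names and function names are made up for the example.

```python
# Made-up documents: each has metadata fields plus its full recognized text.
documents = [
    {"id": 1, "doc_type": "Invoice", "invoice_number": "INV-1001",
     "text": "Invoice for scanner documents and supplies"},
    {"id": 2, "doc_type": "Brochure", "invoice_number": None,
     "text": "Our scanner handles documents of any size"},
    {"id": 3, "doc_type": "Invoice", "invoice_number": "INV-1002",
     "text": "Invoice for scanner maintenance; documents attached"},
]


def full_text_search(docs, query):
    """Return ids of documents whose text contains every word of the query."""
    words = query.lower().split()
    return [d["id"] for d in docs if all(w in d["text"].lower() for w in words)]


def metadata_search(docs, **criteria):
    """Return ids of documents whose metadata matches every criterion exactly."""
    return [d["id"] for d in docs
            if all(d.get(k) == v for k, v in criteria.items())]


print(full_text_search(documents, "scanner documents"))
print(metadata_search(documents, doc_type="Invoice", invoice_number="INV-1002"))
```

In this small sample the full-text query matches every document, while the metadata query returns only the one invoice that was asked for.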

It is key to a successful ECM implementation to carefully consider the importance of applying metadata to scanned images. Otherwise you might simply replicate a current paper-based filing system with an electronic mess of disorganized and lost images.

 

11. Can SharePoint be a document management system for you, out of the box?

It can be, but the truth is that the capabilities are limited. Also, it is important to note that no matter what version of SharePoint you have (Microsoft Windows SharePoint Services 3.0, Microsoft Office SharePoint Server 2007 or SharePoint 2010) the software must be configured before it is usable. Microsoft Windows SharePoint Services 3.0 is the free component of SharePoint that is included with Windows Server 2003 or it is a free download for customers who have Windows Server 2008.

Also, it is important to note that while SharePoint has its strengths, just like any other product, it also has its weaknesses. We have seen many situations where SharePoint is used in conjunction with other complementary document management systems. The right solution truly depends on an organization’s business requirements and all options should be thoroughly investigated.