
Kevin's Barnhouse

Technology, Movies, Life and Family!


Tag Archives: metadata

SharePoint 2010 with Box and LibraryCard

Posted on January 19 by Kevin

Use Case:  Your organization has installed SharePoint behind the corporate firewall to manage and organize your electronic content.  Your organization is also actively digitizing paper documents via document scanners and, as a complement to scanning, wants to use Automatic Data Capture software to extract pertinent information from an invoice, such as the invoice number, bill to, ship to and total.  However, in this ever-evolving world of on-the-go and mobile work, you and your co-workers are on the road quite often, which is problematic because SharePoint has poor support for mobile devices.  In this case you will want to be able to view, as well as contribute, content in SharePoint using a highly collaborative and easy-to-use service such as Box, which has outstanding mobile device support.


Posted in Cloud, Presentations, SharePoint, Technology, Videos | Tagged box, librarycard, metadata, SharePoint, sync | Leave a reply

Exploiting Big Data with Indexes

Posted on January 19 by Kevin

Use Case:  In today’s business environment, more than ever, it’s simply not good enough to be average.  Organizations of all sizes have to strive to create competitive advantages, understand trends and gain better insight into operational efficiency.  One of the most useful techniques to accomplish these goals is to Exploit Big Data through analysis.  However, this is challenging due to the volume, velocity and variety of content that must be analyzed.  Image-only files are useless in data analysis.  Therefore, the all-important first step in exploiting all of your content is to apply indexes so that computer systems can properly begin to understand the information.


Posted in Presentations, Technology, Videos | Tagged bigdata, hadoop, index, indexes, indexing, metadata, splunk, variety, velocity, volume | Leave a reply

The SharePoint effect

Posted on January 2 by Kevin

‘SharePoint is ubiquitous’. Ubiquitous (adj.) is defined as ‘Being or seeming to be everywhere at the same time; omnipresent’.

Microsoft has certainly invested heavily in marketing SharePoint, but they have also invested heavily in Zune.  Why is SharePoint an unbelievable success and Zune… well, isn’t?

People might dispute when SharePoint truly became a legitimate ECM offering, but I would have to say that 2010 was a sincere “coming out” party for Microsoft SharePoint Server.

First, from the product development perspective, Microsoft released a major overhaul of Microsoft Office SharePoint Server (MOSS) 2007 with SharePoint 2010 this year, which incorporated many of the ‘must-have’ features for a true ECM solution such as List Validations and Document IDs.  Basically, and you would think this should have been obvious, List Validations mean that administrators can enforce metadata rules with commonly used document capture techniques such as hidden columns, no duplicate values and column validation conditions.  Document IDs, meanwhile, are persistent, unique identification numbers assigned to documents in SharePoint, meaning that if documents are moved around within SharePoint site collections they can always be accessed by a special URL.
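To make the idea of metadata rules concrete, here is a minimal sketch in plain Python of a ‘no duplicate values’ rule and a column validation condition.  It is not the SharePoint API; the function name, the column names (InvoiceNumber, Total) and the rule format are all hypothetical and only illustrate the kind of check an administrator would configure.

```python
# Illustrative only: plain Python, not SharePoint code. All names are hypothetical.

def validate_item(item, existing_items, unique_columns, conditions):
    """Return a list of rule violations for a new list item (empty if valid)."""
    errors = []

    # "No duplicate values": the value must not already exist in that column.
    for column in unique_columns:
        value = item.get(column)
        if any(other.get(column) == value for other in existing_items):
            errors.append(f"{column}: duplicate value {value!r}")

    # "Column validation conditions": each column has a check and a message.
    for column, (check, message) in conditions.items():
        if not check(item.get(column)):
            errors.append(f"{column}: {message}")

    return errors


if __name__ == "__main__":
    existing = [{"InvoiceNumber": "INV-001", "Total": 125.50}]
    rules = {"Total": (lambda v: v is not None and v > 0, "must be greater than zero")}

    new_item = {"InvoiceNumber": "INV-001", "Total": -5}
    for problem in validate_item(new_item, existing, ["InvoiceNumber"], rules):
        print(problem)
```

The point of rules like these is that bad index values are rejected at capture time rather than discovered later during retrieval.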

Secondly, Microsoft has done a wonderful job of building their SharePoint Ecosystem and, therefore, has the support of many complementary software and hardware vendors to promote SharePoint ECM.  Generally speaking, I think just a few short years ago SharePoint was being wrongly promoted as ‘the one-stop ECM shop’ for everything content management.  This was clearly misguided: SharePoint has many good qualities, but from an ECM perspective it had many more deficiencies than useful features.  One obvious major deficiency, even to this day, is that there is no native document capture, or scanning directly into SharePoint, without third-party document scanning solutions.  It’s clear that Microsoft has changed their tune on the promotion of SharePoint and now embraces the SharePoint ecosystem.  It seems that ‘enhancing the SharePoint experience’ by enabling integration with the SharePoint platform is more of a priority than it originally was, as evidenced by the major turn-out and interest at the “SharePoint Partner Pavilion” at AIIM 2010 earlier this year in Philadelphia.  Complementary vendors offering best-of-breed applications such as document capture and BPM (Business Process Management) for the SharePoint platform were nearly stampeded in their respective booths and kiosks by customers interested in product demonstrations or eager for more information about their offerings.


No longer do traditional ECM software vendors dismiss SharePoint as a weak solution.  Rather, these long-standing ECM vendors now have complementary product development and marketing strategies because the overwhelming interest in SharePoint is real and the SharePoint effect can no longer be denied.

I think in years past we’ve certainly heard more and more discussion about the practicality of using SharePoint for ECM, but I believe for the most part it was early-adopter IT departments or organizations on a tight budget that were actually using SharePoint.  No longer is this the case.  Organizations and businesses of all sizes, from small companies to large enterprises, are deploying, investigating and building upon the SharePoint/Office platform.  But what is driving this demand?

Simply put, if you make something accessible and easy to use then, whatever it might be, it stands a good chance of adoption.  This is especially true when it comes to technology.  For users, learning SharePoint is not too unlike using software applications they are already familiar with, such as Internet Explorer, Word, Excel or Outlook.  Since users already have some level of comfort using these applications, extending the capabilities of these desktop applications to store electronic documents in SharePoint is probably one of the most effective ways to get users to accept and, more importantly, embrace change.

As always, I appreciate the time you’ve spent to read this posting about ‘The SharePoint effect’ and how the involvement of a major IT company such as Microsoft in our once-niche industry is helping drive adoption of technology that we all know is good for business financially, is terrific for the environment and improves our lives through more efficient processes.  I welcome comments, feedback and/or constructive criticism.  Please feel free to click either “The ‘No Folder Zone’” graphic below to read about one of the first trends that changed the Document Capture landscape forever, or the ‘Trends towards Network Scanning’ graphic below to read about the third trend witnessed in 2010 that changed the Document Capture landscape forever.

-Kevin

Posted in Imaging, Technology | Tagged ECM, governance, metadata, moss, SharePoint, sp2007, sp2010, windows, wss | Leave a reply

Trends Towards Higher Resolution Scanning

Posted on April 4 by Kevin

Napster, MP3, YouTube, iPhone and MySpace. You may be asking yourself “what does this have to do with document scanning?” In reality, not much, other than large file sizes. However, when we draw an analogy with large audio files such as the WAV files or MP3s found on Napster, or with video files like those found on YouTube and MySpace that you can play on a mobile device like the new iPhone, we can get an appreciation of the challenges of sharing large files. And because viewing document images has become just as important, if not more important from a business perspective, we need to have a clear understanding of the general technological trends of information sharing and the Trends Towards Higher Resolution Scanning.

In the perfect world of scanning technology someone would drop a document into the scanner’s automatic document feeder, scan the page and “voila”, all the vital business data has been automatically extracted for immediate use by an Enterprise Content Management (ECM) system or for general retrieval via a keyword search. This is similar to using a search engine to find the information we are looking for. Sounds like magic? It nearly is, but there are many underlying technologies that create this magic. Technically speaking, advanced forms processing, or the ability to perform these sophisticated tasks automatically, is a reality that is available today, and this ‘magic’ starts with high-quality scanned images that closely resemble the original document. In Automated Forms Processing applications there is a lot going on behind the scenes, and whether an image is of poor or good quality dictates the success or failure of the related processes in the grand scheme of the document imaging system. These features need to be highly functional, extremely accurate and transparent to the users themselves.

In a recent study of scanner users Susan Moyse of Moyse Technology Consulting summed up the current trend quite well, “Scanner users need applications that do more automatically. This requires vendors to deliver sophisticated functionality almost invisibly. The less these users know of the underlying technologies the better. Business users just want their scanning solutions to solve their problems.”

 

Traditional Obstacles Addressed By Advanced Technology

Advanced features are great from the standpoint that custom systems can be designed by systems integrators, value-added resellers or a professional services organization to fit individual business needs. An effective document capture system is a system that operators don’t have to think about. Capturing more dots per inch at scan time gives your scanning solution the greatest chance of automation success. Most likely no solution will be absolutely perfect; nevertheless, giving your capture solution the greatest chance at success through good image quality, more dots per inch and great paper handling can dramatically increase your level of automatic document capture.

There are advanced techniques such as automatic document classification, document separation and free-form processing, all of which greatly depend on the computer being able to read the dots on scanned pages to make intelligent and critical decisions about these images. After all, garbage-in is garbage-out, and your document capture solution is the on-ramp that transforms paper into usable electronic data. Most often you get one chance to capture these images before they are filed in a permanent archive or the physical paper is destroyed forever.

To understand these trends and to develop our hypothesis for the future of document scanning, we must evaluate what inhibited the sharing of large files in the early days of file sharing. While the ability to share audio, video and document images has been around a long time, this sharing was limited by some rather common factors. The common thread among all of these file formats is that the files have historically been large and difficult, if not impossible, to use over computer networks. Let’s take a look back into the not-so-distant past and get a glimpse at what ultimately made the likes of YouTube, MySpace and Napster successful and what will drive the trend of scanning at higher resolutions for automation. One of the most obvious drawbacks to sharing large files was the lack of bandwidth. Whether it was a remote user on a dial-up connection or a corporate network that had not been planned with the sharing of large files in mind, customer dissatisfaction was high and people were reluctant to use these services due to the impending frustration of waiting for large downloads to complete. Likewise, video sharing had been, until recently, slow to catch on for many of us. However, times are changing on the bandwidth front, and we need to refer to history to understand what limited the adoption rate of these technologies.

 

Contributing Factors to the Trends Towards Higher Resolution Scanning

Most leading Automated Forms Processing software companies recommend scanning at a minimum resolution of 300 dots per inch for effective data extraction. In other words, for every square inch of paper the scanner is capturing 300 dots horizontally and 300 dots vertically or 90,000 total dots (300 x 300 = 90,000 dots per square inch). This automation reduces manual intervention tasks such as ‘key index values from images’ which in turn decreases costs and improves efficiency. Some techniques, which you might be familiar with, include Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) or Optical Mark Recognition (OMR).

Presume we settled for scanning at 200 dpi resolution. We would have captured only 40,000 total dots per square inch versus 90,000. Why is this important? Below is an illustration which demonstrates how file sizes grow incrementally larger when scanning at higher resolutions or utilizing color. Higher Resolution Scanning equals Improved Automated Accuracy.

[Illustration: Trends Towards Higher Resolution Scanning]
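As a rough companion to the comparison above, the arithmetic can be sketched in a few lines of Python. The page size (8.5 x 11 inches) and the raw bit depths (1, 8 and 24 bits per dot) are my assumptions for illustration, and the sizes are uncompressed, so real files saved with compression will be considerably smaller.

```python
# Uncompressed size of one scanned page; assumes a letter-size page (8.5 x 11 inches).
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0

# Usual raw bit depths: 1 bit per dot for black & white, 8 for grayscale, 24 for color.
BITS_PER_PIXEL = {"black & white": 1, "grayscale": 8, "color": 24}


def dots_per_square_inch(dpi: int) -> int:
    """Dots captured in one square inch at the given resolution."""
    return dpi * dpi


def raw_page_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one page at the given resolution and bit depth."""
    width_px = round(PAGE_WIDTH_IN * dpi)
    height_px = round(PAGE_HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (200, 300):
        print(f"{dpi} dpi: {dots_per_square_inch(dpi):,} dots per square inch")
        for mode, bpp in BITS_PER_PIXEL.items():
            size_mb = raw_page_bytes(dpi, bpp) / 1_000_000
            print(f"  {mode}: {size_mb:.2f} MB uncompressed")
```

Going from 200 dpi to 300 dpi multiplies the number of dots, and therefore the raw size, by 2.25, whatever the color mode.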

 

“The accuracy of the OCR systems declined dramatically when the resolution of the images was reduced from 300 to 200 dpi…”

Source: The Fourth Annual Test of OCR Accuracy

“Scan resolution: The number of dots per inch can affect the clarity of the image and accuracy of OCR. Recent tests found that reducing from 300 dpi to 200 dpi increased the OCR error rate for a complex document by 75%…”

Source:  http://epe.lac-bac.gc.ca/100/202/301/netnotes/netnotes-h/notes37.htm

So the question is “why wouldn’t everyone simply scan documents at 300 dots per inch?” Traditionally there have been several legitimate concerns that made higher resolution scanning unattractive to users and systems operators. These include limited bandwidth (as in the audio and video file size scenarios), lossy image compression technology, physical scanners that might slow to two-thirds or less of the speed they are rated for at 200 dpi and, lastly, the larger file sizes created by scanning at higher resolutions. Now, through advanced technologies and innovation, the document capture industry is addressing all of these obstacles, which should truly enhance the adoption rate of higher resolution scanning. Let me be specific about each:

• Increased Bandwidth for Remote Users and Corporate Networks –

For those of you that have tried sending a large file via your e-mail client, you can certainly relate to the ‘pain’ involved with sending even one file using a low bandwidth connection. Now, just imagine a customer service operator who has to retrieve hundreds of images during the normal course of their work day. Decreased costs and better availability of higher bandwidth networking components allow network administrators, or even remote users, to upgrade to high speed networks such as T1 internet lines, DSL, Cable Modem, Gigabit routers/cabling or even fiber optic networks. All of this bodes well for the future of sharing large file types including audio, video and scanned images.

• Improved Image Compression Techniques of Scanned Images –

Many new image compression techniques have been introduced recently which drastically decrease file sizes of both color and black & white images while still retaining great image quality. Previously, some compression techniques caused poor image quality that would drastically decrease automatic forms processing accuracy. In addition to better and more highly compressed images, technology such as Automatic Color Detection can determine whether to save the scanned images in a black & white or a color format at scan-time, thus eliminating the need to separate documents into stacks of bi-tonal and color pages. It’s much more desirable to compress a bi-tonal image than a color one, which makes this an ideal example of combining emerging technologies for the benefit of users and systems administrators.

• Scanning Higher Resolutions at Rated Speed –

Just as your car’s engine is designed to perform at a maximum speed based on the combination of aggregate parts, your document scanner is only as good as its weakest link. Certain document scanners these days have been highly engineered specifically to perform at rated speeds while scanning in higher resolution modes, thus excelling at Automated Forms Processing tasks eliminating the need to sacrifice accuracy for throughput.

• Decreased Storage Costs –

When storage cost a dollar, or several dollars, per megabyte, businesses had to make a serious decision about their choice of a data storage medium. At the time, it could have been in the form of low-capacity/high-availability hard disk drives, which were expensive, optical disks for moderate-capacity/moderate-availability at a mid-range price, or tape drives, which were typically high-capacity/slow-availability although the most affordable. Times have changed quickly with the evolution of CD-ROMs, DVDs and extremely high-capacity hard disk drives. The storage industry has reached the ‘critical mass’ stage where vendors are creating great technology but competing for market share, which drives costs to users down. Businesses and individuals are consuming data storage devices at a greater rate and the end of this trend seems to be nowhere in sight. Increased storage capacities, smaller form factors and decreased costs are a clear trend and portend well for storage of large file sizes.
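To put the bandwidth point in numbers, here is a minimal sketch that estimates how long a file takes to move over a few kinds of connection. The link speeds are nominal line rates (real-world throughput is lower), and the 50 KB image size and 300 retrievals per day are hypothetical figures chosen only for illustration.

```python
# Nominal line rates in bits per second; actual throughput will be lower.
LINK_SPEEDS_BPS = {
    "56k dial-up": 56_000,
    "T1": 1_544_000,
    "Gigabit Ethernet": 1_000_000_000,
}


def transfer_seconds(file_bytes: int, link_bps: int) -> float:
    """Idealized time to move a file: bits to send divided by bits per second."""
    return file_bytes * 8 / link_bps


if __name__ == "__main__":
    image_bytes = 50_000   # hypothetical compressed scanned page
    images_per_day = 300   # hypothetical workload for one operator

    for name, bps in LINK_SPEEDS_BPS.items():
        per_image = transfer_seconds(image_bytes, bps)
        per_day_minutes = per_image * images_per_day / 60
        print(f"{name}: {per_image:.3f} s per image, "
              f"{per_day_minutes:.1f} min per day for {images_per_day} images")
```

Even with these idealized numbers, the gap between a dial-up line and a modern network is the difference between waiting on every image and not noticing the transfer at all.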

 

Benefits of Higher Resolution Scanning to Automation

Consider that Automated Forms Processing involves computer-based intelligence to make crucial decisions concerning your scanned images. For example: Classification- What type of document? Separation- How many pages is the form? Anchor Points or Free-Form- Where is the information on the page? Quality Control- Are these characters meeting my defined accuracy criteria? Essentially, scanning hardware and software technologies have progressed to a level of automation that allows for sophisticated document capture, advanced forms processing and mission critical data extraction, all of which could be completely transparent or invisible to the user. However, this high level of automation begins with high resolution scanning. The ability to drop a document into the scanner’s automatic document feeder and perform these advanced tasks has become a reality without the traditional sacrifices inherent to Higher Resolution Scanning.
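The Quality Control question above can be pictured as a simple threshold check. The sketch below is hypothetical: the RecognizedField structure, the per-character confidence scores and the 0.90 threshold are my own illustration and not the interface of any particular forms processing product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedField:
    """One extracted field with a recognition confidence per character (0.0 to 1.0)."""
    name: str
    text: str
    char_confidences: List[float]


def needs_review(field: RecognizedField, min_confidence: float = 0.90) -> bool:
    """True if any character falls below the accuracy criterion."""
    # An empty result has nothing to trust, so send it to an operator.
    if not field.char_confidences:
        return True
    return min(field.char_confidences) < min_confidence


def route(fields: List[RecognizedField],
          min_confidence: float = 0.90) -> Tuple[List[str], List[str]]:
    """Split field names into (accepted automatically, sent to manual review)."""
    accepted = [f.name for f in fields if not needs_review(f, min_confidence)]
    review = [f.name for f in fields if needs_review(f, min_confidence)]
    return accepted, review


if __name__ == "__main__":
    fields = [
        RecognizedField("invoice_number", "10482", [0.99, 0.98, 0.97, 0.99, 0.96]),
        RecognizedField("total", "1S0.00", [0.95, 0.41, 0.97, 0.99, 0.98, 0.98]),
    ]
    accepted, review = route(fields)
    print("accepted:", accepted)
    print("manual review:", review)
```

Better images push more characters above the threshold, which means fewer fields land in front of an operator.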

The trend towards more and more distributed scanning is obvious. As more document scanners find their way into the workplace, the demand for sophistication that is invisible to the user will only continue. Appreciate the technology, yet allow users to be experts in their respective professions instead of having to become scanning experts as well. Capture more dots per inch with higher scanning resolutions and give your document capture system the greatest chance for success.

Posted in Technology | Tagged bar code recognition, document scanning, dots per inch, dpi, ebc, ICR, indexing, intelligent character recognition, metadata, OCR, optical character recognition, resolution, scanners | 1 Reply