Finding Needles in Haystacks
I think there's a great opportunity waiting for someone to create some new software. The model already exists; we just need to apply it to corporate data. Here's the problem we need to solve - I need to be able to find data in my company, but I need to find it in a particular context.
There are a large number of existing solutions for finding data, but most of them rely on some sort of brute force approach. Google and other search engines can help find information in a body of data, but they aren't that great at finding information with a specific context. So if I run a search in my organization, assuming I have some way to search across the multiple data systems that exist there, I'll get a "hit" for every occurrence of a search term. However, if the search term is "customer inquiries", I may get a training manual which defines how to handle customer inquiries, pointers to a database of total customer inquiries, a Word document about specific customer inquiries and so forth. This is like searching for a needle in a stack of needles. The search is easy only if any needle will do.
I can begin to bring human reasoning and cataloging to the effort by creating a Yahoo-like directory, relying on humans to create links to information. But that raises another question - who decides what to link to and how to link? What rules do we follow, if any? And how much of the information in a firm can accurately be catalogued and linked to by humans? Let's assume that in any knowledge-based business, an individual generates documents, receives email and other information, reviews documents and information from colleagues and business partners, and downloads material from the internet or other sources. For grins, let's assume that an average knowledge worker generates, reviews, stores or downloads 5 MB of data per day. In even a small firm (say 50 people), this means we are adding 250 MB of data every day, or roughly 5 GB a month at about 20 working days. In larger firms - who knows. Humans can't possibly catalog and create hierarchical links to all this information.
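For anyone who wants to check the back-of-the-envelope numbers, here is a small sketch of the arithmetic. The 5 MB per person per day figure is the guess from above; the 21 working days per month and the function name data_growth are my own assumptions, so treat the output as an order-of-magnitude estimate and nothing more.

```python
def data_growth(employees, mb_per_person_per_day=5, working_days_per_month=21):
    """Estimate how much new data a firm accumulates.

    All inputs are rough assumptions, not measurements.
    Returns (MB added per day, GB added per month), using 1 GB = 1000 MB.
    """
    daily_mb = employees * mb_per_person_per_day
    monthly_gb = daily_mb * working_days_per_month / 1000
    return daily_mb, monthly_gb


if __name__ == "__main__":
    daily_mb, monthly_gb = data_growth(employees=50)
    print(f"{daily_mb} MB per day, about {monthly_gb} GB per month")
```

Change the number of employees to see how quickly this gets out of hand for a larger firm.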
Another way we've tried to manage the information we know is by using Wikis. Wikis tend to grow rapidly as information that one person finds valuable ends up on the Wiki, and others follow suit. What tends to end up on Wikis are short instructions - how to change your 401K investment options or how to write a specific bit of code. Wikis are great for things that you use occasionally and often bug your colleagues to help you with.
So, a Google-style approach might not work because it does not provide contextual meaning to a search. Human cataloging is problematic given the volume of the data and the speed at which we are generating it. Wikis are great for some knowledge management but are not very structured.
What we need is an application that can crawl our data and read tags that we associate with the information as we create or update documents. It seems to me that the kind of tags we use for blogs could be added as "metadata" to the files we create. Then, we could create a crawler to crawl the files and establish links. A Google-like solution at that point could indicate the relative importance of certain documents through the number of links, but a search could also include contextual information, so you could look at documents that were both important and met some contextual hurdles.
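To make the idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Document class, the file paths, the tags and the search function are names I made up for illustration, and counting inbound links is a crude stand-in for whatever ranking a real product would use. It only shows the shape of the idea - filter by context tags first, then rank by how many other documents link to each hit.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str                                      # hypothetical file location
    text: str                                      # document contents
    tags: set[str] = field(default_factory=set)    # metadata added by the author
    links: set[str] = field(default_factory=set)   # paths this document points to


def count_inbound_links(documents):
    """'Crawl' the documents and count how many others link to each path."""
    inbound = Counter()
    for doc in documents:
        for target in doc.links:
            inbound[target] += 1
    return inbound


def search(documents, term, context_tags):
    """Return paths that contain the term AND carry every context tag,
    most-linked-to first (ties broken alphabetically by path)."""
    inbound = count_inbound_links(documents)
    required = set(context_tags)
    hits = [
        doc for doc in documents
        if term.lower() in doc.text.lower() and required <= doc.tags
    ]
    hits.sort(key=lambda doc: (-inbound[doc.path], doc.path))
    return [doc.path for doc in hits]


# Made-up sample data, loosely based on the "customer inquiries" example.
DOCS = [
    Document("training/inquiries-manual.doc", "How to handle customer inquiries",
             {"training", "customer-service"}),
    Document("reports/inquiry-totals.xls", "Total customer inquiries by month",
             {"report", "customer-service"}, {"training/inquiries-manual.doc"}),
    Document("memos/acme-inquiry.doc", "Specific customer inquiries from one account",
             {"correspondence", "customer-service"}, {"reports/inquiry-totals.xls"}),
    Document("memos/q3-summary.doc", "Summary citing customer inquiries totals",
             {"report"}, {"reports/inquiry-totals.xls"}),
]

if __name__ == "__main__":
    # Plain keyword search: every needle in the stack comes back.
    print(search(DOCS, "customer inquiries", set()))
    # Same keyword, but only documents tagged as reports.
    print(search(DOCS, "customer inquiries", {"report"}))
```

Without context tags the search returns all four documents; asking for the "report" context narrows it to two, with the more heavily linked one first.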
This approach still requires a human - to generate and enforce a set of consistent tags. If you look at del.icio.us, for example, blog posts are filed by individuals under whatever tags each individual chooses. So the same post might be tagged "productivity" by one person and "innovation" by another. The "metadata" that controls the search and provides context needs to be developed in advance, regulated and monitored. Good metadata and consistent tags could make finding information and files in your firm a lot easier.
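Here is a rough sketch of what "enforcing consistent tags" might look like in code, assuming the firm maintains a controlled vocabulary that maps acceptable variants onto one approved tag. The vocabulary contents and the function names (normalize, validate_tags) are invented for this example; a real system would need someone to own and maintain that list.

```python
# Hypothetical controlled vocabulary: approved tag -> accepted variants.
CONTROLLED_VOCABULARY = {
    "productivity": {"efficiency", "getting-things-done"},
    "innovation": {"innovate", "new-ideas"},
}


def normalize(tag):
    """Lower-case a tag, trim it, and join words with hyphens."""
    return "-".join(tag.strip().lower().split())


def build_lookup(vocabulary):
    """Map every approved tag and every variant to its approved tag."""
    lookup = {}
    for approved, variants in vocabulary.items():
        for name in variants | {approved}:
            lookup[normalize(name)] = approved
    return lookup


def validate_tags(raw_tags, vocabulary):
    """Split raw tags into (approved tags, rejected raw tags)."""
    lookup = build_lookup(vocabulary)
    accepted, rejected = set(), set()
    for raw in raw_tags:
        key = normalize(raw)
        if key in lookup:
            accepted.add(lookup[key])
        else:
            rejected.add(raw)  # needs a human decision
    return accepted, rejected


if __name__ == "__main__":
    print(validate_tags(["Productivity", " efficiency ", "New Ideas", "misc"],
                        CONTROLLED_VOCABULARY))
```

Variants collapse onto the approved tag, and anything unrecognized is kicked back for a person to decide on, which is the regulating and monitoring part.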
I think there's a great opportunity to bring some of these existing web technologies together to create a powerful new application to improve knowledge management immensely in larger firms. I'm very interested in finding specific needles in the needle haystack. Is anyone working on anything like this?


