What are the pros of indexed search?
Indexed searching is fast. Building an index for the first time is time-consuming when dealing with a very large collection of documents and data, but once it is complete, hundreds of thousands of documents can be searched in seconds, even across networks. Subsequent re-indexing is much faster than the initial run, because it only processes the documents and data that have been added since the last indexing operation.
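To make the difference between the first indexing run and later re-indexing concrete, here is a minimal sketch in Python. It is only an illustration, not how Lookeen or Lucene work internally: the `InvertedIndex` class, its methods, and the sample file names are all made up for this example, and it only handles newly added documents (not edited or deleted ones).

```python
from collections import defaultdict


class InvertedIndex:
    """Toy index (hypothetical): maps each word to the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.indexed = set()              # ids of documents already processed

    def update(self, documents):
        """Index only documents not seen before; return how many were added."""
        added = 0
        for doc_id, text in documents.items():
            if doc_id in self.indexed:
                continue  # already indexed on an earlier run, skip it
            for word in text.lower().split():
                self.postings[word].add(doc_id)
            self.indexed.add(doc_id)
            added += 1
        return added

    def search(self, word):
        """Look the word up directly; no document is re-read."""
        return sorted(self.postings.get(word.lower(), set()))


# Sample data, invented for illustration.
docs = {
    "a.txt": "contract for client Smith",
    "b.txt": "invoice for client Jones",
}
index = InvertedIndex()
first_run = index.update(docs)    # initial indexing touches every document

docs["c.txt"] = "Smith case notes"
second_run = index.update(docs)   # re-indexing touches only the new one
smith_hits = index.search("smith")
```

The first call to `update` processes both documents, while the second processes only `c.txt`, which is why re-indexing stays cheap as long as little has changed.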
Unindexed searching has to look at every single document each time a search is run, because it has no record of where words occur within a document, or even of where the documents themselves are located. Each unindexed search basically starts fresh: it has no ‘memory’ of any searches that took place before it, or of what has already been looked at.
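An unindexed search can be sketched in a few lines of Python. The function name `unindexed_search` and the sample documents are assumptions made for this example; the point is simply that every call reads every document again.

```python
def unindexed_search(documents, word):
    """Scan every document on every call; nothing is remembered between calls.

    Returns the matching document ids and the number of documents read.
    """
    word = word.lower()
    hits = []
    scanned = 0
    for doc_id, text in documents.items():
        scanned += 1  # each document is opened and read again
        if word in text.lower().split():
            hits.append(doc_id)
    return hits, scanned


# Sample data, invented for illustration.
docs = {
    "a.txt": "contract for client Smith",
    "b.txt": "invoice for client Jones",
    "c.txt": "Smith case notes",
}
hits, scanned = unindexed_search(docs, "smith")
hits_again, scanned_again = unindexed_search(docs, "smith")
```

Running the same search twice reads all three documents both times, so the cost grows with the size of the collection rather than with the number of matches.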
As mentioned, each type of search has its own place in an enterprise setting. While indexed search is fast and efficient at searching large volumes of ‘known data’, this speed advantage becomes insignificant when the data is constantly changing or live.
The reason that indexing, and indexed search, are used so heavily within an enterprise setting is that the indexing process analyses documents and data in advance, so that this work does not have to be repeated for every search.
Why should I care?
I’ll give you another scenario: a big law firm has a particular client that it has worked with for many years. The firm couldn’t search for just the documents containing the name of the client, as this would return too many results. A more detailed search would be needed: first the name of the client, then possibly a case number, a particular word, a filter for Microsoft Word documents only, and finally a restriction to documents generated in the last 12 months. An indexed search across all of these criteria would take but a few seconds to show viable results. An unindexed search would have to look at each and every file: first check the file type, then the file date, then whether the client name was in the file, then whether the case number was included, and finally whether the required word was present. A very long and laborious task.
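The law firm scenario can be sketched as a set of small indexes that are intersected at search time. Everything here is hypothetical: the client name "acme", the case number, the file records, and the helper names `build_indexes` and `search` are invented for illustration, and a fixed cut-off date stands in for "the last 12 months".

```python
from datetime import date

# Hypothetical file records, invented for illustration.
files = [
    {"id": 1, "type": "docx", "modified": date(2024, 3, 1),
     "text": "acme case 4711 settlement draft"},
    {"id": 2, "type": "pdf", "modified": date(2024, 2, 10),
     "text": "acme case 4711 settlement signed"},
    {"id": 3, "type": "docx", "modified": date(2021, 6, 15),
     "text": "acme case 4711 settlement old"},
    {"id": 4, "type": "docx", "modified": date(2024, 4, 20),
     "text": "acme case 5000 invoice"},
]


def build_indexes(files):
    """Build one lookup table per criterion, once, ahead of any search."""
    words, types, dates = {}, {}, {}
    for f in files:
        types.setdefault(f["type"], set()).add(f["id"])
        dates[f["id"]] = f["modified"]
        for w in set(f["text"].lower().split()):
            words.setdefault(w, set()).add(f["id"])
    return words, types, dates


def search(words, types, dates, terms, file_type, since):
    """Intersect the id sets for each criterion instead of opening files."""
    result = set(types.get(file_type, set()))
    for term in terms:
        result &= words.get(term.lower(), set())
    return sorted(i for i in result if dates[i] >= since)


words, types, dates = build_indexes(files)
matches = search(words, types, dates,
                 terms=["acme", "4711", "settlement"],
                 file_type="docx",
                 since=date(2024, 1, 1))
```

Each criterion narrows the candidate set through a lookup, so no file has to be opened during the search itself. Here only file 1 satisfies the client, case number, word, file type, and date conditions together.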
Both indexed and unindexed searches have their place in an enterprise setting, but more often than not indexed searches are used because of their ability to drill down within searches, giving relevant results and, ultimately, what the user wants.
It would seem that indexed search, compared to unindexed search, is primarily used because of the speed advantage. While this is somewhat true, there is another factor to take into consideration. Once indexes are created, it is easier to search in particular areas of a disk, network, or infrastructure. For example, why search your entire hard drive when you know the item you are searching for is located in a folder called “My Documents”? You can even go further with separate indexes for email, documents (both local and network based), and attachments. Anything that can be classed as a separate entity can be indexed, depending on your level of search requirements.
But then, when you have several indexes, how do you ensure that all the results are included? The solution is simple: use a ‘Multi-Search’. Using the library analogy, this would be the same as searching several libraries all at once, without having to walk to each library to check each one.
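The idea of searching several indexes at once can be sketched as follows. This is a plain Python illustration of the concept, not the Lucene API: the function `multi_search`, the index names, and the sample entries are all assumptions made for this example.

```python
def multi_search(indexes, word):
    """Query each named index in turn and merge the hits into one list.

    Each hit is tagged with the index it came from, so the user can still
    see which 'library' a result belongs to.
    """
    word = word.lower()
    results = []
    for name, index in indexes.items():
        for doc_id in sorted(index.get(word, ())):
            results.append((name, doc_id))
    return results


# Three separate toy indexes (word -> set of item ids), invented for illustration.
indexes = {
    "email": {"contract": {"msg-17"}, "invoice": {"msg-02"}},
    "documents": {"contract": {"draft.docx", "final.docx"}},
    "attachments": {"invoice": {"scan.pdf"}},
}

contract_hits = multi_search(indexes, "contract")
# Restricting the search to one area is just a matter of passing fewer indexes.
email_only = multi_search({"email": indexes["email"]}, "contract")
```

One call covers email, documents, and attachments together, while passing a subset of the indexes directs the search at a single area.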
What are the solutions?
Lookeen uses the Lucene feature “Multisearcher”, which allows complex indexed searches to be completed all within one search. This not only saves time but also allows a complex search to be directed more precisely at the area that should be searched.
Much has been said concerning the fact that indexed search is better than unindexed search because of the speed aspect. While this is true, as mentioned previously, there are many occasions where it is not practical to build an index. Live data that is constantly changing (news, RSS feeds, and such) would force the index to be rebuilt continually, and building it would take more time than completing the search that was wanted. The same can be said for a forensic search of a computer. While the search can be as simple or as complex as required, the need to build an index is very unlikely, because it is a search more concerned with discovery than one based on predefined criteria.