The easiest way to understand how it works is to picture the index at the back of a large reference book. Unlike a table of contents, the index lists every page on which a given word appears. If you were looking for information about ‘publishing’, for example, you would go to the index, look up the word ‘publishing’, then visit each of the pages the index lists to find the information you need. As you can imagine, this is much, much faster than flipping through the book page by page and checking whether the word ‘publishing’ appears on each one.
So, what is indexing?
Indexing is simply ‘the process of creating an index’. In computing terms, indexing is the process of creating tables or ‘indexes’ that point to the location of folders, files and other records like emails and contacts on a disk. Most desktop search indexing processes identify resources based on file names, metadata, and sometimes the text within a file (which makes a ‘full text search’ possible). They then build an index database, or a list of keywords associated with the files, which the program can scan very quickly to tell you the location of a file or record.
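To make the idea of a keyword list that points to files more concrete, here is a minimal sketch in Python. It is only an illustration of the general technique (often called an inverted index), not how Lookeen or any particular desktop search tool is implemented. The file names, their contents and the helper functions `tokenize`, `build_index` and `search` are all made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, term):
    """Look up a single word and return the matching names, sorted."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample files, standing in for documents on a disk.
documents = {
    "report.txt": "Quarterly publishing report",
    "notes.txt": "Meeting notes about the invoice",
    "guide.txt": "A guide to publishing and indexing",
}

index = build_index(documents)
print(search(index, "publishing"))  # ['guide.txt', 'report.txt']
```

The work of reading every file happens once, in `build_index`. After that, each search is a single dictionary lookup rather than a pass over every document, which is the same trade-off as the index at the back of a book.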
How does Lookeen use indexing?
Lookeen uses an open source search engine library called Lucene to power its searches. Lucene is a full text search library, which means that it also indexes the content of documents. When you type a search term into the Lookeen search bar, Lookeen looks up that word in its index and immediately shows you where you can find the documents that contain it.
Lookeen uses real-time indexing and regular updates to make sure that the index always contains the most recent content. Lookeen indexes several different types of information about your documents and records:
- Metadata – date of creation, author, file type, file size, etc.
- File or folder name and its location
- Text content of files like .docx, .pdf or emails (and many more)
Lookeen takes advantage of Lucene’s ability to index all of this information by enabling you to very specifically target each of these factors using search filters. For example, instead of searching for ‘invoice’ and looking through the results yourself, you can specify that you are looking for a PDF, sent by Peter, received yesterday, containing the word ‘invoice’.
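The ‘invoice’ example above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Lookeen’s or Lucene’s actual API: the `Record` class, its field names and the `find` function are hypothetical, the sample records are invented, and the code scans a small list instead of consulting a real index. It only shows how storing several fields per record lets you combine filters.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    """One indexed item; the field names are hypothetical."""
    name: str
    file_type: str
    sender: str
    received: date
    text: str


def find(records, *, file_type=None, sender=None, received=None, contains=None):
    """Return the names of records that match every filter supplied."""
    results = []
    for r in records:
        if file_type is not None and r.file_type != file_type:
            continue
        if sender is not None and r.sender != sender:
            continue
        if received is not None and r.received != received:
            continue
        if contains is not None and contains.lower() not in r.text.lower().split():
            continue
        results.append(r.name)
    return results


# A fixed "today" keeps the example reproducible.
today = date(2024, 5, 2)
yesterday = today - timedelta(days=1)

records = [
    Record("invoice_may.pdf", "pdf", "Peter", yesterday, "Invoice for May services"),
    Record("invoice_april.pdf", "pdf", "Peter", today - timedelta(days=30), "Invoice for April services"),
    Record("invoice.docx", "docx", "Peter", yesterday, "Draft invoice"),
    Record("newsletter.pdf", "pdf", "Anna", yesterday, "Monthly newsletter"),
]

# A PDF, sent by Peter, received yesterday, containing the word 'invoice'.
print(find(records, file_type="pdf", sender="Peter", received=yesterday, contains="invoice"))
```

Searching for ‘invoice’ alone matches three of the four sample records, while the combined filters narrow the result to the single file you were after.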
Another great advantage of using Lucene as a base is that the index it builds takes up very little storage space and, once the initial index has been built, needs very few system resources to maintain. For a full explanation of why it’s so fast, you can watch the first 10 minutes of Apache Lucene: Then & Now.
If you’d like to learn more about how Lucene works, I highly recommend reading this great explanation from Parse.ly which breaks down the complex tech talk into ‘lay’ English.
Why should I care about indexing and search?
As I explained above, using an index significantly reduces the amount of time it takes to find the information you need. Indexing documents in this way can lead to great improvements in time management and productivity for your organization simply by making it easier and more efficient to find information.
Many companies see multiple benefits when they implement a search solution in their organization. This isn’t usually due to huge changes in the organization, but rather small, personal benefits to each employee, including:
- Enhanced personal productivity by being able to find the information they need quickly and accurately.
- Being able to find information directly instead of asking colleagues questions whose answers are already documented and saved somewhere on the system.
- Better decision making because employees are better informed due to faster and easier access to information.
As you can see, indexing is not just some boring thing your search program does in the background, but in fact a necessary and incredibly beneficial tool.