What is Latent Semantic Indexing (LSI)?

In this article I am going to explain, in simple terms, what Latent Semantic Indexing is. Latent Semantic Indexing, or LSI, is a technique for indexing and retrieving the pages relevant to a user’s query before the ranking algorithm is applied.

The simplest method of retrieving relevant pages is plain text matching: the terms of a search query are matched against the text found in all web pages. But the problem with simple text matching methods is that they are inherently inaccurate. This happens because a user can express a given concept in a number of ways, using different words whose meanings are the same, and because so many words have multiple meanings. The problem of similar meanings means that the user's query may not match the text on relevant pages, so those pages will be overlooked. The problem of multiple meanings means that the terms in the user's query will often match terms on irrelevant pages.
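To make both problems concrete, here is a minimal sketch of exact text matching. The three pages, their names, and the keyword_match helper are all made up for illustration; a real search engine is far more elaborate.

```python
# Hypothetical mini-corpus, invented for this example.
DOCS = {
    "used-cars": "cheap automobile for sale with low mileage",
    "jaguar-cat": "the jaguar is a big cat that lives in the rainforest",
    "jaguar-car": "the jaguar is a luxury car with a powerful engine",
}

def keyword_match(query, docs):
    """Return the names of pages that contain every query term exactly."""
    terms = set(query.lower().split())
    return [name for name, text in docs.items()
            if terms <= set(text.lower().split())]

# Similar meanings: the "automobile" page is about cars but is overlooked.
print(keyword_match("car", DOCS))

# Multiple meanings: a user looking for the car also gets the animal page.
print(keyword_match("jaguar", DOCS))
```

The first query misses the "used-cars" page because it says "automobile" instead of "car", and the second query returns both the animal page and the car page because the word alone cannot tell them apart.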

Latent Semantic Indexing, or LSI, addresses this problem by looking at patterns of word distribution across the whole of the web. In doing so, it considers pages that have many words in common to be close in meaning (semantically close) and pages with few words in common to be semantically distant. The result is an LSI-indexed database holding the similarity values it has calculated for every content word and phrase.
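The idea that pages sharing many words are close in meaning can be sketched with a simple similarity score. This is only the raw word-overlap intuition, not LSI itself, which goes on to compress these word patterns into a smaller space. The cosine_similarity function and the sample pages are my own assumptions for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

# Made-up sample pages.
page_1 = "car engine repair and car maintenance tips"
page_2 = "engine repair tips for your car"
page_3 = "planting roses in garden soil"

print(cosine_similarity(page_1, page_2))  # many shared words: well above zero
print(cosine_similarity(page_1, page_3))  # no shared words: 0.0
```

A score near 1 means the two pages use almost the same words in the same proportions, and a score of 0 means they share no words at all.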

In response to a query, an LSI-indexed database will return the pages that best fit the search terms. The LSI algorithm does not understand anything about what the words mean, and it does not require an exact match to return useful results.
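Here is a small sketch of how such a query could work. LSI is usually described in terms of a singular value decomposition (SVD) of a term-by-page count matrix, keeping only the strongest few dimensions. To keep the example dependency-free, I use power iteration in place of a library SVD routine. The five pages, the choice of two dimensions, and every function name are assumptions made for illustration.

```python
import math

# Made-up corpus: three pages about cars, two about gardening.
PAGES = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "flower garden soil",
    "garden soil seed",
]

def term_page_matrix(pages):
    """Return (vocab, matrix) where matrix[t][p] counts term t in page p."""
    vocab = sorted({w for page in pages for w in page.split()})
    return vocab, [[page.split().count(t) for page in pages] for t in vocab]

def top_eigenpairs(gram, k, iterations=500):
    """Top-k eigenpairs of a symmetric matrix: power iteration plus deflation."""
    n = len(gram)
    work = [row[:] for row in gram]
    pairs = []
    for _ in range(k):
        vec = [1.0 / math.sqrt(n)] * n  # simple fixed starting vector
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * work[i][j] * vec[j]
                    for i in range(n) for j in range(n))
        pairs.append((value, vec))
        # Remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs

def build_lsi(pages, k=2):
    """Index the pages in a k-dimensional latent space."""
    vocab, a = term_page_matrix(pages)
    n = len(pages)
    # Entry (p, q) of A^T A measures the word overlap between pages p and q.
    gram = [[sum(row[p] * row[q] for row in a) for q in range(n)]
            for p in range(n)]
    pairs = top_eigenpairs(gram, k)
    # Page coordinates: singular value times the page's eigenvector entry.
    page_coords = [[math.sqrt(val) * vec[p] for val, vec in pairs]
                   for p in range(n)]
    # Term directions (A v / sigma) let a query be mapped into the same space.
    term_dirs = [[sum(row[p] * vec[p] for p in range(n)) / math.sqrt(val)
                  for row in a] for val, vec in pairs]
    return vocab, page_coords, term_dirs

def cosine(x, y):
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)

def lsi_scores(query, vocab, page_coords, term_dirs):
    """Score every page against the query inside the latent space."""
    counts = [query.split().count(t) for t in vocab]
    q = [sum(c * u for c, u in zip(counts, d)) for d in term_dirs]
    return [cosine(q, coords) for coords in page_coords]

vocab, page_coords, term_dirs = build_lsi(PAGES, k=2)
scores = lsi_scores("car", vocab, page_coords, term_dirs)
for page, score in zip(PAGES, scores):
    print(f"{score:.2f}  {page}")
```

In this toy corpus the page "automobile engine repair" scores close to 1 for the query "car" even though it never contains that word, because it shares "engine" and "repair" with pages that do. The two gardening pages score close to 0. The code never looks up what any word means; it only uses the patterns of which words appear together.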

Applications of Latent Semantic Indexing

•    Relevance Feedback
•    Archivist’s Assistant
•    Automated Writing Assessment
•    Textual Coherence
•    Information Filtering

I hope you now understand the meaning of Latent Semantic Indexing. I will explain it in more detail in my next article by covering all of its applications.