What is LSI (Latent Semantic Indexing)?

Latent Semantic Indexing (LSI) is a system that uses a mathematical technique to understand the relationships between terms or keywords. When search engines crawl a website, they collect the most common words and phrases to identify the keywords for the page. During this process, stop words (e.g. is, the, at, on, which) are ignored. In addition, LSI looks for words that are used in the same contexts, either because they have similar meanings or because they have a very specific relationship. For example, when a user looks for articles about Roger Federer, he or she will find many stories about the tennis player, followed by articles about major tennis tournaments that don’t even mention his name. Search engines using the LSI system return those articles because they are relevant to the search query.
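To make the “same contexts” idea concrete, here is a minimal sketch in Python. It is not how any search engine actually works, and it swaps in plain term co-occurrence for the matrix factorization (singular value decomposition) that LSI is usually described with. The four mini documents and the function names are made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical mini corpus; d3 never mentions "federer".
DOCS = {
    "d1": "federer wins wimbledon tennis final",
    "d2": "federer tennis champion grand slam",
    "d3": "wimbledon tennis tournament grand slam draw",
    "d4": "stock market prices fall",
}

def term_vectors(docs):
    """Map each term to the set of document ids it appears in."""
    vectors = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            vectors[term].add(doc_id)
    return vectors

def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def related_documents(query, docs):
    """Score each document by how strongly its terms co-occur with the query term."""
    vectors = term_vectors(docs)
    query_docs = vectors.get(query, set())
    scores = {}
    for doc_id, text in docs.items():
        terms = set(text.split())
        # Average similarity between the query term and every term in the document.
        scores[doc_id] = sum(cosine(query_docs, vectors[t]) for t in terms) / len(terms)
    return scores

scores = related_documents("federer", DOCS)
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, round(score, 3))
```

In this toy corpus the tournament article (d3) gets a score above zero even though it never contains the query word, because it shares context words such as “tennis” and “wimbledon” with the Federer articles. The unrelated finance document (d4) scores zero.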

How It Works

Interestingly, the most frequently used words in English don’t carry content at all (prepositions, connective words, functional words, etc.), so the first step is removing all these irrelevant words from a document, leaving only content words. Of course, it’s not that simple, and there are many ways to define a content word. Here are some of the steps:

  1. Discard common verbs (know, see, do, be)
  2. Discard pronouns
  3. Discard common adjectives (big, late, high)
  4. Discard frilly words (therefore, thus, however, albeit, etc.)
  5. Discard any words that appear in every document
  6. Discard any words that appear in only one document

Source: http://www.seobook.com/lsi/lsa_explanation.htm
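The steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `STOP_WORDS` set is a tiny hand-picked stand-in for the verb, pronoun, adjective and “frilly word” lists (steps 1 to 4), the sample documents are invented, and words are split on whitespace only.

```python
from collections import Counter

# Assumed, deliberately tiny stand-in for the word lists in steps 1-4.
STOP_WORDS = {
    "is", "the", "at", "on", "which", "in", "by", "a",
    "know", "see", "do", "be",                  # common verbs
    "i", "you", "he", "she", "it", "we", "they",  # pronouns
    "big", "late", "high",                      # common adjectives
    "therefore", "thus", "however", "albeit",   # frilly words
}

# Hypothetical sample documents.
DOCS = [
    "travel hotel on the bay",
    "travel hotel in the city",
    "travel by the bay bridge",
    "the city travel pass",
]

def content_words(docs):
    """Return each document reduced to its content words."""
    tokenized = [text.lower().split() for text in docs]

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(docs)
    keep = {
        term
        for term, df in doc_freq.items()
        if term not in STOP_WORDS   # steps 1-4
        and df < n_docs             # step 5: appears in every document
        and df > 1                  # step 6: appears in only one document
    }
    return [[t for t in tokens if t in keep] for tokens in tokenized]

filtered = content_words(DOCS)
print(filtered)
```

Here “travel” is dropped because it appears in every document, while “bridge” and “pass” are dropped because each appears in only one.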

What Has Changed In The SEO World

In the past, optimizing for search engines and optimizing for users were almost two completely different things. But recent updates affecting the rankings (Panda, Hummingbird, Penguin, Mobile Usability) had one thing in common: improving user experience. This means the best SEO campaign keeps the user in mind, not the search engines. Search engines are looking at engagement KPIs like time on site and pages viewed per session. Page load speed, ease of navigation, and crawlability all affect the website’s online visibility (a.k.a. its rankings for different keywords).

The Main Takeaway

Without Latent Semantic Indexing, SEOs would face huge challenges, and users would be served websites that have nothing to do with the intent of their search query. These challenges originate in the limitations of Boolean search, where there are only “true / false” values. This means that to retrieve information, you refine your search with the AND, OR and NOT operators, either combining terms or excluding them from your search.

As we all know, human language is not this simple. The two most common challenges when Boolean search is used on text are synonymy (multiple words that have similar meanings) and polysemy (words that have more than one meaning). It is easy to imagine how Boolean search could return irrelevant results, miss relevant information, or both.
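A small Python sketch shows both problems. The page names and texts below are hypothetical, and the functions are a bare-bones illustration of Boolean matching over a word index, not any search engine’s actual implementation.

```python
# Hypothetical pages and their text.
DOCS = {
    "hotel_page": "hotels in san francisco",
    "lodging_page": "accommodations in bay area",
    "finance_page": "bank account fees",
    "river_page": "river bank erosion",
}

def build_index(docs):
    """Map each word to the set of pages that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, *terms):
    """Pages that contain every term."""
    sets = [index.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Pages that contain at least one term."""
    return set().union(*(index.get(term, set()) for term in terms))

def boolean_not(index, all_ids, term):
    """Pages that do not contain the term."""
    return all_ids - index.get(term, set())

index = build_index(DOCS)
all_ids = set(DOCS)

# Synonymy: the hotel page is relevant but shares no word with the query.
synonymy_hits = boolean_and(index, "accommodations", "bay")

# Polysemy: "bank" matches both the finance page and the river page.
polysemy_hits = boolean_or(index, "bank")

# The only fix Boolean search offers is manual exclusion with NOT.
refined_hits = boolean_and(index, "bank") & boolean_not(index, all_ids, "river")

print(synonymy_hits, polysemy_hits, refined_hits)
```

The first query misses the hotel page entirely, and the second returns two unrelated pages until the searcher excludes one by hand.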

As mentioned before, LSI looks for phrases related to the title of your page. Once a topic is identified on a particular page, the page has the potential to rank for relevant search queries (or, as SEOs like to call them, partial keyword matches), not just for exact keyword match queries.

Here’s an example if you want to take a look:
Search for “accommodations in bay area” on google.com (or just open this link: https://www.google.com/?gws_rd=ssl#q=accommodations+in+bay+area). My screenshot below shows that the first organic result (under the Google local pack) is Expedia’s site. The listing does not use any of the words I typed in the search bar, but the result is perfect since I was looking for hotels in San Francisco.

Advice:

Develop non-branded content. It will help ensure that search engines see the connection between your brand name and the services / products you provide (your non-branded keywords). After doing some research to identify the topic and keyword group you want to target, keep your customers, clients and website visitors in mind: create insightful, engaging and unique blog posts.
