What is LSI (Latent Semantic Indexing)?

Latent Semantic Indexing (LSI) is a system that uses a mathematical technique to understand the relationships between terms or keywords. When search engines crawl a website, they collect the most common words and phrases to identify the page’s keywords. During this process, stop words (e.g. is, the, at, on, which) are ignored. In addition, LSI looks for words that are used in the same contexts, either because they have similar meanings or because they have a very specific relationship. For example, a user looking for articles about Roger Federer will find many stories about the tennis player, followed by articles about major tennis tournaments that don’t even mention his name. A search engine using LSI returns those articles because they are relevant to the search query.

How It Works

Interestingly, the most frequently used words in English carry little content of their own (prepositions, connective words, function words, etc.), so the first step is removing these words from a document, leaving only content words. Of course it’s not that simple, and there are many ways to define a content word. Here are some of the steps:

  1. Discard common verbs (know, see, do, be)
  2. Discard pronouns
  3. Discard common adjectives (big, late, high)
  4. Discard frilly words (therefore, thus, however, albeit, etc.)
  5. Discard any words that appear in every document
  6. Discard any words that appear in only one document

Source: http://www.seobook.com/lsi/lsa_explanation.htm
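
The steps above can be sketched in a few lines of Python. This is only an illustration, not how any search engine actually implements it: the stop list, the sample documents and the function names are all made up for the example, and a real stop list would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list covering steps 1-4.
STOP_WORDS = {
    "know", "see", "do", "be", "is", "are",              # common verbs
    "he", "she", "it", "they", "his", "her",             # pronouns
    "big", "late", "high",                               # common adjectives
    "therefore", "thus", "however", "albeit",            # frilly words
    "the", "a", "an", "as", "at", "on", "in", "of", "and", "which",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def content_words(documents):
    """Return one list of content words per document."""
    tokenized = [[w for w in tokenize(doc) if w not in STOP_WORDS]
                 for doc in documents]
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(w for words in tokenized for w in set(words))
    n_docs = len(documents)
    # Steps 5 and 6: drop words found in every document or in only one.
    return [[w for w in words if 1 < doc_freq[w] < n_docs]
            for words in tokenized]

docs = [
    "Federer is the tennis champion at Wimbledon",
    "The Wimbledon tennis final is late",
    "Tennis players see Federer as a champion",
    "The tennis tournament is on grass",
]
result = content_words(docs)
for words in result:
    print(words)
```

In this toy collection “tennis” is dropped because it appears in every document, and words like “grass” are dropped because they appear in only one. With just four documents step 6 is very aggressive; it makes more sense on a large collection.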

What Has Changed In The SEO World

In the past, optimizing for search engines and optimizing for users were almost two completely different things. But the recent updates affecting rankings (Panda, Hummingbird, Penguin, Mobile Usability) had one thing in common: improving the user experience. This means the best SEO campaign keeps the user in mind, not the search engines. Search engines are looking at engagement KPIs like time on site and pages viewed per session. The download speed of pages, the ease of navigation, and crawlability all affect the website’s online visibility (a.k.a. its rankings for different keywords).

The Main Takeaway

Without Latent Semantic Indexing, SEOs would face huge challenges and users would be served websites that have nothing to do with the intent behind their search query. These challenges originate in the limitations of Boolean search, where there are only “true / false” values. This means that to retrieve information you refine your search with the AND, OR and NOT operators, either combining terms or excluding them.
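
To make the “true / false” idea concrete, here is a minimal sketch of Boolean retrieval using Python sets. The three sample documents and the `build_index` helper are invented for the illustration; a document either contains a word or it doesn’t, and the operators simply combine those sets.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = [
    "cheap hotels in san francisco",      # id 0
    "bay area accommodations guide",      # id 1
    "san francisco tennis courts",        # id 2
]
index = build_index(docs)

# AND narrows the result set, OR widens it, NOT excludes documents.
san_and_francisco = index["san"] & index["francisco"]
hotels_or_accommodations = index["hotels"] | index["accommodations"]
san_not_tennis = index["san"] - index["tennis"]

print(sorted(san_and_francisco))          # [0, 2]
print(sorted(hotels_or_accommodations))   # [0, 1]
print(sorted(san_not_tennis))             # [0]
```

Every match here is all-or-nothing: there is no notion of one document being “somewhat” relevant to a term.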

As we all know, human language is not this simple. The two most common challenges when Boolean search is used on text are synonymy (multiple words that have similar meanings) and polysemy (words that have more than one meaning). Because of synonymy, a Boolean search can miss relevant information that uses a different word; because of polysemy, it can return irrelevant results that use the same word in a different sense.
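
One way to picture how shared context can bridge synonymy is the sketch below. To be clear, this is not LSI itself (LSI applies singular value decomposition to a term-document matrix); it is a much simpler stand-in that compares words by the other words they appear alongside. The sample documents and function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def context_vectors(documents):
    """For each word, count the other words that share a document with it."""
    vectors = defaultdict(Counter)
    for text in documents:
        words = set(text.lower().split())
        for word in words:
            for other in words - {word}:
                vectors[word][other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "hotel rooms booking francisco",
    "accommodation rooms booking francisco",
    "tennis tournament final",
    "tennis player final",
]
vectors = context_vectors(docs)

# "hotel" and "accommodation" never share a document, yet their contexts match.
sim_synonyms = cosine(vectors["hotel"], vectors["accommodation"])
sim_unrelated = cosine(vectors["hotel"], vectors["tennis"])
print(round(sim_synonyms, 3), round(sim_unrelated, 3))
```

A Boolean search for “accommodation” would never return the first document, but the two words end up with near-identical context vectors, which is the kind of signal a context-based system can use to treat them as related.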

As mentioned before, LSI looks for phrases related to the title of your page. Once the topic of a particular page is identified, the page has the potential to rank for related search queries (or, as SEOs like to call them, partial keyword matches), not just for exact keyword match queries.

Here’s an example if you want to take a look:
Search for “accommodations in bay area” on google.com (or just open this link: https://www.google.com/?gws_rd=ssl#q=accommodations+in+bay+area). In my screenshot, the first organic result (under the Google local pack) is Expedia’s site. They are not using any of the words I typed in the search bar, but the result is perfect since I was looking for hotels in San Francisco.

Advice:

Develop non-branded content. It helps search engines see the connection between your brand name and the services or products you provide (your non-branded keywords). After researching the topic and keyword group you want to target, keep your customers, clients and website visitors in mind: create insightful, engaging and unique blog posts.
