More proof that Google is not really using Latent Semantic Indexing (LSI)?

Most folks that are schooled in information retrieval (like my virtual buddy @thegypsy) already know that latent semantic indexing (LSI) is little more than a pipe dream at this point. However, there are still a lot of SEO folks out there that are convinced that LSI is a major factor and spend an inordinate amount of time trying to somehow optimize their sites accordingly.

I’m no information retrieval expert, but I do recognize anecdotal evidence when I see it (at least I think I do). Here’s what I’m referring to specifically…

While browsing through my Google Analytics profile for this blog I happened upon a fairly odd keyword referral:

“social media agencies seattle”

It seemed odd that a blog that is not based in Seattle and rarely makes any mention of Seattle would rank well enough to generate a visitor referral for this term. However, when I took a quick look at the SERPs for this term I noticed that my recent blog post relating to a conversation I had with a “social media denier” while attending SMS Advanced (a conference that was hosted in Seattle) held the No. 5 position in the organic results set.

Note: I logged out of my Google account to counter any ranking effects that personalization might have had on the SERPs (though that might not completely eliminate personalization). Also, I found it interesting that this search referral came from a user based in Seattle, Washington (confirmed via Google Analytics data).

Why is this significant? There are several reasons as far as I’m concerned.

While I’m happy to get the extra referral, I’m convinced that my blog post is irrelevant to the intent behind the search query. Semantically speaking, the phrase “social media agencies seattle” communicates a fairly clear intent to find an agency that specializes in social media and is based in Seattle (especially if the search query originates from the city of Seattle). I would think that even a very basic form of LSI would account for this and ensure that a blog post that simply mentions the word “Seattle,” but does not tie that keyword to the concept of social media agency services in any way, would not show up anywhere near the first page of results. Sure, the words “social media” and “agency” are included in the post, but at no point does the post make any semantic reference to the concept of a “social media agency,” much less a social media agency in Seattle.
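To make that point concrete, here is a minimal Python sketch of plain keyword matching with no semantic analysis at all. It is purely an illustration and not how Google actually scores pages. The function names, the crude plural handling, and both sample texts are hypothetical stand-ins I made up for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(token):
    """Crude plural handling so 'agencies' matches 'agency' (illustration only)."""
    if token.endswith("ies"):
        return token[:-3] + "y"
    return token


def term_overlap_score(query, document):
    """Fraction of distinct query terms that appear anywhere in the document."""
    query_terms = {normalize(t) for t in tokenize(query)}
    doc_terms = {normalize(t) for t in tokenize(document)}
    return len(query_terms & doc_terms) / len(query_terms)


query = "social media agencies seattle"

# Hypothetical stand-in for a post that merely mentions each word somewhere.
conference_post = (
    "At a conference in Seattle I talked with a social media denier "
    "who works at an agency."
)

# Hypothetical stand-in for a page that actually matches the searcher's intent.
agency_page = "We are a social media agency based in Seattle."

print(term_overlap_score(query, conference_post))
print(term_overlap_score(query, agency_page))
```

Under this naive scoring, both texts contain every query term, so they score the same even though only one of them is about a social media agency in Seattle. A system that modeled the relationships between terms would be expected to tell them apart.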

Furthermore, knowing what I know about the inbound links and anchor text that this particular blog post received (namely references to “Seattle” and “Social Media”), I’m convinced that my blog post is showing up for this query mainly because of anchor text saturation, not because of any semantic logic.

In layman’s terms, my blog post has nothing to do with social media agencies in Seattle, so why in the heck is it ranking well for this phrase?

The answer, in my mind, is simple. It’s because LSI has little or no role in Google’s current search algorithm(s) while inbound links and anchor text account for the lion’s share of ranking influence.

P.S. This is by no means a scientifically sound evaluation. I have not properly used the scientific method (isolating variables, etc.) and I have not tried to replicate the results. This is purely an anecdotal treatment (albeit a fairly strong one in my opinion).

Also, I would love to get feedback on my conclusions from information retrieval pros. Perhaps I’m misunderstanding the meaning of LSI. If so, please let me know. The last thing I want to do is spread misinformation.

