Google: New Algos or "SEO Filter"?
The recent changes to Google results have created a lot of buzz within the SEO community. Many theories have been discussed and debated. The "SEO Filter" discussion seems to be the one that holds water, or rather, has the fewest leaks.
However, there are anomalies within the results that don't point to a single "theory" being totally true. My belief is
that we are seeing newly acquired technology and PageRank modifications
in play.
This article makes use of a number of opinions by
people whom I personally regard highly as SEOs who do ongoing research and are members of
a forum which I know to be pretty reliable. Foremost I'd like to thank all
the IHY moderators especially Dan0
(Dan Thies), who pointed to the TSPR paper and the author's status with Google,
Alan Perkins for a concise
description of how TSPR seems to be used with a query and a special thanks to
Bernard Ertl, whose knowledge of his
competitors and research of a Florida SERP were very important in bringing Bob and
me to the conclusions we have about the recent Google update. I'm not
so sure Bob is in total agreement with our research but... we'll see, won't we
grasshopper!
The "SEO Filter" Theory
Barry Lloyd, AKA
MakeMeTop, was the first to advance
an
explanation for some wild moves downward in the results.
The theory is that sites that had used link text and reciprocal link campaigns to manipulate results were being penalized. Or were they penalized?
Barry also pointed out two articles which
served to explain a number of the changes which were being reported in the forums. The
Hilltop
paper written by Krishna Bharat and George A. Mihaila, and the
CIRCA
Technology white paper by Applied Semantics (a company recently acquired by Google and the creators of the technology behind AdSense)
were cited by Barry as possibly related to what has been seen in the Google
SERPs of late.
Hilltop made a lot of sense to me, and its ideas
seemed to be reflected in the results. The paper and one of its authors were in the news after a conference, and its approach negated the type of "manipulation" I personally felt had been too prominent in the results.
That manipulation appeared to be at the centre of what the filters seemed to be
filtering. Or was it really a filter?
The technique which I refer to is something I like to call "optimization through promotion." Basically, a link campaign is
run, requesting specific link text and descriptions. In most of the sites I looked at the pages linked to were optimized for that text. Hence, the theory was that an "SEO Filter" was in play and that the old results could be found through added text in the query. The
"-adsfs -adsf"
or "-googlegoo" query supposedly shows the "SEO Filter" at work. Or does it?
Other SERPs demonstrate that it could just be a filter for "SEO over-optimization," tripped by having
overly high keyword density (commonly found in bad spam
implementations of cloaking and keyword stuffing), over-optimization of a title or headings and a host of other well-known SEO techniques.
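As a thought experiment, the kind of density check this theory implies could be sketched as follows. Everything here is an assumption made for illustration: the phrase-matching rule and the 0.15 threshold are mine, not Google's actual filter.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Fraction of the words in `text` that belong to occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    # Count every position where the full phrase appears.
    hits = sum(
        1 for i in range(len(words) - n + 1)
        if words[i:i + n] == phrase_words
    )
    return (hits * n) / len(words) if words else 0.0

# Hypothetical threshold: pages above it would be flagged "over-optimized".
OVER_OPTIMIZED = 0.15

page = ("cheap widgets cheap widgets buy cheap widgets "
        "our cheap widgets are the best cheap widgets")
density = keyword_density(page, "cheap widgets")
print(f"{density:.2f}", density > OVER_OPTIMIZED)
```

A stuffed page like the toy example above scores far over any sane threshold, which is exactly the "stupid over-doer" case a filter would catch easily.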
This particular theory seemed to hold some water, but not enough for me to
accept it as a complete explanation for the results Google is displaying.
A good example of this is the
Examination of a 'Florida' SERP discussion and SERP
analysis prepared by Bernard, a moderator at IHY. I found this to be anomalous
compared to the other SERPs I'd seen supposedly affected by the "SEO Filter". It filtered the site but the linking part of the theory didn't hold water because
it was linked to by a number of the "experts" identified in the SERP. In my
opinion, it was an anomaly because it possibly was only penalized for an SEO technique. Or was it penalized?
Perhaps it was the high keyword density alone. Links were normal and most used the company name
rather than the optimized phrase in the anchor text. That is not to say that even a single inbound link with the matching text couldn't trip the filter. That very well could be the case if it is in fact an "SEO Filter,"
but as some have noted this would be very easy to manipulate by malicious webmasters.
In the final analysis it was this SERP and others like it which helped me to
better understand what was taking place in the Google results. In my final
conclusions I will try to shed some light on this SERP.
Applied Semantics Influence
Another commonly discussed topic is the seemingly conspicuous division of the algorithmic results and
ads into commercial and informational categories. This has raised a lot of eyebrows with the constant
rumor of Google going public, the theory being that it is a money grab. I can only hope this isn't the case because I've yet to see that work out in a positive result for anyone, stockholders or users!
One thing that I have noticed in the backlinks I watch is the sudden creep of
ads running on sites into the backlinks counted in PR calculations. As far as I
can tell, that is new, and distressing. One site I watched rise up the SERPs suddenly died. It was running
run-of-site ads on some obscure directories. I'm watching a couple of others in the same situation to see whether they are filtered now, or just disappearing with refreshes of the database.
If you read the Applied Semantics paper you'll note some discussion of stemming. In
my opinion, the stemming that is being reported and the belief that Google is
now stemming is not always reflected in the results. It is still selective stemming,
but with a larger "dictionary" being used.
This is likely influenced by the stemming used in the Applied Semantics technology.
I believe that the enlarged stemming dictionary explains the "optimisation" and "optimization" change. In the past they
were separated, but now you find both together.
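To illustrate what "selective stemming with a larger dictionary" would mean in practice, here is a minimal sketch. The variant groups and the dictionary-lookup approach are assumptions for illustration, not Google's implementation: only terms listed in the dictionary get conflated, which is what makes the stemming selective rather than universal.

```python
# Hypothetical variant groups; only terms in these groups are conflated.
STEM_GROUPS = [
    {"optimization", "optimisation", "optimizing"},
    {"engine", "engines"},
]

# Build a variant -> canonical-form lookup from the groups.
STEM_DICT = {}
for group in STEM_GROUPS:
    canonical = sorted(group)[0]
    for variant in group:
        STEM_DICT[variant] = canonical

def normalize_query(query: str) -> list:
    """Map each query term to its canonical form if it is in the
    dictionary; leave every other term untouched."""
    return [STEM_DICT.get(term, term) for term in query.lower().split()]

print(normalize_query("search engine optimisation"))
print(normalize_query("search engine optimization"))
# Both spellings normalize to the same terms, so both retrieve the
# same results -- while words outside the dictionary are unaffected.
```

Enlarging the dictionary, as the Applied Semantics technology appears to encourage, means more groups get merged over time, which would explain why "optimisation" and "optimization" now surface together.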
While researching with Bob some terms with which I'm very
familiar, we came across some SERPs which did show stemming being
used. Bob is in Boston, Massachusetts and I'm in
Toronto, Ontario. We both saw different ads, and most importantly the stemming was
different depending on the ads which were being displayed. This seems to be some
proof of my theory that stemming is a result of influences from the Applied
Semantics ad serving technology. There has also been the rumor that this
technology is in early release and is only being used in cases where the user's
physical location invokes use of the technology or possibly different parts of
the technology.
The best indication of this is the idea that if "true"
stemming were being implemented, Google would be plural-insensitive. In the cases
I looked at, it was not, at least at first. Every day I see new instances of Google stemming, but nothing has changed:
you can't optimize a page for stemmed results; you still need to target them on
different pages (my favorite method), and watch the SE referrers to determine
which is most important. I go on the premise that most people are looking for
more than one result, so they query using plurals, but that is neither a proven nor a
researched technique. Call it an educated guess.
Is it Topic-Sensitive PageRank?
Dan0, a moderator at IHY, pointed out the
Topic-Sensitive PageRank
paper by Taher H. Haveliwala, which has some similarities to and overlaps with the Hilltop paper, and would if implemented result in
"experts" regaining prevalence in the results. Note in the early days of Google
this was the case. It gained popularity quickly with researchers for this reason.
In the paper, Haveliwala, an employee of
Google since October, discusses the use of topics to improve PageRank:
"To yield more accurate search results,
we propose computing a set of PageRank vectors, biased using a set of
representative topics, to capture more accurately the notion of importance with
respect to a particular topic. By using these (precomputed) biased PageRank
vectors to generate query-specific importance scores for pages at query time, we
show that we can generate more accurate rankings than with a single, generic
PageRank vector. For ordinary keyword search queries, we compute the
topic-sensitive PageRank scores for pages satisfying the query using the topic
of the query keywords. For searches done in context (e.g., when the search query
is performed by highlighting words in a Web page), we compute the
topic-sensitive PageRank scores using the topic of the context in which the
query appeared."
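The quoted approach can be sketched in miniature: precompute one biased PageRank vector per topic, then combine the precomputed vectors at query time. The toy link graph, the topic-to-page assignments, and the query-topic weights below are all made up for illustration; the actual paper biases the vectors using the 16 top-level ODP categories.

```python
import numpy as np

# Tiny toy link graph: page i links to the pages in links[i].
links = {0: [1, 2], 1: [2], 2: [0], 3: [2, 4], 4: [3]}
N = len(links)

def pagerank(personalization, d=0.85, iters=100):
    """Power iteration with a teleport (personalization) vector v:
    r = d * A r + (1 - d) * v -- the biased-PageRank form in the paper."""
    v = personalization / personalization.sum()
    r = np.full(N, 1.0 / N)
    for _ in range(iters):
        new = (1 - d) * v
        for page, outs in links.items():
            for out in outs:
                new[out] += d * r[page] / len(outs)
        r = new
    return r

# Precompute one biased PageRank vector per topic. The topic -> page
# assignments here are invented for illustration.
topics = {"sports": [0, 1], "finance": [3, 4]}
biased = {t: pagerank(np.isin(np.arange(N), pages).astype(float))
          for t, pages in topics.items()}

# At query time, combine the precomputed vectors weighted by the
# probability that the query belongs to each topic (toy weights here).
query_topic_probs = {"sports": 0.8, "finance": 0.2}
scores = sum(p * biased[t] for t, p in query_topic_probs.items())
print(np.argsort(-scores))  # pages ranked for this query's topic mix
```

The key design point is that the expensive part (the per-topic vectors) is precomputed offline, so the query-time work is just a weighted sum, which is what makes the scheme practical at search-engine scale.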
Interesting that the last sentence mentions
using "context in which the query appeared" and Applied Semantics' technology is
also contextual. I see a theme here.
Of particular interest is this statement later
in the article:
"By making PageRank topic-sensitive, we avoid the problem of heavily linked
pages getting highly ranked for queries for which they have no particular
authority [3]. Pages considered important in some subject domains may not be
considered important in others, regardless of what keywords may appear either in
the page or in anchor text referring to the page".
Are the present results being caused by a
filter or a new algorithm? If TSPR is truly what we're seeing, it seems apparent
that it is a new algorithm. It was noted in the IHY staffroom discussion that
keywords could be easily mistaken for topics in this context.
Some clues to TSPR usage are found in the SERPs
and Google's directory. After a short disappearance the ODP category links are
back in the results for many sites. Or are they really ODP categories? Perhaps
they are TSPR topics and descriptions.
One of the anomalies I've noticed coming up in these results
is that sites with no ODP category for the page in the results seem to slip past "non-relevant" incoming-link filters.
These sites have large numbers of
irrelevant links, many using optimized link text. A term for which I watch the SERP closely
is "search engine optimization." For obvious reasons, at any given time
I know
the top 30 and when they last appeared there, have done link research on them,
and have a pretty good idea how they got there and stay there.
Because of my work with SeoPros, let's just say this SERP gets extra attention and
has been a bit of a hobby for more than a few years. Yeah, I know, get a life.
There are two sites that are slipping through
the cracks, both of which have one thing in common: they either don't have an ODP listing
(never a good sign for an SEO), or the URL shown in the Google directory results is the one from the Google algo search
SERP rather than the URL in the ODP listing. When you look in the Google directory, which is supposed to be a
clone of ODP, the instances that I and Bernard found give a good explanation of most of the
questioned ranks. Any result that is stemmed, or could be stemmed, is likely to
be fluctuating for a while.
One additional clue is a site I had been watching,
which in my opinion should have tripped the filter by adding the link text
phrase to the title. It had an effect, but a positive rather than a negative one. In my opinion, if the "SEO Filter" is a reality, this should have affected the rank negatively. Instead the site rose to #1, up from #5-10ish, then dropped to #2 a few days later.
This was after the phrase had not been in the title for weeks prior to this
update. In fact, I noticed the change after checking my notes and confirming a
discussion that Bob and I had a while back about the optimization of the page,
or, ummm, lack thereof... says a little about the link text. Or does it?
Another clue to the adjustment of PR using the implementation in the
paper is how ODP categories are influencing results. Note that sites which place well, and those below them, seem to be influenced by the ODP category in which they are included. This is an indication that, as the
paper suggests, the categories of the ODP are being used to identify "experts" for a topic.
Nothing definitive here; I'll let you make up your own mind about this. In my
opinion, if ODP is being used this way, it's because of the quality of the editors and the meta editors
overseeing them. The paper also discusses adversarial editors and specifically mentions
the ODP, but this could just as well be Google itself at present. All Google
really requires is the initial ODP dump and they have all they need to use TSPR.
Who knows? The Google directory may be a reflection of the topics and experts. Funny,
Google's directory does look an awful lot like the algo search results.
Conclusions
In the end, the anomalies in the Florida SERP that Bernard, I, and others had
been discussing were, in my opinion and that of others, influenced by the
"authority" pages among which the sites sat, and the PR associated with those
"authority" pages. In other words, good old PR
as we have always known it! Bernard believes there is at least one anomaly
remaining; however, IMO, his data is slightly skewed by "non-authoritative"
links. I used link: in my research and found that the quality of the incoming
links seemed to smooth out the anomalies. IMO, www.domain.com -site:www.domain.com
is showing links Google PR has never counted, as many are less than PR4 or have
whitebars.
These changes have been ongoing since April,
with the explanation that new "filters" for inappropriate behavior or unwanted
manipulation were being added. The new "filters"
were rumored to be for hidden text and links. In my opinion, only human review can accurately detect hidden text and links, and that seemed to be borne out by the results of the new "filters":
they caught the stupid over-doers, but not the informed, calculated implementations.
In my opinion, SEOs' fear of penalties and the seemingly adversarial
relationship they have with engines is part of the reason for the "SEO Filter"
being pointed to as the source of the changes when all it really was is Google
trying to make itself better for its users!
The Goo?
The goo, as it has come to be known, could just be
Google giving you what they think you want, namely, the old index. You're
searching with negative phrases so it's using the old data because the new
Google doesn't do that yet. I have my opinion of Google's present results, and
search engines determine what's unwanted manipulation, not me, or anyone else I
know.
Hilltop?
Interesting, but if it is Hilltop, in my
opinion there is another shoe to drop.
Although current SERPs appear Hilltopesque, it is either only partially implemented to date or the real change is reflected in the
TSPR
paper. I lean toward a new Google employee overseeing implementation on a grander scale.
Haveliwala's
paper was researched using Stanford's index of about 120 million pages. Guess
what else was based on the same Stanford index -- Google, circa 1998.
"SEO Filter"?
There are indications that there is some sort of linking filter in place,
however, it could also be the result of the new algorithm. For some engines, the siege mentality is greater than for others. I have always believed that Google was the least adversarial
toward SEOs.
But then again I might be just sayin' that to get on their good side.
As Dan0 has pointed out a few times, the "SEO
filter" theory is a subjective conclusion.
There is plenty of evidence that a lot of the sites thought to be filtered for SEO were just subject to the changes in
the algo and the designation of "experts"
and topics.
There is no absolute proof of this, but in my opinion it fits with past actions by
Google. Google seldom bans, preferring to adjust ranking by using PR. The "SEO Filter" seems
more like a ban for the optimized term than an adjustment.
In short, do what we did in the old days. It never stopped working; it's just a
little slower. In the end, nothing Google, or any SE for that matter, does can
change that: you still have links to your good content, and the directories are
always looking to improve topics. Good content and a long-term plan are what's
left standing after all is said and done; always has been, always will be.
Google just gave SEOs another lesson in why!
See Ya' at The Top
Da' Tmeister
Edited by
Bob Gladstein, with
some research by the
generous moderators, owner Doug Heil, and Members of the
IHY forums who willingly share their collective knowledge, without which
I'd probably still be scratchin' my noggin too!