Googlebot UserAgent Good For Something

One of my pet annoyances when browsing around on the net is sites where you have to register for no good reason. I have enough useless accounts as it is. What's even more annoying is when they return a different result to the search engine bots so that more than just the registration page is indexed.

A prime example of this is Unison.ie. When searching for current Irish news it usually ranks fairly high on Google; however, all the pages require you to register before you can view them. The registration gives no advantage to people like me who just want a quick look at the latest news. I suspect that I'm not alone and that lots of people will just go back and look for another site.

Unison's simple user agent checking makes it very easy to get in unmolested though. The User Agent Switcher Plugin for Firefox allows you to easily set exactly what user agent you want your browser to appear as. The GoogleBot isn't in the list of Useragents available, but it is easily added. Switch to GoogleBot as your useragent, and magically you will have full access to the Unison site.

I know that Unison will probably close this hole within a few days now, but it's nice to be able to make a point. According to Google's Webmaster Help Center, "crawler only" pages are a thing to avoid. I would class pages that react differently to GoogleBot as "crawler only" pages.

If Unison want to require people to register in order to get nice features such as customization, then grand, I have no problem with that. However, how much traffic are they missing out on by having the register page for everyone? And how many advertising impressions are they missing out on? I know that if I go to the BBC News site I will usually end up going to other stories which interest me, which means more page impressions on the BBC site. More impressions, more chance of clicking on ads, more money!

In this day and age it is senseless to have such stupid restrictions on a site like Unison that has enough content to be a massive earner on advertisements alone.

Update: I somehow managed to forget to mention the user agent I'm using. It is:

Googlebot/2.1 (+http://www.googlebot.com/bot.html)
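
For anyone who would rather script this than use the Firefox plugin, the same idea can be sketched in Python. This is only a rough sketch using the standard library: the helper names `build_request` and `fetch` are made up for illustration, and whether a given site actually serves different content to this user agent is something you would have to check yourself.

```python
import urllib.request

# The user agent string quoted in the update above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request that announces itself with the given user agent.

    Hypothetical helper, not part of any library.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch url with the chosen user agent and return the body as text.

    Needs network access; raises urllib.error.URLError on failure.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(url)` once with the default and once with an ordinary browser user agent, then comparing the two bodies, is a quick way to see whether a site treats the crawler differently.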

2 TrackBacks

from Unison.ie Cloaking - Will They Be Banned From Google? | Search Engine Optimisation Ireland .:. Red Cardinal on March 23, 2007 11:04 AM

URL: http://www.redcardinal.ie/search-engine-optimisation/23-03-2007/unison-ie-cloaking/

from Unison.ie Improvment on May 17, 2007 10:07 AM

URL: http://blog.moybella.net/2007/05/17/unisonie-improvment/

14 Comments

The BBC don't run ads do they? I get your point though. I never visit that unison site, always thought it was stupid having to register to just read a story.

Anthony,

They don't, however it is a site that I go to and tend to stay at for a while. Unison should have enough content with the Independent and all the local papers to be able to do the same.

I always knew they ran a subscription wall, but I've never seen their results in Google news.

This is cloaking. Plain and simple. It goes against the Google guidelines. In fact it's a ban-worthy offence. Considering all the hassle over WMW and NYT this is quite funny. Deserves some further highlighting I think.

Nice find
Rgds
Richard

Richard,

Feel free to highlight :)

Niall.

I've always used http://www.bugmenot.com/ to get through Unison.ie, but still this is interesting to see what they are doing to get higher rankings.

Tut Tut !

So how do you report offenders ?

Paul,

Lots of sites started getting good at blocking bugmenot logins so I just stopped installing it :(

Not sure where it should be reported, haven't really looked to be honest.

Great find, this will really force them to choose either Google cloaking (and leaving it open to us) or closing the hole. Ultimatum anyone?

They could of course verify by way of reverse DNS but that would be rather costly per page load (never mind atrociously black hat).

Nice trick that with Googlebot. Not thought of that.

I use Opera so I often have to "fool" a website that I'm an IE browser (just 2 keep them happy so they believe they are covering their a** 4 the masses!).

I use BugMeNot.com where I want 2 c a site but not prepared 2 signup. Works great if someone else has been there b4 - u just re-use (good 4 the environment :)

Lal

David,

Reverse DNS is easily changed. Is there any list of common IPs that the Googlebot comes from out there? This could be dangerous though, what happens if Google brings a new DC online with new IP space? All of a sudden some Googlebots won't be seeing the same as others, and then I'm presuming fun will follow :)

Lal,

That's what you get for using Opera ;)

Niall.

That is a great post. Thanks for the tip. :D

@Niall: (A bit late I know)
I would never recommend using Reverse DNS for something like this, total overkill when they should just use robots.txt
However, I have seen lots of examples of people reverse-DNSing Googlebots to check they are from a real Google IP. Of course new IPs get added but as long as people keep up on it they are relatively reliable.
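
For the curious, the reverse-DNS check discussed above might look roughly like the sketch below. It assumes the usual two-step approach: look up the hostname for the IP, check that it sits under a Google domain, then resolve that hostname forwards and confirm it maps back to the same IP, which is what stops someone simply faking their reverse DNS. The function names and the suffix list are assumptions for illustration, not anything Unison or Google are known to run.

```python
import socket

# Assumed list of domains that genuine Googlebot hostnames fall under.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if hostname sits under one of the assumed Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def is_real_googlebot(ip,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a visitor's IP address.

    The resolver functions are parameters so the logic can be tried
    out without touching the network.
    """
    try:
        hostname = reverse(ip)[0]           # IP -> hostname
        if not is_google_hostname(hostname):
            return False
        return ip in forward(hostname)[2]   # hostname -> IPs, must include ip
    except OSError:                         # covers socket.herror / gaierror
        return False
```

That is two DNS lookups per unknown visitor, so in practice the result would need to be cached per IP rather than repeated on every page load.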

David,

Never too late! I wasn't looking at the robots.txt when browsing using the Googlebot user agent ;-) I must have a look at how Wordpress blocks it though.

Niall.

What say you now ladies?

Hell of an improvement over the previous incarnation of Unison :)

About this Entry

This page contains a single entry by Niall Donegan published on March 22, 2007 8:40 PM.

This blog is licensed under a Creative Commons License.