'Robots.txt' files do not treat all search engines equally

Google bots get the red carpet treatment

Robots.txt files written to favour Google's web-crawlers

Robert Jaques, vnunet.com 19 Nov 2007

Webmasters who control automated web-crawler access to their sites using 'robots.txt' files have a bias that favours Google over other search engines, according to new research.

The claim was made by researchers at Penn State University based on the results of a study of more than 7,500 websites.

C. Lee Giles, David Reese professor of Information Sciences and Technology at Penn State, who led the research team that developed the BotSeer search engine for the study, described the pro-Google bias as "surprising".

"We expected that 'robots.txt' files would treat all search engines equally, or maybe disfavour certain obnoxious bots," he said.

"So we were surprised to discover a strong correlation between the favoured robots and search engine market share."

'Robots.txt' files are not an official standard but, by informal agreement, regulate web-crawlers, also known as 'spiders' and 'bots', which mine the web continuously.

Web policy makers use the files, which sit in a website's root directory, to restrict crawler access to non-public information.

'Robots.txt' files are also used to reduce server load, since heavy crawler traffic can amount to a denial of service and force a website to shut down. But some web policy makers and administrators are writing 'robots.txt' files that do not block access uniformly.

Instead, those files give access to Google, Yahoo and MSN while restricting other search engines, the researchers found.
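
To make the kind of bias described above concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The 'robots.txt' content, the robot name "ExampleBot" and the example.com URL are all invented for illustration; they are not taken from the Penn State study or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical 'robots.txt' content (an assumption, not from the study):
# Googlebot may crawl everything, every other robot is shut out.
BIASED_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    for bot in ("Googlebot", "ExampleBot"):
        print(bot, can_crawl(BIASED_ROBOTS_TXT, bot, page))
```

An empty "Disallow:" line means nothing is off limits for the named robot, while "Disallow: /" under the "*" wildcard blocks every robot that is not named elsewhere in the file.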

While the study does not include explanations for why web policy makers have opted to favour Google, the researchers know that the choice was made consciously. Not using a 'robots.txt' file gives all robots equal access to a website.
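
The point about equal access can be checked with a short sketch, again using Python's urllib.robotparser. An empty rule set stands in here for a site with no 'robots.txt' file, which is an assumption of this example; the robot names and URL are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed_with_no_rules(user_agent: str, url: str) -> bool:
    """Check access when there are no rules at all."""
    parser = RobotFileParser()
    # An empty list of lines models the absence of any restrictions
    # (assumption: this stands in for a missing 'robots.txt' file).
    parser.parse([])
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    for bot in ("Googlebot", "ExampleBot"):
        print(bot, allowed_with_no_rules(bot, page))
```

With no rules to consult, every robot gets the same answer, so favouring one crawler requires someone to write a rule that names it.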

"'Robots.txt' files are written by web policy makers and administrators who have to intentionally specify Google as the favoured search engine," said Professor Giles.

Not every site has a 'robots.txt' file, although the number is growing. About four in 10 of the 7,500 sites analysed by the researchers had such a file, up from fewer than one in 10 in 1996.
