Search engine tricks, and getting to grips with HTML Kit
Search engines are the gateways to the web today. Can’t remember the exact name of a site? Just Google it. Not sure where to look for information on a product? Google it.
A good listing on Google is much sought after. Firms will spend a fortune on snake oil salespeople who promise to increase their page ranking and ensure they appear in those crucial first few hits.
Of course, those aren’t the only ways to drive traffic to your site. This month, I’ll explain a little more about how you can help the search engines without having to spend a fortune on extra software that promises to optimise your pages for you.
Meet meta
This will be teaching some readers to suck eggs, but for the rest here’s a quick recap. Links to other relevant sites are obviously important, but they’re not the only thing. Your site needs to be indexed properly. You can help by adding meta tags to the pages, especially to the index page.
Meta tags can be used for many things, but the two we’re interested in here are the description and keywords. In the former you need a concise description of your site, while the latter contains a selection of the key words you think people might type when they’re trying to find the sort of information on your site.
All you have to do is add these in the <HEAD> section of your page. In some web design tools, you’ll find a menu option to insert them for you, or you’ll be asked to type them as part of a site setup wizard. Either way, you end up with something like this:
<meta name="description" content="User information and forums for the Topfield TF5800PVR Personal Video Recorder.">
<meta name="keywords" content="TF5800,PVR,TF5800PVR, Topfield, MHEG,forum">
When you’re adding keywords, tempting though it may seem, it’s not terribly sporting to put in the names of your competitors’ products. Once you’ve uploaded your index page with meta tags, you can ask the various search engines to index you, or wait until they find you. But that’s less likely to happen if you only have a few links on other sites, so you may want to submit your pages via locations such as www.google.com/webmasters.
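If you’d like to check that the description and keywords really are in a page before you upload it, a few lines of script will do. What follows is only a minimal sketch in Python, using the standard library’s HTML parser. The `MetaTagReader` and `read_meta` names and the sample page are my own inventions for illustration, and the meta tags are the ones from the example above.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects the content of every <meta name="..."> tag in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html):
    """Return a dict of meta tag names and their content."""
    reader = MetaTagReader()
    reader.feed(html)
    return reader.meta


# Hypothetical page, for illustration only.
page = """<html><head>
<title>Example</title>
<meta name="description" content="User information and forums for the Topfield TF5800PVR Personal Video Recorder.">
<meta name="keywords" content="TF5800,PVR,TF5800PVR, Topfield, MHEG,forum">
</head><body></body></html>"""

meta = read_meta(page)
# Split the keywords on commas and tidy up any stray spaces.
keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]

print(meta.get("description"))
print(keywords)
```

A missing tag simply doesn’t appear in the result, so an empty keyword list or a description of `None` tells you something has been left out.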
Spider men
After your site has been submitted, or if there are links to it from elsewhere, it’ll be indexed by a web crawler, also known as a robot, bot or spider. These are, in essence, programs that read the pages and analyse them, helping search engines to build up a more detailed picture of your site.
This is where things can become interesting. On one of my sites, I recently noticed that the forum software was claiming our highest ever number of simultaneous visitors – about 125. But fewer than half were registered. I looked in the logs and found many requests from similar IP addresses. It turned out that between Yahoo and Google, which were by coincidence attempting to index at the same time, there were about 80 connections to the web server. It didn’t flinch, thankfully, but that’s partly because it’s running on a powerful system and is on an unlimited-bandwidth hosting plan.
Had that happened 18 months ago, the server would have ground to a standstill. If I were paying for data transfers, it could have eaten up a lot of my allowance.
In reality, it’s unlikely to happen to very many sites – it just happens that this one comes top in many Google searches for its particular topic and has a busy forum, generating lots of links to it from elsewhere.
The first thing to do is to get a grip on what’s happening. You can analyse your server logs, looking at the User-agent information for the different bots, or turn to the tools that specific search engines provide.
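For the log route, something along these lines would give you a rough tally of who is visiting. It’s a sketch rather than a finished tool, and it rests on a few assumptions: that your server writes the common ‘combined’ log format, where the User-agent is the last quoted field on each line, and that Google’s and Yahoo’s crawlers identify themselves with the strings Googlebot and Slurp respectively. The log lines, addresses and dates below are made up for the example.

```python
import re
from collections import Counter

# Assumed User-agent substrings for each crawler; extend as needed.
BOTS = {"Googlebot": "Google", "Slurp": "Yahoo"}

# In the combined log format, the User-agent is the last quoted field.
USER_AGENT = re.compile(r'"([^"]*)"\s*$')


def count_bot_requests(lines):
    """Tally requests per search engine, with everything else as 'other'."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT.search(line)
        if not match:
            continue  # not a line we recognise, so skip it
        agent = match.group(1)
        for marker, engine in BOTS.items():
            if marker in agent:
                counts[engine] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_log = [
    '192.0.2.1 - - [10/Oct/2007:13:55:36 +0000] "GET /forum/ HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2007:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '203.0.113.9 - - [10/Oct/2007:13:55:38 +0000] "GET /forum/ HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '192.0.2.1 - - [10/Oct/2007:13:55:39 +0000] "GET /forum/topic1 HTTP/1.1" 200 4100 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = count_bot_requests(sample_log)
for engine, hits in counts.most_common():
    print(engine, hits)
```

To run it against a real log you would pass an open file instead of the sample list, since the function only needs something it can read line by line.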
Google’s webmaster tools area will give you a wealth of information. You need to register that a site is yours, which is done by uploading a file with a specific name, or adding a code to a meta tag. The logic is that if you can do this when Google asks you, then obviously you have permission to upload to the site, so it must be yours.
Once that’s done, Google checks that you have made the changes and then you can view more statistics about your site.
From www.google.com/webmasters, click on Webmaster Tools. Then, on the ‘Dashboard’ page, click the site you want to see information about. On the summary page, click Crawl rate, in the Tools section, and you’ll see a display.
As you can see, an average day sees about 2MB of data downloaded by Google’s crawler, and 93 pages, but the maximum rate is a startling 22MB. If you feel the crawling is having an adverse effect on your site, you can change the rate to a slower crawl.
Another useful tool is the Query stats, accessible via the Dashboard page or via Statistics. Here, you can see the queries that result in people being shown a link to your site, and the average position within the results. For this site, we’ve got a few 1s and 2s, and most of the rest indicate we’ll be on the first page somewhere. The right-hand column shows the most clicked-on searches.
So now you know how to slow Google down, but what if you don’t want your site indexed at all, or you have more specific requirements? Both of those are possible using the robots.txt file, which can specify which pages of a site are indexed (and even refuse access to some search engines), and the sitemaps protocol, which lets you provide more detailed information about your pages to search engines that support it. We’ll explore both in detail next time.
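As a small taster, here is a sketch of how a well-behaved crawler might read a robots.txt file, using the parser that comes with Python’s standard library. The rules themselves are hypothetical: BadBot is a made-up crawler name and /private/ a made-up folder, chosen only to show one rule that refuses a single crawler and one that keeps everybody out of part of the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: refuse one (made-up) crawler entirely,
# and keep every other crawler out of /private/.
rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(agent, path):
    """Would these rules let the named crawler fetch this path?"""
    return parser.can_fetch(agent, path)


for agent, path in [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private/notes.html"),
    ("BadBot", "/index.html"),
]:
    print(agent, path, "allowed" if allowed(agent, path) else "refused")
```

Bear in mind that robots.txt is a request, not a lock: it only works for crawlers that choose to honour it.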
Getting to grips with HTML Kit
Click on the link below for an example of using HTML Kit to create a page quickly. It may not be quite as slick as using Dreamweaver to hide all the database tricks from you, but as long as you know the basics of what’s going on behind the scenes, it’s surprising how quickly you can knock up a page.
Here's how to use HTML Kit to create a page.