
Closer look at P2P technology

Bittorrent is almost part of the online media establishment, but why has it become so successful?

Kelvyn Taylor, Personal Computer World 05 Jul 2007

The internet was invented to allow people to easily share files and information. But ever since the original Napster launched in 1999, file sharing has had a bad name.

This is because people quickly realised they could easily find and download files that they weren’t supposed to, such as illegally copied music mp3s.

But public demand for the capabilities unleashed by Napster’s peer-to-peer technology (P2P) spawned a whole host of ‘me-too’ products and protocols, including Kazaa, Grokster and Gnutella.

Many of these have now gone to the wall: the Napster brand was reborn as a (non-P2P) legitimate music download service, Kazaa seems to be in virtual hibernation and Grokster infamously closed down after the US Supreme Court ruled against it.

But there’s one open source P2P technology – called Bittorrent – that has survived and prospered, even though it’s just as open to abuse as any file-sharing technology. Invented in 2001 by programmer Bram Cohen, it was developed as a way to efficiently distribute large files without the need for a dedicated file server. This reason alone helped it to become a popular – and legal – way to make Linux distributions available for download.

You might consider it a niche application, but an awful lot of people are using it – more than 150 million according to www.bittorrent.com. Statistics published by Cachelogic (www.cachelogic.com) in 2004 showed that Bittorrent traffic accounted for more than 35 per cent of all internet traffic worldwide. Bittorrent.com itself is currently ranked in the top 2,000 websites according to the website monitoring service Alexa.

Smart moves
In 2005 Bittorrent Inc. (the company that Cohen and partner Ashwin Navin created in 2004) neatly distanced itself from the shady side of P2P by doing a deal with the Motion Picture Association of America (MPAA), and agreeing to remove any illicit copyrighted works from Bittorrent’s commercial site.

Since then, Bittorrent.com has started offering paid-for legal TV and movie downloads. In 2006 it signed a breakthrough deal for online movie distribution with Warner Brothers. It now seems to be going from strength to strength, featuring video, music, TV and game content from top media brands such as 20th Century Fox, MTV, Paramount and Eidos.

Bittorrent, the technology, remains open source and is available for anyone to develop an application around, which is another reason it’s remained so popular. But how does it work, and just what makes it so suited to sending digital movies or huge Iso files around the world? To answer this, first we need to step back and look at some of the basics of P2P technology.

Peer pressure
There are two main ways to share a file between lots of users. The traditional and familiar client-server method is to put the file on a central server and allow multiple clients (PCs) to access it directly.

There’s no need for any communications between the clients – all they need to do is talk to the server. But this means that the server has to be able to cope with delivering multiple copies of the file to lots of clients simultaneously, otherwise the server becomes a bottleneck. For sharing via the relatively low-bandwidth internet this can obviously be a major problem, leading to high costs and congestion at the server.

A peer-to-peer architecture simply does away with the central server and allows any PC connected to the network (a peer) to act as both a client and a server. Peers can then communicate directly with each other to obtain the files they need. This gives several benefits, the most important being data redundancy and no need for massive bandwidth on any one peer machine. A peer can limit peer connections or file downloads depending on how much bandwidth it has available.
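
To put rough numbers on the bottleneck argument, here is a back-of-envelope sketch. It uses a simplified textbook-style model rather than anything measured, and the function names and every figure in the usage note are our own assumptions for illustration.

```python
def client_server_time(file_mb: float, n_clients: int, server_up_mbps: float) -> float:
    """Seconds for a single server to send a full copy to every client."""
    total_megabits = file_mb * 8 * n_clients
    return total_megabits / server_up_mbps


def p2p_time_lower_bound(file_mb: float, n_peers: int, seed_up_mbps: float,
                         peer_up_mbps: float, peer_down_mbps: float) -> float:
    """Idealised best case (in seconds) when every peer re-uploads what it has."""
    file_megabits = file_mb * 8
    return max(
        file_megabits / seed_up_mbps,    # the original source must send one full copy
        file_megabits / peer_down_mbps,  # no peer can beat its own download speed
        # all copies together must fit through the combined upload capacity
        n_peers * file_megabits / (seed_up_mbps + n_peers * peer_up_mbps),
    )
```

Calling `client_server_time(700, 1000, 100)` and `p2p_time_lower_bound(700, 1000, 100, 0.5, 8)` compares a hypothetical 700MB file sent to 1,000 users from a 100Mbits/sec server with the same file shared among peers that each upload at 0.5Mbits/sec. Real swarms won’t reach the idealised figure, but the gap shows why spreading the upload load matters.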

Napster used a proprietary P2P protocol to organise the downloading between peers and added a central server to register and validate users and store an index of what files were available on which clients.

It then wrapped it all up in an easy-to-use application that enabled users to search for a particular file or even chat to other Napster users.

Napster wasn’t ‘true’ P2P, but a centralised P2P system, because it relied on a central server for most administrative tasks, and didn’t fully use the computing power of the peers.

A ‘pure’ P2P architecture doesn’t need any central servers at all, sharing all the indexing and admin tasks between the clients as well. Such an architecture is often called a decentralised (or fully distributed) P2P network. Examples that are still going strong are the Gnutella and Edonkey networks. The once popular Kazaa, created by the inventors of Skype, was also a decentralised network.

As we’ll see below, Bittorrent falls between these extremes, adding new and unique twists that make it a hybrid between ‘pure’ P2P and a traditional client-server architecture.

Naming names
In any discussion of P2P, it’s easy to get muddled between application and protocol names, so let’s clear a few names out of the way first. Napster was an application based on its own proprietary protocol, Kazaa software used the Fasttrack protocol and Gnutella is the name of the protocol. Edonkey is also a protocol.

For any particular protocol, many client applications are usually available, offering a variety of different features. For example, Gnutella clients include Limewire, Bearshare and Morpheus. Emule is the most popular Edonkey client.

Confusingly, Bittorrent is the name of the file-sharing protocol, the company, the associated website and the ‘official’ free software client, but there are dozens of alternative Bittorrent clients available, such as Utorrent and Azureus. For the purposes of this feature we’ll use ‘Bittorrent’ for the protocol or client and ‘Bittorrent.com’ for the company.

Torrents, seeds and trackers
From day one, Cohen designed Bittorrent and its interface to be easy to use and reliable, to give fast downloads and to avoid the problem of unfair P2P behaviour. In a normal P2P network, once a client has downloaded a file, it has no further incentive to make that file available to other clients. So the P2P network becomes reliant on a few generous clients and becomes very slow and inefficient.

In Bittorrent terminology, clients that are still downloading are called ‘leechers’, whereas clients that hold a complete copy of the file and are uploading it are called ‘seeds’. A P2P network with only leechers wouldn’t work for obvious reasons.

Bittorrent tries to prevent this behaviour by forcing clients to do simultaneous downloads and uploads and using other tit-for-tat tricks in the protocol. The easiest way to explain this is to look at how you go about downloading a file in practice.

To download a file via Bittorrent, you first need a Bittorrent client. Probably the best one to start with is the official free client from Bittorrent.com. Although it’s a commercial website, you don’t have to register and there’s lots of free content that you can use to see how it all works.

Divide and conquer
The key to Bittorrent is its unique way of dividing files into small chunks called ‘pieces’ (typically around 256KB) to download.

A ‘torrent’ file is a small file (typically a few kilobytes) containing metadata that enables a client to download a file (or a collection of files) over the Bittorrent network.

When a torrent file (with an extension of .torrent) is created, an index of the source file’s pieces and data integrity information (an SHA1 hash number) for each piece is generated so that clients can verify they have received uncorrupted data. The protocol subsequently breaks these pieces into 16KB sub-pieces, to queue up TCP transfers (pipelining) and ensure maximum use is made of bandwidth.
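
As a rough illustration of the piece index described above, the following Python sketch splits some data into pieces, records an SHA1 digest for each and shows how a downloaded piece could be checked against that index. The function names are our own, and this is a simplified sketch rather than the code of any real client.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # typical piece size quoted above
BLOCK_SIZE = 16 * 1024   # sub-piece size quoted above


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Return one 20-byte SHA1 digest per piece of the source data."""
    return [
        hashlib.sha1(data[start:start + piece_size]).digest()
        for start in range(0, len(data), piece_size)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the digest stored for that index."""
    return hashlib.sha1(piece).digest() == hashes[index]


def block_requests(piece_length: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """List the (offset, length) sub-piece requests needed to fetch one piece."""
    return [
        (offset, min(block_size, piece_length - offset))
        for offset in range(0, piece_length, block_size)
    ]
```

A full 256KB piece works out as 16 sub-piece requests, which a client can queue up rather than waiting for each to arrive before asking for the next. A piece that fails `verify_piece` is simply thrown away and requested again.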

To start a P2P download you first need to find a torrent file for the content you’re after. You can do this via Bittorrent.com’s search engine or any of the torrent search sites, of which Torrentspy is probably the largest. Be aware that, apart from the commercial sites and the Linux and open source download sites such as www.distrowatch.com and http://sourceforge.net, there’s no way to tell whether you’re downloading legal content or not. And if you’re not sure, it’s best to assume the worst.

Torrent files don’t contain any of the actual file data; the metadata in the torrent file describes how many pieces of the file need to be downloaded, how big they are plus error-checking information to ensure you’re not downloading junk. The file also contains information on the ‘tracker’ for this file.
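
To make the metadata more concrete, here is a simplified Python picture of what a torrent file holds. The key names follow the published Bittorrent metainfo format as we understand it; the tracker URL, file name and file size are invented for illustration, and a real torrent file stores this structure in an encoded form rather than as a Python dictionary.

```python
metadata = {
    "announce": "http://tracker.example.com/announce",  # hypothetical tracker URL
    "info": {
        "name": "example-linux.iso",   # hypothetical file name
        "length": 700 * 1024 * 1024,   # hypothetical file size in bytes
        "piece length": 256 * 1024,    # size of each piece in bytes
        "pieces": b"",                 # would hold every 20-byte SHA1 digest back to back
    },
}


def piece_count(info: dict) -> int:
    """Number of pieces, rounding up so a short final piece is counted."""
    return (info["length"] + info["piece length"] - 1) // info["piece length"]


def last_piece_size(info: dict) -> int:
    """Size of the final piece, which is usually shorter than the rest."""
    remainder = info["length"] % info["piece length"]
    return remainder or info["piece length"]


def hash_bytes_needed(info: dict) -> int:
    """Bytes of SHA1 data the torrent file must carry (20 per piece)."""
    return piece_count(info) * 20
```

The size of the hash list grows with the number of pieces, which is one reason larger piece sizes tend to be chosen for very big files.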

Trackers are Bittorrent’s way of letting downloaders find each other. They’re not involved in downloading the files, and they store no files themselves. In theory, anyone can run a tracker, but normally users rely on publicly available free tracker servers.

Many clients now support trackerless torrents, using DHT (distributed hash table) technology. In this scheme, each client effectively becomes its own tracker, so you don’t need a separate tracker server. The downside is that it only works while your PC is turned on.
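
The details of the DHT are beyond the scope of this feature, but the core idea can be sketched in a few lines. The sketch assumes a Kademlia-style design, which is what mainstream Bittorrent DHT implementations are based on: every node and every torrent gets a numeric ID, and the nodes whose IDs are ‘closest’ to a torrent’s ID (measured by XOR) take on the job of remembering its peers. The function names and the `k` value are our own.

```python
import hashlib


def make_id(label: str) -> int:
    """Derive a 160-bit ID from a label (an illustrative shortcut, not how clients pick IDs)."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two IDs, read as a number."""
    return a ^ b


def closest_nodes(target: int, nodes: list[int], k: int = 8) -> list[int]:
    """Return the k known nodes whose IDs are nearest to the target ID."""
    return sorted(nodes, key=lambda node: xor_distance(node, target))[:k]
```

A client looking for peers would ask the closest nodes it knows about, which in turn point it to nodes that are closer still, until it reaches the ones holding the peer list.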

Swarming
When a file’s made available via a torrent file and clients start sharing it, this is known as a swarm. The tracker provides all the members of the swarm with the IP addresses and TCP port numbers of a random selection of other members. Each client then tries to get the full collection of pieces from other clients, which is where the ingenuity of Bittorrent lies.
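
The tracker’s job is simple enough to mimic in a few lines of Python. This toy version keeps everything in memory and ignores the HTTP requests a real tracker handles; the class name, method name and limit on returned peers are our own assumptions.

```python
import random


class ToyTracker:
    """In-memory stand-in for a tracker: it stores peer addresses, never file data."""

    def __init__(self, max_peers: int = 50):
        self.swarms: dict[str, set[tuple[str, int]]] = {}  # torrent id -> peer addresses
        self.max_peers = max_peers

    def announce(self, torrent_id: str, ip: str, port: int) -> list[tuple[str, int]]:
        """Register a peer and hand back a random selection of other swarm members."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = [peer for peer in swarm if peer != (ip, port)]
        swarm.add((ip, port))
        return random.sample(others, min(self.max_peers, len(others)))
```

Each client calls `announce` with its own address and port, then contacts the peers it gets back directly. From that point on, the tracker plays no part in moving the data.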

As soon as you’ve collected a few pieces of the file, the Bittorrent client makes these available to all the other members of the swarm, so you’re forced to simultaneously download and upload data to get the complete file.

This not only stops most of the usual problems of ‘greedy’ leechers, but also makes downloads faster the more clients there are in the swarm. With a healthy swarm, it’s quite easy to saturate all your available download bandwidth, even though each of the swarm members may only be uploading tiny amounts of data.
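
The tit-for-tat idea can be sketched as follows: a client only has a handful of upload slots, and it gives most of them to the peers that have recently sent it the most data. The slot count and the single ‘try someone new’ slot are assumptions made for this sketch; real clients vary in the details.

```python
import random


def choose_unchoked(download_rates: dict[str, float], slots: int = 4, rng=random) -> list[str]:
    """Pick which peers to upload to.

    download_rates maps each peer to the rate (bytes/sec) at which it has
    recently been sending data to us.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    regular = ranked[:slots - 1]   # reward the peers that give us the most
    rest = ranked[slots - 1:]
    # Keep one slot for a randomly chosen peer so newcomers get a chance to prove themselves.
    trial = [rng.choice(rest)] if rest else []
    return regular + trial
```

A peer that never uploads ends up at the bottom of everyone’s ranking and has to rely on the occasional random slot, which is how greedy leechers are discouraged.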

It’s like downloading from a super-fast web server, but without the need for high-end server hardware. This is why it’s such a popular way to distribute large files such as Linux distributions, but so unpopular with ISPs trying to maintain ‘fair’ contention of user bandwidth.

There are downsides to Bittorrent’s approach. Clients with a complete version of the file available to upload become seeds, and the more seeds and leechers there are for a particular torrent, the more efficient the downloading process. The flip side is that torrents tend to die out rapidly: a popular torrent attracts a large swarm and fast downloads when it first appears, but these dwindle to a trickle over time as peers leave the swarm.

Looking ahead
We’ve already touched on one innovation – trackerless torrents – but there are several other possibilities waiting in the wings.

One is the idea of Bittorrent media streaming. Streaming is notoriously demanding on server resources for decent quality content, particularly video, so the idea is to try and leverage the power of the swarm to do the task cheaply.

Some companies are now starting to roll out streaming solutions based on Bittorrent or other P2P protocols.

Another technique, called Similarity Enhanced Transfer (SET), aims to speed up downloads by exploiting the fact that identical files are often labelled differently: it tries to spot identical data in different files. There’s no implementation of this yet.
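
To show the general principle of spotting identical data in different files, here is a deliberately naive Python sketch. It is not the SET algorithm itself: it simply cuts two files into fixed-size chunks and counts how many chunk hashes they have in common. The function names and chunk size are our own.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 16 * 1024) -> set[bytes]:
    """Hash each fixed-size chunk of the data and collect the digests."""
    return {
        hashlib.sha1(data[start:start + chunk_size]).digest()
        for start in range(0, len(data), chunk_size)
    }


def shared_fraction(a: bytes, b: bytes, chunk_size: int = 16 * 1024) -> float:
    """Fraction of a's distinct chunks that also appear somewhere in b."""
    hashes_a = chunk_hashes(a, chunk_size)
    hashes_b = chunk_hashes(b, chunk_size)
    return len(hashes_a & hashes_b) / len(hashes_a) if hashes_a else 0.0
```

Two copies of the same song with different tags appended, for instance, would share most of their chunks, so a client could in principle fetch the common chunks from people sharing either file.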

Of more practical interest is the incorporation of Bittorrent clients in hardware such as routers and Nas storage devices. In 2006, Bittorrent announced partnerships with Asus, Planex and Qnap to produce such hardware, and you can see one of the Qnap models in our Nas group test on page 97. We’ve also seen a Bittorrent client built in to the Excito Bubba home server we reviewed in our August 2007 issue.

P2P comes in from the cold
There was a time when to mention Bittorrent in polite company wasn’t a good idea. P2P is still viewed as the seedy side of the web, even though you can find P2P file sharing built into Windows Live Messenger, and Skype is a massive P2P-based VoIP service from the creators of Kazaa. But few pundits ever foresaw the bizarre twists that have led to us now seeing Warner Brothers and Paramount logos on the Bittorrent home page.

Of course, there’s nothing to stop the Bittorrent protocol from being used for illicit purposes, but that’s not the fault of the technology. The biggest story now is the ongoing fight between the torrent search sites, content pirates and the authorities, which is heading slowly but surely for a big showdown. Whatever the outcome, we predict that you’ll be hearing a lot more of Bittorrent and other P2P technologies over the coming months.

Bittorrent and security
Downloading via Bittorrent isn’t intrinsically any more dangerous than downloading an ordinary file from a website – which these days is perhaps cold comfort.

However, because you’re not downloading a single file from one computer, but lots of verified pieces from different computers, it’s much harder for malicious users to sneak in corrupted, pornographic or virus-infected files, a practice Kazaa users were notorious for.

If the original file contains malware, there’s no way to check until you’ve downloaded it, though. Standard anti-virus and anti-spyware programs are sufficient protection, but there are other security aspects of Bittorrent to be aware of.

Any P2P application requires peers to be able to connect directly to your PC. If you’re using a firewall, this means enabling port forwarding for the relevant TCP ports.

Many routers now have pre-configured settings for applications such as Bittorrent (see screenshot for an example). If your router supports Universal Plug and Play (Upnp), clients such as the official Bittorrent one can do all the configuration for you automatically. A decent SPI (stateful packet inspection) firewall will also help ensure your safety.

Bittorrent needs these ports open to upload to other peers, but downloading will still work if they’re closed. You won’t get the best performance, though, as other peers may start to ‘snub’ you for not offering uploads.

Finally, you should be aware that when you’re connected to a public torrent tracker and performing either uploads or downloads, your IP address is easily viewable by the whole Bittorrent world via the client software. It’s not designed to be anonymous, so if that concerns you, then either use private trackers or avoid Bittorrent altogether.

www.pcw.co.uk/2193584
This article was printed from the Personal Computer World web site
© Incisive Media Ltd. 2008
Incisive Media Limited, Haymarket House, 28-29 Haymarket, London SW1Y 4RX, is a company registered in the United Kingdom with company registration number 04038503