Ed Henning

Freeze file formats or lose data for ever

We must ensure the longevity of digital data for the future, or risk losing it altogether.


The PC industry is notoriously short-sighted and apparently obsessed with change. Part of this is understandable, given the rapid improvements in new technologies and the need to adapt them to a huge and growing market.

But therein lies a serious problem that has concerned me for many years, and to which I see little attention paid by the industry: the longevity of data. Recent news reports and a couple of personal experiences have brought this to my attention again.

The first of these concerned the photographic film industry, which is suffering as a result of the rapidly growing success of digital photography. One report, of which I caught only part, included an interview with a photographer concerned about the demise of film.

This was not out of some romantic attachment, but because he considered digital images to have a short lifetime. He said there was no digital equivalent of finding old photos, forgotten for a couple of decades, stuffed at the back of a drawer.

Well, there is. After a recent hard disk crash I was sorting out some of my own old data, and found some image files, all 17 years old, whose names intrigued me. Of course, I can find no way of opening them to see what they are before deciding whether to delete them.
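For what it is worth, the usual first step with a mystery file is to look at its opening bytes, since many formats begin with a fixed signature, or "magic number". What follows is a minimal Python sketch; the signature table is a small, assumed subset chosen for illustration, and a 17-year-old file in a long-dead proprietary format may well match none of them, which rather proves the point.

```python
import sys
from pathlib import Path

# A small, assumed subset of well-known file signatures ("magic numbers").
# Dedicated identification tools carry far more; older or more obscure
# formats may match none of these.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image (1987 version)"),
    (b"GIF89a", "GIF image (1989 version)"),
    (b"BM", "possibly a Windows/OS2 bitmap (BMP)"),  # weak two-byte signature
]


def identify(data: bytes) -> str:
    """Return a best guess at the format of `data` from its leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: Path) -> str:
    """Read only the first 16 bytes of a file and guess its format."""
    with path.open("rb") as f:
        return identify(f.read(16))


if __name__ == "__main__":
    # Usage: pass one or more file paths on the command line.
    for arg in sys.argv[1:]:
        print(f"{arg}: {identify_file(Path(arg))}")
```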

The problem applies not only to images but to any kind of data held in digital form. Digitising data is supposed to give you control over it, but that only works if you have current software that recognises the format.

People for whom data is important over a long period of time have been forced to devise strategies for maintaining knowledge of what the data is, where it is and what format it is in.

I once read an article by somebody who claimed that whenever he started using a new word processor he would open all the old files and save them in the new format. I doubt the value of that solution, but it highlights the problem clearly.

My solution was to write my own word processor, and structure it so that it would be easy to port to new operating systems. I am not at all looking forward to the day that my theory gets put to the test.

But oddball strategies like these are not the long-term solution that is needed; that really has to come from the industry. I once asked Microsoft about this and drew a complete blank.

Look how often that company's file formats have changed over the years, and its response becomes understandable: the corporate embodiment of short-term thinking.

A possible way forward can be seen in the handling of text material. One popular way of packaging text at present is the PDF format, with pages stored as images rather than as editable text. For archived material, such files are taking the place once held by microfilm and microfiche.

One librarian told me of his concerns about this. Apparently the bean counters had decided, when microfiche arrived, that the relevant paper books could then be discarded. Stupid enough, but imagine how much more stupid such a strategy would be now that PDF files have arrived.

For a library or similar organisation, PDF files should be a real benefit. Viewing places no wear and tear on the original material, copies - and backup copies - are easily made, printouts easily produced and so on.

But will you be able to open them in 50 or 100 years' time? One library I know of that helps supply other archives has an interesting solution to this.

Serious archive material is dispatched in units containing both PDF files and the original Tif files from which they are produced, Tif being seen as more secure in the long term.

I would suggest that each package also contain an Ascii text file describing the Tif file format - something which is easy to implement programmatically.
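To give an idea of how little is needed, here is a minimal Python sketch, assuming a package is simply a directory on disk. The file name `TIFF-FORMAT.txt` and both helper functions are hypothetical, and the note covers only the 8-byte TIFF header as set out in the TIFF 6.0 specification, not the whole format. The second function shows that a programmer armed with nothing but that note could still recognise a Tif file and find where its image data is described.

```python
import struct
from pathlib import Path

# Hypothetical plain-Ascii note to ship alongside the PDF and Tif files.
# It summarises only the TIFF header; a real package would carry the
# full published specification as text.
TIFF_HEADER_NOTE = """\
TIFF FILE HEADER (first 8 bytes of every TIFF file)
Bytes 0-1: byte order. "II" = little-endian, "MM" = big-endian.
Bytes 2-3: the number 42, stored in that byte order.
Bytes 4-7: offset, in bytes from the start of the file, of the
           first Image File Directory (IFD), in that byte order.
"""


def write_format_note(package_dir: Path) -> Path:
    """Write the note as pure Ascii so any future text viewer can read it."""
    package_dir.mkdir(parents=True, exist_ok=True)
    note = package_dir / "TIFF-FORMAT.txt"
    # encode("ascii") raises an error if a non-Ascii character slips in
    note.write_bytes(TIFF_HEADER_NOTE.encode("ascii"))
    return note


def parse_tiff_header(data: bytes) -> tuple[str, int]:
    """Read a TIFF header by following the note above.

    Returns (byte order, offset of the first IFD); raises ValueError
    if the bytes do not match the description.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF header")
    if data[:2] == b"II":
        order, prefix = "little-endian", "<"
    elif data[:2] == b"MM":
        order, prefix = "big-endian", ">"
    else:
        raise ValueError("unknown byte-order mark")
    # "H" = unsigned 16-bit (the 42), "I" = unsigned 32-bit (the offset)
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42")
    return order, ifd_offset
```

For example, `parse_tiff_header(b"II*\x00\x08\x00\x00\x00")` returns `("little-endian", 8)`. Keeping the note in plain Ascii, rather than in a word-processor file, means it does not itself depend on a format that may become unreadable.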

This is an indication of the kind of strategy that needs to be developed by the industry. I would like to see a group of companies come to agreement on a few select file formats for the most important types of data, and then freeze these. And not just freeze them for a few years, but permanently.

Make the file formats publicly available, in plain Ascii text files, and guarantee that every future version of the relevant software will be able to open and save these file formats, without even the tiniest variation in specification, for as long as the companies remain in business. You can imagine the Microsoft programmers squirming at the thought.

I doubt whether the problem of longevity of data is as important in businesses as it might be for other users. But there were similar problems a few years ago when businesses needed to adapt their database systems for the Y2K problem.

The difficulties lay not with the data, but with the old software source code, much of which either could not be understood or could not be recompiled.

The report that triggered this article was about Caxton's original rendering of the Canterbury Tales, put online by the British Library at www.bl.uk. That book is still readable after many centuries, but will the same be true of the modern Jpeg version in a few hundred years?

