
New document file formats explained

The war between Open Document and Open XML is raging but users will benefit from the XML-based formats

Mark Schroder, Personal Computer World 06 Sep 2006

Small, compatible and secure are the watchwords of the new generation of office files.

All the major office software makers plan to move to XML-based file formats, yet there is no sign of a single standard format emerging.

There are already two rival camps: the Open Document format being promoted by IBM, Sun (Star Office) and Openoffice.org, and Microsoft’s own variant of XML.

Office 2007 will read and write Microsoft’s own Open XML files, but it won’t support Open Document out of the box.

Microsoft has recently relented somewhat with the announcement of its Open XML Translator project, which will let developers create a bridge between the rival formats.

Although this is being presented as a battle of the document formats, the two formats are technically quite similar. Both have a common basis in XML (Extensible Markup Language).

All alphanumeric document content – presentations, text or tables – is stored in XML files. All other document elements, such as graphics and OLE (Object Linking and Embedding) or VBA (Visual Basic for Applications) objects, are kept strictly separate from it.

Further XML files belonging to the document can hold supplementary information (known as metadata) about format templates and definitions, comments, paths to linked resources, the author, number of characters and so on.

Open minded
In both Open Document and Open XML, all the constituent parts of the document are kept together in a Zip container file that appears as the actual document to the user.
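As a rough illustration of the container idea, the following Python sketch lists the parts inside a document file together with their compressed and uncompressed sizes. It uses only Python's standard `zipfile` module; the function name `list_parts` is a hypothetical helper chosen for this example, not something either format defines.

```python
import zipfile


def list_parts(path):
    """Return (name, compressed_size, uncompressed_size) for each part.

    `path` may be a file name (for example an .odt or .docx file)
    or a file-like object containing the Zip container.
    """
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
        ]
```

Called on a saved document, this should show the XML parts alongside any embedded pictures, which is essentially what a Zip utility displays when it opens the file.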

Both types of file use a compressed archive, which reduces the storage space required. XML is plain text, which compresses well, so the archive makes it smaller still.

Embedded picture files are converted into a space-saving format during saves and then the lossless Zip compression shrinks them further. In our tests, files which were saved in the new format shrank by 50-90 per cent compared with their original size.

Better data integrity is promised by the use of a CRC (Cyclic Redundancy Check) checksum – a familiar component of the Zip file format. A checksum is stored for each file in the archive and is used to verify that file's integrity when it is extracted.

The CRC is highly sensitive to any modifications to the archived data. But even if part of the Zip archive contains errors, you can still make use of the remaining data.
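The behaviour described above can be sketched in Python, whose standard `zipfile` module verifies the stored CRC-32 as each part is read and raises `BadZipFile` on a mismatch. The helper name `read_intact_parts` is an assumption made for this example; the sketch simply collects every part that passes the check and notes the ones that do not.

```python
import zipfile
import zlib


def read_intact_parts(path):
    """Read every part that passes its CRC check; report the rest.

    Returns a tuple (good, bad): `good` maps part names to their bytes,
    `bad` lists the names of parts that could not be read cleanly.
    """
    good, bad = {}, []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            try:
                # read() checks the CRC-32 stored for this part
                good[name] = archive.read(name)
            except (zipfile.BadZipFile, zlib.error):
                # CRC mismatch or a damaged compressed stream
                bad.append(name)
    return good, bad
```

This only works while the archive's central directory is still readable; a file damaged there would need a repair tool rather than this approach.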

Once a document has been saved from Star Office (Odt, Open Document Text) or Microsoft Word (Docx, Word Open XML), you can rely on any corruption of the stored content being detected by a proven algorithm.

Separate data storage, compression and CRC testing also have other advantages.

Windows has its own Zip decompression routine, so the use of Zip compression is a plus point: if need be, the files can be opened without any special software.

Simply change the document extension from Odt or Docx to Zip. You can then view the data container like a compressed file with Windows Explorer or a Zip-compatible compression utility such as PKzip for Windows.

In the case of Openoffice.org, all the text is saved in pure XML files. You can use copy and paste to move it to another program – for example, a text editor – without having the suite’s Writer component installed on your PC.
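To show how little is needed to get at the text, here is a minimal Python sketch that pulls the paragraphs out of an Odt file without the office suite. It assumes the main text lives in a part called `content.xml` and that paragraphs are `text:p` elements in the Open Document text namespace; headings (`text:h`) and other elements are ignored to keep the example short. The function name `odt_paragraphs` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Open Document namespace for text elements such as <text:p>
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(path):
    """Return the plain text of each <text:p> paragraph in an Odt file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() also gathers text nested in child elements such as spans
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

The resulting list of strings can be written to a text file or pasted into any editor.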

By using a PHP script and the add-on Pclzip, it’s possible to extract the content from large quantities of documents automatically.

The possibility of harvesting all or part of the content from Office files is very attractive for organisations needing to process the data from document management systems, and XML simplifies this process.
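The same kind of bulk extraction can be sketched in Python rather than PHP and Pclzip. This is a stand-in written for illustration, not the script referred to above; the function name `harvest_content_xml`, the `*.odt` pattern and the `content.xml` part name are assumptions. Files that are not valid Zip archives, or that lack the requested part, are skipped.

```python
import zipfile
from pathlib import Path


def harvest_content_xml(folder, pattern="*.odt", part="content.xml"):
    """Collect the raw XML of one named part from every document in a folder.

    Returns a dict mapping each document's file name to the bytes of `part`.
    """
    harvested = {}
    for doc in sorted(Path(folder).glob(pattern)):
        try:
            with zipfile.ZipFile(doc) as archive:
                harvested[doc.name] = archive.read(part)
        except (zipfile.BadZipFile, KeyError):
            # Not a Zip container, or the part is missing: skip this file
            continue
    return harvested
```

The returned XML can then be handed to whatever parser or indexing step the document management system uses.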

