Andrew Treloar's personal website

Hypermedia Online Publishing: the Transformation of the Scholarly Journal

4.6.3 Archival issues

E-journals are somewhat different to other digital library archival objects in that they do not need to be digitised. Other objects which began life as paper or film and have been digitised carry with them a wide range of archiving issues [Conway, 1996]. Objects which begin their life in digital form have as their primary archival focus the challenge of protecting these digits from alteration or loss. This area, called 'digital preservation', typically centres on "the choice of interim storage media, the life expectancy of a digital imaging system, and the concern for migrating the digital files to future systems as a way of ensuring future access" [Conway, 1996]. These concerns are dealt with in much greater detail in [Lesk, 1992] and [Task Force on Archiving of Digital Information, 1996].

E-journals as digital objects are likely to consist of text, images and perhaps some attached binary data (compiled computer code, audio, or video). According to [Graham, 1997], digital preservation of such objects consists of three problems: medium preservation, technology preservation and intellectual preservation.

Despite its name, medium preservation does not involve preserving old storage media. Rather, one preserves the information by moving a copy of the digital object from one medium to another (for instance from 1.4 MByte floppy disks to 100 MByte Zip disks). [Rothenburg, 1995] points out that traditional media are currently much better candidates for archiving than any existing electronic media. However, provided copying from one medium to the next occurs before the hardware to read the old version becomes obsolete, this is not too difficult (the author did not move early enough in one case and now has some binary digits mouldering away, forever inaccessible, on an 8" floppy disk).
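The copy-to-a-new-medium step can be sketched in a few lines. This is only an illustration, not a procedure taken from the sources cited above: the function names sha256_of and migrate are hypothetical, and the choice of SHA-256 to confirm that the copy is bit-identical is an assumption made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target_dir: Path) -> Path:
    """Copy source onto the new medium (target_dir) and verify the copy.

    Hypothetical helper: raises OSError if the copy differs from the original.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies the bytes and, where possible, timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"copy of {source} does not match the original")
    return target
```

Comparing digests after the copy guards against silent read or write errors on either medium. It does nothing, of course, for a disk that can no longer be read at all.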

Technology preservation usually involves taking a digital object in one form and converting it into another (for instance converting a WordPerfect file to a Word file), preferably without losing any information. Again, this needs to be done while the conversion is still possible. Word 6.0 for the Macintosh (or any later version) will not read Word 3.0 for the Macintosh files, as the author discovered when seeking to put his Master's thesis online.

With respect to binary attachments to e-journal articles, the situation is more complicated. As an example, consider a Shockwave file attached to an article in the Journal of Interactive Media in Education (see 6.3.3: Journal of Interactive Media in Education on page 99). In ten years' time, it is entirely likely that the Shockwave format (if it still exists) will have changed considerably and that this attachment will need to be run on emulated hardware and software. Old digital objects require old software. Old software often requires old operating systems. Old operating systems often require old hardware. If libraries are not to become museums, then this old hardware and software will need to be emulated.

Intellectual preservation means having the confidence that what is read now is what the author wrote. The ease of copying electronic information carries with it the ease of undetectable change. Mechanisms to combat this (such as embedded digital signatures and checksums) are only now being developed. Adobe's PDF technology also lets the author of a document lock it against changes and/or require a password before opening or modifying it.
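A checksum of the kind mentioned above can be illustrated briefly. The sketch below is an assumption-laden example rather than a description of any particular journal's mechanism: the function names are hypothetical, SHA-256 is chosen only for illustration, and the keyed variant is a message authentication code, which is weaker than a true digital signature because anyone holding the key can recompute it.

```python
import hashlib
import hmac


def fingerprint(content: bytes) -> str:
    """Plain checksum: the SHA-256 hex digest of the article's bytes."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, recorded: str) -> bool:
    """Compare the current checksum with the one recorded at publication."""
    return hmac.compare_digest(fingerprint(content), recorded)


def keyed_fingerprint(content: bytes, key: bytes) -> str:
    """Keyed checksum (HMAC): cannot be recomputed without the archive's key."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


# Usage: record the checksum when the article is archived, check it later.
article = b"What the author wrote."
recorded = fingerprint(article)
print(is_unaltered(article, recorded))                 # unchanged copy
print(is_unaltered(article + b" (edited)", recorded))  # altered copy
```

A plain checksum only detects change if the recorded value is itself stored somewhere trustworthy. Someone who can alter the article and the stored checksum together defeats it, which is the gap that keyed checksums and digital signatures are meant to close.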

From an archiving point of view, the two de-facto standards for e-journal documents, HTML and PDF, both have problems.

HTML is a rapidly evolving standard which is intended to be upwardly compatible. However, documents written to older versions of the standard do not always display correctly with later browsers. Moreover, there are differences between the extensions supported by Netscape's Communicator and Microsoft's Internet Explorer so that documents designed for one may only display partially (or not at all) with the other. This means that in order to correctly view unchanged archived HTML one would need a succession of versions of both browsers. The likely solution to this is a migration of the Web world to the new standard markup language, XML (see 4.5.4: Document oriented solutions on page 65). This appears to be much more robust and extensible and should provide a much better archival base than HTML.

PDF suffers from being a proprietary (although documented) technology. While there are freeware readers, only Adobe products (at this stage) can produce PDF files. If Adobe decides in the future to drop support for PDF, this would cause problems for all those journals that have standardised around PDF. However, Apple Computer plans to use PDF as the native image-file format in the next version of its operating system, Mac OS X (10), replacing the current PICT graphics format. This presumably makes PDF less likely to become an orphan technology.

Last modified: Monday, 18-Sep-2017 03:29:25 AEST

© Andrew Treloar, 2001.