Better than Print?

Hypermedia Scholarly Publishing

and the World Wide Web

Andrew Treloar, School of Computing and Mathematics, Deakin University, Rusden Campus, 662 Blackburn Road, Clayton, 3168, Australia. Phone +61 3 9244 7461. Fax +61 3 9244 7460. Email: Andrew.Treloar@deakin.edu.au. Home Page: Andrew Treloar

Paper presented at the Victorian Association for Library Automation (VALA) 1996 Biennial Conference, Melbourne, January 1996. The presentation slides are also available online (make sure you notice the titles!). Last updated June 5, 1996.

Introduction

This paper considers the possibilities inherent in scholarly publishing on the World Wide Web and compares them to traditional print publishing. The paper starts by considering those technologies used for scholarly publishing to date. The four main networked electronic technologies (listserv archives, AFTP repositories, gopher servers, and the Web) are then contrasted with traditional print publishing technologies. The paper next considers some of the issues for electronic scholarly publishing with particular application to the Web environment. Finally, some tentative conclusions are drawn about the likely direction of scholarly publishing.

The paper will not deal with the available tools for Web publishing. This is a topic in its own right and is extensively covered elsewhere. Nor will it consider electronic publishing in general, or scholarly electronic publishing over other media (such as CD-ROM).

The paper has been designed in a self-exemplary manner for parallel print and electronic publishing. Each piece of underlined text is a hyperlink in the electronic version. In the print version, the footnote contains the URL for the hyperlink. My apologies to those print purists who hoped that underlining had gone the way of Courier and upper-case only headings. The online version is available at <http://www.deakin.edu.au/people/aet/vala96/>.

Electronic Scholarly Publishing

For the purposes of this paper, scholarly publishing will be taken to mean the production of journal articles, both refereed and non-refereed. The focus will be on what Harnad (1995b) calls 'esoteric' publication - publication by specialists for other specialists - as opposed to trade publication. Esoteric publishing is a much more significant publishing activity for most academics than either trade publication or the production of monographs, and probably represents the majority of all academics' published output.

Such publishing has traditionally taken place using the technology of print. This is still the primary technology for all disciplines, and is also the technology that provides the official archival record for almost all publications. However, print publication suffers from a number of disadvantages:

Journals tend to be slow to appear, with Harnad (1991) identifying the lag between writing and publication as their major disadvantage.
They cannot be directly searched, leading to a large market for secondary abstracting and indexing services.
They are limited to information that can be represented statically in print.
Their mechanisms for hyper-linking are clumsy at best.
They are costly to produce, distribute and store (Odlyzko 1995).

For all these reasons, as soon as the available technology made it practicable, pioneering scholars began to use whatever means they could to produce and distribute their writings electronically. Such electronic publishing is sometimes referred to as epublishing, by analogy with email (for an excellent selective bibliography on the subject of scholarly electronic publishing over networks, refer to Bailey 1995). In roughly chronological order, the technologies adopted were:

listserv,
anonymous file transfer protocol (AFTP),
gopher,
World Wide Web,
Adobe's Portable Document Format (PDF).

New technologies tended to be used in addition to older technologies, rather than supplanting them. Thus, it is not unusual to find journals that were initially distributed by listserv, and which then added AFTP, and later perhaps gopher or Web access.

In order to be successful, all of the above technologies needed to provide either equivalent functionality to print, or if this was not possible, enough alternative functionality to compensate for any deficiencies. In practice, it turns out that for any scholarly publishing medium to be useful, three core sets of functions are needed:

ability to produce and format the publication
ability to notify users of new issues of the publication
ability for users to access the publication.

How did these new technologies provide these functions, and how do they compare to print?

Production and Formatting

To begin with, published information needs to be produced and formatted in a way that the scholar can use. In all cases, the technology chosen places constraints on what can be represented and how.

Listserv archives are usually restricted to documents in 7-bit ASCII. This is because of the need for such documents to pass through email gateways in transit and because no assumptions can be made about the display device at the other end.

Anonymous File Transfer Protocol (AFTP) archives can be used to store any kind of file. In practice, most ejournals using this technology have tended to use 7-bit ASCII text documents. Some journals are experimenting with storing articles in richer formats like Hypertext Markup Language (HTML) or PostScript.

Gopher servers can also provide a range of document types, but most ejournals mounted on gopher servers also store documents in 7-bit ASCII text. A wider range of Multipurpose Internet Mail Extension (MIME) types is now supported by available gopher clients and servers. That this facility has not been widely adopted to distribute documents in other formats is probably a result of the general rise in popularity of the Web.

World Wide Web documents are written in HTML. This provides for formatted text, inline graphics, hyperlinks within documents, links to other HTML documents, and links to documents in other formats altogether. However, the scholar writing for the Web needs to be aware that a wide range of browsers will be used to access their work. Not all browsers format HTML in the same way, and the available range of markup tags is restricted, particularly compared to SGML. Thus a lesser degree of control over the final appearance of the document is inevitable, compared to the richness of print.

Adobe's Acrobat (PDF) format is a dialect of PostScript. PDF provides for device-independent, page-based cross-platform electronic documents. Readers are available for most popular platforms. Future versions of Netscape's Navigator product will include support for page-at-a-time viewing of PDF files. PDF is a very good solution for complex electronic documents with a high graphical content or lots of formulæ. An example of an ejournal using PDF is the Cajun Project being developed by the Electronic Publishing Research Group.

At first glance, print publishing might seem to provide few restrictions; multiple fonts, sidebars and images are all possible. However, hyperlinks within the one publication are clumsy, and links (footnotes and citations) to other publications rely on the scholar having ready access to the publications linked to. As well, print is limited to information that can be represented statically on paper. Audio and video are impossible. For most publications, colour still images are technically possible, but prohibitively expensive.

Notification

In order to access a new scholarly publication, the scholar needs to be notified of its existence.

In the domain of epublishing, the standard solution to the notification problem is to use one of a number of computer-mediated communication technologies. By far the most popular is electronic mail, with network news a distant second. Two distinct strategies can be employed. The first is to email the entire text of the latest issue of an ejournal direct to a scholar's mailbox. In this case, the notification is directly analogous to the arrival of a print journal. An alternative increasingly being adopted is to notify the scholar of the publication of a new journal, include author, title and abstract information, and provide advice on how to access either the entire journal or particular articles of interest. For FTP, Gopher and Web journals, this access information is usually in the form of a Uniform Resource Locator (URL).

In the print world, notification is often limited to the physical arrival of a new issue of a journal (often on a semi-regular, predictable schedule). If the journal comes to a library, the scholar has to check the shelves periodically, or rely on some sort of alerting service. Such a service might be provided by the library (in the form of photocopied contents pages) or a commercial information provider like DIALOG (via the results of an SDI search on a contents database). Alternatively, scholars can directly search online databases of abstracts and citations looking for relevant information, but this requires them to take the initiative and can easily get crowded out of a busy schedule.

Access

Once notified, the scholar needs to be able to gain access to the information. This includes locating the journal, and being able to identify and read articles of interest.

Listserv archives enable scholars to access information via email. All that is required is to email a GET command to the listserver address requesting that a specified file be sent by return email. As email is the lowest common denominator for users of the Internet, this provides the widest possible audience. As an example, consider the reference in this paper to Harnad (1991). This article in the refereed ejournal Public-Access Computer Systems Review (PACS-R) can be retrieved by sending the e-mail message 'get harnad prv2n1 f=mail' to listserv@uhupvm1. Of course, before issuing a GET command, one needs to know that the file exists. Some journals, including PACS-R, handle this by sending the table of contents and abstracts to users subscribed to the PACS-L or PACS-P mailing lists. Alternatively, it is possible to email commands to some listservers instructing them to search a database and return a list of articles that match the search criteria. These articles can then be retrieved as above.
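As a rough illustration of the retrieval request just described, the following Python sketch builds such a message with the standard library's email package. It is a sketch under stated assumptions rather than a description of any listserver's behaviour: the function name and the sender address are hypothetical, it is assumed that the command travels in the message body, and the message is only constructed, never sent.

```python
from email.message import EmailMessage


def build_listserv_get(listserver, file_spec, sender):
    """Build (but do not send) an email asking a listserver for one file."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = listserver
    # Assumption: the listserver reads its command from the message body.
    msg.set_content(f"get {file_spec} f=mail")
    return msg


request = build_listserv_get(
    listserver="listserv@uhupvm1",   # address given in the text
    file_spec="harnad prv2n1",       # file named in the text
    sender="scholar@example.edu",    # hypothetical sender address
)
print(request.get_content().strip())
```

Actually delivering the message would require a mail transport, which is deliberately left out here.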

Scholars can access articles in anonymous FTP archives either by using a dedicated FTP client, or by providing an FTP URL to a Web browser like Lynx, Mosaic or Netscape. If the URL formalism is not being used, then the FTP location of the article will need to specify host machine, directory path and filename. For example, the information encoded in the URL FTP://cogsci.ecs.soton.ac.uk/pub/harnad/Harnad/harnad95.quo.vadis can also be expanded into (more or less) plain English as 'Make an anonymous FTP connection to cogsci.ecs.soton.ac.uk, move into the directory pub/harnad/Harnad/ and get the file harnad95.quo.vadis'. The URL formalism has the advantage of being more compact as well as parseable by both humans and machines. One example of a journal accessed by AFTP is Psycoloquy, edited by Stevan Harnad.
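The machine-parseability of the URL formalism can be illustrated with a short Python sketch that splits the example URL into host, directory and filename and then regenerates the plain-English instruction. The function name explain_ftp_url is hypothetical; urlsplit and posixpath.split are standard-library calls. No network connection is made.

```python
import posixpath
from urllib.parse import urlsplit


def explain_ftp_url(url):
    """Split an FTP URL into scheme, host, directory and filename."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,                      # urlsplit lower-cases the scheme
        "host": parts.hostname,
        "directory": directory.lstrip("/") + "/",    # relative to the FTP root
        "filename": filename,
    }


info = explain_ftp_url(
    "FTP://cogsci.ecs.soton.ac.uk/pub/harnad/Harnad/harnad95.quo.vadis"
)
print(
    f"Make an anonymous FTP connection to {info['host']}, "
    f"move into the directory {info['directory']} "
    f"and get the file {info['filename']}"
)
```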

Gopher was initially developed to provide a basis for mounting Campus Wide Information Systems (CWISs). It is based around the idea of hierarchical menus, and allows the server administrators a lot of flexibility in how they structure their information space. One fairly standard way to mount ejournals on a gopher server is to have a menu of possible journals. Each journal points to a menu of issues for that journal. Each issue points to the individual articles. Given unambiguous information about the path to be followed, scholars can navigate through the menus until they locate the files they want. It is also possible to provide Gopher URLs for direct access using a Web browser. An example of a journal available through Gopher is the Mathematical Physics Electronic Journal.
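The journal, issue and article hierarchy described above can be modelled, very loosely, as nested menus. The Python sketch below is only an illustration of navigating such a tree given an unambiguous path. It does not speak the Gopher protocol, and every journal title, issue label and file name in it is invented for the example.

```python
# Hypothetical menu tree: journals -> issues -> articles.
# All titles and file names are invented for illustration.
MENU = {
    "Example Journal A": {
        "Vol. 1, No. 1": {
            "First article": "a-v1n1-first.txt",
            "Second article": "a-v1n1-second.txt",
        },
    },
    "Example Journal B": {
        "Vol. 2, No. 3": {"Only article": "b-v2n3-only.txt"},
    },
}


def follow_path(menu, selections):
    """Descend one menu level per selection and return what is found there."""
    node = menu
    for choice in selections:
        if not isinstance(node, dict) or choice not in node:
            raise KeyError(f"no menu item {choice!r}")
        node = node[choice]
    return node


article = follow_path(
    MENU, ["Example Journal A", "Vol. 1, No. 1", "Second article"]
)
print(article)
```

Stopping part-way down the path returns the menu at that level, which mirrors a scholar browsing the issues of one journal before choosing an article.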

The Web, with its non-hierarchical document-based networked hypermedia architecture, provides a much richer environment for electronic publishing. Documents can either be reached by following an existing link, or can be accessed directly by entering a valid URL. Documents can in turn refer to other documents and provide direct links to them (something that is not possible with documents accessed using a Gopher client). Examples of a range of scholarly journals on the Web will be discussed below. The Web can also be used to point to documents in PDF format.

In the print world, if the journal is delivered directly to the user, the problem of journal location is limited to finding the journal within the context of the scholar's own personal information management system. If the journal is delivered to the library, it will be filed in some well-defined sequence. To assist with locating articles within journals, the publishing industry has developed a range of standard tools: contents pages at the front of issues, yearly cumulative printed indexes, and the like.

Better than Print?

As scholarly journal publishing continues what some (Odlyzko 1995, Harnad 1995b) regard as its inevitable transition to an electronic form, a number of issues need to be confronted. A number of these are applicable to all forms of electronic publishing. Others are either specific to, or have the greatest impact on, the Web. The list below is not intended to be exhaustive. Barry (1995) provides another perspective on some of these issues.

Document Durability

This is a term taken from Kaufer and Carley (1993), and refers to the length of time the article is available for communicative transactions. Paper documents printed on paper that is not acid-free have a durability of some 100 years unless corrective action is taken. The durability of Web documents is entirely unknown, but there are no technological reasons for their life to be limited in any way, provided they are archived in some systematic way. At present there are no mechanisms to ensure that this will occur.

In many ways, the digital nature of all electronic publishing can be both a strength and a weakness in the area of durability. A strength, because digital documents can easily be copied and replicated at multiple sites around the world. A weakness, because destroying a digital document is far easier than destroying a physical document. It is easy to assume that the document will exist elsewhere on the Net and that the fate of a single copy is irrelevant. Of course, there is no mechanism to prevent everyone making this assumption and causing the loss for ever of a piece of scholarship. In some ways, the analogue of the single manuscript forgotten on top of a cupboard in a monastery somewhere in the Dark Ages may well be a forgotten directory on a rarely used hard-disk somewhere in a university. Unfortunately, it is all too easy to delete a directory - throwing away a manuscript without realising is somewhat harder. Given the lack of any mechanism to ensure the archiving of print publications, it seems unlikely (although relatively technologically simple) that anything will be done about the situation for digital documents.

Multimedia articles

The Web allows us to dramatically expand our view of what is possible within a scholarly publication. A Web document can directly include colour images, something reserved for only a very few print publications. In addition, HTML documents can provide links to video clips and sound files, as well as access to other programs through Web gateways. This enables a significant enhancement to the traditional published scholarly document. A number of electronic scholarly journals are experimenting with the possibilities inherent in this medium.

PostModern Culture routinely contains hypermedia articles alongside more traditional text-only material. As an example, McNeilly (1995) contains links to a number of sound files which are used to illustrate particular points in the article.

I am not aware of any ejournals that use the gateway facility to provide access to data sourced from other systems. As an illustration of what might be possible, consider ERIN, the Australian Environmental Resources Information Network. While not a scholarly journal itself, this system does provide access to a wide range of scholarly information. Use of a Web gateway allows the user to generate distribution maps for nominated species and run simulation models in real time. Imagine the possibilities if a journal article allowed the reader to run a simulation directly while varying the input data and monitoring the results.

JAIR, the Journal of Artificial Intelligence Research, is using the Web to deliver articles in PostScript or HTML format. As an example, Schlimmer & Hermens (1993) is available in both a PostScript version and an HTML version. JAIR is also experimenting with delivering other forms of supporting information. The Schlimmer & Hermens (1993) article comes with an appendix containing a 1.3MB Quicktime video which illustrates some of their research findings.

At the moment at least three things are limiting the wider use of anything other than text in scholarly publishing:

network bandwidth,
the requirement to code for non-graphical browsers, and
scholarly conservatism.

Bandwidth is widely predicted (Odlyzko 1995) to become a much less severe limitation as scholarly use of the Internet piggybacks on the infrastructure servicing video on demand and similar services. Bill Gates talks about bandwidth being 'essentially infinite' within the decade. Many developed countries are proposing to run cable connections to individual households that will support 10 Mbps at least. Therefore, bandwidth seems to be a short-term problem at worst.

While there will no doubt be an application for VT100 Web browsers like Lynx for a few years, the computing world is rapidly going graphical. Already the majority of Web browsers run under a GUI, and this trend will continue. Having to code for non-graphical browsers is probably another short-term difficulty.

Scholarly conservatism may prove a more long-term constraint, only susceptible to generational change. Many scholars will no doubt only use the Web (if at all) to publish what they publish already but faster and in electronic form. The habits of centuries of print publishing (in the case of scholars in general) and of decades of practice (in the case of individual scholars) will take a while to change.

Interactivity

The Web makes it possible for authors to provide access to extension material that supplements or complements their primary publications. Stevan Harnad talks about 'scholarly skywriting' (Harnad 1990) and argues for supplementing peer review with interactive publication in the form of open peer commentary on published and ongoing work (Harnad 1995a). In the spirit of this suggestion, JAIR, the Journal of Artificial Intelligence Research, has just implemented a facility to allow readers to comment on published articles and to review the comments of others. Harnad himself has archived contributions from readers to a discussion of publicly retrievable FTP archives for esoteric science and scholarship as an example of what is possible.

The High Energy Physics community has already moved to a model of electronic publishing which allows for ongoing corrections and addenda. The hep-th e-print archive which provides this facility 'serves over 20,000 users from more than 60 countries, and processes over 30,000 messages per day' (Ginsparg 1994).

The Web's ability to link to other information makes it possible to envisage a range of extensions to traditional scholarly publishing. These include:

access to the primary data, thus allowing researchers to check the data analysis;
links to earlier versions of a publication, enabling other scholars to track its development over time; and
pre-publication access to related ongoing research reports.

Little of this nature is happening at present, but the possibilities are certainly wider than the few suggestions outlined above.

Designed for Screen or Print?

Deciding how to organize a Web document depends somewhat on whether the document is intended to be read largely on screen, or printed out, read (perhaps annotated) and then filed. In fact, the entire issue of the most appropriate style for HTML documents in general is a vexed one.

Price-Wilkin (1994a) argues that "because the Web does not include structure awareness in its protocol and because HTML markup provides so little support for structural representation of features, the author and the administrator are forced to fragment documents into a sets of reasonably sized components." This is no doubt true for large documents with complex internal structures, but is less of an issue for the shorter documents typical of scholarly publishing.

Tim Berners-Lee's preferred style is for shortish (up to 5 pages) nodes linked together in some logical sequence, preferably based on a tree structure. On its own, this implies that the reader will have to navigate back up branches in order to access the next section. Documents designed using this model should provide the reader with a link labelled "next" at the end of each node to let them move through the document in a linear manner if desired. This style works well for things like online reference material but seems less appropriate for scholarly publishing. A scholarly article is more of a single entity and should be represented as such. If the article is a long one, it may be appropriate to split it into sections or place a table of contents with links to internal anchors at the beginning. The advantage of keeping the article as an entity is that the user can easily print it out (if required), without having to retrieve multiple segments and ensure that they are collated in the correct order. Until a majority of the intended audience is comfortable with reading entirely from the screen, and has the hardware to make this possible, the likelihood that material will be printed out has to be kept in mind when writing scholarly Web documents.
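The suggestion of a table of contents with links to internal anchors can be sketched as a small Python helper that emits the HTML for such a list. This is one possible illustration, not a recommended tool: the function name build_toc and the anchor names are hypothetical, and the sketch assumes that each section heading in the article carries a matching named anchor.

```python
from html import escape


def build_toc(sections):
    """Return an HTML list linking to one internal anchor per section.

    `sections` is a list of (anchor_name, heading_text) pairs.
    """
    items = [
        f'<li><a href="#{escape(anchor, quote=True)}">{escape(title)}</a></li>'
        for anchor, title in sections
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Headings taken from this paper; the anchor names are hypothetical.
toc = build_toc([
    ("intro", "Introduction"),
    ("issues", "Better than Print?"),
    ("conclusion", "Conclusion"),
])
print(toc)
```

Because the whole article stays in one file, the reader can still print it in a single operation while using the links for on-screen navigation.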

PDF documents are designed around a page model so the issue of document design is less critical. It is possible to choose a page size smaller than normal paper but then the documents will translate less well to paper output. PDF documents can also be designed to have additional navigation features such as live tables of contents or thumbnails but these only just compensate for the deficiencies of the viewing software relative to flicking through paper.

Conclusion

Electronic publication of scholarly 'esoteric' material is continuing to grow in popularity. An increasing number of ejournals are adding Web access to their range of access technologies. As the Web continues to grow in popularity, and as the ratio between all potential readers and potential readers with Web access approaches unity, I suspect that older electronic delivery technologies will simply fade away. The shift from text-only production to production in some graphically richer form may well take a little longer. A number of journals now have a Web homepage that points to articles in 7-bit ASCII, but have not yet made the change to HTML or PDF for the articles themselves.

In the longer term, the Web is probably not the future of scholarly publishing. It is both a part of the present, and a pointer to the future. Other technologies will no doubt surpass the Web in time. Hyper-G looms as a possibility, and Project Xanadu may move from virtuality to reality before the end of the millennium. The significance of the Web is the way in which it enables a far more significant break from print than has been achieved to date. It does this because it does all that print does and then more. For scholars, exploring the implications of that more for their publishing and communication is sufficient challenge for the near term.

References

C. Bailey Jr. (1995), "Network-Based Electronic Publishing of Scholarly Works: A Selective Bibliography", The Public-Access Computer Systems Review, Vol. 6, No. 1.

T. Barry (1994), "Publishing on the Internet with World Wide Web", in Proceedings of CAUSE '94 in Australia, CAUDIT/CAUL, Melbourne.

T. Barry (1995), "Network Publishing on the Internet in Australia", in The Virtual Information Experience - Proceedings of Information Online and OnDisc '95, Information Science Section, Australian Library and Information Association, pp. 239-249.

P. Ginsparg (1994), "First Steps towards Electronic Research Communication", Computers in Physics, August.

S. Harnad (1990), "Scholarly Skywriting and the Prepublication Continuum of Scientific Inquiry", in Psychological Science, Vol. 1, pp. 342-343 (reprinted in Current Contents 45: 9-13, November 11 1991).

S. Harnad (1991), "Post-Gutenberg Galaxy: The Fourth Revolution in the Means of Production of Knowledge", in The Public-Access Computer Systems Review, Vol. 2, No. 1, pp. 39-53.

S. Harnad (1995a), "Implementing Peer Review on the Net: Scientific Quality Control in Scholarly Electronic Journals", in Peek, R. & Newby, G. (Eds.), Electronic Publishing Confronts Academia: The Agenda for the Year 2000. Cambridge MA: MIT Press.

S. Harnad (1995b), "Electronic Scholarly Publication: Quo Vadis?", in Serials Review, Vol. 21, No. 1, pp. 70-72.

D. S. Kaufer & K. M. Carley (1993), Communication at a Distance - The Influence of Print on Sociocultural Organization and Change, Lawrence Erlbaum Associates.

K. McNeilly (1995), "Ugly Beauty: John Zorn and the Politics of Postmodern Music", in Postmodern Culture, Vol. 5, No. 2 (January).

A. Odlyzko (1995), "Tragic loss or good riddance? The impending demise of traditional scholarly journals", in Electronic Publishing Confronts Academia: The Agenda for the Year 2000, Robin P. Peek and Gregory B. Newby, eds., MIT Press/ASIS monograph, MIT Press.

J. Price-Wilkin (1994a), "Using the World-Wide Web to Deliver Complex Electronic Documents: Implications for Libraries", in The Public-Access Computer Systems Review, Vol. 5, No. 3, pp. 5-21.

J. Price-Wilkin (1994b), "A Gateway Between the World-Wide Web and PAT: Exploiting SGML Through the Web", in The Public-Access Computer Systems Review, Vol. 5, No. 7, pp. 5-27.

D. Schauder (1994), Electronic Publishing of Professional Articles: Attitudes of Academics and Implications for the Scholarly Communication Industry, Unpublished Ph.D. Dissertation, University of Melbourne.

J. C. Schlimmer & L. A. Hermens (1993), "Software Agents: Completing Patterns and Constructing User Interfaces", Journal of Artificial Intelligence Research, Vol. 1, pp. 61-89.