Chapter 1


Acknowledgements

My thanks are due first to the English Department for their support of this study, and in particular to my supervisor, Dr. David Bennett, for his constant encouragement, assistance, and advice.

I am also indebted to my wife, Dawn, without whose love, patience, support, and encouragement this study would never have been completed.


Common Abbreviations

AWD: A Writer's Diary - Virginia Woolf's diary as edited by Leonard Woolf. It contains only direct references to her works, and such additional material as he thought necessary for their comprehension.

DVW: Diary of Virginia Woolf - the complete and unexpurgated text, as edited by Anne Olivier Bell, the wife of Woolf's nephew Quentin Bell.

All references to The Waves are to the Hogarth Press Uniform Edition.


1.1   Background

The computer has been used in literary and linguistic research since the 1950s. Its first uses were to produce conventional research aids such as concordances, dictionaries, lexicons and the like. These had previously been done by hand, but could now be produced more quickly, more accurately, and to a higher standard. Producing, say, a concordance to Shakespeare was no longer a lifetime's work; it became simply a matter of preparing the input in machine-readable form, feeding it into the computer, and running a program for a few hours to a few days, depending on the size of the text. These early uses of the computer created familiar products: the critics had previous experience in their use, their value was obvious, and nothing was new. In effect, the computer was being used to provide support materials for the status quo.

During the sixties, the computer began to be used for more complicated lexico-statistical studies. Much of this work was based on theories developed before the advent of the computer, theories which had previously only been tested on small samples of text due to time considerations. Now, the researchers could use much larger samples, even the complete text if necessary. This was a new application of the computer to literary texts, and the products of the research were not always familiar to the critics. Some of the old results were confirmed or modified as they were tested on a larger base, but much of the work was completely new. Among the ground-breaking work done at this time was the setting-up and analysis of the Standard Corpus of Present-Day Edited American English at Brown University1 (intended as a standard reference for lexico-statistical measures), an attempt to resolve the disputed authorship of the Federalist papers by Frederick Mosteller and David L. Wallace,2 and the start of Morton's work on the New Testament,3 with particular reference to the Pauline authorship of the Epistles.

Somewhat later, the computer's worth began to be realized in its ability to support studies that had never been possible before, studies that only extensive use of the computer made feasible. These included such things as complicated stylistic studies, image patterning, and thematic analysis. These were new, based on unfamiliar assumptions, and often developed through techniques that seemed alarmingly mathematical. The analyses were unlike anything that had been attempted before, and many critics were unable to understand either the results or the thinking behind the work from which they were derived. It is at this stage that we currently find ourselves.

Thus there has been an enormous advance in the use of the computer in literary and linguistic research over the last twenty years, from its use as an adjunct to conventional research to its use as a research tool in its own right. This advance has not been matched by a corresponding development in the critics' use of the results of this research. This is only a problem if it is deemed desirable that the developments should correspond. Is there then, and should there be, any relationship between the modern use of the computer to analyze texts, and the modern schools of criticism?

Although many of the modern schools of criticism talk about the need for a 'science of literature', the science in their work seems notably lacking. Traditionally, a science is based on the sequence of i) observing a set of phenomena, ii) forming a hypothesis to explain the phenomena, and iii) performing some investigation to test the hypothesis. If the hypothesis survives this testing, an attempt is usually made to generalize it. In addition, if the science is to be applied rather than pure, the results should be related to some application of practical worth.

In contrast to this, what one tends to find in the critical literature (particularly under the banner of 'structuralist' criticism) is a number of impressive-sounding theories and detailed, passionate, involved discussion of those theories, without much evidence of any possible practical use. Attempts to apply the theories or methods to real texts, rather than to theoretical abstractions, are infrequent. When this does occur, the object of investigation is usually a single text, and a short one at that. Studies of a text of novel length or the equivalent are the exception rather than the rule. It is true that there is a large corpus of purely theoretical criticism which attempts to address whole oeuvres, but it does so without examining the individual texts in detail. Moreover, the process by which these theoretical investigations take place, or by which the theories are refined, is generally lacking in scientific rigour.

How then might one put the 'science' into the 'science of literature'? There are certain basic requirements for any theory within a scientific discipline. Firstly, the theory must be applicable to more than one unique case: some general applicability, or at least applicability within a defined group, must be demonstrated. Secondly, the results on which the theory is based must be obtained objectively, not subjectively. Thirdly, the results must be repeatable. Is it possible for literary criticism to approach these ideals, and if so, how closely?

I believe that judicious use of the computer may be able to offer at least a partial answer to this question. Something which could be called Computer-assisted Criticism might well be a useful advance. (I would regard the use of conventional literary aids produced with the help of a computer as only a minor subset of computer-assisted criticism.) How then might this new method function, and how would it relate to some of the modern schools of literary criticism? Before answering this, it is necessary to examine some of the assumptions behind such a computer approach to a text, how these assumptions shape the approach taken, and how they tie in with existing schools of literary thought. This requires a clear understanding of the differences between a human reader and a computer as regards their perception of a text. For this, some introduction to how a computer processes text is required.

1.2    Computer Text Processing

In the most general terms, the computer is a device which manipulates symbols. The symbols4 with which it deals are represented internally by the use of numbers ranging from 0 to 127 (or occasionally 255). In the case of text, a unique number has been assigned to each letter in the alphabet (both upper and lower case), each digit, and most of the common punctuation marks. These numbers or codes can then be manipulated by the computer in whatever way is desired.
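By way of illustration, the following minimal sketch shows the correspondence between characters and their internal codes. (The sketch is in Python, a language chosen here purely for brevity, and it assumes the ASCII coding scheme; neither is prescribed by the discussion above.)

    # A minimal sketch, assuming the ASCII coding scheme: each character
    # of a text is stored internally as a number in the range 0 to 127.
    for ch in "Waves!":
        print(ch, ord(ch))      # e.g. 'W' -> 87, 'a' -> 97, '!' -> 33

    print(chr(87))              # and back again: the code 87 stands for 'W'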

For codes representing characters, this manipulation at its lowest level consists of the movement of characters from one place to another, and the comparison of characters according to alphabetic sequence in order to determine whether one is larger than, smaller than, or equal to the other. For codes representing numbers, all of the above operations are possible, in addition to the normal arithmetical operations of addition, multiplication, division, and subtraction. Using these basic operations as building blocks it is possible to perform quite complex analyses.
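These basic operations can be sketched as follows (again in Python, and again only as an illustration: the point is that comparison of characters reduces to comparison of the underlying codes).

    # Comparison of characters is comparison of their codes, so
    # alphabetic ordering follows directly from the numeric ordering.
    print(ord('a') < ord('b'))       # True: 97 is smaller than 98
    print('lighthouse' < 'waves')    # strings compare character by character

    copy = "The Waves"               # 'movement': characters copied from
    print(copy)                      # one place to another

    # Codes treated as numbers also admit ordinary arithmetic.
    print(ord('b') - ord('a'))       # 1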

Thus, in order to process text of any sort, the text must first be converted from marks on paper to these internal codes. Although a good deal of progress has been made in the development of optical character readers, at present it is still more economical to have the text typed directly into the computer.5 Normally, this transcription preserves the appearance of the original text as much as possible, one line on the screen corresponding to one line of text. The only difference is that special features of the text (such as italics, underlining, or boldface) will need to be signalled explicitly to the computer. This is normally done by prefixing the affected portion of text with a special character, such as a hash sign (#). This encoded text can then be read by both computer and human without great difficulty. In the case of the computer, the form of this 'reading' will depend on the type of analysis being performed.
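A short sketch of how such explicit coding might be recognised is given below. (The '#' convention is the one just described; the sample line and the details of the handling are invented for illustration.)

    # Sketch: collect the words flagged as italic by a leading '#',
    # and produce a plain version of the line with the codes removed.
    line = "She had been reading #The #Waves all afternoon."

    italicised = [w.lstrip('#') for w in line.split() if w.startswith('#')]
    plain = " ".join(w.lstrip('#') for w in line.split())

    print(italicised)    # ['The', 'Waves'] - the words marked as italic
    print(plain)         # the line with the coding characters stripped out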

As far as the computer is concerned the text consists of one long string of characters, with none of the divisions of a book into words, sentences, paragraphs, pages, and chapters. Separated words and explicitly signalled sentences are typographical conventions that are comparatively recent in the history of written communication. It is instructive to remember that they are not strictly necessary for an understanding of the text; pre-mediaeval scribes managed well without them. Similarly, pages are a by-product of the layout of the book on the paper, and paragraphing and chapter divisions are deliberately signalled in the book medium by typographical conventions. If the recognition of any of these is required by the computer, there are two types of solution: either they must be made explicit via some appropriate coding convention, or they must be defined using rules the computer can interpret. Often a combination of these strategies is required.

The simplest working definition of a word for a computer is everything between two spaces. Punctuation that is typically appended to words, such as commas and the full stop at the end of a sentence, confuses the issue somewhat, but the computer can be instructed how to handle these cases. Sentences can be assumed to be everything between two full stops, although reported indirect speech can cause problems. Paragraphs are usually more dependent on the typography of the text under consideration; an indent of one or more spaces at the start of a line often provides a good means of detection. Page and chapter numbers need to be made distinct from any numbers that may occur naturally in the text by prefixing them with some character that does not occur elsewhere in the text. The text thus prepared can now be processed in whatever manner is desired.
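The working definitions above might be sketched as follows. (This is a deliberately naive version; a real text, with its abbreviations, quotation marks, and hyphenation, would need more careful rules.)

    # Words: everything between spaces, with appended punctuation stripped.
    # Sentences: everything between full stops.
    text = "The waves broke on the shore. The birds sang their blank melody."

    words = [w.strip(',.;:!?"') for w in text.split()]
    sentences = [s.strip() for s in text.split('.') if s.strip()]

    print(words)        # ['The', 'waves', 'broke', ..., 'melody']
    print(sentences)    # ['The waves broke on the shore',
                        #  'The birds sang their blank melody']

    # Paragraphs: detected here by an indent at the start of a line.
    def is_paragraph_start(line):
        return line.startswith((' ', '\t'))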

This processing occurs through the action of computer programs. A program is simply a set of instructions that the computer can follow, describing how to perform a specified task. There are many standard programs, or packages of programs, available for text-processing. Some of these have been specifically designed for this purpose, and some have features which can be adapted. In particular, the more conventional aids of lexicons and concordances can be created using a variety of programs. In the case of newer and more speculative analyses, researchers may decide, or be forced, to write their own programs.

As an example of how the computer might process text, consider the production of a concordance. The computer would be instructed to 'read' through the text, one character at a time, assembling a word out of the sequence of characters read. Each time it found a space it would end the current word and begin another. Each word would then be placed in a list, thus gradually building up a list of all the words used. If duplicates were eliminated from this list, a list of only distinct words would be obtained. For each word a certain amount of context would also be stored; this might be the whole sentence in which the word occurred, or only a specified number of characters on each side. The list would be kept sorted alphabetically, and at the completion of the process the computer could be instructed to print either the entire list, or only those portions which involve certain words.6
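A compressed sketch of this procedure follows. (The width of thirty characters of context on either side and the sample sentences are arbitrary choices; an actual concordance program would also have to record page and line references and handle punctuation more carefully.)

    # Build a simple keyword-in-context concordance: each distinct word is
    # stored with a fixed amount of surrounding context, and the whole
    # list is kept in alphabetical order.
    from collections import defaultdict

    def concordance(text, width=30):
        entries = defaultdict(list)
        position = 0
        for word in text.split():
            start = text.index(word, position)        # locate this occurrence
            left = text[max(0, start - width):start]
            right = text[start + len(word):start + len(word) + width]
            key = word.strip(',.;:!?"').lower()       # strip punctuation, ignore case
            entries[key].append(left + word + right)
            position = start + len(word)
        return entries

    sample = ("The waves broke and spread their waters swiftly over the shore. "
              "The waves fell; withdrew and fell again.")

    entries = concordance(sample)
    for word in sorted(entries):            # print the whole alphabetical list,
        for context in entries[word]:       # or only selected words if preferred
            print(word.ljust(10), context)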

Whatever the analysis chosen, the output would usually be stored electronically in some way, for later recall or further processing. In particular, re-arrangement of the information or graphical display of selected portions might be required. (For further references to the types and range of possible text-processing operations, the reader is referred to the final section of the Bibliography.)
