A.2.3   Analyzing the Data Table
A.2.3.1    Methodology

The information contained in the main table and sub-tables covers the whole book, so it had to be broken down into a more readily digestible form. For this purpose I wrote the program TABAN 99 (TABle ANalysis). It accepted either the main table or any of the sub-tables as input and worked sequentially through the table, counting the references for each word type to obtain the number of times it was used. From these counts the relative frequency of each word was derived and output in a list, together with the words and their counts. At the end of the list the program printed the number of types, the number of tokens, and the ratio of the natural logarithms of the two for the whole list (see section 6.3 for a discussion of the importance of this measure). The program also produced a breakdown by sections, with all the attendant information. In this case the frequencies were relative to the number of words used in the particular section, not in the whole book.
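The tabulation described above can be illustrated with a minimal modern sketch. It is not the original TABAN program: the table layout (a mapping from each word to its list of references), the function name `tabulate`, and the sample data are all hypothetical. The text does not say which way round the ratio of logarithms was taken, so the sketch assumes log(types) divided by log(tokens).

```python
import math
from collections import Counter


def tabulate(table):
    """Tabulate a word table.

    `table` maps each word type to its list of references
    (a hypothetical layout, not the original table format).
    Returns the alphabetical list of (word, count, relative frequency),
    the number of types, the number of tokens, and the log ratio.
    """
    # The number of uses of a word is the number of its references.
    counts = Counter({word: len(refs) for word, refs in table.items() if refs})
    tokens = sum(counts.values())
    types = len(counts)
    rows = [(word, n, n / tokens) for word, n in sorted(counts.items())]
    # Assumed direction of the ratio: ln(types) / ln(tokens).
    # Undefined when there are fewer than two tokens, since ln(1) = 0.
    log_ratio = math.log(types) / math.log(tokens) if tokens > 1 else float("nan")
    return rows, types, tokens, log_ratio


# Invented sample data: each reference is a (page, line) pair.
sample = {
    "sea": [(1, 3), (1, 9), (2, 4)],
    "wave": [(1, 1)],
    "light": [(2, 2), (3, 5), (3, 8), (4, 1)],
}
rows, types, tokens, log_ratio = tabulate(sample)
for word, count, rel in rows:
    print(f"{word:<10}{count:>5}{rel:>8.3f}")
print(f"types={types} tokens={tokens} log ratio={log_ratio:.4f}")
```

Running the same function on a sub-table restricted to one section would give frequencies relative to that section alone, as in the breakdown by sections.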

Each count was thus given as a percentage of the number of words used in that particular section, rather than of the usage for the whole book. This gives a much truer picture of usage in that section by avoiding the levelling effect of a comparison with overall usage. The total possible number of lists was sixty-three: seven persons (six characters and the narrator) multiplied by nine sections. In fact the true number was fifty-three, as only Bernard speaks in chapter nine and not every character speaks in every chapter. These lists, one for each person and section, made it possible to examine changes in word usage for any character from section to section, and to compare characters at any stage of the book. The lists were printed out in two ways.

First, they were sorted alphabetically by word, so that any given word could be located quickly; this was the order in which the program originally produced them. Second, they were used as input to the system SORT package and arranged by descending frequency of usage, which made it possible to find the most frequently used words for each character and each section and to examine the relative importance of various words. The lists were also used as input to the SPSS routines (see below).
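The per-person, per-section lists and their two orderings can be sketched as follows. This is an illustration only, not the original program or the system SORT package. The reference layout (word, speaker, section), the function names, and the sample data are hypothetical, and breaking frequency ties alphabetically is an assumption. A list is created only for a person and section that actually occur together, which is why fewer than the sixty-three possible lists can result.

```python
from collections import Counter, defaultdict


def section_lists(references):
    """Build one word list per (speaker, section) pair.

    `references` is an iterable of (word, speaker, section) triples
    (an assumed layout). Each list row is (word, count, percentage),
    where the percentage is relative to the words used by that
    speaker in that section only.
    """
    counts = defaultdict(Counter)
    for word, speaker, section in references:
        counts[(speaker, section)][word] += 1
    lists = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        lists[key] = [(word, n, 100.0 * n / total) for word, n in counter.items()]
    return lists


def alphabetical(rows):
    """First ordering: alphabetical by word."""
    return sorted(rows, key=lambda row: row[0])


def by_frequency(rows):
    """Second ordering: descending count, ties broken alphabetically (assumed)."""
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Invented sample references.
refs = [
    ("sea", "Bernard", 1),
    ("sea", "Bernard", 1),
    ("light", "Bernard", 1),
    ("wave", "Bernard", 1),
    ("sea", "Rhoda", 1),
    ("sea", "Bernard", 9),
]
lists = section_lists(refs)
bernard_1 = lists[("Bernard", 1)]
print(alphabetical(bernard_1))
print(by_frequency(bernard_1))
```

Comparing the list for one character across sections, or the lists for several characters within one section, then reduces to looking up the corresponding keys in `lists`.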

