A.2.4 Additional Data

Some information could not be derived from the main Data Table, so a special program had to be written to obtain it. This program, called TEXTAN^{100} (TEXT ANalysis), was designed to pass through the text once and record the length of every sentence in words (where a sentence was defined as everything between the sentence delimiters ?, !, ., or a speech code) and the length of every soliloquy, both in words and in sentences. These lists were then broken down both by character and by section. Using these lists of numbers as input to the SPSS BASSTATS routine (see below), it was possible to determine the mean sentence and soliloquy length for each character and section.

A.2.5 Using SPSS

SPSS is a package of statistical routines, originally developed in America more than fifteen years ago. The name is an acronym of Statistical Package for the Social Sciences and, as this suggests, the package was originally designed for psychological and related research. This is reflected in some of the coding conventions it adopts, and in its selection of statistical techniques. Many of these have no relevance to linguistic and literary research, but certain statistical routines are basic to a wide variety of fields, and it was possible to use these. The list of possible commands to SPSS is quite large, and reflects the sophistication of the package. SPSS requires its input data to be in a particular format, and consequently special programs had to be written to rewrite the data in an acceptable form. These have not been included, as they are basically mechanical in nature.

A.2.5.1 Averages

The SPSS routine BASSTATS (BASic STATisticS) provides several general and fundamental statistical measures. These include the Mean, the Variance, the Range of the input data, the Standard Error, the Kurtosis of the curve (assuming a standard distribution), the Minimum, the Maximum, the Skewness of the curve, and the Standard Deviation. Of all these, the mean or average was of most immediate interest to me. This measure was obtained for two groups of data. Firstly, it was used to determine the average word length for all the characters and all the sections. The data for this were taken from the table produced by TABBUILD. Special programs had to be written to extract the data and then reformat them so that SPSS could deal with them. These have also not been included. Secondly, it was used to determine the average sentence and soliloquy lengths for all the characters and sections. The data for this came from TEXTAN (section A.2.4). These too had to be reformatted to meet SPSS requirements.
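By way of illustration only, the second use of the mean can be sketched in Python. This is not the TEXTAN or BASSTATS code, which is not reproduced here. It assumes a hypothetical speech-code format (a tag such as <B> marking a change of speaker), and the function names are invented for the example. Sentences are delimited as in section A.2.4, by ?, !, ., or a speech code, and a handful of the measures listed above are then computed with the Python standard library.

```python
import math
import re
import statistics

# ASSUMPTION: a speech code looks like "<B>"; the real coding scheme is not shown here.
SPEECH_CODE = re.compile(r"<([A-Z])>")
SENTENCE_END = re.compile(r"[?!.]")


def sentence_lengths_by_character(text):
    """Return {character: [sentence length in words, ...]} for a coded text."""
    lengths = {}
    # Splitting on a pattern with one capture group yields
    # [preamble, code, speech, code, speech, ...].
    parts = SPEECH_CODE.split(text)
    for code, speech in zip(parts[1::2], parts[2::2]):
        # A speech code also ends a sentence, so each speech is split separately.
        for sentence in SENTENCE_END.split(speech):
            words = sentence.split()
            if words:
                lengths.setdefault(code, []).append(len(words))
    return lengths


def basic_stats(values):
    """Compute a few of the descriptive measures named above for one list."""
    n = len(values)
    # Sample variance (n - 1 denominator); defined here as 0.0 for a single value.
    variance = statistics.variance(values) if n > 1 else 0.0
    std_dev = math.sqrt(variance)
    return {
        "mean": statistics.mean(values),
        "variance": variance,
        "std_dev": std_dev,
        "std_error": std_dev / math.sqrt(n),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


sample = "<B>The sun rises. It is day! <L>I hear the waves. Do you? <B>Yes."
lengths = sentence_lengths_by_character(sample)
for character, values in sorted(lengths.items()):
    print(character, values, basic_stats(values)["mean"])
```

The same `basic_stats` function could be applied to lists grouped by section rather than by character; only the grouping key changes.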

A.2.5.2 Pearson Correlations

The SPSS routine PEARSON CORR is designed to provide a measure of the correlation between two variables, based on lists of the values taken by the two variables at the same time. (See also section 6.4.) By way of illustration, take two variables X and Y. If high values for X are usually accompanied by high values for Y, then the correlation coefficient will be positive, and close to +1.0. If high values for X are usually accompanied by low values for Y, then the correlation coefficient will be negative, and close to -1.0. If there is no relationship between values for X and values for Y, then the correlation coefficient will be close to 0. When it is required to correlate word lists for six characters, the situation becomes more complicated. Here, the variables are the characters themselves, and the values are how often they use each word in the list of all the words used by all the characters. Two types of correlation were performed: firstly, correlations between characters for each section, and secondly, correlations between sections for each character. The first was of interest for the light it threw on the question of differentiation, and the second for development. The input to the SPSS routine was the lists of words used by the characters in each section produced by TABAN. Two further programs, TBYCHARS (Table BY CHARacterS) and TBYSECTS (Table BY SECTionS), had to be written to reformat these lists for SPSS. These have also not been included.
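The correlation between characters can be sketched in Python as follows. This is a minimal illustration, not the SPSS routine or the TBYCHARS and TBYSECTS programs. The character labels A, B and C, the sample words, and the function names are all hypothetical. Each character becomes a variable whose values are that character's frequency for every word in the combined vocabulary, and the standard Pearson product-moment coefficient is then computed for each pair.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariation / math.sqrt(spread_x * spread_y)


def character_correlations(words_by_character):
    """Correlate every pair of characters over the combined vocabulary."""
    counts = {name: Counter(words) for name, words in words_by_character.items()}
    # The vocabulary is every word used by any character; unused words count as 0.
    vocabulary = sorted(set().union(*counts.values()))
    vectors = {name: [c[w] for w in vocabulary] for name, c in counts.items()}
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Hypothetical word lists for three characters in one section.
section = {
    "A": ["sea", "sea", "light"],
    "B": ["sea", "sea", "light"],
    "C": ["light", "light", "dark"],
}
for pair, r in character_correlations(section).items():
    print(pair, round(r, 3))
```

Correlating sections for a single character follows the same pattern, with the dictionary keyed by section instead of by character.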

