The original works of famous people, medieval copies of lost originals, and ancient texts on papyrus and stone are studied and published not just in Australia but all over the world. These manuscripts are not perfect. They contain many mistakes, corrections, revisions and different versions, often of a very complex nature. These arise, for example, from the numerous revisions a poet might make, where the genesis of the work is itself the subject of study. Equally, where there are multiple manuscripts of one work, the collation process itself produces the same kind of network of versions as the natural revision process creates.
This project description is concerned largely with the phenomenon of textual variants. A variant is a piece of content, usually text, in the transcription of a manuscript, which acts as an alternative to another piece of content in the same location. Variants can arise in several ways:
(a) corrections and replacements
(b) open alternatives, where the original version has not been crossed out
(c) from the collation of several manuscripts of one work
(d) conjectures and corrections by scholars.
All of these are meant when the term `variant' is used in this project description.
The idea for this viewer/editor came out of work done over many years on the Vienna Edition of Wittgenstein [1]. Ludwig Wittgenstein was unusual as a famous philosopher in that he published only one small book, the Tractatus Logico-Philosophicus, during his lifetime. When he died in 1951 he left behind 30,000 pages of unfinished manuscripts, typescripts and collections of cuttings. They were full of corrections, and in particular open variants - multiple alternatives of equal validity. Starting in 1975 a group of researchers at Tübingen in Germany set out to transcribe these texts by recording the exact structure of the originals. Their idea of how to encode the variants, since copied or duplicated by others, was to represent them as sets of alternatives, with the additional possibility of nesting or recursion. An example from that era will make their method clearer:
Recursion here is used in the case of `Wir hier beobachten/sehen/ die einfachen und starren Regeln zu vergleichen'. In reality recursion is never actually necessary, although it does save on typing.
However, this representation also introduces much unsightly copying: for example, in the complex `Solche Überlegungen/Überlegungen wie diese/Solche Überlegungen/' the words `Solche' and `Überlegungen' are repeated several times, although in the manuscript they occur only once.
There are two problems with this:
(a) During editing repeated text may be altered in one version only, leading to an inconsistency.
(b) In complex variant sets the repeated text can actually block the attachment of other variants to the correct points within the structure. Unlike the computer-based transcriber, the writer of a manuscript is not constrained by the limitation of recursive structures and will naturally attach alternatives wherever he or she sees fit.
There were many other groups encoding manuscripts and other works at about this time. The Oxford Text Archive, for example, gathered an archive of electronic texts and stored them for redistribution [2]. The obvious need soon arose for a standard to encode these various texts which, in the case of the OTA, were `of any literary type, genre, or period, including transcriptions of manuscripts or other similar materials'. In a joint American/European effort the `Text Encoding Initiative' was formed to lay down guidelines for text encoding in the Humanities [3]. They chose SGML as the basis for their scheme. This was the `Standard Generalised Markup Language' which had been developed by Charles Goldfarb of IBM, originally for computer documentation [4]. Their output was essentially an elaborate `DTD' or `Document Type Definition', which in its first finished form was published as `TEI P3' in 1994 [5]. Since then version P4 has been produced, which changed the encoding standard to xml, a standard that the TEI consortium itself had helped to formulate [6]. Its authors were quite aware that the TEI DTD was too large and cumbersome to be used in its entirety for any one project. It was accordingly divided into sets of element definitions appropriate for each class of texts. By means of parameter entity references a given text could call upon one or more of these modules and even tailor them to its use.
The most popular such customisation of the TEI tag set is called TEI Lite. Yet TEI Lite provides only rudimentary coding for additions and deletions, and does not allow them to be grouped together. It is tempting but impractical to apply this scheme to the coding of manuscripts, since no computer program can ever recover from the bare <del> and <add> elements exactly what is a variant of what.
TEI also offers proper encoding methods for variants, although the encoding model seems more influenced by the structure of the critical edition than by the structure of the manuscripts it is meant to represent [7]. The key element is <app>, which can contain one or more <rdg> elements [8]:
<app>
<rdg wit="El">Experience though noon Auctoritee</rdg>
<rdg wit="La">Experiment thogh noon Auctoritee</rdg>
<rdg wit="Ra2">Eryment though none auctorite</rdg>
</app>
A <rdg> element may contain another <app> element, and this nesting facility makes it very nearly equivalent to the recursive variant sets used by the Tübingen group. However, in the TEI case there is no general requirement for <app> elements to be embedded where the variants actually occur. They can even exist in an external file and are generally attached to the separate base text by one of three methods:
(a) `Location-referenced' method. The `loc' attribute is used to attach the apparatus to a single point in the base text, much like a footnote. Since this method does not spell out what is a variant of what, it is of no use here.
(b) `Double endpoint attachment' connects an entire <app> structure to two points within the base text, which may be represented by the id of a pre-existing element or by a specifically added anchor tag, as in the following example [8]:
<anchor id="a117.1"/> of so parfit
<app from="a117.1" to="a117.3">
<lem wit="Hg">of so parfit wys</lem>
<rdg wit="Ha4">in what wise was</rdg>
<app from="a117.2" to="a117.4">
<lem wit="Hg">wys a wight</lem>
<rdg wit="El Ha4">was a wight</rdg>
</app></l>
The drawbacks with this method are explained by the guidelines thus [9]:
`Because creation and interpretation of double end-point attachment apparatus will be lengthy and difficult it is likely that they will usually be created and examined by scholars only with mechanical assistance.'
To this should be added several more:
(1) It is not possible to have a variant spanning two variants, or spanning one variant and the baseline.
(2) Each variant requires two document-wide unique identifiers. Not only would their uniqueness have to be verified each time the document is loaded but also the references to them in the possibly external apparatus would have to be checked to see that they all match identifiers in the base text. Any deletion in the base text might create dangling references in the apparatus.
(3) The `from' and `to' attributes ought to belong to the <rdg> element, not to <app>.
(4) The use of two separate endpoints is computationally unclean, and can lead to overlapping structures if one of the points lies within an element and the other outside it.
(c) `Parallel segmentation'. This method of attachment requires the <app> element to be embedded directly in the position where the variants occur. The guidelines remark [9]:
`In this method, no two variations can overlap, although they may nest. ... It is ... very easy with this method for an application to extract the full text of any one witness from the apparatus ... (but) it will become less convenient as traditions become more complex and tension develops between the need to segment on the largest variation found and the need to express the finest detail of agreement between witnesses.'
This is just a redescription of the problems noted with recursive variant sets above, to which this method is roughly equivalent.
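The ease of extracting one witness under parallel segmentation can be illustrated with a minimal sketch. It assumes a hypothetical in-memory model (the class `ParallelSegmentation` and its `Segment` type are invented for illustration and are not part of any TEI toolkit), and the way the three readings quoted earlier are cut into segments is likewise an assumption. Nesting is left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a parallel-segmented line (not a TEI API). */
final class ParallelSegmentation {

    /** One segment: either text shared by all witnesses or a set of readings. */
    static final class Segment {
        final String shared;                 // non-null for text common to all witnesses
        final Map<String, String> readings;  // non-null for an <app>-like segment

        private Segment(String shared, Map<String, String> readings) {
            this.shared = shared;
            this.readings = readings;
        }

        static Segment text(String s) {
            return new Segment(s, null);
        }

        /** Arguments alternate: witness, reading, witness, reading, ... */
        static Segment app(String... witnessThenReading) {
            Map<String, String> m = new LinkedHashMap<>();
            for (int i = 0; i + 1 < witnessThenReading.length; i += 2) {
                m.put(witnessThenReading[i], witnessThenReading[i + 1]);
            }
            return new Segment(null, m);
        }
    }

    /** Concatenates shared text with the chosen witness's reading in each segment. */
    static String extract(List<Segment> line, String witness) {
        StringBuilder out = new StringBuilder();
        for (Segment seg : line) {
            if (seg.shared != null) {
                out.append(seg.shared);
            } else {
                // A witness absent from a segment contributes nothing there.
                out.append(seg.readings.getOrDefault(witness, ""));
            }
        }
        return out.toString();
    }

    /** The three readings quoted above, cut into segments (the cuts are an assumption). */
    static List<Segment> sampleLine() {
        List<Segment> line = new ArrayList<>();
        line.add(Segment.app("El", "Experience", "La", "Experiment", "Ra2", "Eryment"));
        line.add(Segment.text(" "));
        line.add(Segment.app("El", "though", "La", "thogh", "Ra2", "though"));
        line.add(Segment.text(" "));
        line.add(Segment.app("El", "noon Auctoritee", "La", "noon Auctoritee",
                             "Ra2", "none auctorite"));
        return line;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"El", "La", "Ra2"}) {
            System.out.println(w + ": " + extract(sampleLine(), w));
        }
    }
}
```

The sketch also shows the tension the guidelines mention: the finer the segments are cut, the more often identical readings such as `though` have to be repeated for witnesses that agree.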
In summary the TEI guidelines offer no perfect method for encoding variants. Method (b) is flawed and unusable. Method (c) is simple and clear but lacks the necessary expressive power.
Existing software that handles this kind of material can be broadly divided into three types:
(a) Collation programs
(b) TEI-based texts displayed in generic xml viewers or editors, then transformed using XSLT
(c) Critical edition typesetting programs
The process used by (a) is simple. The user takes a canonical example text, makes a copy of it and then for each manuscript edits into the copy any differences found. The software then compares the files line by line. The output is a list of line numbers and variants, which can be formatted into xml or other form for printing. Examples of this approach are Collate, URICA! I and II and the Donne Variorum Textual Collation Program [10,11,12]. The flaws with this method are:
(a) there is a lot of redundant copying of text
(b) it is an offline method that doesn't allow direct comparison between readings in real time
(c) it doesn't handle autographic manuscripts, papyri or inscriptions
(d) it makes too strong a distinction between the base text and the apparatus
In addition (though this is not a flaw in the method), all of these programs are now more or less out of date.
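The line-by-line comparison described above can be sketched as follows. This is a minimal illustration only, not the algorithm of Collate, URICA! or the Donne Variorum program; the class name `LineCollation` and the `base ] witness` output format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of line-by-line collation; not the code of any named program. */
final class LineCollation {

    /** Returns one entry per differing line: "<line number>: <base> ] <witness>". */
    static List<String> collate(List<String> base, List<String> witness) {
        List<String> apparatus = new ArrayList<>();
        int n = Math.max(base.size(), witness.size());
        for (int i = 0; i < n; i++) {
            // A line missing from either text is compared as an empty line.
            String b = i < base.size() ? base.get(i) : "";
            String w = i < witness.size() ? witness.get(i) : "";
            if (!b.equals(w)) {
                apparatus.add((i + 1) + ": " + b + " ] " + w);
            }
        }
        return apparatus;
    }

    public static void main(String[] args) {
        // The witness file is an edited copy of the base text.
        List<String> base = Arrays.asList(
            "Experience though noon Auctoritee", "a second line, unchanged");
        List<String> copy = Arrays.asList(
            "Experiment thogh noon Auctoritee", "a second line, unchanged");
        for (String entry : collate(base, copy)) {
            System.out.println(entry);
        }
    }
}
```

Even this small sketch exhibits flaw (a): the witness file repeats every unchanged line of the base text in full.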
Software of class (b) is currently the system of choice for many projects. The most sophisticated solutions, such as XMetal and FrameMaker [13], can hide the xml markup by attaching paragraph and character styles to xml elements, although in practice the markup needs to be turned back on now and again to verify exactly what is there. Variants, however, cannot be handled in any special way, at least in xml [14]. The technical complexity of such solutions and the large amount of markup required make them unattractive for all but the most computer literate.
Critical edition typesetting programs such as the recent creations by Bernt Karasch and Stefan Hagel embed typesetting instructions directly into the source files [19,20]. Not only is this a bad idea if the transcriptions are intended for archiving, it also creates extra work for editors, who get distracted by formatting the text instead of transcribing it. These programs are geared to producing critical editions only and do not use very sophisticated variant encoding techniques. TUSTEP, the most popular of these programs, and still widely used in Germany, is well designed but very old. It still uses, for example, a line-editor [21].
In short, the existing software is either unprofessional, out of date, difficult to use or doesn't handle variants properly. There is a clear and strong need for a fresh approach.
1. It is a first objective of this project that a unified data structure be developed and tested which can store any kind of variant in the same basic format. For it to work in an editor it must possess enough expressive power to cater for heavily corrected autographs or complex manuscript traditions, and moreover work equally well in three distinct representations:
(a) via a visual user interface
(b) as an in-memory editable data representation
(c) in external storage format
The ability to rapidly and efficiently convert between these representations is an essential requirement of the underlying data structure.
2. The second objective is to create a manuscript viewer, for presenting manuscript transcriptions on the World Wide Web. It will display a set of variants as a stack above the line, and support nesting and line-wrap. It will most naturally take the form of a web browser plug-in.
3. The editor takes the viewer one step further. It will allow the user to create and edit such variant complexes and other manuscript structures without seeing on screen any xml tags or other computer code. Through cooperation with other research groups it is hoped to make this user interface as simple and intuitive as possible.
4. Since the advent of Unicode it has become possible to represent on screen all the world's living languages. The editing method supported by the manuscript editor should work equally well for right-to-left scripts (e.g. Arabic) and top-to-bottom scripts (e.g. Chinese).
5. The software should work on most modern operating systems and platforms.
6. The incorporation into the editor of free software tools for the transformation and automatic publication of xml: namely Xalan (XSLT transformation) and FOP (formatting objects processor). These modern tools allow the creation of such a wide range of documents that the manuscript editor could claim to be a general purpose manuscript publishing tool.
7. The incorporation of TeX, to process mathematics as separate embedded content, will lend the manuscript editor the same power that the Wittgenstein Edition software already has in dealing with manuscripts containing complex mathematics.
8. The establishment of a plug-in API for the editor so that concordance-making, spell checking and any other tools developed for the Vienna Edition as well as third party add-ons suggested by cooperating groups, such as cladistic analysis tools, can be added in a controlled fashion.
9. Negotiate alteration of the TEI standard for manuscript encoding so that the necessary structures used by the editor/viewer can be represented in TEI-xml.
Although some progress has been made by the Text Encoding Initiative in laying down a standard for recording the phenomenon of textual variation, there is a sore lack of practical means for entering the markup of these often complex structures. This lack of tools means that experts in literature or philosophy or some other discipline of the Humanities, who are usually not computer experts, are often faced with a complex xml-based system that is difficult to use and which they do not fully understand. What is needed is a means of presenting in a highly readable form on screen and in print a non-linear text - one that has one start but takes many possible paths to its end. An ordinary text editor on a computer represents text in a purely linear form and cannot represent such a network-text. What the human editor or reader of a manuscript needs to see is all the versions at once, displayed in a highly readable form. The main innovative features of this software are:
(a) an editor/viewer for manuscripts that will display variant readings in a stacked structure akin to a musical score.
(b) a clean and efficient data structure for the storage of textual variants that is easy to represent on screen, on paper as well as in external storage, and which is more powerful than any method described in the TEI guidelines.
(c) the incorporation of modern free software publishing tools to convert the generalised xml transcriptions directly into high-quality printed editions, containing all variants, mathematics and graphical elements.
(a) and (b) are wholly innovative. (c) is an important part of the project and is needed to complete the normal edit/view/print cycle. When it was first achieved in 1991 for the Vienna Edition it was also innovative. These objectives, directed towards the preservation and presentation of the written cultural heritage of humanity in general, cover such a wide range of disciplines that they constitute a significant and worthy undertaking.
This project falls under the national research priority `Smart Information Use' as detailed in `Funding Rules for Applicants 2004', where `multimedia, content generation and imaging' are cited as examples.
Some of the aims enumerated above have already been achieved. An efficient data structure for storing textual variants has been devised and will be described below. Also, a means of representing this on screen is presented, with a description of msviewer, a JAVA applet that draws variants based on this structure. These solutions cover the most technically difficult part of the whole project and together they constitute a `proof of concept'.
Figure 2 represents a natural sequence of corrections in A. Variants V1 and V2 overlap on the baseline. Variant V3 overlaps partly with variant V2 and partly with the baseline. B represents a possible on-screen layout of this set of variants using the proposed stacked structure. Note that the solid black lines record the same structure as in A, only drawn using orthogonal lines. This stacked layout would be readable if only the missing portions of the variants, namely b1′, b3′, b4′ and V2a′, could somehow be supplied where needed, without disturbing the overlapping nature of the variant structure. Since these strings should always be exact copies of b1, b3, b4 and V2a they can be drawn easily from the originals. Consider the following user action-sequence in the proposed editor:
In figure 3A, after the user issues the `Make Variant' command, a bracket appears at each end of the selected text and a line opens up above it, while the baseline moves down. Using this simple mechanism the user can enter variants which follow a structure identical to that in figure 2. At some point he or she will need to copy part of an earlier variant to complete the sense. To do this, the user simply selects the desired text and drags it into position. A `clone' of the original text (here shown in grey) appears in the line. This clone cannot be edited, although it can be followed or preceded by editable text. Eventually the user constructs the same structure as in figure 2B. The uneditable strings b1′, b3′, b4′ and V2a′ are not copies but clones of b1, b3, b4 and V2a. They address exactly the same memory as the latter, and if the user edits the originals, the clones will automatically reflect that change. In order to draw these cloned strings the program need only follow a path to the original strings. This means that although what the computer sees is a minimalist, overlapping structure, the user sees a clear and readable stack of variants with no overlap. In other words, textual cloning makes it possible to `have our cake and eat it'. The representation in memory of a cloned string is simply an object containing a pointer to the content of the parent. The representation in xml is also straightforward. Within the <rdg> element two additional elements must be defined. The first is <parent>, which takes a single attribute "id=[tag]", where "[tag]" is an identifier unique within a given <app> structure. The second is <clone/>, which takes the same attribute but, unlike <parent>, is an empty element with no end-tag. The permissible content of a <parent> element should be the same as that of a <rdg> element, so that even nested variant sets and formatted text can be cloned. Our simple example, using this extended TEI syntax, now looks as follows:
<rdg><parent id="a1">brown </parent>fox <parent id="a2">jumps
</parent><parent id="a3">over </parent></rdg>
<rdg>white ferret <clone id="a2"/><clone id="a3"/></rdg>
<rdg><clone id="a1"/><parent id="a4">otter </parent>
<rdg><clone id="a1"/><clone id="a4"/>runs across </rdg>
Indeed the `double endpoint attachment' method for encoding variants in TEI-xml now seems rather complicated and unnecessary: Why have two different ways of encoding variants when one will suffice? If the <parent> and <clone/> elements were permitted within the <rdg> element, the need for `double endpoint attachment' disappears altogether. The computer's task in linking identifiers is now much easier, since the id-attributes are encapsulated within a single <app> structure, and no longer apply document-wide.
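The in-memory side of textual cloning, an object holding a pointer to the content of its parent, can be sketched in a few lines. The class names below (`TextualCloning`, `Parent`, `Clone`, `Fragment`) are hypothetical and chosen only to mirror the <parent> and <clone/> elements; this is a sketch under those assumptions, not the actual msviewer or mseditor code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of textual cloning; the class names are illustrative only. */
final class TextualCloning {

    /** Anything that can appear inside a reading. */
    interface Fragment {
        String text();
    }

    /** Editable text owned by one reading (the role of <parent>, or of plain text). */
    static final class Parent implements Fragment {
        private final StringBuilder content;

        Parent(String s) {
            content = new StringBuilder(s);
        }

        void setText(String s) {
            content.setLength(0);
            content.append(s);
        }

        public String text() {
            return content.toString();
        }
    }

    /** Read-only view of a Parent (the role of <clone/>); it stores no text itself. */
    static final class Clone implements Fragment {
        private final Parent parent;

        Clone(Parent parent) {
            this.parent = parent;
        }

        public String text() {
            return parent.text();
        }
    }

    /** Renders one reading by following each fragment to its content. */
    static String render(List<Fragment> reading) {
        StringBuilder out = new StringBuilder();
        for (Fragment f : reading) {
            out.append(f.text());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Parent a1 = new Parent("brown ");
        Parent a2 = new Parent("jumps ");
        Parent a3 = new Parent("over ");
        Parent a4 = new Parent("otter ");

        List<Fragment> base = Arrays.<Fragment>asList(a1, new Parent("fox "), a2, a3);
        List<Fragment> white = Arrays.<Fragment>asList(
            new Parent("white ferret "), new Clone(a2), new Clone(a3));
        List<Fragment> runs = Arrays.<Fragment>asList(
            new Clone(a1), new Clone(a4), new Parent("runs across "));

        System.out.println(render(white));
        a2.setText("leaps ");              // edit the original once...
        System.out.println(render(base));  // ...and the baseline changes
        System.out.println(render(white)); // ...as does every clone of it
        System.out.println(render(runs));
    }
}
```

Because a `Clone` exposes no setter, the rule that cloned text cannot be edited falls out of the type itself, while the surrounding `Parent` fragments remain editable.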
The best preview of the user interface that can be offered at this point is to look at the existing msviewer applet. This program is accessible in active form at http://www.wittgen-cam.ac.uk/cgi-bin/vareddemo.html if you have a JAVA enabled browser. However, the description provided here is sufficient for the reader to get a good idea of its capabilities. It has the ability to display a text containing variant-sets in either of two main modes: `expanded' or `collapsed'. In expanded mode all variants are shown, numbered from the bottom up. In collapsed mode one version, chosen by the user, is displayed. If the text has been correctly encoded each collapsed version should display a complete, coherent text. Here is a series of screen-dumps of parts of the applet showing the German example from figure 1 in expanded and collapsed modes with various versions picked out:
The applet uses colour to represent cancellations (red) and insertions (blue). Since this does not show in black and white photocopies a dotted underlining has been added to the cancellations and a solid underlining to the insertions. The exact status of these red and blue areas differs slightly inside and outside of variant sets. The reason is as follows: if an author simply cancels something he effectively creates a small variant set, one whose first version is the cancelled text and whose second is nothing. Likewise a simple insertion produces a small variant set whose first version is nothing, and whose second is the inserted text:
Now it would look strange to represent in this way long cancelled or inserted passages such as one frequently finds in autographic manuscripts, although it might be ideologically sound to do so. Instead, in expanded mode, msviewer collapses these mini variant sets and displays the text in red or blue. In collapsed mode it merely decides whether or not to display the cancellation or insertion at all. If it is displaying version 2 or more you will see insertions but no cancellations, whereas in version 1 there are no insertions. The situation within a variant set in expanded mode, however, is quite different. Here cancellations and insertions have already taken part in building the actual variant set. The various versions have been teased out, and a cancellation or insertion there can't be shown any more as a mini variant-set. Outside a variant set a cancellation or insertion represents a real structure, but inside it is just a marking. However, since the two cases look the same on screen the reader will probably not notice any difference. It is an inconsistency in theory only.
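The collapsed-mode rule for cancellations and insertions outside a variant set can be stated as a small sketch. The names `CollapsedMarks`, `Span` and `Kind` are hypothetical, and the sample words are borrowed from the fox example purely for illustration; msviewer's real implementation may differ.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the collapsed-mode rule; not msviewer's actual code. */
final class CollapsedMarks {

    enum Kind { PLAIN, CANCELLED, INSERTED }

    /** A run of text outside any variant set, together with its marking. */
    static final class Span {
        final Kind kind;
        final String text;

        Span(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** A cancellation belongs to version 1 only; an insertion to version 2 and later. */
    static boolean visible(Kind kind, int version) {
        switch (kind) {
            case CANCELLED:
                return version == 1;
            case INSERTED:
                return version >= 2;
            default:
                return true;
        }
    }

    /** Builds the text shown in collapsed mode for the chosen version (numbered from 1). */
    static String collapse(List<Span> spans, int version) {
        StringBuilder out = new StringBuilder();
        for (Span s : spans) {
            if (visible(s.kind, version)) {
                out.append(s.text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Span> spans = Arrays.asList(
            new Span(Kind.PLAIN, "brown "),
            new Span(Kind.CANCELLED, "fox "),
            new Span(Kind.INSERTED, "otter "),
            new Span(Kind.PLAIN, "jumps"));
        System.out.println(collapse(spans, 1));
        System.out.println(collapse(spans, 2));
    }
}
```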
It should also be noted how easy it is to read text displayed in this form, in spite of the underlying complexity. The reader is free to associate any variant from one set with another variant on a different line in another set, just as he or she would in the original manuscript. As a further simplification the user will eventually be able to collapse individual variant-sets within the viewer. This is the first time anyone has (to my knowledge) represented recursive variant sets on-screen with line wrap. The line-wrapping algorithm in particular was difficult to discover, and a short description of it should convince the reader that the method employed is both elegant and robust.
Consider an ordinary text editor. Tokens are fitted onto a line until one is received that is too long for the remaining space. The program need only move down one line and to the left margin to place the current token. To draw several lines simultaneously no real change is required to this simple algorithm. However, when drawing a line, if the program encounters a variant set it takes these additional steps:
(a) Suspend drawing of the current line, called `the parent'.
(b) Draw a left bracket as high as the number of lines in the variant set.
(c) Create as many children of the parent as there are variants in the set.
These child-lines will now draw themselves as before until they:
(1) encounter an embedded variant set. Take steps (a), (b), and (c) above.
(2) run out of room. Go back to the left margin and move down by the height of the current `big' line. Tell the parent about this, because when even one child breaks the parent must follow.
(3) run out of text. Send a message to the parent to say that one of its children has finished and how far across the page it got. After all children have done this, the parent draws the closing bracket at the x-position that was reported as farthest right, and resumes.
Two passes are required to draw the page: one to measure the heights of each `big' line and one to draw the text. The viewer can also print line-numbers if desired or break on carriage-returns in the source, as it must do when representing poetry. At the moment the text source is defined using a simplified bracketing syntax, and although not xml, it does support textual cloning.
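The measuring pass can be sketched for the simpler case in which no line wrap occurs. The sketch below is an assumption-laden illustration, not the msviewer algorithm: it measures widths in characters as if the font were monospaced, counts one character for each bracket, and uses the hypothetical names `StackMeasure`, `Token`, `Line` and `VariantSet`. It shows two of the rules described above: a variant set is as high as the sum of its child-lines, and the closing bracket is placed after the child that reached farthest right.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical measuring pass for stacked variants, ignoring line wrap. */
final class StackMeasure {

    /** Something that occupies space on a line. */
    interface Item {
        int width();  // in characters (monospaced font assumed)
        int rows();   // number of stacked text rows needed
    }

    static final class Token implements Item {
        final String text;

        Token(String text) {
            this.text = text;
        }

        public int width() {
            return text.length();
        }

        public int rows() {
            return 1;
        }
    }

    /** A parent or child line: a sequence of tokens and variant sets. */
    static final class Line {
        final List<Item> items;

        Line(Item... items) {
            this.items = Arrays.asList(items);
        }

        int width() {
            int w = 0;
            for (Item i : items) w += i.width();
            return w;
        }

        /** A line is as high as its tallest item. */
        int rows() {
            int r = 1;
            for (Item i : items) r = Math.max(r, i.rows());
            return r;
        }
    }

    static final class VariantSet implements Item {
        final List<Line> children;

        VariantSet(Line... children) {
            this.children = Arrays.asList(children);
        }

        /** Opening bracket, the child that got farthest right, closing bracket. */
        public int width() {
            int farthest = 0;
            for (Line c : children) farthest = Math.max(farthest, c.width());
            return 1 + farthest + 1;
        }

        /** The bracket is as high as all child-lines stacked on top of each other. */
        public int rows() {
            int r = 0;
            for (Line c : children) r += c.rows();
            return r;
        }
    }

    /** A parent line holding a variant set whose second child has a nested set. */
    static Line sample() {
        VariantSet inner = new VariantSet(
            new Line(new Token("ferret")), new Line(new Token("otter")));
        VariantSet outer = new VariantSet(
            new Line(new Token("brown fox")), new Line(new Token("white "), inner));
        return new Line(new Token("The "), outer, new Token(" jumps"));
    }

    public static void main(String[] args) {
        Line line = sample();
        System.out.println(line.width() + " columns, " + line.rows() + " rows");
    }
}
```

Adding line wrap would require each child to report a break to its parent, as in step (2), which is why the full algorithm needs a separate measuring pass before drawing.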
The mseditor program will be composed of a number of existing free JAVA software modules, new ones to be written in JAVA, and other code to be adapted from existing Wittgenstein Edition software. The basic components are:
JAVATEX. This is a mechanical translation of Donald Knuth's original TEX program, made by Timothy Murphy. The existing Wittgenstein Edition software already converts the TEX output, DVI, to MIF (a textual input format for FrameMaker), and it is easy to change this so that it produces SVG instead. This is the format that FOP accepts most readily as an `instream foreign object' for conversion into PDF along with the rest of the document, and so provides a neat way of incorporating TEX formulae into the document.
FOP (Formatting Objects Processor), from the Apache Software Foundation, takes as input XSL formatting objects and formats them, typically into PDF. However, PDF is not very editable, and our experience has shown that small adjustments to the files are necessary before submitting them to the printer. Therefore, a more editable output format such as MIF should be used if possible. This can then be converted to PDF.
Xalan JAVA 2 from the Apache Software Foundation is an XSLT processor for transforming xml documents into, in this case, XSL formatting objects. It requires an xml parser, such as Xerces JAVA 2, also produced by Apache. The function of Xalan is to rearrange and generate new content from the original xml file. For each editorial project an XSL stylesheet needs to be defined, containing both formatting (XSL formatting objects) and transformation (XSLT) instructions.
The mseditor core program, although it will have to be written largely from scratch, will be able to base itself on the JAVA Class Library 1.4, which supports, among other things, ready-made control and window objects to simplify the time-consuming task of designing the user interface.
The existing MIF back-end to the Wittgenstein Edition software could be adapted to translate XSL formatting objects if FOP itself fails to supply this as a practical solution.
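The transformation step performed by Xalan can be sketched through the standard `javax.xml.transform` interface, which Xalan implements; if Xalan is not on the classpath the processor bundled with the JAVA runtime is used instead. The class `WitnessStylesheet`, the stylesheet and its `wit` parameter are assumptions made for illustration. For brevity the stylesheet emits plain text, not XSL formatting objects, and it matches only <rdg> elements whose `wit` attribute names a single witness.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: selecting one witness's reading with an XSLT stylesheet. */
final class WitnessStylesheet {

    static final String XML =
        "<app>"
        + "<rdg wit=\"El\">Experience though noon Auctoritee</rdg>"
        + "<rdg wit=\"La\">Experiment thogh noon Auctoritee</rdg>"
        + "</app>";

    // Illustrative stylesheet: outputs the text of the <rdg> for the witness in $wit.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:param name=\"wit\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"//rdg[@wit=$wit]\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the sample document for the given witness sigil. */
    static String readingOf(String witness) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("wit", witness);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readingOf("La"));
    }
}
```

In the proposed pipeline the stylesheet would instead produce XSL formatting objects, which FOP then formats for printing.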
The following graphic shows how these modules will work together. It also shows how much of the design already exists (shaded areas), how much can be adapted from existing code (grey-hatched areas) and how much has to be written afresh (white). The speckled areas represent file formats.
Some people might think that JAVA is unsuitable for a project like this. Admittedly the language is not optimised for the creation of an editor. The alternative, however, seems worse: having to write platform-dependent modules for each platform and then having to tease apart the platform-dependent and platform-independent code every time a new feature is added. This will be done only if it is unavoidable. There is much that can be done if JAVA proves too slow: the use of character arrays, for example, the inclusion of C code at critical points, optimisation of the drawing algorithm and so on. Since JAVA and C++ share so much syntactically it would be fairly easy to convert one into the other if the need arose.
This is an opportunity for Australia to set a worldwide standard for the representation and editing of manuscripts online in libraries and in research centres across the world. Its chances of being widely adopted are enhanced by a number of factors: (a) the total absence of any user-friendly tools for the viewing and especially editing of manuscripts (b) the success of the Vienna Edition and the interest shown so far in our software design (c) the fact that it would be developed as a free software project. In this capacity it would very probably attract international contributors, particularly from among the projects desiring collaboration. This would help to rapidly refine the tools along useful lines.
The social benefits are firstly that our own literary cultural heritage could be stored more accurately than ever before and could be displayed online, in print or electronically in a highly readable form. Secondly it is likely to be a very good advertisement for Australian technology within the Arts, and increase our international profile.
Firstly, it is intended to publish the findings of this project description as a paper in Computers and the Humanities, a journal closely connected with the TEI project in the US. Secondly, it will be a project priority to establish a website, based at the host institution, to permit the downloading of versions of the software and to provide a forum for discussion, documentation of the software design and submission forms for people who, having read the material, wish to participate in some way. Submitting these pages to the major search engines should also make it possible for people to find us. Thirdly, presentations will be prepared for seminars and conferences in Australia and overseas. The object of these will initially be to arouse interest, later to win collaboration and promote the software to interested parties. Fourthly, it is intended to join the TEI consortium in order to secure the emendation of the TEI standard to include the extensions presented in the paper mentioned above. Finally, the heads of other research projects will be cold-emailed to canvass for possible partners, by offering free cooperation and the setting up on their websites of examples of the manuscript viewer in action.
Dr Desmond Schmidt. As Chief Investigator his role lies in coordinating the development of the proposed software tools, as well as doing a large part of the work himself. He will also be responsible for developing presentations of the work, designing and building the project's website, writing and publishing papers or any other material for the promotion of the project. He will also be occupied in attending relevant seminars and conferences in Australia and overseas, as well as initiating cooperations with interested academic institutions to help refine the proposed software tools.
As Partner Investigator Dr Nedo has been involved for many years in this work, since 1975 in Tübingen, where he lectured, then from 1981 to 1993 in Trinity College Cambridge, and then at the Cambridge Wittgenstein Archive. He has been and will be involved in promoting this research throughout Europe, using his extensive network of contacts and the promotional character of his Archive's website as well as the publication of the Vienna Edition of Wittgenstein, which will use this software immediately. He will also, subject to the granting of the UNESCO funding, contribute considerable resources in the form of one full time computer programmer who will work in Cambridge to help develop the software, and communicate regularly with the Australian part of the operation.
[1] Ludwig Wittgenstein, Wiener Ausgabe Volumes 1-8a, Concordance, Register and Volume 10 Springer, Wien New York (1994-2001).
[2] N. Ide and J. Veronis (eds.), The Text Encoding Initiative: Background and Contexts. Kluwer, 1996, which reprints a special triple issue of Computers and the Humanities, 29:1, 1995, 1-3.
[4] Goldfarb, Charles, The SGML Handbook. Oxford 1994.
[5] Sperberg-McQueen, C.M. and Burnard, L. (eds.) (2002). TEI P4: Guidelines for Electronic Text Encoding and Interchange. Text Encoding Initiative Consortium. xml Version: Oxford, Providence, Charlottesville, Bergen.
[8] Sperberg-McQueen, C.M. and Burnard, L., TEI P4, section 19.2.2.
[11] Holton, M.L., The URICA! II Interactive Collation System, Computers and the Humanities 26, 1992, 139-144.
[12] Stringer, G.A. and Vilberg, W.R., The Donne Variorum Textual Collation Program, Computers and the Humanities, 21 (1987) 83-89.
[13] Corel XMetal 4.0. Adobe FrameMaker 7.0 + xml.
[14] Smith, D. Textual Variation and Version Control in the TEI, Computers and the Humanities 33 (1999) 103-112.
Renear, Allen, David Durand, and Elli Mylonas. "Refining our notion of what text really is: The problem of overlapping hierarchies". Research in Humanities Computing. Oxford: Oxford University Press, 1995.
C. M. Sperberg-McQueen, Claus Huitfeldt, "Concurrent Document Hierarchies in MECS and SGML" (Internet)
G Rockwell, J Bradley, and P Monger, "Seeing the text through the trees: visualization and interactivity in text applications", Literary and Linguistic Computing, Volume 14, Issue 1, pp. 115-130.