A Graphical Editor for Manuscripts

1. The Problem

The representation of textual variation as it occurs within manuscripts is a difficult problem that has not yet been satisfactorily tackled. By textual variation is meant any feature that makes a piece of text an alternative to another. In the case of autographic manuscripts this covers, among others, such phenomena as additions, deletions, open alternatives and corrections by a human editor. Textual variation may also arise from the merging of multiple manuscripts produced by the copying of a lost original. In this context conjectures, variant spellings, transpositions, even expansions and abbreviations might also qualify as variants. The description ‘not satisfactorily tackled’ refers to the inadequacies of current philological software that deals with variants (Hagel 2004, Robinson 1994, Ott 1990, TEI 2001). For simplicity, these will be referred to en bloc, even though the problems listed below do not apply to all of them. The main deficiencies that persist are:

  1. Those methods that record all variants in one document copy text between variants.1 This creates an inconsistency problem: the editor has to maintain the sameness of those copies – quite a difficult task if the text consists of mark-up. In addition, the more variants there are, the more those copies interfere with the attachment of new variants.
  2. The collation method requires a copy of the entire text of each witness and compares the separate versions to generate an apparatus. Quite apart from the massive redundancy of information this entails, the worst consequence is the inflexibility of the method. It doesn’t work for single manuscripts and it doesn’t put all the information in one place, so that it can be easily searched or transformed.
  3. There is no separation of structures for the underlying manuscript text from those used to describe variants. Thus the two frequently overlap, a problem which has its origin in the choice of hierarchical content model (Renear, Mylonas and Durand 1993; Fiormonte 2003, p.185). For example, in an autographic manuscript a simple correction joining two previously separate paragraphs necessitates the duplication of entire paragraphs within the structure recording the change.2
  4. There is no regular mechanism for representing transpositions. These are common features of autographic manuscripts, but are also frequent in ancient works of the multiple-manuscript type.3
  5. Existing solutions concentrate on the problem of variants in either the multiple or the single manuscript case, and do not formulate a solution that works equally well for both. The TEI Guidelines, although they contain structures purportedly suitable for both types of manuscript, are generally admitted to be inadequate for the encoding of single manuscripts.4 Given the natural similarities between the two cases, one solution that will work for both should be possible.
  6. The future of editorial interaction with the text is not tags. A human being can only understand mark-up to a certain level of complexity. That limit is readily exceeded in practical applications, for example, when the mark-up of more than a few manuscripts is merged into one document. It is true that XML tags can be partly hidden in generic browsers, which convert them into formatting. However, they cannot represent variants in any meaningful way. The use of mark-up by the end-user dates to the 70s and 80s when ‘dot-commands’ were familiar features of word-processors such as WORDSTAR and TROFF (Goldfarb 1999). When XML was designed, features like ‘code-completion’, present in its predecessor SGML, which supported direct user-input, were removed. Today it is chiefly used to exchange information between computers of possibly different architectures, where the marked-up text is both generated and interpreted by a machine. It is, therefore, clear that an approach which exposes the end-user directly to mark-up, even via a generic browser, cannot effectively represent manuscript variants of the required complexity, nor would it appear to have any future in the modern world of graphical user interfaces.

2. The Text as a Network

Every transcription of a manuscript is an act of interpretation (TEI 2001 §2.0); even a bare textual copy of the original is an interpretation because of what it chooses to omit. Nevertheless it is still the goal of this paper to derive an encoding of manuscripts from the ‘natural’ features of the graphical entity that is a manuscript. The approach here is to proceed from the single to the multiple manuscript case because whatever its provenance, every manuscript exhibits the features that result from the expression of ideas through writing. The multiple manuscript case merely adds another dimension to that problem.

Here is a moderately complex example from the philosopher Ludwig Wittgenstein to kick things off:

Fig. 1

This example is poignant because it contains open alternatives, often transferred into a subsequent typescript, which make matters rather difficult for those who believe that the ‘author’s final intention’ must be followed when preparing an edition of his work (Greetham 1994, p.341). In fact the late Elizabeth Anscombe, Wittgenstein’s former pupil, and one of the trustees of his literary estate, tells the story of how she asked Wittgenstein on his death-bed about what they should do to resolve the open variants. Wittgenstein replied that if he couldn’t decide between readings when he was at the height of his powers, how was he supposed to do so now that he was dying and under the influence of powerful drugs? (Nedo 1993, p.75). Even if the reader in general rejects the new philological vision of the text as a process (McGann 1983, pp.75, 122; Fiormonte 2003, pp.185ff.) the plurality of the text in this case is undeniable. And yet the existence of the network is not dependent on the openness of the variants. Even if an earlier variant was cancelled by the author or scribe, the transcription is still supposed to record what is in the manuscript. According to this interpretation a deletion becomes simply a marking of one branch of the network. The reader may well object: ‘why make a simple problem complex? In many manuscripts such open variants do not occur. Something written above the line always replaces what lies below. So in such cases why not just record them as additions and deletions?’5 The answer is: ‘why interpret a text in any particular way? – because it is useful to do so. Without the network it is impossible to tell what is a variant of what. Mere proximity of additions and deletions does not prove they are related; indeed they may be related even if they are separated by other words.’ With a network it is possible to label paths, to say ‘this is the path of the author’s second corrected version’ or ‘this is the path of manuscripts M, P and G’. For the network allows the user to handle the multiple manuscript case as well. It does away with the notion of a privileged copy or base text (a relic of the old paper way of doing things), and allows a new base text to be chosen at any time. It also has the potential to permit the user to see and to edit each version separately, in one document. And that is an interpretation worth making.
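The idea of labelling paths through the network can be illustrated with a minimal sketch. It assumes a hypothetical layout in which every arc of the network carries a fragment of text together with the set of version labels that pass along it. The sigla M, P and G are taken from the sentence above; the sample words, the name `ARCS` and the function `read_version` are illustrative only and are not part of any existing program.

```python
# Each arc is (from_node, to_node, text, labels). This layout is an assumption
# made for the sketch, not a description of an existing file format.
ARCS = [
    (0, 1, "but", {"M", "P", "G"}),
    (1, 2, "he left unmentioned", {"M", "P"}),
    (1, 2, "he didn't mention", {"G"}),
    (2, 3, "the pawns", {"M", "P", "G"}),
]


def read_version(arcs, version, start=0):
    """Follow the arcs labelled with `version` from `start` and join their text."""
    words = []
    node = start
    while True:
        # Arcs leaving the current node that belong to the requested version.
        candidates = [a for a in arcs if a[0] == node and version in a[3]]
        if not candidates:
            break
        _, node, text, _ = candidates[0]
        words.append(text)
    return " ".join(words)


print(read_version(ARCS, "M"))
print(read_version(ARCS, "G"))
```

Because no arc is privileged, any label can serve as the base text simply by asking for a different path.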

3. Representing the Network

To achieve this goal obviously requires some means of representing text as a network that is robust enough to survive the addition of a great many variants, however they arise. By ‘representation’ is meant not only textual representation within a file, but representation on screen or on paper, as well as representation within the computer’s memory as the structure is being edited. An obvious way forward would seem to be to define those parts containing variants as simple alternatives. Part of the example above (translated for clarity) could be written out, using slashes to delimit alternatives, and square brackets to delimit their scope:

... but [his description forgot the pawns and their moves/in his description [he left unmentioned/he didn’t mention] the pawns]/but [he left unmentioned/he didn’t mention] the pawns and their function in the game]] ...

This has already been tried by numerous research groups, including the TEI, in whose guidelines it is called ‘Parallel Segmentation’ (TEI 2001 §19.2.3). And yet a problem is immediately apparent even in this simple example – the variants ‘[he left unmentioned/he didn’t mention]’, which occur only once in the original manuscript, have to be copied into the variant ‘but the pawns and their function in the game’ from the preceding variant ‘in his description [he left unmentioned/he didn’t mention] the pawns’. This is the copying problem mentioned in point 1 of section 1. It can be reduced by allowing sets of alternatives to nest, that is, to allow sub-variants within a single variant, but it does not eliminate the problem. The reason why it does not is quite simple: the structure of a text containing variants, whether arising in a unique manuscript, or from the merging of multiple manuscripts, is a different kind of network from that described by simple or nested variant-sets. The nested variant-set model is a hierarchical or recursive structure, whereas the ‘natural’ variant network is an overlapping one. This overlapping problem has nothing whatsoever to do with the problem of overlapping hierarchies.6 An overlapping hierarchy arises when two or more structural conceptions of the same text exist side by side, or within one another, but distinct, e.g. linguistic (subject, verb, object) and layout (pages, paragraphs, lines). The question here is rather of overlap within the variants of a single, indecomposable network.
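The nested variant-set notation can be made concrete with a small sketch of a recursive parser. It assumes only the conventions stated above (slashes separate alternatives, square brackets delimit their scope); the function names and the shortened sample string are illustrative, and the sample is deliberately a well-nested fragment rather than the full example.

```python
def parse_sequence(s, i=0):
    """Parse text up to the next unmatched '/' or ']'.

    Returns (items, index), where each item is either a string or a
    variant-set, i.e. a list of alternative sequences.
    """
    items, buf = [], []
    while i < len(s) and s[i] not in "/]":
        if s[i] == "[":
            if buf:
                items.append("".join(buf))
                buf = []
            alternatives, i = parse_set(s, i + 1)
            items.append(alternatives)
        else:
            buf.append(s[i])
            i += 1
    if buf:
        items.append("".join(buf))
    return items, i


def parse_set(s, i):
    """Parse the alternatives after '[' up to the matching ']'."""
    alternatives = []
    while True:
        seq, i = parse_sequence(s, i)
        alternatives.append(seq)
        if i >= len(s):
            raise ValueError("unclosed variant-set")
        if s[i] == "]":
            return alternatives, i + 1
        i += 1  # skip the '/' separating two alternatives


def readings(items):
    """Expand a parsed sequence into every complete reading it encodes."""
    results = [""]
    for item in items:
        if isinstance(item, str):
            results = [r + item for r in results]
        else:
            options = [r for alt in item for r in readings(alt)]
            results = [r + o for r in results for o in options]
    return results


SAMPLE = ("but [his description forgot the pawns/"
          "in his description [he left unmentioned/he didn't mention] the pawns]")
tree, _ = parse_sequence(SAMPLE)
for reading in readings(tree):
    print(reading)
```

The structure is strictly recursive: a sub-variant lives inside exactly one alternative. A further alternative that needed the same pair ‘he left unmentioned/he didn’t mention’ would have to spell it out again, which is the copying problem described above.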

Fig. 2

But can an overlapping network of variants be represented in a computer without copying, and is this structure even computable? In 1936 Alan Turing described a logical ‘Computing Machine,’ which could carry out a series of well-defined and purely mechanical steps to achieve the effective calculation of a given formula (Turing 1936). The set of problems that could be solved by his hypothetical machine is exactly the same set of problems that can be computed by the most powerful computer today. Alonzo Church (1936) approached the same problem from a different perspective. He said ‘A function of positive integers is effectively calculable only if recursive’ – a formulation later shown to be equivalent to Turing’s machine. The foundations of modern computing languages follow the same pattern: Kleene’s regular expressions (Kleene 1967), Chomsky’s rule-based grammar (Chomsky 1965) and the Backus Naur Form for specifying computing languages (Naur 1960) all use recursion as the basis for defining a computable structure, based on these fundamental ideas of Church and Turing. Since the overlapping variants in the network described above are not recursive/hierarchical it seems impossible to define a computer language based on that structure.

This should not be taken to mean that the ‘natural’ overlapping network could not be represented in a computer at all. Algorithms for performing operations on ‘graphs’ (that is, networks) of various kinds have long been studied in computer science. But it is hard to see how their methods for storing a network, for example, by listing connections between vertices, could be adapted to the intuitive editing and display of textual variation. In addition, many of the algorithms that operate on graphs perform so badly that even small problems cannot be solved in a reasonable time (Sedgewick 1998, pp.415ff).

This all comes as a bit of a blow. At the start of this section it was established that a data structure to store the variant-network needed to survive the transformations entailed by saving and loading the file and by drawing it on the screen. Now it turns out that this is only possible using the recursive or ‘parallel segmentation’ form of variant network, which includes copying. The problem of copying was twofold: the text of the copied variants had to be kept in sync and the copies interfered with the attachment of new variants. Although it is not possible to get around the copying, it is possible to get around these two problems.

Consider the following complex of overlapping variants where the text, for simplicity, is shown as lines:

Fig. 3

In ‘A’ variants V1, V2 and V3 represent different kinds of overlap: V2 overlaps with the text covered by V1, but both are variants of the base text. V3 overlaps partly with V2 and partly with the base text. In ‘B’ the same structure has been redrawn using orthogonal lines. In order to make the variants readable, parts of the base text and part of V2 need to be copied into the other variants. If this copied text is defined only once and merely referred to in places where it needs to be copied, the original overlapping structure is preserved. The terms for the original and its ineditable copy suggested here are ‘parent’ and ‘clone’. To display the contents of a clone the computer need only follow the path to the clone’s parent. If the parent text changes, the text of the clone changes also. The reader should satisfy himself or herself that the paths indicated by arrows in ‘B’ are the same as the paths in ‘A’, only drawn using orthogonal lines. The structure is still recursive because each parent or clone is a distinct entity, embedded in just one point of the structure. In this way it is possible to simulate the natural overlapping structure of variants, and render it computable.
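The relationship between parent and clone can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the editor’s actual implementation: the class names `Parent` and `Clone`, the helper `render` and the sample text are all hypothetical. The one property taken from the description above is that a clone stores no text of its own and cannot be edited.

```python
class Parent:
    """Editable text that is defined exactly once in the structure."""

    def __init__(self, text):
        self.text = text


class Clone:
    """Read-only stand-in for a Parent; it holds a reference, not a copy."""

    def __init__(self, parent):
        self._parent = parent

    @property
    def text(self):
        # Displaying a clone means following the path back to its parent.
        return self._parent.text


def render(variant):
    """Join the text of the parts (parents or clones) that make up a variant."""
    return "".join(part.text for part in variant)


shared = Parent("the pawns")
v1 = [Parent("he left unmentioned "), shared]
v2 = [Parent("he didn't mention "), Clone(shared)]

shared.text = "the pawns and their moves"  # edit the parent once
print(render(v1))
print(render(v2))  # the clone reflects the change without any copying
```

Since `text` on a clone is a property without a setter, an attempt to assign to it raises `AttributeError`, which is one simple way of keeping clones ineditable.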

But how would cloning work in practice? Would the user have to deliberately clone the necessary text? If this required discipline on the part of the user, inadvertent copies would surely creep in over time, particularly if the transcriber was inexperienced. The user should only enter the differences between one variant and the next; all the rest should be cloned automatically. Consider the following user action sequence in the proposed editor, which builds the same structure as in Fig. 3:

Fig. 4

The computer’s task here is simple: if the selection starts or ends in the middle of a variant, it creates a clone of the rest of the variant and extends the variant-set to accommodate it. However, what if the selection includes part or all of an existing clone? Although clones are not editable they are selectable. If only part of a clone is selected when a new variant is created then the program can simply split the parent, and all the clones dependent on it, into two parts.
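The splitting step can be sketched as follows. The sketch assumes, hypothetically, that each parent keeps a list of the clones that depend on it; the names `Parent`, `Clone` and `split_parent` are illustrative, and the job of substituting the new pieces into the surrounding variant-sets is left to the caller.

```python
class Parent:
    def __init__(self, text):
        self.text = text
        self.clones = []  # clones that depend on this parent


class Clone:
    def __init__(self, parent):
        self.parent = parent
        parent.clones.append(self)

    @property
    def text(self):
        return self.parent.text


def split_parent(parent, offset):
    """Split `parent` at `offset` and split every dependent clone with it.

    Returns (left, right, pairs), where `pairs` lists each old clone together
    with the two new clones that should replace it in the structure.
    """
    left = Parent(parent.text[:offset])
    right = Parent(parent.text[offset:])
    pairs = [(old, (Clone(left), Clone(right))) for old in parent.clones]
    return left, right, pairs


p = Parent("the pawns and their moves")
c = Clone(p)
left, right, pairs = split_parent(p, 9)
print(repr(left.text), repr(right.text))
```

After the split each half is again a distinct entity embedded at one point of the structure, so a new variant can be attached to either half alone.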

One thorny problem, however, remains before cloning and variant-sets can become part of a real world editing solution: what if an expanded variant-set, containing possibly other nested and expanded variant-sets, has to break over a line?

4. The Demonstration Applet

In order to prove the viability of the concept, something that could actually draw nested variant-sets in their expanded form and successfully break them over multiple lines had to be built. It also had to be fast, since in the finished editor the user would be deleting or adding characters at high speed, and the line-breaks would have to be constantly reassessed. This was the crux of the problem. Everything else in the editor’s design was standard computing fare: selection, scrolling, stylesheets, mathematics etc. The design was simple: create an applet, publish it online, and provide it with a box where the user could enter his or her own text for testing, or change an example text, so that they could verify for themselves that it works. The applet has been completed and placed online at http://www.wittgen-cam.ac.uk/cgi-bin/vareddemo.html. It has a number of configuration options, most of which are self-explanatory, but the main one is the ability to display variants in ‘collapsed’ or ‘expanded’ mode:

Fig. 5 (see note 7)

The user can edit the text of the example and re-run the applet, or choose another example text in several languages. Clones are shown in grey and fully supported by the toy language, which the applet uses for entering text. The collapsed views labelled ‘Version 1’ etc. represent, in the examples provided, the ‘anonymous’ versions of the manuscript – in other words those that aren’t assigned to a named version, as they would be in a multiple manuscript scenario. The problem with autographic manuscripts is that it is frequently difficult to attribute variants to a particular revision of the manuscript. If a variant has no connection in grammar or sense with another variant or correction later in the line, it is impossible to tell which preceded the other. A sequence can only be established in cases where a correction is itself corrected, or if a variant is added on top of another. In such manuscripts it is only in expanded mode that the reader is wholly free to make associations between unconnected variants; in collapsed mode only some of the possible readings of the text can be shown. In the multiple manuscript case the collapsed versions are more functional, and display the text of individual witnesses.

5. Characteristics of MSEditor

Having completed the experiments, it was time to create a practical tool, which for the moment may be called MSEditor. Although it is a work in progress, the design and core of the program, which is well advanced, shall be described. The purpose of the program is to add editing and interactive capabilities to the limited facilities of the applet, and to allow the program to load or save in its own format, or to import from or export to other types of file. The loading or saving process is the same for all files, although a different module, or ‘filter’, is required for each format. So far MSEditor supports our Wittgenstein Edition format as input, as well as its own native XML format for loading and saving. The use of XML may surprise the reader given the statements of point 6 in section 1 above, but there is no contradiction. XML is a standard file format that expresses document structure in a textual form that is archivable, transportable and transformable, but that does not mean that the end user need ever see it.
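The arrangement of one ‘filter’ per format behind a common loading and saving process might look like the following sketch. Everything in it is hypothetical: the `Filter` interface, the `FILTERS` registry and the toy `LineFilter` format are invented for illustration, and neither the Wittgenstein Edition format nor the native XML format is modelled here.

```python
from abc import ABC, abstractmethod


class Filter(ABC):
    """One module per file format; every filter offers the same two operations."""

    @abstractmethod
    def load(self, data: str) -> list:
        """Turn file contents into a list of document components."""

    @abstractmethod
    def save(self, document: list) -> str:
        """Turn a list of document components back into file contents."""


class LineFilter(Filter):
    """Toy format used only in this sketch: one text component per line."""

    def load(self, data):
        return data.split("\n")

    def save(self, document):
        return "\n".join(document)


# Registry keyed by format name; further filters would be registered here.
FILTERS = {"lines": LineFilter()}


def load_document(data, fmt):
    """The loading process is identical for all formats; only the filter differs."""
    return FILTERS[fmt].load(data)


def save_document(document, fmt):
    return FILTERS[fmt].save(document)
```

With this shape, supporting a new import or export format means adding one filter to the registry, without touching the rest of the program.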

Document Structure

Once loaded, an MSEditor document has a structure, containing a number of components (see note 8) arranged into a simple hierarchy. Every experienced software engineer knows that power comes from simplicity, and this has been a guiding principle in the design. At the top level of the document the following may occur:

     insertions, cancellations, transpositions, variant-sets, text

An MSEditor document consists of the natural structures of the variant network and ‘text’. In point 3 of section 1 above it was argued that there had to be a separation between the document contents and the variant structure. Rather than express this by using an overlapping hierarchy it seemed more practical simply to remove any unnecessary structures in the text, which might overlap with the variant network. Each of these components will now be described so that the reader knows what the capabilities of MSEditor will be, and what is meant by these terms.

Insertions are bits of text added above the line or in the margin, with or without an insertion marker.

Cancellations are bits of text crossed out or otherwise deleted.

Transpositions are bits of text that are relocated by the scribe or author to a new position, or otherwise rearranged. Since the actual text can occur only once, a source and a destination must be specified. A transposition also has a set of versions, which controls where the text appears.

Variant-sets group together open variants, cancellations, insertions and portions of base text in cases where they form clear alternatives to one another. In works based on multiple manuscripts they group together variant readings from different witnesses; they may also record conjectures or corrections by the editor. The user may also denote one variant within a set as the preferred reading, to enable the program to display not only the text of each individual witness, but a selective version constructed by the editor.

Text is everything else with which the structures of the document can overlap. It may also contain other stuff such as formulae, pictures, notes, variables, line-breaks and paragraph-breaks, which will be defined below.
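How a variant-set might yield either the text of one witness or the editor’s selective version can be shown with a small sketch. The classes `Variant` and `VariantSet`, the `render` function and the sample document are assumptions made for illustration; only the behaviour (one variant chosen per set, by witness or by preferred reading) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    text: str
    versions: frozenset  # sigla of the witnesses that carry this reading
    preferred: bool = False  # marked by the editor as the preferred reading


@dataclass
class VariantSet:
    variants: list


def render(document, version=None):
    """Join plain text with one variant from each variant-set.

    With a siglum, the variant carried by that witness is shown; with
    version=None, the editor's preferred reading is shown instead.
    """
    out = []
    for item in document:
        if isinstance(item, str):
            out.append(item)
            continue
        for v in item.variants:
            if (version is None and v.preferred) or version in v.versions:
                out.append(v.text)
                break
    return "".join(out)


DOC = [
    "but ",
    VariantSet([
        Variant("he left unmentioned", frozenset({"M", "P"})),
        Variant("he didn't mention", frozenset({"G"}), preferred=True),
    ]),
    " the pawns",
]
print(render(DOC, "M"))
print(render(DOC))
```

The same document therefore serves both as a record of each witness and as the source of a constructed text, without either view being privileged in the stored structure.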

Sets

Variant-sets, transpositions, insertions and cancellations are examples of ‘versioned’ components, that is, those which belong to a set of versions. In this way MSEditor can distinguish the text of each manuscript or each layer of correction in an autograph. For example, you might be collating 120 manuscripts – a truly enormous task, but not out of the question. T.W. Allen collated 189 codices and 103 papyri for his edition of Homer’s Iliad in 1902 (there are many more papyri now). You would require at least one version for each manuscript. If there are corrections in any manuscript they can be assigned separate sigla, e.g. A1, A2 etc if the hand can be identified with certainty, or they can be left as anonymous corrected versions, recorded as simple insertions and cancellations. Either way, you will need a few extra versions to cover these cases, so altogether you might need 300 versions. This is best specified in advance for MSEditor, because of the way it represents sets. The bigger the number the more memory it will consume and the program might run a bit slower. There is, however, no limit.
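The text does not say how MSEditor stores its sets of versions. One representation that would explain why the number of versions is best declared in advance, and why memory grows with that number, is a fixed-width bit vector attached to each versioned component. The sketch below is written under that assumption; the class `VersionSet` and its methods are hypothetical.

```python
class VersionSet:
    """Fixed-capacity set of version numbers, stored one bit per version."""

    def __init__(self, capacity):
        self.capacity = capacity
        # One bit per possible version, rounded up to whole bytes.
        self.bits = bytearray((capacity + 7) // 8)

    def add(self, version):
        if not 0 <= version < self.capacity:
            raise IndexError("version outside the declared capacity")
        self.bits[version // 8] |= 1 << (version % 8)

    def __contains__(self, version):
        return (0 <= version < self.capacity
                and bool(self.bits[version // 8] & (1 << (version % 8))))

    def intersects(self, other):
        """True if the two sets share at least one version."""
        return any(a & b for a, b in zip(self.bits, other.bits))


witnesses = VersionSet(300)  # room for 300 versions, as in the example above
witnesses.add(0)
witnesses.add(299)
print(len(witnesses.bits), 299 in witnesses, 150 in witnesses)
```

Under this assumption every versioned component costs a fixed number of bytes determined by the declared capacity, and membership and intersection tests reduce to a few bitwise operations.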

In addition to these ‘documentary’ versions, each versioned component also has a set of ‘editorial’ versions. In constructing a text for publication an editor needs to choose one (or perhaps more than one) version for display or printing, selected from the set of available versions. Each editorial version is assigned a default documentary version, as if it were the ‘copy text,’ which will be displayed if no other is specified. Because editorial versions are also expressed as a set, any number of such selections can be defined, without disrupting the evidence of the documentary versions.
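The fallback from an editorial version to its default documentary version can be sketched as a simple lookup. The dictionary layout, the function `choose_variant` and the version names ‘reading’ and ‘diplomatic’ are invented for this illustration.

```python
def choose_variant(variants, editorial, defaults):
    """Pick the text to display for one variant-set.

    variants: list of dicts with 'text', 'documentary' (set of sigla) and
              'editorial' (set of editorial version names).
    defaults: maps each editorial version to its default documentary version.
    """
    # An explicit editorial selection takes precedence.
    for v in variants:
        if editorial in v["editorial"]:
            return v["text"]
    # Otherwise fall back on the default documentary version (the 'copy text').
    fallback = defaults[editorial]
    for v in variants:
        if fallback in v["documentary"]:
            return v["text"]
    return None


VARIANTS = [
    {"text": "he left unmentioned", "documentary": {"A"}, "editorial": set()},
    {"text": "he didn't mention", "documentary": {"B"}, "editorial": {"reading"}},
]
DEFAULTS = {"reading": "A", "diplomatic": "A"}

print(choose_variant(VARIANTS, "reading", DEFAULTS))
print(choose_variant(VARIANTS, "diplomatic", DEFAULTS))
```

Adding another editorial version only adds an entry to `DEFAULTS` and, where needed, a name to a variant’s ‘editorial’ set; the documentary evidence is left untouched.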

Unversioned Components

However, not all components are versioned. If text is contained by a versioned structure like a variant, it simply inherits that version; otherwise it belongs implicitly to all versions. Those that have this property include:

     runs, variables, paragraph-breaks, formulae, pictures, notes, line-breaks, parents, clones

Runs are simply sequences of text with all characteristics the same. A run also has a direction, which is either left-to-right or right-to-left, as in Arabic. Since the text of the run is Unicode it may represent any language. Runs can also have named character formats such as ‘wavy underlined’. These refer to structural features of the manuscript text and are not end-result related.

Variables are things like manuscript page numbers or speakers’ names in a play, which might often appear in the margins of a printed text. They are usually genuine features of the manuscript, but they can also be markers of an external reference system, if desired, for example, references to printed volumes.

Paragraph-breaks, like lines, have named formats. Since the name distinguishes one paragraph from another, as in a word-processor that uses styles, specialised components for different paragraph types are not required. Paragraphs are only marked by breaks, as are pages and lines of poetry. The beauty of this design is that it virtually eliminates overlap between textual and versioning structures, which have priority. Since a separation between paragraphs is all that is computationally required, there is no need for any structure based on the XML concept of <start-tag>.... </end-tag>, which just invites overlap.
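Marking paragraphs only by breaks can be illustrated with a flat stream of components. In this sketch the class `ParagraphBreak` and the function `paragraphs` are hypothetical, and it is an assumption that a break closes the paragraph preceding it and carries that paragraph’s named format.

```python
from dataclasses import dataclass


@dataclass
class ParagraphBreak:
    format_name: str  # named format of the paragraph this break closes (assumed)


def paragraphs(components):
    """Group a flat stream of text and breaks into (format_name, text) pairs."""
    result, buf = [], []
    for c in components:
        if isinstance(c, ParagraphBreak):
            result.append((c.format_name, "".join(buf)))
            buf = []
        else:
            buf.append(c)
    if buf:
        # Trailing text with no closing break has no named format.
        result.append((None, "".join(buf)))
    return result


STREAM = [
    "Dear Sir, ", "thank you.", ParagraphBreak("body"),
    "L. W.", ParagraphBreak("signature"),
]
print(paragraphs(STREAM))
```

Because a break is a single point in the stream rather than a container with a start and an end, a variant that spans it has nothing to overlap with.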

Mathematical formulae are recorded in either TeX or MathML form. In the initial version of MSEditor they will just appear as empty boxes, but eventually double-clicking on such a box will allow the user to edit the contents in a specialised editor (such as the TeX program). Formulae can, of course, contain corrections. It is intrinsically impossible, since mathematical formulae are really a kind of picture, to fully integrate them with the surrounding text. Most such corrections can be handled by breaking up the formulae into smaller pieces. In large display formulae with corrections this is naturally not possible without large-scale copying. A better solution is to allow variant structures to occur inside a formula, so that the content can be processed first to determine which parts are required by the current display version, and then to format them into a complete formula. Overlap of variants across the text/maths boundary may sometimes occur; it is, however, unlikely, given the different nature of their respective contents. The surrounding text may wrap around a formula if required or, more often, the formula simply appears in-line.

Pictures are references to external files of various types. Text is allowed to wrap around them, as for formulae.

Notes are comments on the text. They may be intended for printing or not. They also have a context on the right or left or both, expressed as a number of words, and a set of versions to which their comment belongs. The context is constructed by following the text of those versions. In the finished editor the user will be able to drag a pointer out into the text to set the context length. This mechanism is designed to avoid creating an overlap.
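A context expressed as a number of words to the left and right can be computed without enclosing any text in the note itself. The sketch below assumes the text of the relevant version has already been resolved into a list of words; the function `note_context` and its parameters are hypothetical.

```python
def note_context(words, position, left, right):
    """Return the words a note refers to.

    The note is anchored immediately before words[position]; `left` counts
    words before the anchor and `right` counts words from the anchor onward.
    Counts that run past either end of the text are clipped.
    """
    start = max(0, position - left)
    end = min(len(words), position + right)
    return words[start:end]


WORDS = "but he left unmentioned the pawns".split()
print(note_context(WORDS, 2, 1, 2))
```

Since the note records only an anchor point and two counts, it never forms a span that could overlap with a variant.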

Line-breaks, unlike paragraph-breaks, are not associated with any named format.

Parents and clones are only available within the scope of the variant-set in which they are defined; otherwise the editor would have no hope of keeping the screen image up to date.

‘Opening out’ the Text

The only versioned feature not yet mentioned is the variant, the component of the variant-set. The TEI Guidelines also define a reading-group, which is just a set of variants closely related in some way, for example a group of orthographic variants (TEI 2001, §19.1.3). Since MSEditor does not impose an interpretation on the contents of a variant-set – it is just a mechanism for representing variants – this distinction is unnecessary. Variants are part of the ‘opening-out’ interface, which is based on the idea that the vastness of variant data can only be made sense of by staggering access to it in a graphical way. If the user clicks on a bit of text with a dotted underline the text opens out into an expanded variant-set. Double-clicking on the bracket closes it again.

Fig. 6

At any time the user can choose a different base-text by selecting it from a dropdown list, as in the applet described above. Then he or she can explore the text, starting from the new baseline as a point of reference. The user can individually open up a single variant-set to delve deeper into the text’s complexity and then collapse it all again. This goes some way to answering the frequent criticism that the pluralistic approach (recording all variants) results in ‘information overload’ (Greetham 1994, p.341; Vanhoutte 2000). What is needed is not to throw away the new method but to reformulate it as a user interface problem. Every abstraction, every tag that is exposed to the user is something that has to be remembered and kept track of. On the other hand, an element that communicates graphically requires no memorisation – its meaning is immediately apparent (Tognazzini 1992, p.136).

Formatting of Manuscript Text

Although MSEditor is not a word-processor, there are many cases where manuscripts impose formatting on paragraphs and runs of characters. An example of a paragraph format would be a signature at the end of a letter, and an example of a character format would be underlining. These structures need to be presented to the user and they also need to be rendered somehow in the document’s final form, which is usually more complex. For this reason MSEditor adopts simple ‘cascading’ stylesheets in the editor and complex ‘XSL’ stylesheets for the final form. Just about every web page on the Internet these days uses cascading stylesheets, and editing one is fairly easy. Later the program will probably have a user-friendly front end for editing them.

7. ‘Free’ Software

MSEditor will be offered for general use under the terms of the GPL, the GNU General Public License, devised by Richard Stallman of the Free Software Foundation (FSF 2004). Stallman is quite right to say that proprietary software, whose source code is not freely available, is divisive and anti-social. Consider the general situation in humanities computing today. A large number of research projects have broadly similar software requirements. Proprietary software vendors see little point in developing solutions for this small community, and so each project often develops its own software. If one group’s software may be used by other groups they usually charge for it, and in any case they keep the source code secret. If you want to use it but it doesn’t quite do what you want, you may write to the program’s authors. They may not have the time or the inclination to help your project needs, and even if they do they will probably still charge you for the modified program, and sell your suggested modifications to others. So why should you help someone else profit from your work? This is the source of all the divisiveness, and the reason why so much of our research effort is wasted by not cooperating.

A free software project is run differently. The editors in charge are the interested parties who either started the project or who have proven themselves worthy contributors. If someone wants to contribute an idea it is submitted to a peer review process, much like the submission of an article to a journal. If it is accepted your idea becomes part of the shared software. You derive benefit from it and everyone else does too. Even if your idea is rejected you may make the modification yourself, because the source code is freely available. You may distribute the modified program to others who may also find it useful. A project using free software also has the advantage that it can utilise the vast corpus of existing free software without infringing copyright. For all these reasons the free software development model would appear ideally suited to the humanities computing field.

8. Conclusion

If it were possible, it would be a design goal of this manuscript editor to be of equal use to all textual critics, regardless of their preferred methodology; but even a screwdriver is no good at banging in nails. The nature of the tools and the medium in which they operate influence the methods of those who use them. David Greetham (1994, p.370) appears to miss this point when he remarks on the increasing role of computers in his work: ‘Removing a large part of the drudgery from traditional textual scholarship has served to highlight the special role of critical intervention in the most significant moments in the production of edited texts’. Fiormonte (2003, p.245) responds as if to these very words: ‘As we have observed, the instruments of software are not merely of “help” to the critic. It is not a question therefore of any “revision” or adaptation, but of a refoundation’. This paper started from an analysis of six critical problems in the handling of variants by modern philological software. By tackling them afresh it has solved some of them outright and in other cases has clearly indicated how they can be resolved as part of a coherent solution. As the self-styled guru of user-interface design, Bruce Tognazzini (1992, p.131) said: ‘Develop a simple, smooth design model, reflective of the needs of the user, not the limitations of the hardware or the difficulty of the coding process.’ What has been outlined in this paper is an attempt to follow that dictum, and to sketch a real user interface for the manuscript-editing problem. Eventually the future will come, and it will look something like this. The irony is that the tools to create it are already here, in the present. All that remains to do is to make it happen now.

9. Notes

1. No method purporting to get around this problem actually does so. The ‘double-endpoint attachment’ method in TEI (2001) §19.2.2 misuses XML attributes to create, in effect, overlapping elements. There are four problems with this: (a) there is no constraint on the location of the targets of the ‘from’ and ‘to’ attributes, one of which may reside inside an element and the other outside – a situation that would create a variant containing an unmatched start or end tag; (b) in this method it is, incorrectly, the <app> structure, i.e. sets of variants, that overlap, not the variants themselves; (c) the avoidance of copying from the base text into the variants does not prevent copying between variants; and (d) it is simply not possible to specify a variant that spans the baseline and another variant within the <app> structure without including unmatched start- and end-tags from the <app> structure itself, as well as copies of any intervening variants. The variant mechanism in CTE is similar and suffers from the same fault as (c).

2. A problem similar to that pointed out by Smith (1999): transposition of words between lines in a play.

3. e.g. Henrik Ibsen Hærmændene på Helgeland, NKS 3119 4to, p.27, Olaf Liljekrans Ms 8vo 1945, p.2. http://www.dokpro.uio.no/litteratur/ibsen/ms/skuesp.html (accessed 26th June 2004). See also Aeschylus Ag. lines 570-574, Ch. 227-230, 275-277.

4. This is demonstrated by the recent formation of a TEI Manuscript Special Interest Group to improve the ‘Critical Apparatus’ part of the Guidelines. See http://www.tei-c.org/Activities/SIG/Manuscript/mssigr01.html (accessed 26th June 2004). Edward Vanhoutte: ‘the encoding strategies suggested by the TEI for critical apparatus are not suitable for the encoding of variants present in a single text’. See also http://www.iath.virginia.edu/ach-allc.99/proceedings/graver.html (accessed 26th June 2004): Bruce Graver faced with a TEI encoding of Wordsworth’s Lyrical Ballads: ‘the differences between a 15th century manuscript, drawn up by a professional scribe long after the author’s death, and a printer’s manuscript, drawn up by the authors themselves, are enormous, and it soon became clear that Robinson’s model would be of little use’.

5. This is the approach of the widely-used TEI Lite: http://www.tei-c.org.uk/Lite/ (accessed 26th June 2004) and the ‘Track changes’ feature in Microsoft Word.

6. Renear, Mylonas and Durand (1993) admit that variant readings are a counterexample to their refined OHCO model of overlapping hierarchies. See also Durusau and O’Donnell (2001) and LMNL: http://www.lmnl.net/ (accessed 26th June 2004).

7. Insertions and cancellations are normally shown in blue and red respectively, colours which are not visible in this black-and-white version.

8. The term ‘component’ is used deliberately to avoid the XML term ‘element’. The ‘components’ described do not necessarily map directly to elements in the external XML format. Instead, they represent fundamental structures within the editor.

10. References

Chomsky, N. (1965). Aspects of the Theory of Syntax, (Cambridge, Massachusetts: MIT Press).

Church, A. (1936). An unsolvable problem of elementary number theory, American Journal of Mathematics, 58, pp.345-363.

Durusau, P., O’Donnell, M.B. (2001). Implementing Concurrent Markup in XML: http://www.sbl-site2.org/Extreme2001/Concur.html (accessed 26th June 2004).

FSF (2004). The GNU General Public License: http://www.gnu.org/licenses/licenses.html#GPL (accessed 26th June 2004).

Fiormonte, D. (2003). Scrittura e filologia nell’era digitale. (Turin: Bollati Boringhieri).

Goldfarb, C.F. (1999). ‘SGML: A Personal Recollection’, in Technical Communication, available at http://www.sgmlsource.com/history/roots.htm (accessed 26th June 2004).

Greetham, D.C. (1994). Textual Scholarship: An Introduction (New York: Garland).

Hagel, S. (2004). CTE (Classical Text Editor): http://www.oeaw.ac.at/kvk/cte/ (accessed 26th June 2004).

Kleene, S.C. (1967). Mathematical Logic. (New York: Wiley) p.232.

McGann, J. (1985). A Critique of Modern Textual Criticism (Chicago: University of Chicago Press).

Naur, P. (1960). Revised Report on the Algorithmic Language ALGOL 60, Communications of the ACM, 3.5, pp.299-314.

Nedo, M. (1993). Ludwig Wittgenstein, Wiener Ausgabe, Einführungsband (Vienna: Springer).

Ott, W. (1990). Mehr als Kollationshilfe: Automatischer Textvergleich als Editionswerkzeug, in Albert Heinekamp et al. (eds), Mathesis Rationis. Festschrift für Heinrich Schepers. (Münster: Nodus Publikationen), pp.349-372.

Renear, A., Mylonas, E. and Durand, D. (1993). Refining our Notion of What Text Really is: The Problem of Overlapping Hierarchies. http://www.stg.brown.edu/resources/stg/monographs/ohco.html (accessed 26th June 2004).

Robinson, P. (1994). Collate 2: A User Guide, (Oxford: Oxford University Computing Services).

Sedgewick, R. (1988). Algorithms, 2nd edition (Reading, Massachusetts: Addison-Wesley).

Smith, D. (1999). Textual Variation and Version Control in the TEI, Computers and the Humanities, 33, pp.103-112.

TEI (2001). Bauman, S., Burnard, L., DeRose, S., Rahtz, S. (eds) Guidelines for Electronic Text Encoding and Interchange: XML-compatible edition. http://www.tei-c.org/P4X/ (accessed 26th June 2004).

Tognazzini, B. (1992). Tog on Interface (Reading, Massachusetts: Addison-Wesley).

Turing, A.M. (1936). On Computable Numbers, with an application to the Entscheidungsproblem, Proc. Lond. Math. Soc. (2) 42 pp.230-265.

Vanhoutte, E. (2000). Where is the Editor? Resistance in the Creation of an Electronic Critical Edition, in Deegan, M., Anderson, J. and Short, H. (eds), Selected Papers from the Digital Resources for the Humanities Conference (University of Glasgow, September 1998) (London: Office for Humanities Communication), pp.171-83.