The Generation of Gothic Text.
School of Computer Science,
University of Birmingham,
Edgbaston, Birmingham B15 2TT, UK.
Paper given at the 1994 AHC Conference at Nijmegen
The subject of this paper is intended not as an end in itself, but rather as part of a larger project, namely the recognition of handwritten text. The current study involves a discussion of the methods of representing and generating text and shows how the gothic hand in particular can be represented by a subset of the available methods. Since some of the requirements are imposed by the needs of the recognition process, it is necessary to place this stage in the context of the project as a whole. The rationale is as follows. Despite many years of work on handwriting recognition, human readers remain more successful than any existing computer system when it comes to deciphering poor-quality handwriting. It therefore seems reasonable to attempt a method of recognition which is closer to that used by the human reader. Consider what happens when you come to read a difficult passage in a letter from a friend or relative. Knowledge of the context restricts the solution to a comparatively small number of possibilities. You will have a mental image of the appearance of each of these possibilities in the handwriting of the person concerned and will choose the solution which is the best fit to the scrawl on the paper in front of you.
Now consider how this approach may be copied in computer terms. For the context information, it is proposed that lists of words and phrases for each type of manuscript document and possibly, in some cases, information on the structure of the document be provided. Instead of the mental image of the phrases in the appropriate handwriting, the intended system will have datafiles describing a wide variety of hands and will use these to generate images of the required phrases in the required hand. The present paper, discussing the representation and generation of gothic handwriting, is the first stage in the production of a prototype system. It is assumed that the manuscripts will be in the form of images, probably in a database of document images in some archive, but software will be needed to pre-process these images into an edited sequence of word-images. The final stage of the process will be a comparison of methods of comparing word-images to determine whether or not they represent the same word. When complete, this will provide an interactive system to help researchers obtain a text version of a manuscript document. In theory, there is no restriction on the language or alphabet used within such a system, although at any time only those manuscripts for which the appropriate datafiles exist can be processed. For others, it will first be necessary to provide datafiles describing the hand and the word-lists.
For such inter-disciplinary studies, it is desirable to call on experience from both computer science and history. In the area of computer science, it is relevant to consider the experience of computer-generated fonts. The most useful contribution is a paper by Knuth (Knuth 1979) in which he discussed the mathematical theory underlying the generation of "beautiful fonts". His historical introduction, starting with Feliciano in 1460, discussed the use of ruler-and-compass methods to define the ideal forms of the various letters and pointed out that all actual implementations included sections of free-hand curves joining various sections of the straight lines and circular arcs making up the letters. From this he concluded, quite correctly, that circles and straight lines are not sufficient to produce aesthetically pleasing shapes for all the letters. He postulated a set of six conditions which would define what he called the "most pleasing curves", but concluded after analysis that they were too restrictive for a solution to exist and so must be relaxed. With this adjustment, he was able to show that a piece-wise smooth cubic curve interpolating the points defining the curve would give the required solution.
Knuth's paper included two different ways of defining the letter. In the one case, the curves defined the outline of the letter, which was then filled in with the desired ink-colour (usually black), thus giving the shape. The second method used a "pen" (which could be circular, elliptical or rectangular) whose centre moved along a pen-path which was a piece-wise smooth cubic. The pen could vary its size and orientation as it moved along the path and the resulting envelope generated the letter. These methods have been developed into the Metafont system, which is widely used for the generation of computer fonts and allows great flexibility in their design. Were the generation of gothic fonts the sole aim of this work, the Metafont system would be quite adequate. However the long-term aim of this study is the generation of word-images which can be used for the recognition of actual manuscripts and this requires a slightly different solution.
On the one hand, the Metafont system is too flexible. Like many CAD packages, it allows more than one method of reaching the same result. This causes problems in classification because different values of the various parameters can correspond to the same output. On the other hand, the extension to describe cursive handwriting is not obvious and this will be essential for the intended system. It is not surprising that the Metafont system is not immediately applicable to this problem - it was never intended as a means of duplicating past letters, whether printed or handwritten. The sole aim was to make available the resources of modern technology and use them to the full to produce pleasing fonts for use with computer output. It achieves this aim excellently and also provides useful insights for other related problems.
Simulating the Handwriting Process.
Unlike Knuth's work, this study does attempt to copy the letter shapes produced by the scribes who wrote the manuscripts. Consequently it is necessary to analyse the tools used and how these may best be imitated in computer terms. The early manuscripts were produced using quill pens, which were manufactured as needed. The quill has a circular cross-section and a hollow centre. First it is cut at an angle, giving a very elongated ellipse. Then the end is cut off (at right angles to the major axis of the ellipse) and finally the nib is split. When the quill is dipped in the ink, the hollow centre fills with ink and stores enough for one stroke of the pen.
If the nib is placed in contact with the paper and moved across it with the whole of the flat end of the nib in contact with the paper, a stroke of constant width is created. A little more pressure will cause the ends of the nib to splay apart and give a slightly wider section of the stroke. Still more pressure, and the ink no longer fills the gap between the two parts of the nib and this is usually avoided. Alternatively the quill may be held so that only the corner of the nib is in contact with the paper and this produces the very thin, curved flourishes which also appear in gothic letters. These three examples are the only possible widths produced by the quill pen, which is very different from the wide variety of sizes and shapes offered in the Metafont system. The metal nib used by the modern calligrapher has been carefully designed to simulate the quill pen and so produces the same three choices of width.
Each letter is made up of a number of strokes. For each stroke, the pen is loaded with ink, placed on the paper at a given angle and with a given pressure (or pen-width), moved across the paper along a pre-defined path and then raised from the paper at the end of the stroke. The length of stroke is limited by the amount of ink available without re-filling the quill, and when studying the gothic hand, there are many examples where the end of one stroke coincides with the start of the next. Since ink-supply is no problem within the computer simulation, these strokes can all be combined into one in the computer representation. A study of examples of gothic manuscripts (e.g. in Friedman 1993 or Hector 1966) suggests that most of the strokes consist of a sequence of straight lines made with a constant pen-width.
The Bock Manuscript.
One of the important sources for this work is a style sheet produced by the writing master Gregorius Bock and dated to around 1510. This shows the stroke-by-stroke build up of each of the letters of the alphabet and so gives an important insight into the way the script was produced. Figure 1 shows an example of the letter "a" from this sheet, greatly enlarged, and indicates the six strokes which are used to build up the letter. (I am indebted to John Friedman for these images, which are discussed more fully in (Friedman 1993).)
Figure 2 shows the same letter "a" from the Bock manuscript with strokes and pen-path superimposed and gives an analysis of pen-width, pen-angle and pen-path for each stroke. Since the end-point of stroke four exactly corresponds to the starting point of stroke five, these two strokes have been combined in the computer representation.
The pen-path for stroke one may be represented by three data-points, joined by straight lines. The nib is held at a constant angle of about 35° to the horizontal and most of the stroke has a constant pen-width. However the first data-point corresponds to a wider than normal width, due to greater pressure. This changes to the constant pen-width by the second data-point.
Stroke two has the same angle and a constant pen-width and the pen-path may be represented by three data points, joined by straight lines. From the second to the third point, the pen is moved sideways at the angle of the nib, thus resulting in a thin line.
Stroke three is an example of a fine, curved flourish made with the corner of the nib. The width is zero and so the angle is irrelevant. The path may either be represented by a large number of data points joined by straight lines or by a much smaller number of data points joined by cubics.
Stroke four combines strokes four and five of the original letter and is again of constant pen-width and pen-angle and the path consists of three data points joined by straight lines. Stroke five is very similar to stroke two. These five strokes make up the representation of this letter and all the others may be analysed in a similar way.
The software to generate versions of these letters uses a number of approximations, ordered in degree of complexity.
Approximation 1 is the simplest. It assumes that the pen-path may be represented by a sequence of straight lines (a polyline) and that the nib is a straight line of constant width held at a constant angle to the horizontal. The pen-path is assumed to be the path of the centre of the nib and the letter is generated by drawing a line of length 2w, centred on the point and inclined at the pen-angle A to the horizontal, through each successive point along the pen-path. Stroke one of the above example is the only part which could not be generated by this approximation, although the number of data-points for stroke three may be very large.
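The sweep of a straight nib along a polyline can be sketched as follows. This is only an illustrative sketch, not the software described in the paper: the function name nib_segments, the parameter names and the sampling parameter steps_per_segment are assumptions introduced here, and w is taken to be half the length of the nib line, as implied by a line of length 2w centred on the pen-path.

```python
import math


def nib_segments(path, half_width, angle_deg, steps_per_segment=10):
    """Return nib lines ((x1, y1), (x2, y2)) swept along a polyline pen-path.

    path is a list of at least one (x, y) data point; each nib line has
    length 2 * half_width and is centred on a sampled point of the path.
    """
    a = math.radians(angle_deg)
    dx, dy = half_width * math.cos(a), half_width * math.sin(a)
    segments = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        for i in range(steps_per_segment):
            t = i / steps_per_segment
            # Centre of the nib, moving in a straight line between data points.
            cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            segments.append(((cx - dx, cy - dy), (cx + dx, cy + dy)))
    # The loop stops short of the last data point, so add it explicitly.
    cx, cy = path[-1]
    segments.append(((cx - dx, cy - dy), (cx + dx, cy + dy)))
    return segments
```

Drawing every returned line on a raster or plotter would fill in the stroke; a denser sampling is needed when the pen moves nearly parallel to the nib.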
The data required to generate this output are values of pen-width, pen-angle and number of data points for the pen path, followed by (x,y) coordinates for each of these points. The data are obtained by printing a greatly enlarged image of the letter on to graph paper and reading the values from it. By magnifying the image in this way, the relative size of any error is decreased in the final data file. The data must then be normalised in some way, and the most convenient method is probably to scale so that the nib-width is equal to 1.0. (It will still be necessary to distinguish between the "wide" strokes with width=1.0 and the "narrow" ones with width=0.0. This normalisation merely takes whatever value in mm corresponds to the average nib-width and divides all measurements in mm by this value to obtain a data-set which is independent of the magnification of the image.) Then a parameter "size" may be used to scale the generated image before it is produced on either screen or printer.
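The normalisation and the later scaling by "size" amount to a division and a multiplication, as the following sketch shows. The dictionary layout and the function names normalise_stroke and scale_stroke are assumptions made for illustration; the paper does not specify the format of its data files.

```python
def normalise_stroke(points_mm, nib_width_mm, wide=True):
    """Divide measured coordinates by the nib width so that the nib width is 1.0.

    points_mm are (x, y) values read from the enlarged image, in mm;
    nib_width_mm is the average nib width measured on the same image.
    """
    if nib_width_mm <= 0:
        raise ValueError("nib width must be positive")
    return {
        # Wide strokes have width 1.0, narrow (corner-of-nib) strokes 0.0.
        "width": 1.0 if wide else 0.0,
        "points": [(x / nib_width_mm, y / nib_width_mm) for x, y in points_mm],
    }


def scale_stroke(stroke, size):
    """Scale a normalised stroke by the output parameter size."""
    return {
        "width": stroke["width"] * size,
        "points": [(x * size, y * size) for x, y in stroke["points"]],
    }
```

Because every measurement is divided by the nib width taken from the same image, two enlargements of one letter at different magnifications give the same normalised data.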
Approximation 2 differs from approximation 1 in only one respect. The width of the nib is allowed to vary by a small factor (typically between 1.0 and 1.3) as the pen moves along the stroke. For example, when representing the letter "a" from the Bock alphabet, the first stroke starts with a wider mark (corresponding to a greater pressure on the nib) and then reduces to the standard width. When simulating this, the first two data points are chosen so that the factor changes steadily from the maximum of 1.3 to the minimum of 1.0 as the centre of the pen nib moves from the first data point to the second. This allows a more accurate representation of the letter. This could generate accurate output for the above example, but stroke three would still require a large number of data points for an acceptable representation.
This will require values of "factor" for each data-point of each stroke in addition to the data used for approximation 1. If the value of factor is the same at each end of the stroke, then no variation in width of stroke will be needed. If factor is different, then the length of each line forming the stroke will have to be calculated, to give a linear change in stroke width as the pen-path moves from one end of the segment to the other.
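The linear change of stroke width along one segment can be sketched as below. The function name varying_nib, the default half-width of 0.5 (a normalised nib width of 1.0) and the sampling parameter steps are assumptions made for illustration.

```python
def varying_nib(p0, p1, factor0, factor1, half_width=0.5, steps=10):
    """Return (centre, half_length) pairs along the segment from p0 to p1.

    The width factor is interpolated linearly between its values at the
    two data points, so the nib line shrinks or grows steadily.
    """
    samples = []
    for i in range(steps + 1):
        t = i / steps
        centre = (p0[0] + t * (p1[0] - p0[0]), p0[1] + t * (p1[1] - p0[1]))
        factor = factor0 + t * (factor1 - factor0)
        samples.append((centre, half_width * factor))
    return samples
```

With equal factors at both ends the half-length is constant and the result reduces to approximation 1.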
In approximation 3, the pen-path is represented by a sequence of piecewise smooth cubic curves (a polycube). For gothic hands, this is only required for the thin flourishes and so the pen width remains constant and very narrow and the angle is not relevant.
To allow a polycube representation when the width is zero requires additional data to be recorded, but for fewer data-points along the pen-path. The pen-width of zero and the number of points must still be recorded. Then for each data-point, it will be necessary to record both x and y coordinates and the slope of the curve at this point. With these three values at each end of the segment, the cubic joining them is fully defined. This also guarantees continuity of slope from one cubic to the next. The form of interpolation is determined when the cubics are calculated by fitting them to the curve in the image and there are two main possibilities here.
(a) x,y and slope are measured from the magnified image of the letter.
(b) a large number of (x,y) coordinates are measured from the image and then some form of curve fitting is carried out to obtain a smaller number of points with cubics joining them. For example, it could be decided that the curve shall be represented by six cubics and the "knot-points" at which these are joined shall be equally spaced along the curve. Then the coefficients of the cubics are chosen so that they pass through the knot points and provide the "best-fit" to the other data points within this interval. Least-squares and minimax are possible methods of obtaining this best-fitting cubic, and they will not usually give identical results. (Most numerical analysis textbooks discuss such curve fitting e.g. a brief account in (Williams 1972) and a more detailed one in (Ralston 1978).) Once the best-fitting cubic has been calculated, its equation can be used to calculate the gradients at each knot point, and this value can be stored in the data file. The equations of the cubics can always be calculated from these six values, and so the fitting need only be carried out once.
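The recovery of a cubic from the six stored values (x, y and slope at each end of a segment) can be sketched with the standard cubic Hermite form. The sketch assumes that the slope is stored as dy/dx and that the two end points have distinct x values, so that y can be written as a cubic in x; a flourish with a vertical tangent would need a parametric form instead. The function names hermite_cubic and polycube are introduced here for illustration.

```python
def hermite_cubic(x0, y0, m0, x1, y1, m1):
    """Return the function y(x) for the cubic fixed by position and slope
    at both ends of a segment."""
    h = x1 - x0
    if h == 0:
        raise ValueError("end points must have distinct x values")

    def y(x):
        t = (x - x0) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * y0 + h10 * h * m0 + h01 * y1 + h11 * h * m1

    return y


def polycube(knots, x):
    """Evaluate a chain of cubics at x; knots is a list of (x, y, slope)
    triples sorted by increasing x."""
    for (x0, y0, m0), (x1, y1, m1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return hermite_cubic(x0, y0, m0, x1, y1, m1)(x)
    raise ValueError("x lies outside the knot range")
```

Because neighbouring cubics share the slope stored at their common knot, the slope is continuous along the whole path.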
For the gothic hands, these three approximations are sufficient to produce a good representation. The fourth and most complex approximation is only needed when these methods are extended to deal with cursive hands.
Approximation 4 is included for completeness. The nib is assumed to move across the paper approximately at right-angles to the direction of the pen-path, but slight variations of angle are expected. Consequently the pen-angle is defined as the angle between the nib and the tangent to the pen-path, and not the angle between the nib and the horizontal. The three possibilities of pen-width still apply - the narrow lines are still produced by the corner of the nib and the wide ones by the whole of the nib in contact with the paper. The factor, indicating greater width corresponding to greater pressure on the pen, may still vary along the pen-path. The pen-path now uses a polycube representation and so needs values of x, y and slope at each of the data points. The stroke for a cursive hand may now include one or more letters or parts of a letter and will require an extended definition of the stroke.
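Measuring the pen-angle from the tangent rather than from the horizontal changes only how the direction of the nib line is computed, as the following sketch suggests. It assumes the slope at the point is stored as dy/dx and is finite; the function name nib_endpoints and its parameters are introduced here for illustration and are not taken from the paper.

```python
import math


def nib_endpoints(x, y, slope, rel_angle_deg, half_width):
    """Return the two ends of the nib line at a point on the pen-path.

    rel_angle_deg is the angle between the nib and the tangent to the
    pen-path, so 90 means the nib lies at right angles to the path.
    """
    tangent = math.atan(slope)  # direction of the path, modulo 180 degrees
    a = tangent + math.radians(rel_angle_deg)
    dx, dy = half_width * math.cos(a), half_width * math.sin(a)
    return (x - dx, y - dy), (x + dx, y + dy)
```

The tangent direction is only known up to 180 degrees from the slope alone, which is harmless here because the nib line is symmetric about its centre.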
In the previous sections, the stroke covered a very short distance and ran from the position when the pen was lowered into contact with the paper (pen-down) to the position when it was lifted clear of the paper (pen-up). These two conditions to determine the start or end points of a stroke will still apply. However with the cursive script it will be necessary to include a third condition, namely a point at which the direction of the pen-path reverses. With a cursive script, many strokes will include several letters and in general the length of the stroke will be greater than the examples given for the gothic script. The segments joining letters will not be included in the alphabet for that script, but will be an additional cubic curve connecting the start and end points of the letters concerned and so will differ for different letter pairs. At a reversal point, the pen comes to a halt and then starts up again in the opposite direction and this is not the curve generated by a cubic joining two superimposed points with gradients in opposite directions. Thus the polycube representation will only correspond to the original handwriting if a reversal point corresponds to the end of one stroke and the start of another.
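One possible way of applying the third condition to a sampled pen-path is sketched below: the path is cut wherever two successive directions are nearly opposite. The test on the cosine of the turning angle, the threshold of -0.9 and the function name split_at_reversals are all assumptions made for illustration; the paper does not say how reversal points are to be detected.

```python
import math


def split_at_reversals(points, cos_threshold=-0.9):
    """Split a sampled pen-path into strokes at points where the direction
    of travel reverses. The reversal point ends one stroke and starts the next."""
    strokes = [[points[0]]]
    for i in range(1, len(points)):
        strokes[-1].append(points[i])
        if i < len(points) - 1:
            # Incoming and outgoing direction vectors at points[i].
            ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
            bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
            na, nb = math.hypot(ax, ay), math.hypot(bx, by)
            if na > 0 and nb > 0 and (ax * bx + ay * by) / (na * nb) < cos_threshold:
                strokes.append([points[i]])
    return strokes
```

Each resulting stroke can then be given its own polycube, so that no single cubic is asked to join two superimposed points with opposite gradients.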
The next stage of the work will extend these methods to cursive hands and will involve the implementation of this approximation to generate handwritten text.
Friedman, John F. (1993) 'Computerized Script Analysis and Classification: Some Directions for Research' in Optical Character Recognition in the Historical Discipline Halbgraue series A18 (Gottingen).
Hector, L.C. (1966) The Handwriting of English Documents, second edition, Edward Arnold (London).
Knuth, Donald E. (1979) 'Mathematical Typography' Bulletin (new Series) Amer. Math. Soc. 1(2) 337-372.
Ralston, A. & Rabinowitz, P. (1978) A First Course in Numerical Analysis, second edition, McGraw Hill (New York) chapters 6 & 7.
Williams, P.W. (1972) Numerical Computation, Nelson (London) chapter 7.