This page last changed on May 03, 2008 by martinmueller@northwestern.edu.
I spent a little time with the TCP, NCF, and Wright archives, looking for rend attributes, and thinking about whether or how they need to be recognized in the data store and what kinds of decisions need to be made about displaying texts.
In the TCP texts there are hardly any rend attributes. In the 650 parsing files, there are 106 instances of "rend=marginal quotes," which refers to an old way of identifying quoted materials through checks in the margin.
At the end of this memo you will find lists of rend attributes with their counts, sorted in descending order.
I take it that the general principle governing text display in MONK is to articulate structurally significant changes in a manner that is visually pleasing and consistent across the entire MONK universe. But there is no obligation to reproduce the particular display conventions of the source texts.
The lists at the end are instructive in terms of their distribution. A few specifications make up the great bulk of phenomena. There are a fair number of low-frequency phenomena. Some of them are mistakes (the Wright encoders had trouble spelling the word 'indent' consistently).
If you look at the data a little closely, correct obvious errors, and identify rare phenomena that can be merged into more common phenomena, it becomes pretty clear that a few hours spent with each collection can produce satisfactory and consistent solutions across the entire MONK space.
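As a rough illustration of that clean-up step, the following Python sketch folds the misspelled Wright values into a canonical indent(N) form. The names normalize_indent and INDENT_SPELLINGS are hypothetical. The spellings are copied from the Wright list at the end of this memo, and reading "l" as a mistyped "1" is an assumption.

```python
import re
from typing import Optional

# Spellings of "indent" observed in the Wright list at the end of this memo.
# The empty string covers values such as "(1)" where the keyword is missing.
INDENT_SPELLINGS = {
    "indent", "intend", "indend", "intent", "ident", "indetn",
    "idnent", "inden", "indenmt", "index", "",
}

# An indentation level: digits, or a lone "l" (assumed to be a mistyped "1").
_LEVEL = re.compile(r"[\s(]*([0-9]+|l)\b")


def normalize_indent(rend: str) -> Optional[str]:
    """Return 'indent(N)', bare 'indent', or None if the value is not an indent."""
    value = rend.strip().lower()
    if not value:
        return None
    # Everything before the first "(" or "=" is the keyword; the rest holds the level.
    parts = re.split(r"[(=]", value, maxsplit=1)
    keyword = parts[0].replace(" ", "")
    if keyword not in INDENT_SPELLINGS:
        return None
    rest = parts[1] if len(parts) > 1 else ""
    match = _LEVEL.match(rest)
    if match is None:
        return "indent"
    level = "1" if match.group(1) == "l" else match.group(1)
    return f"indent({level})"
```

Values that are not indentation at all, such as font(gothic) or i(1), come back as None, so they can be reviewed separately.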
The <l> element is a good example. You find different notations in NCF and Wright for marking different types of indentation. It is a nice question whether such indentations are typographical or structural features. They give a shape to stanzas that may draw attention to prosodic features of one kind or another. But those distinctions are not analytically recoverable.
There are approximately 25,000 lines of verse each in the 250 NCF texts and the 300 Wright texts we're going to use. In the TCP texts we find 663,000 <l> elements, but none of them has any further specification. The conclusion I draw from this is to ignore the rend attributes of <l> elements in the Wright and NCF texts and represent all verse as indented in the same fashion. There is no analytical loss in this; it creates visual consistency and makes the maintenance of style sheets somewhat simpler.
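A minimal sketch of what ignoring those attributes could look like in practice, using Python's standard xml.etree.ElementTree. The function name strip_line_rend and the sample fragment are hypothetical, and the sketch assumes the <l> elements are not in an XML namespace.

```python
import xml.etree.ElementTree as ET


def strip_line_rend(root: ET.Element) -> int:
    """Remove the rend attribute from every <l> element; return how many were removed."""
    removed = 0
    for line in root.iter("l"):
        if line.attrib.pop("rend", None) is not None:
            removed += 1
    return removed


# Hypothetical fragment: two indented lines (one with a misspelled value) and one plain line.
SAMPLE = (
    '<lg type="stanza">'
    '<l rend="indent(1)">First <hi rend="i(1)">line</hi></l>'
    '<l rend="intend(2)">Second line</l>'
    '<l>Third line</l>'
    '</lg>'
)

root = ET.fromstring(SAMPLE)
removed = strip_line_rend(root)
```

Only <l> elements are touched, so the rend on the nested <hi> survives. A single style rule for <l> can then indent all verse uniformly.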
If we treat other elements in the same way, life will be simpler. The situation is somewhat similar with regard to "type" attributes. Here too a handful of types account for 95% or more of occurrences. The rest can be merged with them or ignored.
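The following Python sketch shows one way to check that claim and to merge the stragglers. The counts are the Wright <lg> type counts from the list at the end of this memo, with the four lang=lat quotations folded into "quotation". The MERGE map and both function names are hypothetical, and the choice of merge targets is an assumption.

```python
from collections import Counter

# Wright <lg> type counts, copied from the list at the end of this memo.
LG_TYPE_COUNTS = {
    "quotation": 9709, "stanza": 9593, "para": 785, "speech": 118,
    "sub": 11, "chapter": 7, "poem": 3, "quotatoin": 3, "section": 2,
    "epigraph": 1, "paragraph": 1, "parar": 1, "stana": 1, "stanz": 1,
}

# Assumed mapping of misspelled or variant types onto the common ones.
MERGE = {
    "quotatoin": "quotation",
    "paragraph": "para",
    "parar": "para",
    "stana": "stanza",
    "stanz": "stanza",
}


def merge_types(counts, merge_map):
    """Add the count of each variant type to the count of its merge target."""
    merged = Counter()
    for type_name, n in counts.items():
        merged[merge_map.get(type_name, type_name)] += n
    return merged


def coverage(counts, top_n):
    """Share of all occurrences accounted for by the top_n most frequent types."""
    total = sum(counts.values())
    top = sum(n for _, n in Counter(counts).most_common(top_n))
    return top / total


merged = merge_types(LG_TYPE_COUNTS, MERGE)
share_top3 = coverage(merged, 3)
```

On these figures the three most common types (quotation, stanza, para) cover over 99% of linegroups after merging, which leaves only a few dozen cases to merge further or ignore.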
All this adds up to a very Spartan regime. I would rather err on the "Spartan" side the first time round. If it works, it's proof that from an analytical perspective less is more. If we overshoot the target, others can later decide to let more flowers bloom.
But it may be useful to proceed on the maxim that if something has no analytical significance it should have no typographical recognition.
Rend attributes found in 1,000 Wright texts
l rend=indent(1) part=N 25475
l rend=indent(2) part=N 1929
l rend=indent(3) part=N 869
l rend=indent(4) part=N 455
l rend=indent(l) part=N 218
l rend=intend(1) part=N 19
l rend=indent (1) part=N 15
l rend=indend(1) part=N 14
l rend=intent(1) part=N 14
head rend=font(gothic) 13
l rend=indent(1] part=N 12
l rend=indent(5) part=N 12
l rend=ident(2) part=N 8
l rend=indent part=N 6
l rend=(1) part=N 4
l rend=ident(1) part=N 4
l rend=intend(3) part=N 4
emph rend=fsc 3
l rend=indent(1 part=N 3
l rend=indent=(1) part=N 3
l rend=intend=(2) part=N 3
sic rend=font(italics) corr=protégé 3
l rend=indent(1)= part=N 2
l rend=indent(6) part=N 2
l rend=indetn(2) part=N 2
l rend=intend(2) part=N 2
l rend=intend=(1) part=N 2
sic rend=font(italics) corr=june 2
emph rend=italics 1
head rend=hi type=sub 1
hi rend=font(fscaps) 1
hi rend=font(gothic) 1
l rend= part=N 1
l rend=ident(4) part=N 1
l rend=idnent(2) part=N 1
l rend=in dent(1) part=N 1
l rend=inden(1) part=N 1
l rend=indend(4) part=N 1
l rend=indenmt(1) part=N 1
l rend=indent (2) part=N 1
l rend=indent part=N 1
l rend=indent(2) part=i 1
l rend=indent(3 part=N 1
l rend=indent part=N 1
l rend=indent=(2) part=N 1
l rend=index(1) part=N 1
l rend=intend=(3) part=N 1
l rend=intent(2) part=N 1
l rend=intent(3) part=N 1
l rend=intent=(3) part=N 1
milestone rend= unit=typography 1
orig rend=font(italic) reg=prescribed 1
orig rend=font(italics) reg=address 1
orig rend=font(italics) reg=americanized 1
orig rend=font(italics) reg=bambino 1
orig rend=font(italics) reg=circumstances 1
orig rend=font(italics) reg=continually 1
orig rend=font(italics) reg=darkness 1
orig rend=font(italics) reg=diversities 1
orig rend=font(italics) reg=entirely 1
orig rend=font(italics) reg=express 1
orig rend=font(italics) reg=extravagant 1
orig rend=font(italics) reg=gentlemen 1
orig rend=font(italics) reg=grateful 1
orig rend=font(italics) reg=infants 1
orig rend=font(italics) reg=married 1
orig rend=font(italics) reg=myself 1
orig rend=font(italics) reg=necessaries 1
orig rend=font(italics) reg=overwise 1
orig rend=font(italics) reg=particular 1
orig rend=font(italics) reg=processes 1
orig rend=font(italics) reg=publicly 1
orig rend=font(italics) reg=servants 1
orig rend=font(italics) reg=suppose 1
orig rend=font(italics) reg=surprise 1
orig rend=font(italics) reg=vengeance 1
orig rend=font(italics) reg=worship 1
orig rend=italics reg=alabama 1
orig lang=fra rend=font(italics) reg=boudoir 1
orig lang=fra rend=font(italics) reg=tendresse 1
sic rend=aunt 1
sic rend=font(italics) corr=a la bonne heure 1
sic rend=font(italics) corr=account 1
sic rend=font(italics) corr=au contraire 1
sic rend=font(italics) corr=bernous 1
sic rend=font(italics) corr=blasé 1
sic rend=font(italics) corr=c'est assez 1
sic rend=font(italics) corr=carafe 1
sic rend=font(italics) corr=connaitre 1
sic rend=font(italics) corr=cordon-bleu 1
sic rend=font(italics) corr=could 1
sic rend=font(italics) corr=forever 1
sic rend=font(italics) corr=habitué 1
sic rend=font(italics) corr=hierarchy 1
sic rend=font(italics) corr=insouciance 1
sic rend=font(italics) corr=insouciante 1
sic rend=font(italics) corr=irremediable 1
sic rend=font(italics) corr=les convenances 1
sic rend=font(italics) corr=mariages de convenance 1
sic rend=font(italics) corr=necessities 1
sic rend=font(italics) corr=negligé 1
sic rend=font(italics) corr=obsessed 1
sic rend=font(italics) corr=qu'elle 1
sic rend=font(italics) corr=spell-bound 1
sic rend=font(italics) corr=we 1
titlePart rend=font(gothic) type=sub 1
Rend attributes found in NCF
hi rend=i(1) 64480
hi rend=sc(1) 17036 (most of these cases are emphatic renderings of the first letter in a new chapter and should be got rid of)
l part=N rend=indent(1) 3972
p rend=align(r) 3101 (most of these are signatures in letters and similar things)
hi rend=small(1) 960 (this is used a lot for small print in letters and is typographically rather than structurally coded)
p rend=align(c) 915 (this is also used mainly in openers and closers of letters)
hi rend=sup(1) 830 (this can be replaced by <sup>)
l part=N rend=align(r) 545
l part=N rend=indent(2) 204
label rend=speaker 133 (changed)
l part=N rend=align(c) 128
hi rend=sub(2) 125 (An odd bundle of uses. Often as the styling of a head element. It may be best to remove or ignore it altogether)
hi rend=b(1) 94
hi rend=sup(2) 48 (this occurs in seven files and is no different from sup(1) in its use)
l part=N rend=indent(3) 22
closer rend=align(r) 11
closer rend=align(c) 10
hi rend=i(2) 9
head rend=speaker 7 (changed)
hi rend=roman(2) 7 (roman)
hi rend=sc(2) 7
hi rend=sub(1) 6
cell cols=80 n=2 rend=indent(110) role=data rows=1 5
cell cols=80 n=5 rend=indent(395) role=data rows=1 5
cell cols=100 n=2 rend=indent(130) role=data rows=1 5
cell cols=100 n=3 rend=indent(245) role=data rows=1 5
cell cols=100 n=4 rend=indent(360) role=data rows=1 5
cell cols=50 n=2 rend=indent(180) role=data rows=1 4
cell cols=50 n=3 rend=indent(245) role=data rows=1 4
cell cols=50 n=4 rend=indent(310) role=data rows=1 4
cell cols=50 n=5 rend=indent(375) role=data rows=1 4
cell cols=50 n=6 rend=indent(440) role=data rows=1 4
cell cols=50 n=7 rend=indent(505) role=data rows=1 4
cell cols=50 n=8 rend=indent(570) role=data rows=1 4
cell cols=80 n=1 rend=indent(15) role=data rows=1 4
cell cols=80 n=3 rend=indent(205) role=data rows=1 4
cell cols=80 n=4 rend=indent(300) role=data rows=1 4
cell cols=80 n=6 rend=indent(490) role=data rows=1 4
cell cols=100 n=1 rend=indent(15) role=data rows=1 4
cell cols=150 n=1 rend=indent(15) role=data rows=1 4
hi rend=italics 4
cell cols=100 n=5 rend=indent(475) role=data rows=1 3
head rend=caption - pb 3
quote rend=align(c) 3 (changed)
cell cols=100 n=5 rend=indent(475) role=data rows=1 2
head rend=align(c) 2
label rend=caption - pb 2 (changed to head)
opener rend=align(c) 2
cell cols=80 n=1 rend=indent(15) role=data rows=1 1
cell cols=80 n=3 rend=indent(205) role=data rows=1 1
cell cols=80 n=4 rend=indent(300) role=data rows=1 1
cell cols=80 n=6 rend=indent(490) role=data rows=1 1
cell cols=100 n=1 rend=indent(15) role=data rows=1 1
head rend=align(r) 1
hi rend=roman(1) 1 (changed)
hi rend=small(2) 1
l part=N rend=i(1) 1
l part=N rend=indent(5) 1
opener rend=align(r) 1
p rend=indent(1) 1
quote rend=caption - div 1
quote rend=i(1) 1 (changed)
salute rend=align(r) 1
signed rend=align(c) 1
Line and linegroup elements found in 1,000 Wright texts
l part=N TEIform=l 79851
l rend=indent(1) part=N TEIform=l 25475
lg type=quotation part=N TEIform=lg 9705
lg type=stanza part=N TEIform=lg 9593
lb TEIform=lb/ 2160
l rend=indent(2) part=N TEIform=l 1929
l rend=indent(3) part=N TEIform=l 869
lg type=para part=N TEIform=lg 785
l rend=indent(4) part=N TEIform=l 455
l rend=indent(l) part=N TEIform=l 218
l part=f TEIform=l 157
lg type=speech part=N TEIform=lg 118
l part=i TEIform=l 90
l rend=intend(1) part=N TEIform=l 19
l rend=indent (1) part=N TEIform=l 15
l rend=indend(1) part=N TEIform=l 14
l rend=intent(1) part=N TEIform=l 14
l rend=indent(1] part=N TEIform=l 12
l rend=indent(5) part=N TEIform=l 12
lg type=sub part=N TEIform=lg 11
l rend=ident(2) part=N TEIform=l 8
lg type=chapter part=N TEIform=lg 7
l rend=indent part=N TEIform=l 6
l rend=(1) part=N TEIform=l 4
l rend=ident(1) part=N TEIform=l 4
l rend=intend(3) part=N TEIform=l 4
lg lang=lat type=quotation part=N TEIform=lg 4
l rend=indent(1 part=N TEIform=l 3
l rend=indent=(1) part=N TEIform=l 3
l rend=intend=(2) part=N TEIform=l 3
lg type=poem part=N TEIform=lg 3
lg type=quotatoin part=N TEIform=lg 3
l part=m TEIform=l 2
l rend=indent(1)= part=N TEIform=l 2
l rend=indent(6) part=N TEIform=l 2
l rend=indetn(2) part=N TEIform=l 2
l rend=intend(2) part=N TEIform=l 2
l rend=intend=(1) part=N TEIform=l 2
lg type=section part=N TEIform=lg 2
l n=1 part=N TEIform=l 1
l rend= part=N TEIform=l 1
l rend=ident(4) part=N TEIform=l 1
l rend=idnent(2) part=N TEIform=l 1
l rend=in dent(1) part=N TEIform=l 1
l rend=inden(1) part=N TEIform=l 1
l rend=indend(4) part=N TEIform=l 1
l rend=indenmt(1) part=N TEIform=l 1
l rend=indent (2) part=N TEIform=l 1
l rend=indent part=N TEIform=l 1
l rend=indent(2) part=i TEIform=l 1
l rend=indent(3 part=N TEIform=l 1
l rend=indent part=N TEIform=l 1
l rend=indent=(2) part=N TEIform=l 1
l rend=index(1) part=N TEIform=l 1
l rend=intend=(3) part=N TEIform=l 1
l rend=intent(2) part=N TEIform=l 1
l rend=intent(3) part=N TEIform=l 1
l rend=intent=(3) part=N TEIform=l 1
lg type=epigraph part=N TEIform=lg 1
lg type=paragraph part=N TEIform=lg 1
lg type=parar part=N TEIform=lg 1
lg type=stana part=N TEIform=lg 1
lg type=stanz part=N TEIform=lg 1
Line elements in NCF
l part=N 18186
l part=N rend=indent(1) 3972
l part=N rend=align(r) 545
l part=N rend=indent(2) 204
l part=N rend=align(c) 128
l part=N rend=indent(3) 22
l part=N rend=i(1) 1
l part=N rend=indent(5) 1