This page last changed on Feb 23, 2008 by martinmueller@northwestern.edu.

At the end of this memo there is a crude grouping of the elements in
TEI-A. I count 124. The final list may be a little shorter or longer,
but I will be very surprised if additions or deletions make much
difference to that list.

The P5 documentation at http://www.tei-c.org/Guidelines/P5/ is very
full and very good, allowing both for discursive introductions and
look-ups by individual elements or the 'modules' into which the
elements have been grouped. If one knows the rough number of elements
in play, this is the place to get a sense of how elements interact
hierarchically. There is a lot of parameterization in the TEI, which is
very difficult for ordinary users but probably very helpful to programmers.

While the number of elements in TEI-A will be ~120, the number of
elements that bear on the question of 'mid-level' data is much
smaller, roughly two dozen. The large majority are either elements
used only in the header or inline elements of a bibliographical,
editorial, or typographical kind.

The elements that contain the text blocks potentially of interest to
analytical operations are usefully divided into the levels
1) above the div, 2) at the div, 3) below the div. It is below the div
where the shoe pinches. The following remarks are not intended to
solve problems but to point to the problems that need to be solved
first. I count 23 children of div, not all of them of equal status,
and some better behaved than others.

The two most important direct children of div are <p> and <l>. They do
the heavy lifting for prose and verse and are mixed-content elements in
that they can contain both PCDATA and elements.

Three other children, <q>, <quote>, and <said>, may be children of
<div> or children of <p>. They are also mixed-content elements.

The three children <lg>, <sp>, and <floatingText> cannot contain
PCDATA, but only element children. <floatingText> differs from the other
two in that its children come from the level above the div.
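
The difference between mixed-content and element-only children can be
checked mechanically. What follows is a minimal sketch using Python's
standard xml.etree.ElementTree. The sample fragment and the function name
has_character_data are illustrative assumptions, not part of TEI-A, and
the fragment omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

def has_character_data(elem):
    """True if elem directly holds non-whitespace text (children's own text excluded)."""
    # ElementTree stores text before the first child in elem.text and the
    # text following each child in that child's .tail.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(chunk and chunk.strip() for chunk in chunks)

# Hypothetical fragment, written without the TEI namespace.
SAMPLE = """<div>
<p>Prose with <hi>emphasis</hi> inside.</p>
<lg>
  <l>A line of verse</l>
  <l>Another line</l>
</lg>
</div>"""

div = ET.fromstring(SAMPLE)
report = {child.tag: has_character_data(child) for child in div}
print(report)
```

In this fragment <p> reports character data of its own, while <lg> holds
only whitespace between its <l> children.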

My hunch is that the question of mid-level data is first and foremost a
question of what to do with these eight children of <div>. They are
an unruly and devious lot. A <p> element cannot contain a <p>
directly, but it can contain a <quote> or <q> element that may contain
a <p> element. It may also contain a <floatingText> element, so that
<body>, <div>, and <p> elements may be grandchildren or
great-grandchildren of a <p> element. These cases may not be very
common, but they do happen.
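
To see how often such indirect nesting occurs in a given file, one could
walk the tree and report every <p> that has another <p> among its
ancestors. The sketch below assumes Python's xml.etree.ElementTree. The
sample fragment and the function names are hypothetical and only
illustrate the two cases described above.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def nested_paragraph_paths(root):
    """Return the ancestor path of every <p> that sits below another <p>."""
    found = []

    def walk(elem, path):
        name = local_name(elem.tag)
        here = path + [name]
        # 'path' holds the ancestors only, so this tests for an enclosing <p>.
        if name == "p" and "p" in path:
            found.append("/".join(here))
        for child in elem:
            walk(child, here)

    walk(root, [])
    return found

# Hypothetical fragment: one <p> inside a <quote>, one inside a <floatingText>.
SAMPLE = (
    "<div>"
    "<p>He wrote: <quote><p>Quoted prose.</p></quote></p>"
    '<p>She enclosed a letter: <floatingText type="letter"><body><div>'
    "<p>Dear Sir,</p></div></body></floatingText></p>"
    "<p>Plain paragraph.</p>"
    "</div>"
)

paths = nested_paragraph_paths(ET.fromstring(SAMPLE))
for path in paths:
    print(path)
```

The walk keeps its own ancestor list because ElementTree elements carry no
parent pointer.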

The <floatingText> element in particular will require attention. It
will in most cases appear as a direct child of a <div>, but its
immediate children will be above the <div> level. Most instances of
<floatingText> will be of the type="letter" kind.
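
A quick inventory of <floatingText> elements, with their type attribute
and the names of their immediate children, would show how far a corpus
follows this pattern. The following is a sketch under the assumption that
the files are read with Python's xml.etree.ElementTree. The fragment and
the function name floating_texts are made up for illustration.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def floating_texts(root):
    """Return (type attribute, immediate child names) for each <floatingText>."""
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) == "floatingText":
            rows.append((elem.get("type"), [local_name(c.tag) for c in elem]))
    return rows

# Hypothetical fragment: a letter embedded directly in a <div>.
SAMPLE = """<div>
<p>The next morning this letter arrived:</p>
<floatingText type="letter">
  <body>
    <div><opener><salute>Dear Sir,</salute></opener><p>I write in haste.</p></div>
  </body>
</floatingText>
</div>"""

rows = floating_texts(ET.fromstring(SAMPLE))
print(rows)
```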

Two other non-identical twin children of <div> are <stage> and <note>.
Both of them always contain text that is in some way secondary. A
stage direction is a kind of note. It is in the nature of these
children that they pop up at different levels of the text hierarchy.
But they can probably be treated pretty much in the same way.

I count 13 div children that may appear only at the beginning or end of
divs. The nice thing about these divTop, divBottom, and divWrapper
children is that by and large they know their place.
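
The grouping of div children described so far can be written down as a
small lookup. This is only a sketch: the group labels "container" and
"secondary" and the function name classify_div_children are assumptions
made here, while the element names are taken from the list at the end of
this memo. Anything else, including a nested div, is reported as "other".

```python
import xml.etree.ElementTree as ET

# Element names as grouped in this memo; the labels are illustrative.
GROUPS = {
    "divTop": {"head", "opener", "salute"},
    "divBottom": {"closer", "postscript", "signed", "trailer"},
    "divWrapper": {"argument", "byline", "dateline", "docAuthor",
                   "docDate", "epigraph"},
    "container": {"p", "l", "q", "quote", "said", "lg", "sp", "floatingText"},
    "secondary": {"stage", "note"},
}

def local_name(tag):
    """Strip a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def classify_div_children(div):
    """Return (element name, group label) for each direct child of a div."""
    pairs = []
    for child in div:
        name = local_name(child.tag)
        label = next((g for g, names in GROUPS.items() if name in names), "other")
        pairs.append((name, label))
    return pairs

# Hypothetical fragment, written without the TEI namespace.
SAMPLE = ("<div><head>Scene 1</head><p>Some prose.</p>"
          "<stage>Exit.</stage><trailer>Finis</trailer></div>")

pairs = classify_div_children(ET.fromstring(SAMPLE))
total = sum(len(names) for names in GROUPS.values())
print(pairs)
print(total)
```

The five sets together hold the 23 children of div counted above.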

In practice, then, the question of mid-level data is mostly a question about
the relationship of roughly two dozen elements in the TEI-A files to
their representation in the data store and to the kinds of things that
users might want to do with them via analytics of various kinds. Two
dozen children is quite a few children.

There are also lists and tables. These are not going to be very
common, and in most cases the <item> or <cell> elements will contain
simple text. While the TEI content model for both these elements is
quite complex, I doubt whether any of these complexities show up in
the texts we have to deal with.

Below is the provisional list of elements for TEI-A, as I understand
it from conversations with Brian and Steve:

DEFAULT TEXT STRUCTURE: (8)

teiCorpus, TEI, teiHeader, text
front, body, back, group

ELEMENTS THAT APPEAR ONLY IN THE HEADER: (24)

encodingDesc, fileDesc, sourceDesc, editorialDecl, extent, idno,
keywords, langUsage, language, notesStmt, principal, profileDesc,
projectDesc, pubPlace, publicationStmt, publisher, resp, respStmt,
revisionDesc, series, seriesStmt, taxonomy, textClass, titleStmt

DIVISIONS OF THE BODY: (1)

div

ELEMENTS AT THE BEGINNING AND END OF DIVS: (13)

model.divTopPart or elements that can appear only at the beginning of
a div:
head, opener, salute
model.divBottomPart or elements that can appear at the end of a div:
closer, postscript, signed, trailer
model.divWrapper or elements that can appear at the beginning or end
of a div:
argument, byline, dateline, docAuthor, docDate, epigraph

STRUCTURAL CONTAINERS BELOW THE DIV LEVEL: (9)

l, lg, note, p, q, quote, said, sp, floatingText

"INLINE" ELEMENTS OF VARIOUS KINDS: (38)

abbr, add, addrLine, address, date, email, emph, foreign, hi, name,
num, sub*, sup*, term

Elements that have to do with textual variants or errors:

choice, corr, gap, add, orig, reg, sic, unclear

Elements of a bibliographical kind:

bibl, cit, author, editor, imprint, imprimatur, docAuthor, docDate,
docEdition, docImprint, docTitle, edition, editor, title, titlePage,
titlePart

LISTS: (3)

list, item, label

EMPTY ELEMENTS MARKING SOME DIVISION THAT IS NOT 'CONTAINERIZED': (4)

lb, milestone, pb, sb*

SIMPLE ANALYTIC ELEMENTS: (3)

c, s, w

PERFORMANCE TEXTS: (9)

castGroup, castList, castItem, role, roleDesc, epilogue, prologue,
stage, speaker

TABLES, FORMULAE, AND GRAPHICS: (6)

figure, figDesc
formula
table, row, cell

LINKING, SEGMENTATION, AND ALIGNMENT: (6)

ab, link, ptr, seg, ref, rs
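
As a check on the arithmetic, the per-heading counts given in parentheses
above can be added up. The sketch below simply copies those counts; the
short keys are labels chosen here for convenience. The tally follows the
headings as written, so a name listed under more than one heading is
counted each time it appears.

```python
# Counts copied from the parenthesized figures after each heading.
HEADING_COUNTS = {
    "default text structure": 8,
    "header only": 24,
    "divisions of the body": 1,
    "beginning and end of divs": 13,
    "structural containers below the div": 9,
    "inline": 38,
    "lists": 3,
    "empty elements": 4,
    "simple analytic": 3,
    "performance texts": 9,
    "tables, formulae, graphics": 6,
    "linking, segmentation, alignment": 6,
}

total = sum(HEADING_COUNTS.values())
print(total)
```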

Document generated by Confluence on Apr 19, 2009 15:04