This page last changed on Feb 23, 2008 by martinmueller@northwestern.edu.

Martin points out that "In the checked files of the Wright Archive there are some 21,000 instances of the following tagging:

<p TEIform="p">"Neither have I done so," cried I, in a <orig reg="towering" TEIform="orig">tower-</orig>
   <pb TEIform="pb"/> ing passion.

"It is pretty clear what is going on here with a word break at a page boundary. It is less clear whether this is a good way of tagging the phenomenon, since it leaves the second part of the split word as if it were an independent token."

To handle this, the MorphAdorner XML processor includes an XML fixer similar to the one used for fixing <gap> elements in EEBO texts. The new fixer works on <orig> elements instead. A word broken by a page break following an <orig> tag is now treated like any other word split by a soft tag.

Here is a short sample text consisting of two sentences lifted from Moby Dick. In the first sample sentence, the trailing hyphen is both part of the word and a word continuation marker. In the second sample sentence, the trailing hyphen is not part of the original word.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI.2
  PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "http://ariadne.northwestern.edu/monk/dtds/teixlite.dtd">
<TEI.2 id="wright2-1701">
   <teiHeader>
   </teiHeader>
   <text TEIform="text">
      <body TEIform="body">
         <div1 type="chapter" node="wright2-1701:9" part="N" TEIform="div1">
            <p TEIform="p">He wears a beaver hat and
            <orig reg="swallow-tailed" TEIform="orig">swallow-</orig>
               <pb TEIform="pb"/> tailed coat, girdled with a sailor-belt
            and sheath-knife.</p>
         </div1>
         <div1 type="chapter" node="wright2-1701:18" part="N" TEIform="div1">
            <p TEIform="p">But the directions he had given us about keeping
            a yellow warehouse on our starboard hand till we opened a white
            church to the larboard, and then keeping that on the larboard
            hand till we made a corner three points to the starboard, and
            that done, then ask the first man we met where the place was:
            these crooked directions of his very much puzzled us at first,
            especially as, at the outset, Queequeg insisted that the yellow
            warehouse—our first point of departure—must be left on the
            <orig reg="larboard" TEIform="orig">lar-</orig>
               <pb TEIform="pb"/> board hand, whereas I had understood
             Peter Coffin to say it was on the starboard. However, by dint
             of beating about a little in the dark, and now and then
             knocking up a peaceable inhabitant to inquire the way, we at
             last came to something which there was no mistaking.</p>
         </div1>
      </body>
   </text>
</TEI.2>
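
The tagging pattern described above (an <orig> element followed, after whitespace only, by a <pb>) can be located in a source file before adornment. The following is a minimal sketch using Python's standard library, not MorphAdorner's own fixer; the function name find_page_split_words and the SAMPLE fragment are illustrative assumptions, and the sketch simply takes the first whitespace-delimited word after the <pb> as the second half of the split word.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the first sample sentence above.
SAMPLE = """<p TEIform="p">He wears a beaver hat and
<orig reg="swallow-tailed" TEIform="orig">swallow-</orig>
   <pb TEIform="pb"/> tailed coat, girdled with a sailor-belt
and sheath-knife.</p>"""


def find_page_split_words(root):
    """Return (first_part, second_part, reg) for each <orig> directly followed by <pb>."""
    results = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children[:-1]):
            nxt = children[i + 1]
            if child.tag != "orig" or nxt.tag != "pb":
                continue
            # Only whitespace may separate </orig> from <pb/>.
            if (child.tail or "").strip():
                continue
            first = (child.text or "").strip()
            # Assumption: the word continues with the first token after <pb/>.
            tail_words = (nxt.tail or "").split()
            second = tail_words[0] if tail_words else ""
            results.append((first, second, child.get("reg")))
    return results


if __name__ == "__main__":
    print(find_page_split_words(ET.fromstring(SAMPLE)))
```

Run on the fragment, this reports the pair "swallow-" and "tailed" together with the reg value "swallow-tailed", which are the two pieces that need to be treated as one word.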

Here is the MorphAdorner output for the immediate context of the first example.

          <w eos="0" lem="a" pos="dt" reg="a" spe="a" tok="a" xml:id="mobysmall-0050" ord="3" part="N">a</w>
          <c> </c>
          <w eos="0" lem="beaver" pos="n1" reg="beaver" spe="beaver" tok="beaver" xml:id="mobysmall-0060" ord="4" part="N">beaver</w>
          <c> </c>
          <w eos="0" lem="hat" pos="n1" reg="hat" spe="hat" tok="hat" xml:id="mobysmall-0070" ord="5" part="N">hat</w>
          <c> </c>
          <w eos="0" lem="and" pos="cc" reg="and" spe="and" tok="and" xml:id="mobysmall-0080" ord="6" part="N">and</w>
          <orig TEIform="orig" reg="swallow-tailed">
            <c> </c>
            <w eos="0" lem="swallow-tailed" pos="j" reg="swallow-tailed" spe="swallow-tailed" tok="swallow-tailed" xml:id="mobysmall-0090.1" ord="7" part="I">swallow-</w>
          </orig>
          <pb TEIform="pb"></pb>
          <w eos="0" lem="swallow-tailed" pos="j" reg="swallow-tailed" spe="swallow-tailed" tok="swallow-tailed" xml:id="mobysmall-0090.2" ord="7" part="F">tailed</w>
          <c> </c>
          <w eos="0" lem="coat" pos="n1" reg="coat" spe="coat" tok="coat" xml:id="mobysmall-0100" ord="8" part="N">coat</w>
          <w eos="0" lem="," pos="," reg="," spe="," tok="," xml:id="mobysmall-0110" ord="9" part="N">,</w>

Here is the MorphAdorner output for the immediate context of the second example.

          <w eos="0" lem="must" pos="vmb" reg="must" spe="must" tok="must" xml:id="mobysmall-1150" ord="110" part="N">must</w>
          <c> </c>
          <w eos="0" lem="be" pos="vai" reg="be" spe="be" tok="be" xml:id="mobysmall-1160" ord="111" part="N">be</w>
          <c> </c>
          <w eos="0" lem="leave" pos="vvn" reg="left" spe="left" tok="left" xml:id="mobysmall-1170" ord="112" part="N">left</w>
          <c> </c>
          <w eos="0" lem="on" pos="p-acp" reg="on" spe="on" tok="on" xml:id="mobysmall-1180" ord="113" part="N">on</w>
          <c> </c>
          <w eos="0" lem="the" pos="dt" reg="the" spe="the" tok="the" xml:id="mobysmall-1190" ord="114" part="N">the</w>
          <orig TEIform="orig" reg="larboard">
            <c> </c>
            <w eos="0" lem="larboard" pos="av" reg="larboard" spe="larboard" tok="lar-board" xml:id="mobysmall-1200.1" ord="115" part="I">lar-</w>
          </orig>
          <pb TEIform="pb"></pb>
          <w eos="0" lem="larboard" pos="av" reg="larboard" spe="larboard" tok="lar-board" xml:id="mobysmall-1200.2" ord="115" part="F">board</w>
          <c> </c>
          <w eos="0" lem="hand" pos="n1" reg="hand" spe="hand" tok="hand" xml:id="mobysmall-1210" ord="116" part="N">hand</w>
          <w eos="0" lem="," pos="," reg="," spe="," tok="," xml:id="mobysmall-1220" ord="117" part="N">,</w>

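The original <orig> tag and its contents are passed through to the adorned output. The two parts of the split spelling now follow the same general scheme MorphAdorner uses to mark tokens split by soft tags in XML: in the samples above, both parts carry the same ord, lem, reg, and spe values, their xml:id values share a base with the suffixes .1 and .2, and they are marked part="I" and part="F" respectively. See the section "Split tokens" in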

https://apps.lis.uiuc.edu/wiki/display/MONK/MorphAdorner+XML+Output

for a description of how MorphAdorner handles split tokens.
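
A consumer of the adorned output may want to regroup the split parts into one logical word. The following is a minimal sketch, assuming only the conventions visible in the samples above: unsplit words have part="N", and the parts of a split word share an xml:id base followed by a numeric suffix. The function name logical_words and the wrapping <p> element are illustrative assumptions, not part of MorphAdorner.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Fragment modeled on the second sample output, wrapped in a <p> for parsing.
ADORNED = """<p>
<w eos="0" lem="the" pos="dt" reg="the" spe="the" tok="the" xml:id="mobysmall-1190" ord="114" part="N">the</w>
<orig TEIform="orig" reg="larboard">
<c> </c>
<w eos="0" lem="larboard" pos="av" reg="larboard" spe="larboard" tok="lar-board" xml:id="mobysmall-1200.1" ord="115" part="I">lar-</w>
</orig>
<pb TEIform="pb"></pb>
<w eos="0" lem="larboard" pos="av" reg="larboard" spe="larboard" tok="lar-board" xml:id="mobysmall-1200.2" ord="115" part="F">board</w>
<c> </c>
<w eos="0" lem="hand" pos="n1" reg="hand" spe="hand" tok="hand" xml:id="mobysmall-1210" ord="116" part="N">hand</w>
</p>"""


def logical_words(root):
    """Group <w> elements into logical words: (ord, reg, [surface parts])."""
    words = {}  # base xml:id -> entry; dicts keep document order
    for w in root.iter("w"):
        wid = w.get(XML_ID)
        # Assumption: split parts carry ".1", ".2", ... after the shared base id.
        base = wid if w.get("part") == "N" else wid.rsplit(".", 1)[0]
        entry = words.setdefault(
            base, {"ord": int(w.get("ord")), "reg": w.get("reg"), "parts": []}
        )
        entry["parts"].append(w.text)
    return [(e["ord"], e["reg"], e["parts"]) for e in words.values()]


if __name__ == "__main__":
    for word in logical_words(ET.fromstring(ADORNED)):
        print(word)
```

Grouping on the xml:id base rather than on ord alone is a design choice made for this sketch; in the samples both attributes identify the same pairs of parts.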
