This page last changed on Mar 21, 2008 by martinmueller@northwestern.edu.

Present: Brian, John N., Martin, Phil, Sara, Steve

Brian reported that the conversion of TCP texts to TEI-A is almost complete: 77 of the 84 witchcraft texts and 90% of some 270 drama texts parse. (Texts that don't parse may require some modification of the current TEI-A scheme, which does not yet accommodate some very sensible extensions of TEI P4 made by the TCP DTD.) Clean-up operations involve:

  • treatment of some gap elements
  • incorporating Steve's 'superchanger' script for fixing the SGML way of marking superscripted characters
  • converting the tilde character to a combining macron
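
The tilde conversion in the last item could be sketched roughly as follows. This is a minimal illustration, not Brian's actual clean-up code: it assumes that in the TCP transcriptions a tilde directly follows the letter it modifies, and the function name tilde_to_macron is hypothetical.

```python
COMBINING_MACRON = "\u0304"  # U+0304 COMBINING MACRON


def tilde_to_macron(text: str) -> str:
    """Replace each '~' that directly follows a letter with U+0304.

    Assumption (not confirmed by the notes): the tilde is written
    after the letter it modifies. Tildes that do not follow a letter
    are left untouched so they can be reviewed by hand.
    """
    out = []
    for ch in text:
        if ch == "~" and out and out[-1].isalpha():
            out.append(COMBINING_MACRON)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # The macron is rendered over the preceding 'o'.
    print(tilde_to_macron("co~mon"))
```

Leaving stray tildes in place, rather than deleting them, keeps any cases that do not fit the assumption visible for manual checking.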

Sara reported that she has been able to work with the workbench interface (no data), and with a Meandre component (no interface). Current results on NB still have very low confidence thresholds.

Phil reported that he is working on

  • marking sentence boundaries with milestone elements that use unit="startSentence" and unit="endSentence" attributes
  • making Java play nicely with RELAX NG schemas, which
  • developing a "pseudo-page" pagination system that will use start and end milestone elements, counting words and creating page numbers with quintiles for reader orientation. He hopes that these pseudo-pages can in most cases respect div boundaries.
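
The word-counting idea behind the pseudo-pages can be illustrated with a small sketch. Everything specific here is an assumption rather than Phil's design: the page length of 300 words, the reading of "quintile" as the fifth of the pseudo-page in which a word falls, and the function name locate. The sketch also ignores div boundaries, which the real system hopes to respect.

```python
from typing import Tuple


def locate(word_index: int, words_per_page: int = 300) -> Tuple[int, int]:
    """Map a zero-based word position to (pseudo-page, quintile).

    Assumptions (hypothetical, not from the notes): pages have a fixed
    length of `words_per_page` words, and the quintile (1 to 5) is the
    fifth of the page in which the word falls.
    """
    if word_index < 0 or words_per_page <= 0:
        raise ValueError("word_index must be >= 0 and words_per_page > 0")
    page = word_index // words_per_page + 1      # pages are numbered from 1
    offset = word_index % words_per_page         # position within the page
    quintile = offset * 5 // words_per_page + 1  # 1..5
    return page, quintile


if __name__ == "__main__":
    for i in (0, 299, 300, 450):
        print(i, locate(i))
```

Respecting div boundaries would mean ending a pseudo-page early when a div closes, so actual page lengths would vary around the target word count.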