This page last changed on Feb 20, 2008 by martinmueller@northwestern.edu.

This was prepared in response to Sara Steger's request for the cells' responsibility/activity statements. I floated the following description by the data cell. Martin improved the description of the first item, about 'TEI simple'. The other items seem to have met with approval from the data cell. If you need less, more, or something different, let us know.

  • Bill

The Data cell is responsible for standards selection, morphological annotation, ingest workflow, data and text access, data store implementation, and analytics implementation. The Data cell works with the other cells to form specific data access and analytic requirements. As of November 2007 we have:

Defined a tightened TEI schema called TEI simple. This is intended to help with MONK but should also provide a useful model for migrating collections maintained in TEI P4 under Library Level 4 Guidelines. Many library collections fall in this group.
Defined a part-of-speech tagset called NUPOS
Developed a trainable part-of-speech tagger / lemmatizer / spelling regularizer called MorphAdorner
Developed a richly parameterized data access implementation called MonkDataAccess
Processed an initial collection of 250 novels from the Chadwyck-Healey Nineteenth Century Fiction collection and ingested its data into a provisional data store and Fedora repository.
Leveraged existing D2K analytic algorithms and Nora interface specifications to perform early analytics on the NCF collection.

We are currently preparing the Wright American Fiction archive and parts of the Text Creation Partnership's Early English Books Online collection for MorphAdorner tagging and MONK ingest.

Plans for early 2008 include:
Development of the MONK data store
Review, definition, and implementation of Interface components' data access interface
Review, definition, and implementation of D2K components' data access interface
Review, definition, and implementation of users' data access interface
Development of any D2K / SEASR analytic components required by Analytics cell

Document generated by Confluence on Apr 19, 2009 15:04