This page last changed on Feb 22, 2008 by martinmueller@northwestern.edu.

Meeting Minutes

In attendance: Pib, Loretta, Tanya, Martin, Brian, Steve.

Update on Repetition

Loretta, Tanya, and others involved in this project have the algorithm running. The output is being sent to a domain expert (James), who is reviewing the data. Loretta is now thinking about how to visualize this data.

The existing Nora app serves this domain (and Erotics and Sentimentality) fairly well. Should we therefore conceive of the existing Nora app as a development domain in MONK that handles this general class of problems?

Martin asks: How does this work? Shouldn't we be describing this analytic in general terms (over and above the particular implementation in D2K)?

Loretta will post the details of the method to the Analytics wiki, and will also make the D2K itinerary available.

SOUNDEX/Metasound

These algorithms are trivial to implement, though (says Steve) the path forward on this Use Case is not entirely clear. Question: Does D2K currently implement SOUNDEX and/or Metasound? MorphAdorner undoubtedly will, and may be used to create lexica that include these data points as a matter of course.
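
For reference, a minimal sketch of the classic American Soundex coding in Python. It is an illustration only, not the D2K or MorphAdorner implementation; the function name `soundex` and the handling of non-letter characters (they are simply dropped) are assumptions made for this example.

```python
# Classic American Soundex: first letter plus three digits.
# Illustrative sketch only; not taken from D2K or MorphAdorner.

# Map each coded consonant to its Soundex digit.
CODES = {}
for digit, group in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                     "4": "l", "5": "mn", "6": "r"}.items():
    for ch in group:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code for word ("" if no letters)."""
    # Assumption: anything outside a-z (after lowercasing) is ignored.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""

    first = letters[0]
    result = [first.upper()]
    prev = CODES.get(first, "")  # code of the previous significant letter

    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate same-coded letters.
            continue
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        # A vowel resets prev, so a repeated code after it is kept.
        prev = code

    # Pad short codes with zeros to a fixed length of four.
    return "".join(result).ljust(4, "0")
```

With this sketch, spelling variants such as "Robert" and "Rupert" receive the same code, which is the property that would make the code useful as an extra data point in a lexicon.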

Named-entity Extraction

A couple of the analytics require named-entity extraction.

Pib: The algorithms for named-entity extraction (whether probabilistic or heuristic) are reasonably good, but we are hindered by the lack of good training data for literary texts (particularly a corpus of literary texts that spans a long period of time).

The group agrees that creating such a training set would be not only of critical importance to MONK generally but also a valuable deliverable in its own right.

Document generated by Confluence on Apr 19, 2009 15:04