MONK : Conference call, 2008 April 11
This page last changed on Apr 12, 2008 by martinmueller@northwestern.edu.
Present: Amit, Brian, Duane, John N., Loretta, Martin, Sara, Tanya

Brian reported on the progress of Abbot. Of 695 TCP-EEBO texts (~60 million words), all but 67 now parse under TEI-Analytics.

Additional post-meeting comments by Martin: We will work with the ~630 texts that currently parse. As for the remaining texts, the majority will parse once certain adjustments have been made to the TEI-Analytics schema, notably the content models for <sp> and <postscript>. We want these adjustments to be friendly amendments to TEI and are still working on that process. We can now say that all the texts currently envisaged for inclusion in MONK I exist in TEI-A format. They include

It would be trivial to include selections from Early American fiction or DocSouth if there are use cases that require them. These collections will not pose new parsing problems.

Amit discussed visualization routines for the results that might work with Meandre and with the MONK interface.

Duane and Loretta reported on the porting of text analytics from D2K into SEASR. These fall under the three broad headings of classification, clustering, and information extraction.

Loretta and Tanya reported on some experiments with extracting named entities and associating them with other entities or with groups of words from particular semantic fields (e.g. color).

Loretta will talk about MONK and FeatureLens at the forthcoming ICDM data mining conference in Atlanta.

We discussed a possible trip by Amit and Steve or Brian to Evanston in May to tackle workflow problems.
Document generated by Confluence on Apr 19, 2009 15:04