MONK : July 23 conference call
This page last changed on Jul 23, 2007 by mkirschenbaum@gmail.com.
Workflow application: close to being done; by Monday there should be something people can use. It uses jBPM and Spring and will cover upload and pre-processing into noraDB (Lucene plus eXist). The next step (a week or two from now) will be to add MorphAdorner as an alternative to OpenNLP; this step needs Phil's documentation for MorphAdorner. The step after that will be for John Norstad to work with Amit to add workflow pre-processing for WordHoard's relational database (end of first week of August).

Single sign-on, distributed resources, local servers, etc.: Bill and Amit will work out mirroring or single sign-on, most likely by centralizing shared resources on monk.lis and making sure all MONK participants have log-ons that don't expire. ProQuest materials are covered for this purpose: MONK participants at any institution can use these texts for MONK purposes.

Analytics: work is ongoing on the Wright fiction archive and on an XSLT translator from tei-xml to monk-xml.

Interface: Stan has his folks working as much as possible; more money may be required; some help on the proxy layer is likely forthcoming from SEASR. Recent work has included work on understanding state and on Featurelens (additional data pre-processing beyond what is supported in MONK). Social network analysis might be another example of additional pre-processing that use cases could generate. The graph work that Steve has been doing is similar: you grab some data, pull it into some structure, and use it with some tool. In this case, the data goes into a GraphML file, a secondary representation.

So, what's the process for managing the tension between experimentation and generalization? Experimentation driven by use cases, with the least possible programming, is fine; once an experiment produces any interesting results, it's time to review the process it represents to see if it is a candidate for generalization.
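To make the idea of a GraphML secondary representation concrete, here is a minimal sketch of writing entity links to GraphML with Python's standard library. It is not the code used in Steve's experiments; the function name, the "label" key, and the sample pairs are all hypothetical, and it assumes the links have already been extracted as simple (entity, entity) pairs.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def build_entity_graphml(links):
    """Build a GraphML tree from (entity, entity) pairs (hypothetical helper)."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(q("graphml"))
    # Declare a string attribute so each node can carry a readable label.
    ET.SubElement(root, q("key"), {
        "id": "label", "for": "node",
        "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, q("graph"),
                          {"id": "entities", "edgedefault": "undirected"})

    # One node per distinct entity, in order of first appearance.
    node_ids = {}
    for pair in links:
        for name in pair:
            if name not in node_ids:
                node_ids[name] = f"n{len(node_ids)}"
                node = ET.SubElement(graph, q("node"), {"id": node_ids[name]})
                ET.SubElement(node, q("data"), {"key": "label"}).text = name

    # One edge per link between two entities.
    for source, target in links:
        ET.SubElement(graph, q("edge"),
                      {"source": node_ids[source], "target": node_ids[target]})
    return ET.ElementTree(root)


# Hypothetical sample data, not drawn from the MONK collections.
sample_links = [("Person A", "Place X"), ("Person B", "Place X")]
tree = build_entity_graphml(sample_links)
graphml_text = ET.tostring(tree.getroot(), encoding="unicode")
```

The resulting text can be saved to a file and opened in any tool that reads GraphML, which is the "use with some tool" step described above.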
If it is to be generalized, the process of producing the secondary representation needs to be added to the workflow pre-processing, a proxy-level access for that representation needs to be defined, an itinerary for analysis of this representation needs to be produced, and an interface for visualizing the results needs to be created or added to an existing interface. Proof of concept moves to proof positive when it can be generalized across multiple works.

Next candidates for generalization, based on experiments already done in Featurelens and in Steve's experiments with Civil War texts: named entities (personal names, place names) and graphs of links between entities.

Talk about next time:
Document generated by Confluence on Apr 19, 2009 15:05