|
MONK : January 21 conference call
This page last changed on Jan 21, 2008 by unsworth.
Supercell call January 21

Bill Parod will be leaving AT and MONK in about three weeks' time to join the library and work on Fedora repositories.

MONK institutions that are also Fedora institutions: Maryland (Susan Schreibman, who is leaving for Ireland and might not be replaced immediately). Shared semantics for collection access and analysis would be more important to interoperability than shared deployment of Fedora.

Milestones, January-March:

1. Wright, NCF, EEBO in TEI-A: Witchcraft text from EEBO has been run through the process and looks pretty good. About 80 of the 500 texts are witchcraft texts; we have a user for them, and they are typical. Wright is pretty well done, and in general we're on track.

2. TEI-A ingest routine designed for Monk Datastore: current agreement is to model the process with shell scripts and, when it is stable, re-instantiate it in a way that is easier for curators and end-users.

3. Wright, NCF, EEBO ingested into Monk Datastore: trivial for Wright; NCF is done in TEI-Simple and needs to be re-run with TEI-A; EEBO is non-trivial. Note that if we will have multiple representations we will need persistent IDs, so putting Fedora up front in the ingest process probably makes sense.

4. SEASR Web Services: now working.

5. SEASR clustering components: Weka clustering components available.

6. SEASR Place-name extraction, limited by region: PEAR integration would make the named-entity extraction that exists in UIMA available to SEASR. Phil Burns has some input to offer here too: Morphadorner offers name tags, and there might be value in working on a second pass that would look for name phrases. Bring this up in MONK/SEASR meeting: when do the names become visible in the interface? Can they be fixed or tinkered with? Personal names and place names are very hard to tell apart.

7. SEASR Map-mashup component: not available; may be something that they work out for a Mellon meeting.

8. Interface integration with new Proxy and Datastore: this is progressing. Andrew has a hybrid workbench that uses both nora and monk calls and is moving nora calls into monk. There are also new, monk-specific calls. Should be up to date with known requirements by the hackfest. [Work is progressing generally on interface design, based on UMD all-hands conversations; conversations with Fluid Project are ongoing, and we should contribute components in order to get uptake and community.]

9. Ability to ship visualizations to ManyEyes via URL/API: no news yet; Matt will follow up. JMU will also bring this up in conversations with SEASR. Word-Cloud, Scatterplot, Bubble Chart, for starters. Piotr has also been thinking about text visualizations, and this is what he's working on for his Ph.D. at Reading.

10. Ability to export MONK data to CSV: this has implications for interface, collaboration, analytics, etc. Amit might be the best positioned to suggest where this CSV format should be created: early in data exploration, as an internal format, at the end of a process, or at the interface level. JMU will raise the question in SEASR meeting.

Cells, chairs, etc.: Collapse Data and Analytics, collapse Users/Interface/Collaboration? Seems like a good idea for efficiency and to limit the number of conference calls. Data Cell call tomorrow will work out the new chair.

Hackfest: 15 coming in for the 7th, leaving the 10th. JMU has made arrangements. Travel should be billed to the home institution if funding permits, referred to UIUC if not. JMU will not be there; Martin will come on Sunday for debriefing.

End-of-year accounting will be requested from the sites soon. |
| Document generated by Confluence on Apr 19, 2009 15:05 |