|
MONK : 6-28-07 Minutes
This page last changed on Jun 28, 2007 by mkirschenbaum@gmail.com.
PRESENT: Matt K., Greg, Tanya, Matt B., Stan, James, Stefan, John U.
DISCUSSION
Thoughts on Google / OCA:
Could untrained classification strategies be used to clean up OCR scans in the first place?
Workflow?
People would be able to withdraw materials from OCA and load them as a corpus
How many books are they producing?
What kinds of projects can we take on using dirty books vs. clean ones?
Data not at a place where we could use it for serious analytics
Operations to perform on large-scale corpora:
If we can begin articulating specific scenarios w/ Google/OCA, that's how we can open up avenues toward those resources
Annotation:
Annotation for FeatureLens, or moving toward a more abstract/generalizable model for the project as a whole
John talked to Roy Rosenzweig and Dan Cohen about Zotero, possibly moving the two projects together, shared project management
Tim Cole
Key Questions:
Are we annotating texts or states? What's the object of annotation?
Matt B.: "we're going to do history at the state level, why not do annotation at state level too?"
ManyEyes
User registration for MONK
Multiple layers:
Distributed proofreading
FeatureLens text-analysis environment, sub windows, views (like Eclipse) for interface
Stefan mentioned idea for a Web-based implementation of Eclipse
ACTION ITEMS
John U. will push SuperCell about availability of a common data model.
Catherine will post requirements for FeatureLens annotation to list.
Matt will contact Loretta about ManyEyes/UIMA.
Matt B. will recommend some places to start for doing deep background reading on how state/history are conceptualized in software architectures.
James will send information about user registration system in Nora. |
| Document generated by Confluence on Apr 19, 2009 15:04 |