|
MONK : Conference call, 2007 Oct. 30, Data
This page last changed on Feb 23, 2008 by martinmueller@northwestern.edu.
Present: Bill Parod (chair, secretary), Phil Burns (Pib), Amit Kumar, Vered Goren, Martin Mueller, Sara Steger, Brian Pytlik Zillig, Tim Cole

The Data Cell welcomes Brian Pytlik Zillig to the cell.

Agenda Item: Sara has four sets of chapter-level work sets with the following classifications:

Amit will meet with Vered to discuss selecting and running an appropriate D2K itinerary that uses these sets to rank all NCF chapters for sentimentality. Bill provided a D2K InputModule for data access and sparse matrix creation, which Vered has reviewed and finds straightforward. Vered suggests we use itineraries 305 309.

Agenda Item: Brian has been busy updating the teisimple content model for <w>. Brian asked about <orig> handling. Pib has a fix for the <orig> problems in Wright. The MONK wiki has a memo from Pib on how MA (MorphAdorner) handles split tokens in general, and <orig> handling conforms to this. See MorphAdorner XML Output; see also Handling orig tags in Wright texts (archive).

Tim Cole asked how we verify the results of adornment. Martin described his verification process, which uses Microsoft Access. In addition to XML output, MA can provide tabular output with KWIC (keyword in context). Martin uses group and sort routines, as well as sampling of 10k or so words, to check results. From there he can usually extrapolate whether there are problems or the results are good.

We expect to do approximately 300 Wright texts taken from the first 3 years, the last 3 years, and the (Civil) War years.

Martin: Wright conversion is the first process of monkification where we use all the tools that we are likely to have. We should take special note of this, as it informs process in the future.

Tim: How important is the 'collection' model for processing? What about 'loose' individual texts? This is also a relevant processing scenario for us. |