This page last changed on Feb 23, 2008 by martinmueller@northwestern.edu.

Present - Joe Paris, Martin Mueller, Bill Parod (chair and secretary), Tim Cole, Amit Kumar, Loretta Auvil, Bernie A'cs, Brian Pytlik Zillig, James Chartrand

Sara's sentimentalism analysis progress

  • Amit got results, and they were dismal: the training set was too small. He experimented with lemmas and filtered out stop words.
  • Martin - what is an appropriate size for the training data?
  • Loretta - it depends on what you want to predict; the training data needs to be representative of the data as a whole, and important terms need to be well represented. Normally you have a training set and a testing set. Cross-validation is typically 4-fold or 10-fold: with 4 folds, train on 3 and test on 1, then check the score. Using Weka, which tries NB (Naive Bayes), SVM, ...
  • Martin - perhaps the data is insufficient, or the phenomenon is simply not catchable.
  • Loretta - we might use WordNet to add synonyms, along with stemming and stop-word removal. The data is a mess: when accessing lemmas, we ended up stemming the lemmatized versions. We found dashes and apostrophes at the beginning and end of words, and dashes inside words. The same word appears both joined and hyphenated. Roman numerals and numerical values: do we want to use those?
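
To illustrate the k-fold scheme Loretta describes, here is a minimal Python sketch (not Weka's implementation) that splits a list of documents into k folds and returns one train/test pair per fold. The function name `k_fold_splits` is hypothetical, fold assignment is round-robin and deterministic, and tools such as Weka normally also shuffle and stratify the data by class.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: return k (train, test) pairs for k-fold cross-validation.

    Each fold is used exactly once as the test set while the other k-1 folds
    form the training set (e.g. k=4: train on 3 folds, test on 1).
    """
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Round-robin assignment: item i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for test_index, test in enumerate(folds):
        # Training set = every fold except the current test fold.
        train = [x for j, fold in enumerate(folds) if j != test_index for x in fold]
        splits.append((train, test))
    return splits


# Usage: 8 documents, 4 folds -> each split trains on 6 and tests on 2.
for train, test in k_fold_splits([f"doc{n}" for n in range(8)], 4):
    print(len(train), len(test), test)
```

A classifier would be trained on each `train` list and scored on the matching `test` list; averaging the k scores gives the cross-validated estimate.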

Sara and others will need to review the word list and decide what is needed and what is not.

In the lemmatized version, spelling differences sometimes appeared.

Should apostrophes and dashes be kept?
Martin - these sound like data errors. If they are singletons, do they matter? Loretta - it's hard to know which ones are singletons. Does past tense matter?
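
The cleaning questions above (edge dashes and apostrophes, joined versus hyphenated forms, numbers, singletons) could be explored with a small Python sketch like the following. All function names are hypothetical; the Roman-numeral test is a heuristic that will also catch words such as "MIX", and dropping numbers is left as a flag because the group had not decided whether to keep them.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Leading/trailing runs of hyphens or apostrophes (straight or curly).
EDGE_PUNCT = re.compile(r"^[-'\u2019]+|[-'\u2019]+$")
# Arabic numbers such as "1847" or "3.5".
NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")
# Uppercase Roman numerals; heuristic, so words like "MIX" also match.
ROMAN = re.compile(r"^M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})$")


def clean_token(token: str, drop_numbers: bool = True) -> Optional[str]:
    """Hypothetical helper: return a normalized token, or None to drop it."""
    stripped = EDGE_PUNCT.sub("", token)
    if not stripped:
        return None
    if drop_numbers and NUMBER.match(stripped):
        return None
    # Test Roman numerals before lowercasing; skip the pronoun "I".
    if drop_numbers and stripped != "I" and ROMAN.match(stripped):
        return None
    return stripped.lower()


def clean_tokens(tokens: Iterable[str], drop_numbers: bool = True) -> List[str]:
    cleaned = (clean_token(tok, drop_numbers) for tok in tokens)
    return [t for t in cleaned if t is not None]


def singletons(tokens: Iterable[str]) -> List[str]:
    """Tokens that occur exactly once, sorted for manual review."""
    counts = Counter(tokens)
    return sorted(t for t, n in counts.items() if n == 1)


def hyphen_variants(tokens: Iterable[str]) -> Dict[str, str]:
    """Map hyphenated forms (e.g. 'to-day') to a joined form ('today') also present."""
    vocab = set(tokens)
    return {t: t.replace("-", "") for t in vocab if "-" in t and t.replace("-", "") in vocab}


raw = ["--Hello", "world'", "XIV", "1847", "I", "to-day", "today", "Today"]
tokens = clean_tokens(raw)
print(tokens)
print(singletons(tokens))
print(hyphen_variants(tokens))
```

Reporting singletons and hyphen variants, rather than silently merging them, leaves the keep-or-drop decision to the people reviewing the word list.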

Loretta will send some of the example problem data.

Amit will be away the week of December 7.

Spelling, POS, standardized spelling, lemma (with WC): Loretta would prefer to keep these separate.

Loretta needs models for data cleaning: synonyms, ...
----------
Proxy layer API discussions
Workbench / Data stor integration
Wright American Fiction -

Brian - there are a few problems to sort out. He has an app that almost converts the texts to MONK format, but some required fields aren't present in the originals, which causes problems. Is it critical that the outputs fully validate against MONK?

Martin - if it's a matter of default values in attributes, it doesn't matter. We will probably run these several times, so we can address this later if at all. Amit can provide MA as a zip file to Brian.

D2K/SEASR
JSON

  • What is the proper shade of grey (how generic versus how customized) for the interface calls? There are a couple of calls they need in JSON. Andrew would like everything in JSON, and a couple of calls would need customization.

Before we create a generic service: are there consumers for it? If a generic XML-to-JSON conversion can be done, let's just do it. For James, the Maven work is probably more important at this time.
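
A generic XML-to-JSON conversion could be as small as the following Python sketch using only the standard library. The mapping convention (attributes as `@name` keys, element text as `#text`, repeated child tags collected into a list) and the function names are assumptions, not something agreed at the meeting; mixed content (text between child elements) is dropped, and namespaced tags come through in ElementTree's `{uri}local` form.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Hypothetical helper: convert one element (and its subtree) to a dict."""
    # Attributes become "@name" keys.
    node: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    # Leading text becomes "#text"; tail text after child elements is ignored.
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: switch to a list of values.
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text: str) -> str:
    """Hypothetical helper: parse an XML message and serialize it as JSON."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


sample = '<results count="2"><doc id="a">Moby-Dick</doc><doc id="b"/></results>'
print(xml_to_json(sample))
```

Because the mapping is mechanical, any existing XML message could be offered as JSON this way, with hand-written serializers reserved for the couple of calls that need customization.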

Next steps

1) NU datastor in proxy (Maven)
2) Revise XML messages
3) JSON conversion


All-hands meeting
Amit, Loretta, James (not), Tim (not), Bernie (perhaps), Joe (not)

Document generated by Confluence on Apr 19, 2009 15:04