This page last changed on Oct 27, 2008 by amitku.

Present: John, Stan, Stefan, Martin, Amit, Steve

Agenda:

Brian, Steve, and Andy produced, as part of an NEH start-up grant, a text reader with lexical data attached to each word. Done in JavaScript and CSS? Could it be imported into the workbench? Once the code is released, which has to be by the end of the calendar year. Right now, it gets its data from a flat file that's been run through a POS tagger etc.; hooking it up to MONKdb would be work, and is not in the plan for this phase of the grant. Could MONK do the work of hooking it up to MONKdb, as a first step in generalizing? Steve will ask.

  • Budget:

See attached file. Basically all allocated out to subawards, except for NCSA ($40K). No-cost extension until February for Northwestern?

  • Interface integration:
    • Bubble-lines, DialR (a feature-lens light), Magic Circle (all from Alberta): all of these exist as prototypes outside of the workbench. We're hoping to get Stefan's time to integrate them into the workbench. Where does the data come from? DialR calls MONK; Magic Circle uses an XML file (callable from Fedora in MONK?); Bubble-lines may be calling the API now.
    • TeksTale (NCSA): now offers unsupervised clustering, but the most time-consuming part of using it is picking the workset, and if you want to recluster by different criteria you need to start over. What you need, obviously, is the ability to pick and create a workset, which we have in MONK. Two weeks ago, we talked about integrating TeksTale with the datastore but not into the workbench. In the last couple of weeks the quality of the workbench has improved, so we could reassess that, but Amit wants to make sure that we don't throw a lot of new stuff into the interface at the last minute and cause a decline in quality. A middle way for TeksTale might be to use the workbench for creating the workset, and then either fire off an independent application (TeksTale) or use integrated tools with the same workset. Also, Duane implemented a version of Wordle/Dunning's inside TeksTale, profiling one work against a reference corpus; but you want to profile corpus against corpus.
    • Amit's also been working on some stuff inside the workbench that could help here: a week or so ago, he started using Flamenco (a faceted browser) to look at the Google Advanced Chart config functionality. In effect, this is faceted browsing with charts. Six or seven facets can be charted against one another, with grouping by facets as well (publication date, author, etc.). Within this, you can get OpenCloud (SourceForge) Dunning's comparisons. It's completely outside the workbench right now, and it's not clear whether it should be inside the workbench (perhaps as an alternative to workset creation by tree-browsing or by advanced search); there could also be technical issues (Flamenco is a Python application). This question will be discussed in the interface group.
    • Stefan wrote a nice comparison tool, but it doesn't use Dunning's; Amit's got some new calls that will allow you to use Phil's code for scaling and OpenCloud for tagging, and this could be available for the workbench after Nov. 1st, when it could be called by Stefan's tool. At the moment, Amit's calls allow comparison of two texts only (works or workparts), but they will allow worksets when done. Large worksets will bog things down, but experience in WordHoard suggests that a good strategy for dealing with this is to set a floor for the frequencies that you pay attention to (e.g., a word must occur 100 times to be included). Adjust the call to the datastore so that it queries only for things that occur above that floor. That floor might scale with the size of the workset (small workset, 5 occurrences; medium workset, 25 occurrences; large workset, 100 occurrences).
    • Mandala browser is close to being integrated. Matt's been working on the communication issues between JavaScript and Java. Now we need to figure out how it could be used in the workbench. Could be used for browsing worksets, browsing frequencies, etc.
    • Decision tree: this works now, and is integrated into the current release (M2).
    • Workbench in general is now not being driven as much by use cases as it might be. We could fill the remaining time with implementing the things we think are cool and useful, but that makes us the users; we could, on the other hand, go back to the use cases and see what can be done for them. Let's ask the four use case people (Martin, Tanya, Kirsten, Sara) to look at these notes and express some preferences and priorities.
  • Proposals:
    • Stan and Kirsten have one in, for SSHRC, for unsupervised clustering.
    • Steve, Brian and Andrew have the NEH grant discussed above.
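
Illustration of the frequency-floor idea from the comparison-tool item above: a minimal sketch, not the MONK or WordHoard code. It picks a floor by workset size and then scores the surviving words with the common two-term form of Dunning's log-likelihood. The floor values (5, 25, 100) come from the notes; the token-count cutoffs for "small" and "medium", and the names frequency_floor, log_likelihood and compare, are assumptions made up for this example. The notes propose applying the floor in the datastore query itself; here it is applied in memory to keep the sketch self-contained.

```python
import math
from collections import Counter

# Floors (5, 25, 100) are from the notes; the token cutoffs are assumed.
FLOORS = [(100_000, 5), (1_000_000, 25)]
LARGE_FLOOR = 100


def frequency_floor(total_tokens):
    """Return the minimum count a word needs in order to be compared."""
    for max_tokens, floor in FLOORS:
        if total_tokens < max_tokens:
            return floor
    return LARGE_FLOOR


def log_likelihood(a, b, c, d):
    """Two-term Dunning's G2 for a word seen a times in c tokens
    (analysis) and b times in d tokens (reference)."""
    e1 = c * (a + b) / (c + d)  # expected count in the analysis workset
    e2 = d * (a + b) / (c + d)  # expected count in the reference workset
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def compare(analysis, reference):
    """Score words that meet the floor in the analysis workset,
    highest G2 first."""
    c = sum(analysis.values())
    d = sum(reference.values())
    floor = frequency_floor(c)
    scores = {
        word: log_likelihood(count, reference.get(word, 0), c, d)
        for word, count in analysis.items()
        if count >= floor
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling compare(Counter(analysis_words), Counter(reference_words)) returns (word, score) pairs. A word with the same relative frequency in both worksets scores 0, and words below the floor are never scored, which is where the saving on large worksets would come from.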

monk.xls (application/vnd.ms-excel)
Document generated by Confluence on Apr 19, 2009 15:05