This page last changed on Nov 04, 2008 by plaisant@cs.umd.edu.

Members

  • Catherine Plaisant, UMD, Chair (and scribe for now)
  • Tanya Clement, UMD
  • Sara Steger, Georgia
  • Steve Ramsay, Nebraska
  • Milena Radzikowska, Alberta
  • Martha Nell Smith, UMD
  • Ted Underwood, UIUC
  • John Unsworth, UIUC
  • Kirsten Uszkalo, Alberta
  • Cheryl Wilkinson, Alberta

CURRENT STATUS

See short status updates next to the individual use cases for Tanya, Sara, Kirsten and Steve (only Tanya's and Sara's use cases have seen much activity with tools).

At first, meetings were held one-on-one or one-on-two to develop the use case scenarios.
Now the users are active in the individual cells and/or waiting for tools to become available for testing and use.
This cell will become more active when we can start evaluating interfaces or analytics components.

We had picked early Friday afternoon as a good time for a potential conference call or for individual chats.
Contact Catherine: plaisant@cs.umd.edu, (301)405-2768, cplaisant on skype (on demand)

Feedback on supercell list of tools

Bubble-lines, etc.

• Bubble-lines, DialR (a feature-lens light), Magic circle (all from Alberta): all these exist as prototypes outside of the workbench. We're hoping to get Stefan's time to integrate these into the workbench. Where does the data come from? DialR calls MONK, Magic Circle uses an XML file (callable from Fedora in MONK?); Bubble-lines may be calling the API now.
Here are the URLs (warning: sometimes they work, sometimes not):
http://www.radicalgolem.com/workspace/CircleMagic/labelsOnOff_SpecialHoover/circleMagic_1.html
http://www.cs.ualberta.ca/~orodrigu/dial/
http://staticred.net/bubblelines/
Also, there is a demo for Knots in the workbench (we do not know exactly where; where can it be found?). You choose one work, then search for words, assigning a new colour to each word you search for. The display shows a line representing 100% of the work, which bends at the percentages where the word appears.

      • We have not seen these at all, or not recently (we only now have URLs to look at).
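The Knots description above (a line that bends at the percentages of the work where a word appears) can be illustrated with a minimal sketch. This is not the Knots code: the function name knot_positions, the whitespace tokenization, and the case-insensitive matching are all assumptions made for illustration only.

```python
def knot_positions(tokens, word):
    """Return the offsets (as percentages, 0-100) at which `word` occurs.

    Hypothetical helper: `tokens` is the work as a list of word tokens,
    and each returned value is a point where the line would bend.
    """
    total = len(tokens)
    if total == 0:
        return []
    target = word.lower()
    return [100.0 * i / total
            for i, token in enumerate(tokens)
            if token.lower() == target]


def knots_for_words(tokens, words):
    """Map each searched word to its bend positions (one colour per word)."""
    return {word: knot_positions(tokens, word) for word in words}
```

For example, in the five-token text "a rose is a rose", the word "rose" occurs at token offsets 1 and 4, so the line would bend at 20% and 80% of its length.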

TeksTale

• TeksTale (NCSA): now offers unsupervised clustering, but the most time-consuming part of using it is picking the workset, and if you want to recluster by different criteria you need to start over. What you need, obviously, is the ability to pick and create a workset, which we have in MONK. Two weeks ago, we talked about integrating TeksTale with the datastore, but not integrating it into the workbench; in the last couple of weeks, the quality of the workbench has improved, so we could reassess that, but Amit wants to make sure that we don't throw a lot of new stuff into the interface at the last minute, which would cause a decline in quality. A middle way for TeksTale might be to use the workbench for creating the workset, and then either fire off an independent application (TeksTale) or use integrated tools, with the same workset. Also, Duane implemented a version of Wordle/Dunning's inside TeksTale, profiling one work against a reference corpus; but you want to profile corpus against corpus.
http://norma.ncsa.uiuc.edu:1710/index.action

      • Here there is a clear sense that clustering is something everyone would like to use. Will try it and look in more detail.

Flamenco

• Amit's also been working on some stuff inside the workbench that could help here: a week or so ago, he started using Flamenco (a faceted browser) to look at the Google Advanced Chart config functionality: in effect, this is faceted browsing with charts. Six or seven facets can be charted against one another, with grouping by facets as well (publication date, author, etc.). And within this, you can get OpenCloud (SourceForge) Dunning's comparisons. It's completely outside the workbench right now, and it's not clear whether it should be inside the workbench (perhaps as an alternative to workset creation by tree-browsing or by advanced search); there could also be technical issues (Flamenco is a Python application). This question will be discussed in the interface group.
http://monk.lis.uiuc.edu/cgi-bin/flamenco.cgi/monkfl/Flamenco
login: monk

      • Looked promising, but all three use case scholars already have their worksets defined, so it is less important to us in the short term.
        But useful later, or for others.

Tagclouds

• Stefan wrote a nice comparison tool, but it doesn't use Dunning's; Amit's got some new calls that will allow you to use Phil's code for scaling and OpenCloud for tagging, and this could be available for the workbench after Nov. 1st, when it could be called by Stefan's tool. At the moment, Amit's calls allow comparison of two texts only (works or workparts), but they will allow worksets when done. Large worksets will bog things down, but experience in WordHoard suggests that a good strategy for dealing with this is to set a floor for the frequencies that you pay attention to (e.g., a word must occur 100 times to be included). Adjust the call to the datastore so that it queries only for things that occur above that floor. That floor might be scalable according to the size of the workset (small workset, 5 occurrences; medium workset, 25 occurrences; large workset, 100 occurrences).
http://monk.lis.uiuc.edu:8888/wbench/trunk/apps/workflow_new/
login: monk
THEN try the bottom toolset (search by example with Wordle). You can just select a work and then see a Wordle for it. Not sure what measure it displays. But clearly it would be easy to have a toolset that 1) selects 2 worksets, and 2) displays a wordcloud of the Dunning log likelihood ratio comparison of those 2 worksets.

      • Clear sense that this will be very useful as both Tanya and Sara did this kind of analysis successfully already (by hand and with Wordle)
        But we need a comparison of the 2 texts, not a side-by-side display. Picking 2 worksets and seeing the wordcloud of the Dunning log likelihood is the minimum needed.
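As a rough illustration of the comparison discussed in this section, here is a minimal sketch of a Dunning log likelihood comparison of two worksets with a frequency floor. It is not the WordHoard, OpenCloud, or MONK code. It uses the common two-term form of the statistic; the function names, the token-list input, and the size thresholds for "small" and "large" worksets are assumptions (only the floors of 5, 25 and 100 come from the notes above).

```python
import math
from collections import Counter


def frequency_floor(total_tokens, small=10_000, large=1_000_000):
    """Pick a floor from workset size. The 5/25/100 floors follow the notes;
    the `small` and `large` token thresholds are hypothetical."""
    if total_tokens < small:
        return 5
    if total_tokens < large:
        return 25
    return 100


def log_likelihood(a, b, c, d):
    """Two-term Dunning G2 for a word seen `a` times in a workset of `c`
    tokens and `b` times in a workset of `d` tokens."""
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:  # treat 0 * ln(0) as 0
        total += a * math.log(a / expected_a)
    if b > 0:
        total += b * math.log(b / expected_b)
    return 2 * total


def compare_worksets(tokens_a, tokens_b, floor=None):
    """Score each word; positive means overused in A, negative in B."""
    if not tokens_a or not tokens_b:
        raise ValueError("both worksets must contain tokens")
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    c, d = len(tokens_a), len(tokens_b)
    if floor is None:
        floor = frequency_floor(c + d)
    scores = {}
    for word in counts_a.keys() | counts_b.keys():
        a, b = counts_a[word], counts_b[word]
        if a + b < floor:  # skip words below the frequency floor
            continue
        g2 = log_likelihood(a, b, c, d)
        scores[word] = g2 if a / c >= b / d else -g2
    return scores
```

The magnitude of each score could drive the word size in a wordcloud, and its sign could drive the colour (which workset overuses the word). In a real setting the floor would more likely be applied in the datastore query, as suggested above, rather than after counting.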

Mandala

• Mandala browser is close to being integrated. Matt's been working on the communication issues between Javascript and Java. Now we need to figure out how it could be used in the workbench. Could be used for browsing worksets, browsing frequencies, etc.

      • Not clear where the latest version is. It was not clear whether it would be useful for anything other than picking a workset (and therefore, as with Flamenco, it is less important to us). Not clear how easy it is to use in practice.

Decision Tree

• Decision tree: this works now, and is integrated into the current release (M2). 
Go to the workbench, search by example, then in the options pick Naïve Bayes with decision tree.

  • Sorry, NOT QUITE WORKING YET.
    We need to try it and see if it returns the right thing. But Sara clearly liked it when it was outside the workbench, so she should like it here too.



(OLD) Derived requirements

Monk high level requirements - V1 started by Catherine
You Want to . . . from Martin
Required Texts from all

see the Interface Cell, Analytics Cell or Data Cell where the related activity is now

The USE CASES

How do you add a use case? A user writes up the narrative of the process that's envisioned, noting possible objects of interest in the text, ways in which those objects might be identified by the user, identified for the software, made visible or useful by the software, subjected to analysis by the user, etc. (JU)

Template of questions

Repetition (SCHOLAR: Tanya) Work on Gertrude Stein's use of repetition, using tools that expose lexical or grammatical patterns with special attention to the concept of variation.
For status update see: repetition-status-Dec2007
DONE/ON HOLD: This remains a part of Tanya's thesis, but no new work was added recently. Two papers written.

Character Development in Stein and NCF (SCHOLAR: Tanya) 
IN PROGRESS. Special tool developed by Romain at UMd with Amit's help for data processing.

Sentimentality (SCHOLAR: Sara) Work on British and American sentimentality, using quantitative tools that look for patterns of affect at various levels of the text (vocabulary, sentence structure, structure of the work as a whole).
Sentimentality Storyboard
Status updates: Sentimentality Status June 2008 - Sentimentality-status-April2008 - sentimentality-status-Dec2007
ONGOING...  Soon to work with the Monk Workbench.

Deathbed Use Case (SCHOLAR: Sara) A linguistic study of Victorian deathbed scenes
NEW - ONGOING - Added Summer 08 - Done with using WordHoard

Transformation (SCHOLAR: Kirsten) Work on Early Modern English Witchcraft Tracts, using tools that allow the transformative elements in these tracts to be traced thematically, temporally, and geographically.
For status update see: transformation-status-Dec2007
ON HOLD...  This text is not accessible from the Monk workbench yet.

Geographical Awareness (SCHOLAR: Steve) Work on Austen and Empire, for example by looking at mentions of place, using some named entity recognition in combination with a gazetteer. NOTE: Steve has started to write related materials in the Analytics page [https://apps.lis.uiuc.edu/wiki/display/MONK/Analytics+Cell]
NOT ACTIVE: Steve says: "In part because [it wasn't] going anywhere, but mostly because I started to devoting myself to TEI Analytics and the curatorial app."

Erotics (SCHOLAR: Martha)
NEVER DEVELOPED

Curator work (CURATOR: virtual "Jack") Edited by Catherine for now. 
RELATED WORK: Sara is doing curatorial work during summer 08.  We should be able to report on that...

SOUNDEX (SCHOLAR: Steve)
NOT ACTIVE: Steve says: "In part because [it wasn't] going anywhere, but mostly because I started to devoting myself to TEI Analytics and the curatorial app."

Lexicon (SCHOLAR: Martin)
NOT ACTIVE

Syntactic Fragments (SCHOLAR: Martin)
NOT ACTIVE

Profiling an Author (SCHOLAR: Martin)
NOT ACTIVE

Simple Searching (SCHOLAR: Digital Neophyte (aka Stéfan))
NOT ACTIVE


Other possible use cases from other colleagues

Allegory (SCHOLAR: Matt Wilkens) 

Who are Monk users? (general descriptions)

A) SCHOLAR: an individual in a large university environment, with access to collections of literary texts
For the individual user, MONK should provide tools for text analysis on pre-processed collections

  • Tools for text-mining with pre-processed texts
    • Supervised learning
    • Unsupervised learning
  • Tools for text analysis with pre-processed texts

B) CURATOR: a librarian-collection curator who is responsible for providing MONK services alongside those collections

For the curator, MONK should provide administrative tools, for example

  • tools that a systems librarian can use for installing MONK
  • tools for producing a formal description of the collection that functions as a configuration file that software can leverage to do pre-processing, populate interfaces, etc. (e.g., the nora chunk file)
  • tools for building the MONK index once the curator has created the configuration file

C) SCHOLAR/GUEST-CURATOR: a user who has his or her own collection of texts and wants to submit them to MONK-processing and is willing to act as collection curator.

D) (Possibly) NON-SCHOLARS: e.g., a school teacher and her students could define lesson plans and activities

Original Milestones (i.e. what we wrote at the Feb. 2007 Chicago meeting):

  • Three or four use cases documented, with actual users who are interested in doing them. At least one macro example, where the focus is at the collection level rather than on reading the individual texts.
  • Have a users meeting with users who are within the project
  • Generalize from these use-cases: what other like questions could be asked with this or other collections? What do these use cases have in common? How are they different from one another?
  • Consider whether to recruit specific user communities (e.g., biblical scholars) and solicit their use cases, build tools for them
  • Have a users meeting that snowballs from the project's users to larger communities

Document generated by Confluence on Apr 19, 2009 15:05