This page last changed on Oct 16, 2007 by sgs@mcmaster.ca.

Bill Parod has asked if the interface cell could try to document metadata requirements in one spot. I thought it might be helpful to break them down by type of tool and proposed tools of each type. This will be an iterative process, expanding as we turn our attention to each type of tool and the variations we can try within it. Each list goes from simplest to most complex. I use a taxonomy of three kinds of tool: basic, enhanced, and experimental. Please assume that the more-advanced tools will use all the metadata from the less-advanced tools, plus more as indicated.

1. Type of tool: Corpus Browser

Corpus browsers are intended to show users what we have in the MONK library, and to allow them to select items to construct a workset.

1.1 Tree Browser (basic)
This browser displays a list that can be expanded or collapsed. Checkboxes beside each item allow the user to select documents for inclusion in a workset.

  • names of corpora/collections
  • short title
  • authors of works
  • size (ideally words, but possibly file size)

1.2 Document Selector (enhanced)
The document selector displays the list of works in multiple columns. The list can be sorted.

  • (basic metadata and...)
  • full title (in addition to short title)
  • first circulation date range start
  • first circulation date range end

1.3 Text Tile Browser (experimental)
This is a visualization tool that begins with a TreeMap-style display of the entire library, which the user subdivides dynamically using facets. It is primarily for grouping and subgrouping rather than sorting, which works better in the Document Selector.

  • (enhanced metadata and...)
  • sex of the author
  • author life date range start
  • author life date range end
  • place of origin
  • major genre
  • keywords: subject terms already used to index the documents. I think, for instance, of papers published through the ACM, which are indexed by the authors using the ACM Computing Classification System (http://www.acm.org/class/1998/overview.html)
  • word counts (for each level of the hierarchy: corpus, work, and chunk)
  • sentence counts
  • chunk counts: each chunk node would have counts for all of its descendants (a collection knows how many works, chapters, paragraphs, etc. it has)
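
As a rough illustration of the chunk counts item above, here is a minimal Python sketch of rolling counts up the hierarchy. The names ChunkNode and descendant_counts, and the chunk types used in the example, are my own assumptions for illustration, not an existing MONK API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChunkNode:
    """Hypothetical node in the chunk hierarchy (collection, work, chapter, ...)."""
    chunk_type: str
    label: str
    children: List["ChunkNode"] = field(default_factory=list)


def descendant_counts(node: ChunkNode) -> Counter:
    """Count every descendant of `node`, grouped by chunk type."""
    counts = Counter()
    for child in node.children:
        counts[child.chunk_type] += 1
        counts.update(descendant_counts(child))  # add the child's own descendants
    return counts


# Made-up example hierarchy: one collection holding two works.
collection = ChunkNode("collection", "Example collection", [
    ChunkNode("work", "Work A", [
        ChunkNode("chapter", "Chapter 1", [
            ChunkNode("paragraph", "p1"),
            ChunkNode("paragraph", "p2"),
        ]),
    ]),
    ChunkNode("work", "Work B", [ChunkNode("chapter", "Chapter 1")]),
])
counts = descendant_counts(collection)
```

In practice these counts would probably be precomputed and stored with each node rather than recomputed on every request, but that is a design choice still to be made.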

1.4 Mandala Browser (experimental)
This is another visual tool that focuses on dynamically-assembled queries that can be nuanced combinations of metadata from various levels.

  • XML structure. For browsing one work, the mandala uses the structure in two ways: first, the dots represent chunks based on the structure; second, the magnets can be defined using fields from the structure. I think what we want in the first case will depend on the type of work. For plays, we've used speeches and speakers. We could also use stage directions and scenes. If we have any epic poetry, or even long lyric poems, I'd say stanzas would be good. For prose, we may want paragraphs. For browsing collections, the dots represent whole works, but the magnets can be defined using fields from the structures within works.
  • parts of speech
  • lemmata

For all tools in this section, it would be useful to have a generic proxy call that returns the metadata hierarchy and accepts specific arguments, for example:

// get the full chunks hierarchy from collections to div-level chunks
// this is a ton of data and should probably be avoided
getCollectionsMetadataHierarchy

// get the full chunks hierarchy from collections to a depth of 2
getCollectionsMetadataHierarchy?depth=2

// get the chunks hierarchy from collections to works (chunkType as array)
getCollectionsMetadataHierarchy?chunkType=collection&chunkType=work

// get the chunks hierarchy from a start node
getCollectionsMetadataHierarchy?node=1111

// get the chunks hierarchy from a start node to a given depth
getCollectionsMetadataHierarchy?node=1111&depth=2

// get the chunks hierarchy with limited fields
getCollectionsMetadataHierarchy?field=title&field=author
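
To show how a client might assemble these calls, here is a minimal Python sketch that builds the query strings listed above. The helper name hierarchy_url and its keyword arguments are assumptions of mine; only the call name and the parameter names (node, depth, chunkType, field) come from the examples above, and no such proxy call exists yet.

```python
from urllib.parse import urlencode


def hierarchy_url(node=None, depth=None, chunk_types=(), fields=(),
                  base="getCollectionsMetadataHierarchy"):
    """Build a query for the proposed (not yet existing) hierarchy call."""
    params = []
    if node is not None:
        params.append(("node", node))
    if depth is not None:
        params.append(("depth", depth))
    # Array-valued arguments are sent as repeated parameters.
    params.extend(("chunkType", t) for t in chunk_types)
    params.extend(("field", f) for f in fields)
    query = urlencode(params)
    return f"{base}?{query}" if query else base


full = hierarchy_url()
from_node = hierarchy_url(node=1111, depth=2)
to_works = hierarchy_url(chunk_types=["collection", "work"])
limited = hierarchy_url(fields=["title", "author"])
```

Repeating a parameter is only one way to pass an array; a comma-separated value would work too, as long as the proxy and the tools agree on it.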

2. Type of tool: Data Frame Search and Sort

We need to be able to search and sort in almost every tool on our list. What I am calling the Data Frame Search and Sort, however, is a separate type of tool rather than a component of something else. Martin describes this type of tool in WorkBenchAnalytics.doc. The closest concept we have for this so far in the MONK tools is the Mandala Browser, which is a loose fit at best. The tool should accommodate the following kinds of data (also from Martin's document):

1. The unique ID of the word occurrence
2. The spelling at the word occurrence
3. the standardized spelling
4. the lemma
5. the part of speech
6. the author
7. sex of author
8. short title
9. date range
10. place of origin
11. major genre
12. the preceding word
13. the following word
14. a KWIC output consisting of the preceding and following n words or n characters
15. the count of the spelling in the work
16. the POS count of the work
17. the word count of the work
18. a flag that marks whether the word occurs within an <l> element or not
19. the work part in which it occurs
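
To make the data frame idea concrete, here is a minimal Python sketch of one row per word occurrence, with a simple search and a multi-key sort. It covers only a handful of the nineteen fields, and the field names, the two helper functions, and the sample rows are all made up for illustration; they are not taken from Martin's document.

```python
from dataclasses import dataclass


@dataclass
class WordOccurrence:
    """One row of the data frame (subset of fields, hypothetical names)."""
    occurrence_id: str   # item 1: unique ID of the word occurrence
    spelling: str        # item 2
    lemma: str           # item 4
    pos: str             # item 5: part of speech
    author: str          # item 6
    short_title: str     # item 8
    date_start: int      # item 9: start of the date range
    in_verse_line: bool  # item 18: inside an <l> element or not


def search(rows, **criteria):
    """Keep rows whose fields equal every given criterion."""
    return [r for r in rows
            if all(getattr(r, name) == value for name, value in criteria.items())]


def sort_rows(rows, *keys):
    """Sort rows by one or more field names, in the order given."""
    return sorted(rows, key=lambda r: tuple(getattr(r, k) for k in keys))


# Made-up sample rows.
rows = [
    WordOccurrence("w3", "loved", "love", "verb", "Author B", "Title B", 1850, False),
    WordOccurrence("w1", "love", "love", "noun", "Author A", "Title A", 1600, True),
    WordOccurrence("w2", "loue", "love", "noun", "Author A", "Title A", 1600, True),
]
nouns = search(rows, lemma="love", pos="noun")
by_date = sort_rows(rows, "date_start", "occurrence_id")
```

At the scale of the full library the searching and sorting would presumably happen in the datastore rather than in memory; the sketch is only meant to show the shape of a row and the two operations.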

3. Type of tool: Text Reader

This tool lets the user read one or more works or chunks.

3.1 Workbench Text Reader (basic)
This version currently lets the reader see any work or chunk in its entirety, rendered into HTML and styled with CSS. Only one work or chunk at a time is available. We need to consider multiple items at once, and also more than one Text Viewer pane, each with a different document, for simple comparisons.

  • work title
  • chunk label
  • text

4. Type of tool: Rate Examples

Search By Example is what I am currently calling Supervised Classification through D2K. The Rate Examples tool lets the user specify chunks that represent a phenomenon of interest. The same tool will also show the system ratings that D2K returns.

4.1 Workbench D2K Manager (basic)
I think we need a new name for this tool, but this is what we have now. It lets the user view a chunk list (in the Workset Manager) and rate individual chunks. We should add the ability to rate each chunk in a number of ways, as Sara has explained in her use case. She's interested in sentimentality, so may want to simultaneously rate for chunks that contain "words associated with motherhood, light imagery, words associated with innocence, words associated with nature, words associated with diminutiveness, etc."

  • work title
  • chunk label (which needs to link to the Text Reader)
  • manual chunk ratings by the user (1-5)
  • system chunk ratings from D2K

5. Type of tool: View Features

In addition to similar chunks, D2K returns the underlying features used to identify them. The user views those features here. It seems to me that FeatureLens provides several formats for viewing such features, so we may begin there. In addition to the D2K features, FeatureLens also views repetition with variation, which may be a different kind of tool, or it may not. I think we need to give this some more thought.

5.1 Workbench Feature Viewer (basic)
The basic feature viewer is a grouped list of the features that D2K considers significant in identifying good examples and bad examples, as well as features that seem to be neutral.

  • Feature list (words, POS, lemmata, combinations of these three)
  • System rating for each feature
  • User rating for each feature
  • Clicking on a feature triggers it as a search term for the Text Reader

5.2 FeatureLens (experimental)
This is an aggregate toolset that currently includes a Document Selector, Feature Selector, Line Graph, Feature Viewer, and Text Reader. We will likely separate out at least the Document Selector and Text Reader.

  • Feature list with the addition of N-grams
  • Feature offsets in works (for highlighting)
  • Features matched to each of the 6 patterns (increase, decrease, saddle, etc.)
  • Feature frequency plotted across works

6. Type of tool: View Relationships

Here the data consists of social networks.

7. Type of tool: Work with Timelines

We help the user work with data from the perspective of changes over time and other kinds of chronologies.

8. Type of tool: Visualize Sonic Colouring

The user of this tool is interested in phonetic renderings of text.

9. Type of tool: Work with Geography

Steve has suggested geographic awareness, which I take to be a special case of the general question of how MONK can support interest in questions related to geography.

10. Type of tool: Project Manager

This tool lets the user save multiple projects and access them later. A project should contain a full history of activity that can be recreated, as well as workbench states (location and size of tools, for instance).

11. Type of tool: Workset Manager

Here the user can keep track of a subset of the collection that has been chosen as the basis for a project.

12. Type of tool: History

This tool provides access to the chronological history of state-changing actions. It can be used to back up to any point.

12.1 Workbench History (basic)
Here we will show a list that the user can access and navigate, and that we can also use for logging purposes to study user interactions with MONK.

  • User activity records (date and time stamp, action category, type of action)

12.2 Workbench History (enhanced)
I don't know if this is possible, but I would like to suggest the possibility of a branching list, so that a person could go back and change a step without losing subsequent steps. I have to admit I have never seen this kind of history working.

  • User activity with more than a single forward path
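
One way a branching list could work is to store the history as a tree instead of a flat list, so that going back and taking a different step adds a new branch and leaves the later steps in place. The Python sketch below is only a guess at such a structure; the class names, fields, and sample actions are assumptions, and the record fields mirror the basic activity record (time stamp, action category, type of action).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(eq=False)  # compare steps by identity; parent/child links form cycles
class HistoryStep:
    timestamp: datetime
    category: str
    action: str
    parent: Optional["HistoryStep"] = None
    children: List["HistoryStep"] = field(default_factory=list)


class BranchingHistory:
    """History kept as a tree: backing up never discards later steps."""

    def __init__(self):
        self.root = HistoryStep(datetime.now(), "session", "start")
        self.current = self.root

    def record(self, category, action):
        """Add a step under the current one and move to it."""
        step = HistoryStep(datetime.now(), category, action, parent=self.current)
        self.current.children.append(step)
        self.current = step
        return step

    def back_to(self, step):
        """Return to an earlier step; its existing branches are kept."""
        self.current = step

    def path(self):
        """Actions from the start of the session to the current step."""
        actions, node = [], self.current
        while node is not None:
            actions.append(node.action)
            node = node.parent
        return actions[::-1]


history = BranchingHistory()
selected = history.record("workset", "select documents")
history.record("rating", "rate chunks")
history.back_to(selected)
history.record("rating", "rate different chunks")
current_path = history.path()
```

After these calls the "select documents" step has two children, so both the original path and the revised one remain reachable.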
Document generated by Confluence on Apr 19, 2009 15:04