This page last changed on Jun 12, 2007 by tclement@wam.umd.edu.
- I can see which collections I have access to.
- I can search within collections to see where particular docs live.
- I can see previous studies that others have done with these collections (whether it be in article form or in the form of "saved" histories).
- When I pull out a collection or a specific text, I can see what processes have already been run on it, what I might choose to do next, and which features I can choose to measure (e.g., n-grams, POS, stopwords, stemming, lemmatization). Maybe I can even see what others have found interesting.
- I choose to see texts from two collections from DocSouth, The Making of Americans and Three Lives.
- I can see 3-grams that have been processed with stemming and lemmatization, and 3-grams that have been processed without them. I can see 3-grams that take punctuation into account (so that I can isolate patterns that occur at the beginning or end of a paragraph or sentence) and 3-grams that ignore punctuation. I can also adjust (or at least see) the parameters: if my threshold is set at "1" or "10", I at least know it, and ideally I can change it.
- Once I send my feature requests to D2K, I can see how long it will take to get my results back.
- When the results come back, I can view and sort them in the different ways we've chosen for FeatureLens; the default settings might be:
  - somewhere, a toggle or similar control that opens a table of basic metrics on the text(s)
  - the list of features is sorted by frequency and includes not only the 3-grams but also single words (though I can also "hide" single words so that I search only the n-grams, or vice versa)
  - all the metric distributions are shown at once at the top (but not in the text in the center panel), so I can immediately see spikes, etc.
  - the list of frequent patterns is clustered so that I can see which frequent patterns are related to others in the list (FL does not do this). This list also shows each pattern as it occurs in the original text (not stemmed or lemmatized).
  - all the clusters appear on the overview of the text, with the most frequent clusters shown as blobs of different colors across the overview (FL does not do this).
  - I can sort the lists by:
    - trends across the collection
    - trends within the sections or chunks of the collection (the list also says which section a pattern comes from; FL does not do this yet)
    - frequency of patterns
    - length of patterns
  - when I click on a frequent pattern, I can see where it shows up in an overview of the text (already in FL)
  - I can also see where that pattern occurs in the context of the original document (colored, etc., as it already appears in FL)
- I can see a history of what I'm doing and have done.
- I can annotate steps as I go.
- I can save a session.
- I can compare sessions.
- I can somehow share this session with others who are using MONK.
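The 3-gram options, threshold, and sort orders described in this scenario can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not how D2K or FeatureLens actually compute patterns: the function names, the `<END>` marker token standing in for sentence-final punctuation, and the default threshold are all hypothetical. Stemming and lemmatization are omitted here; they would need an external library.

```python
import re
from collections import Counter

# Hypothetical marker token that stands in for sentence-final punctuation.
END = "<END>"


def tokenize(text, keep_punctuation=True):
    """Lowercase `text` and split it into word tokens.

    With keep_punctuation=True, each '.', '!' or '?' becomes an END token,
    so n-grams can capture patterns at the start or end of a sentence.
    With keep_punctuation=False, punctuation is simply dropped.
    """
    tokens = []
    for tok in re.findall(r"[a-z']+|[.!?]", text.lower()):
        if tok in {".", "!", "?"}:
            if keep_punctuation:
                tokens.append(END)
        else:
            tokens.append(tok)
    return tokens


def frequent_patterns(tokens, sizes=(1, 3), threshold=2):
    """Count n-grams for each n in `sizes` (e.g. single words and 3-grams)
    and keep only those that occur at least `threshold` times."""
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= threshold}


# Sort keys: most frequent first, or longest (in words) first.
# Remaining ties are broken alphabetically so the order is deterministic.
SORT_KEYS = {
    "frequency": lambda item: (-item[1], -len(item[0]), item[0]),
    "length": lambda item: (-len(item[0]), -item[1], item[0]),
}


def sort_patterns(patterns, by="frequency"):
    """Return (pattern, count) pairs ordered by the chosen sort key."""
    return sorted(patterns.items(), key=SORT_KEYS[by])


if __name__ == "__main__":
    # Made-up sample sentence, used only for illustration.
    text = "She was a good one. She was a good one. He was there."
    with_punct = frequent_patterns(tokenize(text), threshold=2)
    without_punct = frequent_patterns(
        tokenize(text, keep_punctuation=False), threshold=2)
    for gram, count in sort_patterns(with_punct, by="length"):
        print(" ".join(gram), count)
    print(sorted(set(with_punct) - set(without_punct)))
```

Keeping punctuation as a token is what lets a pattern such as `good one <END>` surface as a sentence-final pattern; with punctuation dropped, that 3-gram disappears. Raising `threshold` from 1 to 10 shrinks the list in the same way the scenario's threshold setting would.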