|
This page last changed on Mar 17, 2008 by martinmueller@northwestern.edu.
Present: Martin, Matt, Stan, John, Catherine
Not Present: Steve
Agenda Items:
Not going for the March 18 deadline; more likely going to NEH Preservation and Access in July (budget up to about $500K). NWU/Northwestern/UIUC. Still, we need to figure out what it is we want to do. Latest draft got a fairly negative reaction from Steve on the grounds that there's too much hand work involved. Joe Paris, John Norstad, Phil Burns, and Martin had a meeting at NWU last week about the grant; James Chartrand has signed on to work with the library over the next six months or so, on a Mellon grant, on a Kirtas project broadly related to collaborative text-keeping.
What do MONK folks think are the critical things to do? What is there in it that adds enough value to be funded as a project separate from MONK? One answer is making Abbot (purpose-built for MONK) re-usable in other contexts. Another answer: the potential for generating good-enough editions from OCR (OCA texts, for example, or Gutenberg, or TCP: ugly, but systematically so; a plug-in for big text collections like that could pull the texts through the plug-in, into Abbot, and into MONK or something else).
JMU will send comments to Martin and Steve. One major question: where will the texts (improved or otherwise) live? Tim Cole and Bill Parod might need to work this out; it could have implications for the CIC shared repository (for Google Books materials etc.).
http://booklamp.org/
This looks like a start-up commercial venture; there is much on the blog about the drama of waiting in the lobby at Google, not much about the technology. Seems like a recommender service for Amazon: starting from books that you like, it finds other books (modeled explicitly after Pandora). They are thinking about text in terms of divisions of content, with sufficient granularity to track plot development, stylistic shifts, etc.: slow beginning, suddenly the plot picks up, lots of action after that (if you like that, then they can find you books like that). If they can do these things, then there has to be some fairly sophisticated text analysis under the hood, and it would seem to be compatible with MONK. No documentation, but there is a video on the web site and a "try booklamp" feature. Matt can try to get through the press person to someone who would actually be willing to discuss the technology on a conference call. NB: Catherine discovered that if you choose 1984 as your book, the best match is the USA Patriot Act. Dialogue and action indicators change quite a bit from beginning.
Geoff would like to have a conference call about collaboration with MONK
Another call waiting to happen, with Dan Cohen
- Google Books API
http://googleblog.blogspot.com/2008/03/book-info-where-you-need-it-when-you.html
Could be useful for generating book-cover thumbnails for various visualizations. Possibly useful to us in connection with books that we don't have permission to publish. We could allow people to do some text-mining with TCP texts, for example, giving them not more than a few lines of context, unless they want to go off and look at Google Books to see the text. Implications for interface and proxy: a flag of some kind that identifies publishable and non-publishable texts, and then different retrieval and display strategies for the two. Could we have a look soon at what we have in possession or in view that will be public domain? The only part that's completely public is the 300 novels from the Wright American Fiction collection (plus texts from NORA, WordHoard). TCP is off-limits, and that's a lot of stuff. We could go off to add all the novels in Wright, the material in DocSouth, Perseus materials, etc. Next supercell call, look more closely at this?
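A rough sketch of the flag idea, for discussion only: each text carries a publishable/non-publishable marker, and the display step returns either the full text or just a few lines around a hit. The names `Text`, `publishable`, `display_hit`, and the limit `MAX_CONTEXT_LINES = 3` are assumptions made for illustration; nothing here reflects an existing MONK interface.

```python
from dataclasses import dataclass

# Assumed limit: the notes only say "not more than a few lines of context".
MAX_CONTEXT_LINES = 3


@dataclass
class Text:
    title: str
    lines: list[str]
    publishable: bool  # hypothetical flag: True if we may show the full text


def display_hit(text: Text, hit_line: int) -> list[str]:
    """Return the lines to show for a search hit at index hit_line."""
    if text.publishable:
        # Publishable texts: no restriction, show everything.
        return text.lines
    # Non-publishable texts: show only a small window around the hit.
    half = MAX_CONTEXT_LINES // 2
    start = max(0, hit_line - half)
    return text.lines[start:start + MAX_CONTEXT_LINES]
```

With a ten-line non-publishable text, `display_hit(text, 5)` would return only lines 4 through 6; a link out to Google Books for the rest would sit in the interface layer and is not shown here.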
- ManyEyes
Student Worker
Stefan wrote to Martin Wattenberg, to see if the student worker needed to be a student. Chad Fullerton could be the person, if it needs to be a student. If not, then we could send Andrew or Mike or someone. Hasn't heard back, wrote again last night. JMU met with Martin Wattenberg and Fernanda V. last week; they seemed very interested in generalizable strategies for visualizing information about literary texts that would actually be useful to people interested in those texts: first lines of novels (last lines, too, or both), or first lines of chapters, etc. Embedding the visualization in the third-party web site is pretty close to Matt's idea for a publishing model in MONK.
- sequential WordClouds
(Milena and Kirsten)
These look like tag clouds. Catherine has an example she thinks works better. The idea is to show changes in different editions of a text over time. On the up side, people are getting used to this kind of visualization, but on the down side, there are lots of problems with the way weights are calculated: long words count more, for example. The wow factor is there, though. If we set up a connection to ManyEyes, we have this for free. Use this to look at Leaves of Grass? Many of our examples of visualizations are single-book examples; this at least allows you to look at multiple instances or editions of a book, or whole collections. Carlos's information glyph is an example. We need to come up with classes of visualizations that apply to classes of literary investigations, and we need to group and present and explain those.
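One possible way to think about the weighting problem, as a sketch only: weight each word by its relative frequency within an edition (so word length plays no part), then take the difference between consecutive editions. This is not how ManyEyes or the WordClouds prototype computes weights; the function names and the simple tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text: str) -> dict[str, float]:
    """Weight each word by its share of all word tokens in the text."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def frequency_shifts(editions: list[str]) -> list[dict[str, float]]:
    """For each consecutive pair of editions, the change in weight per word."""
    freqs = [relative_frequencies(e) for e in editions]
    shifts = []
    for before, after in zip(freqs, freqs[1:]):
        vocab = set(before) | set(after)
        shifts.append(
            {w: after.get(w, 0.0) - before.get(w, 0.0) for w in vocab}
        )
    return shifts
```

Passing the editions of a text in chronological order gives one dictionary per transition; positive values mark words that gained weight in the later edition, negative values words that lost it.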
- hackfest plans
August? Set some deadlines, milestones, things to be done. Perhaps in the interim we could get a couple of developers together with a single user or use case. Sara: April, Ontario. Confirm milestones for August, and confirm a booster visit for Sara, next supercell call.
|