|
MONK : MonkDataAccess Readme (archive)
This page last changed on Feb 20, 2008 by martinmueller@northwestern.edu.
From the MonkDataAccess code by John Norstad

Read Me First

This Tomcat servlet is used for testing, debugging, tuning, and exploring the Monk static datastore implementation.

The Monk static datastore stores all of the static data for Monk and makes it available to the higher layers of the Monk server. This includes bibliographic data, structural data, and tagging data for all the texts which have been ingested into the system, plus HTML-formatted versions of the texts themselves.

The datastore provides Java access methods for extracting data from the store for any and all purposes, including searching for objects for direct presentation in end-user applications as tables, lists, concordances, or in other presentation formats; getting feature counts and frequencies for analysis by data-mining and other analytic procedures; and getting tokenized streams of text for working with n-gram and other collocation analyses, repetition analyses, and corpus query language pattern matching operations.

The current datastore contains 5 corpora, 306 works by 108 authors, 15,387 work parts with their text, and approximately 41 million words with NUPOS morphological tagging data. The datastore should scale well to the 200-300 million words planned for the first release of Monk. Scaling beyond that level, e.g., to a billion words or more, would require significant work on a distributed datastore architecture. While this is certainly possible in theory with the current datastore architecture, in practice it is beyond the scope of what is feasible in the limited time remaining under our current Monk grant.

All monks, not just the programmers, should benefit from reading the overview documentation, which is presented in the section titled "Architecture and Programming". Pay special attention to the architecture drawing. Non-programmers may safely skip the Java programming examples and the detailed javadoc formal specifications.

This servlet is not and never will be an end-user application.
It's just an old-fashioned page-flipping web app, with way too many forms and tables, and with no user sessions or state. It is not the Monk server, although the core datastore engine will be part of that server. The documentation presented here is programmer documentation, not end-user documentation.

This first release is missing many features. It is based on a subset of WordHoard data, and does not yet incorporate any of the many additional features we have talked about wanting for Monk. These include, but are not limited to:

User data. |
| Document generated by Confluence on Apr 19, 2009 15:04 |