|
This page last changed on Feb 25, 2007 by unsworth.
Nora-Chunk and Monk Data Object Model
Martin/Bill's Approach
- Common DTD Model
An entry-level DTD, close to the TEI DTD, to which files would be converted.
Bill and Martin have been working on it; like TCP EBO, 12,500 novels (TEI, 70 tags).
MM: There would be a non-trivial process...
JN: We need to architect this, given the resources available in the MONK project, so that a process can take care of a domain, i.e. a subset of TEI markup.
A brief discussion by Pib and JN about the Word Hoard data model and, if required, the Nora data model.
- Read-only database and Swing apps with Hibernate.
A brief discussion about Spring/ACEGI for the MonkServices.
- Using an existing framework for authentication and Model-View-Controller. Dynamic binding.
ACEGI/AJAX for session handling.
- The MONK object format workflow would involve both scholars and programmers.
Adornment Pipeline
Regarding Sharing data
MM thinks this can be done. Pib thinks he can share the code, but will share
the data files only once that is cleared by MM.
Javadocs: http://panini.northwestern.edu/~pib/morphadorner/
Pib's adornment should know:
# Structure elements / simple formatting / side text / main text.
# Certain XML texts cross file boundaries.
...
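The first point above can be pictured with a minimal Java sketch of a token that carries its structural context. This is an illustration only: `AdornmentSketch`, `TextRegion`, `AdornedToken`, and `mainTextOnly` are hypothetical names and are not taken from MorphAdorner. The four regions simply mirror the categories listed in the notes, and `sourceFile` is an assumed field for texts that span several files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not MorphAdorner's actual API. */
class AdornmentSketch {

    /** The four kinds of context named in the notes. */
    enum TextRegion { STRUCTURE_ELEMENT, SIMPLE_FORMATTING, SIDE_TEXT, MAIN_TEXT }

    /** A token plus the region it came from and the file it was read from. */
    static final class AdornedToken {
        final String spelling;
        final TextRegion region;
        final String sourceFile; // assumed field: lets one text span several files

        AdornedToken(String spelling, TextRegion region, String sourceFile) {
            this.spelling = spelling;
            this.region = region;
            this.sourceFile = sourceFile;
        }
    }

    /** Keeps only main-text tokens, preserving their original order. */
    static List<AdornedToken> mainTextOnly(List<AdornedToken> tokens) {
        List<AdornedToken> result = new ArrayList<>();
        for (AdornedToken t : tokens) {
            if (t.region == TextRegion.MAIN_TEXT) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Recording the region on every token would let later stages decide, for example, whether side text counts toward word totals, without re-parsing the XML.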
Preliminary discussion about data models / distributed indices vs. one large index.
This discussion is postponed to email. The key issue is response time for the user.
- Using Distributed Databases
- Using Web Services for combining sparse matrices.
- Issues of Intellectual property rights.
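To make the idea of combining sparse matrices from several indices concrete, here is a minimal Java sketch. It assumes each index returns its partial result as a map from term to (document id to count), storing only non-zero cells. The class name `SparseMergeSketch` and this representation are assumptions for illustration, not a design agreed in the meeting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of combining sparse term counts from several indices. */
class SparseMergeSketch {

    /**
     * Each partial result maps term -> (documentId -> count) and holds only
     * non-zero cells. Merging sums the counts cell by cell.
     */
    static Map<String, Map<String, Integer>> merge(
            List<Map<String, Map<String, Integer>>> partials) {
        Map<String, Map<String, Integer>> combined = new HashMap<>();
        for (Map<String, Map<String, Integer>> partial : partials) {
            for (Map.Entry<String, Map<String, Integer>> row : partial.entrySet()) {
                // Create the row for this term on first sight.
                Map<String, Integer> target =
                        combined.computeIfAbsent(row.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                    // Add to the existing count, or insert the cell if absent.
                    target.merge(cell.getKey(), cell.getValue(), Integer::sum);
                }
            }
        }
        return combined;
    }
}
```

The merge itself is cheap; with distributed indices the response time is more likely to be dominated by the calls that fetch each partial result, which this sketch does not model.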
Preliminary discussion of the proxy/architecture.
Postponed
- Distributed database model.
Documentation and wiki update person.
John N will be our wiki person and documentation whip.
- Collection of fiction from 1600-1920 and some public domain texts. We will use:
-Fiction from 1600-1900
-Fiction, poetry, prose, drama and other for 1600
-Chadwyck-Healey Archive: 1800: 96 texts and 1900: 250 texts
-Early English prose fiction: 250 texts
-Wright American Fiction Archive: known novels published within 10 years either side of the Civil War; 2,500 in the archive, 1,000 fully (can be shared publicly)
-Early American Fiction: 107 novels
- A total of 120-150 million words
- Collection of fiction from 1600-1920 and some public domain texts.
-Cover the 1500-1900 timeline
-Fiction, prose, drama and other
- Fiction is a simple model as far as structure is concerned; drama and prose have more complicated structures.
- What is the minimum standard?
A total of 250 million words.
Testing and Integration Person.
- JUnit testing framework.
Next Steps.
- Email discussions about specifications and data model.
Explore
- UIMA CAS Apache Site:
- Managing Gigabytes http://www.cs.mu.oz.au/mg/ http://mg4j.dsi.unimi.it/
Tasks and dependencies.
Who is doing what.
Alternatives and Points of failure.
Time lines.
|