This page last changed on Nov 20, 2008 by amitku.

The following corpora are all located at Northwestern.

Update from Pib

This share is available on campus at Northwestern as

\\ariadne.northwestern.edu\monk

and off campus via WebDAV as

https://ariadne.northwestern.edu:8443/ariadne/ .

You can enter username "xxxx" and password "xxxxx" when prompted,
matching the access credentials for the Monk datastore. On both
Windows and Mac systems you need to accept the self-signed SSL
certificate when prompted. Please feel free to copy these texts
locally for Monk project purposes. However, since the eebo and ncf
texts are licensed, please do not share these credentials or the
texts themselves outside the Monk group.

When we decide that a batch of texts is "good enough" for ingestion
into Prior, we place them in the monk shared file area. Likewise, when
we make corrections, the corrected files replace the previous
versions. Unadorned files appear in the "unadorned" directory of each
collection, while adorned files appear in the "adorned" directory.

The monk share is read-only. You cannot add, delete, or modify files.

The webdav access is a lightweight shell over the Windows monk share
using a modified version of a program called Davenport. Davenport
accesses the Windows share to generate web pages from directory
listings, and interfaces with the standard Windows authentication
methods for each directory. The Jetty servlet server handles the web
serving itself.

I have tested access to the monk share using Windows web folders and
the third-party program WebDrive under Windows XP. Both work fine,
although as usual for a Windows 2003 server like Ariadne, you have to
reenter your username/password more often than you would
like. Saving the password when prompted is one way to avoid this
nuisance. On Mac OS X, press Command-K (or choose Go, Connect To
Server) and fill out the forms requesting the URL address (given
above), the domain (blank for a local ariadne account like "monk", AT
for a NetID account), your username, and your password. N.B. Your Mac
OS X system has to be reasonably up to date in order to access a
secure webdav site using https:. If you have an older system, the
Goliath webdav client is supposed to work. I am not a Mac person, so I
haven't tried Goliath.

I have not been able to get Konqueror on Linux to connect, but this
is probably not too important right now. There are other clients for
Linux I haven't tried (e.g., davfs2) that might work better. If
worst comes to worst, you can always access the share as a plain web
site using a web browser.
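For scripted access, a directory listing can also be requested directly with a WebDAV PROPFIND. The Python sketch below is illustrative only and has not been tried against ariadne. It assumes the server accepts HTTP Basic authentication, and that you have saved a copy of the self-signed certificate to a local file (passed as the hypothetical cafile argument) whose host name matches ariadne. The function names are made up for this example.

```python
import base64
import ssl
import urllib.request
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # WebDAV element namespace in ElementTree notation


def build_propfind_request(url, username, password):
    """Build a PROPFIND request listing one level of a WebDAV collection.

    Assumption: the server accepts HTTP Basic authentication.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="PROPFIND")
    request.add_header("Depth", "1")  # the collection plus its direct children
    request.add_header("Authorization", f"Basic {token}")
    return request


def parse_multistatus(body):
    """Return the href of each <D:response> in a 207 Multi-Status reply."""
    root = ET.fromstring(body)
    return [r.findtext(f"{DAV}href") for r in root.iter(f"{DAV}response")]


def list_collection(url, username, password, cafile):
    """Fetch and parse one directory listing.

    cafile is a hypothetical local copy of the server's self-signed
    certificate; trusting it explicitly keeps certificate checking on.
    """
    context = ssl.create_default_context(cafile=cafile)
    request = build_propfind_request(url, username, password)
    with urllib.request.urlopen(request, context=context) as reply:
        return parse_multistatus(reply.read())
```

Trusting the saved certificate, rather than switching verification off, keeps the same protection as accepting the certificate once in Windows or Mac OS X.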

The "monk" share on ariadne now also contains adorned versions of the
eebo, ncf, and wright collections with the extended bibliographic
information John added to his latest version of the Monk
datastore. The bibliographically enhanced texts are found in the
"bibadorned" subdirectories of each collection. The Shakespeare
(sha) collection does not have the extended bibliographic information
yet, so there is no bibadorned subdirectory for Shakespeare, nor are
the Shakespeare texts in the Monk datastore yet.

There are five eebo texts for which there is no enhanced
bibliographic information.

a04632_02.xml
a04632_03.xml
a04632_09.xml
a04633_02.xml
a16636_02.xml

There are no bibadorned versions of these files yet, nor do these
files appear in the Monk datastore. I expect the "bibadorned"
versions of these files, as well as those for the Shakespeare files,
to become available within a couple of weeks.
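To check which adorned files in a local copy of a collection still lack a bibliographically enhanced counterpart, you can compare the two subdirectories by file name. Here is a minimal Python sketch. It assumes that each collection directory contains an "adorned" subdirectory and, optionally, a "bibadorned" one, and that a text keeps the same file name in both. The function names are hypothetical.

```python
from pathlib import Path


def names_missing_bibadorned(adorned_names, bibadorned_names):
    """Sorted names present among the adorned files but not the bibadorned ones."""
    return sorted(set(adorned_names) - set(bibadorned_names))


def missing_bibadorned(collection_dir):
    """Compare the adorned/ and bibadorned/ subdirectories of one collection."""
    collection = Path(collection_dir)
    adorned = [p.name for p in (collection / "adorned").glob("*.xml")]
    bib_dir = collection / "bibadorned"
    # A collection with no bibadorned/ directory (e.g. sha) lacks them all.
    bibadorned = [p.name for p in bib_dir.glob("*.xml")] if bib_dir.is_dir() else []
    return names_missing_bibadorned(adorned, bibadorned)
```

If your local copy of the eebo collection is current, missing_bibadorned on it should report the five files listed above.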

For the moment I am leaving both the original adorned texts and the
bibliographically enhanced adorned texts on ariadne. Once the
Shakespeare texts receive the enhanced bibliographic information, and
the missing eebo bibliographic information is also supplied, we will
probably store only the bibliographically enhanced texts in the
"adorned" subdirectories and do away with the separate "bibadorned"
subdirectories.
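Until the two directories are merged, a script that wants the richest available version of a text can prefer the bibadorned file and fall back to the adorned one. The sketch below makes the same directory-layout assumptions as the comparison above. The function name and its exists parameter are illustrative; the parameter exists only so the lookup can be tested without touching the disk.

```python
from pathlib import Path


def preferred_adorned_path(collection_dir, filename, exists=Path.is_file):
    """Return the bibadorned file if present, else the adorned one, else None."""
    collection = Path(collection_dir)
    for subdir in ("bibadorned", "adorned"):
        candidate = collection / subdir / filename
        if exists(candidate):
            return candidate
    return None
```

After the enhanced texts move into the "adorned" subdirectories and "bibadorned" disappears, this lookup simply falls through to "adorned", so calling code keeps working.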

The adorned and bibadorned texts do not pass XML validation against
the TEI-A schema. The TEI-A schema needs to be updated to reflect the
new "monkHeader" element and its subelements. A problem with the
"part=" attribute of "w" elements also needs correction. Once the
TEI-A schema is updated with these corrections, the adorned and
bibliographically enhanced adorned files can be validated.
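Full validation must wait for the updated schema. In the meantime, a quick check that each file is at least well-formed XML can still catch damaged files. The Python sketch below does only that. The standard-library parser does not validate against the TEI-A schema or any DTD, and it will report an error for a file that uses entities declared in an external DTD. The function names are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def well_formedness_error(xml_bytes):
    """Return None if the bytes parse as XML, else the parser's message."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


def check_directory(directory):
    """Map each *.xml file in directory that fails to parse to its error."""
    failures = {}
    for path in sorted(Path(directory).glob("*.xml")):
        message = well_formedness_error(path.read_bytes())
        if message is not None:
            failures[path.name] = message
    return failures
```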

– Phil "Pib" Burns
Northwestern University, Evanston, IL, USA

NCF texts

The 250 Chadwyck-Healey nineteenth-century British novels are stored in /contents/monk/collections/ncf . Both the adorned and the unadorned texts are available. Both sets of texts should validate against the teisimple DTD.

Last updated 2007/08/28 .

Northwestern University is the keeper of these texts.

MOA Stein texts

/contents/monk/collections/moa

Tanya Clement is responsible for obtaining the texts.

MorphAdorned Stein texts

MorphAdorned versions of the two Gertrude Stein texts The Making of Americans and Three Lives are stored in /contents/monk/stein . Also available are unadorned and adorned versions of the texts with named entities added by MorphAdorner's version of the GATE named entity extractor. The "entitified" texts demonstrate the limitations of the GATE named entity extractor for literary purposes. These texts do not yet validate against the teisimple DTD.

Last updated 2007/08/03 .

Northwestern University is the keeper of the adorned versions of the texts.

Early American Fiction

Adorned and unadorned versions of The Scarlet Letter are stored in /contents/monk/collections/eaf . The adorned text should validate against the teisimple DTD.

Last updated 2007/08/03 .

Northwestern University is the keeper of this text.

Wright Fiction Archive

Adorned and unadorned versions of Moby Dick and Uncle Tom's Cabin are stored in /contents/monk/collections/wright . The adorned version of Uncle Tom's Cabin demonstrates how the lack of training data for dialectal language adversely affects morphological adornment. These texts do not yet validate against the teisimple DTD.

Last updated 2007/09/17 .

Northwestern University is the keeper of these texts.

University of Nebraska is the keeper of these texts (last provided by Steve Ramsay).

DTDs

The XML DTDs used by various Monk corpora are stored in /contents/monk/collections/dtds .

Last updated 2007/08/01 .

Northwestern University is the keeper of the DTDs.

Training data

The MorphAdorner training data for nineteenth century British fiction is stored in /contents/monk/collections/ncf/monk/ncf/trainingdata .

Last updated 2007/08/01 .

Northwestern University is the keeper of the training data.

Document generated by Confluence on Apr 19, 2009 15:04