|
This page last changed on Apr 15, 2007 by amitku.
Components of the chunk file.
Schema: http://www.noraproject.org/demo/chunk/chunk.rng
A chunk file defines a single collection; its root element is <collection>.
Brief description of elements
- <head> element holds information about how to retrieve the collection from the XML database, the collection-level metadata, the file-system files that make up the collection, the chunk hierarchy (semantic divisions), and an identifier for the collection (@id) along with a
label. An example of the head element is below.

- <label> element holds the actual label content, for example "Emily Dickinson collection".
- <metadata> element holds information about collection level metadata.
- <files> and <file> elements refer to the files on the file system that make up the collection.
- <chunkDesc> and <default> hold information about the available chunks and the default unit of data mining (usually the work).
- <transform> element stores the transformation chain that must be carried out on the collection for tokenization
and other analytical pre-processing. This part of the chunk file has not been developed yet.
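
As an illustration of how the elements listed above might fit together, here is a minimal sketch of a <head> inside its <collection>. Only the element names and the @id identifier come from the description above; the nesting order, the id values, the file path, the content of <metadata> and the empty <transform> are assumptions made for illustration. The schema at the URL given above is authoritative.

```xml
<!-- Hypothetical sketch: element names follow the description above;
     nesting, attribute values and the file path are assumed. -->
<collection>
  <head id="dickinson">
    <label>Emily Dickinson collection</label>
    <metadata>
      <!-- collection-level metadata; its content model is not described here -->
    </metadata>
    <files>
      <!-- one <file> per file-system file in the collection (id and path assumed) -->
      <file id="f1">dickinson/poems.xml</file>
    </files>
    <chunkDesc>
      <!-- default unit of data mining, usually the work -->
      <default>work</default>
    </chunkDesc>
    <!-- transformation chain: not yet developed, so left empty -->
    <transform/>
  </head>
  <body>
    <!-- nested <div> elements describing the chunk hierarchy -->
  </body>
</collection>
```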

- <body> element holds the structure of the collection as nested <div> elements. Each <div> element has
a name, an id and an equivalent attribute. These are described below.
- <div> element has three attributes: id, which is unique and identifies the chunk; name, which defines the semantic type of the chunk (work, chapter, para, etc.); and equivalent, which is used to reconcile collection-level differences.
- <label> element holds the title of the chunk, usually /TEI.2/text/div/title or /TEI.2/teiHeader/fileDesc/titleStmt/title[@type="?"].
In its unprocessed state it can contain an <xpath> element; once processed, it holds the resulting string value, for example:
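
A sketch of the two states of a chunk <label>, assuming the <xpath> element sits directly inside <label> and is replaced by plain text after processing. The title string is invented for illustration.

```xml
<!-- Unprocessed: the label still carries the XPath that locates the title -->
<label><xpath>/TEI.2/text/div/title</xpath></label>

<!-- Processed: the XPath has been resolved to its string value (value made up) -->
<label>Chapter 1</label>
```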

- <resources> element is a collection of <resource> elements that refer to the ids of the <file> elements in the <head>.
- <resource> element currently refers to a file-type resource, and its <xpath> element points to the exact text fragment that
constitutes the chunk; see the example below.
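
A minimal sketch of a chunk whose text is located through a <resource>. The description above does not say how a <resource> names its <file>, so the ref and type attributes here are hypothetical, as are the id values and the XPath expression.

```xml
<div id="c1" name="chapter" equivalent="chapter">
  <label>Chapter 1</label>
  <resources>
    <!-- ref and type are assumed attribute names; "f1" is assumed to match
         the id of a <file> declared in <head> -->
    <resource type="file" ref="f1">
      <!-- points at the exact text fragment that makes up this chunk -->
      <xpath>/TEI.2/text/body/div[1]</xpath>
    </resource>
  </resources>
</div>
```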

Example fragments:
The hierarchy of work, volume and chapter; notice the nested divs.
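
A hypothetical fragment showing how such a work, volume and chapter hierarchy could be expressed with nested <div> elements. The ids, labels and equivalent values are invented, and the <resources> children are omitted for brevity.

```xml
<body>
  <!-- outermost chunk: the work -->
  <div id="w1" name="work" equivalent="work">
    <label>Example Novel</label>
    <!-- a volume nested inside the work -->
    <div id="w1v1" name="volume" equivalent="volume">
      <label>Volume 1</label>
      <!-- chapters nested inside the volume -->
      <div id="w1v1c1" name="chapter" equivalent="chapter">
        <label>Chapter 1</label>
      </div>
      <div id="w1v1c2" name="chapter" equivalent="chapter">
        <label>Chapter 2</label>
      </div>
    </div>
  </div>
</body>
```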

The hierarchy of chapter and paragraph.
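
A similar sketch for the chapter and paragraph level, assuming paragraphs use name="para" as in the attribute description above. The ids, the ref attribute and the XPath expressions are illustrative assumptions, and the paragraph chunks are shown without labels.

```xml
<div id="c1" name="chapter" equivalent="chapter">
  <label>Chapter 1</label>
  <!-- each paragraph is its own chunk nested inside the chapter -->
  <div id="c1p1" name="para" equivalent="para">
    <resources>
      <resource type="file" ref="f1">
        <xpath>/TEI.2/text/body/div[1]/p[1]</xpath>
      </resource>
    </resources>
  </div>
  <div id="c1p2" name="para" equivalent="para">
    <resources>
      <resource type="file" ref="f1">
        <xpath>/TEI.2/text/body/div[1]/p[2]</xpath>
      </resource>
    </resources>
  </div>
</div>
```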

|