This page last changed on Apr 15, 2007 by amitku.
Components of Nora-DB
- Nora-chunk
- eXist database
- Lucene full text search engine
- OpenNLP/Gate for feature extraction
Each of these components can be replaced with an equivalent module, except for the Lucene search engine.
Nora-chunk
Nora-chunk provides semantic chunks at the level of collections. XPath expressions are mapped to semantic divisions
such as work, paragraph, or letter. An analytics library can query feature properties at the level of these
semantic divisions. The semantic divisions cut across file system boundaries: a work can span multiple files on the
file system, and a single file can contain two volumes of a work.
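The mapping from XPath expressions to semantic divisions can be sketched as follows. This is only an illustration, not Nora-chunk's actual API: the `CHUNK_XPATHS` table, the sample element names (`work`, `div`, `p`), the `id` and `vol` attributes, and the in-memory `FILES` are all assumptions, and Python's standard `xml.etree.ElementTree` stands in for the XML store.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical mapping from semantic division names to XPath expressions.
CHUNK_XPATHS = {
    "work": ".//work",
    "letter": ".//div[@type='letter']",
    "paragraph": ".//p",
}

# Hypothetical collection: one work spread over two files,
# with the first file holding two volumes of that work.
FILES = {
    "a.xml": "<file><work id='w1' vol='1'><p>One.</p></work>"
             "<work id='w1' vol='2'><div type='letter'><p>Dear sir.</p></div></work></file>",
    "b.xml": "<file><work id='w1' vol='3'><p>Three.</p></work></file>",
}

def chunks(chunk_type, files=FILES):
    """Return (file name, element) pairs for one semantic division type."""
    xpath = CHUNK_XPATHS[chunk_type]
    found = []
    for name in sorted(files):
        root = ET.fromstring(files[name])
        for node in root.findall(xpath):
            found.append((name, node))
    return found

def works(files=FILES):
    """Group 'work' chunks by id, regardless of which file they come from."""
    grouped = defaultdict(list)
    for name, node in chunks("work", files):
        text = "".join(node.itertext())
        grouped[node.get("id")].append((name, node.get("vol"), text))
    return dict(grouped)
```

Calling `works()` here collects all three volumes of the hypothetical work `w1` from both files, which is the sense in which a semantic division is independent of file boundaries.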
eXist database
The eXist database provides two broad functions:
- It stores the XML collection and returns XML fragments for an XPath expression; when used with Nora-chunk, it returns semantic units.
- It provides a stream of text, which Lucene uses to insert the start and end position tokens of features.
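One possible reading of the second point is that feature boundaries are marked directly in the text stream. The sketch below illustrates that reading only; the `mark_features` function and the `[START]` and `[END]` tokens are hypothetical and are not taken from Nora-DB, eXist, or Lucene.

```python
def mark_features(text, spans, start_token="[START]", end_token="[END]"):
    """Insert start/end tokens around each (start, end) character span.

    Spans are half-open character offsets into text and must not overlap.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not handled in this sketch")
        out.append(text[cursor:start])          # text before the feature
        out.append(start_token + text[start:end] + end_token)
        cursor = end
    out.append(text[cursor:])                   # text after the last feature
    return "".join(out)
```

For example, `mark_features("the cat saw the cat", [(4, 7), (16, 19)])` wraps both occurrences of `cat` in the marker tokens.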
Lucene full text search engine
The Lucene search engine provides the frequency of each feature and the position (start and end) of each feature instance. Features must be determined in advance, and an index is created for each feature type and for each chunk type. Examples of features are words, bigrams, and trigrams.
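The kind of information such an index holds can be sketched in plain Python. This is a toy stand-in, not Lucene's API: `build_index`, `frequency`, the regex tokenizer, and the use of character offsets for start and end are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def build_index(chunks, n):
    """Index n-gram features for one chunk type.

    chunks maps a chunk id to its text. The result maps each feature
    (n lower-cased words joined by spaces) to a dict from chunk id to a
    list of (start, end) character offsets of its instances.
    """
    index = defaultdict(lambda: defaultdict(list))
    for chunk_id, text in chunks.items():
        tokens = [(m.group().lower(), m.start(), m.end())
                  for m in re.finditer(r"\w+", text)]
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            feature = " ".join(tok for tok, _, _ in window)
            index[feature][chunk_id].append((window[0][1], window[-1][2]))
    return index

def frequency(index, feature, chunk_id):
    """Number of instances of a feature within one chunk."""
    return len(index.get(feature, {}).get(chunk_id, []))

# One index per (feature type, chunk type) pair, as described above.
paragraphs = {"p1": "the cat saw the cat", "p2": "the dog"}
indexes = {
    ("word", "paragraph"): build_index(paragraphs, 1),
    ("bigram", "paragraph"): build_index(paragraphs, 2),
    ("trigram", "paragraph"): build_index(paragraphs, 3),
}
```

With this sample data, `frequency(indexes[("word", "paragraph")], "cat", "p1")` counts the two instances of `cat` in paragraph `p1`, and the bigram index records where each instance of `the cat` starts and ends.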
OpenNLP/Gate and other NLP toolkits
These toolkits detect and extract the features and pass them to Lucene, which adds them to its indices.