edu.northwestern.at.monk.model
Class Corpus

java.lang.Object
  extended by edu.northwestern.at.monk.model.TaggedObject
      extended by edu.northwestern.at.monk.model.CoreObject
          extended by edu.northwestern.at.monk.model.Corpus
All Implemented Interfaces:
Container, java.lang.Comparable<Corpus>

public class Corpus
extends CoreObject
implements Container, java.lang.Comparable<Corpus>

A corpus.

See Also:
MONK Datastore Overview, Licensing Agreement

Nested Class Summary
static class Corpus.Comparator
          A multi-column corpus comparator.
static class Corpus.SortOption
          Corpus sorting options.
 
Method Summary
 int compareTo(Corpus other)
          Compares this instance with another.
static java.util.Collection<Corpus> find(java.util.Collection<SearchCriterion> criteria)
          Finds corpora.
static java.util.Collection<Corpus> find(SearchCriteria criteria)
          Finds corpora.
static java.util.Collection<Corpus> find(SearchCriterion... criteria)
          Finds corpora.
static Corpus get(java.lang.String tag)
          Gets a corpus by tag.
static java.util.Collection<Corpus> getAll()
          Gets all the corpora.
 java.util.Collection<Author> getAuthors()
          Gets the authors.
 int getNumAuthors()
          Gets the number of authors.
 long getNumSentences()
          Gets the number of sentences.
 long getNumSentencesMain()
          Gets the number of sentences in main text.
 long getNumWordBigrams()
          Gets the number of word bigrams.
 long getNumWordBigramsMain()
          Gets the number of word bigrams in main text.
 long getNumWordPartBigrams()
          Gets the number of word part bigrams.
 long getNumWordPartBigramsMain()
          Gets the number of word part bigrams in main text.
 long getNumWordParts()
          Gets the number of word parts.
 long getNumWordPartsMain()
          Gets the number of word parts in main text.
 long getNumWordPartTrigrams()
          Gets the number of word part trigrams.
 long getNumWordPartTrigramsMain()
          Gets the number of word part trigrams in main text.
 long getNumWords()
          Gets the number of words.
 long getNumWordsMain()
          Gets the number of words in main text.
 long getNumWordTrigrams()
          Gets the number of word trigrams.
 long getNumWordTrigramsMain()
          Gets the number of word trigrams in main text.
 int getNumWorks()
          Gets the number of works.
 long getSummaryCount(CumKind cumKind, FeatureKind featureKind, Arity arity, MainKind mainKind)
          Gets a summary count.
 SummaryCounts getSummaryCounts()
          Gets the summary counts.
 SummaryCounts getSummaryCounts(CumKind cumKind)
          Gets the summary counts.
 java.lang.String getTitle()
          Gets the title.
 java.util.Collection<Work> getWorks()
          Gets the works.
static Corpus[] sort(java.util.Collection<Corpus> collection, Corpus.SortOption... options)
          Sorts a collection of corpora.
static void sort(Corpus[] array, Corpus.SortOption... options)
          Sorts an array of corpora.
 
Methods inherited from class edu.northwestern.at.monk.model.TaggedObject
getTag
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

get

public static Corpus get(java.lang.String tag)
Gets a corpus by tag.

Parameters:
tag - Tag.
Returns:
The corpus with the given tag, or null if there is none.

getAll

public static java.util.Collection<Corpus> getAll()
Gets all the corpora.

Returns:
Unmodifiable collection of all the corpora, in case- and diacritical-insensitive alphabetical order by stripped title.
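Because getAll() (like getWorks() and getAuthors()) returns an unmodifiable collection, callers that need to filter or reorder the result should copy it first. The following minimal sketch uses only java.util and a hypothetical getAllTitles() method standing in for the real accessor. It shows that mutating the returned view throws UnsupportedOperationException, while a private copy can be changed freely.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class UnmodifiableViewDemo {
    // Hypothetical stand-in for a getAll()-style accessor: returns a read-only view.
    public static Collection<String> getAllTitles() {
        List<String> titles = new ArrayList<>(List.of("Bleak House", "Emma", "Middlemarch"));
        return Collections.unmodifiableList(titles);
    }

    public static void main(String[] args) {
        Collection<String> all = getAllTitles();
        try {
            all.add("Ulysses"); // the view rejects any mutation
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot modify: " + e.getClass().getSimpleName());
        }
        // To filter or reorder, work on a private copy instead.
        List<String> copy = new ArrayList<>(all);
        copy.removeIf(t -> t.startsWith("E"));
        System.out.println(copy); // [Bleak House, Middlemarch]
    }
}
```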

find

public static java.util.Collection<Corpus> find(SearchCriterion... criteria)
                                         throws ModelException
Finds corpora.

Parameters:
criteria - Search criteria.
Returns:
Collection of corpora, in an undefined order. To obtain an ordered result, pass it to one of the sort methods.
Throws:
ModelException - Unable to execute search.

find

public static java.util.Collection<Corpus> find(java.util.Collection<SearchCriterion> criteria)
                                         throws ModelException
Finds corpora.

Parameters:
criteria - Collection of search criteria.
Returns:
Collection of corpora, in an undefined order. To obtain an ordered result, pass it to one of the sort methods.
Throws:
ModelException - Unable to execute search.

find

public static java.util.Collection<Corpus> find(SearchCriteria criteria)
                                         throws ModelException
Finds corpora.

Parameters:
criteria - Search criteria.
Returns:
Collection of corpora, in an undefined order. To obtain an ordered result, pass it to one of the sort methods.
Throws:
ModelException - Unable to execute search.

sort

public static void sort(Corpus[] array,
                        Corpus.SortOption... options)
Sorts an array of corpora.

Parameters:
array - Array of corpora.
options - Sort options, or null to use the natural ordering.

sort

public static Corpus[] sort(java.util.Collection<Corpus> collection,
                            Corpus.SortOption... options)
Sorts a collection of corpora.

Parameters:
collection - Collection of corpora.
options - Sort options, or null to use the natural ordering.
Returns:
Sorted array of corpora.
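The two sort overloads, together with the nested Corpus.Comparator ("a multi-column corpus comparator"), can be illustrated with the self-contained sketch below. It is not the MONK implementation. MiniCorpus and the SortOption constants TITLE and NUM_WORKS are hypothetical stand-ins, because the real SortOption values are not listed on this page. The sketch mirrors the documented contract: the array overload sorts in place, the collection overload returns a new sorted array, and null options fall back to the natural ordering.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class CorpusSortSketch {
    // Hypothetical stand-in for Corpus, reduced to a title and a work count.
    public record MiniCorpus(String title, int numWorks) implements Comparable<MiniCorpus> {
        @Override
        public int compareTo(MiniCorpus other) {
            return title.compareToIgnoreCase(other.title); // natural ordering
        }
    }

    // Hypothetical options; the real Corpus.SortOption constants may differ.
    public enum SortOption {
        TITLE(Comparator.comparing(MiniCorpus::title, String.CASE_INSENSITIVE_ORDER)),
        NUM_WORKS(Comparator.comparingInt(MiniCorpus::numWorks));

        final Comparator<MiniCorpus> comparator;

        SortOption(Comparator<MiniCorpus> comparator) {
            this.comparator = comparator;
        }
    }

    // Multi-column comparator: the first option is the primary key, later ones break ties.
    static Comparator<MiniCorpus> comparatorFor(SortOption... options) {
        if (options == null || options.length == 0) {
            return Comparator.naturalOrder(); // null options: natural ordering
        }
        Comparator<MiniCorpus> result = options[0].comparator;
        for (int i = 1; i < options.length; i++) {
            result = result.thenComparing(options[i].comparator);
        }
        return result;
    }

    // Like sort(Corpus[], SortOption...): sorts the array in place.
    public static void sort(MiniCorpus[] array, SortOption... options) {
        Arrays.sort(array, comparatorFor(options));
    }

    // Like sort(Collection<Corpus>, SortOption...): copies into a new array, then sorts it.
    public static MiniCorpus[] sort(Collection<MiniCorpus> collection, SortOption... options) {
        MiniCorpus[] array = collection.toArray(new MiniCorpus[0]);
        sort(array, options);
        return array;
    }

    public static void main(String[] args) {
        // Made-up values standing in for an unordered find(...) result.
        List<MiniCorpus> found = List.of(
                new MiniCorpus("beta", 2),
                new MiniCorpus("Alpha", 2),
                new MiniCorpus("gamma", 1));
        MiniCorpus[] sorted = sort(found, SortOption.NUM_WORKS, SortOption.TITLE);
        for (MiniCorpus c : sorted) {
            System.out.println(c.title()); // gamma, Alpha, beta (one per line)
        }
    }
}
```

Chaining with Comparator.thenComparing keeps each option independent. The order in which options are passed determines their precedence, which is how a multi-column comparator typically behaves.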

getTitle

public java.lang.String getTitle()
Gets the title.

Returns:
The corpus title.

getNumWorks

public int getNumWorks()
Gets the number of works.

Returns:
The number of works in the corpus.

getWorks

public java.util.Collection<Work> getWorks()
Gets the works.

Returns:
An unmodifiable collection of all the works in the corpus, in case- and diacritical-insensitive alphabetical order by stripped title.

getNumAuthors

public int getNumAuthors()
Gets the number of authors.

Returns:
The number of authors who have works in the corpus.

getAuthors

public java.util.Collection<Author> getAuthors()
Gets the authors.

Returns:
An unmodifiable collection of all the authors in the corpus, in case- and diacritical-insensitive alphabetical order by name.

getSummaryCounts

public SummaryCounts getSummaryCounts()
Gets the summary counts.

Returns:
The summary counts.

getSummaryCounts

public SummaryCounts getSummaryCounts(CumKind cumKind)
Gets the summary counts.

Specified by:
getSummaryCounts in interface Container
Parameters:
cumKind - Kind of count: CumKind.CUM or CumKind.NON_CUM. (Ignored.)
Returns:
The summary counts.

getSummaryCount

public long getSummaryCount(CumKind cumKind,
                            FeatureKind featureKind,
                            Arity arity,
                            MainKind mainKind)
Gets a summary count.

Specified by:
getSummaryCount in interface Container
Parameters:
cumKind - Kind of count: CumKind.CUM or CumKind.NON_CUM. (Ignored.)
featureKind - Feature kind: FeatureKind.WORD or FeatureKind.WORD_PART.
arity - Arity: Arity.UNIGRAM, Arity.BIGRAM or Arity.TRIGRAM.
mainKind - Main kind: MainKind.ALL_TEXT or MainKind.MAIN_TEXT.
Returns:
Summary count.
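getSummaryCount selects one count along four dimensions, and cumKind is ignored for a corpus. The sketch below shows one plausible lookup structure under stated assumptions. The enums are local stand-ins that reuse the constant names from this page. The Key record, the put method, and the zero default for missing entries are hypothetical. The sketch also assumes that each convenience getter such as getNumWordBigramsMain() corresponds to one fixed combination of the other three arguments. That correspondence is inferred from the method names and is not stated on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class SummaryCountSketch {
    // Local stand-ins mirroring the constants named in the Javadoc.
    public enum CumKind { CUM, NON_CUM }
    public enum FeatureKind { WORD, WORD_PART }
    public enum Arity { UNIGRAM, BIGRAM, TRIGRAM }
    public enum MainKind { ALL_TEXT, MAIN_TEXT }

    // cumKind is deliberately absent from the key: for a corpus it is ignored.
    record Key(FeatureKind featureKind, Arity arity, MainKind mainKind) {}

    private final Map<Key, Long> counts = new HashMap<>();

    // Hypothetical loader used only to populate the sketch.
    public void put(FeatureKind f, Arity a, MainKind m, long value) {
        counts.put(new Key(f, a, m), value);
    }

    public long getSummaryCount(CumKind cumKind, FeatureKind f, Arity a, MainKind m) {
        // Missing combinations are treated as zero in this sketch.
        return counts.getOrDefault(new Key(f, a, m), 0L);
    }

    // Assumed equivalence: one convenience getter = one fixed combination.
    public long getNumWordBigramsMain() {
        return getSummaryCount(CumKind.NON_CUM, FeatureKind.WORD, Arity.BIGRAM, MainKind.MAIN_TEXT);
    }
}
```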

getNumWords

public long getNumWords()
Gets the number of words.

Returns:
The number of words.

getNumWordsMain

public long getNumWordsMain()
Gets the number of words in main text.

Returns:
The number of words in main text.

getNumWordBigrams

public long getNumWordBigrams()
Gets the number of word bigrams.

Returns:
The number of word bigrams.

getNumWordBigramsMain

public long getNumWordBigramsMain()
Gets the number of word bigrams in main text.

Returns:
The number of word bigrams in main text.

getNumWordTrigrams

public long getNumWordTrigrams()
Gets the number of word trigrams.

Returns:
The number of word trigrams.

getNumWordTrigramsMain

public long getNumWordTrigramsMain()
Gets the number of word trigrams in main text.

Returns:
The number of word trigrams in main text.

getNumWordParts

public long getNumWordParts()
Gets the number of word parts.

Returns:
The number of word parts.

getNumWordPartsMain

public long getNumWordPartsMain()
Gets the number of word parts in main text.

Returns:
The number of word parts in main text.

getNumWordPartBigrams

public long getNumWordPartBigrams()
Gets the number of word part bigrams.

Returns:
The number of word part bigrams.

getNumWordPartBigramsMain

public long getNumWordPartBigramsMain()
Gets the number of word part bigrams in main text.

Returns:
The number of word part bigrams in main text.

getNumWordPartTrigrams

public long getNumWordPartTrigrams()
Gets the number of word part trigrams.

Returns:
The number of word part trigrams.

getNumWordPartTrigramsMain

public long getNumWordPartTrigramsMain()
Gets the number of word part trigrams in main text.

Returns:
The number of word part trigrams in main text.

getNumSentences

public long getNumSentences()
Gets the number of sentences.

Returns:
The number of sentences.

getNumSentencesMain

public long getNumSentencesMain()
Gets the number of sentences in main text.

Returns:
The number of sentences in main text.

compareTo

public int compareTo(Corpus other)
Compares this instance with another.

Corpora are ordered in case- and diacritical-insensitive increasing alphabetical order by stripped title.

Specified by:
compareTo in interface java.lang.Comparable<Corpus>
Parameters:
other - The other instance to be compared.
Returns:
A negative integer, zero, or a positive integer as this instance is less than, equal to, or greater than the specified instance.
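The sketch below shows one way to obtain a case- and diacritical-insensitive ordering with the standard library. It decomposes each title with java.text.Normalizer (NFD), drops combining marks, lower-cases the result, and compares the resulting keys. This page does not define "stripped", so the sketch assumes it only means trimming surrounding whitespace. The real implementation may strip more, for example punctuation or leading articles. The names sortKey and compareTitles are illustrative only.

```java
import java.text.Normalizer;
import java.util.Locale;

public class TitleOrderSketch {
    // Assumption: "stripped" means trimmed of surrounding whitespace; the real
    // implementation may strip more (e.g. punctuation or leading articles).
    static String strip(String title) {
        return title.strip();
    }

    // Builds a sort key that ignores case and diacritical marks.
    public static String sortKey(String title) {
        String decomposed = Normalizer.normalize(strip(title), Normalizer.Form.NFD);
        String noMarks = decomposed.replaceAll("\\p{M}", ""); // drop combining accents
        return noMarks.toLowerCase(Locale.ROOT);
    }

    // Same contract as Comparable.compareTo: negative, zero, or positive.
    public static int compareTitles(String a, String b) {
        return sortKey(a).compareTo(sortKey(b));
    }

    public static void main(String[] args) {
        // "\u00C9mile" is "Émile"; escapes avoid source-encoding issues.
        System.out.println(compareTitles("\u00C9mile", "emile") == 0); // true
        System.out.println(compareTitles("  Zola", "\u00E9mile") > 0); // true
    }
}
```

For locale-aware ordering, java.text.Collator at PRIMARY strength is a common alternative. The explicit key used here has the advantage that its results do not depend on locale collation rules.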