This element indicates the beginning of a context. filename is the name
of the original corpora file from which this context was extracted.
paras indicates that this document contains paragraph delimiters.
Start of a new paragraph. paragraph_number is an integer. The first
paragraph in a context is numbered 1, and paragraph numbers are
incremented sequentially.
Start of a new sentence. sentence_number is an integer. The first
sentence in each context is numbered 1, and sentence numbers are
incremented sequentially throughout the context. Sentence numbers do
not restart at 1 in each paragraph.
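
The numbering rules above can be checked mechanically. The following is a minimal sketch, not part of the WordNet distribution. It assumes that the paragraph and sentence start tags are written as <p pnum=N> and <s snum=N>; those spellings, and the name check_numbering, are assumptions made for illustration.

```python
import re

# Assumed tag spellings; adjust them if the files use different ones.
P_RE = re.compile(r"<p\s+pnum=(\d+)>")
S_RE = re.compile(r"<s\s+snum=(\d+)>")

def check_numbering(lines):
    """Return a list of numbering problems found in the lines of one context."""
    problems = []
    next_p = 1  # paragraphs start at 1 in each context
    next_s = 1  # sentences start at 1 and never restart within the context
    for lineno, line in enumerate(lines, start=1):
        p = P_RE.search(line)
        if p:
            found = int(p.group(1))
            if found != next_p:
                problems.append(f"line {lineno}: expected paragraph {next_p}, found {found}")
            next_p = found + 1
        s = S_RE.search(line)
        if s:
            found = int(s.group(1))
            if found != next_s:
                problems.append(f"line {lineno}: expected sentence {next_s}, found {found}")
            next_s = found + 1
    return problems
```

A context whose second paragraph restarts its sentences at 1 would be reported, because the sentence counter is never reset when a new paragraph starts.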
word
This element represents a word form. word is the orthographic form as
it appears in the original document. All of the syntactic and semantic
information is stored as attribute/value pairs described below.
cmd= cmd
Indicates the status of the wf element.
cmd       Meaning
tag       word is to be tagged
done      word is semantically tagged
ignore    word should not be tagged
update    used during semantic concordance development only
retag     used during semantic concordance development only
pos= pos
pos is the syntactic tag assigned by Eric Brill's stochastic
part-of-speech tagger. See Syntactic Tags below for a list of
possible values.
lemma= lemma
The base form of the word or collocation that the other
attribute/value pairs in this wf pertain to. This is the form of
the string used to search the WordNet database. If rdf is present,
lemma is the base form of the redefinition. When pn is present,
redefinition, lemma and category all have the same value.
wnsn= sense_number
sense_number is the integer sense number corresponding to the
WordNet output display.
lexsn= lex_sense
lex_sense, when concatenated onto lemma using % as the
concatenation character, creates a sense_key that indicates the
WordNet sense to which word should be linked. This is the semantic
tag for word. The format of a sense_key is described in
senseidx(5WN).
pn= category
Indicates that word is a proper noun categorized as one of the
values of CATEGORY. When pn is present, redefinition, lemma and
category all have the same value.
rdf= redefinition
If present, word has been "redefined" to something else. This is
mainly used to define discontinuous collocations, correct
typographical errors in the text, or to enter a string that should
be used to search WordNet instead of word in order to find an
appropriate sense for the semantic tag. When pn is present,
redefinition, lemma and category all have the same value.
dc= distance
Indicates that word is part of a discontinuous collocation in
which the words comprising the collocation are not adjacent.
distance is an integer that specifies how many wf elements away
the semantic tag for the collocation is. It may be either
negative, indicating wf elements prior to this one, or positive,
indicating wf elements following in the file.
sep="separator_string"
Indicates that the space between this wf element and the next
should be displayed as separator_string. The string may be one or
more characters. The default word separator is one space.
tagnote= tagnote_type
A tagnote attribute/value pair is always present if cmd is update
or retag. This is used only during semantic concordance
development, and indicates the type of problem encountered during
semantic tagging. See TAGNOTE_TYPE for a list of possible values.
note="note"
A note attribute/value pair is always present with tagnote. note
may contain a string that provides additional information about
the tagnote, or may be empty.
ot= other_tag
If present, a semantic tag cannot be assigned to word for one of
the reasons listed in OTHER_TAG.
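
The attribute descriptions above can be exercised with a small reader. This is a rough sketch under stated assumptions, not the format's reference parser. It assumes that a word form is written on one line as <wf attributes>word</wf>, and that attribute values are either bare tokens or double-quoted strings. The function names parse_wf, sense_key, tag_source_index and render are hypothetical.

```python
import re

# Assumed layout: <wf name=value name="quoted value" ...>word</wf>
WF_RE = re.compile(r"<wf\s+([^>]*)>(.*?)</wf>")
ATTR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_wf(line):
    """Return (word, attrs) for the first wf element in line, or None."""
    match = WF_RE.search(line)
    if match is None:
        return None
    attrs = {}
    for name, quoted, bare in ATTR_RE.findall(match.group(1)):
        # Exactly one of quoted/bare is non-empty, except for an empty "" value.
        attrs[name] = bare or quoted
    return match.group(2), attrs

def sense_key(attrs):
    """Join lemma and lexsn with '%'; None if the wf has no semantic tag."""
    if "lemma" in attrs and "lexsn" in attrs:
        return f"{attrs['lemma']}%{attrs['lexsn']}"
    return None

def tag_source_index(index, attrs):
    """Index of the wf holding the collocation's semantic tag.

    dc is a signed offset counted in wf elements. Treating a wf without
    dc as carrying its own tag is an assumption of this sketch.
    """
    return index + int(attrs.get("dc", 0))

def render(wfs):
    """Rebuild display text, using sep when present and one space otherwise."""
    parts = []
    for word, attrs in wfs:
        parts.append(word)
        parts.append(attrs.get("sep", " "))
    return "".join(parts[:-1])  # no separator after the last word
```

For example, parsing the illustrative line <wf cmd=done pos=NN lemma=group wnsn=1 lexsn=1:03:00::>group</wf> and passing its attributes to sense_key gives the string group%1:03:00::, which is the form used to look up a sense as described in senseidx(5WN).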
Syntactic Tags
The following tags are assigned by Brill's stochastic part-of-speech
tagger.
Syntactic Tag   Interpretation
CC              Coordinating conjunction
CD              Cardinal number
DT              Determiner
EX              Existential "there"
FW              Foreign word
IN              Preposition or subordinating conjunction
JJ              Adjective
JJR             Adjective, comparative
JJS             Adjective, superlative
LS              List item marker
MD              Modal
NN              Noun, singular or mass
NNP             Proper noun, singular
NNPS            Proper noun, plural
NNS             Noun, plural
NP              Proper noun, singular
NPS             Proper noun, plural
PDT             Predeterminer
POS             Possessive ending
PP              Personal pronoun
PR              Pronoun
PRP             Pronoun
PRP$            Possessive pronoun
RB              Adverb
RBR             Adverb, comparative
RBS             Adverb, superlative
RP              Particle
SYM             Symbol
TO              "to"
UH              Interjection
VB              Verb, base form
VBD             Verb, past tense
VBG             Verb, gerund or present participle
VBN             Verb, past participle
VBP             Verb, non-3rd person singular present
VBZ             Verb, 3rd person singular present
WDT             Wh-determiner
WP              Wh-pronoun
WP$             Possessive wh-pronoun
WRB             Wh-adverb
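
When processing pos values it is often convenient to collapse the tags in the table into coarse classes. The sketch below groups them by prefix into noun, verb, adjective and adverb. This grouping and the name coarse_category are assumptions made for illustration and are not defined by the file format. Tags outside the four groups, including WRB, are deliberately left unclassified.

```python
# Prefix-based grouping of the syntactic tags listed in the table.
# The four class names are chosen for this sketch only.
_PREFIXES = (
    (("NN", "NP"), "noun"),      # NN, NNS, NNP, NNPS, NP, NPS
    (("VB",), "verb"),           # VB, VBD, VBG, VBN, VBP, VBZ
    (("JJ",), "adjective"),      # JJ, JJR, JJS
    (("RB",), "adverb"),         # RB, RBR, RBS (not RP, not WRB)
)

def coarse_category(tag):
    """Return a coarse class for a syntactic tag, or None if it has none."""
    for prefixes, category in _PREFIXES:
        if tag.startswith(prefixes):
            return category
    return None
```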
SEE ALSO
escort(1WN), senseidx(5WN), wndb(5WN), morphy(7WN), semcor(7WN).