 <!-- manual page source format generated by PolyglotMan v3.0.3a12, -->
<!-- available via anonymous ftp from ftp.cs.berkeley.edu:/ucb/people/phelps/tcltk/rman.tar.Z -->

<HTML>
<HEAD>
<TITLE>SENSEIDX(5WN) manual page</TITLE>
</HEAD>
<BODY>
<A HREF="#toc">Table of Contents</A><P>
 
<H2><A NAME="sect0" HREF="#toc0">NAME </A></H2>
index.sense, sense.idx - WordNet's sense index  
<H2><A NAME="sect1" HREF="#toc1">DESCRIPTION </A></H2>
The WordNet 
sense index provides an alternate method for accessing synsets and word 
senses in the WordNet database.  It is useful for applications that retrieve 
synsets or other information related to a specific sense in WordNet, rather 
than all the senses of a word or collocation.  It can also be used with 
tools like <B>grep</B> and Perl to find all senses of a word in one or more 
parts of speech.  A specific WordNet sense, encoded as a <I>sense_key</I>, can 
be used as an index into this file to obtain its WordNet sense number, 
the database byte offset of the synset containing the sense, and the number 
of times it has been tagged in the semantic concordance texts. <P>
 Concatenating 
the <I>lemma</I> and <I>lex_sense</I> fields of a semantically tagged word (represented 
as attribute/value pairs in a <B>&lt;wf</B>&nbsp;...&nbsp;<B>&gt;</B> tag) in a semantic concordance file, using 
<B>%</B> as the concatenation character, creates the <I>sense_key</I> for that sense, 
which can in turn be used to search the sense index file. <P>
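A minimal Python sketch of this concatenation follows. It is an illustration only, not part of the WordNet package: the function name <B>sense_key_from_wf_attrs()</B> is hypothetical, and it assumes the text between the tag brackets is a whitespace-separated list of <I>name</I>=<I>value</I> pairs whose <B>lemma</B> and <B>lexsn</B> attributes hold the <I>lemma</I> and <I>lex_sense</I> fields, as in SemCor-style concordance files.

```python
def sense_key_from_wf_attrs(attr_text: str) -> str:
    """Build a sense_key from the attribute text of a wf tag.

    Assumes whitespace-separated name=value pairs in which 'lemma' holds
    the lemma and 'lexsn' holds the lex_sense field (SemCor-style markup).
    """
    attrs = {}
    for token in attr_text.split():
        name, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not name=value pairs
            attrs[name] = value
    # lemma and lex_sense are joined with '%' to form the sense_key.
    return attrs["lemma"] + "%" + attrs["lexsn"]


if __name__ == "__main__":
    # Hypothetical sample attribute text; prints dog%1:05:00::
    print(sense_key_from_wf_attrs("cmd=done pos=NN lemma=dog wnsn=1 lexsn=1:05:00::"))
```

<P>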
 A <I>sense_key</I> is the best way to represent a sense in semantic tagging or other systems 
that refer to WordNet senses.  <I>sense_key</I>s are independent of WordNet sense 
numbers and <I>synset_offset</I>s, which vary between versions of the database. 
Using the sense index and a <I>sense_key</I>, the corresponding synset (via 
the <I>synset_offset</I>) and WordNet sense number can easily be obtained.  A 
mapping from noun <I>sense_key</I>s in WordNet 1.6 to corresponding 2.0 <I>sense_key</I>s 
is provided with version 2.0, and is described in <A HREF="sensemap.5WN.html"><B>sensemap</B>(5WN)</A>. <P>
 See 
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A> for a thorough discussion of the WordNet database files.  
<H3><A NAME="sect2" HREF="#toc2">File 
Format </A></H3>
The sense index file lists all of the senses in the WordNet database 
with each line representing one sense.  The file is in alphabetical order, 
fields are separated by one space, and each line is terminated with a 
newline character. <P>
 Each line is of the form: <P>
  <blockquote><I>sense_key&nbsp;&nbsp;synset_offset&nbsp;&nbsp;sense_number&nbsp;&nbsp;tag_cnt 
</I>  </blockquote>
<P>
 <I>sense_key </I> is an encoding of the word sense.  Programs can construct 
a sense key in this format and use it as a binary search key into the 
sense index file.   The format of a <I>sense_key </I> is described below. <P>
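Because the file is sorted, a lookup can bisect on byte offsets instead of reading every line, in the spirit of <B>binsrch</B>(3WN). The Python sketch below is not the WordNet library's implementation: <B>find_sense_line()</B> and its helpers are hypothetical names, and it assumes the lines are sorted in plain byte order (what a <B>strcmp</B>-style comparison expects) and that each key is followed by a single space.

```python
import os
from typing import BinaryIO, Optional


def _first_line_at_or_after(f: BinaryIO, pos: int) -> bytes:
    """Return the first complete line starting at byte offset pos or later."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()  # skip the rest of the line containing byte pos - 1
    return f.readline()  # b"" at end of file


def _key(line: bytes) -> bytes:
    """The sense_key is everything before the first space."""
    return line.rstrip(b"\r\n").split(b" ", 1)[0]


def find_sense_line(path: str, sense_key: str) -> Optional[str]:
    """Binary-search a sorted sense index file for an exact sense_key.

    Returns the matching line without its newline, or None if absent.
    """
    target = sense_key.encode("utf-8")
    with open(path, "rb") as f:
        lo, hi = 0, os.fstat(f.fileno()).st_size
        # Find the smallest offset whose next line has a key >= target
        # (or no next line at all); sorted order makes this monotone.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _first_line_at_or_after(f, mid)
            if line and _key(line) < target:
                lo = mid + 1
            else:
                hi = mid
        line = _first_line_at_or_after(f, lo)
    if line and _key(line) == target:
        return line.rstrip(b"\r\n").decode("utf-8")
    return None
```

Each probe costs one or two <B>readline()</B> calls, so a lookup touches only a logarithmic number of lines. <P>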
 <I>synset_offset</I> is the byte offset at which the synset containing the sense 
begins in the database "data" file corresponding to the part of speech encoded in 
the <I>sense_key</I>.  <I>synset_offset</I> is an 8-digit, zero-filled decimal integer, 
and can be used with <A HREF="fseek.3.html"><B>fseek</B>(3)</A> to read a synset from the data file.  When 
<I>synset_offset</I> is passed, along with the syntactic category, to the WordNet library 
function <B>read_synset()</B>, a data structure containing the parsed synset is returned. <P>
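In Python the same seek-and-read step might look like the sketch below. <B>read_data_line()</B> is a hypothetical helper, not a WordNet library call: it returns the raw, unparsed synset line, and it relies on the convention from <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A> that each data file line begins with its own 8-digit byte offset, which it uses as a sanity check.

```python
def read_data_line(path: str, synset_offset: int) -> str:
    """Return the raw synset line that starts at byte offset synset_offset.

    Data file lines begin with their own 8-digit, zero-filled byte offset
    (see wndb(5WN)), so the first field is checked against the request.
    """
    with open(path, "rb") as f:
        f.seek(synset_offset)  # jump straight to the synset, like fseek(3)
        line = f.readline().rstrip(b"\r\n").decode("utf-8")
    if line.split(" ", 1)[0] != "%08d" % synset_offset:
        raise ValueError(f"no synset starts at byte offset {synset_offset}")
    return line
```

<P>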
 <I>sense_number</I> is a decimal integer giving the WordNet sense number of the word 
within the part of speech encoded in the <I>sense_key</I>.  See 
<A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A> for information about how sense numbers are assigned. <P>
 <I>tag_cnt</I> is a decimal integer giving the number of times the sense is tagged in various 
semantic concordance texts.  A <I>tag_cnt</I> of <B>0</B> indicates that the sense 
has not been semantically tagged.  
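<P>
The four fields can be split out as in the following Python sketch; <B>SenseIndexEntry</B>, <B>parse_sense_index_line()</B>, and <B>is_tagged()</B> are hypothetical names chosen for illustration. The numeric fields are converted to integers, so the zero-filled form of an offset can be recovered with <B>"%08d" % entry.synset_offset</B> when needed.

```python
from typing import NamedTuple


class SenseIndexEntry(NamedTuple):
    sense_key: str
    synset_offset: int  # byte offset into the data file for the sense's POS
    sense_number: int   # WordNet sense number within that part of speech
    tag_cnt: int        # 0 means the sense has not been semantically tagged


def parse_sense_index_line(line: str) -> SenseIndexEntry:
    """Split one sense index line into its four space-separated fields."""
    sense_key, offset, sense_number, tag_cnt = line.rstrip("\r\n").split(" ")
    return SenseIndexEntry(sense_key, int(offset), int(sense_number), int(tag_cnt))


def is_tagged(entry: SenseIndexEntry) -> bool:
    """True if the sense occurs at least once in the semantic concordances."""
    return entry.tag_cnt > 0
```

<P>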
<H3><A NAME="sect3" HREF="#toc3">Sense Key Encoding </A></H3>
A <I>sense_key </I> is represented 
as: <P>
  <blockquote><I>lemma</I><B>%</B><I>lex_sense</I>  </blockquote>
<P>
 where <I>lex_sense </I> is encoded as: <P>
  <blockquote><I>ss_type</I><B>:</B><I>lex_filenum</I><B>:</B><I>lex_id</I><B>:</B><I>head_word</I><B>:</B><I>head_id</I>  </blockquote>
<P>
 <I>lemma</I> is the ASCII text of the word or collocation as found in the 
WordNet database index file (<I>index.pos</I>) for the sense's part of speech.  <I>lemma</I> is in lower case, 
and collocations are formed by joining individual words with an underscore 
(<B>_</B>) character. <P>
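A minimal Python sketch of this normalization is shown below. The name <B>sense_key_lemma()</B> is hypothetical, and the function only lower-cases the text and joins whitespace-separated words with underscores; it does not check that the result actually appears in an index file.

```python
def sense_key_lemma(word: str) -> str:
    """Normalize a word or collocation to the lemma form used in sense keys.

    Lower-cases the text and joins whitespace-separated words with '_'.
    """
    return "_".join(word.lower().split())
```

<P>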
 <I>ss_type</I> is a one-digit decimal integer representing the 
synset type for the sense.  See <FONT SIZE=-1><B>Synset Type</B></FONT> 
below for a listing of the 
numbers corresponding to each synset type. <P>
 <I>lex_filenum</I> is a two-digit 
decimal integer identifying the lexicographer file containing 
the synset for the sense.  See <A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A> for the list of lexicographer 
file names and their corresponding numbers. <P>
 <I>lex_id</I> is a two-digit decimal 
integer that, when appended to <I>lemma</I>, uniquely identifies a sense within 
a lexicographer file.  <I>lex_id</I> numbers usually start with <B>00</B>, and are incremented 
as additional senses of the word are added to the same file, although 
there is no requirement that the numbers be consecutive or begin with 
<B>00</B>.  Note that a value of <B>00</B> is the default, and therefore is not present 
in lexicographer files.  Only non-default <I>lex_id</I> values must be explicitly 
assigned in lexicographer files.  See <A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A> for information on the 
format of lexicographer files. <P>
 <I>head_word </I> is only present if the sense 
is in an adjective satellite synset.  It is the lemma of the first word 
of the satellite's head synset. <P>
 <I>head_id</I> is a two-digit decimal integer 
that, when appended to <I>head_word</I>, uniquely identifies the sense of 
<I>head_word</I> within a lexicographer file, as described for <I>lex_id</I>.  There 
is a value in this field only if <I>head_word</I> is present.  
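<P>
The encoding can be reversed with a small parser. This Python sketch is illustrative only (<B>SenseKey</B> and <B>parse_sense_key()</B> are hypothetical names): it splits on the last <B>%</B> and then on <B>:</B>, and reports an empty <I>head_word</I> and a missing (<B>None</B>) <I>head_id</I> for non-satellite senses.

```python
from typing import NamedTuple, Optional


class SenseKey(NamedTuple):
    lemma: str
    ss_type: int
    lex_filenum: int
    lex_id: int
    head_word: str          # empty unless the sense is in a satellite synset
    head_id: Optional[int]  # None whenever head_word is empty


def parse_sense_key(sense_key: str) -> SenseKey:
    """Decode lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError(f"missing '%' in sense key {sense_key!r}")
    # Exactly five ':'-separated fields; the last two may be empty.
    # A different field count raises ValueError during unpacking.
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(
        lemma=lemma,
        ss_type=int(ss_type),
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,
        head_id=int(head_id) if head_id else None,
    )
```

<P>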
<H3><A NAME="sect4" HREF="#toc4">Synset Type </A></H3>
The 
synset type is encoded as follows: <P>
  <blockquote><B>1</B>&nbsp;&nbsp;&nbsp;NOUN <BR>
 <B>2</B>&nbsp;&nbsp;&nbsp;VERB <BR>
 <B>3</B>&nbsp;&nbsp;&nbsp;ADJECTIVE <BR>
 <B>4</B>&nbsp;&nbsp;&nbsp;ADVERB <BR>
 <B>5</B>&nbsp;&nbsp;&nbsp;ADJECTIVE SATELLITE <BR>
  </blockquote>
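<P>
When a sense key is used to locate its synset, the synset type also determines which data file the <I>synset_offset</I> refers to. The Python sketch below assumes the UNIX data file names described in <B>wndb</B>(5WN) (<B>data.noun</B>, <B>data.verb</B>, <B>data.adj</B>, <B>data.adv</B>), with adjective satellites stored alongside adjectives in <B>data.adj</B>; the names <B>SYNSET_TYPES</B> and <B>data_file_for()</B> are hypothetical.

```python
# Synset type codes from the table above, paired with the UNIX data file
# that holds synsets of that type (file names as described in wndb(5WN)).
SYNSET_TYPES = {
    1: ("NOUN", "data.noun"),
    2: ("VERB", "data.verb"),
    3: ("ADJECTIVE", "data.adj"),
    4: ("ADVERB", "data.adv"),
    5: ("ADJECTIVE SATELLITE", "data.adj"),  # satellites are stored with adjectives
}


def data_file_for(sense_key: str) -> str:
    """Name of the data file that a sense's synset_offset refers to."""
    lex_sense = sense_key.rpartition("%")[2]
    ss_type = int(lex_sense.split(":", 1)[0])
    return SYNSET_TYPES[ss_type][1]
```

<P>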
 
<H2><A NAME="sect5" HREF="#toc5">NOTES </A></H2>
For non-satellite senses the <I>head_word</I> 
and <I>head_id</I> fields have no values; however, the field separator character 
(<B>:</B>) is still present.   
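<P>
The following Python sketch assembles a sense key from its parts and shows the effect of this rule: a non-satellite key still ends in two <B>:</B> separators. <B>format_sense_key()</B> is a hypothetical helper; it zero-fills <I>lex_filenum</I>, <I>lex_id</I>, and <I>head_id</I> to two digits as described in <B>Sense Key Encoding</B>.

```python
from typing import Optional


def format_sense_key(lemma: str, ss_type: int, lex_filenum: int, lex_id: int,
                     head_word: str = "", head_id: Optional[int] = None) -> str:
    """Assemble lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    if head_word and head_id is None:
        raise ValueError("head_id is required when head_word is given")
    fields = [
        str(ss_type),
        f"{lex_filenum:02d}",
        f"{lex_id:02d}",
        head_word,
        f"{head_id:02d}" if head_word else "",  # empty for non-satellites
    ]
    # Joining all five fields keeps the ':' separators even when the
    # last two are empty, e.g. "...:00::" for a non-satellite sense.
    return lemma + "%" + ":".join(fields)
```

<P>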
<H2><A NAME="sect6" HREF="#toc6">ENVIRONMENT VARIABLES (UNIX) </A></H2>

<DL>

<DT><B>WNHOME</B>  </DT>
<DD>Base directory 
for WordNet.  Default is <B>/usr/local/WordNet-3.0</B>. </DD>

<DT><B>WNSEARCHDIR</B>  </DT>
<DD>Directory in 
which the WordNet database has been installed.  Default is <B>WNHOME/dict</B>. </DD>
</DL>
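<P>
A program that honors these variables might resolve the sense index path as in the Python sketch below. <B>sense_index_path()</B> is a hypothetical helper that only mirrors the defaults listed above; it does not consult the Windows registry.

```python
import os
from typing import Mapping

DEFAULT_WNHOME = "/usr/local/WordNet-3.0"  # default listed for WNHOME


def sense_index_path(environ: Mapping[str, str] = os.environ) -> str:
    """Locate index.sense from WNSEARCHDIR, else WNHOME/dict, else the default."""
    search_dir = environ.get("WNSEARCHDIR")
    if not search_dir:
        wnhome = environ.get("WNHOME") or DEFAULT_WNHOME
        search_dir = os.path.join(wnhome, "dict")
    return os.path.join(search_dir, "index.sense")
```

<P>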
 
<H2><A NAME="sect7" HREF="#toc7">REGISTRY (WINDOWS) </A></H2>

<DL>

<DT><B>HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome</B>  </DT>
<DD>Base directory 
for WordNet.  Default is <B>C:\Program&nbsp;Files\WordNet\3.0</B>. </DD>
</DL>
 
<H2><A NAME="sect8" HREF="#toc8">FILES </A></H2>

<DL>

<DT><B>index.sense</B>  </DT>
<DD>sense 
index </DD>
</DL>
 
<H2><A NAME="sect9" HREF="#toc9">SEE ALSO </A></H2>
<A HREF="binsrch.3WN.html"><B>binsrch</B>(3WN)</A>, <A HREF="wnsearch.3WN.html"><B>wnsearch</B>(3WN)</A>, 
<A HREF="lexnames.5WN.html"><B>lexnames</B>(5WN)</A>, <A HREF="wnintro.5WN.html"><B>wnintro</B>(5WN)</A>, 
<A HREF="sensemap.5WN.html"><B>sensemap</B>(5WN)</A>, <A HREF="wndb.5WN.html"><B>wndb</B>(5WN)</A>, 
<A HREF="wninput.5WN.html"><B>wninput</B>(5WN)</A>. <P>

<HR><P>
<A NAME="toc"><B>Table of Contents</B></A><P>
<UL>
<LI><A NAME="toc0" HREF="#sect0">NAME</A></LI>
<LI><A NAME="toc1" HREF="#sect1">DESCRIPTION</A></LI>
<UL>
<LI><A NAME="toc2" HREF="#sect2">File Format</A></LI>
<LI><A NAME="toc3" HREF="#sect3">Sense Key Encoding</A></LI>
<LI><A NAME="toc4" HREF="#sect4">Synset Type</A></LI>
</UL>
<LI><A NAME="toc5" HREF="#sect5">NOTES</A></LI>
<LI><A NAME="toc6" HREF="#sect6">ENVIRONMENT VARIABLES (UNIX)</A></LI>
<LI><A NAME="toc7" HREF="#sect7">REGISTRY (WINDOWS)</A></LI>
<LI><A NAME="toc8" HREF="#sect8">FILES</A></LI>
<LI><A NAME="toc9" HREF="#sect9">SEE ALSO</A></LI>
</UL>
</BODY></HTML>