<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.5: http://docutils.sourceforge.net/" />
<title>Xapian Synonym Support</title>
<style type="text/css">

/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 5196 2007-06-03 20:25:28Z wiemann $
:Copyright: This stylesheet has been placed in the public domain.

Default cascading style sheet for the HTML output of Docutils.

See http://docutils.sf.net/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/

/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
  border: 0 }

table.borderless td, table.borderless th {
  /* Override padding for "table.docutils td" with "! important".
     The right padding separates the table cells. */
  padding: 0 0.5em 0 0 ! important }

.first {
  /* Override more specific margin styles with "! important". */
  margin-top: 0 ! important }

.last, .with-subtitle {
  margin-bottom: 0 ! important }

.hidden {
  display: none }

a.toc-backref {
  text-decoration: none ;
  color: black }

blockquote.epigraph {
  margin: 2em 5em ; }

dl.docutils dd {
  margin-bottom: 0.5em }

/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
  font-weight: bold }
*/

div.abstract {
  margin: 2em 5em }

div.abstract p.topic-title {
  font-weight: bold ;
  text-align: center }

div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
  margin: 2em ;
  border: medium outset ;
  padding: 1em }

div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
  font-weight: bold ;
  font-family: sans-serif }

div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title {
  color: red ;
  font-weight: bold ;
  font-family: sans-serif }

/* Uncomment (and remove this text!) to get reduced vertical space in
   compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
  margin-bottom: 0.5em }

div.compound .compound-last, div.compound .compound-middle {
  margin-top: 0.5em }
*/

div.dedication {
  margin: 2em 5em ;
  text-align: center ;
  font-style: italic }

div.dedication p.topic-title {
  font-weight: bold ;
  font-style: normal }

div.figure {
  margin-left: 2em ;
  margin-right: 2em }

div.footer, div.header {
  clear: both;
  font-size: smaller }

div.line-block {
  display: block ;
  margin-top: 1em ;
  margin-bottom: 1em }

div.line-block div.line-block {
  margin-top: 0 ;
  margin-bottom: 0 ;
  margin-left: 1.5em }

div.sidebar {
  margin: 0 0 0.5em 1em ;
  border: medium outset ;
  padding: 1em ;
  background-color: #ffffee ;
  width: 40% ;
  float: right ;
  clear: right }

div.sidebar p.rubric {
  font-family: sans-serif ;
  font-size: medium }

div.system-messages {
  margin: 5em }

div.system-messages h1 {
  color: red }

div.system-message {
  border: medium outset ;
  padding: 1em }

div.system-message p.system-message-title {
  color: red ;
  font-weight: bold }

div.topic {
  margin: 2em }

h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
  margin-top: 0.4em }

h1.title {
  text-align: center }

h2.subtitle {
  text-align: center }

hr.docutils {
  width: 75% }

img.align-left {
  clear: left }

img.align-right {
  clear: right }

ol.simple, ul.simple {
  margin-bottom: 1em }

ol.arabic {
  list-style: decimal }

ol.loweralpha {
  list-style: lower-alpha }

ol.upperalpha {
  list-style: upper-alpha }

ol.lowerroman {
  list-style: lower-roman }

ol.upperroman {
  list-style: upper-roman }

p.attribution {
  text-align: right ;
  margin-left: 50% }

p.caption {
  font-style: italic }

p.credits {
  font-style: italic ;
  font-size: smaller }

p.label {
  white-space: nowrap }

p.rubric {
  font-weight: bold ;
  font-size: larger ;
  color: maroon ;
  text-align: center }

p.sidebar-title {
  font-family: sans-serif ;
  font-weight: bold ;
  font-size: larger }

p.sidebar-subtitle {
  font-family: sans-serif ;
  font-weight: bold }

p.topic-title {
  font-weight: bold }

pre.address {
  margin-bottom: 0 ;
  margin-top: 0 ;
  font-family: serif ;
  font-size: 100% }

pre.literal-block, pre.doctest-block {
  margin-left: 2em ;
  margin-right: 2em }

span.classifier {
  font-family: sans-serif ;
  font-style: oblique }

span.classifier-delimiter {
  font-family: sans-serif ;
  font-weight: bold }

span.interpreted {
  font-family: sans-serif }

span.option {
  white-space: nowrap }

span.pre {
  white-space: pre }

span.problematic {
  color: red }

span.section-subtitle {
  /* font-size relative to parent (h1..h6 element) */
  font-size: 80% }

table.citation {
  border-left: solid 1px gray;
  margin-left: 1px }

table.docinfo {
  margin: 2em 4em }

table.docutils {
  margin-top: 0.5em ;
  margin-bottom: 0.5em }

table.footnote {
  border-left: solid 1px black;
  margin-left: 1px }

table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
  padding-left: 0.5em ;
  padding-right: 0.5em ;
  vertical-align: top }

table.docutils th.field-name, table.docinfo th.docinfo-name {
  font-weight: bold ;
  text-align: left ;
  white-space: nowrap ;
  padding-left: 0 }

h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
  font-size: 100% }

ul.auto-toc {
  list-style-type: none }

</style>
</head>
<body>
<div class="document" id="xapian-synonym-support">
<h1 class="title">Xapian Synonym Support</h1>

<!-- Copyright (C) 2007 Olly Betts -->
<div class="contents topic" id="table-of-contents">
<p class="topic-title first">Table of contents</p>
<ul class="simple">
<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
<li><a class="reference internal" href="#model" id="id2">Model</a></li>
<li><a class="reference internal" href="#queryparser-integration" id="id3">QueryParser Integration</a></li>
<li><a class="reference internal" href="#current-limitations" id="id4">Current Limitations</a><ul>
<li><a class="reference internal" href="#explicit-multi-word-synonyms" id="id5">Explicit multi-word synonyms</a></li>
<li><a class="reference internal" href="#backend-support" id="id6">Backend Support</a></li>
</ul>
</li>
</ul>
</div>
<div class="section" id="introduction">
<h1><a class="toc-backref" href="#id1">Introduction</a></h1>
<p>Xapian provides support for storing a synonym dictionary, or thesaurus.  This
can be used by the Xapian::QueryParser class to expand terms in user query
strings, either automatically, or when requested by the user with an explicit
synonym operator (<tt class="docutils literal"><span class="pre">~</span></tt>).</p>
<p>Note that Xapian doesn't generate the synonym dictionary automatically; you
need to supply the entries yourself.</p>
</div>
<div class="section" id="model">
<h1><a class="toc-backref" href="#id2">Model</a></h1>
<p>The model for the synonym dictionary is that a term or group of consecutive
terms can have one or more synonym terms.  A group of consecutive terms is
specified in the dictionary by simply joining them with a single space between
each one.</p>
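<p>As a rough illustration of this model, the following plain-Python sketch keeps
the dictionary in an ordinary <tt class="docutils literal">dict</tt>.  It does not use Xapian and says
nothing about how Xapian stores synonyms; the names <tt class="docutils literal">record_synonym</tt> and
<tt class="docutils literal">synonyms</tt> and the sample entries are hypothetical.</p>

```python
# Plain-Python model of the synonym dictionary described above.
# This is an illustration only, not Xapian's API or storage format.
synonyms = {}


def record_synonym(terms, synonym):
    """Record `synonym` for one term (str) or a group of consecutive terms."""
    # A group of consecutive terms is keyed by joining the terms with
    # a single space between each one.
    if isinstance(terms, (list, tuple)):
        key = " ".join(terms)
    else:
        key = terms
    entries = synonyms.setdefault(key, [])
    if synonym not in entries:
        entries.append(synonym)


# Hypothetical sample entries.
record_synonym("truck", "lorry")
record_synonym("truck", "van")
record_synonym(["stock", "market"], "bourse")
```

<p>After running this, the single term <tt class="docutils literal">truck</tt> has two synonyms, and the
group is stored under the key <tt class="docutils literal">stock market</tt>.</p>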
</div>
<div class="section" id="queryparser-integration">
<h1><a class="toc-backref" href="#id3">QueryParser Integration</a></h1>
<p>For any of the QueryParser's synonym features to work, you must call
<tt class="docutils literal"><span class="pre">QueryParser::set_database()</span></tt> to specify the database whose synonym
dictionary should be consulted.</p>
<p>If <tt class="docutils literal"><span class="pre">FLAG_SYNONYM</span></tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will recognise <tt class="docutils literal"><span class="pre">~</span></tt> in front of a term as indicating a request for
synonym expansion.  If <tt class="docutils literal"><span class="pre">FLAG_LOVEHATE</span></tt> is also specified, you can use <tt class="docutils literal"><span class="pre">+</span></tt>
and <tt class="docutils literal"><span class="pre">-</span></tt> before the <tt class="docutils literal"><span class="pre">~</span></tt> to indicate that you love or hate the
synonym-expanded expression.</p>
<p>A synonym-expanded term becomes the term itself OR-ed with any listed synonyms,
so <tt class="docutils literal"><span class="pre">~truck</span></tt> might expand to <tt class="docutils literal"><span class="pre">truck</span> <span class="pre">OR</span> <span class="pre">lorry</span> <span class="pre">OR</span> <span class="pre">van</span></tt>.  A group of terms is
handled in much the same way.</p>
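<p>The expansion of a single term can be modelled in a few lines of plain Python.
This is only a sketch of the behaviour described above, producing a query
string rather than a real query object; <tt class="docutils literal">expand</tt>, <tt class="docutils literal">expand_word</tt> and the
sample entries are hypothetical and are not part of Xapian.</p>

```python
# Illustrative entries only; a real dictionary is stored in the database.
SYNONYMS = {"truck": ["lorry", "van"]}


def expand(term, synonyms=SYNONYMS):
    """Return the term OR-ed with any listed synonyms, as a query string."""
    return " OR ".join([term] + synonyms.get(term, []))


def expand_word(word, synonyms=SYNONYMS):
    """Expand only words that carry the explicit ~ operator."""
    if word.startswith("~"):
        return expand(word[1:], synonyms)
    return word


print(expand_word("~truck"))  # prints: truck OR lorry OR van
print(expand_word("truck"))   # prints: truck
```

<p>A term with no listed synonyms is returned unchanged, since there is nothing
to OR it with.</p>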
<p>If the QueryParser will stem a term that is to be synonym-expanded, it checks
for synonyms of the unstemmed form first, and then of the stemmed form, so you
can provide different synonyms for particular unstemmed forms if you want to.</p>
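<p>The lookup order can be sketched as follows in plain Python.  This assumes
that the stemmed form is only consulted when the unstemmed form has no
synonyms, which is one reading of the description above.  The stemmer is a toy
stand-in, and <tt class="docutils literal">toy_stem</tt>, <tt class="docutils literal">lookup</tt> and the entries are hypothetical.</p>

```python
def toy_stem(word):
    """Stand-in stemmer that strips a trailing 's'; not a real stemmer."""
    return word[:-1] if word.endswith("s") else word


def lookup(word, synonyms, stem=toy_stem):
    """Return synonyms for the unstemmed form, else for the stemmed form."""
    for key in (word, stem(word)):
        found = synonyms.get(key)
        if found:
            return found
    return []


# Hypothetical entries: the unstemmed form "arms" gets its own synonyms.
entries = {"arms": ["weapons"], "arm": ["limb"]}
print(lookup("arms", entries))            # prints: ['weapons']
print(lookup("arms", {"arm": ["limb"]}))  # prints: ['limb']
```

<p>In the second call there is no entry for the unstemmed form, so the entry for
the stemmed form is used instead.</p>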
<p>If <tt class="docutils literal"><span class="pre">FLAG_AUTO_SYNONYMS</span></tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt> then the
QueryParser will automatically expand any term which has synonyms, unless the
term is in a phrase or similar.</p>
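<p>A minimal plain-Python sketch of automatic expansion is shown below.  It
assumes a query already split into bare terms (strings) and phrases (tuples);
that representation, the name <tt class="docutils literal">auto_expand</tt> and the sample entries are
assumptions made for the illustration, not how the QueryParser works
internally.</p>

```python
# Illustrative entries only.
SYNONYMS = {"truck": ["lorry", "van"]}


def auto_expand(items, synonyms=SYNONYMS):
    """Expand bare terms that have synonyms; leave phrases untouched.

    `items` mixes bare terms (str) and phrases (tuple of str).
    """
    out = []
    for item in items:
        if isinstance(item, tuple):
            # Terms inside a phrase are not expanded.
            out.append('"' + " ".join(item) + '"')
        elif item in synonyms:
            out.append("(" + " OR ".join([item] + synonyms[item]) + ")")
        else:
            out.append(item)
    return out


print(auto_expand(["red", "truck", ("truck", "stop")]))
```

<p>Here the bare term <tt class="docutils literal">truck</tt> is expanded, while the same word inside the
phrase is left alone.</p>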
<p>If <tt class="docutils literal"><span class="pre">FLAG_AUTO_MULTIWORD_SYNONYMS</span></tt> is passed to <tt class="docutils literal"><span class="pre">QueryParser::parse_query()</span></tt>
then the QueryParser will look at groups of terms separated only by whitespace
and try to expand them as term groups.  This is done in a &quot;greedy&quot; fashion, so
the first term which can start a group is expanded first, and the longest group
starting with that term is expanded.  After expansion, the QueryParser will
look for further possible expansions starting with the term after the last
term in the expanded group.</p>
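<p>The greedy matching described above can be sketched in plain Python as
follows.  This is a model of the described behaviour rather than Xapian's
implementation: it only considers groups of two or more terms, and
<tt class="docutils literal">greedy_expand</tt> and the sample entries are hypothetical.</p>

```python
def greedy_expand(terms, synonyms):
    """Return a list of (group, synonyms) pairs using greedy group matching."""
    out = []
    i = 0
    while i != len(terms):
        # Try the longest group starting at position i first, then shorter
        # ones, down to a group of two terms.
        for end in range(len(terms), i + 1, -1):
            key = " ".join(terms[i:end])
            if key in synonyms:
                out.append((key, synonyms[key]))
                i = end  # resume with the term after the expanded group
                break
        else:
            # No group starts here, so keep the term as it is.
            out.append((terms[i], []))
            i += 1
    return out


# Hypothetical entries with overlapping groups.
entries = {
    "stock market": ["bourse"],
    "stock market crash": ["meltdown"],
    "market crash": ["slump"],
}
result = greedy_expand(["the", "stock", "market", "crash", "today"], entries)
print(result)
```

<p>With these entries the longest group starting at <tt class="docutils literal">stock</tt> wins, so
<tt class="docutils literal">stock market crash</tt> is expanded and the overlapping <tt class="docutils literal">market crash</tt>
entry is never considered.</p>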
</div>
<div class="section" id="current-limitations">
<h1><a class="toc-backref" href="#id4">Current Limitations</a></h1>
<div class="section" id="explicit-multi-word-synonyms">
<h2><a class="toc-backref" href="#id5">Explicit multi-word synonyms</a></h2>
<p>There ought to be a way to explicitly request expansion of multi-term synonyms,
probably with the syntax <tt class="docutils literal"><span class="pre">~&quot;stock</span> <span class="pre">market&quot;</span></tt>, but this hasn't been
implemented yet.</p>
</div>
<div class="section" id="backend-support">
<h2><a class="toc-backref" href="#id6">Backend Support</a></h2>
<p>Currently synonyms are only supported by flint databases.  They work with a
single database or multiple databases (use Database::add_database() as usual).
We've no plans to support them for the deprecated Quartz backend, nor for
InMemory, but we do intend to support them for the remote backend in the
future.</p>
</div>
</div>
</div>
</body>
</html>