apache-mod_estraier-0.3.2-8mdv2010.0.x86_64.rpm

mod_estraier

== Abstract

mod_estraier is an Apache module that registers the web pages Apache
processes and searches them using the node API of Hyper Estraier.
Its main purpose is indexing and searching documents that pass through
a proxy, or dynamic content such as a wiki or BBS.

See the following URL for Hyper Estraier:

http://hyperestraier.sourceforge.net/

The mod_estraier project page on sf.net is:

http://sourceforge.net/projects/modestraier/

mod_estraier is distributed under GPL2.

== Environment

mod_estraier has been tested in the following environments:

* Linux 2.6.11-mm4
* gcc 3.4.3, 4.0.1
* Apache 2.0.52, 2.0.54, 2.1.7
* Hyper Estraier 1.0.2
* QDBM 1.8.33

== How to run

=== Compile

Extract the mod_estraier archive and run:

 $ ./configure
 $ make
 # make install

You can use the --with-tidy configure option to set the location of
tidy, and the --with-apxs configure option to set the location of
apxs.

=== Configuration

Next, configure Apache. For example, to use mod_estraier with a
forward proxy, add something like the following to httpd.conf:

 LoadModule estraier_module modules/mod_estraier.so

 ProxyRequests On
 <Proxy *>
  Order deny,allow
  Deny from all
  Allow from 127.0.0.1
  SetOutputFilter estraier
  EstraierNode http://localhost:1978/node/test
  EstraierUser admin
  EstraierPass admin
  EstraierDenyURI http://[a-z]*.?google.co
  EstraierAllowURI http://labs.google.com/
  EstraierDenyURI favicon.ico
  EstraierUseWeight On
  EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
  EstraierFilterCommand ^application/msword H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
  EstraierFilterCommand ^application/vnd.ms-(excel|powerpoint) H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
 </Proxy>

Or, to use it with a reverse proxy, add something like the following
to httpd.conf:

 <Location /my_web/>
  SetOutputFilter estraier
  EstraierNode http://localhost:1978/node/test
  EstraierUser admin
  EstraierPass admin
  EstraierDenyURI favicon.ico
  EstraierUseWeight On
 </Location>

The details of each option are described in the Options section below.
Other samples are in doc/recipes.conf in the package.

=== Restarting Apache

Then restart Apache. Depending on your system, run one of the
following:

* > /etc/init.d/httpd restart
* > /etc/init.d/apache2 restart

=== Running estmaster

If you have no DB yet, run

 estmaster init casket

to initialize the DB. After the initialization, run

 estmaster start casket

to start estmaster. The configuration above needs a node named "test";
create it through the master UI at
http://localhost:1978/master_ui

=== Run

Set the proxy host of your browser to "localhost" and the port to
"80". As you browse web sites as usual, the pages are registered and
your DB grows.

You can search the DB with any node API client. For example, you can
use the search interface of the node master at the following URL:

http://localhost:1978/node/test/search_ui
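Besides the browser interface, a small script can act as a node API
client. The following Python sketch only builds the request; it assumes
that the node accepts a "search" command at <node URL>/search with
"phrase" and "max" form parameters and HTTP Basic authentication, as
the Hyper Estraier node API guide describes. The function name
build_search_request is made up for this example.

```python
import base64
import urllib.parse
import urllib.request


def build_search_request(node_url, user, password, phrase, max_hits=10):
    """Build (but do not send) a POST request for the node's search command.

    Assumption: the node API exposes "<node_url>/search" and reads the
    form parameters "phrase" and "max".
    """
    body = urllib.parse.urlencode({"phrase": phrase, "max": max_hits})
    request = urllib.request.Request(
        node_url.rstrip("/") + "/search",
        data=body.encode("ascii"),  # supplying data makes this a POST
    )
    # HTTP Basic authentication with the node user and password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request
```

To run the search, pass the returned object to
urllib.request.urlopen(request, timeout=5) while estmaster is running,
and read the response body.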

== Options

You can use the following options in httpd.conf to configure
mod_estraier.

=== EstraierNode

The EstraierNode directive specifies the node-server and the node of
Hyper Estraier. If you don't set it, mod_estraier will not work.

=== EstraierUser, EstraierPass

Specifies the username and the password of node-server. If you don't
set them, mod_estraier will not work.

=== EstraierProxyHost, EstraierProxyPort

Specifies the HTTP proxy used when mod_estraier accesses the
node-server. If you don't set them, mod_estraier uses no proxy.

=== EstraierTimeout

Specifies the timeout for accessing the node-server. The default value
is 5 seconds.

=== EstraierDenyURI

Specifies the URIs that mod_estraier does not register. For example, to
keep mod_estraier from registering Google pages, specify:

 EstraierDenyURI http://[a-z]*.?google.co

You can use this option several times.

=== EstraierAllowURI

Specifies the URIs that mod_estraier registers. For example, to make
mod_estraier register only Google pages, specify:

 EstraierDenyURI .*
 EstraierAllowURI http://[a-z]*.?google.co

You can use this option several times.

When a URI matches both EstraierDenyURI and EstraierAllowURI
directives, the one that appears later takes effect.
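One reading of the rule above is that directives are checked in
configuration order and the last one that matches decides. The Python
sketch below models that reading; it is not the module's code. It
assumes unanchored regular-expression matching and that a URI matching
no directive is registered. The names should_register and RULES are
hypothetical; the patterns are taken from the proxy example earlier in
this document.

```python
import re


def should_register(uri, rules, default=True):
    """Return True if the URI would be registered.

    rules is a list of (action, pattern) pairs in configuration order,
    where action is "deny" or "allow". The last matching rule wins.
    Assumption: a URI that matches no rule is registered (default=True).
    """
    decision = default
    for action, pattern in rules:
        if re.search(pattern, uri):  # unanchored match (assumption)
            decision = (action == "allow")
    return decision


# Same order as in the <Proxy *> example.
RULES = [
    ("deny", r"http://[a-z]*.?google.co"),
    ("allow", r"http://labs.google.com/"),
    ("deny", r"favicon.ico"),
]
```

Under these assumptions, http://www.google.com/ is denied,
http://labs.google.com/ is allowed again by the later directive, and
any favicon.ico is denied by the last one.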

=== EstraierLanguage

Specifies the language that Hyper Estraier uses when registering
documents. For Japanese, specify:

 EstraierLanguage ja

You can choose en, ja, zh, ko, or misc. The default is en.

=== EstraierDetachThread

If you set this directive to On, mod_estraier registers documents in a
detached thread. Responses may be faster with this option. The default
value is Off.

=== EstraierDenyRequestHeader

Specifies a request header and value for which the page should not be
registered. For example, with the setting

 EstraierDenyRequestHeader Authorization .*

mod_estraier does not register pages requested with an Authorization
header. Both the header name and the header value are given as regular
expressions.

=== EstraierDenyResponseHeader

Specifies a response header and value for which the page should not be
registered.

=== EstraierUseWeight

If you set this directive to On, mod_estraier adds score weight info
to URLs viewed more than once. The default value is Off.

=== EstraierFilterCommand

Specifies the filter command for a given Content-Type. For example,
with the setting

 EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml

mod_estraier uses estfxpdftohtml to convert PDF to HTML. Use the H@
prefix when the filter outputs HTML, T@ when it outputs plain text, and
no prefix when it outputs Hyper Estraier's document draft format.
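The prefix convention can be illustrated with a short Python sketch.
It is only a model of the description above, not the module's code:
the helper names parse_filter_spec and pick_filter are hypothetical,
and choosing the first matching pattern is an assumption.

```python
import re


def parse_filter_spec(spec):
    """Split a filter spec into (output_kind, command).

    "H@" means the filter outputs HTML, "T@" means plain text, and no
    prefix means Hyper Estraier's document draft.
    """
    if spec.startswith("H@"):
        return "html", spec[2:]
    if spec.startswith("T@"):
        return "text", spec[2:]
    return "draft", spec


def pick_filter(content_type, filters):
    """Return the parsed spec of the first pattern matching content_type.

    filters is a list of (content_type_regex, spec) pairs in
    configuration order. Returns None when nothing matches.
    Assumption: the first matching pattern is used.
    """
    for pattern, spec in filters:
        if re.search(pattern, content_type):
            return parse_filter_spec(spec)
    return None


FILTERS = [
    (r"^application/pdf",
     "H@/usr/local/share/hyperestraier/filter/estfxpdftohtml"),
    (r"^application/vnd.ms-(excel|powerpoint)",
     "H@/usr/local/share/hyperestraier/filter/estfxmsotohtml"),
]
```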

=== EstraierFilterTmpdir

Specifies the temporary directory for the above filter commands. A
typical example is the following:

 EstraierFilterTmpdir /tmp

=== EstraierDocumentSizeLimit

Specifies the maximum size of a document that mod_estraier processes.
The default value is 10000000, that is, 10MB.

== mod_estraier_search

mod_estraier_search is an additional module with which you can search
the Hyper Estraier DB.

A configuration example follows:

 LoadModule estraier_search_module modules/mod_estraier_search.so

 <Location /moe/>
  SetHandler estraier_search
  EstsearchNode http://localhost:1978/node/test
  EstsearchUser admin
  EstsearchPass admin
  EstsearchTimeout 5
  EstsearchNodeDepth 0
  EstsearchTemplateHead /home/i/wrk/mod_estraier/tmpl/estseek.head
  EstsearchTemplateFoot /home/i/wrk/mod_estraier/tmpl/estseek.foot
 </Location>

Change the EstsearchTemplate* paths to the actual location of the
template files.

With this configuration, you can use the search engine by accessing
http://localhost/moe/

This search engine accepts a Google-like syntax for search words:

* "OR" or "|" between words performs an OR search.
* A word starting with "-" is excluded from the results.
* link:URI selects the documents that include a link to URI.
* site:URI selects the documents whose URI includes the specified URI.
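To make the syntax concrete, the following Python sketch splits a
query string into its parts. It is a hypothetical parser written for
illustration, not the one mod_estraier_search uses; in particular it
assumes that tokens are separated by whitespace and that "OR" and "|"
stand alone between words.

```python
def parse_query(query):
    """Classify the tokens of a Google-like query (illustrative only).

    Returns a dict with:
      include: list of OR-groups, each a list of alternative words
      exclude: words prefixed with "-"
      link:    URIs given as link:URI
      site:    URIs given as site:URI
    """
    result = {"include": [], "exclude": [], "link": [], "site": []}
    join_next = False  # True right after "OR" or "|"
    for token in query.split():
        if token in ("OR", "|"):
            # Only join if there is a previous word to join with.
            join_next = bool(result["include"])
            continue
        if token.startswith("link:") and len(token) > 5:
            result["link"].append(token[5:])
        elif token.startswith("site:") and len(token) > 5:
            result["site"].append(token[5:])
        elif token.startswith("-") and len(token) > 1:
            result["exclude"].append(token[1:])
        elif join_next:
            result["include"][-1].append(token)
        else:
            result["include"].append([token])
        join_next = False
    return result
```

For example, the query "apache OR nginx -iis site:example.org" yields
one OR-group with apache and nginx, the excluded word iis, and the
site restriction example.org.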

== mod_estraier_cache

mod_estraier_cache is an experimental module. It generates documents
from the DB of Hyper Estraier.

A configuration example follows:

 LoadModule estraier_cache_module modules/mod_estraier_cache.so
 
 Listen *:8081
 <VirtualHost *:8081>
   SetHandler estraier_cache
   EstcacheNode http://localhost:1978/node/web
   EstcacheUser admin
   EstcachePass admin
 </VirtualHost>

Then set your browser's proxy to port 8081 and browse.