mod_estraier

== Abstract

mod_estraier is an Apache module that registers the web pages processed by Apache with Hyper Estraier through its node API, so that they can be searched later. Its main purpose is to index and search documents that pass through a proxy, as well as dynamic content such as wikis and BBSes.

See the following URL for Hyper Estraier:

  http://hyperestraier.sourceforge.net/

The mod_estraier project page on sf.net is:

  http://sourceforge.net/projects/modestraier/

mod_estraier is distributed under GPL2.

== Environment

mod_estraier has been tested in the following environment:

* linux 2.6.11-mm4
* gcc 3.4.3, 4.0.1
* Apache 2.0.52, 2.0.54, 2.1.7
* Hyper Estraier 1.0.2
* QDBM 1.8.33

== How to run

=== Compile

Extract the mod_estraier archive and run:

  $ ./configure
  $ make
  # make install

Use the --with-tidy configure option to set the location of tidy, and the --with-apxs configure option to set the location of apxs.

=== Configuration

Next, configure Apache.
For example, to use mod_estraier as a proxy, add something like the following to httpd.conf:

  LoadModule estraier_module modules/mod_estraier.so
  ProxyRequests On
  <Proxy *>
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
    SetOutputFilter estraier
    EstraierNode http://localhost:1978/node/test
    EstraierUser admin
    EstraierPass admin
    EstraierDenyURI http://[a-z]*.?google.co
    EstraierAllowURI http://labs.google.com/
    EstraierDenyURI favicon.ico
    EstraierUseWeight On
    EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml
    EstraierFilterCommand ^application/msword H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
    EstraierFilterCommand ^application/vnd.ms-(excel|powerpoint) H@/usr/local/share/hyperestraier/filter/estfxmsotohtml
  </Proxy>

Or, to use it as a reverse proxy, add something like the following to httpd.conf:

  <Location /my_web/>
    SetOutputFilter estraier
    EstraierNode http://localhost:1978/node/test
    EstraierUser admin
    EstraierPass admin
    EstraierDenyURI favicon.ico
    EstraierUseWeight On
  </Location>

Each option is described in the Options section below. More samples are available in doc/recipes.conf in the package.

=== Restarting of Apache

Then restart Apache. Depending on your system, use one of the following:

* > /etc/init.d/httpd restart
* > /etc/init.d/apache2 restart

=== Execution of estmaster

If you have no DB yet, run

  estmaster init casket

to initialize it. After the initialization, run

  estmaster start casket

to start estmaster. The configuration above requires a node named "test". Create it with the master UI at:

  http://localhost:1978/master_ui

=== Run

Set the proxy host of your browser to "localhost" and the port to "80". As you browse web sites as usual, your DB grows. You can search the DB with any node API client.
For example, you can use the search interface of the node master at the following URL:

  http://localhost:1978/node/test/search_ui

== Options

The following directives can be used in httpd.conf to configure mod_estraier.

=== EstraierNode

Specifies the node server and the node of Hyper Estraier. If it is not set, mod_estraier does not work.

=== EstraierUser, EstraierPass

Specify the username and the password for the node server. If they are not set, mod_estraier does not work.

=== EstraierProxyHost, EstraierProxyPort

Specify the HTTP proxy that mod_estraier uses to access the node server. If they are not set, mod_estraier uses no proxy.

=== EstraierTimeout

Specifies the timeout for accessing the node server. The default value is 5 seconds.

=== EstraierDenyURI

Specifies a URI pattern that mod_estraier does not register. For example, to keep mod_estraier from registering google, specify:

  EstraierDenyURI http://[a-z]*.?google.co

This option can be used several times.

=== EstraierAllowURI

Specifies a URI pattern that mod_estraier registers. For example, to make mod_estraier register only google, specify:

  EstraierDenyURI .*
  EstraierAllowURI http://[a-z]*.?google.co

This option can be used several times. When EstraierDenyURI and EstraierAllowURI are both used, the directive that appears later takes effect.

=== EstraierLanguage

Specifies the language that Hyper Estraier uses when registering documents. For example, for Japanese specify:

  EstraierLanguage ja

The available values are en, ja, zh, ko, and misc. The default is en.

=== EstraierDetachThread

If this directive is set to On, mod_estraier registers documents in a detached thread, which may improve response time. The default value is Off.

=== EstraierDenyRequestHeader

Specifies a request header and value for which the document is not registered. For example, with the setting

  EstraierDenyRequestHeader Authorization .*

mod_estraier does not register pages that were requested with authorization.
The header name and the header value are both given as regular expressions.

=== EstraierDenyResponseHeader

Specifies a response header and value for which the document is not registered.

=== EstraierUseWeight

If this directive is set to On, mod_estraier adds score weight information to URLs that are viewed more than once. The default value is Off.

=== EstraierFilterCommand

Specifies the filter command for a given Content-Type. For example, with the setting

  EstraierFilterCommand ^application/pdf H@/usr/local/share/hyperestraier/filter/estfxpdftohtml

mod_estraier uses estfxpdftohtml to convert PDF to HTML. Use the H@ prefix when the filtered output is HTML, the T@ prefix when it is plain text, and no prefix when it is a Hyper Estraier document draft.

=== EstraierFilterTmpdir

Specifies the temporary directory for the filter commands above. A typical example is:

  EstraierFilterTmpdir /tmp

=== EstraierDocumentSizeLimit

Specifies the maximum size of a document that mod_estraier processes. The default value is 10000000, that is, 10MB.

== mod_estraier_search

mod_estraier_search is an additional module for searching the Hyper Estraier DB. A configuration example:

  LoadModule estraier_search_module modules/mod_estraier_search.so
  <Location /moe/>
    SetHandler estraier_search
    EstsearchNode http://localhost:1978/node/test
    EstsearchUser admin
    EstsearchPass admin
    EstsearchTimeout 5
    EstsearchNodeDepth 0
    EstsearchTemplateHead /home/i/wrk/mod_estraier/tmpl/estseek.head
    EstsearchTemplateFoot /home/i/wrk/mod_estraier/tmpl/estseek.foot
  </Location>

Change the EstsearchTemplate* paths to the actual location of the template files. With this configuration, the search engine is available at:

  http://localhost/moe/

The search engine accepts a google-like syntax for search words. Use "OR" or "|" for an or-search. Prefix a search word with "-" to exclude it from the results. Use the link:URI syntax to select documents that contain a link to URI.
Use the site:URI syntax to select documents whose URI includes the specified URI.

== mod_estraier_cache

mod_estraier_cache is an experimental module that generates documents from the Hyper Estraier DB. A configuration example:

  LoadModule estraier_cache_module modules/mod_estraier_cache.so
  Listen *:8081
  <VirtualHost *:8081>
    SetHandler estraier_cache
    EstcacheNode http://localhost:1978/node/web
    EstcacheUser admin
    EstcachePass admin
  </VirtualHost>

Then set the proxy of your browser to this virtual host (port 8081 in the example above) and browse.
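
Instead of a browser, a small script can also send requests through the cache virtual host. The following is only a minimal sketch using the Python standard library. It assumes that Apache runs on localhost with the port 8081 configuration shown above; the function names build_cache_opener and fetch_via_cache are hypothetical helpers for this illustration and are not part of mod_estraier.

```python
import urllib.request


def build_cache_opener(proxy_host="localhost", proxy_port=8081):
    """Return an opener that sends HTTP requests through the given proxy.

    The defaults are assumptions taken from the sample configuration
    (Listen *:8081 on the local machine).
    """
    proxy_url = "http://%s:%d" % (proxy_host, proxy_port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_cache(opener, url, timeout=5):
    """Fetch url through the opener and return the raw response body."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Example (needs a running Apache with mod_estraier_cache enabled):
#   opener = build_cache_opener()
#   body = fetch_via_cache(opener, "http://example.com/")
```

Whether a page is returned depends on the document having been registered in the node beforehand, for example by browsing through mod_estraier.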