Make your own Google
http://combine.it.lth.se/SearchEngineBox/
SearchEngine in a Box using Combine/Zebra
SearchEngine in a Box grew out of development in the EU project ALVIS
(IST-1-002068-STP), with the help of .SE's Internetfond. It is based on two
systems: the Combine Focused Crawler and the Zebra text indexing and retrieval
engine. It lets you build a vertical search engine for your favorite topic in
just 5 easy steps. Before that, you have to install the system on your machine
(or you can try it out online before installing).
Installation and testing instructions
Edit /etc/apt/sources.list and add
deb http://combine.it.lth.se/ debian/
deb http://ftp.indexdata.dk/debian sarge main
deb-src http://ftp.indexdata.dk/debian sarge main
Get the crawler, indexer and XSLT tools. Run:
sudo apt-get update
sudo apt-get install combine idzebra2.0 yaz xsltproc
Make sure you have combine version 3.4 or better.
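The version requirement above can be checked from the shell. What follows is a minimal sketch, assuming GNU sort (for its -V version ordering) is available. The function name version_ge is made up for this example, and on a Debian system dpkg --compare-versions may be the more natural tool.

```shell
# version_ge A B: succeed if version string A is at least B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    # The smaller of the two versions sorts first; A >= B exactly when that is B.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example use (assumes the package is named "combine", as in the apt-get line):
#   installed=$(dpkg-query -W -f='${Version}' combine)
#   version_ge "$installed" 3.4 || echo "combine is older than 3.4" >&2
```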
Download the 'SearchEngine in a Box' system, unpack it, and change to the
directory where the software was unpacked. Run:
tar zxf SEbox.tgz
cd SearchEngineBox
Initialize the crawler for a simple test. Run:
sudo combineINIT --jobname atest
combineCtrl --jobname atest load < seeds.txt
Change to the Zebra configuration directory:
cd ZebraConf
make Combine
Tell Zebra where it should run. Edit ZebraConf.xml and change
<host>ldbkit06</host>
<port>3003</port>
to the host you are running on and your preferred port.
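If you prefer to script this edit rather than open an editor, something like the following may work. It is only a sketch: the function name set_zebra_endpoint is invented here, and it assumes the <host> and <port> elements each sit on a single line, as in the snippet above.

```shell
# set_zebra_endpoint FILE HOST PORT
# Hypothetical helper: rewrites the <host> and <port> elements in FILE.
# Assumes each element sits on one line and that HOST and PORT contain
# no characters special to sed (such as | or &).
set_zebra_endpoint() (
    file=$1 host=$2 port=$3
    tmp=$(mktemp) || exit 1
    # Write the edited copy to a temporary file, then copy it back so the
    # original file keeps its permissions.
    sed -e "s|<host>[^<]*</host>|<host>${host}</host>|" \
        -e "s|<port>[^<]*</port>|<port>${port}</port>|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

From the ZebraConf directory it could be called as: set_zebra_endpoint ZebraConf.xml myhost 3003 (myhost being a placeholder for your own host name).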
Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg
and add
ZebraHost = <host>:<port>
at the end. For the original ZebraConf.xml this would be:
ZebraHost = ldbkit06:3003
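The same line can be appended from the shell. This is a sketch under assumptions: add_zebra_host is a made-up helper name, and the only thing it knows about combine.cfg is the "ZebraHost = <host>:<port>" form shown above.

```shell
# add_zebra_host CFG HOST PORT
# Hypothetical helper: appends "ZebraHost = HOST:PORT" to CFG unless that
# exact line is already present, so running it twice is harmless.
add_zebra_host() (
    cfg=$1
    line="ZebraHost = $2:$3"
    if ! grep -qxF -- "$line" "$cfg" 2>/dev/null; then
        # Make sure the previous last line is terminated before appending.
        if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
            printf '\n' >> "$cfg"
        fi
        printf '%s\n' "$line" >> "$cfg"
    fi
)
```

Since the file lives under /etc/combine, the function probably has to be run from a root shell.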
Generate Zebra configuration. Run
make rmConfs
make
Start the Zebra indexing and database server. Run
rm -f server.log
zebrasrv -f yazserver.xml -l server.log &
You might consider copying the simple UI to a Web-server (see instructions at
the end of the README file in this directory)
Test it all by starting the simple test crawl. Run:
combineCtrl --jobname atest start
You should see things happening in the Zebra log ZebraConf/server.log
Test searching your new database. Use either or both of these options:
Use the explain facility of the database directly by opening the URL
http://<host>:<port>/ in an XML-enabled browser such as Firefox (use the host
and port you configured above in the ZebraConf.xml file).
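On a machine without a browser, the same explain URL can be fetched from the command line. A small sketch, assuming curl is installed; the function names explain_url and fetch_explain are invented for this example.

```shell
# explain_url HOST PORT: print the URL of the server's explain record.
explain_url() {
    printf 'http://%s:%s/\n' "$1" "$2"
}

# fetch_explain HOST PORT: fetch the explain record with curl (assumed installed).
# -f turns HTTP errors into a non-zero exit, -sS hides progress but keeps errors.
fetch_explain() {
    curl -fsS --max-time 10 "$(explain_url "$1" "$2")"
}
```

For the original ZebraConf.xml this would be: fetch_explain ldbkit06 3003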
Test searching using the simple UI from the ZebraConf directory.
Kill the crawler and Zebra server. Run
combineCtrl --jobname atest kill
kill `cat lock/zebrasrv.pid`
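The kill command above prints a confusing error if the PID file is missing, for example when Zebra was never started. A slightly more defensive variant might look like this; stop_from_pidfile is a hypothetical helper name, not part of Zebra or Combine.

```shell
# stop_from_pidfile PIDFILE
# Hypothetical helper: sends SIGTERM to the process recorded in PIDFILE.
# Fails without calling kill if the file is missing or holds no numeric PID.
stop_from_pidfile() (
    pidfile=$1
    [ -r "$pidfile" ] || { echo "no PID file: $pidfile" >&2; exit 1; }
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) echo "unexpected content in $pidfile" >&2; exit 1 ;;
    esac
    kill "$pid"
)
```

From the ZebraConf directory it would be called as: stop_from_pidfile lock/zebrasrv.pid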
Now you are ready to tailor it to your own application:
Build a vertical search engine in just 5 easy steps
So once the software is installed and tested ...
Create a configuration for Zebra - see the ZebraConf directory
Configure Combine for the crawl you want. Please refer to the Combine
documentation, sections 'Configuration' and 'Use Scenarios'. Specifically, you
have to create a topic definition (section 'Crawler operation') for your
particular topic.
Create the crawler
sudo combineINIT --jobname atest --topic YourTopicDefFile.txt
combineCtrl --jobname atest load < seeds.txt
Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg
and add
ZebraHost = <host>:<port>
at the end, where host and port correspond to your Zebra configuration
Start Zebra and the crawler
zebrasrv -f yazserver.xml -l server.log &
combineCtrl --jobname atest start
Make your own UI
And now it's ready for use, building the database as we speak.
Demos
Simple demonstrators of Vertical Search Engines are available here.
Create your own Vertical Search Engine.
Last updated 2009-06-16 by Anders Ardö