Make your own Google

Board: Linux  Author: (runlevel)  Date: 2009/11/24 21:06
http://combine.it.lth.se/SearchEngineBox/

SearchEngine in a Box using Combine/Zebra

Sprung from development in the EU project ALVIS (IST-1-002068-STP) with the help of .SE:s Internetfond, and based on two systems: the Combine Focused Crawler and the Zebra text indexing and retrieval engine.

This system allows you to build a vertical search engine for your favorite topic in just 5 easy steps. But first you have to install the system on your machine. (Or you can try it out online before installing.)

Installation and testing instructions

Edit /etc/apt/sources.list and add:

    deb http://combine.it.lth.se/ debian/
    deb http://ftp.indexdata.dk/debian sarge main
    deb-src http://ftp.indexdata.dk/debian sarge main

Get the crawler, indexer and XSLT tools. Run:

    sudo apt-get update
    sudo apt-get install combine idzebra2.0 yaz xsltproc

Make sure you have Combine version 3.4 or later.

Download the 'SearchEngine in a Box' system, unpack it, and change to the directory where it was unpacked. Run:

    tar zxf SEbox.tgz
    cd SearchEngineBox

Initialize the crawler for a simple test. Run:

    sudo combineINIT --jobname atest
    combineCtrl --jobname atest load < seeds.txt

Change to the Zebra configuration directory:

    cd ZebraConf
    make Combine

Tell Zebra where it should run. Edit ZebraConf.xml and change

    <host>ldbkit06</host>
    <port>3003</port>

to the host you are running on and your preferred port.

Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg and add

    ZebraHost = <host>:<port>

at the end. For the original ZebraConf.xml this would be:

    ZebraHost = ldbkit06:3003

Generate the Zebra configuration. Run:

    make rmConfs
    make

Start the Zebra indexing and database server. Run:

    rm server.log
    zebrasrv -f yazserver.xml -l server.log &

You might consider copying the simple UI to a web server (see the instructions at the end of the README file in this directory).

Test it all by starting the simple test crawl.
Run:

    combineCtrl --jobname atest start

You should see activity in the Zebra log, ZebraConf/server.log.

Test searching your new database, using either or both of these options:

  - Use the database's explain facility directly by opening the URL http://<host>:<port>/ in an XML-capable browser such as Firefox (use the host and port you configured above in ZebraConf.xml).
  - Test searching with the simple UI from the ZebraConf directory.

Kill the crawler and the Zebra server. Run:

    combineCtrl --jobname atest kill
    kill `cat lock/zebrasrv.pid`

Now you are ready to tailor it to your own application.

Build a vertical search engine in just 5 easy steps

Once the software is installed and tested:

Create a configuration for Zebra (see the ZebraConf directory).

Configure Combine for the crawl you want. Refer to the Combine documentation sections 'Configuration' and 'Use Scenarios'. In particular, you have to create a topic definition (section 'Crawler operation') for your topic.

Create the crawler:

    sudo combineINIT --jobname atest --topic YourTopicDefFile.txt
    combineCtrl --jobname atest load < seeds.txt

Tell the crawler where the indexer is. Edit /etc/combine/atest/combine.cfg and add

    ZebraHost = <host>:<port>

at the end, where host and port correspond to your Zebra configuration.

Start Zebra and the crawler:

    zebrasrv -f yazserver.xml -l server.log &
    combineCtrl --jobname atest start

Make your own UI.

And now it's ready for use, building the database as we speak.

Demos

Simple demonstrators of vertical search engines are available here.

Create your own vertical search engine.

Last updated 2009-06-16 by Anders Ardö
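The two manual edits in the walkthrough (pointing ZebraConf.xml at your host and port, and appending the ZebraHost line to combine.cfg) can be scripted. What follows is a minimal sketch, not something shipped with SearchEngine in a Box: the function names set_zebra_endpoint and add_zebra_host are hypothetical, and the sketch assumes ZebraConf.xml contains one <host>...</host> and one <port>...</port> element, as in the shipped file, and that the host name contains no characters special to sed (such as | or &).

```shell
#!/bin/sh
# Hypothetical helpers (not part of SearchEngine in a Box) that script two
# manual edits from the walkthrough. Assumes ZebraConf.xml holds a single
# <host>...</host> and a single <port>...</port> element, as shipped, and
# that HOST contains no sed-special characters such as | or &.

# set_zebra_endpoint FILE HOST PORT
# Replaces the contents of the <host> and <port> elements in FILE.
set_zebra_endpoint() {
    local file="$1" host="$2" port="$3"
    local tmpfile="$file.tmp.$$"
    sed -e "s|<host>[^<]*</host>|<host>$host</host>|" \
        -e "s|<port>[^<]*</port>|<port>$port</port>|" \
        "$file" > "$tmpfile" && mv "$tmpfile" "$file"
}

# add_zebra_host CFG HOST PORT
# Appends "ZebraHost = HOST:PORT" to a Combine job config such as
# /etc/combine/atest/combine.cfg, unless a ZebraHost line already exists.
add_zebra_host() {
    local cfg="$1" host="$2" port="$3"
    if [ -f "$cfg" ] && grep -q '^ZebraHost[[:space:]]*=' "$cfg"; then
        echo "ZebraHost already set in $cfg; leaving it unchanged" >&2
        return 0
    fi
    # Avoid gluing the new line onto a last line that lacks a newline.
    if [ -s "$cfg" ] && [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    printf 'ZebraHost = %s:%s\n' "$host" "$port" >> "$cfg"
}
```

Sourced into a shell, calling set_zebra_endpoint ZebraConf.xml myhost 3003 and then add_zebra_host /etc/combine/atest/combine.cfg myhost 3003 would perform both edits (myhost is a placeholder; the second file lives under /etc, so it will probably need root). The grep guard makes the append idempotent, so re-running the setup does not leave duplicate ZebraHost lines.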
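The shutdown step ends with kill `cat lock/zebrasrv.pid`, which fails with a confusing message if the pid file is missing, empty, or stale. A small defensive wrapper might look like the sketch below; the function name stop_from_pidfile is hypothetical, and it assumes the pid file contains a single numeric process id.

```shell
#!/bin/sh
# Hypothetical helper (not part of SearchEngine in a Box): a more defensive
# version of  kill `cat lock/zebrasrv.pid`  from the walkthrough.
# Assumes the pid file holds a single numeric process id.

# stop_from_pidfile PIDFILE [SIGNAL]
# Sends SIGNAL (default TERM) to the process named in PIDFILE.
# Returns 1 if the file is missing or malformed, or the process is gone.
stop_from_pidfile() {
    local pidfile="$1" sig="${2:-TERM}" pid
    if [ ! -r "$pidfile" ]; then
        echo "no readable pid file: $pidfile" >&2
        return 1
    fi
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*)
            echo "not a process id in $pidfile: '$pid'" >&2
            return 1 ;;
    esac
    # kill -0 sends no signal; it only checks the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running" >&2
        return 1
    fi
    kill -s "$sig" "$pid"
}
```

Run from the ZebraConf directory as stop_from_pidfile lock/zebrasrv.pid. Checking the file before signalling avoids calling kill with an empty or garbage argument, though it cannot tell whether a stale pid has since been reused by an unrelated process.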