Web Indexing
harvest - Free
Harvest is a system for collecting information and making it searchable through a user-friendly web interface. Harvest can collect information from the Internet and from intranets using HTTP, FTP, and NNTP, as well as from local files such as data on hard disk, CD-ROM, and file servers. In addition to HTML, the currently supported formats include dvi, ps, fulltext, mail, man pages, news, troff, WordPerfect, C sources, and many more.

webbase - Free
webbase is an Internet web crawler written in C and later ported to C++. It uses a MySQL database to store information about crawled URLs. It is available as a command-line program or as a library (shared or static). It has two main functions: crawling the web to get documents and building a full-text database from these documents. The crawler part visits the documents and stores interesting information about them locally. It revisits each document on a regular basis to make sure that it is still there and updates the local copy if the document has changed. The full-text database uses the local copies of the documents to build a searchable index. The full-text indexing functions are not included in webbase.

Larbin - Free
Larbin is a web crawler (also called a (web) robot, spider, scooter, etc.). It is intended to fetch a large number of web pages to fill the database of a search engine. With a fast enough network, Larbin should be able to fetch more than 100 million pages on a standard PC.