Search engines that crawl the Cagey Consumer web site:

| search engine | robot name | robot info page |
| Alexa | ia_archiver | http://www.alexa.com/help/webmasters/index.html |
| alltheweb | fastwebcrawler | http://www.fastsearch.com/support/crawler.asp |
| Alta Vista | Scooter-3.0.FS | none |
| Commission Junction | CJNetworkQuality | http://cj.com/networkquality/ |
| CyberAlert | Webinator-indexer.cyberalert.com/2.56 | http://cyberalert.com |
| Digital Integrity | DIIbot/1.2 | http://www.findsame.com/robot.html |
| dir.com | Pompos/1.3 | http://dir.com/pompos.html |
| Direct Hit | Mozilla/2.0 | none |
| Domanova | Jack | http://www.domanova.co.uk/faq.html |
| Excite | ArchitextSpider | none |
| free.fr | Pompos/1.3 | http://dir.com/pompos.html |
| GAIS | Openbot/3.0 | http://gais.cs.ccu.edu.tw/robot.php |
| Girafabot | girafabot | http://www.girafa.com/index.acr?c=10 |
| GoGettem | Webinator-gogettem.nfis.com/2.52 | none |
| google | Googlebot/2.1 | http://www.googlebot.com/bot.html |
| MSN, HotBot, NBCi | Slurp/si | http://www.inktomi.com/slurp.html |
| Links2Go | Links2Go Similarity Engine | http://www.l2g.com/ |
| Lycos | Lycos_Spider_(T-Rex) | none |
| NameProtect | NPBot | http://www.nameprotect.com/botinfo.html (ignores robots.txt) |
| National Directory | NationalDirectory-WebSpider/1.3 | none |
| Northern Light Search | Gulliver/1.3 | none |
| Openfind | Openbot/3.0+ | http://www.openfind.com.tw/robot.html |
| Picsearch | psbot/0.1 | http://www.picsearch.com/menu.cgi?item=Psbot |
| Planet Internet | appie/1.1 | none |
| PolySearch | polybot 1.0 | http://cis.poly.edu/polybot/ |
| Teoma | teomaagent | none |
| wisenut | ZyBorg/1.0 | http://www.wisenutbot.com |
| WebTop | MuscatFerret/2.0 | none |
| (research.att.com) | tivraSpider/1.0 | none |
| (cyveillance.com) | MSIE 4.01 | http://cyveillance.com/response1.html (ignores robots.txt; IP address 63.148.99.233) |
| (almaden.ibm.com) | WFARC | http://www.almaden.ibm.com/cs/crawler/ |
| (mogulsports.com) | Wget/1.6 | none |
| (watcher.lan.onvoy.com) | none | note: uses HTTP/1.1 but sends the wrong host name for twiki |
| (www.seventwentyfour.com) | LinkWalker | none |
| (www.xyleme.com) | cosmos/0.8 | none |
| ? | JennyBot/0.1 | none |
| ? | Bjaaland/0.9 | http://antarcti.ca |
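
One way to spot these robots in a server log is to match each request's User-Agent header against the robot names listed above. The following is a minimal sketch in Python, assuming the names are matched as case-insensitive substrings. The function name `identify_robot`, the `KNOWN_ROBOTS` mapping, and the sample header strings are made up for illustration. Only a few entries are included, and generic names such as Mozilla/2.0 or MSIE 4.01 are left out because they would also match ordinary browsers.

```python
# Robot name fragments taken from the list above, mapped to the search engine.
# Only a subset is shown; extend as needed.
KNOWN_ROBOTS = {
    "ia_archiver": "Alexa",
    "fastwebcrawler": "alltheweb",
    "Scooter": "Alta Vista",
    "Googlebot": "google",
    "Slurp": "MSN, HotBot, NBCi",
    "NPBot": "NameProtect",
    "ZyBorg": "wisenut",
}


def identify_robot(user_agent):
    """Return the search engine whose robot name appears in user_agent, or None."""
    ua = user_agent.lower()
    for fragment, engine in KNOWN_ROBOTS.items():
        if fragment.lower() in ua:
            return engine
    return None


if __name__ == "__main__":
    # Hypothetical User-Agent headers, for illustration only.
    samples = (
        "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0)",
    )
    for ua in samples:
        print(ua, "->", identify_robot(ua))
```

A User-Agent header is supplied by the client and can be set to anything, so a match here is only a hint about who is crawling.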

For info on robot exclusion (robots.txt) files, see http://www.robotstxt.org/wc/robots.html
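
To see how a well-behaved robot would read a robots.txt file, the rules can be checked with Python's standard `urllib.robotparser` module. This is a sketch only: the rules in `ROBOTS_TXT` are hypothetical and are not this site's actual file, and the helper name `allowed` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only (not this site's file).
ROBOTS_TXT = """\
User-agent: NPBot
Disallow: /

User-agent: *
Disallow: /bin/
"""


def allowed(robot_name, path, robots_txt=ROBOTS_TXT):
    """Return True if the given rules let robot_name fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot_name, path)


if __name__ == "__main__":
    checks = (
        ("Googlebot/2.1", "/index.html"),
        ("Googlebot/2.1", "/bin/view"),
        ("NPBot", "/index.html"),
    )
    for robot, path in checks:
        print(robot, path, allowed(robot, path))
```

The file is advisory. Robots that ignore robots.txt, as noted for some entries in the list above, are not stopped by it and have to be blocked on the server side instead.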
Here are some sites which try to track various crawlers:


Revision r1.18 - 02 Aug 2003 - 04:16 by EliMantel
Copyright © 2000-2005 by the contributing authors. All material on this collaboration tool is the property of the contributing authors.