| search engine | robot name | robot info page |
| Alexa | ia_archiver | http://www.alexa.com/help/webmasters/index.html |
| alltheweb | fastwebcrawler | http://www.fastsearch.com/support/crawler.asp |
| Alta Vista | Scooter-3.0.FS | none |
| Commission Junction | CJNetworkQuality | http://cj.com/networkquality/ |
| CyberAlert | Webinator-indexer.cyberalert.com/2.56 | http://cyberalert.com |
| Digital Integrity | DIIbot/1.2 | http://www.findsame.com/robot.html |
| dir.com | Pompos/1.3 | http://dir.com/pompos.html |
| Direct Hit | Mozilla/2.0 | none |
| Domanova | Jack | http://www.domanova.co.uk/faq.html |
| Excite | ArchitextSpider | none |
| free.fr | Pompos/1.3 | http://dir.com/pompos.html |
| GAIS | Openbot/3.0 | http://gais.cs.ccu.edu.tw/robot.php |
| Girafabot | girafabot | http://www.girafa.com/index.acr?c=10 |
| GoGettem | Webinator-gogettem.nfis.com/2.52 | none |
| Google | Googlebot/2.1 | http://www.googlebot.com/bot.html |
| MSN, HotBot, NBCi | Slurp/si | http://www.inktomi.com/slurp.html |
| Links2Go | Links2Go Similarity Engine | http://www.l2g.com/ |
| Lycos | Lycos_Spider_(T-Rex) | none |
| NameProtect | NPBot | http://www.nameprotect.com/botinfo.html (ignores robots.txt) |
| National Directory | NationalDirectory-WebSpider/1.3 | none |
| Northern Light Search | Gulliver/1.3 | none |
| Openfind | Openbot/3.0+ | http://www.openfind.com.tw/robot.html |
| Picsearch | psbot/0.1 | http://www.picsearch.com/menu.cgi?item=Psbot |
| Planet Internet | appie/1.1 | none |
| PolySearch | polybot 1.0 | http://cis.poly.edu/polybot/ |
| Teoma | teomaagent | none |
| wisenut | ZyBorg/1.0 | http://www.wisenutbot.com |
| WebTop | MuscatFerret/2.0 | none |
| (research.att.com) | tivraSpider/1.0 | none |
| (cyveillance.com) | MSIE 4.01 | http://cyveillance.com/response1.html (ignores robots.txt; IP address 63.148.99.233) |
| (almaden.ibm.com) | WFARC | http://www.almaden.ibm.com/cs/crawler/ |
| (mogulsports.com) | Wget/1.6 | none |
| (watcher.lan.onvoy.com) | none | none (note: uses HTTP/1.1 but sends the wrong host name for TWiki) |
| (www.seventwentyfour.com) | LinkWalker | none |
| (www.xyleme.com) | cosmos/0.8 | none |
| ? | JennyBot/0.1 | none |
| ? | Bjaaland/0.9 | http://antarcti.ca |
For info on robot exclusion (robots.txt) files, see http://www.robotstxt.org/wc/robots.html
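As a rough illustration of how a well-behaved robot from the table above would interpret an exclusion file, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT`, the `example.com` URLs, and the helper name `is_allowed` are assumptions made up for this example; they are not taken from any real site. Robots noted above as ignoring robots.txt will not honor these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (illustrative only):
# shut out ia_archiver entirely, keep every other robot out of /bin/.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /bin/
"""


def is_allowed(robot_name: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robot_name may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot_name, url)


if __name__ == "__main__":
    for name in ("Googlebot/2.1", "ia_archiver"):
        for url in ("http://example.com/index.html",
                    "http://example.com/bin/view"):
            print(name, url, is_allowed(name, url))
```

The robot name is matched against the `User-agent` lines without its version suffix, so `Googlebot/2.1` falls through to the `*` entry here.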
Here are some sites which try to track various crawlers:
Revision r1.18 - 02 Aug 2003 - 04:16 by EliMantel
Copyright © 2000-2005 by the contributing authors.
All material on this collaboration tool is the property of the contributing authors.