
Experibot

An experimental web crawler. Important reader key items:

  • "I don't know who you are - stop crawling my domain."

    I have no intention of downloading pages against your will. Just send an email to amirkr@gmail.com and I will promptly add your domains/IPs to the "no crawl" list; they will be ignored immediately.

  • What is this bot? Who are you?

Experibot is an experimental web-crawling bot that I (Amir Krause) am developing as a personal research project. The collected data is not published or used externally.

Several key measures were taken to make sure the crawler behaves politely:

  1. I did my best to adhere to robots.txt directives, especially Disallow rules. Coding mistakes may still exist in the implementation, even though the code has been checked and verified many times. If you see my bot disregarding your robots.txt, I am sorry - it is not on purpose - and please let me know.

  2. The crawler will not download anything from a site whose robots.txt contains the word "experibot", in any combination of upper and lower case and regardless of the crawler's version. I will not take chances here.

  3. The crawler never requests the same host, or the same IP, twice within 60 seconds (a generous crawl delay).

  4. Each cached robots.txt file becomes obsolete after about 12 hours and is then re-downloaded. The new download happens only when a page from that website is next requested, so more than 12 hours can pass before the robots.txt file is re-fetched. If you change your robots.txt file, the changes affect my crawler as follows:

    • If you specifically disallowed my bot ("experibot"), the disallow takes effect as soon as the file is re-fetched.

    • If the change does not mention my bot specifically, it can take up to two days before it affects the crawler.
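For readers curious how rules like these fit together, here is a minimal Python sketch of a "politeness gate" combining the four measures above: robots.txt Disallow checks, the case-insensitive "experibot" opt-out keyword, a 60-second per-host delay, and a 12-hour robots.txt cache. This is an illustration under my own assumptions (class and variable names are hypothetical), not Experibot's actual code.

```python
import time
import urllib.robotparser
from urllib.parse import urlparse
from urllib.request import urlopen

USER_AGENT = "experibot"      # assumed user-agent token
CRAWL_DELAY = 60              # rule 3: seconds between hits to the same host
ROBOTS_TTL = 12 * 60 * 60     # rule 4: robots.txt cache lifetime (12 hours)

class PolitenessGate:
    """Decides whether a URL may be fetched, per the rules above."""

    def __init__(self):
        self._last_hit = {}   # host -> time of last request
        self._robots = {}     # host -> (fetch_time, parser, opted_out)

    def _get_robots(self, host):
        """Return cached robots.txt data, re-fetching after ROBOTS_TTL."""
        entry = self._robots.get(host)
        if entry is None or time.time() - entry[0] > ROBOTS_TTL:
            try:
                text = urlopen(f"http://{host}/robots.txt",
                               timeout=10).read().decode("utf-8", "replace")
            except OSError:
                text = ""     # unreachable robots.txt: treat as empty
            # Rule 2: any mention of "experibot", in any case,
            # opts the whole site out.
            opted_out = "experibot" in text.lower()
            parser = urllib.robotparser.RobotFileParser()
            parser.parse(text.splitlines())
            entry = (time.time(), parser, opted_out)
            self._robots[host] = entry
        return entry

    def may_fetch(self, url):
        host = urlparse(url).netloc
        # Rule 3: never hit the same host twice within CRAWL_DELAY seconds.
        if time.time() - self._last_hit.get(host, 0) < CRAWL_DELAY:
            return False
        _, parser, opted_out = self._get_robots(host)
        # Rules 1 and 2: obey Disallow lines and the keyword opt-out.
        if opted_out or not parser.can_fetch(USER_AGENT, url):
            return False
        self._last_hit[host] = time.time()
        return True
```

A crawler loop would simply call `may_fetch(url)` before every download and requeue the URL for later if it returns `False`.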

Thanks!

Amir Krause

amirkr@gmail.com
