What kind of bots are you trying to prevent: search engine crawlers, spiders, scrapers?

Checking the User-Agent header is the easiest approach, but it is also trivial to fake, since it only catches bots that identify themselves honestly (see the first sketch below).

You can put some JavaScript in your page that makes an outgoing request; most simple bots never execute JS, so a session that never sends that request is probably a bot. That will block some of them, but a lot of bots now drive real browsers (headless Chrome, Selenium, etc.), so it won't catch those.

The next step up is a heuristics-based approach, typically machine learning over your traffic patterns. That is genuinely hard to design and implement, it is very specific to your site, and it needs a lot of ongoing tuning.

Another simple answer is to rate-limit the number of requests per IP, which at least caps how much any single client can do.
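For the User-Agent check, here is a minimal sketch as Express middleware. The pattern list is just a placeholder; again, anything malicious will simply send a real browser's UA string:

```typescript
import express, { Request, Response, NextFunction } from "express";

const app = express();

// Substrings commonly found in self-identifying bot User-Agents.
// This only catches honest bots; spoofed UAs sail right through.
const BOT_PATTERNS = [/bot/i, /crawler/i, /spider/i, /curl/i, /wget/i];

function blockObviousBots(req: Request, res: Response, next: NextFunction) {
  const ua = req.get("User-Agent") ?? "";
  if (BOT_PATTERNS.some((p) => p.test(ua))) {
    res.status(403).send("Bots are not allowed");
    return;
  }
  next();
}

app.use(blockObviousBots);
```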
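For the JavaScript signal, the idea is to fire a request that only a JS-executing client would make. A client-side sketch (the `/not-a-bot` endpoint and payload are made up for illustration):

```typescript
// Runs in the browser. navigator.sendBeacon queues a small POST that
// survives page unload; bots that only fetch HTML never execute this.
window.addEventListener("load", () => {
  const payload = JSON.stringify({ page: location.pathname, ts: Date.now() });
  navigator.sendBeacon("/not-a-bot", payload);
});
```

On the server you would mark the session as "probably human" when the beacon arrives. As noted above, headless browsers execute this just like a real one, so treat it as one weak signal, not a verdict.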
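For per-IP rate limiting, here is a minimal in-memory fixed-window counter as Express middleware. The window and limit are arbitrary; in production you would want something like a Redis-backed counter so the state is shared across instances and survives restarts:

```typescript
import { Request, Response, NextFunction } from "express";

const WINDOW_MS = 60_000; // 1-minute window (arbitrary)
const MAX_REQUESTS = 100; // per IP per window (arbitrary)

// Grows without bound as written; a real version would evict stale entries.
const hits = new Map<string, { count: number; windowStart: number }>();

export function rateLimitByIp(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip ?? "unknown";
  const now = Date.now();
  const entry = hits.get(ip);

  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // First request in a fresh window for this IP.
    hits.set(ip, { count: 1, windowStart: now });
    return next();
  }

  entry.count += 1;
  if (entry.count > MAX_REQUESTS) {
    res.status(429).send("Too many requests");
    return;
  }
  next();
}
```

Keep in mind that many users can share one IP (corporate NATs, mobile carriers), so set the limit generously.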