06-03-2015, 09:39 PM
brandonstills
Quote:
Originally Posted by PornoPlopedia
brandonstills, thanks for your smart answer. A heuristic approach is exactly what I had in mind.

An algorithm you can feed a site's average KPIs to serve as the control group, ideally expressed as MIN/MAX ranges.
That could work. The difference between a bot and a human is that a bot will probably move through pages much faster, and it may also follow the navigation in a slightly different pattern.
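
To make that concrete, here's a rough sketch of the kind of check I have in mind (Python; the KPI names, baseline ranges, and cutoff are all made-up examples, you'd substitute the averages measured on your own site):

```python
# Heuristic bot check: compare a session's KPIs against MIN/MAX
# ranges taken from a human "control group". All names and numbers
# below are illustrative placeholders, not real measurements.

BASELINE = {
    "pages_per_minute": (0.5, 6.0),       # humans rarely load pages faster
    "avg_seconds_on_page": (5.0, 300.0),  # near-zero dwell time is suspicious
    "nav_link_ratio": (0.1, 0.9),         # share of hits that follow site nav
}

def bot_score(session_kpis):
    """Count how many KPIs fall outside the human MIN/MAX range."""
    out_of_range = 0
    for kpi, (low, high) in BASELINE.items():
        value = session_kpis.get(kpi)
        if value is None:
            continue  # missing data shouldn't count against the visitor
        if value < low or value > high:
            out_of_range += 1
    return out_of_range

# Example: a session burning through pages far faster than the baseline.
session = {"pages_per_minute": 40.0, "avg_seconds_on_page": 1.2,
           "nav_link_ratio": 0.0}
if bot_score(session) >= 2:  # arbitrary cutoff: two or more KPIs out of range
    print("flag session: challenge with a CAPTCHA or throttle it")
```

The nice part of scoring instead of hard-blocking on a single KPI is that one odd metric (a fast reader, a tab left open) doesn't get a real visitor banned.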

Are you trying to protect against people scraping the site, or just to save bandwidth? Even with IP rate limiting, if they really wanted to they could spin up 1,000 instances on Amazon Web Services and the crawl would come from 1,000 different IPs.
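
For the rate-limiting side, even something as simple as a per-IP sliding window will stop the lazy scrapers. A minimal sketch, assuming a single server process (the 60-second window and 30-request cap are just example numbers to tune against your real traffic):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # example window size
MAX_REQUESTS = 30     # example per-IP cap within the window

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip):
    """Return True if this IP is still under the limit."""
    now = time.time()
    hits = _hits[ip]
    # Drop timestamps that have aged out of the window.
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= MAX_REQUESTS:
        return False  # over the limit: answer 429 or show a CAPTCHA
    hits.append(now)
    return True
```

Since the counters live in process memory, this only works on one box; behind a load balancer you'd keep them somewhere shared like Redis. And as I said, it does nothing against a crawler spread across 1,000 IPs.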

It all depends on how badly they want to crawl your site.

What's the nature of what you are protecting and what kind of threat are you trying to block? I might be able to give you a better answer if there is a more concrete example.