Apache

Identifying aggressive crawlers using Go Access

Aggressive crawlers that hit your web site a lot can cause performance problems.

There are many ways to identify aggressive crawlers, including writing custom scripts that analyze your web server logs.

One tool that we found to be useful in analyzing which crawlers hit the site the most today or yesterday is Go Access.

Getting Go Access

Go Access is available for Ubuntu Natty Narwahl (11.04) only, but not earlier LTS releases.

Reducing the size and I/O load of Apache's web server log files

Apache, and all other web servers, have a mechanism to write an "access log" recording every HTTP access to the server. The information that is logged is valuable, and includes things like the IP address of the user making the request, the date and time, the size of the request in bytes, the return code from the HTTP protocol, the request's URI, the referer, and the browser/operating system that the user is using.