Eric Hodel (drbrain) wrote,

awstats sucks at IP resolution

So, the biggest problem we have with awstats is IP address resolution. awstats has a script that merges logs and (allegedly) resolves IP addresses, but it appears to do this in a single process, so any lookup that blocks holds up everything else.

I've written two lovely scripts to handle this: one that merges logs and one that resolves IPs "quickly". The merging one is simple: it opens N files (named in ARGV), parses out the date, and sticks the line into an Array. Then I pull out the line with the smallest time and shove it onto STDOUT.
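A minimal sketch of that kind of merge in Ruby, not the actual script. It assumes Apache common/combined log lines with a timestamp like `[10/Oct/2000:13:55:36 -0700]`; `log_time`, `next_entry`, and `merge_logs` are hypothetical names. It keeps one pending line per input and always emits the earliest, which is why each input has to be sorted already. Skipping lines whose timestamp won't parse is also an assumption; a real merger might pass them through instead.

```ruby
require 'time'

# Hypothetical helper: extract the timestamp from an Apache log line,
# e.g. "[10/Oct/2000:13:55:36 -0700]". Returns nil if it can't be parsed.
def log_time(line)
  stamp = line[/\[([^\]]+)\]/, 1]
  return nil unless stamp
  Time.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
rescue ArgumentError
  nil
end

# Read the next line with a usable timestamp from io as [time, line],
# or nil at end of file. Unparseable lines are skipped (an assumption).
def next_entry(io)
  while (line = io.gets)
    time = log_time(line)
    return [time, line] if time
  end
  nil
end

# Merge already-sorted logs: hold one pending line per input, emit the
# one with the smallest time, then refill from that same input.
def merge_logs(inputs, out)
  heads = inputs.map { |io| next_entry(io) }
  loop do
    live = heads.each_index.select { |i| heads[i] }
    break if live.empty?
    i = live.min_by { |j| heads[j][0] }
    out.print heads[i][1]
    heads[i] = next_entry(inputs[i])
  end
end

if __FILE__ == $0
  files = ARGV.map { |name| File.open(name) }
  merge_logs(files, $stdout)
  files.each(&:close)
end
```

Run as `ruby merge.rb access.log.1 access.log.2 > merged.log` (file names illustrative). With N inputs each output line costs a scan of N pending entries, which is plenty fast for a handful of logs.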

The input logs must already be sorted for this to work, but awstats can take care of that (or so it says).

The DNS resolver uses Ruby's resolv.rb and spawns 100 threads to do DNS lookups. Instead of the 4-5 hours awstats takes to do DNS lookups, my multithreaded resolver takes about 40 minutes.
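A rough sketch of a threaded resolver built on resolv.rb's `Resolv.getname` and a `Queue` of unique addresses. This is an illustration, not the script from the post: `resolve_all` and its `resolver:` parameter are hypothetical (the parameter only exists so a fake resolver can stand in for DNS), and the 100-thread default is taken from the text. Failed lookups map to `nil` so the caller can leave the raw IP in the log.

```ruby
require 'resolv'

# Resolve many IPs with a pool of worker threads so one slow lookup
# doesn't hold up the rest. Returns { ip => hostname or nil }.
# `resolver:` is a hypothetical hook; by default it uses Resolv.getname.
def resolve_all(ips, threads: 100, resolver: Resolv)
  queue = Queue.new
  ips.uniq.each { |ip| queue << ip }
  results = {}
  lock = Mutex.new

  workers = Array.new([threads, queue.size].min) do
    Thread.new do
      loop do
        begin
          ip = queue.pop(true) # non-blocking; raises ThreadError once empty
        rescue ThreadError
          break
        end

        name = begin
          resolver.getname(ip)
        rescue Resolv::ResolvError, Resolv::ResolvTimeout
          nil # no PTR record or no answer: caller keeps the raw IP
        end
        lock.synchronize { results[ip] = name }
      end
    end
  end

  workers.each(&:join)
  results
end
```

Because resolv.rb is plain Ruby doing socket I/O, threads waiting on the network don't block each other, so 100 threads mostly means up to 100 lookups in flight at once and a single slow address no longer stalls the whole run.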

The problem with DNS lookups isn't really that it's slow to look up names; it's that it's slow to look up names that don't exist. Instead of caching hits, I should cache the misses; that way I can avoid looking up failed names over and over.
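One way to cache the misses, sketched under assumptions: `NegativeCachingResolver` is a hypothetical wrapper that remembers addresses whose lookups raised `Resolv::ResolvError` or `Resolv::ResolvTimeout` and returns `nil` for them without asking DNS again. Following the idea above, only misses are stored; hits still go to the underlying resolver. It answers `getname` the same way `Resolv` does, and the `Mutex` keeps the set safe when many resolver threads share one instance.

```ruby
require 'resolv'
require 'set'

# Hypothetical wrapper that caches failed lookups (negative caching),
# so an address with no PTR record is only queried once.
class NegativeCachingResolver
  def initialize(resolver = Resolv)
    @resolver = resolver
    @misses = Set.new
    @lock = Mutex.new
  end

  # Returns the host name, or nil if this lookup failed now or earlier.
  def getname(ip)
    return nil if @lock.synchronize { @misses.include?(ip) }
    @resolver.getname(ip)
  rescue Resolv::ResolvError, Resolv::ResolvTimeout
    @lock.synchronize { @misses << ip }
    nil
  end

  # Snapshot of the addresses known to fail.
  def misses
    @lock.synchronize { @misses.dup }
  end
end
```

Saving `misses` to a file between runs (not shown) would carry the benefit across repeated processing runs, though a miss that is never retried can go stale if the address later gets a name.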

Cutting down DNS resolution time would make it practical to process access logs hourly instead of daily. It's been an idea of mine to see how traffic to the site changes over a shorter term than monthly, which is what most stats packages give you.
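To see traffic at that finer grain, log lines can be bucketed by hour. This is a minimal sketch assuming the same Apache timestamp format; `hourly_counts` is a hypothetical helper that counts requests only and is not something awstats provides.

```ruby
require 'time'

# Count requests per hour from Apache common/combined log lines.
# Buckets are keyed like "2000-10-10 13:00".
def hourly_counts(io)
  counts = Hash.new(0)
  io.each_line do |line|
    stamp = line[/\[([^\]]+)\]/, 1]
    next unless stamp
    begin
      time = Time.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
    rescue ArgumentError
      next # skip lines with an unparseable timestamp
    end
    counts[time.strftime('%Y-%m-%d %H:00')] += 1
  end
  counts
end
```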