I've written two lovely scripts to handle this: one that merges two or more logs, and one that resolves IPs "quickly". The merging one is simple: it opens N filenames (from ARGV), parses the date out of each line, and sticks the line into an Array. Then it pulls out the line with the smallest time and shoves it onto STDOUT.
The input logs must already be sorted for it to work, but awstats can take care of that stuff (or so it says).
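For the curious, the merge idea can be sketched roughly like this. It's not my actual script, just a minimal illustration: the method names (`log_time`, `next_entry`, `merge_logs`) are made up for the example, and it assumes Apache-style lines with a bracketed timestamp like `[10/Oct/2005:13:55:36 -0700]`. It also assumes lines without a parseable timestamp can simply be skipped.

```ruby
require 'time'

# Hypothetical helper: pull the bracketed timestamp out of a log line.
# Assumes Apache common/combined format; returns nil if nothing parses.
def log_time(line)
  stamp = line[/\[([^\]]+)\]/, 1]
  stamp && Time.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
rescue ArgumentError
  nil
end

# Read the next line that has a usable timestamp, as [time, line],
# or nil once the input is exhausted.
def next_entry(io)
  while (line = io.gets)
    time = log_time(line)
    return [time, line.chomp] if time
  end
  nil
end

# Merge already-sorted inputs: keep one pending entry per input,
# repeatedly emit the one with the smallest time, then refill that slot.
def merge_logs(inputs, out = $stdout)
  heads = inputs.map { |io| next_entry(io) }
  until heads.compact.empty?
    live = heads.each_index.reject { |i| heads[i].nil? }
    idx = live.min_by { |i| heads[i][0] }
    out.puts heads[idx][1]
    heads[idx] = next_entry(inputs[idx])
  end
end

# Usage (assumed): merge_logs(ARGV.map { |name| File.open(name) })
```

Only one line per file is held in memory at a time, which is why the inputs have to be sorted going in.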
The DNS resolver uses Ruby's resolv.rb and spawns 100 threads to do the DNS lookups. Instead of awstats taking 4-5 hours to do DNS lookups, my multithreaded resolver takes about 40 minutes.
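Here's a rough sketch of what a threaded resolver along those lines might look like. Again, this is an illustration rather than the real script: `resolve_all` and its `workers:` and `lookup:` parameters are names invented for the example. The lookup is passed in so it can be swapped out, and it defaults to `Resolv.getname`, which raises `Resolv::ResolvError` when there's no name for an address.

```ruby
require 'resolv'

# Resolve a list of IPs using a pool of worker threads.
# Returns a Hash of ip => hostname, with nil for failed lookups.
def resolve_all(ips, workers: 100, lookup: ->(ip) { Resolv.getname(ip) })
  queue = Queue.new
  ips.uniq.each { |ip| queue << ip }

  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        ip = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end

        name = begin
          lookup.call(ip)
        rescue Resolv::ResolvError
          nil # treat an unresolvable address as a miss
        end

        lock.synchronize { results[ip] = name }
      end
    end
  end

  threads.each(&:join)
  results
end
```

The threads spend nearly all their time waiting on the network, so running a lot of them at once is where the speedup comes from.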
The problem with DNS lookups isn't really that it's slow to look up names, it's that it's slow to look up names that don't exist. Instead of caching hits, I should cache the misses; that way I can avoid looking up failed names over and over.
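Caching misses could look something like the sketch below. `MissCache` and `name_for` are hypothetical names, and this version only remembers failures in memory for the life of the process. Keeping them around between runs (in a file, say) would be the obvious next step, but I haven't shown that here.

```ruby
require 'resolv'
require 'set'

# Remembers addresses that failed to resolve so they are only tried once.
class MissCache
  def initialize(lookup = ->(ip) { Resolv.getname(ip) })
    @lookup = lookup
    @misses = Set.new
    @lock = Mutex.new # the resolver is multithreaded, so guard the set
  end

  # Returns the hostname, or nil if the address is a known or new miss.
  def name_for(ip)
    return nil if @lock.synchronize { @misses.include?(ip) }
    @lookup.call(ip)
  rescue Resolv::ResolvError
    @lock.synchronize { @misses << ip }
    nil
  end
end
```

Successful lookups still go out to the resolver every time in this sketch; only the slow failures get short-circuited.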
Cutting down DNS resolution time would make it practical to process access logs on an hourly instead of daily basis. It's been an idea of mine to see how traffic for the site changes on a shorter-term basis than monthly, which is what most stats packages give you.