
Just what is anyway?

I noticed in my webalizer statistics that one domain has been in the top 10 websites by usage for many months. That's pretty odd, since those spots are usually held by search engines (usually Google) or proxy caches (usually AOL). Anything that isn't a search engine has to be a little suspicious. Why would an ordinary user hit our site thousands of times in a month, or transfer tens of megabytes of data?

So I fired up my favourite DNS website, only to find that it is not a website. But the domain is associated with the Telekom Malaysia ISP. They even have their own Wikipedia entry, and it's hardly encouraging. My guess is that I'm simply the target of some web robot running on their network. I'd like to ban their IP, but webalizer only shows the domain name, and the ISP has messed up their DNS so that their IPs don't have reverse entries of their own and all report as their domain. Once I've got the IP I can check my access logs to see what they're up to, and then get rid of them if I want.

I checked and Webalizer keeps a cache of DNS entries so it doesn't have to look up every IP number. Then I found out it's in BerkeleyDB format. This looks like it's going to be messy. From there I found a little Perl program to read the Webalizer cache.

That program needs the Perl BerkeleyDB module, which you can find on CPAN. Not too bad an install: download, edit one config, a few makes and it was working.

Finally I was able to run it: perl | grep Arrrg! 602 entries. Damn, way more than I can check by hand. OK, now I load up an Access database with my December web log, running into some stupid 2 GB limit in Access, reload the data slightly stripped down, plus the list from the Perl program, and join them to find 245 matching entries. Still too many. OK, now I summarize bytes transferred by matching IP and find a couple of IPs that are the top offenders.
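For what it's worth, the join-and-summarize step probably doesn't need a database at all. Here is a minimal sketch in Python, assuming the access log is in the usual Apache common/combined format and that the IPs pulled from the cache are already sitting in a set. The function name, the sample IPs and the sample log lines are all made up for illustration; they are not from my real log.

```python
from collections import Counter

def bytes_by_ip(log_lines, suspect_ips):
    """Sum bytes transferred per IP, keeping only IPs in suspect_ips."""
    totals = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # not a well-formed log line
        ip = parts[0].split(" ", 1)[0]
        if ip not in suspect_ips:
            continue  # this is the "join" against the cache list
        # After the quoted request come the status code and the byte count.
        fields = parts[2].split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # "-" means no response body was sent
        totals[ip] += int(fields[1])
    # Largest transfer first, i.e. the top offenders.
    return totals.most_common()

# Hypothetical sample data (documentation-range IPs).
sample_log = [
    '192.0.2.10 - - [05/Dec/2006:10:00:00 -0500] "GET /a.html HTTP/1.1" 200 1200',
    '192.0.2.10 - - [05/Dec/2006:10:00:05 -0500] "GET /b.html HTTP/1.1" 200 800',
    '192.0.2.77 - - [06/Dec/2006:11:00:00 -0500] "GET /a.html HTTP/1.1" 304 -',
    '198.51.100.3 - - [06/Dec/2006:12:00:00 -0500] "GET /c.html HTTP/1.1" 200 5000',
]
suspects = {"192.0.2.10", "192.0.2.77"}
top = bytes_by_ip(sample_log, suspects)
print(top)
```

Reading the log line by line like this also sidesteps any file size limit, since nothing but the per-IP totals is kept in memory.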

But wait, those IPs have only transferred about 1.5 MB each. That's not so much. OK, list all web log entries for those IPs. Each IP only accessed our site on one day, and they each checked out a lot of pages, but not the whole site. Mea culpa. There's nothing wrong with them. We just have a few hundred people in Malaysia who are accessing our site, and their ISP doesn't have a nice DNS setup.
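The "one day, lots of pages" check can be sketched the same way. This is only an illustration, again assuming Apache common/combined log lines; the function name and the sample data are hypothetical.

```python
from collections import defaultdict

def activity_by_ip(log_lines, ips):
    """Map each IP in ips to (set of days seen, set of paths requested)."""
    seen = defaultdict(lambda: (set(), set()))
    for line in log_lines:
        ip = line.split(" ", 1)[0]
        if ip not in ips:
            continue
        try:
            # Timestamp looks like [05/Dec/2006:10:00:00 -0500]; keep the date part.
            day = line.split("[", 1)[1].split(":", 1)[0]
            # Request looks like "GET /a.html HTTP/1.1"; keep the path.
            path = line.split('"')[1].split()[1]
        except IndexError:
            continue  # skip lines that don't match the expected layout
        days, paths = seen[ip]
        days.add(day)
        paths.add(path)
    return dict(seen)

# Hypothetical sample data (documentation-range IP).
sample_log = [
    '192.0.2.10 - - [05/Dec/2006:10:00:00 -0500] "GET /a.html HTTP/1.1" 200 1200',
    '192.0.2.10 - - [05/Dec/2006:10:00:05 -0500] "GET /b.html HTTP/1.1" 200 800',
    '192.0.2.10 - - [05/Dec/2006:10:01:00 -0500] "GET /a.html HTTP/1.1" 200 1200',
]
activity = activity_by_ip(sample_log, {"192.0.2.10"})
for ip, (days, paths) in activity.items():
    print(ip, len(days), "day(s),", len(paths), "distinct page(s)")
```

An IP that shows up on a single day with a modest number of distinct pages looks like a person browsing; a robot would more likely appear day after day or walk the whole site.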

But it's not been a total waste of time. I can now check out the webalizer cache any time I want and I've got more top 10 users to check out.

© 2016 Mike Silversides


This page contains a single entry from the blog posted on January 2, 2007 3:30 PM.
