Draconis Software Blog

How to Deal with Log Overload

Even a relatively inactive server can produce a large volume of logs, and a heavily used machine serving multiple functions will probably generate so much log activity that spotting anomalies, resource problems, or potential attacks becomes nearly impossible. Luckily, this data can be made manageable without too much trouble.

Consolidate logs across the network

If you have more than one machine to keep track of (and what sysadmin doesn’t?), the log problem only multiplies. One thing that can help in this situation is remote logging. For example, syslog makes it easy to send logs over the network to a single destination. In large networks this may be a dedicated logging machine. In any case, this reduces the complexity of the situation somewhat, although it does not help with the sheer amount of information produced.
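Applications can also send their own messages straight to the central machine. Here is a minimal sketch using Python's standard `logging.handlers.SysLogHandler`, which sends over UDP by default. The function name `make_remote_logger`, the logger name, and the hostname in the comment are placeholders of mine, and the central machine's syslog daemon must be configured to accept remote messages.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="myapp"):
    """Return a logger that forwards records to a syslog daemon over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_DAEMON,
    )
    # syslog conventionally expects "tag: message" after the priority field.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# Example (hypothetical hostname, not from the original post):
# log = make_remote_logger("loghost.example.com")
# log.warning("disk almost full on /var")
```

UDP delivery is fire-and-forget, so messages can be lost if the network or the logging machine is down.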

I’ll usually leave a window open with a connection to the central logging machine, showing a running output of the logs (this can be done with `tail -f`). This helps me catch some issues as they occur, but it’s easy to miss entries if I’m not at the computer or not paying attention to the running log.
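If you want to do something with the running log beyond watching it, the same idea is easy to sketch in Python. This is a rough imitation of `tail -f`, not a replacement for it. The function name `follow` and its parameters are my own, and it does not handle log rotation or lines that are only partially written when it reads them.

```python
import os
import time


def follow(path, poll_interval=1.0, max_idle_polls=None, from_start=False):
    """Yield lines appended to `path`, roughly like `tail -f`.

    max_idle_polls: stop after this many consecutive empty reads
                    (None means keep following forever).
    from_start:     if True, yield the existing content first.
    """
    with open(path, "r", errors="replace") as f:
        if not from_start:
            f.seek(0, os.SEEK_END)  # skip what is already in the file
        idle = 0
        while True:
            line = f.readline()
            if line:
                idle = 0
                yield line.rstrip("\n")
            else:
                idle += 1
                if max_idle_polls is not None and idle >= max_idle_polls:
                    return
                time.sleep(poll_interval)


# Example (the path is an assumption; it varies by system):
# for line in follow("/var/log/messages"):
#     if "error" in line.lower():
#         print(line)
```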

Log parsing with epylog

A useful tool to help with this is a log parser. This is a program that periodically analyzes activity since the last time it ran, and emails a summary to the administrator. In my mind the perfect log parser would primarily show me anything unusual happening on my machines, and then secondarily would provide summaries of normal functions like SSH, mail, web and database usage.
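To make the idea concrete, here is a toy version of such a parser. It is only a sketch of the general technique and not how epylog itself works. It remembers how far it read last time, matches new lines against a couple of known patterns, and lists anything unmatched first. The patterns, file layout, and function names are all assumptions for illustration.

```python
import json
import os
import re
from collections import Counter

# Illustrative patterns for common OpenSSH messages; real logs may differ.
PATTERNS = {
    "ssh_failed": re.compile(
        r"sshd\[\d+\]: Failed password for (?:invalid user )?(\S+)"),
    "ssh_accepted": re.compile(r"sshd\[\d+\]: Accepted \S+ for (\S+)"),
}


def read_new_lines(log_path, state_path):
    """Return lines added to log_path since the previous call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f).get("offset", 0)
    if offset > os.path.getsize(log_path):
        offset = 0  # file shrank, so it was probably rotated or truncated
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    with open(state_path, "w") as f:
        json.dump({"offset": new_offset}, f)
    return data.decode("utf-8", errors="replace").splitlines()


def summarize(lines):
    """Count known events per user and collect lines nothing matched."""
    counts = Counter()
    unmatched = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name, match.group(1))] += 1
                break
        else:
            unmatched.append(line)
    return counts, unmatched


def format_report(counts, unmatched):
    """Put the unusual lines first, then the routine summaries."""
    out = ["Unmatched lines (%d):" % len(unmatched)]
    out += ["  " + line for line in unmatched]
    out.append("Known events:")
    for (event, user), n in sorted(counts.items()):
        out.append("  %s %s: %d" % (event, user, n))
    return "\n".join(out)
```

Run periodically, `format_report(*summarize(read_new_lines(log, state)))` would produce the body of the summary email. Sending the mail is left out here.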

I haven’t found this perfect log parser yet, but I’ve been very happy with epylog. Epylog is a log parser for syslog written in Python. Syslog is usually the default for most application logging (or else can easily be turned on), although certain logs, such as Apache logs, are usually not appropriate for syslog.

In any case, I find epylog easy to set up and configure. For many users, simply installing it and adding a cron entry would be sufficient. One thing I like about the emails it sends is that they tend to put the more important information first. For example, the first section in the email is Logins, given in the order of Root Failures, Root Logins, User Failures, and User Logins. Following this are reports that are only applicable to certain types of servers, such as firewall violations and mail usage. The bottom of the email contains strings that didn’t match any filters. This is often a good place to look for anything unusual.

Some tips for using epylog

Here are a few tips that I’ve discovered while using epylog. First, one oddity is that epylog likes to use /usr/etc and /usr/var rather than just /etc and /var. This isn’t usually a problem, although you may need to create /usr/var/run so that epylog has somewhere to put its pidfile.

The first change you’ll likely want to make is to add filters for log entries you don’t care about. Often other programs you install will produce logs that epylog doesn’t recognize, and that you don’t want to see in the email. Adding these filters reduces the number of lines in the Unparsed Strings section. To do this, edit /usr/etc/epylog/weed_local.cf. Here you can place regular expressions for lines to ignore, or for lines that you don’t want epylog to ignore.
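Before adding a regular expression to the file, it can help to check it against a few real lines from the Unparsed Strings section. The helper below is a sketch of mine, not part of epylog. It uses Python's `re.search`, on the assumption that epylog (being written in Python) treats patterns similarly, so verify the result against an actual epylog run. The example pattern and log lines are made up.

```python
import re


def unmatched_lines(patterns, lines):
    """Return the lines that none of the given regular expressions match."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines
            if not any(c.search(line) for c in compiled)]


# Hypothetical example: try a candidate "ignore" pattern on sample lines.
candidates = [r"ntpd\[\d+\]: synchronized to "]
samples = [
    "Jan  1 00:00:01 host ntpd[321]: synchronized to 10.0.0.5, stratum 2",
    "Jan  1 00:00:02 host kernel: eth0: link down",
]
for line in unmatched_lines(candidates, samples):
    print("still unparsed:", line)
```

Anything still printed would keep showing up in the email. A pattern that silences more lines than you expected is probably too broad.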

Modules are configured in /usr/etc/epylog/modules.d. Note that some are disabled by default. Also, you may need to edit the modules’ config files in order to turn on particular checks. For example, it took me a while to discover that for epylog to analyze the logs from courier-imap, I needed to enable the login module, then specifically enable courier checks in login.conf.

My final piece of advice is not epylog-specific. When parsing your logs, it may be tempting to run frequent checks, such as every hour. As time goes by, however, you may find that these emails get tiresome, and you start marking them read without actually looking them over, which means you may miss important information. If that happens, I find it more useful to reduce the frequency of the log parsing. In my case, 4 emails per day (one every 6 hours) works best. While it may take you longer to discover a problem, reducing the number of emails might help you read them more carefully, so that you don’t miss an issue altogether.
