I am working on setting up proper handling of logs from many hosts.
I want unexpected log entries from any host to be noticed immediately (via an alert in Nagios), and I want the alert to go away as soon as the log entry has been handled as needed (added to the ignore filter, issue fixed, task assigned for later fixing, etc.).
As far as I can gather, most people use the typical Nagios check_logs and similar methods, which all alert on a given log entry for only a certain period of time and then just go green again. If you actually fix the problem while the alert is active, you have to disable the logfile alert until it goes green again, and in that interval any new log entries won't be noticed. Suboptimal at best :(
I've set up syslog-ng as a central syslog host, and wanted something to check the logs for unexpected entries using the common approach: alert on everything except what I define in my ignore filter.
As it turns out, I couldn't find anything that properly does this.
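To make the "alert on everything except the ignore filter" idea concrete, here is a minimal sketch in Python. It is only an illustration of the principle, not how swatch or any Nagios plugin implements it, and the patterns and sample log lines are made up for the example.

```python
import re

# Hypothetical ignore filter: regexes for log entries that are known and harmless.
# A real list would grow over time as entries get reviewed.
IGNORE_PATTERNS = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for "),
    re.compile(r"CRON\[\d+\]: \(root\) CMD "),
]


def unexpected_entries(lines, ignore_patterns=IGNORE_PATTERNS):
    """Return every line that matches none of the ignore patterns."""
    return [
        line
        for line in lines
        if not any(pattern.search(line) for pattern in ignore_patterns)
    ]


# Made-up sample entries: two are covered by the filter, one is not.
sample = [
    "Jan  1 00:00:01 web1 CRON[123]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jan  1 00:00:02 web1 sshd[456]: Accepted publickey for admin from 10.0.0.5",
    "Jan  1 00:00:03 web1 kernel: Out of memory: Kill process 789",
]

for entry in unexpected_entries(sample):
    print("ALERT:", entry)
```

The point of the design is that the default is to alert: anything nobody has looked at yet gets reported, and an entry only goes quiet once someone has deliberately added a pattern for it.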
The script I found that came closest to doing the job well was swatch, but for some odd reason it can only do one of two things: scan an entire file on each run, or run as a daemon tailing a file. Neither is satisfactory, IMHO.
A friend (Casper) and I added a small addition to swatch so that it saves its position ($Fh->getpos) when it has parsed a logfile, and continues from that position if it is asked to scan the same file again. This works beautifully with syslog-ng set up to name the logfiles $hostname/$facility.$year$month$day - so I never have to rotate logfiles :)
You can download the modified swatch here if you want: http://vsen.dk/files/swatchpos
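For anyone who just wants the idea without reading the patch, the position-saving trick can be sketched roughly like this in Python. This is not the actual swatch modification (that is Perl and uses $Fh->getpos); the function name, the JSON state file and the truncation check are my own assumptions for the illustration.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines added to log_path since the previous call.

    Byte offsets are remembered in a small JSON state file (an assumption
    of this sketch), keyed by the log file's path.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as state_file:
            state = json.load(state_file)

    offset = state.get(log_path, 0)
    # A file shorter than the saved offset was truncated or replaced: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    # Binary mode keeps the offset in plain bytes.
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Only consume up to the last newline, so a half-written final line
    # is left for the next run instead of being split in two.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    state[log_path] = offset + end
    with open(state_path, "w") as state_file:
        json.dump(state, state_file)
    return lines
```

Called repeatedly on the same file, each call returns only what was appended since the previous one. With logfiles named per day, a new day's file has no saved offset yet and is simply read from the beginning.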
With this modification, I can run swatch from cron as often as I want. In this setup, I am going to make it email a certain queue in Request Tracker (RT), and have Nagios alert on unhandled issues in that queue.
This way, when an issue is closed, RT also keeps a record of what was done about it and who closed it.