logwatcher: restart Apache after a segmentation fault

In a previous article, I stressed a common problem with all PHP op-code caches/accelerators: they die with segmentation faults every once in a while.

To get around this problem, here is a script that restarts Apache when a segmentation fault is detected.

This script was written by Firebright Inc., with a few modifications, such as sending an email notice.

Here is the logwatcher.php script:

<?php
// path to apache log file
//
define("DEFAULT_APACHE_LOG_PATH", "/var/log/apache2/error.log");

// command to use to restart apache
//
define("DEFAULT_APACHE_RESTART_COMMAND", "/etc/init.d/apache2 restart");

// defines the polling interval (in seconds)
//
define("DEFAULT_POLLING_INTERVAL", 45);

// defines the format for dates written to log entries (RFC 2822 format date)
//
define("DATE_FORMAT", "[r]");

// where to log watcher status
//
define("LOG_OUTPUT_FILENAME", "/var/log/logwatcher.log");

// conditions to test for (action is top level array element key)
//
$array_action_checks = Array();
$array_action_checks['restart'] = Array('exit signal Segmentation fault');

// list of commands mapped to actions
//
$array_action_commands = Array('restart' => DEFAULT_APACHE_RESTART_COMMAND);

/************************************************************
* END CONFIGURATION, BEGIN IMPLEMENTATION *
************************************************************/

$last_position = 0;
// main loop
//

if ($argc != 2) {
    log_message("Called with incorrect number of arguments");
    echo "Usage: php logwatcher.php youremail@example.com\n";
    exit(1);
}
else {
    $email = $argv[1];
}


log_message("logwatcher started");

while (true) {
    $last_position = check_file($last_position);
    sleep(DEFAULT_POLLING_INTERVAL);
}

function check_file($last_position) {
    $file_name = DEFAULT_APACHE_LOG_PATH;
    $fp = @fopen($file_name, "r");
    if ($fp == null) {
        die("unable to open file at $file_name\n");
    }
    if ($last_position == 0) {
        // first time through the file for this instance.. Skip to EOF
        //
        fseek($fp, 0, SEEK_END);
    } else {
        // seek to last known position to skip past already handled log entries
        //
        fseek($fp, $last_position, SEEK_SET);
    }

    // check for patterns on current line
    //
    $action_taken = false;
    while (($line = fgets($fp, 4096)) != null) {
        $action = check_line($line);
        if ($action != "") {
            // TODO: log that action is taken
            //
            // take action only once for a given seek, otherwise seek silently to EOF
            //
            if (!$action_taken) {
                log_message("Apache APC/eAccelerator caused a segmentation fault.");
                log_message("Executing: " . get_action_command($action));
                system(get_action_command($action));
                log_message("Executed: " . get_action_command($action));
                email_notify();
                log_message("Email notification sent");
                $action_taken = true;
            }
        }
    }
    // record end of file position for next pass through
    //
    $last_position = ftell($fp);

    // close the file pointer
    //
    fclose($fp);
    return $last_position;
}

function log_message($message) {
    error_log(date(DATE_FORMAT) . " " . $message . "\n", 3, LOG_OUTPUT_FILENAME);
}

function check_line($line) {
    global $array_action_checks;
    // walk through each action
    //
    foreach ($array_action_checks as $action => $array_checks) {
        foreach ($array_checks as $check) {
            // walk through each check and see if it matches the current line
            //
            if (preg_match("/" . $check . "/", $line)) {
                return $action;
            }
        }
    }
    return "";
}

function get_action_command($action) {
    global $array_action_commands;
    $command = @$array_action_commands[$action];
    if ($command == null) {
        log_message("Could not retrieve command for action: $action");
        return "";
    }
    return $command;
}


function email_notify() {
    // NOTE: $email is not in scope inside this function, so mail() gets an
    // empty recipient; see "A small bug" in the comments for the fix.
    $body = "The server has encountered an APC/eAccelerator segmentation fault error.
Apache has been automatically restarted.
The log file " . LOG_OUTPUT_FILENAME . " should have the exact time and number
that this happened.";

    mail($email, 'Apache has been restarted', $body);
}

And here is the logwatcher.sh shell script that is used to start it. Change the email addresses to fit your needs.

#!/bin/sh

BASE_DIR=/root/bin
SCRIPT=$BASE_DIR/logwatcher.php
PID_FILE=/var/run/logwatcher.pid
EMAIL=someone@example.com,someoneelse@example.com

# If there is an old process, kill it
if [ -r "$PID_FILE" ]; then
    kill `cat $PID_FILE`
fi
# Make sure the file is clean
rm -f $PID_FILE

cd $BASE_DIR
nohup php $SCRIPT $EMAIL> /dev/null &
PID=$!

echo $PID > $PID_FILE

Now, all you need to do is edit your /etc/rc.local, and add a line to call the logwatcher.sh script upon booting.


Comments

semaphores

Good article. Often however the semaphores are locked. The only way to free them is by doing something like

ipcs -s | grep apache | perl -e 'while (<STDIN>) { @a=split(/\s+/); print `ipcrm sem $a[1]`}'

A restart resets the sems

Shouldn't an Apache restart reset the semaphores altogether?

Also, in your script, the grep apache part is distro-dependent. On Ubuntu, the user is www-data, not apache.
--
2bits -- Drupal consulting
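
Because the owner name differs between distributions, the semaphore cleanup discussed above can take the user as a parameter. The following is only a sketch: the helper name sem_ids_for_owner is hypothetical, and it assumes the Linux `ipcs -s` column layout (key, semid, owner, perms, nsems).

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of semaphore arrays owned by one user.
# Reads `ipcs -s` output on stdin and assumes the Linux column layout
# "key semid owner perms nsems"; header lines are skipped because their
# second column is not numeric.
sem_ids_for_owner() {
    awk -v owner="$1" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Usage (left commented out because it removes semaphores; xargs -r is a
# GNU extension that skips the command when there is no input):
#   ipcs -s | sem_ids_for_owner www-data | xargs -r -n 1 ipcrm -s
```

On Ubuntu or Debian the argument would be www-data; on distributions where Apache runs as apache, pass that instead.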

logwatcher not restarting after a few days

Hi there,

I appreciate this fine script. It's working fine, but on some of my web servers the script suddenly stops restarting Apache. The script is still running, but it no longer restarts Apache. If I kill the script and launch it again, Apache is restarted as it should be. What could be wrong? How do I debug what the script is doing (or not doing)?

Kind regards,

Daniel

Great article

Thank you for notifying me about logwatcher.php. I had created a slightly different solution using bash scripting to figure out if seg faults were occurring due to APC. As an additional solution for users of APC, I found that clearing the cache also works, so I created a script that basically calls apc_clear_cache(), which generally resulted in minimal downtime.

I'm considering taking advantage of the way logwatcher works with what can be done via apc to get the best of both worlds (my script to figure out the errors could be better ^_^)

Downtime is inevitable

The logwatcher works the way it is with any op-code cache (APC and eAccelerator at least).

I am thinking of modifying it so that it detects the op-code cache type and calls the matching cache clear function (like you do with APC).

For example:

$array_function_list = array(
    'eaccelerator_clear' => 'eAccelerator',
    'apc_clear_cache' => 'APC',
);

foreach ($array_function_list as $function => $description) {
    if (function_exists($function)) {
        print("$description caused a segmentation fault.");
        $function();
        print("Called function $function to reset.");
    }
}

However, the downtime is inescapable, unless we keep reading the logs every second or 5 seconds, which is excessive.

You may want to share the code here, or a link to it when it is done.

-- 2bits -- Drupal consulting

You are absolutely right.

You have a perfectly valid point that the downtime is inescapable for any of the sites on the server that may take advantage of the op-code cache. The main advantage I saw to clearing the cache was that you didn't necessarily have to bring the server down for a restart (thus any other sites that don't rely on the op-code cache would not be affected).

Also you have a valid point when you say the logwatcher doesn't have to depend on anything else based on the way it is.

Regardless, once I have what I need working with the logwatcher, I'll post it up for you :D

A small bug

logwatcher.php doesn't actually send out any emails because $email is only a locally scoped variable in email_notify(). We need to either pass in $email as a parameter or, more easily, declare $email as a global in email_notify().

Other than that, the script works well. Thanks!

Please can you show how you

Please can you show how you solved this email-problem?
Also, when a segmentation fault happens, Apache is not restarted at all, although logwatcher.log says it was.

regards

Andreas

Solving email not sending

I believe that, to fix the email not being sent, you need to declare $email as a global in the email_notify function of logwatcher.php.

Thus the full (corrected) function would look like this:

function email_notify() {
    global $email;
    $body = "The server has encountered an APC/eAccelerator segmentation fault error.
Apache has been automatically restarted.
The log file " . LOG_OUTPUT_FILENAME . " should have the exact time and number
that this happened.";

    mail($email, 'Apache has been restarted', $body);
}
?>

debian init script

Thanks for this article, Khalid- I've finally had occasion to put logwatcher.php into action, and found your notes very helpful to get everything working.

As mentioned, I did need to make a few tweaks to the php to get email notifications working (went with the local argument passing method, rather than globalizing $email), and an additional regex in the $array_action_checks[] array to suit the particular APC segfault error we were seeing in the logs.

However, the main thing I added that I thought might be of value was a Debian-style init.d script, based on the standard 'skeleton' and stealing all the functionality from your logwatcher.sh, above. Here's the code:


#!/bin/sh

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="logwatcher"
NAME=logwatcher
DAEMON=/root/bin/logwatcher.php
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -r $DAEMON || exit 0

BASE_DIR=/root/bin
SCRIPT=$BASE_DIR/logwatcher.php
PID_FILE=/var/run/logwatcher.pid
EMAIL=email@example.com

case "$1" in
start)
    echo -n "Starting $DESC: $NAME"
    if [ -r $PID_FILE ]
    then
        kill `cat $PID_FILE`
        rm -f $PID_FILE
    fi
    cd $BASE_DIR
    nohup php $SCRIPT $EMAIL > /dev/null &
    PID=$!
    echo $PID > $PID_FILE
    echo "."
    ;;
stop)
    echo -n "Stopping $DESC: $NAME"
    if [ -r $PID_FILE ]
    then
        kill `cat $PID_FILE`
        rm -f $PID_FILE
    fi
    echo "."
    ;;
restart|force-reload)
    echo -n "Restarting $DESC: $NAME"
    if [ -r $PID_FILE ]
    then
        kill `cat $PID_FILE`
        rm -f $PID_FILE
    fi
    cd $BASE_DIR
    nohup php $SCRIPT $EMAIL > /dev/null &
    PID=$!
    echo $PID > $PID_FILE
    echo "."
    ;;
*)
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 1
    ;;
esac

exit 0

Drop this into /etc/init.d/logwatcher, chmod +x, and run 'update-rc.d logwatcher defaults' to enable this on your next boot. Then run '/etc/init.d/logwatcher start' to start it up without rebooting :)

Great script Just one

Great script

Just one question. If I want to check for other strings, is this the correct way to add to the array?

$array_action_checks['restart'] = Array('exit signal Segmentation fault','still did not exit','Out of memory: kill process','php invoked oom-killer');

Testing

Is there any way to simulate a seg fault or otherwise test that this script is working properly? I believe I have everything in place but don't know how to be sure.

Thanks for the script. This will be really useful for our site.

Try this

One way to simulate it is to call the url: "www.example.com/exit signal Segmentation fault"

That works for us.

Best
/Johs.

I am replying to an old post

I am replying to an old post, but this shows how easy it is to force the Apache server to restart with this script. It can be used as a DoS attack by repeatedly sending the crafted URL request, "www.example.com/exit signal Segmentation fault", to the server.

This script should not be used unless there is a way to rectify this security hole.
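
One possible way to narrow this hole is to match only lines that Apache itself writes at notice level, anchored at the start of the line, so that request text echoed into an [error] entry cannot trigger a restart. The sketch below is an assumption-laden illustration, not part of the original script: the helper name is_real_segfault is hypothetical, and the pattern assumes the default Apache 2.2 and 2.4 error log layouts, so it needs adjusting if a custom ErrorLogFormat is in use.

```shell
#!/bin/sh
# Assumed log layouts (adjust for a custom ErrorLogFormat):
#   2.2: [timestamp] [notice] child pid N exit signal Segmentation fault (11)
#   2.4: [timestamp] [core:notice] [pid N] AHnnnnn: child pid N exit signal ...
SEGV_REGEX='^\[[^]]*] \[(core:)?notice] (\[pid [^]]*] )?(AH[0-9]+: )?child pid [0-9]+ exit signal Segmentation fault'

# Hypothetical helper: succeeds only when the line starts with a timestamp
# followed by a notice-level "child pid ... exit signal" message.
is_real_segfault() {
    printf '%s\n' "$1" | grep -Eq "$SEGV_REGEX"
}
```

The same anchored pattern could be used as the entry in $array_action_checks, since check_line() passes each entry to preg_match().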

The best way to simulate

The best way to simulate this script would be to fetch one of apache's child PIDs and perform:

kill -s SIGSEGV [PID]

Logrotation

Thanks for a very good script! However, after we set up log rotation, the script stopped working. We figured out that this was because the file pointer pointed past the end of the file after the log had been rotated.
Therefore we changed the line:


if ($last_position == 0) {

to:


if ($last_position == 0 || ($last_position > filesize($file_name))) {

Best,
/Johs.
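
The same idea, comparing the saved offset with the current file size to detect a rotated log, can be sketched in shell. The helper name read_new_bytes and the state file are hypothetical. Unlike the PHP change above, which skips to the end of the file after a rotation, this sketch rereads the new file from the beginning. It relies on `head -c`, which GNU and BSD systems provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: print the bytes appended to log file $1 since the
# offset saved in state file $2, then save the new offset.
read_new_bytes() {
    log_file=$1
    state_file=$2
    size=$(wc -c < "$log_file")
    size=$((size + 0))              # normalise padding some wc versions add
    if [ -r "$state_file" ]; then
        offset=$(cat "$state_file")
    else
        offset=$size                # first run: skip what is already there
    fi
    if [ "$offset" -gt "$size" ]; then
        offset=0                    # file shrank: it was rotated or truncated
    fi
    # Emit exactly the bytes between the old offset and the measured size,
    # so anything appended while we run is picked up on the next call.
    tail -c "+$((offset + 1))" "$log_file" | head -c "$((size - offset))"
    echo "$size" > "$state_file"
}
```

A watcher loop would call read_new_bytes /var/log/apache2/error.log /var/run/logwatcher.offset once per polling interval and scan its output for the failure pattern; the state file path here is only an example.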

Simplified version ...

I'm a minimalist by nature so here is my version of the script:

- Our error log files can get big (500 MB), so opening the file and reading each line wasn't efficient.
- Using tail to read the last line of the log file makes this easy.
- Configure your notifications as needed
- Configure your method for restarting apache

Crontab entry:
* * * * * /usr/bin/php /path/to/the/script/log_check.php > /dev/null

<?php
$rst = exec("tail -n 1 /var/log/httpd/error_log");

if (preg_match("/exit signal Segmentation fault/", $rst) == 1)
{
    #print "APACHE NEEDED TO BE RESTARTED";
    exec("service httpd restart"); # this is what ever you use to restart apache #
    #print "APACHE RESTARTED";
    mail('your@email.com', 'SERVERNAME - Seg Fault Restart','FYI');
} else {
    #print "ALL COOL";
}

?>

Pure bash version

Why start a heavy PHP interpreter, if you can do it in bash with a few lines :-)

You could add several logs to be followed, or use a complex regex (I'm using egrep here).

---CUT---
#!/bin/bash

errLog="/var/log/apache2/error.log"
apacheRestart="/etc/init.d/apache2 restart"
failRegex="exit signal Segmentation fault"
scriptLog="/data/logs/http/`basename $0`.log"

while true; do
    echo "`date` (re)starting control loop"
    tail --follow=name --retry -n 0 "$errLog" 2>/dev/null | while read logLine; do
        if [[ `echo "$logLine" | egrep "$failRegex"` ]]; then
            echo "`date` Segfault detected, restarting apache"
            $apacheRestart
            break
        fi
    done
    sleep 5
done >> "$scriptLog" 2>&1 &
---CUT---
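
As a sketch of the "complex regex" idea mentioned above, several failure strings can be joined with | into one extended regular expression. The extra strings are the ones a reader proposed earlier in this thread, and the helper name matches_failure is hypothetical. The sketch uses grep -E, which is equivalent to egrep (recent GNU grep releases warn that egrep is obsolescent).

```shell
#!/bin/sh
# Alternation of the failure strings suggested in the comments; extend as needed.
failRegex='exit signal Segmentation fault|still did not exit|Out of memory: kill process|php invoked oom-killer'

# Hypothetical helper: succeeds when a log line matches any of the patterns.
matches_failure() {
    printf '%s\n' "$1" | grep -Eq "$failRegex"
}
```

Inside the loop above, the test would then read: if matches_failure "$logLine"; then ... fi.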

Right on, the bash script is the way to go

I set it up to send me an e-mail on failure, and it works like a charm! For CentOS use httpd; otherwise it is identical:

#!/bin/bash

errLog="/var/log/httpd/error_log"
apacheRestart="/etc/init.d/httpd restart"
failRegex="exit signal Segmentation fault"
scriptLog="/data/logs/http/`basename $0`.log"

while true; do
    echo "`date` (re)starting control loop"
    tail --follow=name --retry -n 0 "$errLog" 2>/dev/null | while read logLine; do
        if [[ `echo "$logLine" | egrep "$failRegex"` ]]; then
            echo "`date` Segfault detected, restarting apache"| mail -s "Apache Restarted Notification." somemail@someone.com
            $apacheRestart
            break
        fi
    done
    sleep 5
done

How about MON?

Great article! Was reading up about the performance comparison between APC, eAccelerator, and XCache.

How about mon? I know that it can't beat a bash/shell script with a tail command, but it can help monitor not just segfaults.

http://www.kernel.org/software/mon/faq.html

More info about it
http://www.debianhelp.co.uk/mon.htm