logwatcher: restart Apache after a segmentation fault

In a previous article, I stressed a common problem with all PHP op-code caches/accelerators: they die with segmentation faults every once in a while.

To get around this problem, here is a script that restarts Apache when a segmentation fault is detected.

The script was originally written by Firebright Inc.; this version has a few modifications, such as sending an email notice.

Here is the logwatcher.php script:

<?php
// path to apache log file
//
define("DEFAULT_APACHE_LOG_PATH", "/var/log/apache2/error.log");

// command to use to restart apache
//
define("DEFAULT_APACHE_RESTART_COMMAND", "/etc/init.d/apache2 restart");

// defines the polling interval (in seconds)
//
define("DEFAULT_POLLING_INTERVAL", 45);

// defines the format for dates written to log entries (RFC 2822 format date)
//
define("DATE_FORMAT", "[r]");

// where to log watcher status
//
define("LOG_OUTPUT_FILENAME", "/var/log/logwatcher.log");

// conditions to test for (action is top level array element key)
//
$array_action_checks = Array();
$array_action_checks['restart'] = Array('exit signal Segmentation fault');

// list of commands mapped to actions
//
$array_action_commands = Array('restart' => DEFAULT_APACHE_RESTART_COMMAND);

/************************************************************
* END CONFIGURATION, BEGIN IMPLEMENTATION *
************************************************************/

$last_position = 0;
// main loop
//

if ($argc != 2) {
log_message("Called with incorrect number of arguments");
echo "Usage: php logwatcher.php youremail@example.com\n";
exit(1);
}
else {
$email = $argv[1];
}


log_message("logwatcher started");

while (true) {
$last_position = check_file($last_position);
sleep(DEFAULT_POLLING_INTERVAL);
}

function check_file($last_position) {
$file_name = DEFAULT_APACHE_LOG_PATH;
$fp = @fopen($file_name, "r");
if ($fp == null) {
die("unable to open file at $file_name\n");
}
if ($last_position == 0) {
// first time through the file for this instance.. Skip to EOF
//
fseek($fp, 0, SEEK_END);
} else {
// seek to last known position to skip past already handled log entries
//
fseek($fp, $last_position, SEEK_SET);
}

// check for patterns on current line
//
$action_taken = false;
while (($line = fgets($fp, 4096)) != null) {
$action = check_line($line);
if ($action != "") {
// TODO: log that action is taken
//
// take action only once for a given seek, otherwise seek silently to EOF
//
if (!$action_taken) {
log_message("Apache APC/eAccelerator caused a segmentation fault.");
log_message("Executing: " . get_action_command($action));
system(get_action_command($action));
log_message("Executed: " . get_action_command($action));
email_notify();
log_message("Email notification sent");
$action_taken = true;
}
}
}
// record end of file position for next pass through
//
$last_position = ftell($fp);

// close the file pointer
//
fclose($fp);
return $last_position;
}

function log_message($message) {
error_log(date(DATE_FORMAT) . " " . $message . "\n", 3, LOG_OUTPUT_FILENAME);
}

function check_line($line) {
global $array_action_checks;
// walk through each action
//
foreach ($array_action_checks as $action => $array_checks) {
foreach ($array_checks as $check) {
// walk through each check and see if it matches the current line
//
if (preg_match("/" . $check . "/", $line)) {
return $action;
}
}
}
return "";
}
function get_action_command($action) {
global $array_action_commands;
$command = @$array_action_commands[$action];
if ($command == null) {
log_message("Could not retrieve command for action: $action");
return "";
}
return $command;
}


function email_notify() {
$body = "The server has encountered an APC/eAccelerator segmentation fault error.
Apache has been automatically restarted.
The log file " . LOG_OUTPUT_FILENAME . " should have the exact time and number
that this happened.";

mail($email, 'Apache has been restarted', $body);
}

And here is the logwatcher.sh shell script that is used to start it. Change the email addresses to fit your needs.

#!/bin/sh

BASE_DIR=/root/bin
SCRIPT=$BASE_DIR/logwatcher.php
PID_FILE=/var/run/logwatcher.pid
EMAIL=someone@example.com,someoneelse@example.com

# If there is an old process, kill it
kill `cat $PID_FILE`
# Make sure the file is clean
rm -f $PID_FILE

cd $BASE_DIR
nohup php $SCRIPT $EMAIL> /dev/null &
PID=$!

echo $PID > $PID_FILE

Now, all you need to do is edit your /etc/rc.local, and add a line to call the logwatcher.sh script upon booting.

Simplified version ...

I'm a minimalist by nature so here is my version of the script:

- Our error log files can get big (500 MB), so opening the file and reading every line wasn't efficient.
- Using tail to read only the last line of the log file avoids that.
- Configure your notifications as needed
- Configure your method for restarting apache

Crontab entry:
* * * * * /usr/bin/php /path/to/the/script/log_check.php > /dev/null

<?php
$rst = exec("tail -n 1 /var/log/httpd/error_log");

if (preg_match("/exit signal Segmentation fault/", $rst) == 1)
{
#print "APACHE NEEDED TO BE RESTARTED";
exec("service httpd restart"); # this is what ever you use to restart apache #
#print "APACHE RESTARTED";
mail('your@email.com', 'SERVERNAME - Seg Fault Restart','FYI');
} else {
#print "ALL COOL";
}

?>
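
Checking only the last line can miss a fault when another entry is logged after it within the same minute. A minimal sketch of one alternative, assuming a cron job like the one above: remember the log size in a state file and scan only the bytes added since the previous run. The function name `new_segfault`, the state file path, and the use of `head -c` (available in GNU and BSD userlands, though not required by POSIX) are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# new_segfault LOGFILE STATEFILE
# Succeeds (exit 0) only if a segfault line was appended since the last call.
new_segfault() {
    log=$1
    state=$2
    size=$(wc -c < "$log" | tr -d ' ')

    if [ -r "$state" ]; then
        last=$(cat "$state")
    else
        last=$size          # first run: skip what is already in the log
    fi

    # A smaller file than last time means the log was rotated or truncated.
    if [ "$last" -gt "$size" ]; then
        last=0
    fi

    echo "$size" > "$state"

    # Scan only the bytes between the old and the new end of file.
    tail -c "+$((last + 1))" "$log" | head -c "$((size - last))" |
        grep -q 'exit signal Segmentation fault'
}

# Example cron usage (paths and restart command are placeholders):
# new_segfault /var/log/httpd/error_log /var/run/log_check.offset && service httpd restart
```

Because only new bytes are scanned, the same fault line should not trigger a second restart on the next run.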

Logrotation

Thanks for a very good script! However, after we installed log rotation, the script stopped working. We figured out that this was because the saved file position pointed beyond the end of the file after the log had been rotated.
Therefore we changed the line:

if ($last_position == 0) {

to:

if ($last_position == 0 || ($last_position > filesize($file_name))) {

Best,
/Johs.
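
The size comparison catches a rotation only while the new file is still shorter than the saved offset. A complementary check, sketched here in shell under the assumption that the rotation renames the old file and creates a new one, is to compare inode numbers between polls (PHP's `fileinode()` returns the same number). The function name `log_inode` is made up for illustration.

```shell
#!/bin/sh
# log_inode FILE
# Prints the inode number of FILE; it changes when the file is replaced.
log_inode() {
    ls -i "$1" | awk '{ print $1 }'
}

# Usage sketch: remember the inode, then compare on the next pass.
# before=$(log_inode /var/log/apache2/error.log)
# ... later ...
# [ "$(log_inode /var/log/apache2/error.log)" = "$before" ] || echo "log was rotated"
```

If the log is truncated in place instead of being replaced, the inode stays the same, so the size check is still needed for that case.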

Testing

Is there any way to simulate a seg fault or otherwise test that this script is working properly? I believe I have everything in place but don't know how to be sure.

Thanks for the script. This will be really useful for our site.

The best way to simulate

The best way to test this script is to pick one of Apache's child PIDs and run:

kill -s SIGSEGV [PID]

Try this

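One way to simulate it is to request the URL "www.example.com/exit signal Segmentation fault". The failed request is written to the error log, path included, so the watcher's pattern matches.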

That works for us.

Best
/Johs.
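
Another check in the same spirit as the URL trick is to append a line that looks like Apache's own segfault notice to the log being watched and confirm that logwatcher reacts. The sketch below is only an illustration: the function name `fake_segfault_line` and the PID 12345 are made up, and the line layout only approximates what Apache writes.

```shell
#!/bin/sh
# Print a line shaped like Apache's segfault notice; PID 12345 is made up.
fake_segfault_line() {
    printf '[%s] [notice] child pid 12345 exit signal Segmentation fault (11)\n' \
        "$(LC_ALL=C date '+%a %b %d %H:%M:%S %Y')"
}

# Usage (as root, against the log that logwatcher.php is polling):
# fake_segfault_line >> /var/log/apache2/error.log
```

If everything is wired up, this should lead to a real Apache restart and an email within one polling interval, so try it at a quiet time.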

debian init script

Thanks for this article, Khalid- I've finally had occasion to put logwatcher.php into action, and found your notes very helpful to get everything working.

As mentioned, I did need to make a few tweaks to the php to get email notifications working (went with the local argument passing method, rather than globalizing $email), and an additional regex in the $array_action_checks[] array to suit the particular APC segfault error we were seeing in the logs.

However, the main thing I added that I thought might be of value was a Debian-style init.d script, based on the standard 'skeleton' and stealing all the functionality from your logwatcher.sh, above. Here's the code:

#!/bin/sh

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="logwatcher"
NAME=logwatcher
DAEMON=/root/bin/logwatcher.php
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -r $DAEMON || exit 0

BASE_DIR=/root/bin
SCRIPT=$BASE_DIR/logwatcher.php
PID_FILE=/var/run/logwatcher.pid
EMAIL=email@example.com

case "$1" in
  start)
        echo -n "Starting $DESC: $NAME"
        if [ -r $PID_FILE ]
        then
          kill `cat $PID_FILE`
          rm -f $PID_FILE
        fi
        cd $BASE_DIR
        nohup php $SCRIPT $EMAIL > /dev/null &
        PID=$!
        echo $PID > $PID_FILE
        echo "."
        ;;
  stop)
        echo -n "Stopping $DESC: $NAME"
        if [ -r $PID_FILE ]
        then
          kill `cat $PID_FILE`
          rm -f $PID_FILE
        fi
        echo "."
        ;;
  restart|force-reload)
        echo -n "Restarting $DESC: $NAME"
        if [ -r $PID_FILE ]
        then
          kill `cat $PID_FILE`
          rm -f $PID_FILE
        fi
        cd $BASE_DIR
        nohup php $SCRIPT $EMAIL > /dev/null &
        PID=$!
        echo $PID > $PID_FILE
        echo "."
        ;;
  *)
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac

exit 0

Drop this into /etc/init.d/logwatcher, chmod +x it, and run 'update-rc.d logwatcher defaults' to enable it on your next boot. Then run '/etc/init.d/logwatcher start' to start it up without rebooting :)
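
The init script above has no `status` action. A minimal sketch of one, assuming the same PID file convention: `kill -0` sends no signal and only tests whether a process with that PID exists. The function name `watcher_status` is hypothetical, and a recycled PID would be reported as running.

```shell
#!/bin/sh
# watcher_status PIDFILE
# Prints "running" or "stopped"; exit status is 0 only when the process is alive.
watcher_status() {
    pidfile=$1
    if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "running"
    else
        echo "stopped"
        return 1
    fi
}

# Possible extra branch for the case statement (not in the original script):
#   status)
#         watcher_status "$PID_FILE"
#         ;;
```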

A small bug

logwatcher.php doesn't actually send out any emails because $email is only a locally scoped variable in email_notify(). We need to either pass in $email as a parameter or, more easily, declare $email as a global in email_notify().

Other than that, the script works well. Thanks!

Please can you show how you

Please can you show how you solved this email-problem?
Also, when a segmentation fault happens, Apache is not restarted at all, although logwatcher.log says it was.

regards

Andreas

Solving email not sending

I believe to solve the email not sending you need to add $email as a global in the email_notify function of logwatcher.php.

Thus the full (corrected) function would look like this:

<?php
function email_notify() {
        global $email;

        $body = "The server has encountered an APC/eAccelerator segmentation fault error.
        Apache has been automatically restarted.
        The log file " . LOG_OUTPUT_FILENAME . " should have the exact time and number
        that this happened.";

        mail($email, 'Apache has been restarted', $body);
}
?>

Great article

Thank you for notifying me about logwatcher.php. I had created a slightly different solution using bash scripting to figure out if seg faults were occurring due to APC. As an additional solution for users using APC, I found that clearing the cache also works, so I created a script that basically calls apc_clear_cache(), which generally resulted in minimal downtime.

I'm considering taking advantage of the way logwatcher works with what can be done via apc to get the best of both worlds (my script to figure out the errors could be better ^_^)

Downtime is inevitable

The logwatcher works the way it is with any op-code cache (APC and eAccelerator at least).

I am thinking of modifying it so that it detects the op-code cache type and calls the matching cache-clear function (as you do with APC).

For example:

$array_function_list = array(
'eaccelerator_clear' => 'eAccelerator',
'apc_clear_cache' => 'APC',
);

foreach ($array_function_list as $function => $description) {
if (function_exists($function)) {
print("$description caused a segmentation fault.");
$function();
print("Called function $function to reset.");
}
}

However, the downtime is inescapable, unless we keep reading the logs every second or 5 seconds, which is excessive.

You may want to share the code here, or a link to it when it is done.

-- 2bits -- Drupal consulting

You are absolutely right.

You have a perfectly valid point that the downtime is inescapable for any of the sites on the server that may take advantage of the op-code cache. The main advantage I saw to clearing the cache was that you didn't necessarily have to bring the server down for a restart (thus any other sites that don't rely on the op-code cache would not be affected).

Also you have a valid point when you say the logwatcher doesn't have to depend on anything else based on the way it is.

Regardless, once I have what I need working with the logwatcher, I'll post it up for you :D

semaphores

Good article. Often however the semaphores are locked. The only way to free them is by doing something like

ipcs -s | grep apache | perl -e 'while (<STDIN>) { @a=split(/\s+/); print `ipcrm sem $a[1]`}'

A restart resets the sems

Shouldn't an Apache restart reset the semaphores altogether?

Also, in your script, the grep apache part is distro dependent. On Ubuntu, it is www-data not apache.
--
2bits -- Drupal consulting
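
To avoid hard-coding the account name, the owner can be passed in as an argument. A minimal sketch, assuming the Linux `ipcs -s` column layout (key, semid, owner, perms, nsems) and the `ipcrm -s` form from util-linux; the function name `sem_ids_for_user` is made up for illustration.

```shell
#!/bin/sh
# sem_ids_for_user USER
# Reads `ipcs -s` output on stdin and prints the semid of every array owned by USER.
sem_ids_for_user() {
    awk -v user="$1" '$3 == user { print $2 }'
}

# Usage, e.g. on Ubuntu where Apache runs as www-data:
# ipcs -s | sem_ids_for_user www-data | while read -r id; do ipcrm -s "$id"; done
```

Matching the owner column exactly, instead of grepping the whole line, also avoids removing semaphores whose key or ID merely contains the same text.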