Abstract: This article discusses the techniques and precautions involved in building a basic server-side monitoring engine in PHP, and gives a complete source code implementation.
1. The problem of changing the working directory
When you write a daemon, it is usually best to have it set its own working directory. That way, any file it reads or writes through a relative path ends up where you expect it to be. Always fully qualifying the paths used in a program is good practice in itself, but it costs some flexibility, so setting the working directory is a sensible defensive measure. The safest way to change your working directory is to use chroot() in addition to chdir().
chroot() is available in the CLI and CGI versions of PHP, but it requires the program to be run with root privileges. chroot() changes the root directory of the current process to the specified directory, so the process can no longer reach any file that lies outside that directory. Servers often use chroot() as a "security device" to ensure that malicious code cannot modify files outside a specific directory. Keep in mind that although chroot() prevents you from accessing any files outside of your new directory, any file resources that are already open can still be used. For example, the following code opens a log file, calls chroot() to switch to a data directory, and can still write to the open log file resource:
<?php
$logfile = fopen("/var/log/chroot.log", "w");
chroot("/Users/george");
fputs($logfile, "Hello From Inside The Chroot\n");
?>
If an application cannot use chroot(), then you can call chdir() to set the working directory. This is useful, for example, when the code needs to load specific code that can be located anywhere in the system. Note that chdir() provides no security mechanism to prevent unauthorized opening of files.
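A minimal sketch of the chdir() approach follows. It assumes a hypothetical working directory (the system temporary directory stands in for a real data directory) and a hypothetical file name, monitor-example.dat; neither comes from the original program. After the call, a relative path resolves inside the new working directory.

```php
<?php
// Hypothetical working directory for the daemon; the system temp directory
// stands in for a real data directory in this sketch.
$workdir = sys_get_temp_dir();

if (!chdir($workdir)) {
    fputs(STDERR, "Could not change directory to $workdir\n");
    exit(1);
}

// A relative path is now resolved against $workdir.
$relative = "monitor-example.dat";   // hypothetical file name
file_put_contents($relative, "state\n");

// Compare the directory of the file just written with the working directory.
$resolved = realpath($relative);
$in_workdir = (dirname($resolved) === realpath($workdir));

unlink($relative);                   // clean up the example file
```

Unlike chroot(), nothing here stops the script from opening an absolute path elsewhere on the system.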
2. Give up privileges
When writing Unix daemons, a classic security precaution is to have them give up every privilege they do not need; privileges that are not needed are just an invitation to trouble. If there is a vulnerability in the code (or in PHP itself), the damage can often be minimized by ensuring that the daemon runs as a user with the fewest possible privileges.
One way to accomplish this is to execute the daemon as an unprivileged user. However, this is usually not enough if the program needs to initially open resources that unprivileged users do not have permission to open (such as log files, data files, sockets, etc.).
If you are running as root, you can give up your privileges with the posix_setuid() and posix_setgid() functions. The following example changes the privileges of the currently running program to those of the user nobody. The group is changed first, because once the process has given up its root user ID it is no longer allowed to change its group:
$pw = posix_getpwnam('nobody');
posix_setgid($pw['gid']);
posix_setuid($pw['uid']);
As with chroot(), any privileged resources that were opened before the privileges were given up remain open, but new privileged resources can no longer be opened.
3. Guarantee exclusivity
You will often want to guarantee that only one instance of a script is running at any time. This is particularly important for daemons, because running them in the background makes it easy to start multiple instances by accident.
The standard technique for ensuring this exclusivity is to have the script lock a specific file (often a lock file used only for this purpose) with flock(). If the lock cannot be acquired, the script should print an error and exit. Here is an example:
$fp = fopen("/tmp/.lockfile", "a");
if(!$fp || !flock($fp, LOCK_EX | LOCK_NB)) {
    fputs(STDERR, "Failed to acquire lock\n");
    exit;
}
/* Lock acquired: it is now safe to do the work */
Locking mechanisms are a larger topic in their own right and are not discussed further here.
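The effect of the non-blocking exclusive lock can be seen in a small experiment. The sketch below uses a hypothetical lock file in the system temporary directory and opens it twice within one process; the second handle plays the role of a second instance. It assumes a platform such as Linux, where PHP's flock() attaches the lock to the individual file handle, so the second attempt is refused.

```php
<?php
// Hypothetical lock file path used only for this demonstration.
$lockfile = sys_get_temp_dir() . "/monitor-example.lock";

// First handle: plays the role of the instance that is already running.
$first = fopen($lockfile, "a");
$first_locked = flock($first, LOCK_EX | LOCK_NB);

// Second handle: plays the role of a second instance starting up.
// LOCK_NB makes flock() return false immediately instead of waiting.
$second = fopen($lockfile, "a");
$second_locked = flock($second, LOCK_EX | LOCK_NB);

// Once the first lock is released, the lock can be acquired again.
flock($first, LOCK_UN);
$after_release = flock($second, LOCK_EX | LOCK_NB);

// Clean up.
flock($second, LOCK_UN);
fclose($first);
fclose($second);
unlink($lockfile);
```

In the real daemon the lock is never released explicitly; it disappears when the process holding the file handle exits.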
4. Building the Monitoring Service
In this section, we will use PHP to write a basic monitoring engine. Because you cannot know in advance how it will need to change, you should make its implementation as flexible as possible.
The engine should support arbitrary service checks (for example, HTTP and FTP services) and be able to log events in any way (via email, to a log file, and so on). Of course you want it to run as a daemon, so you should also be able to ask it to output its complete current state.
A service check needs to extend the following abstract class:
abstract class ServiceCheck {
    const FAILURE = 0;
    const SUCCESS = 1;
    protected $timeout = 30;
    protected $next_attempt;
    protected $current_status = ServiceCheck::SUCCESS;
    protected $previous_status = ServiceCheck::SUCCESS;
    protected $frequency = 30;
    protected $description;
    protected $consecutive_failures = 0;
    protected $status_time;
    protected $failure_time;
    protected $loggers = array();

    abstract public function __construct($params);

    // Read-only accessor: $check->timeout() returns $check->timeout.
    public function __call($name, $args)
    {
        if(isset($this->$name)) {
            return $this->$name;
        }
    }

    public function set_next_attempt()
    {
        $this->next_attempt = time() + $this->frequency;
    }

    public abstract function run();

    // Update the bookkeeping after a check and notify the loggers.
    public function post_run($status)
    {
        if($status !== $this->current_status) {
            $this->previous_status = $this->current_status;
        }
        if($status === self::FAILURE) {
            if( $this->current_status === self::FAILURE ) {
                $this->consecutive_failures++;
            }
            else {
                $this->failure_time = time();
            }
        }
        else {
            $this->consecutive_failures = 0;
        }
        $this->status_time = time();
        $this->current_status = $status;
        $this->log_service_event();
    }

    public function log_current_status()
    {
        foreach($this->loggers as $logger) {
            $logger->log_current_status($this);
        }
    }

    private function log_service_event()
    {
        foreach($this->loggers as $logger) {
            $logger->log_service_event($this);
        }
    }

    public function register_logger(ServiceLogger $logger)
    {
        $this->loggers[] = $logger;
    }
}
The __call() overloaded method above provides read-only access to the parameters of a ServiceCheck object:
· timeout - How long a check may hang before the engine terminates it.
· next_attempt - The next time to attempt to connect to the server.
· current_status - The current status of the service: SUCCESS or FAILURE.
· previous_status - The status before the current status.
· frequency - How often to check the service.
· description - A description of the service.
· consecutive_failures - The number of consecutive service check failures since the last success.
· status_time - The last time the service was checked.
· failure_time - If the status is FAILURE, the time at which the failure occurred.
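The accessor idea can be tried in isolation. The sketch below uses a hypothetical stand-in class, AccessorDemo, with only two of the properties; it is not part of the engine. Calling a method named after a protected property returns that property's value, while the property itself stays inaccessible from outside.

```php
<?php
// Hypothetical stand-in for ServiceCheck with only two properties.
class AccessorDemo
{
    protected $timeout = 30;
    protected $description = "demo service";

    // Invoked for any call to an undefined method; returns the property
    // that has the same name as the method, if it is set.
    public function __call($name, $args)
    {
        if (isset($this->$name)) {
            return $this->$name;
        }
        return null;   // unknown property, or a property that is still null
    }
}

$demo = new AccessorDemo();
$timeout = $demo->timeout();          // value of the protected $timeout
$missing = $demo->no_such_field();    // no such property
```

Because isset() is false for null values, a property that has not been assigned yet (such as next_attempt before the first run) also comes back as null.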
This class also implements the Observer pattern: objects of type ServiceLogger register themselves with it and are then called whenever log_current_status() or log_service_event() is called.
The key function implemented here is run(), which is responsible for defining how the check should be performed. If the check is successful, it should return SUCCESS; otherwise it should return FAILURE.
When the service check defined in run() returns, the post_run() method is called. It is responsible for setting the state of the object and implementing logging.
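To make the state transitions easier to follow, the bookkeeping done by post_run() can be restated as a standalone function working on a plain array. This is only an illustrative sketch: the function name apply_result() and the constants CHECK_FAILURE and CHECK_SUCCESS are hypothetical, and logging is left out.

```php
<?php
const CHECK_FAILURE = 0;   // mirrors ServiceCheck::FAILURE
const CHECK_SUCCESS = 1;   // mirrors ServiceCheck::SUCCESS

// Hypothetical restatement of the post_run() bookkeeping on a plain array.
function apply_result(array $state, int $status): array
{
    if ($status !== $state['current_status']) {
        $state['previous_status'] = $state['current_status'];
    }
    if ($status === CHECK_FAILURE) {
        if ($state['current_status'] === CHECK_FAILURE) {
            $state['consecutive_failures']++;    // a repeated failure
        } else {
            $state['failure_time'] = time();     // the first failure of a run
        }
    } else {
        $state['consecutive_failures'] = 0;
    }
    $state['status_time'] = time();
    $state['current_status'] = $status;
    return $state;
}

$state = array(
    'current_status' => CHECK_SUCCESS, 'previous_status' => CHECK_SUCCESS,
    'consecutive_failures' => 0, 'status_time' => null, 'failure_time' => null,
);
$after_first  = apply_result($state, CHECK_FAILURE);
$after_second = apply_result($after_first, CHECK_FAILURE);
$recovered    = apply_result($after_second, CHECK_SUCCESS);
```

Note that the counter is only incremented from the second failure in a row onward, so after the first failure it is still 0.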
The ServiceLogger interface: a logging class needs to implement only two methods, log_service_event() and log_current_status(), which are called when a run() check returns and when a general status request is made, respectively.
The interface is as follows:
interface ServiceLogger {
    public function log_service_event(ServiceCheck $service);
    public function log_current_status(ServiceCheck $service);
}
Finally, you need to write the engine itself. The idea is similar to the one used when writing the simple program in the previous section: the server should create a new process to handle each check and use a SIGCHLD handler to detect the return value when the check is completed. The maximum number of checks that can run simultaneously should be configurable, to prevent excessive use of system resources. All services and loggers are defined in an XML file.
Here is the ServiceCheckRunner class that defines this engine:
class ServiceCheckRunner {
    private $num_children;
    private $services = array();
    private $children = array();

    public function __construct($conf, $num_children)
    {
        $loggers = array();
        $this->num_children = $num_children;
        $conf = simplexml_load_file($conf);
        // Instantiate every logger named in the configuration.
        foreach($conf->loggers->logger as $logger) {
            $class = new ReflectionClass("$logger->class");
            if($class->isInstantiable()) {
                $loggers["$logger->id"] = $class->newInstance();
            }
            else {
                fputs(STDERR, "{$logger->class} cannot be instantiated.\n");
                exit;
            }
        }
        // Instantiate every service check and attach its loggers.
        foreach($conf->services->service as $service) {
            $class = new ReflectionClass("$service->class");
            if($class->isInstantiable()) {
                $item = $class->newInstance($service->params);
                foreach($service->loggers->logger as $logger) {
                    $item->register_logger($loggers["$logger"]);
                }
                $this->services[] = $item;
            }
            else {
                fputs(STDERR, "{$service->class} is not instantiable.\n");
                exit;
            }
        }
    }

    private function next_attempt_sort($a, $b)
    {
        if($a->next_attempt() == $b->next_attempt()) {
            return 0;
        }
        return ($a->next_attempt() < $b->next_attempt()) ? -1 : 1;
    }

    // Return the service whose next check is due soonest.
    private function next()
    {
        usort($this->services, array($this, 'next_attempt_sort'));
        return $this->services[0];
    }

    public function loop()
    {
        declare(ticks=1);
        pcntl_signal(SIGCHLD, array($this, "sig_child"));
        pcntl_signal(SIGUSR1, array($this, "sig_usr1"));
        while(1) {
            $now = time();
            if(count($this->children) < $this->num_children) {
                $service = $this->next();
                if($now < $service->next_attempt()) {
                    sleep(1);
                    continue;
                }
                $service->set_next_attempt();
                $pid = pcntl_fork();
                if($pid > 0) {
                    // Parent: remember which service this child is checking.
                    $this->children[$pid] = $service;
                }
                elseif($pid === 0) {
                    // Child: the alarm kills the check if it overruns its timeout.
                    pcntl_alarm((int) $service->timeout());
                    exit($service->run());
                }
                else {
                    fputs(STDERR, "Could not fork a check process.\n");
                }
            }
            else {
                // The pool is full: wait instead of spinning.
                sleep(1);
            }
        }
    }

    public function log_current_status()
    {
        foreach($this->services as $service) {
            $service->log_current_status();
        }
    }

    // Signal handlers are public so that they can be invoked as callbacks
    // regardless of the calling scope.
    public function sig_child($signal)
    {
        pcntl_signal(SIGCHLD, array($this, "sig_child"));
        // Keep the raw wait status separate from the check result, so that a
        // child killed by a signal (for example the alarm) counts as a FAILURE.
        while(($pid = pcntl_wait($wait_status, WNOHANG)) > 0) {
            $service = $this->children[$pid];
            unset($this->children[$pid]);
            $status = ServiceCheck::FAILURE;
            if(pcntl_wifexited($wait_status) && pcntl_wexitstatus($wait_status) == ServiceCheck::SUCCESS) {
                $status = ServiceCheck::SUCCESS;
            }
            $service->post_run($status);
        }
    }

    public function sig_usr1($signal)
    {
        pcntl_signal(SIGUSR1, array($this, "sig_usr1"));
        $this->log_current_status();
    }
}
This is an elaborate class. Its constructor reads and parses an XML file, creates all the services to be monitored, and creates the loggers that record them.
The loop() method is the main method of the class. It installs the required signal handlers and then repeatedly checks whether a new child process can be created. If the next check (ordered by next_attempt timestamp) is due, a new process is forked. Inside the child, an alarm is set to keep the check from exceeding its time limit, and then the check defined by run() is executed.
There are also two signal handlers. The SIGCHLD handler, sig_child(), collects terminated child processes and executes the post_run() method of their service. The SIGUSR1 handler, sig_usr1(), simply calls the log_current_status() method of all registered loggers, which can be used to get the current status of the entire system.
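The way a check result travels from child to parent can be seen in a reduced example. The sketch below is a simplification under stated assumptions: it requires the pcntl extension (CLI), forks a single child, and waits for it with a blocking pcntl_waitpid() call instead of a SIGCHLD handler with WNOHANG. The child reports its result through its exit code, and the parent decodes the raw wait status.

```php
<?php
// Requires the pcntl extension, which is available in the CLI version of PHP.
$pid = pcntl_fork();
if ($pid === -1) {
    fputs(STDERR, "fork failed\n");
    exit(1);
}
if ($pid === 0) {
    // Child: pretend the check succeeded and report it via the exit code.
    exit(1);   // 1 is the value of ServiceCheck::SUCCESS
}

// Parent: block until the child terminates, then decode the raw wait status.
pcntl_waitpid($pid, $wait_status);
$exited_normally = pcntl_wifexited($wait_status);
$exit_code = $exited_normally ? pcntl_wexitstatus($wait_status) : null;

// Anything other than a normal exit with code 1 counts as a failure,
// including a child that was killed by a signal such as SIGALRM.
$result = ($exit_code === 1) ? "SUCCESS" : "FAILURE";
```

The raw wait status is not the exit code itself, which is why it has to be passed through pcntl_wifexited() and pcntl_wexitstatus() before it is compared with SUCCESS or FAILURE.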
Of course, this monitoring architecture does not do anything useful on its own. First, you need a service to check. The following class checks whether you get a successful (200 OK) response from an HTTP server:
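class HTTP_ServiceCheck extends ServiceCheck {
    public $url;

    public function __construct($params)
    {
        // $params comes from SimpleXML: cast names and values to strings and
        // trim stray whitespace (for example around the URL).
        foreach($params as $k => $v) {
            $k = "$k";
            $this->$k = trim("$v");
        }
    }

    public function run()
    {
        // Opening a URL with fopen() requires allow_url_fopen to be enabled.
        if(is_resource(@fopen($this->url, "r"))) {
            return ServiceCheck::SUCCESS;
        }
        else {
            return ServiceCheck::FAILURE;
        }
    }
}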
Compared with the framework built above, this service check is extremely simple and is not described in detail here.
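Checks for other kinds of services can follow the same shape. As one possible illustration, the sketch below shows a hypothetical helper, tcp_port_check(), that could serve as the body of a run() method for a plain TCP service. The helper and its parameters are assumptions made for this example and are not part of the engine. The demonstration runs it against a throwaway local listener so that it does not depend on any external host.

```php
<?php
// Hypothetical helper, not part of the engine: returns 1 (the value of
// ServiceCheck::SUCCESS) if a TCP connection can be opened, 0 otherwise.
function tcp_port_check(string $host, int $port, float $timeout = 5.0): int
{
    $conn = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (is_resource($conn)) {
        fclose($conn);
        return 1;
    }
    return 0;
}

// Demonstration: listen on a local port chosen by the operating system.
$server = stream_socket_server("tcp://127.0.0.1:0", $errno, $errstr);
$address = stream_socket_get_name($server, false);   // "127.0.0.1:<port>"
$port = (int) substr($address, strrpos($address, ":") + 1);

$while_listening = tcp_port_check("127.0.0.1", $port, 2.0);

// After the listener is closed, the same check should fail.
fclose($server);
$after_close = tcp_port_check("127.0.0.1", $port, 2.0);
```

A successful connection only shows that something is listening on the port; a protocol-aware check would go on to exchange data with the service.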
5. Sample ServiceLogger classes
The following is a sample ServiceLogger class. When a service is down, it sends an email to the person on call:
class EmailMe_ServiceLogger implements ServiceLogger {
    public function log_service_event(ServiceCheck $service)
    {
        // The status is protected, so it is read through the accessor.
        if($service->current_status() == ServiceCheck::FAILURE) {
            $message = "Problem with {$service->description()}\r\n";
            mail( '[email protected]' , 'Service Event', $message);
            if($service->consecutive_failures() > 5) {
                mail( '[email protected]' , 'Service Event', $message);
            }
        }
    }

    public function log_current_status(ServiceCheck $service)
    {
        return;
    }
}
If the service has more than five consecutive failures, the logger also sends a message to a backup address. Note that it does not implement a meaningful log_current_status() method.
The following ServiceLogger writes to the PHP error log whenever the status of a service changes:
class ErrorLog_ServiceLogger implements ServiceLogger {
    public function log_service_event(ServiceCheck $service)
    {
        if($service->current_status() !== $service->previous_status()) {
            if($service->current_status() === ServiceCheck::FAILURE) {
                $status = 'DOWN';
            }
            else {
                $status = 'UP';
            }
            error_log("{$service->description()} changed status to $status");
        }
    }

    public function log_current_status(ServiceCheck $service)
    {
        // Derive the label here; $status is not defined in this method otherwise.
        $status = ($service->current_status() === ServiceCheck::FAILURE) ? 'DOWN' : 'UP';
        error_log("{$service->description()}: $status");
    }
}
The log_current_status() method means that if the engine process is sent a SIGUSR1 signal, it writes its complete current status to your PHP error log.
The engine uses a configuration file as follows:
<config>
  <loggers>
    <logger>
      <id>errorlog</id>
      <class>ErrorLog_ServiceLogger</class>
    </logger>
    <logger>
      <id>emailme</id>
      <class>EmailMe_ServiceLogger</class>
    </logger>
  </loggers>
  <services>
    <service>
      <class>HTTP_ServiceCheck</class>
      <params>
        <description>OmniTI HTTP Check</description>
        <url>http://www.omniti.com</url>
        <timeout>30</timeout>
        <frequency>900</frequency>
      </params>
      <loggers>
        <logger>errorlog</logger>
        <logger>emailme</logger>
      </loggers>
    </service>
    <service>
      <class>HTTP_ServiceCheck</class>
      <params>
        <description>Home Page HTTP Check</description>
        <url>http://www.schlossnagle.org/~george</url>
        <timeout>30</timeout>
        <frequency>3600</frequency>
      </params>
      <loggers>
        <logger>errorlog</logger>
      </loggers>
    </service>
  </services>
</config>
When passed this XML file, the constructor of ServiceCheckRunner instantiates a logger for each logger specified. Then it instantiates a ServiceCheck object for each service specified.
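How SimpleXML exposes such a configuration can be tried on a trimmed-down copy. The sketch below parses a shortened, hypothetical configuration from a string with simplexml_load_string() rather than from a file, and collects the logger classes and the parameters of the first service into plain arrays.

```php
<?php
// Trimmed-down, hypothetical configuration in the same shape as the file above.
$xml = <<<XML
<config>
  <loggers>
    <logger><id>errorlog</id><class>ErrorLog_ServiceLogger</class></logger>
  </loggers>
  <services>
    <service>
      <class>HTTP_ServiceCheck</class>
      <params>
        <description>Example Check</description>
        <frequency>900</frequency>
      </params>
      <loggers><logger>errorlog</logger></loggers>
    </service>
  </services>
</config>
XML;

$conf = simplexml_load_string($xml);

// Elements are SimpleXMLElement objects; cast them to get plain strings.
$logger_classes = array();
foreach ($conf->loggers->logger as $logger) {
    $logger_classes[(string) $logger->id] = (string) $logger->class;
}

// Iterating over the children of <params> yields element name => element.
$params = array();
foreach ($conf->services->service[0]->params->children() as $name => $value) {
    $params[$name] = trim((string) $value);
}
```

All values arrive as strings, so numeric settings such as the frequency need to be usable as numeric strings or cast explicitly.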
Note that the constructor uses the ReflectionClass class to introspect the service and logging classes before it attempts to instantiate them. Although this is not strictly necessary, it nicely demonstrates the use of the Reflection API that is new in PHP 5. In addition to classes, the Reflection API provides classes for introspecting almost any internal entity (class, method or function) in PHP.
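The introspection step can be illustrated on its own. The sketch below uses two hypothetical classes, DemoCheck and DemoHttpCheck, and an example URL that are not part of the engine. It shows that an abstract class is reported as not instantiable, that newInstance() forwards its arguments to the constructor, and that naming a class that does not exist raises a ReflectionException.

```php
<?php
// Hypothetical classes used only for this demonstration.
abstract class DemoCheck
{
    public $params;
    public function __construct($params) { $this->params = $params; }
}
class DemoHttpCheck extends DemoCheck {}

$abstract = new ReflectionClass("DemoCheck");
$concrete = new ReflectionClass("DemoHttpCheck");

$abstract_ok = $abstract->isInstantiable();   // abstract: cannot be instantiated
$concrete_ok = $concrete->isInstantiable();   // concrete with public constructor

// newInstance() forwards its arguments to the constructor.
$item = $concrete->newInstance(array("url" => "http://example.com/"));

// Asking for a class that does not exist throws ReflectionException.
try {
    new ReflectionClass("NoSuchCheck");
    $missing_detected = false;
} catch (ReflectionException $e) {
    $missing_detected = true;
}
```

In the engine, a misspelled class name in the configuration file would therefore surface as an exception from the ReflectionClass constructor rather than through the isInstantiable() branch.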
To use the engine you have built, you still need some wrapper code. The monitor should refuse to start if it is already running, since you do not need two messages for every event. The monitor should also accept some options, including:
Option   Description
[-f]     The location of the engine configuration file. The default is monitor.xml.
[-n]     The size of the child process pool allowed by the engine. The default is 5.
[-d]     A flag that disables the daemon functionality of the engine. This is useful when you write a debugging ServiceLogger that outputs information to stdout or stderr.
Here is the final watchdog script that parses options, ensures exclusivity and runs service checks:
require_once "Service.inc";
require_once "Console/Getopt.php";
$shortoptions = "n:f:d";
$default_opts = array('n' => 5, 'f' => 'monitor.xml');
$args = getOptions($default_opts, $shortoptions, null);
// Ensure exclusivity: only one monitor may hold the lock.
$fp = fopen("/tmp/.lockfile", "a");
if(!$fp || !flock($fp, LOCK_EX | LOCK_NB)) {
    fputs(STDERR, "Failed to acquire lock\n");
    exit;
}
// Daemonize unless -d was given.
if(empty($args['d'])) {
    if(pcntl_fork()) {
        exit;
    }
    posix_setsid();
    if(pcntl_fork()) {
        exit;
    }
}
// Record the PID of the running monitor in the lock file.
fwrite($fp, getmypid());
fflush($fp);
$engine = new ServiceCheckRunner($args['f'], $args['n']);
$engine->loop();
Note that this example uses a custom getOptions() function, whose definition is not shown here.
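Since the definition of getOptions() is not given, the following is only a guess at what such a helper might look like. It is a hypothetical stand-in that keeps the same three-argument call shape but is built on PHP's built-in getopt() function rather than on the Console_Getopt package loaded by the script. Flags given without a value, such as -d, are normalized to true so that a simple truth test works.

```php
<?php
// Hypothetical stand-in for the custom getOptions() helper, based on the
// built-in getopt() function; the original implementation is not shown.
function getOptions(array $defaults, string $short, ?array $long = null): array
{
    $parsed = getopt($short, $long === null ? array() : $long);
    if ($parsed === false) {
        $parsed = array();       // parsing failed: fall back to the defaults
    }
    foreach ($parsed as $name => $value) {
        if ($value === false) {
            $parsed[$name] = true;   // getopt() reports bare flags as false
        }
    }
    // Values from the command line override the defaults.
    return array_merge($defaults, $parsed);
}

// With no options on the command line, only the defaults are returned.
$opts = getOptions(array('n' => 5, 'f' => 'monitor.xml'), "n:f:d", null);
```

With this version, running the script with -n 10 -d would yield 'n' => '10' (a string) and 'd' => true.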
After writing an appropriate configuration file, you can start the script as follows:
> ./monitor.php -f /etc/monitor.xml
This daemonizes the script, which then continues monitoring until the machine is shut down or the script is killed.
This script is fairly complex, but there are still some areas that are easy to improve, which are left as an exercise for the reader:
· Add a SIGHUP handler that re-parses the configuration file so that you can change the configuration without restarting the server.
· Write a ServiceLogger that logs to a database, so that the stored data can be queried.
· Write a web front-end program to provide a good GUI for the entire monitoring system.
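As a starting point for the first exercise, the sketch below shows one common way to structure a SIGHUP handler: the handler only sets a flag, and the main loop acts on it. This is an assumption about how the exercise could be approached, not part of the original engine. It requires the pcntl and posix extensions, sends the signal to itself to simulate an administrator, and uses pcntl_signal_dispatch() instead of declare(ticks=1) to deliver it.

```php
<?php
// Requires the pcntl and posix extensions (CLI).
$reload_requested = false;
$reloaded = false;

// The handler only records the request; the real work happens in the loop.
pcntl_signal(SIGHUP, function ($signal) use (&$reload_requested) {
    $reload_requested = true;
});

// Simulate an administrator sending SIGHUP to the running daemon.
posix_kill(getmypid(), SIGHUP);

// Deliver any pending signals to their handlers.
pcntl_signal_dispatch();

if ($reload_requested) {
    // A real engine would re-read its configuration file here, for example
    // by rebuilding its lists of services and loggers from the XML file.
    $reloaded = true;
    $reload_requested = false;
}
```

Keeping the handler this small avoids re-parsing the configuration in the middle of whatever the main loop was doing when the signal arrived.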