The Apache mod_status module provides very detailed real-time server information but gives little insight into how that activity is spread across the virtual host configuration. I once wrote a PHP prototype that built a list of virtual host log files using glob(), scanned those files every second to collect their last-modified times, and listed the files and timestamps in descending order. It was neither pretty nor efficient, but it provided the inspiration for a better server monitor.
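As a rough illustration of that polling approach, here is a minimal Python sketch. It is not the original PHP prototype; the function names `newest_first` and `format_entries` are made up for this example.

```python
import glob
import os
from datetime import datetime


def newest_first(pattern):
    """Return (path, mtime) pairs for files matching pattern, newest first."""
    entries = []
    for path in glob.glob(pattern):
        try:
            entries.append((path, os.stat(path).st_mtime))
        except OSError:
            # The file vanished between glob() and stat(); skip it.
            continue
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries


def format_entries(entries):
    """Render one 'YYYY-MM-DD HH:MM:SS  path' line per entry."""
    return [
        f"{datetime.fromtimestamp(mtime):%Y-%m-%d %H:%M:%S}  {path}"
        for path, mtime in entries
    ]
```

Calling `newest_first()` once a second and printing the result reproduces the behaviour described above. Every pass calls stat() on every file, which is why the approach scales poorly.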
Linux provides a very efficient mechanism (inotify) for monitoring file system changes, so I set about creating a curses application to reproduce my prototype. To be generally useful, the file list specification had to be flexible enough to account for the various ways virtual host logging is set up. I parameterized and slightly generalized the glob() mechanism and added an explicit list mechanism. To hold these and other parameters, a configuration file "~/.topvhosts" was created.
The efficiency of this approach allowed me to add information extracted incrementally from each log file by scanning the records added since the file last changed. Record count is used as a proxy for "hits" and the fields from the last parsed record can be displayed to provide "almost" real-time information.
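The incremental scan can be pictured with the following Python sketch. It is an illustration only, not the application's C code: the `LogTail` class and its attributes are hypothetical names, and the handling of a shrinking file (treated as log rotation) is an assumption.

```python
import os


class LogTail:
    """Track one log file and read only the records appended since the last call."""

    def __init__(self, path, from_start=False):
        self.path = path
        self.hits = 0            # record count, used as a proxy for hits
        self.last_record = None  # most recent complete record seen
        # Start at the end of the file unless asked to count existing records.
        self.offset = 0 if from_start else os.path.getsize(path)

    def update(self):
        """Read newly appended complete lines; return how many were added."""
        if os.path.getsize(self.path) < self.offset:
            # Assumption: a shrinking file means rotation, so start over.
            self.offset = 0
        added = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            while True:
                line = handle.readline()
                if not line.endswith(b"\n"):
                    break  # end of file or a partially written record
                self.offset = handle.tell()
                self.last_record = line.decode("utf-8", "replace").rstrip("\r\n")
                added += 1
        self.hits += added
        return added
```

Only the bytes past the remembered offset are read, so the cost of an update is proportional to the new records rather than to the size of the log.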
The application is currently provided as a source tarball released under the GPL. In order to build the application the following are required:
The build uses GNU autotools so the installation follows the typical pattern of expanding the tarball, changing your directory to the main distribution directory, then executing "./configure", "make", and "make install". You must configure your application before use - see the next section.
The configuration file is named '.topvhosts' and is stored in the home directory. A prototype of this file is found in the main distribution directory. This file should be customized and placed in the user's home directory (it is hoped that this step may become an additional make target in the future). The general features of this "INI" style text file are:
The currently recognized configuration settings are:
Setting | Description |
---|---|
default | The name of the configuration file section used if none is specified on the command line |
hdr_format | A sprintf like format string that describes the topmost line of the display. See details below. |
row_format | A sprintf like format string that describes the other lines of the display. See details below. |
log_format | A sprintf like format string that describes the format of the log records. If not specified, the apache default will be used. See details below. |
glob_src | A path containing '%s' as a place holder for the domain name. The place holder will be replaced by '*' and expanded by glob() to produce a file list |
glob_omit | The extension of any files to be removed from the list generated by glob() |
config_src | The name of a section in the configuration file whose assignments will be added to the file list. The left hand value is presumed to be the domain name and the right hand value is presumed to be the complete path to the log file |
file_src | The full path to an external configuration file whose topmost assignments use the same assignment format as used by config_src |
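To make the glob_src and glob_omit settings concrete, here is a small Python sketch of the expansion. The function name `expand_glob_src` is hypothetical, and taking the text matched by '*' as the domain name is an assumption about how the application derives it.

```python
import glob


def expand_glob_src(glob_src, glob_omit=None):
    """Map domain name -> log path for a glob_src pattern containing '%s'."""
    prefix, _, suffix = glob_src.partition("%s")
    # Replace the placeholder with '*' and let glob() build the file list.
    pattern = prefix + "*" + suffix
    files = {}
    for path in sorted(glob.glob(pattern)):
        if glob_omit and path.endswith(glob_omit):
            continue  # e.g. skip 'example.com.error.log'
        # Assumption: the text matched by '*' is the domain name.
        domain = path[len(prefix):len(path) - len(suffix)]
        files[domain] = path
    return files
```

With glob_src=/var/log/httpd/domains/%s.log and glob_omit=.error.log, a file named example.com.log would be listed under the domain example.com, while example.com.error.log would be dropped.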
The list of files monitored by the application is obtained by merging the *_src specifications. The sample configuration file included with the package is shown below:
    # Configuration for topvhost
    #
    #
    default=DA
    hdr_format=%02m-%02d %02H:%02M:%02S Elapsed: %t Hosts: %3n Hits: %6o Read: %#I\n|
    row_format=%02m-%02d %02H:%02M:%02S %8o %18h %v %3s %40r\n
    [DA]
    glob_omit=.error.log
    glob_src=/var/log/httpd/domains/%s.log
    log_format=%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O
    [Plesk]
    glob_src=/var/www/vhosts/%s/statistics/logs/access_log
    [Main]
    config_src=MainFiles
    [MainFiles]
    main_log=/var/log/httpd/access_log
    [Example]
    glob_omit=.error.log
    glob_src=/var/log/httpd/domains/%s.log
    log_format=%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O
    row_format=%02m-%02d %02H:%02M:%02S %8o %18h %v %3s %40r\n%40{Referer}i %40{User-Agent}i\n

The sections in the sample include:

Section | Description |
---|---|
[DA] | A sample configuration for virtual hosts managed by the Direct Admin control panel |
[Plesk] | A sample configuration for virtual hosts managed by the Plesk control panel |
[Main] | A sample configuration with log files enumerated in the [MainFiles] section |
[Example] | A sample Direct Admin section with a multi-row record layout |
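The way a section is selected and combined with the topmost settings can be sketched in Python as follows. This is a guess at the behaviour, not the application's parser: the function names are hypothetical, '#' is assumed to start a comment line, and a section's settings are assumed to override the topmost ones (as [Example] appears to do with row_format).

```python
def parse_config(text):
    """Parse INI-style text into {section: {key: value}}; '' holds topmost keys."""
    sections = {"": {}}
    current = sections[""]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' introduces a comment line
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip()
    return sections


def effective_settings(sections, selected=None):
    """Merge the topmost settings with the selected (or default) section."""
    top = sections[""]
    name = selected or top.get("default")
    merged = {k: v for k, v in top.items() if k != "default"}
    # Assumption: values in the chosen section override the topmost ones.
    merged.update(sections.get(name, {}))
    return name, merged


def config_src_files(sections, settings):
    """Return {domain: path} from the section named by config_src, if any."""
    return dict(sections.get(settings.get("config_src"), {}))
```

For the sample file, selecting Main would yield a single entry mapping main_log to /var/log/httpd/access_log.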
Log and display fields are treated as named columns and construction of the row display takes place as the transfer of log columns to row columns. It makes sense to use the same nomenclature to describe both the source and destination of this transfer. The source record is specified by apache LogFormat syntax, so the destination (the display row) is described by a similar syntax:
The token formats are summarized by the following tabulation:
Token | Value | log | row | hdr | Note |
---|---|---|---|---|---|
%B | bytes | x | w | | response excluding headers |
%{name}C | cookie | x | w | | use row entry "%<width>{name}C" |
%H | hour | | n | n | |
%I | read | | | n | Bytes read since start - use '#' for human format |
%M | minute | | n | n | |
%S | second | | n | n | |
%T | elapsed | | | n | Seconds since start |
%U | url | x | w | | |
%Y | year | | n | n | |
%b | bytes | x | w | | response excluding headers (CLF) |
%d | day | | n | n | |
%{name}e | environment | x | w | | use row entry "%<width>{name}e" |
%h | remote host | x | w | | Output right aligned and truncated/padded to width |
%{name}i | header | x | w | | use row entry "%<width>{name}i" |
%m | month | | n | n | |
%n | count (hosts) | | | n | Total number of virtual hosts |
%o | count (hits) | | n | n | Record count |
%r | request | x | w | | Output left aligned and truncated/padded to width |
%s | status | x | w | | |
%t | elapsed | | | * | "hh:mm:ss" since start |
%u | user | x | w | | auth only |
%v | domain name | | * | | Output left aligned and truncated/padded to width |
Legend: x = extracted from log record; n = numeric specification (0-+# )*[\d\.]+; w = width specification \d+; * = left aligned, fixed width.

If log_format is not provided, it defaults to '%h %l %u %t \"%r\" %>s %b'.
Currently the fixed width requirement is not enforced for those fields marked 'n' above. For those fields marked 'w' above, the format specification is converted to an integer that determines the fixed width of the field. The distributed configuration file shown above provides sample syntax.
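The transfer from log columns to fixed-width row columns might look like the Python sketch below. It handles only the default log format, hard-codes a layout resembling '%8o %18h %3s %40r', and uses hypothetical names (`CLF`, `parse_record`, `fit`, `render_row`); the application itself derives both sides from the configured format strings.

```python
import re

# Regex for the default format '%h %l %u %t "%r" %>s %b'.
CLF = re.compile(
    r'(?P<h>\S+) (?P<l>\S+) (?P<u>\S+) \[(?P<t>[^\]]+)\] '
    r'"(?P<r>(?:[^"\\]|\\.)*)" (?P<s>\d{3}) (?P<b>\S+)'
)


def parse_record(line):
    """Return the named log columns of one record, or None if it does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None


def fit(value, width, align="left"):
    """Truncate or pad value to exactly width characters."""
    value = value[:width]
    return value.rjust(width) if align == "right" else value.ljust(width)


def render_row(columns, hits):
    """Lay out hit count, host, status and request as fixed-width fields."""
    return " ".join([
        f"{hits:8d}",                    # like %8o
        fit(columns["h"], 18, "right"),  # like %18h, right aligned
        fit(columns["s"], 3),            # like %3s
        fit(columns["r"], 40),           # like %40r, left aligned
    ])
```

Because every field is truncated or padded to its width, each rendered row has the same length regardless of the record's content.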
Invocation arguments (courtesy of getopt) are:
Option | Description |
---|---|
-d | Diagnostic for output format |
-f --file | Specify an alternate configuration file |
-i --init | Initialize hit counts by reading the log files. The default is to start the hit counts at zero and only read records added after startup |
-s --select | Specify the configuration section |
-v --verify | Display the list of log files and the initial record displayed and then exit |
A sample command line for someone running the Direct Admin panel who wanted to smoke test the just built application from the build directory would be:
src/topvhost -f./.topvhost -sExample
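The command line above attaches each option's argument directly to the option letter, which getopt accepts. Python's standard getopt module follows the same conventions, so the parsing can be sketched as below. The option string "df:is:v" is an assumption based on the table (only -f and -s are taken to need an argument), and `parse_args` is a hypothetical name.

```python
import getopt


def parse_args(argv):
    """Parse the documented options into a settings dictionary."""
    opts, _ = getopt.getopt(
        argv, "df:is:v", ["file=", "init", "select=", "verify"]
    )
    settings = {"diagnostic": False, "file": None, "init": False,
                "select": None, "verify": False}
    for flag, value in opts:
        if flag == "-d":
            settings["diagnostic"] = True
        elif flag in ("-f", "--file"):
            settings["file"] = value
        elif flag in ("-i", "--init"):
            settings["init"] = True
        elif flag in ("-s", "--select"):
            settings["select"] = value
        elif flag in ("-v", "--verify"):
            settings["verify"] = True
    return settings
```

Passing `["-f./.topvhost", "-sExample"]` selects the configuration file ./.topvhost and the section Example, the same as the separated form `-f ./.topvhost -s Example`.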
The execution main loop consists of checking the keyboard for input, reading any log files that have changed, displaying the result, and suspending for 1 second. The following keys are recognized:
Key | Action |
---|---|
b B | Previous screen - also Page Up if keypad recognized |
f F | Next screen - also Page Down if keypad recognized |
h H | Order display by descending hit count |
q Q | Quit the display |
r R | Force a display refresh |
t T | Order the display by descending access time |
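The effect of these keys on the display can be modelled with a few lines of Python. This is a simplified sketch, not the curses code: each row is assumed to be a dictionary with hypothetical 'hits' and 'last_access' fields, and the display state is a plain dictionary.

```python
def order_rows(rows, mode):
    """Sort rows by descending hits ('h') or descending access time ('t')."""
    key = "hits" if mode == "h" else "last_access"
    return sorted(rows, key=lambda row: row[key], reverse=True)


def page(rows, screen, height):
    """Return the slice of rows visible on the given screen."""
    start = screen * height
    return rows[start:start + height]


def handle_key(key, state, row_count, height):
    """Apply one keypress to state; return False when the loop should stop."""
    key = key.lower()  # upper and lower case are treated alike
    last_screen = max(0, (row_count - 1) // height)
    if key == "q":
        return False
    if key in ("h", "t"):
        state["mode"] = key
    elif key == "f":
        state["screen"] = min(state["screen"] + 1, last_screen)
    elif key == "b":
        state["screen"] = max(state["screen"] - 1, 0)
    # 'r' changes nothing here: the caller redraws after every key anyway.
    return True
```

A main loop would call `handle_key()` for each pending key, then display `page(order_rows(rows, state["mode"]), state["screen"], height)` before sleeping for a second.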
The application also catches SIGINT and terminates cleanly; SIGKILL cannot be caught, so it ends the process immediately. Some minor display glitches can be fixed by resizing the terminal window.
Please use the .05 package. Users of versions prior to .04 should note there have been incompatible changes in layout syntax which require adjustment of an existing configuration file. See the distributed ChangeLog.
Obviously, this application is not suitable for servers with many busy virtual hosts. I do have some ideas on how to better deal with higher loads by better using the inotify queue but that will have to wait until I get some free time again.
gary at issiweb dot com