First published: 7th February 1997
Gathering Visitor Information:
Customising Your Logfiles
Every time a browser hits your site it leaves a trail
in your access log. This file is enough to tell you how many
hits you received and gives you some basic information about
the visitor, such as their hostname. But there is a lot more
information readily available that you could be gathering. Want
to know which browser is most common on your site, or what
languages your readers can understand? In Apache 1.2 logging
information like this is easy.
First published in Apache Week issue 51 (7th
February 1997).
Logging in Apache
Apache uses the TransferLog directive to create a single log
file for storing details of every request. However, Apache's
logging capabilities are far more advanced: it can write the
log file in any format, it can write multiple log files (each
with a different format), and it can send log messages to an
external process via a "pipe".
This feature will explain first how to customise the format
of your existing log file, then show how to create multiple
log files. Finally it will cover how logging works when you
have virtual hosts, where you can choose whether to log a
virtual host into the main log files or have separate log
files for each host.
Customising the Log File Format
The traditional format for web log files looks like this:
jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
jupiter.eu.c2.net - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706
(There are two lines here, both starting with
"jupiter.eu.c2.net". If you see more than two, the lines have
been wrapped on the screen).
This format is called the
common log format and is standard across most web servers
(although it is not very well documented). There are various
tools to analyse data in this format, and it is not too
difficult to write custom tools (in, say, perl) to extract
the data. But the lack of a common field delimiter makes such
tools more complex than necessary and prevents the use of
simple Unix programs such as cut.
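As an illustration of what such a custom tool involves, here is a minimal Python sketch that pulls the fields out of one common log format line with a regular expression. The pattern and the parse_clf name are assumptions made for this example, not part of Apache, and the pattern assumes the username fields contain no spaces.

```python
import re

# One line of common log format:
# host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>.*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf(line):
    """Return a dict of fields for one common log format line, or None."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the bytes field is treated here as zero bytes.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


sample = ('jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] '
          '"GET / HTTP/1.0" 200 4571')
record = parse_clf(sample)
print(record["host"], record["request"], record["status"], record["bytes"])
```

The need for a pattern like this, rather than a plain split on one character, is exactly the complexity that a common delimiter would avoid.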
You can customise this format. There are probably two common
reasons for doing this: firstly, to make the format simpler
by using a common delimiter character, and secondly to log
additional information such as the browser type at the end of
each line (placing it at the end means the file can still be
analysed by standard log analysis programs).
You customise the format by telling Apache a format to use.
Special character sequences are used to represent specific
information. For example, the sequence %h will be replaced
with the name of the remote host. The common log format is
defined like this:
%h %l %u %t "%r" %>s %b
Additional sequences here are %l (the remote username, if
using identd), %u (the HTTP authenticated username, if any),
%t (the time in common-log format), %r (the request), %>s (the
returned status) and %b (the number of bytes in the document
served).
Say, for example, you would prefer a file format with a
common delimiter character between each field, so that you
could use cut or write very simple perl scripts to extract
the data. Using the common log format above as a guide, you
could use
%h|%l|%u|%t|%r|%>s|%b
Here the | character is being used as a delimiter. Note that
this can cause problems if this occurs within a field (which
is possible in the %r request field).
To set this format for your log file, you use the
LogFormat directive. For example
LogFormat "%h|%l|%u|%t|%r|%>s|%b"
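To show how little code the delimited format needs, here is a rough Python sketch that splits one such line into named fields. The parse_delimited name is made up for this example. It works around the delimiter problem by assuming that only the request field can contain a | character, so it takes four fields from the left and two from the right and treats whatever remains as the request.

```python
def parse_delimited(line):
    """Split one line logged as %h|%l|%u|%t|%r|%>s|%b into named fields."""
    # Take the first four fields from the left...
    host, ident, user, time, rest = line.rstrip("\n").split("|", 4)
    # ...and the last two from the right, so that any "|" left over
    # stays inside the request field.
    request, status, size = rest.rsplit("|", 2)
    return {
        "host": host,
        "ident": ident,
        "user": user,
        "time": time,
        "request": request,
        "status": status,
        "bytes": size,
    }


# A sample line whose request happens to contain the delimiter.
line = ("jupiter.eu.c2.net|-|-|[03/Feb/1997:00:06:59 +0000]"
        "|GET /a|b HTTP/1.0|200|4571")
fields = parse_delimited(line)
print(fields["request"], fields["status"])
```

A tool such as cut cannot make this kind of adjustment, so if you rely on it, choose a delimiter that is unlikely to appear in your URLs.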
Logging Browser and User Information
The % sequences introduced so far let you log various aspects
of the request. There are some more sequences (covered below)
that log additional aspects of the request. However one of
the most important features of the custom log format is being
able to log any of the request headers supplied by the
browser. This lets you log things like the user's language
preferences, browser type and the page they just came from.
Logging a request header is done using the %{}i sequence.
You put the name of the request header between the braces.
For example, to log the browser type, you would use
%{user-agent}i
This information is typically added to the end of the common
log format in Apache 1.1.1 (in Apache 1.2, you can put it in
a separate log file, which is much more convenient. This is
explained later). To add the user-agent information to the
end of the common log format, use
LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"
If the browser does not send a user-agent, the text "-" will
be logged as the user-agent. Otherwise you will get the
browser name, such as "Mozilla/3.0Gold (Win95; I)" or
"Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former
is Netscape Gold version 3, the latter Microsoft Internet
Explorer version 3, pretending to be Netscape 2).
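Because the user-agent is appended after the common log fields, a script can separate the two parts again. The following Python sketch is one way to do it, assuming lines were written with the LogFormat shown above. The split_agent name and the pattern are illustrative only.

```python
import re

# The common log fields, followed by one space and the user-agent text.
EXTENDED_PATTERN = re.compile(
    r'^(?P<clf>\S+ \S+ \S+ \[[^\]]+\] ".*?" \d{3} \S+) (?P<agent>.*)$'
)


def split_agent(line):
    """Return (common_log_part, user_agent), or None if the line does not fit."""
    match = EXTENDED_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("clf"), match.group("agent")


sample = ('jupiter.eu.c2.net - - [03/Feb/1997:00:06:59 +0000] '
          '"GET / HTTP/1.0" 200 4571 Mozilla/3.0Gold (Win95; I)')
clf, agent = split_agent(sample)
print(agent)
```

The first element of the result is an ordinary common log format line, which can be passed on to existing analysis tools unchanged.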
In addition to %{...}i, there is a corresponding sequence
%{...}o to log any of the response headers (in these
sequences, the i means incoming and the o
outgoing headers).
Multiple Logs
Adding extra fields onto the end of the common log file
format can be inconvenient, especially if you already have
software which processes the log files in their current
format. Luckily, Apache offers a completely customisable log
file interface: you can create any number of log files each
in a different format. It is now almost trivial to add a log
file for (say) user-agents or requested languages, without
needing to compile in a new module or modify the Apache
source code. You can even log all the common log file
information into both common log format (for existing
analysers) and in a delimited format at the same time!
The interface to all this is via a single, simple directive:
CustomLog. This directive takes both a file name
to log to, and a custom format. For example, to log
user-agents to a file called agent in the logs directory,
you would use:
CustomLog logs/agent "%{user-agent}i"
Other useful log files can also be created. These next two
directives create a referrer log and a log of language
preferences of your clients:
CustomLog logs/referer "%{referer}i -> %U"
CustomLog logs/language "%{accept-language}i"
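Single-purpose log files like these are easy to summarise. As a sketch, the Python below counts how often each line occurs in an agent log and splits a referer log line into its two halves. The function names and the sample data (including the www.example.com address) are invented for illustration.

```python
from collections import Counter


def tally(lines):
    """Count how often each distinct non-empty line occurs."""
    return Counter(line.strip() for line in lines if line.strip())


def parse_referer(line):
    """Split a line logged as "%{referer}i -> %U" into (referer, path)."""
    # Split on the last " -> " in the line.
    referer, _, path = line.strip().rpartition(" -> ")
    return referer, path


# Made-up sample contents of logs/agent.
agent_lines = [
    "Mozilla/3.0Gold (Win95; I)",
    "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)",
    "Mozilla/3.0Gold (Win95; I)",
    "-",
]
counts = tally(agent_lines)
for name, hits in counts.most_common():
    print(hits, name)

# Made-up sample line from logs/referer.
pair = parse_referer("http://www.example.com/links.html -> /img/awlogo.gif")
print(pair)
```

With a real log you would pass an open file to tally instead of a list, for example the file created by the CustomLog logs/agent directive.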
Advanced Configuration Options
You can tell the format to only log particular fields if the
response status is (or is not) a particular value. For
example, to only log the language preference for 200 or 304
statuses, use %200,304{accept-language}i. You
can put an exclamation mark (!) straight after the % to
reverse the condition (i.e. to only log if the status was not
200 or 304).
The time logged by %t is in common log file format. If you
want to use another format, use
%{format}t, where format is a date
and time format as used by strftime (see man strftime
for more information).
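If you want to try out a format string before putting it in your configuration, Python's strftime accepts the common conversion codes used by the C function. The sketch below is only an approximation of what Apache would write: the CLF_TIME pattern is an assumption that reproduces the default time shown earlier (without the square brackets), and the second pattern is an example of a format that sorts correctly as plain text.

```python
from datetime import datetime, timezone

# Pattern assumed to match the default %t output, minus the brackets.
CLF_TIME = "%d/%b/%Y:%H:%M:%S %z"

when = datetime(1997, 2, 3, 0, 6, 59, tzinfo=timezone.utc)

# The time as it appears in the common log format.
print(when.strftime(CLF_TIME))
# An alternative layout, year first, that sorts as text.
print(when.strftime("%Y-%m-%d %H:%M:%S"))

# The same pattern can be used to read a logged time back in.
parsed = datetime.strptime("03/Feb/1997:00:06:59 +0000", CLF_TIME)
print(parsed == when)
```

In a log format the second pattern would be written as %{%Y-%m-%d %H:%M:%S}t.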
In some cases, the request will be handled by an internal
redirect (this is common for things like requests satisfied
by a DirectoryIndex file). In these cases, the
configuration options can apply to either the original
response, or the one actually delivered. The characters <
and > after the % determine whether to log the original
value, or the redirected value. For example, in %s you always
want the value of the status actually returned, so %>s is
used in the common log file definition. Each % sequence knows
whether it should use the original response or the real
response - for example, %r (the request line) uses the
original response.
Logs and Virtual Hosts
The logging directives, TransferLog,
LogFormat and CustomLog can be used
inside virtual hosts. The way they interact with the logs set
up outside the virtual hosts is like this:
- If there are no TransferLog or CustomLog directives inside
  the virtual host, log requests for this host to the logs
  defined in the main server.
- Otherwise log requests to the log files defined in this
  virtual host and do not use any of the log files defined in
  the main server.
- If LogFormat is used in a virtual host, the format it
  defines is used for all TransferLog files defined inside
  that virtual host.
- Otherwise the log format defined outside the virtual host
  is used by the TransferLogs defined inside the host,
  defaulting to the common log format if no LogFormat is
  defined in the main server.
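The rules above can be modelled in a few lines of Python. This is a sketch of the rules as described here, not of Apache's actual code: the effective_logs function and the dictionary layout used to describe the main server and a virtual host are hypothetical.

```python
COMMON_LOG_FORMAT = '%h %l %u %t "%r" %>s %b'


def effective_logs(main, vhost):
    """Return the (file, format) pairs used for requests to a virtual host.

    main and vhost are hypothetical dicts with optional keys:
      "LogFormat"   - a format string
      "TransferLog" - a list of file names
      "CustomLog"   - a list of (file name, format) pairs
    """
    has_own_logs = bool(vhost.get("TransferLog") or vhost.get("CustomLog"))
    if has_own_logs:
        # The host's own logs replace the main server's logs entirely.
        source = vhost
        fmt = (vhost.get("LogFormat") or main.get("LogFormat")
               or COMMON_LOG_FORMAT)
    else:
        # No logs of its own: fall back to the main server's logs.
        source = main
        fmt = main.get("LogFormat") or COMMON_LOG_FORMAT
    logs = [(name, fmt) for name in source.get("TransferLog", [])]
    # CustomLog entries always carry their own format.
    logs.extend(source.get("CustomLog", []))
    return logs


main = {"LogFormat": "%h|%l|%u|%t|%r|%>s|%b",
        "TransferLog": ["logs/access_log"]}
vhost = {"TransferLog": ["logs/vhost_access"],
         "CustomLog": [("logs/vhost_agent", "%{user-agent}i")]}

print(effective_logs(main, {}))
print(effective_logs(main, vhost))
```

The first call shows a virtual host with no logs of its own falling through to the main server's log, and the second shows a host whose TransferLog inherits the main server's LogFormat while its CustomLog keeps its own.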
Configurable Format Reference
Here are all the % sequences allowed in the configurable log
format in Apache.
%b | bytes sent, excluding HTTP headers
%f | filename
%h | remote host
%{Header}i | The contents of Header: header line(s) in the request sent from the client
%l | remote username (from identd, if supplied)
%{Note}n | The contents of note "Note" from another module
%{Header}o | The contents of Header: header line(s) in the reply
%p | the port the request was served to
%P | the process ID of the child that serviced the request
%r | first line of request
%s | response status. For requests that got internally redirected, this is the status of the original request: use %>s for the returned status
%t | time, in common log format time format
%{format}t | The time, in the form given by format, which should be in strftime format
%T | the time taken to serve the request, in seconds
%u | remote user (from auth; may be bogus if return status (%s) is 401)
%U | the URL path requested
%v | the name of the server (i.e. the virtual host)