In this issue
Release: 1.1.3 (Released 14th January 1997)
Beta: 1.2b6 (Released 26th January 1997)
Bugs fixed in 1.2b6:
-
Configuration for HP MPE on HP3000, updates for QNX
-
Problem with negotiated documents where
LanguagePriority was being ignored
-
Satisfy Any might not be applied in a
directory if .htaccess file exists containing certain
directives
-
Redirect from /index.html causes a core dump when a request
for / is made (using DirectoryIndex)
-
Documentation says IdentityCheck and
HostNameLookups are valid in .htaccess. This
is not correct, and the docs have been updated
Patches to fix some Apache 1.2b6 bugs are available in the 1.2b6
patches directory on the Apache site.
Apache is currently in a 'beta release' cycle. This is where
it is made available prior to full release for testing by
anyone interested. Normally during the beta cycle no new
major features will be added. The full release of Apache 1.2
is expected in February.
Performance Tweaks
Apache is designed for high performance sites. The use of a
dynamically changing number of pre-forked servers means it
can cope with rapidly changing load levels. However, this week
has seen a number of changes designed to increase performance
further. These affect three specific areas: server side
includes, response transmission and the Apache core code
which gets used on every request. Server side includes are
being sped up by reading the file in large chunks
(buffering). Additional speed-ups are being considered but
may not make it into 1.2. These include having a directive to
turn off processing of SSI directives (so the rest of the
file can be read quickly without searching for more <!--
sequences) and generating a valid "last-modified" date to
allow clients to cache the pages.
The Apache core is also being updated. The number of network
writes needed to send a document is being reduced (this
applies when the document is being sent in "chunks", a new
HTTP/1.1 format). A number of other network-related speed-ups
have already been applied since 1.1.1, and the use of
HTTP/1.1 persistent connections (keep-alives) will also speed
up network response.
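To make the "chunks" format concrete, here is a minimal sketch
in Python of how HTTP/1.1 frames a chunked body. It is not
Apache's code (which is written in C); the function name
encode_chunked is made up for this illustration. It builds the
whole body in one buffer, which is the kind of thing that lets
a server hand the data to the network in fewer writes.

```python
def encode_chunked(pieces):
    """Frame a sequence of byte strings as an HTTP/1.1 chunked body.

    Each chunk is: size in hexadecimal, CRLF, the data, CRLF.
    A zero-sized chunk followed by a blank line ends the body.
    """
    out = bytearray()
    for piece in pieces:
        if not piece:
            # A zero-length chunk would mark the end of the body,
            # so empty pieces are skipped.
            continue
        out += format(len(piece), "x").encode("ascii") + b"\r\n"
        out += piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


body = encode_chunked([b"Hello, ", b"world"])
print(body)
```

Because the size lines, data and terminator end up in a single
buffer, they can be passed to the network in one write rather
than several small ones.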
Finally, other areas of the Apache core code have been
modified. For example, there are some things which are fixed
after the configuration file is read, so can be evaluated
once in advance rather than for every request.
httpd_monitor Updated
Some systems still use a file to store the "scoreboard". This
is where Apache records details of which child processes are
running and what they are doing. On most systems this is
stored in memory, and can be accessed by compiling in the
optional status module. But where a file is used, the
external program httpd_monitor (in the support
directory) can be used to get roughly the same information.
This program has been updated to know about the current
scoreboard format.
Easier Compilation of Support Programs
Various support programs are available in the
support directory. These include the
htpasswd program to create and modify HTTP
usernames and passwords. In previous releases, the
Makefile in this directory had to be edited
by hand to select the correct compiler and other OS-specific
options. In Apache 1.2, a Configure script is
used to set these things automatically for the main server.
This has been extended so that it configures the support
directory as well. So after running Configure,
the support programs can be built by going into the support
directory and typing make.
Byteranges Workaround for Netscape
When Netscape Navigator requests an Adobe PDF file from a
server, it uses HTTP/1.1 "byteranges" to get parts of the
document in a particular order. However it does not recognise
the response that Apache returns, because it is looking for a
non-standard content type on the response. According to
HTTP/1.1, a byterange response should be marked as type
"multipart/byteranges". However Netscape Navigator only
accepts "multipart/x-byteranges" (which, incidentally, is
what Netscape servers send). Apache cannot be altered to send
this since it is non-standard and would break clients which
conform to HTTP/1.1. Fortunately, Netscape Navigator also
sends an extra non-standard header. Apache will be updated to
look for this header, and if present, return byteranges in a
format the Netscape clients can understand.
Every time a browser hits your site it leaves a trail in your
access log. This file is enough to tell you how many hits you
received and gives you some basic information about the
browser, such as its hostname. But there is a lot more
information readily available that you could be gathering.
Want to know which browser is most common on your site, or
what languages your readers can understand? In Apache 1.2,
logging information like this is easy.
Logging in Apache 1.1.1
If you've been using Apache 1.1.1 for a while, you've
probably come across the customisable log file module. This
is a replacement for the standard "common" log file module,
and lets you select what to log. You can use this to log
additional information, such as browser types. But because
Apache 1.1.1 can only have one main log file, you have to
store this information in your normal access log.
A better alternative is to put this additional information in
a separate log file. This requires a new module for each
additional log file. Apache 1.1.1 comes with two such
modules: one to log the browser type and one to log the
'referrer' (the page the user came from before requesting the
current page). While these modules are useful, they are
limited. Changing the format of their log files, or logging
other information, requires C programming and re-compiling
your Apache.
Apache 1.2 provides a much neater solution: you can have any
number of log files, each with its own customised format.
We'll explain first how to customise the format of your
existing log file, then show how to create multiple log
files. Finally we'll explain how logging works when you have
virtual hosts, where you can choose whether to log a virtual
host into the main log files or have separate log files for
each host.
Customising the Log File Format
The traditional format for web log files looks like this:
jupiter.ukweb.com - - [03/Feb/1997:00:06:59 +0000] "GET / HTTP/1.0" 200 4571
jupiter.ukweb.com - - [03/Feb/1997:00:07:00 +0000] "GET /img/awlogo.gif HTTP/1.0" 200 12706
(There are two lines here, both starting with
"jupiter.ukweb.com". If you see more than two, the lines have
been wrapped on the screen).
This format is called the common log format and is standard across most web servers
(although it is not very well documented). There are various
tools to analyse data in this format, and it is not too
difficult to write custom tools (in, say, perl) to extract
the data. But the lack of a common field delimiter makes such
tools more complex than necessary and prevents the use of
simple Unix programs such as cut.
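As an illustration of what such a custom tool involves, here
is a minimal sketch in Python (rather than perl) that picks
one common log format line apart with a regular expression.
The names CLF_PATTERN and parse_clf_line are made up for this
example, and the pattern assumes the request contains no
double quotes and the username contains no spaces.

```python
import re

# One common log format line:
#   host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the bytes field means no body was sent; count it as zero.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


sample = ('jupiter.ukweb.com - - [03/Feb/1997:00:06:59 +0000] '
          '"GET / HTTP/1.0" 200 4571')
entry = parse_clf_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["bytes"])
```

The need for a pattern like this, with its special handling of
the bracketed time and the quoted request, is exactly the
complexity that a single delimiter character would avoid.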
In Apache 1.2 (and Apache 1.1.1 if you are using the config
log module) you can customise this format. There are probably
two common reasons for doing this: firstly, to make the
format simpler by using a common delimiter character, and
secondly to log additional information such as the browser type
at the end of each line (placing it at the end means the file
can still be analysed by standard log analysis programs).
You customise the format by telling Apache a format to use.
Special character sequences are used to represent specific
information. For example, the sequence %h will be replaced
with the name of the remote host. The common log format is
defined like this:
%h %l %u %t "%r" %>s %b
Additional sequences here are %l (the remote username, if
using identd), %u (the HTTP authenticated username, if any),
%t (the time in common-log format), %r (the request), %>s (the
returned status) and %b (the number of bytes in the document
served).
Say, for example, you would prefer a file format with a
common delimiter character between each field, so that you
could use cut or write very simple perl scripts to extract
the data. Using the common log format above as a guide, you
could use
%h|%l|%u|%t|%r|%>s|%b
Here the | character is being used as a delimiter. Note that
this can cause problems if a | occurs within a field (which
is possible, if unlikely, in the %r request field).
To set this format for your log file, you use the
LogFormat directive. For example
LogFormat "%h|%l|%u|%t|%r|%>s|%b"
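With a log written in this delimited format, extracting the
data becomes very simple. The sketch below, in Python, splits
a line on the | character. The sample line is constructed by
hand from the earlier example, and the names FIELD_NAMES and
parse_delimited_line are made up for this illustration.

```python
# Field order matches the format %h|%l|%u|%t|%r|%>s|%b
FIELD_NAMES = ["host", "ident", "user", "time", "request", "status", "bytes"]


def parse_delimited_line(line, delimiter="|"):
    """Split one delimited log line into a dict keyed by field name."""
    parts = line.rstrip("\n").split(delimiter)
    if len(parts) != len(FIELD_NAMES):
        # A stray delimiter inside a field (for example in the request)
        # changes the field count, so refuse to guess.
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(parts))
        )
    return dict(zip(FIELD_NAMES, parts))


sample = "jupiter.ukweb.com|-|-|[03/Feb/1997:00:06:59 +0000]|GET / HTTP/1.0|200|4571"
entry = parse_delimited_line(sample)
print(entry["host"], entry["status"], entry["bytes"])
```

The field count check is there because of the problem noted
above: a | inside the request would otherwise shift every
later field along by one.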
Logging Browser and User Information
The % sequences introduced so far let you log various aspects
of the request. There are some more sequences (covered below)
that log additional aspects of the request. However one of
the most important features of the custom log format is being
able to log any of the request headers supplied by the
browser. This lets you log things like the user's language
preferences, browser type and the page they just came from.
Logging a request header is done using the %{}i sequence.
You put the name of the request header between the braces.
For example, to log the browser type, you would use
%{user-agent}i
In Apache 1.1.1 this information is typically added to the
end of the common log format. (In Apache 1.2 you can put it
in a separate log file, which is much more convenient; this
is explained later.) To add the user-agent information to the
end of the common log format, use
LogFormat "%h %l %u %t \"%r\" %>s %b %{user-agent}i"
If the browser does not send a user-agent, the text "-" will
be logged as the user-agent. Otherwise you will get the
browser name, such as "Mozilla/3.0Gold (Win95; I)" or
"Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)" (the former
is Netscape Gold version 3, the latter Microsoft Internet
Explorer version 3, pretending to be Netscape 2).
In addition to %{...}i, there is a corresponding sequence
%{...}o to log any of the response headers (in these
sequences, the i means incoming and the o
outgoing headers).
Multiple Logs in Apache 1.2
Adding extra fields onto the end of the common log file
format is inconvenient. Luckily, Apache 1.2 offers a
completely customisable log file interface: you can create
any number of log files, each in a different format. It is
now almost trivial to add a log file for (say) user-agents or
requested languages, without needing to compile in a new
module or modify the Apache source code. You can even log all
the common log file information into both common log format
(for existing analysers) and in a delimited format at
the same time!
The interface to all this is via a single, simple directive:
CustomLog. This directive takes both a file name
to log to, and a custom format. For example, to log
user-agents to a file called agent in the logs directory,
you would use:
CustomLog logs/agent "%{user-agent}i"
Other useful log files can also be created. The next two
directives create a referrer log and a log of language
preferences of your clients:
CustomLog logs/referer "%{referer}i -> %U"
CustomLog logs/language "%{accept-language}i"
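Log files like these are easy to summarise. Here is a minimal
sketch in Python that counts the browsers in an agent log and
splits a referrer log line into its two parts. The function
names are made up for this example, and the host
www.example.com in the sample referrer line is hypothetical.

```python
from collections import Counter


def count_agents(lines):
    """Count how often each user-agent string appears in an agent log."""
    return Counter(line.strip() for line in lines if line.strip())


def parse_referer_line(line):
    """Split a 'referer -> url' line into a (referer, url) pair."""
    # Split on the last " -> " so a referer containing that text
    # does not break the URL part.
    referer, _, url = line.strip().rpartition(" -> ")
    return referer, url


agent_lines = [
    "Mozilla/3.0Gold (Win95; I)\n",
    "Mozilla/2.0 (compatible; MSIE 3.01; Windows 95)\n",
    "Mozilla/3.0Gold (Win95; I)\n",
]
for agent, hits in count_agents(agent_lines).most_common():
    print(hits, agent)

print(parse_referer_line("http://www.example.com/links.html -> /index.html"))
```

In practice the lines would come from the log itself, for
example by passing an open logs/agent file to count_agents in
place of the list.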
Advanced Configuration Options
You can tell the format to only log particular fields if the
response status is (or is not) a particular value. For
example, to only log the language preference for 200 or 304
statuses, use %200,304{accept-language}i. You
can put an exclamation mark (!) straight after the % to
reverse the condition (i.e. to only log if the status was not
200 or 304).
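The rule can be modelled in a few lines of Python. This is
only a sketch of the behaviour described above, not Apache's
implementation, and the function names are made up. It assumes
that a field whose condition is not met is written as "-",
which is how the Apache documentation describes it.

```python
def condition_met(status, codes, negate=False):
    """Decide whether a conditional % sequence applies to this response.

    codes  - the status codes listed after the %, e.g. {200, 304}
    negate - True if the list was preceded by an exclamation mark
    """
    listed = status in codes
    return (not listed) if negate else listed


def conditional_field(value, status, codes, negate=False):
    """Return the value to log, or "-" if the condition is not met."""
    return value if condition_met(status, codes, negate) else "-"


# Models %200,304{accept-language}i
print(conditional_field("en-gb", 200, {200, 304}))
print(conditional_field("en-gb", 404, {200, 304}))
# Models %!200,304{accept-language}i
print(conditional_field("en-gb", 404, {200, 304}, negate=True))
```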
The time logged by %t is in common log file format. If you
want to use another format, use
%{format}t, where format is a date
and time format as used by strftime (see
man strftime for more information).
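Python's strftime accepts the same style of format string, so
it is a convenient way to try a format out before putting it
in your configuration. The sketch below uses the time from the
earlier log example; the helper name render is made up, and
only numeric conversions are used so the result does not
depend on the locale.

```python
from datetime import datetime

# The moment from the earlier log example: 3 February 1997, 00:06:59.
moment = datetime(1997, 2, 3, 0, 6, 59)


def render(fmt):
    """Format the sample moment with a strftime-style format string."""
    return moment.strftime(fmt)


print(render("%Y-%m-%d %H:%M:%S"))  # 1997-02-03 00:06:59
print(render("%d/%m/%Y"))           # 03/02/1997
```

A format that looks right here can then be placed between the
braces, giving for example %{%Y-%m-%d %H:%M:%S}t.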
In some cases, the request will be handled by an internal
redirect (this is common for things like requests satisfied
by a DirectoryIndex file). In these cases, a % sequence
can apply to either the original request, or the one
actually delivered. The characters < and > after the %
determine whether to log the original value, or the
redirected value. For example, for the status you almost
always want the value actually returned, so %>s is
used in the common log file definition. Each % sequence knows
whether it should use the original request or the final
request by default - for example, %r (the request line) uses
the original request.
Logs and Virtual Hosts
The logging directives, TransferLog,
LogFormat and CustomLog can be used
inside virtual hosts. The way they interact with the logs
set up outside the virtual hosts is like this:
-
If there are no TransferLog or
CustomLog directives inside the virtual host,
log requests for this host to the logs defined in the main
server.
-
Otherwise log requests to the log files defined in this
virtual host and do not use any of the log files defined in
the main server.
-
If LogFormat is used in a virtual host, the
format it defines is used for all TransferLog
files defined inside that virtual host.
-
Otherwise the log format defined outside the virtual host
is used by the TransferLogs defined inside the
host, defaulting to the common log format if no
LogFormat is defined in the main server.
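The four rules above can be expressed as a short decision
procedure. The following Python sketch is only a model of
those rules, not Apache's code: the dictionary layout, the
function name effective_logs and the log file names in the
example are all made up for illustration.

```python
COMMON_LOG_FORMAT = '%h %l %u %t "%r" %>s %b'


def effective_logs(main, vhost):
    """Return the (filename, format) pairs used for requests to a virtual host.

    Each configuration is a dict with optional keys:
      "log_format"    - the format set by LogFormat
      "transfer_logs" - list of filenames from TransferLog
      "custom_logs"   - list of (filename, format) pairs from CustomLog
    """
    def resolve(config, transfer_format):
        # TransferLog files take the prevailing LogFormat;
        # CustomLog files carry their own format.
        logs = [(name, transfer_format) for name in config.get("transfer_logs", [])]
        logs.extend(config.get("custom_logs", []))
        return logs

    main_format = main.get("log_format") or COMMON_LOG_FORMAT
    has_own_logs = vhost.get("transfer_logs") or vhost.get("custom_logs")
    if not has_own_logs:
        # Rule 1: no logs in the virtual host, so use the main server's.
        return resolve(main, main_format)
    # Rules 2 to 4: use only the virtual host's logs, with its own
    # LogFormat if it has one, else the main server's.
    vhost_format = vhost.get("log_format") or main_format
    return resolve(vhost, vhost_format)


main = {"log_format": "%h|%l|%u|%t|%r|%>s|%b",
        "transfer_logs": ["logs/access_log"]}
host_a = {}
host_b = {"transfer_logs": ["logs/b_access"],
          "custom_logs": [("logs/b_agent", "%{user-agent}i")]}

print(effective_logs(main, host_a))
print(effective_logs(main, host_b))
```

Here host_a has no log directives and so logs to the main
server's file, while host_b logs only to its own two files,
with its TransferLog inheriting the delimited format from the
main server.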
Configurable Format Reference
Here are all the % sequences allowed in the configurable log
format in Apache 1.2.
%b            bytes sent, excluding HTTP headers
%f            filename
%h            remote host
%{Header}i    the contents of Header: header line(s) in the
              request sent from the client
%l            remote username (from identd, if supplied)
%{Note}n      the contents of note "Note" from another module
%{Header}o    the contents of Header: header line(s) in the
              reply
%p            the port the request was served to
%P            the process ID of the child that serviced the
              request
%r            first line of request
%s            response status. For requests that got internally
              redirected, this is the status of the original
              request: use %>s for the returned status
%t            time, in common log format time format
%{format}t    the time, in the form given by format, which
              should be in strftime format
%T            the time taken to serve the request, in seconds
%u            remote user (from auth; may be bogus if return
              status (%s) is 401)
%U            the URL path requested
%v            the name of the server (i.e. the virtual host)