In this issue
Release: 1.2.1 (Released 6th July 1997) (local download
sites)
Beta: None
Bugs in 1.2.1:
-
Solaris systems can fail to restart on a SIGHUP. This
appears to be a bug in Solaris which should be fixed in
Solaris 2.6. For more details, workarounds and a patch see
known bugs. This will be fixed in 1.3.
-
Content negotiation may fail to pick the smallest of
equally acceptable variants. This will be fixed in 1.3.
Patches to Apache 1.2 bugs will be made available in the apply
to 1.2.1 directory on the Apache site. Some new features
and other unofficial patches are available in the 1.2
patches directory. For details of all previously reported
bugs, see the Apache bug database and Known Bugs page. Many
common configuration questions are answered in the Apache FAQ.
Unless otherwise noted, all the new features discussed here
are planned for Apache 1.3 and not Apache 1.2.1.
More Use of Regular Expressions
The Alias, ScriptAlias and
Redirect directives map incoming URLs onto a
file or another URL. The incoming URL is given as a simple
partial match, so for example,
Alias /icons/ /usr/web/icons/
maps /icons/banner.gif onto
/usr/web/icons/banner.gif. But it is difficult
to do things like map (say) all images onto a different
server. Although this can be done with the optional rewrite
module, the syntax for this module is quite complex. A new
simpler way of matching URLs will be implemented in Apache
1.3.
This will use Unix "regular expressions" to match the
incoming URL. This gives a lot of flexibility, especially
since parts of the incoming URL can be included in the
resulting filename or URL (instead of just the trailing
part). Three new directives implement this:
AliasMatch, ScriptAliasMatch and
RedirectMatch. They perform the same function as
their counterparts without the "Match", but use regular
expressions for the first argument and can include
replacement tokens in the second argument.
For example, to map all requests for .gif files onto a
different server you could use
RedirectMatch (.*)\.gif$ http://www.img_server.com$1.gif
The first argument is the regular expression to match against
the incoming URL. The .* means match any number
of characters, while the \.gif$ matches the text
".gif" at the end of the URL only. Because the
expression tries to match the longest part it can, the .* bit
will match the whole initial part of the request, from the
initial / onwards. Finally the brackets ( ) mark the text
that matches for use in the second argument.
The second argument gives the replacement URL. The $1 part is
replaced by the text that matched within the brackets in the
first argument. So, for example, if the incoming URL was
/about/head.gif
then the first argument would match (because it ends in
.gif), and the bracketed part would match the text
"/about/head" and call that match $1. In the
second argument the $1 will be replaced with this text,
giving a redirected URL of
http://www.img_server.com/about/head.gif
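As a further sketch, the other two new directives can be used in the same way. The directory paths below are hypothetical, and the syntax follows the description above, so details may differ in the final 1.3 release.

```apache
# Regular-expression equivalent of the Alias example above:
# /icons/banner.gif is served from /usr/web/icons/banner.gif
AliasMatch ^/icons/(.*) /usr/web/icons/$1

# Treat anything under /cgi-bin/ as a script in a (hypothetical) directory
ScriptAliasMatch ^/cgi-bin/(.*) /usr/web/cgi-bin/$1

# Two bracketed groups: $1 is the path, $2 the extension (gif or jpg)
RedirectMatch (.*)\.(gif|jpg)$ http://www.img_server.com$1.$2
```

Starting an expression with ^ ties it to the beginning of the URL, which gives the same prefix behaviour as the plain Alias and ScriptAlias directives.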
The directives <Directory>,
<Location> and <Files>
can already use regular expressions, indicated by a ~ (tilde)
as the first argument, followed by the expression. For
consistency there are now additional directives
<DirectoryMatch>,
<LocationMatch> and
<FilesMatch> which take just a regular
expression argument.
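For illustration, the two forms below should be equivalent. The file pattern and the access rules are an assumed example, not something taken from the Apache distribution.

```apache
# Existing form: the tilde marks the argument as a regular expression
<Files ~ "\.(bak|old)$">
order allow,deny
deny from all
</Files>

# New form in 1.3: the argument is always a regular expression
<FilesMatch "\.(bak|old)$">
order allow,deny
deny from all
</FilesMatch>
```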
Directory Indexing Split into Two Modules
At present, the mod_dir module handles directory
indexes. It actually does two very different things, each
individually controllable:
-
It can return an automatic index of a directory as HTML,
configured by the Indexes option
-
It can map an incoming request for a directory onto a
filename (typically index.html or index.cgi).
Most of the code in mod_dir deals with the first action, which
is quite complex. The second part is much simpler. Many sites
require the second part but do not need the first (in fact, the
first can expose files which should not be displayed to the
user, so it is more secure to not use directory indexes). In
Apache 1.3 these two functions have been split into two
separate modules. This means people who need the index.html
functionality but not the auto-indexing can reduce the size of
their executable by removing the auto-indexing module.
The auto-indexing code has been removed from mod_dir (which
now just handles index.html style functionality), and placed
into the new module mod_autoindex.
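A minimal configuration sketch of the two functions, assuming a hypothetical document root of /usr/web/docs. Both directives already exist in 1.2. After the split, DirectoryIndex would be handled by mod_dir and the Indexes option by mod_autoindex.

```apache
# Map a request for a directory onto a file inside it (mod_dir in 1.3)
DirectoryIndex index.html index.cgi

<Directory /usr/web/docs>
# Do not generate automatic HTML listings (mod_autoindex in 1.3)
Options -Indexes
</Directory>
```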
Turning off Hostname Lookups
Two weeks ago we reported that 1.3 will ship with a
configuration file containing the HostnameLookups off
directive. Currently hostname lookups default
to on. The main effect of this - besides better
performance - will be that the log file will contain IP
numbers instead of hostnames. At the moment, setting
HostnameLookups off in 1.2.1 or earlier will
also affect access restrictions based on hostname (such as
allow from .nasa.gov). In 1.3 this will work
even if hostname lookups are set to off. If Apache sees a
hostname in an allow or deny
directive it will convert the browser's IP address into the
corresponding hostname. This means it is quite safe to set
hostname lookups to off in 1.3 without affecting existing
access restrictions.
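As a sketch of the 1.3 behaviour described above, a configuration like the following should keep working with lookups turned off. The directory path is hypothetical.

```apache
# Log IP numbers rather than hostnames
HostnameLookups off

<Directory /usr/web/docs/internal>
order deny,allow
deny from all
# In 1.3 a hostname here causes a lookup for this access check only
allow from .nasa.gov
</Directory>
```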
Better Support for 64 Bit Systems
At some places in the code, Apache uses variables or
arguments which can take either an integer value or a pointer
value. These are actually stored as pointers, then cast to
the correct type when used. On most systems this is not a
problem, since both ints and pointers are stored in the same
sized locations (32 bit). However newer systems may use 64
bits for one or more of these types. There is a risk that if
the size of an integer is larger than the size of a pointer,
data will be lost, and the code will often cause compilation
warnings about data type sizes. From 1.3 onwards, Apache's
internal code will use a special "generic" data type which is
defined to be large enough for whatever data is stored within
it. Although a typical way of doing this would be to use a
union of all the data types, this would slow down function
calls, so Apache uses a type which can be passed by value to
functions. This may affect the module API for 1.3.
Unbuffered CGI
Normally the output of a CGI script is "buffered". That is,
Apache reads the output and sends it out when it has got
enough, or when the CGI program exits. This is good for
performance of the server and the network, but might be
undesirable in some situations. For example, if you have a
long running CGI program you cannot currently send back a
line or two to the user telling them to "please wait....", or
a search engine cannot display results as it finds them.
Actually there is a way to do both of those, called "nph"
scripts. This is an old system where the CGI output is sent
straight back to the client without buffering. NPH actually
stands for "Non-parsed Headers", because NPH scripts must
also send back all the required HTTP response headers. Given
that there are now three different versions of HTTP, and that
HTTP/1.1 adds a lot of new requirements, writing a compliant
NPH script is very difficult. So using NPH is not
recommended.
Recent changes to the 1.3 code will make it possible to have
unbuffered scripts without having to use NPH.
But rocket scientists at JPL use Apache. The latest news
about the PathFinder mission to Mars is being made available
by JPL on their website. As might be expected for such a
high-profile site, it generated a lot of traffic.
Since the touchdown last weekend, the web servers used by JPL
have changed quite a bit. To offload their servers there are
many mirrors of the PathFinder site around the world,
including some high-capacity sites run by SGI, Sun and
others. These tend to use the corporate vendor's own server,
or one they have a commercial relationship with (so, for
example, SGI's PathFinder site runs a Netscape server).
However internally JPL uses Apache servers for its web sites.
In fact the main JPL site at www.jpl.nasa.gov is
running Apache. This is actually handled by Sun Ultra 1's
running Solaris and Apache. These servers were initially
handling the PathFinder site as well but when they became
overloaded another server was set up by SGI (mpfwww.jpl.nasa.gov
using Netscape Enterprise server).
Like many other popular sites, the JPL site at
www.jpl.nasa.gov gets a lot of hits. With several million
hits per day, they need a server which can cope with more
than 50 hits per second. With suitable hardware and some
configuration, Apache can easily handle this sort of load on
a single system. Combined with multiple servers Apache can
also scale to huge numbers of hits. The JPL main site is
currently getting about 6,000,000 hits per day, split across two
servers (3 million hits per day per server). The hardware
used is Sun Ultra 1's with 256Mb of memory, and Apache on
this hardware has no problems with up to 5 million hits per
day.
The key to handling high hit-rates with Apache is to ensure
that there is enough memory to run the concurrent child
processes in RAM without swapping. In this case, 256Mb per
server allows for well over 500 concurrent servers (i.e. 500
concurrent clients). Besides memory, the configuration files
and operating system should be adjusted for maximum
performance (although this is significantly less important
than the amount of physical memory). Adjustments should
include:
-
Remove all modules you do not use from the running
executable
-
Turn off looking for .htaccess files with
AllowOverride None
-
Reduce timeouts
-
Increase the listen queue size if necessary
-
Do not read from or write to any NFS mounted disks
(especially not for log files)
-
Configure the operating system for large numbers of file
descriptors
-
Increase the number of requests per child with
MaxRequestsPerChild
-
Increase the number of children started and ensure that
there are enough spare children to handle sudden bursts of
requests
-
Increase the maximum number of servers with
MaxClients (this will also require
recompilation with a larger HARD_SERVER_LIMIT)
-
Turn off DNS lookups with HostnameLookups Off.
Ensure that all host-based restrictions are done by IP
number (until 1.3 comes out)
-
Use Apache 1.2.0 or 1.2.1 which are much more efficient
(both on the server and on the network) than early betas of
1.2.
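Several of the adjustments above can be sketched as configuration directives. The values are illustrative assumptions only, not the settings used by JPL, and should be tuned against your own hardware and traffic. ListenBacklog in particular may not exist in older releases, so check that your version supports it.

```apache
# Illustrative values only; not the configuration used by JPL

# Skip DNS lookups for every request
HostnameLookups off

# Do not look for .htaccess files anywhere
<Directory />
AllowOverride None
</Directory>

# Shorter timeouts free children from stalled clients sooner
Timeout 120
KeepAliveTimeout 5

# Start plenty of children and keep spares for sudden bursts
StartServers 50
MinSpareServers 20
MaxSpareServers 50

# Upper limit on children; cannot exceed the compiled-in HARD_SERVER_LIMIT
MaxClients 500

# Let each child serve many requests before it is replaced
MaxRequestsPerChild 10000

# Larger listen queue for bursts of incoming connections
ListenBacklog 511
```

Comments are kept on their own lines because Apache does not treat a # after a directive as the start of a comment.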
A future release of Apache will be multithreaded (this might be
in version 1.4 or 2.0, depending on how development goes). The
use of multithreading rather than multiple processes may reduce
the amount of memory needed to run Apache efficiently, but
probably not by a huge amount. Although each Apache executable is
often around 700kb to 1Mb in size, most of this is executable
code which is shared between all the processes.
Info World reviewed four web server "solutions", including
Apache running on a dual Pentium P6/200 system. The review in
"Web platform solutions - Big Blue deja vu" compared
Apache on RedHat Linux, Microsoft IIS on NT, Netscape
Enterprise on NT and IBM's Internet Connection Secure Server
on AIX. They gave Apache last place, and recommended the IBM
solution.
The most important part of this review is a performance test
which showed Apache having serious problems coping with high
loads. This should not have been the case, since Apache can
cope with very high load given correct configuration,
hardware and software. There are several reasons why they
might have had problems at high loads:
-
The hardware used was well under a third the price of the
hardware used for the other solutions
-
They used Apache 1.1.3, whereas Apache 1.2.1 includes many
optimisations for efficient use of server and network
resources
-
The operating system used a single CPU rather than the two
available in the hardware
-
The amount of memory was not specified, but may not have
been enough for the number of concurrent clients. Much more
memory could have been installed in the system and still
kept the server cost under a third of the other solutions.
-
The configuration may not have been optimised. They listed
the extensive optimisations applied to the other servers
(including things like disabling SNMP management and web
publishing of Enterprise, despite criticising Apache for not
having SNMP management or content management).
-
The tests were (presumably) carried out under laboratory
conditions on a fast local area network. Apache is designed
for use on real-world internet connections with long
latency times, badly behaved clients, etc.
It is ironic that Apache loses out for not having the
bells-and-whistles of some other servers, but when it comes to
performance they had to disable these features! And of course
as the JPL site shows, in the real world Apache can easily cope
with hit rates of 5 million a day (about 58 per second) with some
minor tuning and adequate hardware.