It has been about a year since Apache 1.3 was released, and
the core Apache members are now working on version 2.0. The
new version will be significantly different to the current
one, which raises issues such as "Why update Apache at all?"
and "What does this update mean for Apache administrators?"
We hope to answer those and many other questions in this
article and, as the release of 2.0 approaches, provide more
up-to-date information.
It is important to note that presently there is only
development code available for 2.0 and that downloading it
now is not advised for anybody other than those who are
already familiar with the Apache internals. The code in its
current state is not guaranteed to compile from day to day or
to work on many platforms.
Apache Week will announce any upcoming alpha or beta versions
and the details of the 2.0 release as soon as they are ready.
Why Go Beyond 1.3?
Apache 1.3 is a great web server which serves pages for the
vast majority of the web, but there are things it can't do.
Firstly, it isn't particularly scalable on some platforms.
AIX processes, for example, are very heavy-weight and a small
AIX box serving 500 concurrent connections can become so
heavily loaded that it can be impossible to telnet to it. In
situations like this, using processes is not the right
solution: we need a threaded web server.
Apache is renowned for being portable, as it works on most
POSIX platforms, all versions of Windows, and a couple of
mainframes. However, like most good things, portability comes
at a price: in this case, the code is harder to maintain.
Apache is reaching the point where porting to additional
platforms is becoming more difficult. In order to give Apache
the flexibility it needs to survive in the future, this
problem must be resolved by making Apache easy to port to new
platforms. In addition, Apache will be able to use any
specialised APIs, where they are available, to give better
performance.
The original reason for creating Apache 2.0 was scalability,
and the first solution was a hybrid web server: one that has
both processes and threads. This solution provides the
reliability that comes with not having everything in one
process, combined with the scalability that threads provide.
The problem with this is that there is no perfect way to map
requests to either a thread or a process.
On platforms such as Linux, it is best to have multiple
processes each with multiple threads serving the requests so
that if a single thread dies, the rest of the server will
continue to serve more requests. Other platforms such as
Windows don't handle multiple processes well, so one process
with multiple threads is required. Older platforms which do
not have threads also have to be taken into account. For these
platforms, it is necessary to continue with the 1.3 method of
pre-forking processes to handle requests.
There are multiple ways to deal with the mapping issue, but
the cleanest is to enhance the module features of Apache.
Apache 2.0 sees the introduction of 'Multi-Processing
Modules' (MPMs) - modules which determine how requests are
mapped to threads or processes. The majority of users will
never write an MPM or even know they exist. Each server uses
a single MPM, and the correct one for a given platform is
determined at compile time.
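To illustrate what "determined at compile time" means, the C sketch below lets the preprocessor pick exactly one MPM for the binary being built. It is a hypothetical illustration, not the Apache build process: the `mpm_info` structure is invented for this example, and the mapping simply follows the recommendations in this article. `_WIN32` is a compiler-predefined macro on Windows, `_POSIX_THREADS` is defined by `<unistd.h>` on POSIX systems with threads, and `__OS2__` is assumed to be predefined by the OS/2 compiler.

```c
#include <assert.h>

/* Hypothetical description of the MPM compiled into this server. */
struct mpm_info {
    const char *name;
    int multi_process;  /* more than one process serves requests */
    int uses_threads;   /* requests are served by threads */
};

/* The preprocessor makes the choice, so each binary contains one MPM.
 * The mapping follows the article's recommendations, not Apache's build. */
#if defined(_WIN32)
static const struct mpm_info selected = { "winnt", 0, 1 };
#elif defined(__OS2__)      /* assumed OS/2 compiler macro */
static const struct mpm_info selected = { "os2", 0, 1 };
#else
#  include <unistd.h>       /* defines _POSIX_THREADS where threads exist */
#  if defined(_POSIX_THREADS) && (_POSIX_THREADS > 0)
static const struct mpm_info selected = { "dexter", 1, 1 };
#  else
static const struct mpm_info selected = { "prefork", 1, 0 };
#  endif
#endif

const struct mpm_info *selected_mpm(void)
{
    /* Every MPM serves requests from processes, threads, or both. */
    assert(selected.multi_process || selected.uses_threads);
    return &selected;
}
```

Under these assumptions a threaded Unix build picks DEXTER and an unthreaded one falls back to PREFORK; nothing about the choice can change once the server is running.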
What MPMs are available?
There are currently five options available for MPMs. Their
names will likely change before 2.0 ships, but their
behaviours are basically set. All of the MPMs, except
possibly the OS/2 MPM, retain the parent/child relationships
from Apache 1.3. This means that the parent process will
monitor the children and make sure that an adequate number
are running.
PREFORK
This MPM mimics the old 1.3 behaviour by forking the
desired number of servers at startup and then mapping each
request to a process. When all of the processes are busy
serving pages, more processes will be forked. This MPM
should be used for older platforms, platforms without
threads, or as the initial MPM for a new platform.
PMT_PTHREAD
This MPM is based on the PREFORK MPM and begins by forking
the desired number of child processes, each of which starts
the specified number of threads. When a request comes in, a
thread will accept the request and serve the response. If
most of the threads in the entire server are busy serving
requests, a new child process will be forked. This MPM
should be used on platforms that have threads, but which
have a memory leak in their implementation. This may also
be the proper MPM for platforms with user-land threads,
although there has not been enough testing at this point to
prove this hypothesis.
DEXTER
This MPM is the next step in the evolution of the hybrid
concept. The server starts by forking a static number of
processes which will not change during the life of the
server. Each process will then create the specified number
of threads. When a request comes in, a thread will accept
and answer the request. At the point where a child process
decides that too many of its threads are serving requests,
more threads will be created. This MPM should be used on
most modern platforms capable of supporting threads. It
should create the lightest load on the CPU while serving
the most requests possible.
WINNT
This MPM is designed for use on Windows NT. Before Apache
2.0 is released, it will also be made to work on Windows 95
and 98 although, just like Apache 1.3, it is unlikely to be
as stable as on NT. This MPM creates one child process,
which then creates a specified number of threads. When a
request comes in, it is mapped to a thread that will serve
the request.
OS/2
This MPM is designed for use on OS/2. It is purely
threaded, and removes the concept of a parent process
altogether. When a request comes in, an idle thread serves
it; if all of the threads are busy, more threads are
created.
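The differences between the five MPMs come down to what each one does as load rises. The C sketch below encodes the descriptions above as a single decision function. It is a simplification, not Apache code: the enum and function names are made up for this example, and the "more than three quarters busy" threshold is an assumed stand-in for the article's "most" and "too many".

```c
#include <assert.h>

typedef enum { MPM_PREFORK, MPM_PMT_PTHREAD, MPM_DEXTER,
               MPM_WINNT, MPM_OS2 } mpm_kind;

typedef enum { GROW_NONE, GROW_FORK_PROCESS, GROW_ADD_THREADS } grow_action;

/* Assumed threshold for "most" / "too many": over 75% of workers busy. */
static int mostly_busy(int busy, int total)
{
    return total > 0 && busy * 4 > total * 3;
}

/* busy/total count whatever the MPM monitors:
 *   PREFORK      processes in the server
 *   PMT_PTHREAD  threads across the whole server
 *   DEXTER       threads in one child (the process count is fixed)
 *   WINNT, OS/2  threads in the single serving process */
grow_action decide_growth(mpm_kind mpm, int busy, int total)
{
    assert(busy >= 0 && busy <= total);
    switch (mpm) {
    case MPM_PREFORK:      /* all processes busy: fork another */
        return busy == total ? GROW_FORK_PROCESS : GROW_NONE;
    case MPM_PMT_PTHREAD:  /* most threads busy: fork a child with threads */
        return mostly_busy(busy, total) ? GROW_FORK_PROCESS : GROW_NONE;
    case MPM_DEXTER:       /* a busy child creates more threads */
        return mostly_busy(busy, total) ? GROW_ADD_THREADS : GROW_NONE;
    case MPM_OS2:          /* all threads busy: create more threads */
        return busy == total ? GROW_ADD_THREADS : GROW_NONE;
    case MPM_WINNT:        /* a specified number of threads; no growth described */
    default:
        return GROW_NONE;
    }
}
```

A real MPM would presumably also cap growth at a configured limit, much as Apache 1.3 does with its `MaxClients` directive; that check is left out here.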
Multi-processing modules are designed to work behind the
scenes and do not interfere with requests in any way. In
fact, their only function is to map the request to a thread or
process. One advantage of this technique is that each MPM can
define its own directives. This means that if you are using the
PREFORK MPM, you won't be asked how many threads you want per
server, or if you are using the WINNT MPM, you won't need to
specify the number of processes.
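One way to picture per-MPM directives is as a table of directive names owned by each MPM, so that a directive missing from the compiled-in MPM's table is simply not recognised. The sketch below is hypothetical: the tables and `mpm_accepts` are invented for illustration, and the directive names are borrowed from Apache 1.3 (`StartServers`, `MaxClients`, `ThreadsPerChild`), since the 2.0 names are not yet settled.

```c
#include <assert.h>
#include <string.h>

/* Directives each MPM defines (names borrowed from Apache 1.3). */
static const char *const prefork_directives[] = {
    "StartServers", "MaxClients", NULL   /* processes, never threads */
};
static const char *const winnt_directives[] = {
    "ThreadsPerChild", NULL               /* threads, never processes */
};

/* Returns 1 if the MPM owning `table` recognises `directive`. */
int mpm_accepts(const char *const *table, const char *directive)
{
    assert(table != NULL && directive != NULL);
    for (; *table != NULL; table++)
        if (strcmp(*table, directive) == 0)
            return 1;
    return 0;
}
```

With a PREFORK server, `mpm_accepts(prefork_directives, "ThreadsPerChild")` returns 0, which matches the point that a PREFORK administrator is never asked about threads.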
Modules written for 1.3 will not work with 2.0 without
modification. There are many changes which will be documented
by the time 2.0 is released.
In Apache 1.3, each module uses a table of callback routines
and data structures. Instead of using this table to specify
which functions to use when processing a request, 2.0 modules
will have a new function to register any callbacks needed.
In the past, new features have been added to subsequent
releases of Apache which required the callback table to be
expanded, causing existing modules to break. In 2.0, each
module is able to define how many callbacks it wants to use
instead of using a statically defined table with a set number
of callbacks. If the Apache Group decides to add callbacks in
the future, the changes are less likely to affect existing
modules.
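The difference between a fixed callback table and a registration function can be shown in a few lines of C. This is a simplified, hypothetical model, not the actual 2.0 module API: `struct module`, `register_handler`, `run_handlers` and the example module are all invented names. The module record holds a single function pointer, so the server could later add new registration functions without changing the record's layout and breaking compiled modules.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(const char *uri);   /* returns 1 if it handled uri */

/* Server side: callbacks registered by modules (capped for simplicity). */
#define MAX_HANDLERS 16
static handler_fn handlers[MAX_HANDLERS];
static size_t n_handlers;

void register_handler(handler_fn fn)
{
    assert(fn != NULL && n_handlers < MAX_HANDLERS);
    handlers[n_handlers++] = fn;
}

/* Call handlers in registration order until one handles the request. */
int run_handlers(const char *uri)
{
    for (size_t i = 0; i < n_handlers; i++)
        if (handlers[i](uri))
            return 1;
    return 0;
}

/* The module record: one registration hook instead of a slot per phase. */
struct module {
    const char *name;
    void (*register_hooks)(void);
};

void load_module(const struct module *m)
{
    m->register_hooks();
}

/* Example module that only cares about one phase of the request. */
static int example_handler(const char *uri)
{
    return uri[0] == '/';
}

static void example_register_hooks(void)
{
    register_handler(example_handler);
}

const struct module example_module = { "example", example_register_hooks };
```

After `load_module(&example_module)`, a request for `/index.html` is handled by the example module, and the module never had to fill in slots for phases it does not use.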
Many things have been abstracted in Apache 2.0 and there are
many new functions available. This means it will no longer be
possible to access most of the internals of Apache data
structures directly. For example, if a module needs access to
the connection in order to send data to the client, it will
have to use the provided functions rather than access the
socket directly.
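In C, this kind of abstraction is usually done with an opaque type: modules see only a declaration and a set of functions, while the structure's fields stay private to the server. The sketch below is hypothetical (`conn`, `conn_write` and `module_send_greeting` are invented names), and a memory buffer stands in for the socket so that the example is self-contained.

```c
#include <assert.h>
#include <string.h>

/* What a module sees, e.g. in a public header: */
typedef struct conn conn;   /* incomplete type: fields are hidden */
int conn_write(conn *c, const char *data, size_t len);

/* What the server keeps in its own source file. A memory buffer stands in
 * for the socket; a real server would send the bytes to the client. */
struct conn {
    char buf[256];
    size_t used;
};

int conn_write(conn *c, const char *data, size_t len)
{
    assert(c != NULL && data != NULL);
    if (len > sizeof c->buf - c->used)
        return -1;                      /* not enough room */
    memcpy(c->buf + c->used, data, len);
    c->used += len;
    return (int)len;
}

/* Module code: sends data only through the provided function. */
int module_send_greeting(conn *c)
{
    static const char msg[] = "Hello\r\n";
    return conn_write(c, msg, sizeof msg - 1);
}
```

Because the module only calls `conn_write`, the server is free to change what sits behind the connection without the module being rewritten.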
The Apache Portable Runtime (APR) was originally designed as a way to combine code across
platforms. There are some sections of code that should be
different for different platforms as well as sections of code
that can safely be made common across all platforms.
Apache on Windows currently uses POSIX functions and types
that are non-native and non-optimised for communicating
across a network. Replacing these functions and types with
their native Windows equivalents has resulted in a significant
performance improvement. Spawning CGI processes, on the other
hand, is an example of code that can be made common: it is very
confusing in Apache 1.3 because Unix, Windows, and OS/2 all
handle spawning in different ways. By using APR, the logic for
spawning CGI processes can be combined, reducing the number of
platform-specific bugs introduced later.
APR will make porting Apache to additional platforms easier.
Any platform with a fully implemented APR layer should be able
to run Apache. APR is small and well defined, and once it is
fully integrated into Apache, it will change very little in the
future. Apache has never been well defined for porting
purposes as there was too much code to make porting a simple
task. In addition, the code was originally designed for use
on Unix, which made porting to non-POSIX platforms very
difficult. With APR, all a developer needs to do is implement
the APR layer. APR was designed with Windows, Unix, OS/2, and
BeOS in mind and is more flexible as a result.
APR acts as the abstraction layer in Apache 2.0. To allow the
use of native types for the best performance, APR wraps each
native type, such as a socket, in a single portable type that
Apache uses regardless of the platform. The underlying
type is invisible to the Apache developer, who is free to
write code without worrying about how it will work on
multiple platforms.
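A portable type that hides a native one can be sketched as follows. This illustrates the idea only and is not APR's real interface: `psock`, `psock_wrap`, `psock_send`, `psock_recv` and `psock_close` are hypothetical names. On Windows the wrapper holds a Winsock `SOCKET`, elsewhere a POSIX file descriptor, and code that uses the `psock_` functions compiles unchanged on both.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _WIN32
#  include <winsock2.h>
typedef SOCKET native_sock;                /* Winsock handle */
#  define native_close(fd) closesocket(fd)
#else
#  include <sys/socket.h>
#  include <unistd.h>
typedef int native_sock;                   /* POSIX file descriptor */
#  define native_close(fd) close(fd)
#endif

/* The portable type: callers only ever hold a pointer to it. */
typedef struct psock {
    native_sock fd;
} psock;

psock *psock_wrap(native_sock fd)
{
    psock *s = malloc(sizeof *s);
    if (s != NULL)
        s->fd = fd;
    return s;
}

/* Same signature on every platform; the casts suit both send() variants. */
long psock_send(psock *s, const void *buf, size_t len)
{
    assert(s != NULL);
    return (long)send(s->fd, (const char *)buf, (int)len, 0);
}

long psock_recv(psock *s, void *buf, size_t len)
{
    assert(s != NULL);
    return (long)recv(s->fd, (char *)buf, (int)len, 0);
}

void psock_close(psock *s)
{
    if (s != NULL) {
        native_close(s->fd);
        free(s);
    }
}
```

On Windows a program would also have to call `WSAStartup` before using any sockets; that set-up step is omitted from the sketch.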
Apache 2.0 is a major re-working of Apache that will
hopefully result in a web server that can continue to grow
and serve the web. As has been traditional with previous
Apache releases, the 2.0 upgrade will be made available when
it is ready and stable. There is no promised release date
although it is hoped that a beta version will be available
either late in 1999 or early in 2000.
This article covers some of the major changes in Apache 2.0,
such as MPMs, module callbacks, and the abstraction layer.
Future editions of Apache Week will report on the progress of
Apache 2.0 and highlight any major developments.