In this issue
Release: 1.1.3 (Released 14th January 1997)
Beta: 1.2b7 (Released 22nd February 1997)
Bugs reported in 1.2b7:
- Trailing slash after a reference to a type-map (.var) URL
  causes Apache child to enter a loop and consume resources
- suExec bug allows CGI program to write to the suExec log
  file
- Possible memory corruption if the local host name is longer
  than 128 characters
- The default virtual host (as designated with fake address
  255.255.255.255) does not work on all systems
Bugs fixed in next release:
- OS specific updates for FreeBSD 2.2, MachTen
- Proxy module could close files too early
- File descriptor leak if client disconnects immediately
  after connection is established
- Server could use wrong virtual host when using non-ip
  vhosts
Patches to fix some Apache 1.2b6 bugs are available in the 1.2b6
patch directory
Apache is currently in a 'beta release' cycle. This is where
it is made available prior to full release for testing by
anyone interested. Normally during the beta cycle no new
major features will be added. The full release of Apache 1.2
is expected this month.
The March server
survey shows that over 356,000 sites now use Apache,
almost 43% of the servers surveyed. The percentage change in
share for Apache (+1.67%) was higher than both Netscape
(-0.16%) and Microsoft (+0.72%).
PC Magazine in
the UK recently published performance comparisons of several
web servers. Apache was reported as the slowest server on
test, handling 100 requests per second, compared to
Microsoft's IIS which could handle 900 requests per second.
While they no doubt did get these figures from their tests,
there are a number of reasons why these figures may not
reflect the true capabilities of the servers. The article is
not available online.
Firstly, they tested Apache version 1.4.1. This is not an
Apache version number, and the URL they give is for NCSA
httpd, so they appear to have tested NCSA httpd instead.
Apache was originally based on NCSA 1.3, but is now very
significantly different, with many changes designed to
increase performance over NCSA. Figures from a test of NCSA
httpd cannot be applied to Apache. Even assuming that they
did test Apache, their figures may not be valid, for the
following reasons.
Secondly, web server software is complex and highly
configurable. It is probably possible to get a wide variety
of performance figures by changing the configuration. In
particular, Apache requires configuration to get the best
performance. If this was not done, poorer results will be
obtained. They gave no details of whether or how they
configured the servers.
Thirdly, they turned on the local caching option on IIS and
Netscape Enterprise, but not on Apache. For a valid
comparison they should have used either Apache's proxy
module, or a local cache such as Squid. (Of course, since
they were not even testing Apache they could not use the
module).
Fourthly, they used very different hardware for the Windows
and Unix based tests: for Windows, they used a dual processor
Pentium Pro system, while they ran the Unix server on an SGI
O2. A better test would have been to use the same hardware,
with a PC version of Unix (Linux, BSD or similar).
Fifthly, they used a local 100Mbps Ethernet for the testing.
Apache contains a number of features designed to make it work
more efficiently over real-world Internet connections where
the setup and transmission times are not negligible.
Lastly, the report did not clearly identify what was being
measured. There are several ways to measure server
performance (such as hits per second, concurrent requests
processed or overall response time per request), all of which
are valid in some situations. They appear to have tested hits
per second, which on a very fast local network would tend to
reduce the number of concurrent requests the servers have to
process. In a real-life situation servers would have to cope
with requests that take longer to transmit, so they would
have more concurrent requests.
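The link between hit rate and concurrency can be shown with a little arithmetic. The sketch below assumes the steady-state relation usually known as Little's law (average requests in progress = arrival rate x average time per request). The hit rate and request times are made-up figures for illustration, not measurements from the magazine's tests.

```python
def concurrent_requests(hits_per_second, seconds_per_request):
    """Average number of requests in progress at once (Little's law)."""
    return hits_per_second * seconds_per_request

# Hypothetical figures: the same hit rate on a fast local network
# and over slow Internet links.
lan = concurrent_requests(100, 0.01)      # 10 ms per request
internet = concurrent_requests(100, 2.0)  # 2 s per request

print(lan, internet)
```

Under these assumed figures the server on the local network rarely has more than one request in progress, while the same hit rate over slow links keeps a couple of hundred requests open at once.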
By comparison, a
Benchmark Report by BSDI compared MS IIS and Apache in
August 1996 and came out with very different results. While
the versions of the software they used are now out of date,
this report explains in detail how the servers were
configured and what tests were performed. They also used the
same hardware for both tests.
The easiest way to serve up pages is to store them in HTML
format files. This is simple and efficient. However there are
many cases where you might want to generate a page
'on-the-fly': to add information which changes on each
request, or to get information from a database. There are
many different ways that you can add this sort of "dynamic"
page to your site. In this feature we look at the range of
options, from simple in-line HTML to full programming
languages and CGI.
When choosing how to generate dynamic pages there are several
things to consider:
- Performance: dynamic pages require more work on the
  server, so are less efficient than static files, but some
  types of dynamic pages are more resource efficient than
  others.
- Complexity: dynamic features can be generated from
  relatively simple code built into HTML pages (called
  "embedded"), through to self-contained programs written in
  C or perl, using the CGI interface.
- Security: some methods of generating dynamic pages
  allow you to use a programming or scripting language on
  your server. If the pages are poorly written, there is a
  risk of letting users access things on your system that
  they should not.
Traditionally there were three ways of getting dynamic pages on
your site: use "server side includes" (SSI) inside HTML pages,
use a scripting language such as Perl or PHP, or use a compiled
programming language such as C or Pascal. Both scripts and
compiled programs were accessed using "CGI". But the
distinctions are becoming more blurred. SSI as implemented in
Apache 1.2 now has variables and conditional execution, making
it more like a scripting language, while the PHP scripting
language can be embedded into HTML pages. There is even a
module to embed perl commands into HTML pages.
Also, many scripting languages can be built into Apache as
Apache modules, rather than using CGI. This makes executing
the scripts much more efficient, since an interpreter does
not need to be started for every request.
Complexity
There are two ways to get the server to run your programs:
either embed a script into an HTML document, or create a
standalone program which makes use of the CGI interface.
Embedded scripts are easier to write but restrict you to the
languages available for embedding, while CGI can be used with
any language.
The traditional embedded language is "Server-Side Includes"
(SSI) but other scripting languages are available which can
be embedded. Embedded commands are executed by the server
before it serves the page to the client (so serving HTML
pages containing embedded commands is slower than serving
straight HTML pages). Embedded pages can be processed either
by an Apache module or a CGI program. Using a module will be
much faster. Languages available for embedded use include
SSI, PHP, Perl and NeoScript (of these, SSI is built into
Apache by default, while the others require a new module to
be compiled in).
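To illustrate the idea of commands being expanded before the page is sent, here is a minimal Python sketch of a processor for one SSI-style directive, `<!--#echo var="NAME" -->`. It is not how Apache implements SSI. The function name `expand_page`, the sample variable value and the "(none)" placeholder for unknown variables are assumptions made for this example.

```python
import re

# Matches directives of the form <!--#echo var="NAME" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')

def expand_page(html, variables):
    """Replace each echo directive with the value of the named variable."""
    def substitute(match):
        # "(none)" is a placeholder chosen for this sketch.
        return variables.get(match.group(1), "(none)")
    return ECHO_DIRECTIVE.sub(substitute, html)

page = '<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>'
print(expand_page(page, {"DATE_LOCAL": "Monday, 10-Mar-97"}))
```

The extra scan over every page is the reason pages with embedded commands cost more to serve than plain HTML files.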
The alternative to embedding the commands into HTML is to
write self-contained programs. These usually use the
CGI, or Common Gateway Interface, to work with the
server. The CGI
specification says how servers should talk to the script
or program and how the script or program formats its reply
for use by the server. CGI is not a language itself.
If you know the CGI protocol you can write programs for use
with a web server in any language.
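As a rough illustration of that exchange, the Python sketch below reads the request from environment variables set by the server and writes a reply made of a header, a blank line and the page body. `QUERY_STRING` and the `Content-type` header are part of CGI; the function name `build_response` and the `name` parameter are invented for this example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

def build_response(environ):
    """Build a CGI reply: header lines, a blank line, then the body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in the page.
    body = "<html><body><p>Hello, %s!</p></body></html>\n" % escape(name)
    return "Content-type: text/html\n\n" + body

if __name__ == "__main__":
    # The server passes request details in environment variables
    # and reads the reply from the program's standard output.
    sys.stdout.write(build_response(os.environ))
```

The same structure could be written in C, Perl or a shell script, since the interface consists only of environment variables, standard input and standard output.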
Performance
If you want better performance from your pages (by
performance we mean low use of resources, resulting in more
pages served more quickly), you should use either a
pre-compiled language (such as C) and CGI, or a scripting
language which is available as an Apache module. In the case
of the perl and python modules, preload scripts or data that
will be used often.
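The benefit of preloading can be sketched in a few lines. This is a generic Python illustration of the pattern, not the API of the Perl or Python Apache modules: `load_templates` and `handle_request` are hypothetical names, and the counter exists only to show how often the expensive step runs.

```python
load_count = 0

def load_templates():
    """Stand-in for an expensive startup step, such as parsing templates."""
    global load_count
    load_count += 1
    return {"greeting": "Hello, %s!"}

# Preload once, when the long-running interpreter starts.
TEMPLATES = load_templates()

def handle_request(name):
    """Serve a request using data that is already in memory."""
    return TEMPLATES["greeting"] % name

for visitor in ("alice", "bob", "carol"):
    handle_request(visitor)

print(load_count)
```

With plain CGI the loading step would run again for every request, because each request starts a fresh interpreter.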
Of course the best performance can be obtained by using
static pages instead of dynamic ones. You might consider
pre-generating HTML files, rather than serving up dynamic
pages if possible. For example, if your readers access pages
from a database, it might be faster to export those pages
into HTML every so often, rather than look up the records in
the database for every request.
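A minimal sketch of such an export step, in Python, might look like the following. It assumes a hypothetical `pages` table with `id`, `title` and `body` columns, uses an in-memory SQLite database as a stand-in for the real one, and writes to a temporary directory rather than a real document root.

```python
import os
import sqlite3
import tempfile
from html import escape

def export_pages(conn, out_dir):
    """Write one static HTML file per database record; return the file names."""
    written = []
    rows = conn.execute("SELECT id, title, body FROM pages ORDER BY id")
    for page_id, title, body in rows:
        name = "page%d.html" % page_id
        with open(os.path.join(out_dir, name), "w") as f:
            f.write("<html><head><title>%s</title></head><body>%s</body></html>\n"
                    % (escape(title), escape(body)))
        written.append(name)
    return written

# Sample data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)",
                 [("News", "Apache 1.2 beta"), ("FAQ", "Questions & answers")])
out_dir = tempfile.mkdtemp()
files = export_pages(conn, out_dir)
print(files)
```

A script like this could be run at regular intervals (from cron, for example), so that the server only ever serves the resulting static files.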
Alternatively (or in addition) consider using a local cache
in front of your Apache server. The client would connect to
the cache first, and if that page has already recently been
requested, the cache would return it without calling the
server. This sort of local cache is also called a "server
accelerator". Your dynamic pages will have to be set up to
allow them to be cached though (SSI pages, for example, are
not cacheable).
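The caching behaviour described above can be sketched as a small in-memory cache placed in front of a page generator. This is only an illustration of the idea, not how Squid or Apache's proxy module work internally. The class `PageCache`, the 60 second lifetime and the `slow_backend` function are all assumptions made for the example.

```python
import time

class PageCache:
    """Keep recently generated pages for max_age seconds."""

    def __init__(self, backend, max_age=60.0, clock=time.time):
        self.backend = backend    # function that generates a page for a URL
        self.max_age = max_age
        self.clock = clock
        self.store = {}           # url -> (time stored, page)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]       # fresh copy: the backend is not called
        page = self.backend(url)
        self.store[url] = (now, page)
        return page

calls = []

def slow_backend(url):
    """Stand-in for the real server generating a dynamic page."""
    calls.append(url)
    return "<html>page for %s</html>" % url

cache = PageCache(slow_backend, max_age=60.0)
cache.get("/index.html")
cache.get("/index.html")
print(len(calls))
```

The second request is answered from the cache, so the backend runs only once within the page's lifetime.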
Security
Security is a very important consideration when thinking about
dynamic pages. All CGI programs, both scripted and compiled,
are potentially insecure. You have to be very careful when
writing CGI programs, for instance, to ensure that Internet
users cannot execute programs on your server or read files
they should not have access to.
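One common mistake is to build a file name directly from user input. The Python sketch below shows one way a CGI program might check that a requested file stays inside the directory it is meant to serve. The function name `resolve_under_root` and the `/var/www` path are hypothetical, and the check assumes Unix-style paths.

```python
import os

def resolve_under_root(doc_root, requested):
    """Return the real path for requested if it stays inside doc_root, else None."""
    root = os.path.realpath(doc_root)
    # realpath resolves ".." components and symbolic links.
    target = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    if os.path.commonpath([root, target]) != root:
        return None
    return target

# An attempt to climb out of the document root is refused.
print(resolve_under_root("/var/www", "../../etc/passwd"))
```

A check like this covers only one class of problem. Input passed to a shell or to other programs needs similar care.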
Another security issue which might be important is related to
other local users. For example, you might want to let your
customers or colleagues use a dynamic language. But if you
let them write CGI programs they could write a program which
accesses other people's files (since by default all CGI
programs run as the same user). More limited scripted
languages (such as SSI) might be safer in this situation.
Dynamic Page Languages
Finally, here is a reference list of ways of including
dynamic pages on your site.
Language | Embedded? | Apache Module? | Description
SSI | Yes | Yes | Traditional "Server Side Includes" allow simple dynamic pages. Apache 1.2 extends SSI to include variables and conditional code. Already part of Apache. Because of the restricted range of commands this can be more secure than other languages, and Apache has the ability to turn off some less secure features.
PHP | Yes | Yes | A more comprehensive embedded language than SSI, with built-in support for various databases (such as mSQL, mySQL, DBM) and page counters.
NeoScript | Yes | Yes | An embedded scripting language based on Tcl.
Meta-HTML | Yes | No | An extended version of SSI.
Python | No | Yes | Python is an interpreted object-orientated language. This module builds the Python interpreter into Apache for better performance than normal CGI.
Embedded Perl (ePerl) | Yes | No | Perl is a powerful general purpose interpreted (scripting) language. This module lets you embed arbitrary Perl commands into your HTML.
Perl Module | No | Yes | Perl is an advanced interpreted language. This very powerful module integrates Perl into Apache, letting you write Apache modules in perl. This gives you much more access to and control over the server than CGI programs in Perl (which this module also supports). The ability to write modules in perl makes it possible to extend the server's functionality relatively easily, without the complexity of writing a module in C.
Compiled languages (C, Pascal, Fortran, etc) | No | No | Facilities available depend on language. Usually more efficient than scripted or embedded languages. Has to be written to use CGI protocol.
Scripting languages (Perl, Python, shell, etc) | No* | For Some Languages | Facilities available depend on language. Unless an Apache module is available, has to be written to use CGI protocol. When using CGI, is less efficient than compiled languages or scripting languages using an Apache module. (Note: * Perl can be embedded if the eperl module is used).
Summary
It is impossible to recommend the "best" dynamic page
language since what is best will depend on your needs.
However some general conclusions can be drawn.
If you do not already know a scripting or programming
language, use one of the embedded languages. SSI is probably
the simplest, but PHP has some useful extra features.
If you want a language that is quick to develop in and
efficient, use an embedded language such as PHP or embedded
perl, or use perl with the perl module. If you prefer other
scripting languages, use one with an Apache module (e.g.
python). If you already use perl CGI programs, consider
moving over to using the perl module, which will give you
much better performance and more control over the server.
If you want a "full" programming language for arbitrary
programs, either use any compiled language (e.g. C) or use
perl with the perl module. If you've been put off Perl
because of concerns about performance, think again. The
module makes it very efficient, and the ease of development
and large range of add-on perl modules (packages) make
developing applications more convenient.
The final way to make a top-performance dynamic page is to
write an Apache module. This is complex and requires care to
ensure that you do not "leak" resources or affect the rest of
the server, but will give the best performance. Modules have
to be written in C (although it might be possible to
link in other languages). An alternative to writing modules
in C is to use the perl module, which lets you develop Apache
modules in perl.