First published: 26th July 1996
Content Negotiation Explained
Content negotiation is an often overlooked feature of
Apache, but used correctly it lets you present documents
in different languages and formats based on what the user
wants. Apache is one of the few servers that actually
implements content negotiation. However, there are a few
problems caused by browsers which do not do the right thing.
We explain how to use negotiation correctly, and why some
browsers make this difficult.
Why content negotiation is needed
Content negotiation is a very powerful tool where the browser
says what type of information it can accept, and the server
decides what (if any) type of information to return. The term
type is used very loosely here, because negotiation
can apply to several aspects of the information. For example,
it can be used to choose the appropriate human language for a
document (say, French or German), or to choose the media type
that the browser can display (say, GIF or JPEG).
In order for the server to deliver the correct representation
of the data, the browser must send some information about
what it can accept. A browser used on a French-language
machine, for instance, should indicate that it can accept
data in French (of course, this should also be
user-configurable).
The most common use of content negotiation at the moment is
to select data based on media type. Here, the browser says
what sort of data it can display. For example, when
requesting an inline image, the browser could tell the server
that it can accept GIF and JPEG images. In fact, the browser
might prefer JPEG over GIF images because they are quicker
to download, so it can specify this as well. The ability to
indicate what content types a browser can accept is
particularly important now that plug-ins can extend the
browser's capabilities. Unfortunately, many current browsers
don't supply the correct information to the server.
Using Negotiation
To use negotiation, you need two things. Firstly, you need a
resource that exists in more than one format (for example, a
document in French and German, or an image stored as a GIF
and a JPEG), and secondly you need to configure Apache to
know that each of these files is actually the same resource.
Apache has two methods for doing this: either using a special
index file to identify the various versions of the
information, or using the MultiViews facility where
Apache gets the information it needs from file extensions.
Using a Variants File
The first method involves creating a variants file,
usually referred to as a var file. This lists each of
the files which contains the same resource, along with
details of what representation it is. Any request for this
var file causes Apache to return the best file, based on the
contents of the var file and the information supplied by the
browser.
To get Apache to use variant files, first uncomment the
following line in srm.conf:
AddHandler type-map var
and restart the server as normal.
As an example, say there is a file in English and a file in
German containing the same information. The files could be
called english.html and german.html (they are both HTML
files). To negotiate between them, create a var file called
(say) info.var which lists each of these files and specifies
which language it is in:
URI: english.html
Content-Language: en

URI: german.html
Content-Language: de
This file consists of a series of sections, separated by
blank lines. Each section contains the name of the file (on
the URI: line) and header information used in the
negotiation.
Now, when a request for info.var is received, the server will
read the var file and return the best file, based on which
languages the browser has said it can accept. Similarly, the
var file could be used to select files based on content type
(using Content-Type:) or content encoding (using
Content-Encoding:), or any combination.
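To make the selection step more concrete, here is a minimal
Python sketch of reading the info.var file above and picking
a file by language. It is an illustration only, not Apache's
actual algorithm: the function names parse_type_map and
pick_by_language are made up for this example, and the list
of accepted languages is assumed to be already in the
browser's order of preference.

```python
def parse_type_map(text):
    """Parse var-file text into a list of dicts, one per section."""
    variants = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current section.
            if current:
                variants.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        variants.append(current)
    return variants


def pick_by_language(variants, accepted_languages):
    """Return the URI of the first variant in an accepted language.

    accepted_languages is assumed to be ordered, most preferred first.
    """
    for language in accepted_languages:
        for variant in variants:
            if variant.get("content-language") == language:
                return variant["uri"]
    return None


INFO_VAR = """\
URI: english.html
Content-Language: en

URI: german.html
Content-Language: de
"""

variants = parse_type_map(INFO_VAR)
print(pick_by_language(variants, ["de", "en"]))  # german.html
```

A browser that only accepts a language not listed in the var
file gets None here; what a real server returns in that case
is not covered by this sketch.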
The Content-Type: line in a variants file can also give any
other content type parameters, such as the subjective quality
factor. This will be used in the negotiation when picking the
'best' match. For example, an image available as a JPEG might
be regarded as having higher quality than the same image in
GIF format. To tell this to the server, the following .var
contents could be used:
URI: image.jpg
Content-Type: image/jpeg; qs=0.6

URI: image.gif
Content-Type: image/gif; qs=0.4
Here the qs parameters give the 'source quality' for
these two files, in the range 0.000 to 1.000, with the
highest value being the most desirable. A browser that
indicates it can handle both GIF and JPEG files equally would
see the JPEG version rather than the GIF.
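As a rough sketch of how the qs values decide this case, the
Python below splits each Content-Type: value into a media
type and its source quality, then picks the highest qs among
the types the browser accepts. The helper name
parse_content_type is hypothetical, and treating a missing qs
as 1.0 is an assumption of this example.

```python
def parse_content_type(value):
    """Split 'image/jpeg; qs=0.6' into ('image/jpeg', 0.6)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0]
    qs = 1.0  # assumption: a missing qs counts as full quality
    for param in parts[1:]:
        name, _, raw = param.partition("=")
        if name.strip() == "qs":
            qs = float(raw)
    return media_type, qs


# URI -> parsed Content-Type line, taken from the var file above.
variants = {
    "image.jpg": parse_content_type("image/jpeg; qs=0.6"),
    "image.gif": parse_content_type("image/gif; qs=0.4"),
}

# A browser that accepts both types equally: the higher qs wins.
accepted = {"image/jpeg", "image/gif"}
candidates = {
    uri: qs for uri, (media_type, qs) in variants.items()
    if media_type in accepted
}
best = max(candidates, key=candidates.get)
print(best)  # image.jpg
```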
Using variant files gives complete control over the scope of
the negotiation, but it does require a var file to be
created and maintained for each resource. An alternative
interface to the negotiation mechanism is to get Apache to
identify the negotiation parameters (language, content type,
encoding) from the file extensions.
Using File Extensions
Instead of using a var file, file extensions can be used to
identify the content of files. For example, the extension
eng could be used on English files, and ger
on German files. Then the AddLanguage directive can
be used to map these extensions onto the standard language
tags.
To use this feature, the MultiViews option must
first be turned on in the directory, either in access.conf or
a .htaccess file. Note that Options All does
not turn on multiviews.
After enabling multiviews, the directives which map
extensions onto representation types can be given. These are
AddLanguage, AddEncoding and
AddType (content types are also set in the
mime.types file). For example:
AddLanguage en .eng
AddLanguage de .ger
AddEncoding x-compress .Z
AddType application/pdf pdf
(The last line is shown as an example only; on recent Apache
versions this is actually set in the mime.types file.)
When a request is received, the server looks at all the files
in the directory which start with the same filename.
So a request for /about/info would cause the server to
negotiate between all the files named /about/info.*
For each matching file, the server checks its extensions and
sets the content type, language and encodings appropriately.
For example, a file called info.eng.html would be
associated with the language tag en and the content
type text/html. The source quality is assumed to be
1.000 for all files (this can actually be set on the mime
type, like "text/html;qs=0.5" but this confuses most browsers
so is probably best not used).
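The extension lookup can be pictured with a short Python
sketch. The three tables mirror the AddLanguage, AddEncoding
and AddType examples above (plus the html entry that would
normally come from mime.types); the function name describe
and the dictionary layout are inventions for this example,
not anything Apache exposes.

```python
# Extension tables mirroring the example directives above.
LANGUAGES = {"eng": "en", "ger": "de"}                    # AddLanguage
ENCODINGS = {"Z": "x-compress"}                           # AddEncoding
TYPES = {"html": "text/html", "pdf": "application/pdf"}   # AddType / mime.types


def describe(filename):
    """Return the negotiation attributes implied by a file's extensions."""
    base, *extensions = filename.split(".")
    info = {
        "name": base,
        "type": None,
        "language": None,
        "encoding": None,
        "qs": 1.0,  # source quality is assumed to be 1.000
    }
    for ext in extensions:
        if ext in LANGUAGES:
            info["language"] = LANGUAGES[ext]
        elif ext in ENCODINGS:
            info["encoding"] = ENCODINGS[ext]
        elif ext in TYPES:
            info["type"] = TYPES[ext]
    return info


print(describe("info.eng.html"))
```

Because each extension is looked up on its own, the result
does not depend on the order in which the extensions appear
in the filename.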
The extensions can be listed in any order, and the request
itself can include one or more extensions. For example, the
files info.html.eng and info.html.ger could be requested with
the URL info.html. This provides an easy way to
upgrade a site to use negotiation without having to change
existing links.
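The matching rule can be sketched in a few lines of Python:
a file is a candidate if its name is the requested name
followed by one or more further extensions. The function name
candidates and the sample directory listing are made up for
illustration.

```python
def candidates(requested, directory_listing):
    """Return files named as the request plus at least one more extension."""
    prefix = requested + "."
    return sorted(name for name in directory_listing if name.startswith(prefix))


listing = ["info.html.eng", "info.html.ger", "index.html", "information.txt"]

print(candidates("info.html", listing))  # ['info.html.eng', 'info.html.ger']
print(candidates("info", listing))       # ['info.html.eng', 'info.html.ger']
```

Note that information.txt is not a candidate for a request
for info, because the match is on the whole name up to a dot
and not on any leading characters.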
Of course, for negotiation to work, browsers must send the
correct information. While most make a reasonable attempt,
there are some problems.
What Browsers Do
For negotiation to work, browsers must send the correct
request information. For human languages, browsers should let
the user pick which language or languages they are interested
in. Recent beta versions of Netscape let the user select one
or more languages (see the Options, General Preferences,
Languages section).
For content-types, the browser should send a list of types it
can accept. For example, "text/html, text/plain, image/jpeg,
image/gif". Most browsers also add the catch-all type of
"*/*" to indicate that they can accept any content type. The
server treats this entry with lower priority than a direct
match.
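One simple way to picture the lower priority of the catch-all
is to rank a direct match above a match that only succeeds
through "*/*". The Python below is a sketch of that idea
alone, and not a description of how Apache implements it; the
function names match_rank and pick are hypothetical.

```python
def match_rank(media_type, accepted):
    """Return 2 for a direct match, 1 if only '*/*' matches, 0 otherwise."""
    if media_type in accepted:
        return 2
    if "*/*" in accepted:
        return 1
    return 0


def pick(available, accepted):
    """Pick the available type with the best rank, or None if none match."""
    ranked = [(match_rank(t, accepted), t) for t in available]
    ranked = [r for r in ranked if r[0] > 0]
    if not ranked:
        return None
    best_rank = max(rank for rank, _ in ranked)
    # Among equally ranked types, keep the server's own order.
    return next(t for rank, t in ranked if rank == best_rank)


accepted = ["text/html", "text/plain", "image/jpeg", "image/gif", "*/*"]
print(pick(["application/pdf", "text/html"], accepted))  # text/html
```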
Unfortunately, the */* type is sometimes used instead of
listing explicitly acceptable types. For example, if the
Adobe Acrobat Reader plug-in is installed into Netscape,
Netscape should add application/pdf to its acceptable content
types. This would let the server transparently send the most
appropriate content type (PDF files to suitable browsers,
else HTML). Netscape does not send the content types it can
accept, instead relying on the */* catch-all. This makes
transparent content-negotiation impossible.
In addition, most browsers do not indicate a preference for
particular types. This should be done by adding a preference
factor (q) to the content type. For example, a browser
which can accept Acrobat files might prefer them to HTML, so
it could send an accept type list which includes
text/html; q=0.7, application/pdf; q=0.8. When the
server handles the request, it would combine this information
with its source quality information (if any) to pick the
'best' content type to return.
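A minimal Python sketch of this combination follows. It
assumes the two factors are multiplied together, which is a
common way to combine them but is not spelled out in this
article. The names parse_accept and choose, and the
report.html and report.pdf files, are hypothetical.

```python
def parse_accept(header):
    """Parse 'text/html; q=0.7, application/pdf; q=0.8' into {type: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # no q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs[parts[0]] = q
    return prefs


def choose(variants, accept_header):
    """Pick the URI with the highest q * qs; variants map URI -> (type, qs)."""
    prefs = parse_accept(accept_header)
    best_uri, best_score = None, 0.0
    for uri, (media_type, qs) in variants.items():
        # Assumption: browser preference and source quality are multiplied.
        score = prefs.get(media_type, 0.0) * qs
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


variants = {
    "report.html": ("text/html", 1.0),
    "report.pdf": ("application/pdf", 1.0),
}
print(choose(variants, "text/html; q=0.7, application/pdf; q=0.8"))  # report.pdf
```

With equal source quality the browser's preference decides,
so the PDF is returned. If the server rated the PDF at
qs=0.5, the HTML version would win instead (0.7 against 0.4).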
HTTP/1.1
The new HTTP/1.1 specification defines, for the first time,
how content negotiation works. It also adds some new
facilities which are not yet available in any browser or
server. These include the ability for the server to return a
list of possible matches if it cannot identify the best one
to use. Apache implements the server end of HTTP/1.1 content
negotiation.