HTTP/1.1:
HTTP/1.1 is a major revision of the HTTP standard, which defines how browsers, servers and proxies communicate.
First published: 16th August 1996
The Hypertext Transfer Protocol
From version 1.2, Apache will be fully compliant
with the new HTTP/1.1 specification. This is the protocol which
tells browsers and servers how to communicate, and the features
added here determine how Web pages can be accessed. We take a
look at what HTTP/1.1 includes and what changes it will bring
to browsers and servers.
Part of Apache Week issue 28 (16th August
1996).
Hypertext Transfer Protocol (HTTP) defines how Web
pages are requested and transmitted across the Internet.
Almost all servers and browsers currently use version 1.0 of
this protocol, but a major update, version 1.1, has been
released. HTTP/1.1 adds a lot of new features to HTTP, which
in turn will lead to new capabilities in both servers and
browsers. We look at what is new in 1.1 and how it is likely
to affect the Web.
History of HTTP
HTTP was initially a very simple protocol used to request
pages from a server. The browser would connect to the server
and send a command like:
GET /welcome.html
and the server would respond with the contents of the
requested file. There were no request headers, no methods
other than GET, and the response had to be a HTML document.
This protocol was first documented as
HTTP/0.9. All current servers are capable of
understanding and handling HTTP/0.9 requests, but the
protocol is so basic it is not very useful today.
Browsers and servers extended the HTTP protocol from 0.9 with
new features such as request headers and additional request
methods. The resulting HTTP/1.0 protocol was only officially
documented in early 1996 with the release of RFC1945.
Servers and browsers had been using HTTP/1.0 for several
years by then.
Even while 1.0 was being documented, the next version was in
serious development. This time the specification was
developed first. This new version, 1.1, is now available as
RFC2068.
HTTP/1.1 will include a lot of new features, and will also
document for the first time some features already found in
servers or browsers.
A Quick Guide to how HTTP Works
Knowing how HTTP works is very useful for a server
administrator. It lets you check out the operation of your
server without having to fire up a browser, and gives you a
very useful diagnostic tool to check in detail how the server
responds to individual requests.
You can use telnet to emulate how a browser requests
documents from a server. With telnet you can connect to the
server, issue a request, and see what the server responds
with. For example, to get the home page from
www.apacheweek.com, you would use:
% telnet www.apacheweek.com 80
Connected to www.apacheweek.com.
GET / HTTP/1.0 [RETURN]
[RETURN]
This assumes you are connecting from a Unix system, starting
at the command prompt (%) and with a telnet command
available. You could also use any other telnet program, such
as the one in Windows 95. The GET line and the blank line
that follows are what you type.
The standard port for Web requests is port 80, so we connect
to that port number. Once connected we can type in and send a
HTTP request, followed by the request headers. In this case,
the request is GET / HTTP/1.0. The / is the resource
we want to obtain, and the HTTP/1.0 tells the server that
this is a HTTP/1.0 request. After entering this line, press
RETURN twice - the first ends the request line, and the
second marks the end of the optional request headers (in this
case, we did not enter any request headers). The server will
respond by sending a number of response headers, followed by
the text of the requested document.
It is often more convenient to send a 'HEAD' request instead
of 'GET'. This makes the server behave exactly as if it was
handling a GET, but it doesn't bother to send the actual
document. This makes it much easier to see the response
headers, and means you do not have to wait to download the
document itself. For example, to see the response headers
that www.apache.org sends for /, use:
HEAD / HTTP/1.0
HTTP/1.0 200 OK
Date: Fri, 16 Aug 1996 11:48:52 GMT
Server: Apache/1.1.1 UKWeb/1.0
Content-type: text/html
Content-length: 3406
Last-modified: Fri, 09 Aug 1996 14:21:40 GMT
Connection closed by foreign host.
The first response line is the status - in this case '200'
means the request is okay. The rest are response headers,
which give information either about the server or the
resource. For example, Server: gives the server
version, and Last-Modified: is the last modification
date of the file.
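The same parsing that a browser does on this response can be sketched in a few lines. The following Python snippet (illustrative only, not part of the protocol or of Apache) splits a raw response like the one above into a status code and a header dictionary:

```python
def parse_response(raw):
    """Split a raw HTTP response into (status code, header dict)."""
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The status line looks like: HTTP/1.0 200 OK
    _version, status, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them
        headers[name.strip().lower()] = value.strip()
    return int(status), headers

raw = ("HTTP/1.0 200 OK\r\n"
       "Date: Fri, 16 Aug 1996 11:48:52 GMT\r\n"
       "Server: Apache/1.1.1 UKWeb/1.0\r\n"
       "Content-type: text/html\r\n"
       "Content-length: 3406\r\n"
       "Last-modified: Fri, 09 Aug 1996 14:21:40 GMT\r\n"
       "\r\n")
status, headers = parse_response(raw)
print(status, headers["server"])   # 200 Apache/1.1.1 UKWeb/1.0
```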
New in HTTP/1.1
The basic operation of HTTP/1.1 remains the same as for
HTTP/1.0, and the protocol ensures that browsers and servers
of different versions can all interoperate correctly. If the
browser understands version 1.1, it uses HTTP/1.1 on the
request line instead of HTTP/1.0. When the server sees this
it knows it can make use of new 1.1 features (if a 1.1 server
sees a lower version, it must adjust its response to use that
protocol instead).
HTTP/1.1 contains a lot of new facilities. The main ones are:
hostname identification, content negotiation, persistent
connections, chunked transfers, byte ranges and support for
proxies and caches.
Hostname Identification
Every request sent using HTTP/1.1 must identify the hostname
of the request. For example, if the URL
http://www.apache.org/ is used, the request must include the
fact that the hostname part is 'www.apache.org'. In previous
versions of HTTP, the server never knew the hostname used in
the URL. Letting the server see the hostname allows the
implementation of non-IP virtual hosts. For example, if two
names, www.apache.org and www.someoneelse.com, point to the
same IP address, a HTTP/1.1 server can use the hostname it
receives to return different content for each request.
HTTP/1.0 servers cannot differentiate between these two
requests.
The hostname must be passed to the server either as a full
URI on the request line, or on the new Host: header.
For example, to test how www.apache.org responds to a
HTTP/1.1 request, you could send
GET / HTTP/1.1
Host: www.apache.org
Note that the HTTP version on the GET request is now
'HTTP/1.1'. If the request does not include the hostname,
either in the URI or in a Host: header, the server will
respond with an error.
Content Negotiation
Content Negotiation refers to the ability to have a number of
different versions of a single resource. For example, a
document might be available in English and French, with each
of these available as either HTML or PDF. The possible
responses are called representations or
variants.
There are actually two sorts of content negotiation:
-
Server-driven Negotiation
Here the server decides (or guesses) on the best
representation to send to the browser, based on information
the browser provides in the request
-
Agent-driven Negotiation
Here the server does not guess at the best representation,
but instead returns a list of the representations it has.
The browser can then either automatically request one of
these, or present a choice to the user.
The first type, server negotiation, has been implemented in
Apache since the summer of 1995 and is explained in a special
feature from Apache Week
issue 25. However, the HTTP/1.1 specification is the
first place it is officially documented.
The second type, agent negotiation, is not fully documented.
The HTTP/1.1 specification just contains basic definitions of
some of the headers to be used, but no details. The details
of content negotiation are being specified in an
Internet draft. This draft also expands on how
server-driven negotiation works, and defines how caches can
perform negotiation on behalf of either the server or the
user agent.
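The core of server-driven negotiation can be sketched very simply: the browser sends quality values (q-values) for what it accepts, and the server picks the available variant the browser rates highest. The Python sketch below is illustrative only; a real implementation, such as Apache's, weighs many more factors:

```python
def parse_accept_language(header):
    """Parse an Accept-Language value like 'fr, en;q=0.5'
    into a {language: quality} mapping."""
    prefs = {}
    for item in header.split(","):
        parts = item.strip().split(";")
        lang, q = parts[0].strip(), 1.0   # quality defaults to 1.0
        for p in parts[1:]:
            if p.strip().startswith("q="):
                q = float(p.strip()[2:])
        prefs[lang] = q
    return prefs

def choose_variant(available, accept_language):
    """Pick the available language the client rates highest."""
    prefs = parse_accept_language(accept_language)
    return max(available, key=lambda lang: prefs.get(lang, 0.0))

print(choose_variant(["en", "fr"], "fr, en;q=0.5"))  # fr
```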
Persistent Connections
Many pages today include inlined documents, usually images
but increasingly also sounds and other types such as
Shockwave presentations. These pages can be slow to download
because each item needs to be requested separately from the
server, each on a separate connection. Typically, for each
inline document the browser needs to connect to the server,
ask for the document, wait for it to be received, and
disconnect from the server. (Although some browsers can do
multiple requests in parallel).
This can be slow, especially across the Internet when there
is a delay involved in each connection and disconnection. To
help make pages with inline documents quicker to download,
HTTP/1.1 defines persistent connections where a number
of documents can be requested over a single connection, one
at a time.
An early implementation of persistent connections was known
as keep-alive, and Apache as well as a number of other
servers and browsers support this sort of connection.
However, persistent connections are first officially
documented in HTTP/1.1, and will be implemented slightly
differently from keep-alives.
For a start, in HTTP/1.1, persistent connections are the
default. Unless the browser explicitly tells the server not
to use persistent connections, the server should assume that
it might be getting multiple requests on a single connection.
Persistent connections are controlled by the
Connection header. Unless a
Connection: close header is given, the connection will remain open.
This can be tested by connecting to www.apache.org and
sending a simple request, for example:
% telnet www.apache.org 80
HEAD / HTTP/1.1
Host: www.apache.org
HTTP/1.1 200 OK
Server: Apache/1.3.0
...
where the connection will remain open for a short period
before closing (this is a server-configurable timeout). If
the same request is sent with a Connection: close
header, the connection will close immediately after the
response headers have been sent.
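The rule a server follows here is small enough to sketch directly. In this illustrative Python fragment (not taken from any server's source), a 1.1 connection persists unless told otherwise, while a 1.0 connection closes unless the older keep-alive extension was requested:

```python
def connection_persists(version, connection_header=None):
    """Decide whether the connection stays open after a response."""
    token = (connection_header or "").strip().lower()
    if version == "HTTP/1.1":
        return token != "close"        # persistent by default
    # HTTP/1.0: only the keep-alive extension keeps it open
    return token == "keep-alive"

print(connection_persists("HTTP/1.1"))           # True
print(connection_persists("HTTP/1.1", "close"))  # False
```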
Chunked Transfers
Normally, when sending back a response the server has to know
everything about the response it is about to send before it
sends it. For instance, servers should set the
Content-Length header on each response to the length
of the response itself. This can be difficult for the server
to do if the content is dynamically created (e.g. if it is
the output of a CGI script). So in practice servers
(including Apache) often do not send a Content-Length with
dynamic documents. This has not been a problem with HTTP/1.0,
but for persistent connections to work in HTTP/1.1, the
Content-Length must be known in advance.
The server could find out the length of the output of a CGI
script by reading it into memory until the script has
finished, then setting the Content-Length and returning the
stored content. This might be acceptable for small content,
but could be a problem if the CGI produces a lot of output.
One possible way around this is to use the new chunked
encoding method. This lets the server send output a bit at a
time. Each bit (or chunk) is small enough for its
content-length to be known before it is sent. Using chunked
encoding will let servers send out dynamic content that is
either large or produced slowly without having to disable
persistent connections.
In addition, after a chunked-encoded document has been
completely sent, additional response headers can be
transmitted. This could allow dynamically produced headers to
be associated with the document, even if they are not
available until after the script (or whatever produced the
document) has finished.
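The encoding itself is straightforward: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body. A sketch of an encoder and decoder (Python, illustrative only; the trailing headers mentioned above are omitted):

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as HTTP/1.1 chunks."""
    out = b""
    for piece in pieces:
        # Each chunk: hex length, CRLF, data, CRLF
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"   # zero-length chunk ends the body

def decode_chunked(data):
    """Decode a chunked body back into the original bytes."""
    body, pos = b"", 0
    while True:
        nl = data.index(b"\r\n", pos)
        size = int(data[pos:nl], 16)
        if size == 0:
            return body
        body += data[nl + 2:nl + 2 + size]
        pos = nl + 2 + size + 2   # skip chunk data and its CRLF

msg = encode_chunked([b"Hello, ", b"world"])
print(decode_chunked(msg))   # b'Hello, world'
```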
Byte Ranges
Byte ranges allow browsers to request parts of
documents. This can be used to continue an interrupted
transfer, or to obtain just part of a long document (say, a
single page).
Byte ranges are implemented by the Range header. For
example, to request just the second 500 bytes of a document,
the request would include:
Range: bytes=500-999
A single request can also ask for more than one range at once
(for example, it could ask for the first 500 bytes and the
last 500 bytes of a file). When the server replies, it will
send back each part in a single response, using MIME
multipart encoding to distinguish the parts.
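For a single range, serving the request amounts to slicing the resource. This Python sketch is illustrative only (the multipart encoding used for multiple ranges is not shown):

```python
def parse_range(header):
    """Parse 'bytes=500-999' into (first, last), both inclusive."""
    unit, _, spec = header.partition("=")
    assert unit == "bytes"
    first, _, last = spec.partition("-")
    return int(first), int(last)

def serve_range(resource, header):
    """Return the requested slice of the resource."""
    first, last = parse_range(header)
    return resource[first:last + 1]   # Range bounds are inclusive

data = bytes(range(256)) * 8          # a 2048-byte pseudo-document
part = serve_range(data, "bytes=500-999")
print(len(part))   # 500
```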
Proxies and Caches
HTTP/1.1 includes a lot of information and new features for
people implementing proxies and caches. Until now, the
operation of proxies and caches has been largely
undocumented. In addition to documenting how they are
supposed to work, HTTP/1.1 also includes a range of new
features to make implementing proxies and caches easier, and
in particular to reduce network traffic by allowing proxies
and caches to send more 'conditional' requests and to do
transparent content negotiation.
A conditional request is like a normal request, except
the sender (the proxy or cache server) includes some
information about whether it really needs the document. For
example, a proxy or cache can send an entity-tag which
identifies a document it already has, and the server only
sends back the document if the cache does not already have
this document. Conditional requests can also be based on the
last-modified time of the document.
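The server's side of an entity-tag conditional request can be sketched as follows. This Python fragment is illustrative only; real entity-tag comparison in HTTP/1.1 also distinguishes weak and strong tags, which is not shown:

```python
def respond_conditional(current_etag, if_none_match=None):
    """Return 304 if the cache's entity-tag still matches,
    otherwise 200 (meaning: send the full document)."""
    if if_none_match is not None and if_none_match == current_etag:
        return 304   # Not Modified: the cache may reuse its copy
    return 200       # the document changed (or no tag was sent)

print(respond_conditional('"v2"', '"v2"'))  # 304
print(respond_conditional('"v2"', '"v1"'))  # 200
```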
Other Changes
There are a lot of other changes between 1.0 and 1.1,
including
-
More status response codes
-
New request methods: OPTIONS, TRACE, DELETE, PUT
-
Digest authentication
-
Various new headers such as Retry-After: and Max-Forwards:
-
Definition of the media types message/http and
multipart/byteranges
How this will Affect Servers and Browsers
Users of the Web will notice the following major changes when
browsers and servers are available which implement HTTP/1.1:
-
Non-IP virtual Hosts
Virtual hosts can be used without needing additional IP
addresses.
-
Content Negotiation means more content types and better
selection
Using content negotiation means that resources can be
stored in various formats, and the browser automatically
gets the 'best' one (e.g. the correct language). If a best
match cannot be determined, the browser or server can offer
a list of choices to the user.
-
Faster Response
Persistent connections will mean that accessing pages with
inline or embedded documents should be quicker.
-
Better handling of interrupted downloads
The ability to request byte ranges will let browsers
continue interrupted downloads.
-
Better Behaviour and Performance from Caches
Caches will be able to use persistent connections to
increase performance both when talking to browsers and
servers. Use of conditionals and content negotiation will
mean caches can identify responses quicker.