
Content Disposition Header Example For Essay


PROPOSED STANDARD
Errata Exist
Internet Engineering Task Force (IETF)                      J. Reschke
Request for Comments: 6266                                  greenbytes
Updates: 2616                                                June 2011
Category: Standards Track
ISSN: 2070-1721

Use of the Content-Disposition Header Field in the Hypertext Transfer Protocol (HTTP)

Abstract

RFC 2616 defines the Content-Disposition response header field, but points out that it is not part of the HTTP/1.1 Standard. This specification takes over the definition and registration of Content-Disposition, as used in HTTP, and clarifies internationalization aspects.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc6266.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Reschke Standards Track [Page 1]
RFC 6266 Content-Disposition in HTTP June 2011

Table of Contents

1. Introduction
2. Notational Conventions
3. Conformance and Error Handling
4. Header Field Definition
    4.1. Grammar
    4.2. Disposition Type
    4.3. Disposition Parameter: 'Filename'
    4.4. Disposition Parameter: Extensions
    4.5. Extensibility
5. Examples
6. Internationalization Considerations
7. Security Considerations
8. IANA Considerations
    8.1. Registry for Disposition Values and Parameters
    8.2. Header Field Registration
9. Acknowledgements
10. References
    10.1. Normative References
    10.2. Informative References
Appendix A. Changes from the RFC 2616 Definition
Appendix B. Differences Compared to RFC 2183
Appendix C. Alternative Approaches to Internationalization
    C.1. RFC 2047 Encoding
    C.2. Percent Encoding
    C.3. Encoding Sniffing
Appendix D. Advice on Generating Content-Disposition Header Fields

1. Introduction

RFC 2616 defines the Content-Disposition response header field (Section 19.5.1 of [RFC2616]) but points out that it is not part of the HTTP/1.1 Standard (Section 15.5):

    Content-Disposition is not part of the HTTP standard, but since it is widely implemented, we are documenting its use and risks for implementers.

This specification takes over the definition and registration of Content-Disposition, as used in HTTP. Based on interoperability testing with existing user agents (UAs), it fully defines a profile of the features defined in the Multipurpose Internet Mail Extensions (MIME) variant ([RFC2183]) of the header field, and also clarifies internationalization aspects.
Note: This document does not apply to Content-Disposition header fields appearing in payload bodies transmitted over HTTP, such as when using the media type "multipart/form-data" ([RFC2388]).

2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

This specification uses the augmented BNF (ABNF) notation defined in Section 2.1 of [RFC2616], including its rules for implied linear whitespace (LWS).

3. Conformance and Error Handling

This specification defines conformance criteria for both senders (usually, HTTP origin servers) and recipients (usually, HTTP user agents) of the Content-Disposition header field. An implementation is considered conformant if it complies with all of the requirements associated with its role.

This specification also defines certain forms of the header field value to be invalid, using both ABNF and prose requirements (Section 4), but it does not define special handling of these invalid field values.

Senders MUST NOT generate Content-Disposition header fields that are invalid.

Recipients MAY take steps to recover a usable field value from an invalid header field, but SHOULD NOT reject the message outright, unless this is explicitly desirable behavior (e.g., the implementation is a validator). As such, the default handling of invalid fields is to ignore them.

4. Header Field Definition

The Content-Disposition response header field is used to convey additional information about how to process the response payload, and also can be used to attach additional metadata, such as the filename to use when saving the response payload locally.
4.1. Grammar

    content-disposition = "Content-Disposition" ":"
                           disposition-type *( ";" disposition-parm )

    disposition-type    = "inline" | "attachment" | disp-ext-type
                        ; case-insensitive
    disp-ext-type       = token

    disposition-parm    = filename-parm | disp-ext-parm

    filename-parm       = "filename" "=" value
                        | "filename*" "=" ext-value

    disp-ext-parm       = token "=" value
                        | ext-token "=" ext-value
    ext-token           = <the characters in token, followed by "*">

Defined in [RFC2616]:

    token         = <token, defined in [RFC2616], Section 2.2>
    quoted-string = <quoted-string, defined in [RFC2616], Section 2.2>
    value         = <value, defined in [RFC2616], Section 3.6>
                  ; token | quoted-string

Defined in [RFC5987]:

    ext-value     = <ext-value, defined in [RFC5987], Section 3.2>

Content-Disposition header field values with multiple instances of the same parameter name are invalid.

Note that due to the rules for implied linear whitespace (Section 2.1 of [RFC2616]), OPTIONAL whitespace can appear between words (token or quoted-string) and separator characters.

Furthermore, note that the format used for ext-value allows specifying a natural language (e.g., "en"); this is of limited use for filenames and is likely to be ignored by recipients.
4.2. Disposition Type

If the disposition type matches "attachment" (case-insensitively), this indicates that the recipient should prompt the user to save the response locally, rather than process it normally (as per its media type).

On the other hand, if it matches "inline" (case-insensitively), this implies default processing. Therefore, the disposition type "inline" is only useful when it is augmented with additional parameters, such as the filename (see below).

Unknown or unhandled disposition types SHOULD be handled by recipients the same way as "attachment" (see also [RFC2183], Section 2.8).

4.3. Disposition Parameter: 'Filename'

The parameters "filename" and "filename*", to be matched case-insensitively, provide information on how to construct a filename for storing the message payload.

Depending on the disposition type, this information might be used right away (in the "save as..." interaction caused for the "attachment" disposition type), or later on (for instance, when the user decides to save the contents of the current page being displayed).

The parameters "filename" and "filename*" differ only in that "filename*" uses the encoding defined in [RFC5987], allowing the use of characters not present in the ISO-8859-1 character set ([ISO-8859-1]).

Many user agent implementations predating this specification do not understand the "filename*" parameter. Therefore, when both "filename" and "filename*" are present in a single header field value, recipients SHOULD pick "filename*" and ignore "filename". This way, senders can avoid special-casing specific user agents by sending both the more expressive "filename*" parameter, and the "filename" parameter as fallback for legacy recipients (see Section 5 for an example).
It is essential that recipients treat the specified filename as advisory only, and thus be very careful in extracting the desired information. In particular:

o Recipients MUST NOT be able to write into any location other than one to which they are specifically entitled. To illustrate the problem, consider the consequences of being able to overwrite well-known system locations (such as "/etc/passwd"). One strategy to achieve this is to never trust folder name information in the filename parameter, for instance by stripping all but the last path segment and only considering the actual filename (where 'path segments' are the components of the field value delimited by the path separator characters "\" and "/").

o Many platforms do not use Internet Media Types ([RFC2046]) to hold type information in the file system, but rely on filename extensions instead. Trusting the server-provided file extension could introduce a privilege escalation when the saved file is later opened (consider ".exe"). Thus, recipients that make use of file extensions to determine the media type MUST ensure that a file extension is used that is safe, optimally matching the media type of the received payload.

o Recipients SHOULD strip or replace character sequences that are known to cause confusion both in user interfaces and in filenames, such as control characters and leading and trailing whitespace.

o Other aspects recipients need to be aware of are names that have a special meaning in the file system or in shell commands, such as "." and "..", "~", "|", and also device names. Recipients SHOULD ignore or substitute names like these.

Note: Many user agents do not properly handle the escape character "\" when using the quoted-string form. Furthermore, some user agents erroneously try to perform unescaping of "percent" escapes (see Appendix C.2), and thus might misinterpret filenames containing the percent character followed by two hex digits.

4.4. Disposition Parameter: Extensions

To enable future extensions, recipients SHOULD ignore unrecognized parameters (see also [RFC2183], Section 2.8).
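The recipient-side advice in Section 4.3 (keep only the last path segment, strip control characters and surrounding whitespace, and substitute names with special meaning) can be sketched roughly as follows. This is a minimal illustration, not a complete implementation: the function name `sanitize_filename` and the `default` fallback value are assumptions made for this example, and the sketch does not cover device names or the file-extension check.

```python
import re


def sanitize_filename(raw, default="download"):
    """Reduce a server-supplied filename to a safer local name (sketch only)."""
    # Keep only the last path segment; both "\" and "/" count as separators.
    name = re.split(r"[\\/]", raw)[-1]
    # Drop ASCII control characters (0x00-0x1F and 0x7F).
    name = "".join(ch for ch in name if ord(ch) >= 32 and ord(ch) != 127)
    # Remove leading and trailing whitespace.
    name = name.strip()
    # Substitute names that are empty or have a special meaning.
    if name in ("", ".", "..", "~"):
        return default
    return name
```

For instance, under these assumptions `sanitize_filename("../../etc/passwd")` yields `"passwd"`, so the folder information supplied by the server is never used.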
4.5. Extensibility

Note that Section 9 of [RFC2183] defines IANA registries both for disposition types and disposition parameters. This registry is shared by different protocols using Content-Disposition, such as MIME and HTTP. Therefore, not all registered values may make sense in the context of HTTP.

5. Examples

Direct the UA to show "save as" dialog, with a filename of "example.html":

    Content-Disposition: Attachment; filename=example.html

Direct the UA to behave as if the Content-Disposition header field wasn't present, but to remember the filename "an example.html" for a subsequent save operation:

    Content-Disposition: INLINE; FILENAME= "an example.html"

Note: This uses the quoted-string form so that the space character can be included.

Direct the UA to show "save as" dialog, with a filename containing the Unicode character U+20AC (EURO SIGN):

    Content-Disposition: attachment;
                         filename*= UTF-8''%e2%82%ac%20rates

Here, the encoding defined in [RFC5987] is also used to encode the non-ISO-8859-1 character.

This example is the same as the one above, but adding the "filename" parameter for compatibility with user agents not implementing RFC 5987:

    Content-Disposition: attachment;
                         filename="EURO rates";
                         filename*=utf-8''%e2%82%ac%20rates

Note: Those user agents that do not support the RFC 5987 encoding ignore "filename*" when it occurs after "filename".
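A sender could produce a field value like the last example with a few lines of code. The sketch below uses Python's standard `urllib.parse.quote` for the percent-encoding of the UTF-8 bytes. The function name `content_disposition` and its parameters are assumptions made for this illustration, and the caller is assumed to supply an ASCII-only fallback name.

```python
from urllib.parse import quote


def content_disposition(filename, ascii_fallback, disposition="attachment"):
    """Build a Content-Disposition field value with both filename forms."""
    # The fallback must be plain US-ASCII; this raises UnicodeEncodeError if not.
    ascii_fallback.encode("ascii")
    # Escape "\" and '"' for the quoted-string form.
    quoted = ascii_fallback.replace("\\", "\\\\").replace('"', '\\"')
    # Percent-encode the UTF-8 bytes of the real name for "filename*".
    encoded = quote(filename, safe="")
    # "filename" goes first, as in the example above.
    return f"{disposition}; filename=\"{quoted}\"; filename*=UTF-8''{encoded}"


print(content_disposition("\u20ac rates", "EURO rates"))
# attachment; filename="EURO rates"; filename*=UTF-8''%E2%82%AC%20rates
```

Note that `quote` emits uppercase hex digits, whereas the examples above use lowercase; both spell the same percent-encoded bytes.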
6. Internationalization Considerations

The "filename*" parameter (Section 4.3), using the encoding defined in [RFC5987], allows the server to transmit characters outside the ISO-8859-1 character set, and also to optionally specify the language in use.

Future parameters might also require internationalization, in which case the same encoding can be used.

7. Security Considerations

Using server-supplied information for constructing local filenames introduces many risks. These are summarized in Section 4.3.

Furthermore, implementers ought to be aware of the security considerations applying to HTTP (see Section 15 of [RFC2616]), and also the parameter encoding defined in [RFC5987] (see Section 5).

8. IANA Considerations

8.1. Registry for Disposition Values and Parameters

This specification does not introduce any changes to the registration procedures for disposition values and parameters that are defined in Section 9 of [RFC2183].

8.2. Header Field Registration

This document updates the definition of the Content-Disposition HTTP header field in the permanent HTTP header field registry (see [RFC3864]).

    Header field name: Content-Disposition
    Applicable protocol: http
    Status: standard
    Author/Change controller: IETF
    Specification document: this specification (Section 4)
    Related information: none
9. Acknowledgements

Thanks to Adam Barth, Rolf Eike Beer, Stewart Bryant, Bjoern Hoehrmann, Alfred Hoenes, Roar Lauritzsen, Alexey Melnikov, Henrik Nordstrom, and Mark Nottingham for their valuable feedback.

10. References

10.1. Normative References

[ISO-8859-1] International Organization for Standardization, "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1", ISO/IEC 8859-1:1998, 1998.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC5987] Reschke, J., "Character Set and Language Encoding for Hypertext Transfer Protocol (HTTP) Header Field Parameters", RFC 5987, August 2010.

10.2. Informative References

[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC2183] Troost, R., Dorner, S., and K. Moore, Ed., "Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field", RFC 2183, August 1997.

[RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations", RFC 2231, November 1997.

[RFC2388] Masinter, L., "Returning Values from Forms: multipart/form-data", RFC 2388, August 1998.

[RFC3864] Klyne, G., Nottingham, M., and J. Mogul, "Registration Procedures for Message Header Fields", BCP 90, RFC 3864, September 2004.

[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.

[US-ASCII] American National Standards Institute, "Coded Character Set -- 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.
Appendix A. Changes from the RFC 2616 Definition

Compared to Section 19.5.1 of [RFC2616], the following normative changes reflecting actual implementations have been made:

o According to RFC 2616, the disposition type "attachment" only applies to content of type "application/octet-stream". This restriction has been removed, because recipients in practice do not check the content type, and it also discourages properly declaring the media type.

o RFC 2616 only allows "quoted-string" for the filename parameter. This would be an exceptional parameter syntax, and also doesn't reflect actual use.

o The definition for the disposition type "inline" ([RFC2183], Section 2.1) has been re-added with a suggestion for its processing.

o This specification requires support for the extended parameter encoding defined in [RFC5987].

Appendix B. Differences Compared to RFC 2183

Section 2 of [RFC2183] defines several additional disposition parameters: "creation-date", "modification-date", "quoted-date-time", and "size". The majority of user agents do not implement these; thus, they have been omitted from this specification.

Appendix C. Alternative Approaches to Internationalization

By default, HTTP header field parameters cannot carry characters outside the ISO-8859-1 ([ISO-8859-1]) character encoding (see [RFC2616], Section 2.2). For the "filename" parameter, this of course is an unacceptable restriction.

Unfortunately, user agent implementers have not managed to come up with an interoperable approach, although the IETF Standards Track specifies exactly one solution ([RFC2231], clarified and profiled for HTTP in [RFC5987]).

For completeness, the sections below describe the various approaches that have been tried, and explain how they are inferior to the RFC 5987 encoding used in this specification.
C.1. RFC 2047 Encoding

RFC 2047 defines an encoding mechanism for header fields, but this encoding is not supposed to be used for header field parameters -- see Section 5 of [RFC2047]:

    An 'encoded-word' MUST NOT appear within a 'quoted-string'.

    ...

    An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.

In practice, some user agents implement the encoding, some do not (exposing the encoded string to the user), and some get confused by it.

C.2. Percent Encoding

Some user agents accept percent-encoded ([RFC3986], Section 2.1) sequences of characters. The character encoding being used for decoding depends on various factors, including the encoding of the referring page, the user agent's locale, its configuration, and also the actual value of the parameter.

In practice, this is hard to use because those user agents that do not support it will display the escaped character sequence to the user. For those user agents that do implement this, it is difficult to predict what character encoding they actually expect.

C.3. Encoding Sniffing

Some user agents inspect the value (which defaults to ISO-8859-1 for the quoted-string form) and switch to UTF-8 when it seems to be more likely to be the correct interpretation.

As with the approaches above, this is not interoperable and, furthermore, risks misinterpreting the actual value.
Appendix D. Advice on Generating Content-Disposition Header Fields

To successfully interoperate with existing and future user agents, senders of the Content-Disposition header field are advised to:

o Include a "filename" parameter when US-ASCII ([US-ASCII]) is sufficiently expressive.

o Use the 'token' form of the filename parameter only when it does not contain disallowed characters (e.g., spaces); in such cases, the quoted-string form should be used.

o Avoid including the percent character followed by two hexadecimal characters (e.g., %A9) in the filename parameter, since some existing implementations consider it to be an escape character, while others will pass it through unchanged.

o Avoid including the "\" character in the quoted-string form of the filename parameter, as escaping is not implemented by some user agents, and "\" can be considered an illegal path character.

o Avoid using non-ASCII characters in the filename parameter. Although most existing implementations will decode them as ISO-8859-1, some will apply heuristics to detect UTF-8, and thus might fail on certain names.

o Include a "filename*" parameter where the desired filename cannot be expressed faithfully using the "filename" form. Note that legacy user agents will not process this, and will fall back to using the "filename" parameter's content.

o When a "filename*" parameter is sent, to also generate a "filename" parameter as a fallback for user agents that do not support the "filename*" form, if possible. This can be done by substituting characters with US-ASCII sequences (e.g., Unicode character point U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS) by "ae"). Note that this may not be possible in some locales.

o When a "filename" parameter is included as a fallback (as per above), "filename" should occur first, due to parsing problems in some existing implementations.

o Use UTF-8 as the encoding of the "filename*" parameter, when present, because at least one existing implementation only implements that encoding.

Note that this advice is based upon UA behavior at the time of writing, and might be superseded. At the time of publication of this document, <http://purl.org/NET/http/content-disposition-tests> provides an overview of current levels of support in various implementations.

Author's Address

Julian F. Reschke
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155
Germany

EMail: julian.reschke@greenbytes.de
URI: http://greenbytes.de/tech/webdav/

Whether you're a programmer or not, you have seen it everywhere on the web. At this moment your browser's address bar shows something that starts with "http://". Even your first Hello World script sent HTTP headers without you realizing it. In this article we are going to learn the basics of HTTP headers and how we can use them in our web applications.

What are HTTP Headers?

HTTP stands for "Hypertext Transfer Protocol". The entire World Wide Web uses this protocol. It was established in the early 1990s. Almost everything you see in your browser is transmitted to your computer over HTTP. For example, when you opened this article page, your browser probably sent over 40 HTTP requests and received an HTTP response for each.

HTTP headers are the core part of these HTTP requests and responses, and they carry information about the client browser, the requested page, the server and more.

Example

When you type a url in your address bar, your browser sends an HTTP request and it may look like this:

    GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
    Host: net.tutsplus.com
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
    Pragma: no-cache
    Cache-Control: no-cache

The first line is the "Request Line", which contains some basic information about the request. The rest are the HTTP headers.

After that request, your browser receives an HTTP response that may look like this:

    HTTP/1.x 200 OK
    Transfer-Encoding: chunked
    Date: Sat, 28 Nov 2009 04:36:25 GMT
    Server: LiteSpeed
    Connection: close
    X-Powered-By: W3 Total Cache/0.8
    Pragma: public
    Expires: Sat, 28 Nov 2009 05:36:25 GMT
    Etag: "pub1259380237;gz"
    Cache-Control: max-age=3600, public
    Content-Type: text/html; charset=UTF-8
    Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
    X-Pingback: http://net.tutsplus.com/xmlrpc.php
    Content-Encoding: gzip
    Vary: Accept-Encoding, Cookie, User-Agent

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Top 20+ MySQL Best Practices - Nettuts+</title>
    <!-- ... rest of the html ... -->

The first line is the "Status Line", followed by "HTTP headers", until the blank line. After that, the "content" starts (in this case, an HTML output).
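To make that layout concrete, here is a minimal Python sketch that splits a raw response into its status line, headers and content at the first blank line. The function name `parse_response` is made up for this illustration. It assumes the whole response is already in memory, and it keeps only the last value when a header name repeats, so treat it as a teaching aid rather than a real HTTP parser.

```python
def parse_response(raw):
    """Split raw HTTP response bytes into (version, status, reason, headers, body)."""
    # The blank line (CRLF CRLF) separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, short message.
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
version, status, reason, headers, body = parse_response(raw)
print(status, headers["content-type"], body)
```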

When you look at the source code of a web page in your browser, you will only see the HTML portion and not the HTTP headers, even though they actually have been transmitted together as you see above.

HTTP requests are also sent, and responses received, for other things, such as images, CSS files, JavaScript files etc. That is why I said earlier that your browser probably sent over 40 HTTP requests just to load this article page.

Now, let's start reviewing the structure in more detail.

How to See HTTP Headers

I use the following Firefox extensions to analyze HTTP headers:

In PHP:

Further in the article, we will see some code examples in PHP.

HTTP Request Structure

The first line of the HTTP request is called the request line and consists of 3 parts:

  • The "method" indicates what kind of request this is. The most common methods are GET, POST and HEAD.
  • The "path" is generally the part of the URL that comes after the host (domain). For example, when requesting "http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/", the path portion is "/tutorials/other/top-20-mysql-best-practices/".
  • The "protocol" part contains "HTTP" and the version, which is usually 1.1 in modern browsers.
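As a quick illustration of those three parts, the following Python sketch splits a request line on spaces. The function name `parse_request_line` and the dictionary keys are choices made for this example only.

```python
def parse_request_line(line):
    """Split an HTTP request line into its method, path and protocol parts."""
    method, path, protocol = line.strip().split(" ")
    # The protocol part looks like "HTTP/1.1": a name and a version.
    name, _, version = protocol.partition("/")
    return {"method": method, "path": path, "protocol": name, "version": version}


parts = parse_request_line("GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1")
print(parts["method"], parts["path"], parts["version"])
```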

The remainder of the request contains HTTP headers as "Name: Value" pairs on each line. These contain various information about the HTTP request and your browser. For example, the "User-Agent" line provides information on the browser version and the Operating System you are using. "Accept-Encoding" tells the server if your browser can accept compressed output like gzip.

You may have noticed that the cookie data is also transmitted inside an HTTP header. And if there was a referring url, that would have been in the header too.

Most of these headers are optional. This HTTP request could have been as small as this:

    GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
    Host: net.tutsplus.com

And you would still get a valid response from the web server.

Request Methods

The three most commonly used request methods are GET, POST and HEAD. You're probably already familiar with the first two from writing HTML forms.

GET: Retrieve a Document

This is the main method used for retrieving html, images, JavaScript, CSS, etc. Most data that loads in your browser was requested using this method.

For example, when loading a Nettuts+ article, the very first line of the HTTP request looks like so:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1 ...

Once the HTML loads, the browser will start sending GET requests for images, which may look like this:

GET /wp-content/themes/tuts_theme/images/header_bg_tall.png HTTP/1.1 ...

Web forms can be set to use the method GET. Here is an example.

    <form method="GET" action="foo.php">
      First Name: <input type="text" name="first_name" /> <br />
      Last Name: <input type="text" name="last_name" /> <br />
      <input type="submit" name="action" value="Submit" />
    </form>

When that form is submitted, the HTTP request begins like this:

GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1 ...

You can see that each form input was added into the query string.
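The same encoding can be reproduced outside the browser. The sketch below uses Python's standard `urllib.parse.urlencode` to turn the form fields into a query string and append it to the path. The helper name `get_request_line` is an assumption made for this example.

```python
from urllib.parse import urlencode


def get_request_line(action, fields):
    """Build the request line a browser would send for a GET form submission."""
    # Each form input becomes a name=value pair, joined with "&".
    return f"GET {action}?{urlencode(fields)} HTTP/1.1"


line = get_request_line(
    "/foo.php",
    {"first_name": "John", "last_name": "Doe", "action": "Submit"},
)
print(line)
# GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
```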

POST: Send Data to the Server

Even though you can send data to the server using GET and the query string, in many cases POST will be preferable. Sending large amounts of data using GET is not practical and has limitations.

POST requests are most commonly sent by web forms. Let's change the previous form example to a POST method.

    <form method="POST" action="foo.php">
      First Name: <input type="text" name="first_name" /> <br />
      Last Name: <input type="text" name="last_name" /> <br />
      <input type="submit" name="action" value="Submit" />
    </form>

Submitting that form creates an HTTP request like this:

    POST /foo.php HTTP/1.1
    Host: localhost
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Keep-Alive: 300
    Connection: keep-alive
    Referer: http://localhost/test.php
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 43

    first_name=John&last_name=Doe&action=Submit

There are three important things to note here:

  • The path in the first line is simply /foo.php and there is no query string anymore.
  • Content-Type and Content-Length headers have been added, which provide information about the data being sent.
  • All the data is now sent after the headers, in the same format as the query string.
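Those three points can be checked with a short Python sketch that assembles a minimal POST request by hand. The function name `build_post_request` is an assumption for this example, and the sketch sends only the headers needed to describe the body, not the full set a browser would add.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Assemble a minimal form-encoded POST request as raw bytes."""
    # The body uses the same name=value&... format as a query string.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    # The blank line ends the headers; the data follows it.
    return head.encode("ascii") + body


request = build_post_request(
    "localhost",
    "/foo.php",
    {"first_name": "John", "last_name": "Doe", "action": "Submit"},
)
print(request.decode("ascii"))
```

The body here is 43 bytes long, which matches the Content-Length value in the request shown above.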

POST method requests can also be made via AJAX, applications, cURL, etc. And all file upload forms are required to use the POST method.

HEAD: Retrieve Header Information

HEAD is identical to GET, except the server does not return the content in the HTTP response. When you send a HEAD request, it means that you are only interested in the response code and the HTTP headers, not the document itself.

With this method the browser can check if a document has been modified, for caching purposes. It can also check if the document exists at all.

For example, if you have a lot of links on your website, you can periodically send HEAD requests to all of them to check for broken links. This will work much faster than using GET.
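A link checker along those lines could start from something like the sketch below, which uses Python's standard `http.client` module to send a HEAD request and return the status code and headers. The function name `head_status` and its parameters are assumptions for this illustration. It handles plain HTTP only and does not follow redirects.

```python
import http.client


def head_status(host, path="/", port=80, timeout=5.0):
    """Send a HEAD request and return (status_code, headers) without a body."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        # A HEAD response carries no content, so this read returns nothing.
        response.read()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()
```

A broken-link check would then call `head_status` for each link and flag any status code in the 400 or 500 range.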

HTTP Response Structure

After the browser sends the HTTP request, the server responds with an HTTP response. Excluding the content, it looks like this:

The first piece of data is the protocol. This is again usually HTTP/1.x or HTTP/1.1 on modern servers.

The next part is the status code followed by a short message. Code 200 means that our GET request was successful and the server will return the contents of the requested document, right after the headers.

We have all seen "404" pages. This number actually comes from the status code part of the HTTP response. If a GET request is made for a path that the server cannot find, it responds with a 404 instead of a 200.

The rest of the response contains headers just like the HTTP request. These values can contain information about the server software, when the page/file was last modified, the mime type etc...

Again, most of those headers are actually optional.

HTTP Status Codes

  • 200's are used for successful requests.
  • 300's are for redirections.
  • 400's are used if there was a problem with the request.
  • 500's are used if there was a problem with the server.
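
Since the class of a status code is just its first digit, the grouping above can be expressed in a few lines. This is only an illustrative sketch: the function name status_class() and the label strings are made up here, and the 100's class (informational responses) is added for completeness although the list above does not cover it.

```php
<?php
// Maps a status code to its class, based on the first digit.
function status_class(int $code): string
{
    switch (intdiv($code, 100)) {
        case 1:  return 'informational'; // not covered in the list above
        case 2:  return 'success';
        case 3:  return 'redirection';
        case 4:  return 'client error';
        case 5:  return 'server error';
        default: return 'unknown';
    }
}

echo status_class(404), "\n"; // client error
```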

200 OK

As mentioned before, this status code is sent in response to a successful request.

206 Partial Content

If an application requests only a range of the requested file, the 206 code is returned.

It's most commonly used with download managers that can stop and resume a download, or split the download into pieces.
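
To make this more concrete, here is a rough sketch of how a download script might interpret such a request. It relies on the Range request header and the Content-Range response header, which this article does not cover in detail. The function name parse_byte_range() is made up for illustration, and only a single range is handled.

```php
<?php
// Parses a single-range header such as "bytes=0-499" against a file size.
// Returns [start, end] as inclusive byte offsets, or null if the range is
// missing, malformed, or cannot be satisfied. Multiple ranges are not handled.
function parse_byte_range(string $rangeHeader, int $size): ?array
{
    if (!preg_match('/^bytes=(\d*)-(\d*)$/', trim($rangeHeader), $m)) {
        return null;
    }
    [$first, $last] = [$m[1], $m[2]];
    if ($first === '' && $last === '') {
        return null;
    }
    if ($first === '') {
        // Suffix form, e.g. "bytes=-500" means the last 500 bytes.
        $length = (int) $last;
        if ($length === 0) {
            return null;
        }
        $start = max(0, $size - $length);
        $end   = $size - 1;
    } else {
        $start = (int) $first;
        $end   = ($last === '') ? $size - 1 : min((int) $last, $size - 1);
    }
    if ($start > $end || $start >= $size) {
        return null;
    }
    return [$start, $end];
}

// Hypothetical usage inside a download script ($file is assumed to exist):
// $size  = filesize($file);
// $range = parse_byte_range($_SERVER['HTTP_RANGE'] ?? '', $size);
// if ($range !== null) {
//     [$start, $end] = $range;
//     header('HTTP/1.1 206 Partial Content');
//     header("Content-Range: bytes $start-$end/$size");
//     header('Content-Length: ' . ($end - $start + 1));
// }
```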

404 Not Found

When the requested page or file was not found, a 404 response code is sent by the server.

401 Unauthorized

Password protected web pages send this code. If you don't enter a login correctly, you may see the following in your browser.

Note that this only applies to HTTP password protected pages, that pop up login prompts like this:

403 Forbidden

If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a url for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.

For example, on my local server I created an images folder. Inside this folder I put an .htaccess file with this line: "Options -Indexes". Now when I try to open http://localhost/images/ - I see this:

There are other ways in which access can be blocked, and 403 can be sent. For example, you can block by IP address, with the help of some htaccess directives.

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all

302 (or 307) Moved Temporarily & 301 Moved Permanently

These two codes are used for redirecting a browser. For example, when you use a url shortening service, such as bit.ly, that's exactly how they forward the people who click on their links.

Both 302 and 301 are handled very similarly by the browser, but they can have different meanings to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302. The search engine spider will continue checking your page later in the future. But if you redirect using 301, it will tell the spider that your website has moved to that location permanently. To give you a better idea: http://www.nettuts.com redirects to http://net.tutsplus.com/ using a 301 code instead of 302.

500 Internal Server Error

This code is usually seen when a web script crashes. Most CGI scripts do not output errors directly to the browser, unlike PHP. If there are any fatal errors, they will just send a 500 status code. The programmer then needs to search the server error logs to find the error messages.

Complete List

You can find the complete list of HTTP status codes with their explanations here.

HTTP Headers in HTTP Requests

Now, we'll review some of the most common HTTP headers found in HTTP requests.

Almost all of these headers can be found in the $_SERVER array in PHP. You can also use the getallheaders() function to retrieve all headers at once.

Host

An HTTP request is sent to a specific IP address. But since most servers are capable of hosting multiple websites under the same IP, they must know which domain name the browser is looking for.

Host: net.tutsplus.com

This is basically the host name, including the domain and the subdomain.

In PHP, it can be found as $_SERVER['HTTP_HOST'] or $_SERVER['SERVER_NAME'].

User-Agent

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)

This header can carry several pieces of information such as:

  • Browser name and version.
  • Operating System name and version.
  • Default language.

This is how websites can collect certain general information about their surfers' systems. For example, they can detect if the surfer is using a cell phone browser and redirect them to a mobile version of their website which works better with low resolutions.

In PHP, it can be found with: $_SERVER['HTTP_USER_AGENT'].

if (strstr($_SERVER['HTTP_USER_AGENT'], 'MSIE 6')) {
    echo "Please stop using IE6!";
}

Accept-Language

Accept-Language: en-us,en;q=0.5

This header displays the default language setting of the user. If a website has different language versions, it can redirect a new surfer based on this data.

It can carry multiple languages, separated by commas. The first one is the preferred language, and each other listed language can carry a "q" value, which is an estimate of the user's preference for the language (min. 0 max. 1).

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_LANGUAGE"].

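if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])
    && substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2) == 'fr') {
    header('Location: http://french.mydomain.com');
    // stop the script once the redirect header has been set
    exit;
}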
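
The substr() check above only looks at the first two characters of the header. A fuller sketch that takes the "q" values into account might look like the following. The function names parse_accept_language() and pick_language() are made up for illustration, and the matching is deliberately simple (primary language tag only).

```php
<?php
// Parses an Accept-Language value into [language => q], highest q first.
// A language without an explicit q value counts as q=1.
function parse_accept_language(string $header): array
{
    $prefs = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $lang   = strtolower(trim($pieces[0]));
        if ($lang === '') {
            continue;
        }
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $prefs[$lang] = $q;
    }
    arsort($prefs); // sort by q, descending, keeping the language keys
    return $prefs;
}

// Returns the first supported language (by primary tag, e.g. "fr" for
// "fr-CH") in order of preference, or $default if none is acceptable.
function pick_language(string $header, array $supported, string $default): string
{
    foreach (parse_accept_language($header) as $lang => $q) {
        $primary = explode('-', (string) $lang)[0];
        if ($q > 0 && in_array($primary, $supported, true)) {
            return $primary;
        }
    }
    return $default;
}

// Usage:
// $lang = pick_language($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '', ['en', 'fr'], 'en');
```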

Accept-Encoding

Accept-Encoding: gzip,deflate

Most modern browsers support gzip, and will send this in the header. The web server then can send the HTML output in a compressed format. This can reduce the size by up to 80% to save bandwidth and time.

In PHP, it can be found as: $_SERVER["HTTP_ACCEPT_ENCODING"]. However, when you use the ob_gzhandler() callback function, it will check this value automatically, so you don't need to.

// enables output buffering
// and all output is compressed if the browser supports it
ob_start('ob_gzhandler');

If-Modified-Since

If a web document is already cached in your browser, and you visit it again, your browser can check if the document has been updated by sending this:

If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

If it was not modified since that date, the server will send a "304 Not Modified" response code, and no content - and the browser will load the content from the cache.

In PHP, it can be found as: $_SERVER['HTTP_IF_MODIFIED_SINCE'].

// assume $last_modify_time was the last time the output was updated
// did the browser send If-Modified-Since header?
if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    // if the browser cache matches the modify time
    if ($last_modify_time == strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
        // send a 304 header, and no content
        header("HTTP/1.1 304 Not Modified");
        exit;
    }
}

There is also an HTTP header named Etag, which can be used to make sure the cache is current. We'll talk about this shortly.

Cookie

As the name suggests, this sends the cookies stored in your browser for that domain.

Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120; foo=bar

These are name=value pairs separated by semicolons. Cookies can also contain the session id.

In PHP, individual cookies can be accessed with the $_COOKIE array. You can directly access the session variables using the $_SESSION array, and if you need the session id, you can use the session_id() function instead of the cookie.

echo $_COOKIE['foo'];
// output: bar
echo $_COOKIE['PHPSESSID'];
// output: r2t5uvjq435r4q7ib3vtdjq120
session_start();
echo session_id();
// output: r2t5uvjq435r4q7ib3vtdjq120

Referer

As the name suggests, this HTTP header contains the referring url.

For example, if I visit the Nettuts+ homepage and click on an article link, my browser sends this header to the server:

Referer: http://net.tutsplus.com/

In PHP, it can be found as $_SERVER['HTTP_REFERER'].

if (isset($_SERVER['HTTP_REFERER'])) {
    $url_info = parse_url($_SERVER['HTTP_REFERER']);
    // is the surfer coming from Google?
    if ($url_info['host'] == 'www.google.com') {
        parse_str($url_info['query'], $vars);
        // escape the keyword, since the referring url is user-controlled
        echo "You searched on Google for this keyword: " . htmlspecialchars($vars['q']);
    }
}
// if the referring url was:
// http://www.google.com/search?source=ig&hl=en&rlz=&=&q=http+headers&aq=f&oq=&aqi=g-p1g9
// the output will be:
// You searched on Google for this keyword: http headers

You may have noticed the word "referrer" is misspelled as "referer". Unfortunately it made it into the official HTTP specifications like that and stuck.

Authorization

When a web page asks for authorization, the browser opens a login window. When you enter a username and password in this window, the browser sends another HTTP request, but this time it contains this header.

Authorization: Basic bXl1c2VyOm15cGFzcw==

The data inside the header is base64 encoded. For example, base64_decode('bXl1c2VyOm15cGFzcw==') would return 'myuser:mypass'

In PHP, these values can be found as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'].
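
To see the encoding from both sides, here is a small sketch that builds and decodes the header value. The function names basic_auth_value() and parse_basic_auth() are made up for illustration; in a normal PHP script you would simply read the $_SERVER values mentioned above.

```php
<?php
// Builds the value of an Authorization header for Basic authentication.
function basic_auth_value(string $user, string $password): string
{
    return 'Basic ' . base64_encode($user . ':' . $password);
}

// Decodes such a value back into [user, password], or null if malformed.
function parse_basic_auth(string $value): ?array
{
    if (stripos($value, 'Basic ') !== 0) {
        return null;
    }
    $decoded = base64_decode(substr($value, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null;
    }
    // Split on the first colon only, since the password may contain colons.
    return explode(':', $decoded, 2);
}

echo basic_auth_value('myuser', 'mypass'), "\n"; // Basic bXl1c2VyOm15cGFzcw==
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the request can recover the password, so Basic authentication should only be used over HTTPS.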

More on this when we talk about the WWW-Authenticate header.

HTTP Headers in HTTP Responses

Now we are going to look at some of the most common HTTP headers found in HTTP responses.

In PHP, you can set response headers using the header() function. PHP already sends certain headers automatically, for loading the content and setting cookies etc... You can see the headers that are sent, or will be sent, with the headers_list() function. You can check if the headers have been sent already, with the headers_sent() function.

Cache-Control

Definition from w3.org: "The Cache-Control general-header field is used to specify directives which MUST be obeyed by all caching mechanisms along the request/response chain." These "caching mechanisms" include gateways and proxies that your ISP may be using.

Example:

Cache-Control: max-age=3600, public

"public" means that the response may be cached by anyone. "max-age" indicates how many seconds the cache is valid for. Allowing your website to be cached can reduce server load and bandwidth, and also improve load times at the browser.

Caching can also be prevented by using the "no-cache" directive.

Cache-Control: no-cache

For more detailed info, see w3.org.

Content-Type

This header indicates the "mime-type" of the document. The browser then decides how to interpret the contents based on this. For example, an html page (or a PHP script with html output) may return this:

Content-Type: text/html; charset=UTF-8

"text" is the type and "html" is the subtype of the document. The header can also contain more info such as charset.

For a gif image, this may be sent.

Content-Type: image/gif

The browser can decide to use an external application or browser extension based on the mime-type. For example this will cause the Adobe Reader to be loaded:

Content-Type: application/pdf

When a file is loaded directly, Apache can usually detect the mime-type of the document and send the appropriate header. Most browsers also have some amount of fault tolerance and auto-detection of mime-types, in case the headers are wrong or not present.

You can find a list of common mime types here.

In PHP, you can use the finfo_file() function to detect the mime type of a file.

Content-Disposition

This header instructs the browser to open a file download box, instead of trying to parse the content. Example:

Content-Disposition: attachment; filename="download.zip"

That will cause the browser to do this:

Note that the appropriate Content-Type header should also be sent along with this:

Content-Type: application/zip
Content-Disposition: attachment; filename="download.zip"

Content-Length

When content is going to be transmitted to the browser, the server can indicate the size of it (in bytes) using this header.

Content-Length: 89123

This is especially useful for file downloads. That's how the browser can determine the progress of the download.

For example, here is a dummy script I wrote, which simulates a slow download.

// it's a zip file
header('Content-Type: application/zip');
// 1 million bytes (about 1 megabyte)
header('Content-Length: 1000000');
// load a download dialogue, and save it as download.zip
header('Content-Disposition: attachment; filename="download.zip"');
// 1000 times 1000 bytes of data
for ($i = 0; $i < 1000; $i++) {
    echo str_repeat(".", 1000);
    // sleep to slow down the download
    usleep(50000);
}

The result is:

Now I am going to comment out the Content-Length header:

// it's a zip file
header('Content-Type: application/zip');
// the browser won't know the size
// header('Content-Length: 1000000');
// load a download dialogue, and save it as download.zip
header('Content-Disposition: attachment; filename="download.zip"');
// 1000 times 1000 bytes of data
for ($i = 0; $i < 1000; $i++) {
    echo str_repeat(".", 1000);
    // sleep to slow down the download
    usleep(50000);
}

Now the result is:

The browser can only tell you how many bytes have been downloaded, but it does not know the total amount. And the progress bar is not showing the progress.

Etag

This is another header that is used for caching purposes. It looks like this:

Etag: "pub1259380237;gz"

The web server may send this header with every document it serves. The value can be based on the last modify date, file size or even the checksum value of a file. The browser then saves this value as it caches the document. Next time the browser requests the same file, it sends this in the HTTP request:

If-None-Match: "pub1259380237;gz"

If the Etag value of the document matches that, the server will send a 304 code instead of 200, and no content. The browser will load the contents from its cache.
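
On the server side, that comparison might be sketched like this. The function name etag_matches() is made up for illustration, and the checksum-based Etag in the usage comment is just one possible choice. The simple comma split assumes Etag values that do not themselves contain commas.

```php
<?php
// Returns true when the If-None-Match request value matches the current Etag.
// Handles "*" and comma-separated lists; weak validators (W/"...") are
// compared by their quoted part only.
function etag_matches(string $ifNoneMatch, string $etag): bool
{
    $ifNoneMatch = trim($ifNoneMatch);
    if ($ifNoneMatch === '') {
        return false;
    }
    if ($ifNoneMatch === '*') {
        return true;
    }
    // Strips surrounding whitespace and an optional weak prefix.
    $strip = function (string $tag): string {
        $tag = trim($tag);
        return (strncmp($tag, 'W/', 2) === 0) ? substr($tag, 2) : $tag;
    };
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        if ($strip($candidate) === $strip($etag)) {
            return true;
        }
    }
    return false;
}

// Hypothetical usage ($file is assumed to exist):
// $etag = '"' . md5_file($file) . '"';
// header('Etag: ' . $etag);
// if (etag_matches($_SERVER['HTTP_IF_NONE_MATCH'] ?? '', $etag)) {
//     header('HTTP/1.1 304 Not Modified');
//     exit;
// }
```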

Last-Modified

As the name suggests, this header indicates the last modify date of the document, in GMT format:

Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT

$modify_time = filemtime($file);
header("Last-Modified: " . gmdate("D, d M Y H:i:s", $modify_time) . " GMT");

It offers another way for the browser to cache a document. The browser may send this in the HTTP request:

If-Modified-Since: Sat, 28 Nov 2009 06:38:19 GMT

We already talked about this earlier in the "If-Modified-Since" section.

Location

This header is used for redirections. If the response code is 301 or 302, the server must also send this header. For example, when you go to http://www.nettuts.com your browser will receive this:

HTTP/1.x 301 Moved Permanently
...
Location: http://net.tutsplus.com/
...

In PHP, you can redirect a surfer like so:

header('Location: http://net.tutsplus.com/');

By default, that will send a 302 response code. If you want to send 301 instead:

header('Location: http://net.tutsplus.com/', true, 301);

Set-Cookie

When a website wants to set or update a cookie in your browser, it will use this header.

Set-Cookie: skin=noskin; path=/; domain=.amazon.com; expires=Sun, 29-Nov-2009 21:42:28 GMT
Set-Cookie: session-id=120-7333518-8165026; path=/; domain=.amazon.com; expires=Sat Feb 27 08:00:00 2010 GMT

Each cookie is sent as a separate header. Note that the cookies set via JavaScript do not go through HTTP headers.

In PHP, you can set cookies using the setcookie() function, and PHP sends the appropriate HTTP headers.

setcookie("TestCookie", "foobar");

Which causes this header to be sent:

Set-Cookie: TestCookie=foobar

If the expiration date is not specified, the cookie is deleted when the browser window is closed.

WWW-Authenticate

A website may send this header to authenticate a user through HTTP. When the browser sees this header, it will open up a login dialogue window.

WWW-Authenticate: Basic realm="Restricted Area"

Which looks like this:

There is a section in the PHP manual, that has code samples on how to do this in PHP.

if (!isset($_SERVER['PHP_AUTH_USER'])) {
    header('WWW-Authenticate: Basic realm="My Realm"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Text to send if user hits Cancel button';
    exit;
} else {
    echo "<p>Hello {$_SERVER['PHP_AUTH_USER']}.</p>";
    echo "<p>You entered {$_SERVER['PHP_AUTH_PW']} as your password.</p>";
}

Content-Encoding

This header is usually set when the returned content is compressed.

Content-Encoding: gzip

In PHP, if you use the ob_gzhandler() callback function, it will be set automatically for you.

Conclusion

Thanks for reading. I hope this article was a good starting point to learn about HTTP Headers. Please leave your comments and questions below, and I will try to respond as much as I can.
