Network Working Group                                           E. Nebel 
            Request For Comments: 1867                                   L. Masinter 
            Category: Experimental                                 Xerox Corporation 
                                                                       November 1995
             
                                 Form-based File Upload in HTML 
            Status of this Memo 
               This memo defines an Experimental Protocol for the Internet 
               community.  This memo does not specify an Internet standard of any 
               kind.  Discussion and suggestions for improvement are requested. 
               Distribution of this memo is unlimited. 
            1. Abstract 
               Currently, HTML forms allow the producer of the form to request 
               information from the user reading the form.  These forms have proven 
               useful in a wide variety of applications in which input from the user 
               is necessary.  However, this capability is limited because HTML forms 
               don't provide a way to ask the user to submit files of data.  Service 
               providers who need to get files from the user have had to implement 
               custom user applications.  (Examples of these custom browsers have 
               appeared on the www-talk mailing list.)  Since file-upload is a 
               feature that will benefit many applications, this document proposes an 
               extension to HTML to allow information providers to express file 
               upload requests uniformly, and a MIME compatible representation for 
               file upload responses.  This also includes a description of a 
               backward compatibility strategy that allows new servers to interact 
               with the current HTML user agents. 
               The proposal is independent of the version of HTML of which it 
               becomes a part. 
            2.  HTML forms with file submission 
               The current HTML specification defines eight possible values for the 
               attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE, 
               PASSWORD, RADIO, RESET, SUBMIT, TEXT. 
               In addition, it specifies that the ENCTYPE attribute of a FORM 
               element that uses METHOD=POST has the default value 
               "application/x-www-form-urlencoded". 
              
              
              
            Nebel & Masinter              Experimental                      [Page 1] 
             
            RFC 1867             Form-based File Upload in HTML        November 1995 
             
               This proposal makes two changes to HTML: 
               1) Add a FILE option for the TYPE attribute of INPUT. 
               2) Allow an ACCEPT attribute for the INPUT tag, which is a list of 
                  media types or type patterns allowed for the input. 
               In addition, it defines a new MIME media type, multipart/form-data, 
               and specifies the behavior of HTML user agents when interpreting a 
               form with ENCTYPE="multipart/form-data" and/or <INPUT type="file"> 
               tags. 
               These changes might be considered independently, but are all 
               necessary for reasonable file upload. 
               The author of an HTML form who wants to request one or more files 
               from a user would write (for example): 
                <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST> 
                File to process: <INPUT NAME="userfile1" TYPE="file"> 
                <INPUT TYPE="submit" VALUE="Send File"> 
                </FORM> 
               The change to the HTML DTD is to add one item to the entity 
               "InputType". In addition, it is proposed that the INPUT tag have an 
               ACCEPT attribute, which is a list of comma-separated media types. 
              ... (other elements) ... 
              <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX | 
                                     RADIO | SUBMIT | RESET | 
                                     IMAGE | HIDDEN | FILE )"> 
              <!ELEMENT INPUT - O EMPTY> 
              <!ATTLIST INPUT 
                      TYPE %InputType TEXT 
                      NAME CDATA #IMPLIED  -- required for all but submit and reset -- 
                      VALUE CDATA #IMPLIED 
                      SRC %URI #IMPLIED  -- for image inputs -- 
                      CHECKED (CHECKED) #IMPLIED 
                      SIZE CDATA #IMPLIED  --like NUMBERS, 
                                              but delimited with comma, not space-- 
                      MAXLENGTH NUMBER #IMPLIED 
                      ALIGN (top|middle|bottom) #IMPLIED 
                      ACCEPT CDATA #IMPLIED --list of content types-- 
                      > 
              ... (other elements) ... 
            3.  Suggested implementation 
               While user agents that interpret HTML have wide leeway to choose the 
               most appropriate mechanism for their context, this section suggests 
               how one class of user agent, WWW browsers, might implement file 
               upload. 
            3.1 Display of FILE widget 
               When an INPUT tag of type FILE is encountered, the browser might show 
               a display of (previously selected) file names, and a "Browse" button 
               or selection method. Selecting the "Browse" button would cause the 
               browser to enter into a file selection mode appropriate for the 
               platform. Window-based browsers might pop up a file selection window, 
               for example. In such a file selection dialog, the user would have the 
               option of replacing a current selection, adding a new file selection, 
               etc. Browser implementors might choose to let the list of file names be 
               manually edited. 
               If an ACCEPT attribute is present, the browser might constrain the 
               file patterns prompted for to match those with the corresponding 
               appropriate file extensions for the platform. 
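To illustrate one way a browser might apply ACCEPT, the following Python sketch goes in the opposite direction from a file-dialog filter: it checks a chosen file name against the attribute. It assumes ACCEPT values such as "image/gif, text/*", guesses the media type from the file extension with the standard mimetypes module, and falls back to application/octet-stream when the extension is unknown. The function names are hypothetical.

```python
from __future__ import annotations

import mimetypes


def parse_accept(accept: str) -> list[str]:
    """Split a comma-separated ACCEPT value into lowercase types or patterns."""
    return [item.strip().lower() for item in accept.split(",") if item.strip()]


def type_matches(media_type: str, pattern: str) -> bool:
    """Match 'type/subtype' against a pattern such as 'image/*' or '*/*'."""
    m_type, _, m_sub = media_type.partition("/")
    p_type, _, p_sub = pattern.partition("/")
    return p_type in ("*", m_type) and p_sub in ("*", m_sub)


def file_is_acceptable(filename: str, accept: str) -> bool:
    """Guess a file's media type from its extension and test it against ACCEPT."""
    patterns = parse_accept(accept)
    if not patterns:
        return True  # no usable ACCEPT value: do not constrain the selection
    guessed, _ = mimetypes.guess_type(filename)
    media_type = (guessed or "application/octet-stream").lower()
    return any(type_matches(media_type, p) for p in patterns)
```

For example, "photo.gif" passes ACCEPT="image/*", while "notes.txt" is rejected by ACCEPT="image/gif, image/jpeg".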
            3.2 Action on submit 
               When the user completes the form, and selects the SUBMIT element, the 
               browser should send the form data and the content of the selected 
               files.  The encoding type application/x-www-form-urlencoded is 
               inefficient for sending large quantities of binary data or text 
               containing non-ASCII characters.  Thus, a new media type, 
               multipart/form-data, is proposed as a way of efficiently sending the 
               values associated with a filled-out form from client to server. 
            3.3 Use of multipart/form-data 
               The definition of multipart/form-data is included in section 7.  A 
               boundary is selected that does not occur in any of the data. (This 
               selection is sometimes done probabilistically.) Each field of the form 
               is sent, in the order in which it occurs in the form, as a part of 
               the multipart stream.  Each part identifies the INPUT name within the 
               original HTML form. Each part should be labelled with an appropriate 
               content-type if the media type is known (e.g., inferred from the file 
               extension or operating system typing information) or as 
               application/octet-stream. 
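The steps of this section can be sketched in Python: pick a random boundary, confirm that it occurs in none of the values, then emit each field in form order, labelling file parts with a content-type guessed from the extension or application/octet-stream. The (name, filename, content) input format and the function names are assumptions for illustration; text values are passed in as already-encoded bytes, and filename is None for ordinary fields.

```python
from __future__ import annotations

import mimetypes
import secrets


def choose_boundary(values: list[bytes]) -> str:
    """Pick random boundaries until one occurs in none of the data."""
    while True:
        boundary = "----" + secrets.token_hex(16)
        if not any(boundary.encode("ascii") in v for v in values):
            return boundary


def encode_form_data(parts: list[tuple[str, str | None, bytes]]) -> tuple[str, bytes]:
    """Return (Content-Type value, body) for parts given in form order."""
    boundary = choose_boundary([content for _, _, content in parts])
    chunks: list[bytes] = []
    for name, filename, content in parts:
        disposition = f'form-data; name="{name}"'
        if filename is not None:
            disposition += f'; filename="{filename}"'
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(f"Content-Disposition: {disposition}\r\n".encode("ascii"))
        if filename is not None:
            ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
            chunks.append(f"Content-Type: {ctype}\r\n".encode("ascii"))
        chunks.append(b"\r\n")             # blank line ends the part headers
        chunks.append(content + b"\r\n")   # this CRLF belongs to the next delimiter
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)
```

Retrying until the boundary is absent turns the probabilistic choice into a guaranteed one, at the cost of scanning the data once per attempt.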
               If multiple files are selected, they should be transferred together 
               using the multipart/mixed format. 
               While the HTTP protocol can transport arbitrary BINARY data, the 
               default for mail transport (e.g., if the ACTION is a "mailto:" URL) 
               is the 7BIT encoding.  The value supplied for a part may need to be 
               encoded and the "content-transfer-encoding" header supplied if the 
               value does not conform to the default encoding.  [See section 5 of 
               RFC 1521 for more details.] 
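For a mail-bound ACTION, a hedged Python sketch of this decision tests whether a value already satisfies the 7bit rules of RFC 1521 (US-ASCII only, no NUL, CR and LF only as CRLF pairs, lines of at most 998 octets) and otherwise switches to base64 with a matching header. Choosing base64 rather than quoted-printable is an assumption made for brevity, and the helper names are hypothetical.

```python
from __future__ import annotations

import base64


def is_7bit(data: bytes, max_line: int = 998) -> bool:
    """True if data can be sent unchanged as 7bit."""
    if any(b > 127 or b == 0 for b in data):
        return False
    lines = data.split(b"\r\n")
    if any(b"\r" in line or b"\n" in line for line in lines):
        return False  # a bare CR or LF is not allowed
    return all(len(line) <= max_line for line in lines)


def prepare_part_body(data: bytes) -> tuple[dict[str, str], bytes]:
    """Return extra part headers and the body to send over a 7bit transport."""
    if is_7bit(data):
        return {}, data
    # encodebytes wraps at 76 characters with LF; MIME wants CRLF line breaks.
    encoded = base64.encodebytes(data).replace(b"\n", b"\r\n")
    return {"Content-Transfer-Encoding": "base64"}, encoded
```

Over HTTP, the same part could instead be sent unencoded with "Content-Transfer-Encoding: binary".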
               The original local file name may be supplied as well, as a 'filename' 
               parameter either of the 'content-disposition: form-data' header or, 
               in the case of multiple files, of a 'content-disposition: file' 
               header of the subpart. The client application should make best 
               effort to supply the file name; if the file name of the client's 
               operating system is not in US-ASCII, the file name might be 
               approximated or encoded using the method of RFC 1522.  This is a 
               convenience for those cases where, for example, the uploaded files 
               might contain references to each other, e.g., a TeX file and its .sty 
               auxiliary style description. 
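The RFC 1522 fallback can be sketched in Python by producing a "B" (base64) encoded-word for names that are not US-ASCII, with the standard email.header functions used to decode on the receiving side. UTF-8 as the charset is an assumption (a client of this era might well have used ISO-8859-1), as are the helper names; the sketch also ignores RFC 1522's 75-character limit per encoded-word, which a long name would exceed.

```python
import base64
from email.header import decode_header, make_header


def encode_filename(filename: str) -> str:
    """Leave US-ASCII names alone; otherwise build an RFC 1522 encoded-word."""
    try:
        filename.encode("ascii")
        return filename
    except UnicodeEncodeError:
        b64 = base64.b64encode(filename.encode("utf-8")).decode("ascii")
        return f"=?utf-8?B?{b64}?="


def decode_filename(value: str) -> str:
    """Receiving side: turn any encoded-words back into text."""
    return str(make_header(decode_header(value)))
```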
               On the server end, the ACTION might point to an HTTP URL that 
               implements the form's action via CGI. In such a case, the CGI program 
               would note that the content-type is multipart/form-data, parse the 
               various fields (checking for validity, writing the file data to local 
               files for subsequent processing, etc.). 
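A server-side sketch in Python, assuming the request's Content-Type value and raw body are already in hand (how a CGI obtains them is not shown). The cgi module once used for this job was removed in Python 3.13, so the sketch relies on the standard email package's MIME parser. The function name and the returned mapping from field name to (filename, bytes) pairs are illustrative choices; a nested multipart/mixed part contributes one entry per enclosed file.

```python
from __future__ import annotations

import email


def parse_form_data(content_type: str, body: bytes) -> dict[str, list[tuple[str | None, bytes]]]:
    """Map each field name to its (filename or None, raw bytes) values."""
    if not content_type.lower().startswith("multipart/form-data"):
        raise ValueError("expected multipart/form-data; an older browser may have "
                         "sent application/x-www-form-urlencoded")
    # Prepend the header so the MIME parser can find the boundary parameter.
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    msg = email.message_from_bytes(header + body)
    if not msg.is_multipart():
        raise ValueError("body could not be split into parts")
    fields: dict[str, list[tuple[str | None, bytes]]] = {}
    for part in msg.get_payload():
        name = part.get_param("name", header="content-disposition")
        if name is None:
            continue  # not a form field
        values = fields.setdefault(name, [])
        inner = part.get_payload() if part.is_multipart() else [part]
        for sub in inner:
            # decode=True undoes any Content-Transfer-Encoding.
            values.append((sub.get_filename(), sub.get_payload(decode=True)))
    return fields
```

Checking the content-type first also gives the server the hook described in section 4 for recognizing browsers that ignored ENCTYPE.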
            3.4 Interpretation of other attributes 
               The VALUE attribute might be used with <INPUT TYPE=file> tags for a 
               default file name. This use is probably platform dependent.  It might 
               be useful, however, in sequences of more than one transaction, e.g., 
               to avoid having the user prompted for the same file name over and 
               over again. 
               The SIZE attribute might be specified using SIZE=width,height, where 
               width is some default for file name width, while height is the 
               expected size showing the list of selected files.  For example, this 
               would be useful for forms designers who expect to get several files 
               and who would like to show a multiline file input field in the 
               browser (with a "browse" button beside it, hopefully).  It would be 
               useful to show a one line text field when no height is specified 
               (when the forms designer expects one file, only) and to show a 
               multiline text area with scrollbars when the height is greater than 1 
               (when the forms designer expects multiple files). 
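A small Python sketch of reading SIZE=width,height for a FILE input, where a missing height means one expected file. Treating a missing width as "use the platform default" (None) and the helper names are assumptions.

```python
from __future__ import annotations


def parse_file_size(size: str | None) -> tuple[int | None, int]:
    """Parse SIZE="width" or SIZE="width,height"; height defaults to 1."""
    if not size:
        return None, 1
    width_text, _, height_text = size.partition(",")
    width = int(width_text) if width_text.strip() else None
    height = int(height_text) if height_text.strip() else 1
    return width, max(height, 1)


def widget_kind(size: str | None) -> str:
    """One-line field for a single expected file, scrolling list for several."""
    _, height = parse_file_size(size)
    return "multiline list" if height > 1 else "single-line field"
```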
            4.  Backward compatibility issues 
               While not necessary for successful adoption of an enhancement to the 
               current WWW form mechanism, it is useful to also plan for a migration 
               strategy: users with older browsers can still participate in file 
               upload dialogs, using a helper application. Most current web browsers, 
               when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and 
               give the user a text box. The user can type a file name into this 
               text box. In addition, current browsers seem to ignore the ENCTYPE 
               parameter in the <FORM> element, and always transmit the data as 
               application/x-www-form-urlencoded. 
               Thus, the server CGI might be written in a way that would note that 
               the form data returned had content-type application/x-www-form- 
               urlencoded instead of multipart/form-data, and know that the user was 
               using a browser that didn't implement file upload. 
               In this case, rather than replying with a "text/html" response, the 
               CGI on the server could instead send back a data stream that a helper 
               application might process; this would be a data stream of 
               type "application/x-please-send-files", which contains: 
               * The (fully qualified) URL to which the actual form data should 
                 be posted (terminated with CRLF) 
               * The list of field names that were supposed to be file contents 
                 (space separated, terminated with CRLF) 
               * The entire original application/x-www-form-urlencoded form data 
                 as originally sent from client to server. 
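The three items listed above can be assembled by a Python sketch like the following; the CGI would send the result after a "Content-Type: application/x-please-send-files" header. Rejecting field names that contain spaces follows from the space-separated list; the function name and the sample values are hypothetical.

```python
from __future__ import annotations


def please_send_files(action_url: str, file_fields: list[str], form_data: str) -> bytes:
    """Build an application/x-please-send-files body.

    action_url: fully qualified URL the helper should POST to.
    file_fields: names of fields that should have carried file contents.
    form_data: the original application/x-www-form-urlencoded data, unchanged.
    """
    if any(" " in name for name in file_fields):
        raise ValueError("field names are space separated and cannot contain spaces")
    return (action_url.encode("ascii") + b"\r\n"
            + " ".join(file_fields).encode("ascii") + b"\r\n"
            + form_data.encode("ascii"))
```

For the form in section 6, an old browser might post "submitter=Joe+Blow&pics=file1.txt", and the reply body would name "pics" as the file field.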
               In this case, the browser needs to be configured to process 
               application/x-please-send-files to launch a helper application. 
               The helper would read the form data, note which fields contained 
               'local file names' that needed to be replaced with their data 
               content, might itself prompt the user for changing or adding to the 
               list of files available, and then repackage the data & file contents 
               in multipart/form-data for retransmission back to the server. 
               The helper would generate the kind of data that a 'new' browser 
               should actually have sent in the first place, with the intention that 
               the URL to which it is sent corresponds to the original ACTION URL. 
               The point of this is that the server can use the *same* CGI to 
               implement the mechanism for dealing with both old and new browsers. 
               The helper need not display the form data, but *should* ensure that 
               the user actually be prompted about the suitability of sending the 
               files requested (this is to avoid a security problem with malicious 
               servers that ask for files that weren't actually promised by the 
               user.) It would be useful if the status of the transfer of the files 
               involved could be displayed. 
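On the helper side, a minimal Python sketch splits the stream into its three items and keeps only the local file names that the user explicitly approves; the approved files would then be read and repackaged as multipart/form-data. The confirm callback stands in for a real prompt and, like the function names, is an assumption.

```python
from __future__ import annotations

from typing import Callable
from urllib.parse import parse_qsl


def read_request(body: bytes) -> tuple[str, list[str], list[tuple[str, str]]]:
    """Split an application/x-please-send-files body into its three items."""
    url, file_line, form_data = body.split(b"\r\n", 2)
    file_fields = file_line.decode("ascii").split()
    pairs = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    return url.decode("ascii"), file_fields, pairs


def files_to_send(body: bytes, confirm: Callable[[str, str], bool]) -> list[tuple[str, str]]:
    """Return (field, local file name) pairs the user approved for this URL."""
    url, file_fields, pairs = read_request(body)
    approved: list[tuple[str, str]] = []
    for name, value in pairs:
        # Only fields the server listed, with a non-empty name, and only on a yes.
        if name in file_fields and value and confirm(url, value):
            approved.append((name, value))
    return approved
```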
            5.  Other considerations 
            5.1 Compression, encryption 
               This scheme doesn't address the possible compression of files.  After 
               some consideration, it seemed that the optimization issues of file 
               compression were too complex to try to automatically have browsers 
               decide that files should be compressed.  Many link-layer transport 
               mechanisms (e.g., high-speed modems) perform data compression over 
               the link, and optimizing for compression at this layer might not be 
               appropriate. It might be possible for browsers to optionally produce 
               a content-transfer-encoding of x-compress for file data, and for 
               servers to decompress the data before processing, if desired; this 
               was left out of the proposal, however. 
               Similarly, the proposal does not contain a mechanism for encryption 
               of the data; this should be handled by whatever other mechanisms are 
               in place for secure transmission of data, whether via secure HTTP or 
               mail. 
            5.2 Deferred file transmission 
               In some situations, it might be advisable to have the server validate 
               various elements of the form data (user name, account, etc.)  before 
               actually preparing to receive the data.  However, after some 
               consideration, it seemed best to require that servers that wish to do 
               this should implement this as a series of forms, where some of the 
               data elements that were previously validated might be sent back to 
               the client as 'hidden' fields, or by arranging the form so that the 
               elements that need validation occur first.  This puts the onus of 
               maintaining the state of a transaction only on those servers that 
               wish to build a complex application, while allowing those cases that 
               have simple input needs to be built simply. 
               The HTTP protocol may require a content-length for the overall 
               transmission. Even if it were not to do so, HTTP clients are 
               encouraged to supply content-length for overall file input so that a 
               busy server could detect if the proposed file data is too large to be 
               processed reasonably and just return an error code and close the 
               connection without waiting to process all of the incoming data.  Some 
               current implementations of CGI require a content-length in all POST 
               transactions. 
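A minimal Python sketch of the busy-server check: decide from the Content-Length header alone, before reading any of the body, whether to refuse the upload. The byte limit, the plain mapping of headers (real servers match header names case-insensitively), and the specific status codes, borrowed from later HTTP/1.1 specifications, are all assumptions.

```python
from __future__ import annotations

from collections.abc import Mapping


def check_content_length(headers: Mapping[str, str], limit: int) -> tuple[int, str] | None:
    """Return an (HTTP status, reason) to send immediately, or None to proceed."""
    raw = headers.get("Content-Length")
    if raw is None:
        return 411, "Length Required"
    try:
        length = int(raw)
    except ValueError:
        return 400, "Bad Request"
    if length < 0:
        return 400, "Bad Request"
    if length > limit:
        return 413, "Request Entity Too Large"
    return None
```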
               If the INPUT tag includes the attribute MAXLENGTH, the user agent 
               should consider its value to represent the maximum Content-Length (in 
               bytes) which the server will accept for transferred files.  In this 
               way, servers can hint to the client how much space they have 
               available for a file upload, before that upload takes place.  It is 
               important to note, however, that this is only a hint, and the actual 
               requirements of the server may change between form creation and file 
               submission. 
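Because MAXLENGTH is only a hint, a client might use it to warn the user rather than to refuse. This Python sketch compares the total size of the selected files with the hint; it ignores the multipart headers and boundaries that also count toward Content-Length, treats a missing or malformed MAXLENGTH as no hint, and uses hypothetical helper names.

```python
from __future__ import annotations

import os


def total_file_bytes(paths: list[str]) -> int:
    """Sum the sizes of the selected local files."""
    return sum(os.path.getsize(p) for p in paths)


def exceeds_maxlength(total_bytes: int, maxlength: str | None) -> bool:
    """True if the file bytes exceed the MAXLENGTH hint (in bytes)."""
    if maxlength is None or not maxlength.strip().isdigit():
        return False  # no usable hint
    return total_bytes > int(maxlength)
```

A caller would write exceeds_maxlength(total_file_bytes(selected), maxlength) and show a warning on True.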
               In any case, an HTTP server may abort a file upload in the middle of 
               the transaction if the file being received is too large. 
            5.3 Other choices for return transmission of binary data 
               Various people have suggested using a new MIME top-level type 
               "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of 
               "packet" to express indeterminate-length binary data, rather than 
               relying on the multipart-style boundaries.  While we are not opposed 
               to doing so, this would require additional design and standardization 
               work to get acceptance of "aggregate".  On the other hand, the 
               'multipart' mechanisms are well established, simple to implement on 
               both the sending client and receiving server, and as efficient as 
               other methods of dealing with multiple combinations of binary data. 
            5.4 Not overloading <INPUT>: 
               Various people have wondered about the advisability of overloading 
               'INPUT' for this function, rather than merely providing a different 
               type of FORM element.  Among other considerations, the migration 
               strategy which is allowed when using <INPUT> is important.  In 
               addition, the <INPUT> field *is* already overloaded to contain most 
               kinds of data input; rather than creating multiple kinds of <INPUT> 
               tags, it seems most reasonable to enhance <INPUT>.  The 'type' of 
               INPUT is not the content-type of what is returned, but rather the 
               'widget-type'; i.e., it identifies the interaction style with the 
               user.  The description here is carefully written to allow <INPUT 
               TYPE=FILE> to work for text browsers or audio-markup. 
            5.5 Default content-type of field data 
               Many input fields in HTML are to be typed in. There has been some 
               ambiguity as to how form data should be transmitted back to servers. 
               Making text/plain the content-type of <INPUT> field data removes 
               this ambiguity: the client should encode the data properly, with 
               CRLF line breaks, before sending it back to the server. 
            5.6 Allow form ACTION to be "mailto:" 
               Independent of this proposal, it would be very useful for HTML 
               interpreting user agents to allow an ACTION in a form to be a 
               "mailto:" URL. This seems like a good idea, with or without this 
               proposal. Similarly, the ACTION for an HTML form which is received via 
               mail should probably default to the "reply-to:" of the message. 
               These two proposals would allow HTML forms to be served via HTTP 
               servers but sent back via mail, or, alternatively, allow HTML forms 
               to be sent by mail, filled out by HTML-aware mail recipients, and the 
               results mailed back. 
            5.7 Remote files with third-party transfer 
               In some scenarios, the user operating the client software might want 
               to specify a URL for remote data rather than a local file. In this 
               case, is there a way to allow the browser to send to the server a 
               pointer to the external data rather than the entire contents? This 
               capability could be implemented, for example, by having the client 
               send to the server data of type "message/external-body" with 
               "access-type" set to, say, "uri", and the URL of the remote data in 
               the body of the message. 
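As a hedged illustration of that idea, the Python below builds one form-data part that points at remote data instead of carrying it. The access-type value "uri" is the tentative example named in the text, so it is illustrative only; the function name and the sample URL are hypothetical.

```python
def external_body_part(field_name: str, url: str) -> bytes:
    """A form-data part whose body carries a URL instead of file contents."""
    headers = (
        f'Content-Disposition: form-data; name="{field_name}"\r\n'
        'Content-Type: message/external-body; access-type="uri"\r\n'
    )
    # Blank line, then the URL of the remote data as the part body.
    return (headers + "\r\n" + url + "\r\n").encode("ascii")
```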
            5.8 File transfer with ENCTYPE=x-www-form-urlencoded 
               If a form contains <INPUT TYPE=file> elements but does not contain an 
               ENCTYPE in the enclosing <FORM>, the behavior is not specified.  It 
               is probably inappropriate to attempt to URL-encode large quantities 
               of data to servers that don't expect it. 
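A quick Python check makes the cost concrete. With quote_from_bytes and safe set to the empty string, only 66 byte values (ASCII letters, digits, and _.-~) pass through unchanged, and every other byte becomes a three-character %XX escape, so a sample holding each byte value once grows from 256 to 636 characters, almost 2.5 times its size. The helper name is hypothetical.

```python
from urllib.parse import quote_from_bytes


def urlencoded_growth(data: bytes) -> float:
    """Ratio of percent-encoded length to raw length (1.0 means no growth)."""
    if not data:
        return 1.0
    return len(quote_from_bytes(data, safe="")) / len(data)


# Every byte value once, as a stand-in for arbitrary binary file content.
sample = bytes(range(256))
growth = urlencoded_growth(sample)
```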
            5.9 CRLF used as line separator 
               As with all MIME transmissions, CRLF is used as the separator for 
               lines in a POST of the data in multipart/form-data. 
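For text fields, a client might normalize whatever line breaks the platform produced before sending. This Python sketch converts CR, LF, or CRLF to CRLF; it is meant for text/plain field values only, since rewriting bytes inside binary file content would corrupt it. The helper name is hypothetical.

```python
import re

# Try CRLF first so an existing CRLF pair is not turned into two breaks.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def to_crlf(text: str) -> str:
    """Normalize every line break in a text field value to CRLF."""
    return _LINE_BREAK.sub("\r\n", text)
```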
            5.10 Relationship to multipart/related 
               The MIMESGML group is proposing a new type called multipart/related. 
               While it contains similar features to multipart/form-data, the use 
               and application of form-data is different enough that form-data is 
               being described separately. 
               It might be possible at some point to encode the result of HTML forms 
               (including files) in a multipart/related body part; this is not 
               incompatible with this proposal. 
            5.11 Non-ASCII field names 
               Note that MIME headers are generally required to consist only of 7-bit 
               data in the US-ASCII character set. Hence field names should be 
               encoded according to the prescriptions of RFC 1522 if they contain 
               characters outside of that set. In HTML 2.0, the default character 
               set is ISO-8859-1, but non-ASCII characters in field names should be 
               encoded. 
            6. Examples 
               Suppose the server supplies the following HTML: 
                 <FORM ACTION="http://server.dom/cgi/handle" 
                       ENCTYPE="multipart/form-data" 
                       METHOD=POST> 
                 What is your name? <INPUT TYPE=TEXT NAME=submitter> 
                 What files are you sending? <INPUT TYPE=FILE NAME=pics> 
                 </FORM> 
               and the user types "Joe Blow" in the name field, and selects a text 
               file "file1.txt" for the answer to 'What files are you sending?' 
               The client might send back the following data: 
                    Content-type: multipart/form-data; boundary=AaB03x 

                    --AaB03x 
                    content-disposition: form-data; name="submitter" 

                    Joe Blow 
                    --AaB03x 
                    content-disposition: form-data; name="pics"; filename="file1.txt" 
                    Content-Type: text/plain 

                     ... contents of file1.txt ... 
                    --AaB03x-- 
               If the user also indicated an image file "file2.gif" for the answer 
               to 'What files are you sending?', the client might send 
               back the following data: 
                    Content-type: multipart/form-data; boundary=AaB03x 

                    --AaB03x 
                    content-disposition: form-data; name="submitter" 

                    Joe Blow 
                    --AaB03x 
                    content-disposition: form-data; name="pics" 
                    Content-type: multipart/mixed; boundary=BbC04y 

                    --BbC04y 
                    Content-disposition: attachment; filename="file1.txt" 
                    Content-Type: text/plain 

                    ... contents of file1.txt ... 
                    --BbC04y 
                    Content-disposition: attachment; filename="file2.gif" 
                    Content-type: image/gif 
                    Content-Transfer-Encoding: binary 

                      ...contents of file2.gif... 
                    --BbC04y-- 
                    --AaB03x-- 
            7. Registration of multipart/form-data 
               The media-type multipart/form-data follows the rules of all multipart 
               MIME data streams as outlined in RFC 1521. It is intended for use in 
               returning the data that comes about from filling out a form. In a 
               form (in HTML, although other applications may also use forms), there 
               are a series of fields to be supplied by the user who fills out the 
               form. Each field has a name. Within a given form, the names are 
               unique. 
               multipart/form-data contains a series of parts. Each part is expected 
               to contain a content-disposition header where the value is 
               "form-data" and a name attribute specifies the field name within the form, 
               e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is 
               the field name corresponding to that field. Field names originally in 
               non-ASCII character sets may be encoded using the method outlined in 
               RFC 1522. 
               As with all multipart MIME types, each part has an optional 
               Content-Type, which defaults to text/plain.  If the contents of a file are 
               returned via filling out a form, then the file input is identified as 
               application/octet-stream or the appropriate media type, if known.  If 
               multiple files are to be returned as the result of a single form 
               entry, they can be returned as multipart/mixed embedded within the 
               multipart/form-data. 
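A Python sketch of the nested case: one form-data part whose body is a multipart/mixed holding several files, with an inner boundary retried until it occurs in none of them. Subparts use "Content-disposition: attachment" as the examples in section 6 do; the (filename, media type, content) input, the omission of a Content-Transfer-Encoding header (assuming an HTTP transport), and the function name are assumptions.

```python
from __future__ import annotations

import secrets


def multiple_files_part(field_name: str, files: list[tuple[str, str, bytes]]) -> bytes:
    """Build one form-data part whose body is a multipart/mixed of files."""
    while True:  # pick an inner boundary that occurs in none of the files
        inner = "mixed-" + secrets.token_hex(12)
        if not any(inner.encode("ascii") in content for _, _, content in files):
            break
    out = [
        f'Content-Disposition: form-data; name="{field_name}"\r\n'.encode("ascii"),
        f"Content-Type: multipart/mixed; boundary={inner}\r\n\r\n".encode("ascii"),
    ]
    for filename, media_type, content in files:
        out.append(f"--{inner}\r\n".encode("ascii"))
        out.append(f'Content-Disposition: attachment; filename="{filename}"\r\n'.encode("ascii"))
        out.append(f"Content-Type: {media_type}\r\n\r\n".encode("ascii"))
        out.append(content + b"\r\n")  # this CRLF belongs to the next delimiter
    # The CRLF before the enclosing form-data delimiter is left to the caller.
    out.append(f"--{inner}--".encode("ascii"))
    return b"".join(out)
```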
               Each part may be encoded and the "content-transfer-encoding" header 
               supplied if the value of that part does not conform to the default 
               encoding. 
               File inputs may also identify the file name. The file name may be 
               described using the 'filename' parameter of the "content-disposition" 
               header. This is not required, but is strongly recommended in any case 
               where the original filename is known. This is useful or necessary in 
               many applications. 
            8. Security Considerations 
               It is important that a user agent not send any file that the user has 
               not explicitly asked to be sent. Thus, HTML interpreting agents are 
               expected to confirm any default file names that might be suggested 
               with <INPUT TYPE=file VALUE="yyyy">.  Hidden fields must never be 
               able to specify a file. 
               This proposal does not contain a mechanism for encryption of the 
               data; this should be handled by whatever other mechanisms are in 
               place for secure transmission of data, whether via secure HTTP, or by 
               security provided by MOSS (described in RFC 1848). 
               Once the file is uploaded, it is up to the receiver to process and 
               store the file appropriately. 
            9.  Conclusion 
               The suggested implementation gives the client a lot of flexibility in 
               the number and types of files it can send to the server, it gives the 
               server control of the decision to accept the files, and it gives 
               servers a chance to interact with browsers which do not support INPUT 
               TYPE "file". 
               The change to the HTML DTD is very simple, but very powerful.  It 
               enables a much greater variety of services to be implemented via the 
               World-Wide Web than is currently possible due to the lack of a file 
               submission facility.  This would be an extremely valuable addition to 
               the capabilities of the World-Wide Web. 
              
            Authors' Addresses 
               Larry Masinter 
               Xerox Palo Alto Research Center 
               3333 Coyote Hill Road 
               Palo Alto, CA 94304 
               Phone:  (415) 812-4365 
               Fax:    (415) 812-4333 
               EMail:   masinter@parc.xerox.com 
             
               Ernesto Nebel 
               XSoft, Xerox Corporation 
               10875 Rancho Bernardo Road, Suite 200 
               San Diego, CA 92127-2116 
               Phone:  (619) 676-7817 
               Fax:    (619) 676-7865 
               EMail:   nebel@xsoft.sd.xerox.com 
              
            A. Media type registration for multipart/form-data 
            Media Type name: 
             multipart 
            Media subtype name: 
             form-data 
            Required parameters: 
             none 
            Optional parameters: 
             none 
            Encoding considerations: 
             No additional considerations other than as for other multipart types. 
            Published specification: 
             RFC 1867 
            Security considerations: 
              The multipart/form-data type introduces no new security 
              considerations beyond what might occur with any of the enclosed 
              parts. 
            References 
            [RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One: 
                       Mechanisms for Specifying and Describing the Format of 
                       Internet Message Bodies.  N. Borenstein & N. Freed. 
                       September 1993. 
            [RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two: 
                       Message Header Extensions for Non-ASCII Text. K. Moore. 
                       September 1993. 
            [RFC 1806] Communicating Presentation Information in Internet 
                       Messages: The Content-Disposition Header. R. Troost & S. 
                       Dorner. June 1995. 