`
`Extensible Markup Language (XML)
`
`W3C Working Draft 14-Nov-96
`
`WD-xml-961114
`
`This version:
`http://www.w3.org/pub/WWW/TR/WD-xml-961114.html
`Previous versions:
`Latest version:
`http://www.textuality.com/sgml-erb/WD-xml.html
`
`Editors:
`Tim Bray (Textuality) <tbray@textuality.com>
`C. M. Sperberg-McQueen (University of Illinois at Chicago) <cmsmcq@uic.edu>
`
`Status of this memo
`
`This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document
`and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C
`Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C
`working drafts can be found at: http://www.w3.org/pub/WWW/TR
`
`Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather
`than the URLs for working drafts themselves.
`
`This work is part of the W3C SGML Activity.
`
`Abstract
`
`Extensible Markup Language (XML) is an extremely simple dialect of SGML which is completely described in
`this document. The goal is to enable generic SGML to be served, received, and processed on the Web in the way
`that is now possible with HTML. For this reason, XML has been designed for ease of implementation, and for
interoperability with both SGML and HTML.

Note on status of this document: This is even more of a moving target than the typical W3C working draft.
Several important decisions on the details of XML are still outstanding. Members of the W3C SGML Working
Group will recognize these areas of particular volatility in the spec, but those who are not intimately familiar
with the deliberative process should be careful to avoid actions based on the content of this document until the
notice you are now reading has been removed.
`
`Table of Contents
`
`1. Introduction
`1.1 Origin and Goals
`1.2 Relationship to Other Standards
`1.3 Notation
`
`https://www.w3.org/TR/WD-xml-961114#sec1.1
`
`IBM EX. 1023
`
`1/25
`
`
`
`1 of 25
`
`
`
`3/30/2017
`
`1.4 Terminology
`1.5 Common Syntactic Constructs
`2. Documents
`2.1 Logical and Physical Structure
`2.2 Characters
`2.3 Syntax of Text and Markup
`2.4 Comments
`2.5 Processing Instructions
`2.6 Marked Sections
`2.7 White Space Handling
`2.8 Prolog and Document Type Declaration
`2.9 Required Markup Declaration
`3. Logical Structures
`3.1 Start- and End-Tags
`3.2 Well-Formed XML Documents
`3.3 Element Declaration
`3.3.1 Mixed Content
`3.3.2 Element Content
`3.4 Attribute Declaration
`3.4.1 Attribute Types
`3.4.2 Attribute Defaults
`3.5 Partial DTD Information
`4. Physical Structures
`4.1 Character and Entity References
`4.2 Declaring Entities
`4.2.1 Internal Entities
`4.2.2 External Entities
`4.2.3 Character Encoding in Entities
`4.2.4 The Document Entity
`4.3 XML Processor Treatment of Entities
`4.4 Notation Declaration
`5. Conformance
`A. XML and SGML
`B. References
`C. Working Group and Editorial Review Board Membership
`C.1 Working Group
`C.2 Editorial Review Board
`
`1. Introduction
`
`Extensible Markup Language, abbreviated XML, describes a class of data objects stored on computers and
`partially describes the behavior of programs which process these objects. Such objects are called XML
`documents. XML is an application profile or restricted form of SGML, the Standard Generalized Markup
`Language [ISO 8879].
`
`XML documents are made up of storage units called entities, which contain either text or binary data. Text is
`made up of characters, some of which form the character data in the document, and some of which form markup.
`Markup encodes a description of the document's storage layout, structure, and arbitrary attribute-value pairs
`associated with that structure. XML provides a mechanism to impose constraints on the storage layout and
`logical structure.
`
`A software module called an XML processor is used to read XML documents and provide access to their content
`and structure. It is assumed that an XML processor is doing its work on behalf of another module, referred to as
the application. This specification describes some of the required behavior of an XML processor in terms of how
it must read XML data and what information it must provide to the application.
`
`1.1 Origin and Goals
`
`XML was developed by a Generic SGML Editorial Review Board formed under the auspices of the W3
`Consortium in 1996 and chaired by Jon Bosak of Sun Microsystems, with the very active participation of a
`Generic SGML Working Group also organized by the W3C. The membership of these groups is given in an
`appendix.
`
`The design goals for XML are:
`
`1. XML shall be straightforwardly usable over the Internet.
`2. XML shall support a wide variety of applications.
`3. XML shall be compatible with SGML.
`4. It shall be easy to write programs which process XML documents.
`5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
`6. XML documents should be human-legible and reasonably clear.
`7. The XML design should be prepared quickly.
`8. The design of XML shall be formal and concise.
`9. XML documents shall be easy to create.
`10. Terseness is of minimal importance.
`
`This specification, together with the associated standards, provides all the information necessary to understand
`XML version 1.0 and construct computer programs to process it.
`
`This version of the XML specification (0.01) is for internal discussion within the SGML ERB only. It should not
`be distributed outside the ERB.
`
`Known problems in version 0.01:
`
1. Several items in the bibliography have no references to them; several references in the text do not point to
anything in the bibliography.
`2. The EBNF grammar has not been checked for completeness, and has at least two start productions.
`3. The description of conformance in the final section is incomplete.
`4. Language exists in the spec which describes the effect of several decisions which have not been taken.
`Specifically, XML may have INCLUDE/IGNORE marked sections as does SGML, the comment syntax
`may change, XML may have CONREF attributes, the 8879 syntax for EMPTY elements may be
`outlawed, XML may choose to rule out what 8879 calls "ambiguous" content models, XML may choose to
`prohibit overlap between enumerated attribute values for different attributes, the handling for attribute
`values in the absence of a DTD may be specified, there may be a way to signal whether the DTD is
complete, the DTD summary may be dropped, XML may support parameter entities, and XML may
predefine a large number of character entities, for example those from HTML 3.2.
`
`1.2 Relationship to Other Standards
`
`Other standards relevant to users and implementors of XML include:
`
`1. SGML (ISO 8879-1986). Valid XML Documents are SGML documents in the sense described in ISO
`standard 8879.
`2. Unicode, ISO 10646. This specification depends on ISO standard 10646 and the technically identical
`Unicode Standard, Version 2.0, which define the encodings and meanings of the characters which make up
`XML text data.
`
`3. IETF RFC 1738. This specification defines the syntax and semantics of Uniform Resource Locators, or
`URLs.
`4. World-Wide Web Consortium Working Draft WD-html32-960909: HTML 3.2 Reference Specification.
`This includes the repertoire of characters to be predefined by an XML processor.
`
`1.3 Notation
`
`The formal grammar of XML is given using a simple Extended Backus-Naur Form (EBNF) notation. Each rule
`in the grammar defines one non-terminal or terminal symbol of the grammar, in the form
`
`symbol ::= expression
`
`Symbols are written with an initial capital letter if they are defined by a regular expression, with an initial
`lowercase letter if they have a more complex definition (i.e. if they require a stack for proper recognition).
Literal strings are quoted; unless otherwise noted they are not case-sensitive. The distinction between symbols
which can and cannot be recognized using simple regular expressions is made for clarity only. It may be
reflected in the boundary between an implementation's lexical scanner and its parser, but no assumptions are
made about the placement of such a boundary, nor even that the implementation has separate modules for
parser and lexical scanner.
`
`Within the expression on the right-hand side of a rule, the meaning of symbols is as shown below:
`
#NN
where NN is a decimal integer, the expression matches the character in ISO 10646 whose UCS-4 bit-string,
when interpreted as an unsigned binary number, has the value indicated
#xNN
where NN is a hexadecimal integer, the expression matches the character in ISO 10646 whose UCS-4 bit-string,
when interpreted as an unsigned binary number, has the value indicated
[#xNN-#xNN], [a-zA-Z]
matches any character with a value in the range(s) indicated (inclusive)
[^#xNN-#xNN], [^a-z]
matches any character with a value outside the range indicated
[^abc]
matches any character with a value not among the characters given
"string"
matches the literal string given inside the double quotes
'string'
matches the literal string given inside the single quotes
a b
a followed by b
a | b
a or b but not both
a?
a or nothing; optional a
a+
one or more occurrences of a
a*
zero or more occurrences of a
(expression)
expression is treated as a unit; allows subgroups to carry the operators ?, *, or +
/* ... */
comment
[ WFC: ... ]
Well-formedness check; this identifies by name a check for well-formedness that is associated with a
production.
[ VC: ... ]
Validity check; this identifies by name a check for validity that is associated with a production.
`
`1.4 Terminology
`
`Some terms used with special meaning in this specification are:
`
may
Conforming data and software may but need not behave as described.
must
Conforming data and software must behave as described; otherwise they are in error.
error
A violation of the rules of this specification; results are undefined. Conforming software may detect and
report an error and may recover from it.
`reportable error
`An error which conforming software must report to the user, unless the user has explicitly disabled error
`reporting.
`validity constraint
`A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must
`be reported by validating XML processors.
`well-formedness constraint
`A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are
`reportable errors.
`at user option
`Conforming software may or must (depending on the verb in the sentence) provide users a means to select
`the behavior described; it must also allow the user not to select it.
`
match
Case-insensitive match: two strings or names being compared must be identical except for differences
`between upper- and lower-case letters in scripts which have such a distinction. Characters with multiple
`possible representations in ISO 10646 (e.g. both precomposed and base+diacritic forms) match only if
`they have the same representation, except for case differences, in both strings. Case folding must be
`performed as specified in The Unicode Standard, Version 2.0, section 4.1; in particular, it is recommended
`that case-insensitive matching be performed by folding uppercase letters to lowercase, not vice versa.
`exact(ly) match
`Case-sensitive match: two strings or names being compared must be identical. Characters with multiple
`possible representations in ISO 10646 (e.g. both precomposed and base+diacritic forms) match only if
`they have the same representation in both strings.
`for compatibility
`A feature of XML included solely to ensure that XML remains compatible with SGML; the expectation is
`that in many cases, those aspects of SGML that are not required to satisfy XML's requirements but
`mandated only to achieve conformance may be removed or replaced in the near future by the organizations
`that maintain that standard.
`
`1.5 Common Syntactic Constructs
`
`This section defines some symbols used widely in the grammar.
`
S (white space) consists of one or more space characters (#x20), carriage returns, line feeds, or tabs.
`
`< 1 White space >
`
`S ::= (#x0020 | #x000a | #x000d | #x0009)+
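Because S is defined by a regular expression, it maps directly onto one. The Python sketch below is illustrative only; the names `S_RE`, `is_s`, and `split_on_s` are hypothetical and not part of this specification. Note that only the four listed characters count as S, so other Unicode spaces such as #xA0 do not.

```python
import re

# S ::= (#x0020 | #x000a | #x000d | #x0009)+
# Only these four characters count; other Unicode spaces (e.g. #xA0) do not.
S_RE = re.compile(r"[\x20\x0a\x0d\x09]+")


def is_s(text: str) -> bool:
    """Return True if the whole of `text` matches the S production."""
    return S_RE.fullmatch(text) is not None


def split_on_s(text: str) -> list:
    """Split `text` at runs of S, discarding empty pieces at either end."""
    return [piece for piece in S_RE.split(text) if piece]
```

For example, `split_on_s("  a\tb\n")` returns `['a', 'b']`.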
`
`For some purposes, characters are classified as letters, digits, or other characters:
`
`< 2 Name >
`
Character ::= [#x20-#xFFFFFFFF]                /* any ISO 10646 32-bit code */

Letter    ::= [#x41-#x5A] | [#x61-#x7A]        /* Latin 1 upper and lowercase */
            | #xAA | #xB5 | #xBA               /* Latin 1 supplementary letters */
            | [#xC0-#xD6] | [#xD8-#xF6]        /* Latin 1 supplementary letters */
            | [#xF8-#xFF]
            | [#x0100-#x017F]                  /* Extended Latin-A */
            | [#x0180-#x0217]                  /* Extended Latin-B */
            | [#x0250-#x1FFF]                  /* IPA extensions, spacing modifiers,
                                                  diacritics, Greek, Coptic, Cyrillic,
                                                  Armenian, Hebrew, ... */
            | [#x3040-#x9FFF]                  /* CJK */
            | [#xF900-#xFDFF]                  /* CJK compatibility ideographs ... */
            | [#xFE70-#xFEFE]                  /* Arabic presentation forms B */
            | [#xFF21-#xFF3A]                  /* Fullwidth Latin A-Z */
            | [#xFF41-#xFF5A]                  /* Fullwidth Latin a-z */
            | [#xFF66-#xFFDC]                  /* Halfwidth katakana, hangul */

/* Correct this table using section 4.5 of Unicode 2.0 */
Digit     ::= [#x0030-#x0039]                  /* ISO 646 digits */
            | [#x0660-#x0669]                  /* Arabic-Indic digits */
            | [#x06F0-#x06F9]                  /* Eastern Arabic-Indic digits */
            | [#x0966-#x096F]                  /* Devanagari digits */
            | [#x09E6-#x09EF]                  /* Bengali digits */
            | [#x0A66-#x0A6F]                  /* Gurmukhi digits */
            | [#x0AE6-#x0AEF]                  /* Gujarati digits */
            | [#x0B66-#x0B6F]                  /* Oriya digits */
            | [#x0BE7-#x0BEF]                  /* Tamil digits (no zero) */
            | [#x0C66-#x0C6F]                  /* Telugu digits */
            | [#x0CE6-#x0CEF]                  /* Kannada digits */
            | [#x0D66-#x0D6F]                  /* Malayalam digits */
            | [#x0E50-#x0E59]                  /* Thai digits */
            | [#x0ED0-#x0ED9]                  /* Lao digits */
            | [#xFF10-#xFF19]                  /* Fullwidth digits */

/* Ranges taken from Java documentation. Check against Unicode 2.0, section 4.6.
   N.B. not clear whether the relevant Greek and Hebrew letters should also be
   digits. Will matter for NUMBER attributes. */
`
`A Name is a token beginning with a letter or hyphen and continuing with letters, digits, hyphens, or full stops
`(together known as name characters). The use of any name beginning with a string which matches "-XML-" in a
`fashion other than those described in this specification is a reportable error.
`
`A Number is a sequence of digits. An Nmtoken (name token) is any mixture of name characters.
`
`< 3 Names, Numbers, and Tokens >
`
`Name ::= (Letter | '-') (Letter | Digit | '-' | '.')*
`
`Number ::= Digit+
`
`Nmtoken ::= (Letter | Digit | '-' | '.')+
`
`Nmtokens ::= Nmtoken (S Nmtoken)*
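The Letter, Digit, Name, Number, and Nmtoken productions can likewise be compiled into regular expressions. The following Python sketch transcribes the code-point ranges from the Letter and Digit tables into character classes; the helper names are hypothetical. Character is omitted because Python strings cannot hold code points above #x10FFFF. Note that, under this draft, "_" and ":" are not name characters.

```python
import re

# (lo, hi) code-point ranges transcribed from the Letter and Digit productions;
# single code points such as #xAA are written as one-element ranges.
LETTER_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A), (0xAA, 0xAA), (0xB5, 0xB5), (0xBA, 0xBA),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF), (0x0100, 0x017F),
    (0x0180, 0x0217), (0x0250, 0x1FFF), (0x3040, 0x9FFF), (0xF900, 0xFDFF),
    (0xFE70, 0xFEFE), (0xFF21, 0xFF3A), (0xFF41, 0xFF5A), (0xFF66, 0xFFDC),
]
DIGIT_RANGES = [
    (0x30, 0x39), (0x0660, 0x0669), (0x06F0, 0x06F9), (0x0966, 0x096F),
    (0x09E6, 0x09EF), (0x0A66, 0x0A6F), (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F),
    (0x0BE7, 0x0BEF), (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
    (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0xFF10, 0xFF19),
]


def class_body(ranges):
    """Render (lo, hi) pairs as the inside of a regex character class."""
    return "".join(f"\\u{lo:04x}-\\u{hi:04x}" for lo, hi in ranges)


LETTER = class_body(LETTER_RANGES)
DIGIT = class_body(DIGIT_RANGES)
NAME_CHAR = LETTER + DIGIT + r"\-."     # letters, digits, hyphen, full stop

NAME_RE = re.compile(f"[{LETTER}\\-][{NAME_CHAR}]*")  # Name
NUMBER_RE = re.compile(f"[{DIGIT}]+")                 # Number
NMTOKEN_RE = re.compile(f"[{NAME_CHAR}]+")            # Nmtoken


def is_name(s: str) -> bool:
    return NAME_RE.fullmatch(s) is not None


def is_number(s: str) -> bool:
    return NUMBER_RE.fullmatch(s) is not None


def is_nmtoken(s: str) -> bool:
    return NMTOKEN_RE.fullmatch(s) is not None
```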
`
`Literal data is any quoted string containing neither the quotation mark used as a delimiter nor angle brackets. It
`may contain entity and character references.
`
`< 4 Literals >
`
Literal     ::= '"' [^"<>]* '"'
              | "'" [^'<>]* "'"

QuotedCData ::= '"' [^"<>]* '"'
              | "'" [^'<>]* "'"

QuotedNames ::= '"' Names '"' | "'" Names "'"
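As a sketch only, extracting the value of a Literal is a single match against the two quoted alternatives. The Python names below are hypothetical, and the sketch deliberately leaves any entity or character references in the value unexpanded.

```python
import re

# Literal ::= '"' [^"<>]* '"' | "'" [^'<>]* "'"
DOUBLE_QUOTED = r'"([^"<>]*)"'
SINGLE_QUOTED = r"'([^'<>]*)'"
LITERAL_RE = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def literal_value(s: str):
    """Return the text between the delimiters, or None if `s` is not a Literal.

    Entity and character references inside the value are left unexpanded.
    """
    m = LITERAL_RE.fullmatch(s)
    if m is None:
        return None
    # Exactly one of the two alternatives took part in the match.
    return m.group(1) if m.group(1) is not None else m.group(2)
```

A single-quoted literal may contain double quotes and vice versa, so `literal_value("'say \"hi\"'")` returns `say "hi"`.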
`
`2. Documents
`
A textual object is an XML Document if it is either valid or, failing that, well-formed, as defined in this
specification.
`
`2.1 Logical and Physical Structure
`
`Each XML document has both a logical and a physical structure.
`
`Physically, the document is composed of units called entities; it begins in a "root" or document entity, which
`may refer to other entities, and so on ad infinitum. Entities referred to are embedded in the document at the point
`of reference.
`
`The document contains declarations, elements, comments, entity references, character references, and processing
`instructions, all of which are indicated in the document by explicit markup. These concepts and their markup are
`all explained elsewhere in this specification.
`
`The two structures must be synchronous: tags and elements must each begin and end in the same entity, but may
`refer to other entities internally; comments, processing instructions, character references, and entity references
`must each be contained entirely within a single entity. Entities must each contain an integral number of elements,
`comments, processing instructions, and references, possibly together with character data not contained within
`any element in the entity, or else they must contain non-textual data, which by definition contains no elements.
`
`2.2 Characters
`
The data stored in an XML entity is either text or binary. Binary data has an associated notation, identified by
name; beyond a requirement to make available the notation name and the size in bytes of the binary data in a
storage object, XML provides no information about and places no constraints on binary data. Indeed, so-called
binary data may in fact be textual, perhaps even well-formed XML text; but its identification as binary means
that an XML processor need not parse it in the fashion described by this specification. XML text data is a
sequence of characters. A character is an atomic unit of text represented by a bit string; valid bit
strings and their meanings are specified by ISO 10646.
`
`Users may extend the ISO 10646 character repertoire, in the rare cases where this is necessary, by making use of
`the private use areas.
`
The mechanism for encoding character values into bit patterns may vary from entity to entity. All XML
processors must accept the UTF-8 and UCS-2 encodings of 10646; the mechanisms for signalling which of the
two is in use, and for bringing other encodings into play, are described later, in the section on character
encodings.
`
`Regardless of the specific encoding used, any character in the ISO 10646 character set may be referred to by the
`decimal or hexadecimal equivalent of its bit string:
`
`< 5 Character references >
`
Hex     ::= [0-9a-fA-F]

Hex4    ::= Hex Hex Hex Hex

CharRef ::= '&#' Number ';'
          | '&u-' Hex4 ';'
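A minimal Python sketch of resolving both forms of character reference follows. It rests on two labeled assumptions: Number is restricted to ASCII digits (the Digit production is broader), and the literal `&u-` is matched case-insensitively, since section 1.3 makes quoted literals case-insensitive unless otherwise noted. The function name is hypothetical.

```python
import re

# CharRef ::= '&#' Number ';' | '&u-' Hex4 ';'
# Sketch assumptions: Number is limited to ASCII digits here, and the literal
# '&u-' is matched case-insensitively because section 1.3 makes quoted
# literals case-insensitive unless otherwise noted.
CHARREF_RE = re.compile(r"&#([0-9]+);|&u-([0-9a-f]{4});", re.IGNORECASE)


def resolve_char_refs(text: str) -> str:
    """Replace every character reference in `text` with the character it names.

    Python's chr() rejects values above 0x10FFFF, so such references raise
    ValueError here.
    """
    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1)))      # decimal form
        return chr(int(m.group(2), 16))      # hexadecimal &u- form
    return CHARREF_RE.sub(replace, text)
```

Entity references such as `&amp;` are not character references and pass through unchanged.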
`
`2.3 Syntax of Text and Markup
`
`XML text consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags,
`empty elements, entity references, character references, comments, marked sections, document type declarations,
`and processing instructions. The simplest form of XML processor thus could parse a well-formed XML
`document using the following rules:
`
`< 6 Trivial text grammar >
`
Trivial ::= (PCDATA | Markup)*

Eq      ::= S? '=' S?

Markup  ::= '<' Name (S Name Eq QuotedCData)* S? '>'      /* start-tags */
          | '</' Name S? '>'                              /* end-tags */
          | '<' Name (S Name Eq QuotedCData)* S? '/>'     /* empty elements */
          | '&' Name ';'                                  /* entity references */
          | '&#' Number ';'                               /* character references */
          | '&u-' Hex4 ';'                                /* character references */
          | '<!--' [^-]* ('-' [^-]+)* '-->'               /* comments */
          | '<![CDATA[' MsData ']]>'                      /* marked sections */
          | '<!DOCTYPE' (Name | S)+ ('[' [^]]* ']')? '>'  /* doc type declaration */
          | '<?' [^?]* ('?' [^>]+)* '?>'                  /* processing instructions */
`
`Most processors will require the more complex grammar given in the rest of this specification.
`
`All text that is not markup constitutes the character data of the document.
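To make the trivial grammar concrete, here is a hedged Python sketch of a scanner that splits text into PCDATA runs and markup tokens. It is not a conforming processor. Name and S are ASCII-only, the DOCTYPE and marked-section alternatives are approximations (the DOCTYPE pattern also accepts digits where a Name would be required), and character references are tried before entity references because `&u-0043;` also matches `'&' Name ';'`. All names are hypothetical.

```python
import re

# ASCII-only stand-ins for Name and S (a simplifying assumption; the full
# Letter and Digit tables are much larger).
NAME = r"[A-Za-z\-][A-Za-z0-9.\-]*"
S = r"[ \t\r\n]+"
EQ = rf"(?:{S})?=(?:{S})?"
QUOTED = r"""(?:"[^"<>]*"|'[^'<>]*')"""        # QuotedCData
ATTRS = rf"(?:{S}{NAME}{EQ}{QUOTED})*(?:{S})?"

TOKEN_RE = re.compile("|".join([
    r"(?P<pcdata>[^<&]+)",                                   # PCDATA
    rf"(?P<empty><{NAME}{ATTRS}/>)",                         # empty elements
    rf"(?P<stag><{NAME}{ATTRS}>)",                           # start-tags
    rf"(?P<etag></{NAME}(?:{S})?>)",                         # end-tags
    # Character references first: "&u-0043;" also matches '&' Name ';'.
    r"(?P<charref>&#[0-9]+;|&u-[0-9a-fA-F]{4};)",
    rf"(?P<entref>&{NAME};)",                                # entity references
    r"(?P<comment><!--[^-]*(?:-[^-]+)*-->)",                 # comments
    r"(?P<ms><!\[CDATA\[.*?\]\]>)",                          # marked sections (approx.)
    r"(?P<doctype><!DOCTYPE[A-Za-z0-9.\- \t\r\n]*(?:\[[^\]]*\])?>)",  # approx.
    # ('?' [^>]+)* is written as the equivalent ('?' [^>]+)?
    r"(?P<pi><\?[^?]*(?:\?[^>]+)?\?>)",                      # processing instructions
]), re.DOTALL)


def tokenize(text: str) -> list:
    """Split `text` into (kind, lexeme) pairs; raise ValueError on bad markup."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognized markup at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `tokenize("<greeting>Hello, world!</greeting>")` yields a start-tag, one PCDATA run, and an end-tag, while a bare `&` in text such as `"a & b"` raises `ValueError`.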
`
The ampersand character (&) and the left and right angle brackets (< and >) may appear in their literal form only
when used as markup delimiters, or within comments, processing instructions, or marked sections. If they are
needed in the text, they must be represented using the strings "&amp;", "&lt;", and "&gt;". Parsed character data
is thus any string of characters which does not contain the start-delimiter of any markup. Character data is any
string of characters not including the marked-section-close delimiter, "]]>". For convenience, the single-quote
character (') may be represented as "&sq;", and the double-quote character (") as "&dq;".
`
`< 7 Character Data >
`
`PCDATA ::= [^<&]*
`
`MsData ::= [^]]* (((']' ([^]])) | (']]' [^>])) [^]]*)*
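As an illustration, text can be made safe to appear as parsed character data by replacing the three delimiters with the references "&amp;", "&lt;", and "&gt;", optionally also using this draft's "&sq;" and "&dq;" for quotes. The function name and the `quotes` flag are hypothetical. The ordering matters: the ampersand must be replaced first.

```python
def escape_pcdata(text: str, quotes: bool = False) -> str:
    """Escape `text` so it can appear as character data outside markup.

    '&' is replaced first so that the ampersands introduced by the later
    replacements are not escaped a second time.
    """
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quotes:
        # Optional convenience references defined in this draft.
        text = text.replace("'", "&sq;").replace('"', "&dq;")
    return text
```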
`
`2.4 Comments
`
`Comments may appear anywhere that character data may, except in a marked section (more properly, comments
`appearing in a marked section will not be recognized as such). They are not part of the document's character
`data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments.
`An XML processor must inform the application of the length of comments if they are not passed through, to
`enable the application to keep track of the correct location of objects in the XML document. For compatibility,
`the string -- (double-hyphen) may not occur within comments.
`
`< 8 Comments >
`
`Comment ::= '<!--' [^-]* ('-' [^-]+)* '-->'
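The Comment production encodes the double-hyphen rule structurally: every hyphen inside the body must be followed by a non-hyphen, so "--" can never appear there, and the body cannot end with a hyphen. A Python sketch (function names hypothetical):

```python
import re

# Comment ::= '<!--' [^-]* ('-' [^-]+)* '-->'
# Each hyphen in the body is followed by at least one non-hyphen, so the
# body can never contain "--" and can never end with "-".
COMMENT_RE = re.compile(r"<!--[^-]*(?:-[^-]+)*-->")


def is_comment(s: str) -> bool:
    return COMMENT_RE.fullmatch(s) is not None


def comment_text(s: str) -> str:
    """Return the text between '<!--' and '-->'; raise ValueError if invalid."""
    if not is_comment(s):
        raise ValueError("not a Comment")
    return s[4:-3]
```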
`
`2.5 Processing Instructions
`
`Processing instructions, usually referred to as PIs, allow the XML processor to pass instructions directly to
`selected applications.
`
`< 9 Processing Instructions >
`
`PI ::= '<?' Name S [^?]* ('?' [^>]+)* '?>'
`
PIs are not part of the document's character data, but must be passed through to the application. The Name which
follows the '<?' at the beginning of the PI is called the PI target. It is normally the name of a declared notation,
identifying the application to which it belongs. The use of the PI target "XML" in any way other than those
described in this specification is a reportable error.
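A sketch of splitting a PI into its target and data, in Python, with hypothetical names. Two assumptions are labeled: Name is ASCII-only, and the reserved target "XML" is compared case-insensitively (by folding to lowercase, per the definition of "match" in section 1.4). The repeated group ('?' [^>]+)* is written as the equivalent optional group, because any run of such pieces is itself one '?' followed by one or more non-'>' characters.

```python
import re

# PI ::= '<?' Name S [^?]* ('?' [^>]+)* '?>'
# ('?' [^>]+)* is written as the equivalent ('?' [^>]+)?.
# Name is ASCII-only here, a simplifying assumption.
PI_RE = re.compile(
    r"<\?(?P<target>[A-Za-z\-][A-Za-z0-9.\-]*)[ \t\r\n]+"
    r"(?P<data>[^?]*(?:\?[^>]+)?)\?>"
)


def pi_parts(s: str):
    """Return (target, data) for a processing instruction, or None."""
    m = PI_RE.fullmatch(s)
    if m is None:
        return None
    return m.group("target"), m.group("data")


def is_reserved_target(target: str) -> bool:
    """True for the reserved target "XML".

    Compared case-insensitively by folding to lowercase, following the
    definition of "match" in section 1.4 (an assumption about how the rule
    applies to PI targets).
    """
    return target.lower() == "xml"
```

Note that the production requires white space after the target, so `<?app?>` is not a PI under this grammar.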
`
`2.6 Marked Sections
`
Marked sections can occur anywhere character data may occur; they are used to escape blocks of text which may
contain characters which would otherwise be recognized as markup. Marked sections begin with the string
<![CDATA[ and end with the string ]]>:
`
`< 10 Marked Sections >
`
`MS ::= MsStart MsData MsEnd
`
`MsStart ::= '<![CDATA['
`
`MsEnd ::= ']]>'
`
Within a marked section, only the MsEnd string is recognized, so that angle brackets and ampersands may occur
in their literal form and need not be escaped using "&lt;", "&amp;", etc. Marked sections cannot nest.
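Following the prose rule that only MsEnd is recognized inside a marked section, and that sections cannot nest, wrapping text in a marked section is safe exactly when the text does not itself contain "]]>". A Python sketch with hypothetical names:

```python
MS_START, MS_END = "<![CDATA[", "]]>"


def wrap_cdata(text: str) -> str:
    """Wrap `text` in a marked section so '<' and '&' need no escaping.

    Only MsEnd is recognized inside a marked section and sections cannot nest,
    so text that itself contains ']]>' cannot be carried in one section.
    """
    if MS_END in text:
        raise ValueError("text contains the MsEnd string ']]>'")
    return MS_START + text + MS_END


def unwrap_cdata(section: str) -> str:
    """Return the data of a single marked section."""
    if not (section.startswith(MS_START) and section.endswith(MS_END)):
        raise ValueError("not a marked section")
    body = section[len(MS_START):-len(MS_END)]
    if MS_END in body:
        raise ValueError("the marked section ends before the final ']]>'")
    return body
```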
`
`2.7 White Space Handling
`
`While authoring XML documents, it is often convenient to use "white space" (spaces, tabs, blank lines, denoted
`by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is
typically not intended for inclusion in the delivered version of the document. On the other hand, "significant"
white space, which is to be retained in the delivered version, is also common, for example in poetry or source code.
`
`An XML processor must provide two distinct white space handling modes, COLLAPSE and KEEP, and have the
`ability to apply these modes on a per-element basis. They operate as follows:
`
`COLLAPSE
`The XML processor must suppress (i.e. not pass to the application) all white space in an element which
`immediately follows the start-tag and all that which immediately precedes the end-tag. In the element's
`character data, it must convert all sequences of white space characters to a single space (#x0020)
`character, before passing the data to the application.
`
`KEEP
`
The XML processor must suppress the line-break characters which immediately follow the start-tag of the
element and those which immediately precede its end-tag. All other characters in the character data of the
element must be passed to the application without change.
`
`The white space handling mode is signaled through the use of a reserved attribute, whose declaration is as
`follows:
`
`<!ATTLIST * -XML-SPACE (KEEP|COLLAPSE) #IMPLIED>
`
`where the "*" signifies that this attribute may apply to any element.
`
`The value of the attribute sets the white space handling mode for the element and for any contained elements.
`Unless otherwise specified, an XML processor is to set the white space handling mode for the root element of a
`document to COLLAPSE.
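The two modes and their inheritance can be sketched in Python as follows. This is a simplification under stated assumptions: the element's content is treated as a single run of character data with no child elements, a "line break" is taken to be CR LF, LF, or CR (the draft does not define it), and KEEP removes one such line break at each end. All names are hypothetical.

```python
import re

S_CHARS = " \n\r\t"                      # the characters of the S production
S_RUN = re.compile(r"[ \n\r\t]+")
LINE_BREAKS = ("\r\n", "\n", "\r")       # assumption: the draft does not define "line break"


def collapse(data: str) -> str:
    """COLLAPSE: drop white space next to the tags, squeeze other runs to #x20."""
    return S_RUN.sub(" ", data.strip(S_CHARS))


def keep(data: str) -> str:
    """KEEP: drop one line break after the start-tag and one before the end-tag."""
    for lb in LINE_BREAKS:
        if data.startswith(lb):
            data = data[len(lb):]
            break
    for lb in LINE_BREAKS:
        if data.endswith(lb):
            data = data[:-len(lb)]
            break
    return data


def effective_mode(space_values):
    """Resolve the mode for an element from its -XML-SPACE values.

    `space_values` lists the attribute value on each element from the root
    down to the current one, with None where the attribute is absent. The
    nearest specified value wins; otherwise the root default, COLLAPSE, applies.
    """
    for value in reversed(space_values):
        if value is not None:
            return value
    return "COLLAPSE"
```

For instance, `collapse("\n  Hello,\n   world!  \n")` gives `"Hello, world!"`, while `keep` preserves the inner line structure.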
`
`2.8 Prolog and Document Type Declaration
`
`The function of the markup in an XML document is to describe its storage and logical structures, and associate
`attribute-value pairs with the logical structure. XML provides a mechanism, the document type declaration, to
`define constraints on that logical structure, and to support the use of predefined storage units. An XML document
`is said to be valid if there is an associated document type declaration and if the document complies with the
`constraints expressed in it.
`
`The document type declaration must appear before the first element in the document.
`
`< 11 XML document >
`
document ::= Prolog element Misc*

Prolog   ::= EncodingDecl? Misc* RMDecl? Misc* (doctypedecl | DtdSummary)? Misc*

Misc     ::= Comment | PI | S
`
`For example, the following is a complete XML document, well-formed but not valid:
`
`<greeting>Hello, world!</greeting>
`
`The XML document type declaration may include a pointer to an external entity containing a subset of the
`necessary markup declarations, and may also directly include another, internal, subset of the necessary markup
`declarations.
`
`< 12 Document type declaration >
`
doctypedecl    ::= '<!DOCTYPE' S Name Extid? S? ('[' internalsubset* ']' S?)? '>'

internalsubset ::= elementdecl | AttlistDecl | EntityDecl
                 | NotationDecl | DtdSummary | S | Comment
`
`These two subsets, taken together, are properly referred to as the document type definition, abbreviated DTD.
`However, it is a common practice for the bulk of the markup declarations to appear in the external subset, and
`for this subset, usually contained in a file, to be referred to as "the DTD" for a class of documents. For example:
`
`<!DOCTYPE greeting SYSTEM "hello.dtd">
`<greeting>Hello, world!</greeting>
`
`The system identifier hello.dtd indicates the location of a full DTD for the document.
`
`The declarations can also be given locally, as in this slightly larger example:
`
`<?XML encoding="UTF-8">
`<!DOCTYPE greeting [
`<!ELEMENT greeting (#PCDATA)>
`]>
`<greeting>Hello, world!</greeting>
`
The character-set label <?XML encoding="UTF-8"> indicates that the document entity is encoded using the
UTF-8 transformation of ISO 10646. The legal values of the character set code are given in the discussion of
character encodings.
`
`Individual markup declaration types are described elsewhere in this specification.
`
`2.9 Required Markup Declaration
`
`In many cases, an XML processor can read an XML document and accomplish useful tasks without having first
`processed the entire DTD. However, certain declarations can substantially affect the actions of an XML
`processor. A document author can communicate whether or not DTD processing is necessary using a required
`markup declaration (abbreviated RMD) processing instruction:
`
`< 13 Required markup declaration >
`
`RMDecl ::= '<?XML' S 'RMD' Eq ('NONE' | 'INTERNAL' | 'ALL') S? '?>'
`
In an RMD, the value NONE indicates that an XML processor can parse the containing document correctly
without first reading any part of the DTD. The value INTERNAL indicates that the XML processor must read and
process the internal subset of the DTD to parse the containing document correctly. The value ALL indicates that
the XML processor must read and process the declarations in both subsets of the DTD to parse the containing
document correctly.
`
`The RMD must indicate that the entire DTD is required if the external subset contains any declarations of
`
`1. undistinguished empty elements, and these elements occur in the document, or
`2. attributes with default values, and elements to which these attributes apply appear in the document, or
`3. entities, and references to those entities appear in the document.
`
`If such declarations occur in the internal but not the external subset, the RMD should take the value INTERNAL.
`It is an error to specify INTERNAL if the external subset is required, or to specify NONE if the internal or
`external subset is required.
`
`If no RMD is provided, the effect is identical to an RMD with the value ALL.
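The RMD rules above reduce to a small decision table. The Python sketch below (hypothetical names) reads the RMD from a prolog, defaults to ALL when it is absent, maps each value to the DTD subsets that must be read, and computes the smallest value an author may declare. It follows the production in matching the bare keyword; also accepting a quoted value is an assumption of the sketch, and literals are matched case-insensitively per section 1.3.

```python
import re

# RMDecl ::= '<?XML' S 'RMD' Eq ('NONE' | 'INTERNAL' | 'ALL') S? '?>'
# Literals match case-insensitively (section 1.3). Accepting a quoted value
# as well as the bare keyword is an assumption of this sketch.
RMD_RE = re.compile(
    r"""<\?XML[ \t\r\n]+RMD[ \t\r\n]*=[ \t\r\n]*(["']?)(NONE|INTERNAL|ALL)\1[ \t\r\n]*\?>""",
    re.IGNORECASE,
)

SUBSETS = {"NONE": (), "INTERNAL": ("internal",), "ALL": ("internal", "external")}


def rmd_value(prolog: str) -> str:
    """Return the RMD value declared in `prolog`; a missing RMD means ALL."""
    m = RMD_RE.search(prolog)
    return m.group(2).upper() if m else "ALL"


def subsets_to_read(rmd: str) -> tuple:
    """DTD subsets the processor must read before parsing the document."""
    return SUBSETS[rmd]


def minimal_rmd(external_needed: bool, internal_needed: bool) -> str:
    """Smallest RMD value consistent with the rules for required declarations.

    A subset is "needed" if it declares undistinguished empty elements,
    attributes with defaults, or entities that the document actually uses.
    """
    if external_needed:
        return "ALL"
    if internal_needed:
        return "INTERNAL"
    return "NONE"
```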
`
`3. Logical Structures
`
`Each XML document contains one or more elements, the boundaries of which are either delineated by start-tags
`and end-tags, or, for empty elements, are limited to the start-tag. Each element has a type, identified by name
`(sometimes called its generic identifier or GI), and may have a set of attributes. Each attribute has a name and a
`value.
`
`This specification does not constrain the semantics, use, or (beyond syntax) names of the elements and attributes.
`
`3.1 Start- and End-Tags
`
`The beginning of every XML element is marked by a start-tag.
`
`< 14 Start-tag Recognition >
`
`STag ::= '<' Name (S Attribute)* S? '>'
`
`Attribute ::= Name Eq QuotedCData
`
`[ VC: Attribute Value Type ]
`
`The Name in the start- and end-tag rules gives the element's type. The Name-QuotedCData pairs are referred to
`as the attributes of the element, with the Name referred to as the attribute name and the content of the
`QuotedCData (the characters between the "'" or '"' delimiters) as the attribute value.
`
`Validity Constraint - Attribute Value Type:
`Attribute values must be of the type declared for the attribute. (For attribute types, see the discussion of attribute
`declarations.)
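A start-tag can be split into its element type and its attributes with two regular expressions built from the STag and Attribute productions. In this Python sketch (hypothetical names), Name and S are ASCII-only, attribute values are returned exactly as written without expanding references, and checking values against declared types (the Attribute Value Type constraint) is left out because it needs the DTD.

```python
import re

# STag      ::= '<' Name (S Attribute)* S? '>'
# Attribute ::= Name Eq QuotedCData
# Name and S are ASCII-only here, a simplifying assumption.
NAME = r"[A-Za-z\-][A-Za-z0-9.\-]*"
S = r"[ \t\r\n]+"
ATTRIBUTE = rf"""({NAME})(?:{S})?=(?:{S})?(?:"([^"<>]*)"|'([^'<>]*)')"""
STAG_RE = re.compile(rf"<({NAME})((?:{S}{ATTRIBUTE})*)(?:{S})?>")
ATTRIBUTE_RE = re.compile(ATTRIBUTE)


def parse_stag(tag: str):
    """Return (element type, {attribute name: attribute value}).

    Raises ValueError if `tag` is not a start-tag. Values are returned as
    written; references inside them are not expanded.
    """
    m = STAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a start-tag: {tag!r}")
    attributes = {}
    for a in ATTRIBUTE_RE.finditer(m.group(2)):
        value = a.group(2) if a.group(2) is not None else a.group(3)
        attributes[a.group(1)] = value
    return m.group(1), attributes
```

Unquoted values such as `<p a=1>` are rejected, since the grammar requires QuotedCData.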
`
`The end of every element which is not empty is marked by an end-tag:
`
`< 15 End-tag Recognition >
`
`ETag ::= '</' Name S? '>'
`
`The Name, once again, gives the element's type.
`
`If an element is empty, the start-tag constitutes the whole element. An empty element takes a special form:
`
`< 16 Tags for empty elements >
`
EmptyElement ::= '<' Name (S Attribute)* S? '/>'
`
`For compatibility, an empty element may have the same syntax as a start-tag; in this case, it cannot be
`recognized based on syntax, but must be declared as being empty. Such elements are called undistinguished
`empty elements.
`
`The text between the start-tag and end-tag is called the element's content:
`
`< 17 Content of elements >
`
`content ::= (element | PCDATA | MS | PI | Comment)*
`
element ::= EmptyElement           /* empty elements */
          | STag content ETag      [ WFC: GI Match ]
`
`Well-Formedness Constraint - GI Match:
`The Name in an element's end-tag must match that in the start-tag.
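The GI Match constraint is naturally checked with a stack of open element types. In this Python sketch the input is a list of `(kind, name)` pairs, with kind one of "start", "end", or "empty"; that format is hypothetical, and it assumes undistinguished empty elements have already been identified from their declarations. Because section 1.4 defines "match" as case-insensitive, names are compared after `str.lower()`, which only approximates the Unicode 2.0 case folding the specification cites.

```python
def check_nesting(tags):
    """Check the GI Match constraint over a sequence of tags.

    `tags` is a list of (kind, name) pairs with kind in {"start", "end",
    "empty"}. Raises ValueError on the first violation; returns True if every
    start-tag is closed by a matching end-tag.
    """
    open_elements = []                       # element types still awaiting an end-tag
    for kind, name in tags:
        if kind == "start":
            open_elements.append(name)
        elif kind == "end":
            if not open_elements:
                raise ValueError(f"end-tag </{name}> with no open element")
            expected = open_elements.pop()
            # "match" is case-insensitive (section 1.4); fold to lowercase.
            if name.lower() != expected.lower():
                raise ValueError(f"end-tag </{name}> does not match <{expected}>")
        # "empty" elements open and close in one tag, so the stack is unchanged.
    if open_elements:
        raise ValueError(f"unclosed element <{open_elements[-1]}>")
    return True
```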
`
`3.2 Well-Formed XML Documents
`
`A textual object is said to be a well-formed XML document if, first, it matches the production above labeled
`XML Document, and if:
`
`1. There are no undistinguished empty elements which have not been specified as such in an element
`declaration.
`2. For each entity reference which appears in the document, the entity name has been declared in the
`document type declaration.
`
`Matching the "XML Document" production implies that:
`
`1. It contains one or more elements.
2. There is one element, called the root, for which neither the start-tag nor the end-tag is in the content of
`any other element. For all other elements, if the start-tag is in the content of another element, the end-tag is
`in the content of the same element. More simply stated, the elements, delineated by start- and end-tags,
`nest within each other properly.
`
`As a consequence of this, for each non-root element C, there is one other element P such that C is in the content
`of P, but is not in the content of any other element that is in the content of P. Then P is referred to as the parent
`of C, and C as the child of P.
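Because properly nested elements form a tree, the parent of each element is simply the innermost element still open when it starts. The Python sketch below numbers elements in document order and records each one's parent, with None for the root. The `(kind, name)` input format, with kind "start", "end", or "empty", is a hypothetical convention, and the input is assumed to be already properly nested.

```python
def parent_map(tags):
    """Map each element (numbered 0, 1, 2, ... in document order) to its parent.

    `tags` is a list of (kind, name) pairs, kind in {"start", "end", "empty"}.
    The root maps to None. Input is assumed to be properly nested.
    """
    parents = {}
    stack = []                 # indices of the currently open elements
    next_index = 0
    for kind, _name in tags:
        if kind in ("start", "empty"):
            parents[next_index] = stack[-1] if stack else None
            if kind == "start":
                stack.append(next_index)
            next_index += 1
        elif kind == "end":
            stack.pop()
    return parents


def children(parents, index):
    """Indices of the children of element `index`, in document order."""
    return [child for child, parent in parents.items() if parent == index]
```

For `<doc><p><br/></p><p></p></doc>`, elements 1 and 3 (the two `p` elements) are children of the root, and element 2 (`br`) is a child of element 1.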
`
`3.3 Element Declaration
`
`The element structure of an XML document may be declared fully or partially. Such declarations serve two
`purposes:
`
`1. To establish a set of structural constraints, i.e. a grammar, for a class of documents, and to verify that
`documents are valid, i.e. comply with that grammar.
`2. To make XML documents well-formed by declaring undistinguished empty elements.
`
`An element declaration constrains the element's type and its content. The content constraints will be described
`first; four forms are available: empty, any, mixed content, and element content.
`
`Declarations often contain references to element types, for example when constraining which element types can
`appear as children of others, and which attributes may be attached to which element types. At user option, an
`XML processor may issue a warning when n