Chapter 4 XML Syntax
Outline:
1. XML syntax rules
2. Element syntax
3. Comment syntax
4. CDATA syntax
5. Namespaces syntax
6. Entity syntax
7. DTD syntax
Having worked through the previous three chapters, we now understand what XML is, how it works, and the related terminology. Next, we will learn the syntax rules of XML and write our own XML documents.
1. XML syntax rules
XML documents look similar to HTML source code and also use tags to mark up content. The following important rules must be followed when creating XML documents:
Rule 1: Begin with an XML declaration. We have already mentioned this in the previous chapter. The declaration is optional in XML 1.0 but strongly recommended, and when present it must be the very first thing in the document. Its format is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes|no"?>
The declaration tells the browser or other processing program that this document is an XML document. version indicates the version of the XML specification that the document complies with. encoding indicates the character encoding used in the document; if it is omitted, the processor assumes UTF-8 (or UTF-16 when a byte order mark is present). standalone indicates whether the document depends on external markup declarations such as an external DTD file; if it does, the value is no. When used, the attributes must appear in the order version, encoding, standalone.
Rule 2: Declare the DTD file, if there is one. If the document is a "valid XML document" (see the previous chapter), it must have a corresponding DTD file and strictly comply with the rules set by that DTD. The DTD declaration follows the XML declaration, in the following format:
<!DOCTYPE type-of-doc SYSTEM/PUBLIC "dtd-name">
where:
"!DOCTYPE" means you are defining a DOCTYPE;
"type-of-doc" is the name of the document type. You choose it, but it must match the name of the document's root element; it is usually also the same as the DTD file name;
Use only one of the two keywords "SYSTEM/PUBLIC". SYSTEM refers to a private DTD file, located by its URL. PUBLIC refers to a publicly shared DTD, which is identified by a public identifier followed by the URL of the DTD file.
"dtd-name" is the URL and name of the DTD file. DTD files conventionally have the suffix ".dtd".
Continuing with the example above, the declarations should be written like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE filelist SYSTEM "filelist.dtd">
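One rough way to check a header like this is to parse it and read the declared values back. The sketch below uses Python's standard xml.dom.minidom module, which is our choice for illustration and not part of the original tutorial. The empty <filelist/> root element is an assumption added so the document is complete, and the parser does not fetch or validate against filelist.dtd here.

```python
from xml.dom import minidom

# Header from the example above. The empty <filelist/> root element is an
# assumption, added only so that the document is complete.
DOCUMENT = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    b'<!DOCTYPE filelist SYSTEM "filelist.dtd">\n'
    b'<filelist/>'
)


def doctype_info(xml_bytes):
    """Return (document type name, DTD system identifier, root element name)."""
    doc = minidom.parseString(xml_bytes)
    return doc.doctype.name, doc.doctype.systemId, doc.documentElement.tagName


name, system_id, root = doctype_info(DOCUMENT)
print("document type:", name)
print("DTD location:", system_id)
print("root element:", root)
```

The document type name and the root element name come out the same, which is what a validating processor would require.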
Rule 3: Pay attention to capitalization. XML is case-sensitive: <P> and <p> are different tags. When writing an element, the start tag and the end tag must use the same case. For example, <Author>ajie</Author> is correct, while <Author>ajie</author> is wrong.
It is best to get into the habit of consistently using all upper case, all lower case, or an initial capital. This reduces errors caused by case mismatches.
Rule 4: Add quotes to attribute values. In HTML code, attribute values can be quoted or not. For example: <font color=red>word</font> and <font color="red">word</font> can both be interpreted correctly by the browser.
However, in XML, it is stipulated that all attribute values must be quoted (can be single quotes or double quotes), otherwise it will be regarded as an error.
Rule 5: All tags must have a corresponding closing tag. In HTML, tags need not appear in pairs, for example <br>. In XML, every tag must be paired: if there is a start tag, there must be an end tag, otherwise it is an error.
Rule 6: All empty tags must also be closed. An empty tag is one with no content between the tag pair, such as <br> and <img>. Since XML requires every tag to have an end tag, empty tags are closed by adding / at the end of the original tag. For example:
<br> should be written as <br />;
<META name="keywords" content="XML, SGML, HTML"> should be written as <META name="keywords" content="XML, SGML, HTML" />;
<IMG src= "cool.gif"> should be written as <IMG src= "cool.gif" />
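Rules 3 to 6 can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module (an illustrative choice, not something the tutorial prescribes). The helper name is_well_formed and the <p> wrapper element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the parser accepts the fragment as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Fragments taken from the rules above; <p> is a hypothetical wrapper element.
FRAGMENTS = [
    '<Author>ajie</Author>',           # Rule 3: start and end tag match in case
    '<Author>ajie</author>',           # Rule 3: case mismatch
    '<font color="red">word</font>',   # Rule 4: quoted attribute value
    '<font color=red>word</font>',     # Rule 4: unquoted attribute value
    '<p>line<br></p>',                 # Rules 5 and 6: <br> is never closed
    '<p>line<br /></p>',               # Rule 6: empty tag closed with /
]

for fragment in FRAGMENTS:
    verdict = "well-formed" if is_well_formed(fragment) else "error"
    print(f"{fragment:35} {verdict}")
```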
2. Syntax of elements
An element consists of a pair of tags and the content between them, like this: <author>ajie</author>. The name of the element is the same as the name of its tags. Tags can be further described using attributes.
In XML, there are almost no reserved words, so you can use nearly any word as an element name. However, the following rules must be observed:
1. The name can contain letters, digits, and other characters such as "_", "-", and ".";
2. The name cannot start with a digit, "-" (hyphen), or "." (period); it may start with a letter or "_" (underscore);
3. The name should not start with the letters xml (or XML or Xml ..), which are reserved;
4. The name cannot contain spaces.
5. The name should not contain ":" (colon), which is reserved for namespace prefixes.
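To see how a parser treats candidate names, the sketch below wraps each name in an empty tag and tries to parse it with Python's standard xml.etree.ElementTree. The helper parser_accepts_name and the sample names are assumptions for illustration, and the check is only meant for simple names like these. Note that this parser accepts a leading underscore, which the XML 1.0 specification allows, and that it rejects a colon here because the prefix before it is not a declared namespace.

```python
import xml.etree.ElementTree as ET


def parser_accepts_name(name):
    """Return True if <name/> parses as a complete one-element document."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


# Hypothetical sample names, one per naming rule.
SAMPLES = [
    "author",      # plain letters
    "author2",     # digits are fine after the first character
    "_author",     # a leading underscore is allowed by XML 1.0
    "2author",     # starts with a digit
    "-author",     # starts with a hyphen
    "first name",  # contains a space
    "my:author",   # colon with an undeclared namespace prefix
]

for name in SAMPLES:
    print(f"{name!r:15} accepted: {parser_accepts_name(name)}")
```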
To make elements easier to read, understand and manipulate, we have some more suggestions:
1. Do not use "." in the name, because in many programming languages "." is used to access a property of an object, for example: font.color. For a similar reason, it is best not to use "-", which can be mistaken for a minus sign. If a separator is needed, use "_" instead;
2. Keep the name as short as possible.
3. Use one consistent capitalization convention for all names.
4. The name can use non-English characters, such as Chinese, but some software may not support them. (IE5 currently supports Chinese element names.)
A brief note about attributes. In HTML, attributes can be used to define the display format of elements. For example: <font color="red">word</font> will display word in red. In XML, attributes only describe the tag and have nothing to do with how the element content is displayed. For example, the same markup <font color="red">word</font> will not display the word in red. (So some readers will ask: how do I display text in red in XML? This requires CSS or XSL, which we will describe in detail later.)
3. Syntax of comments
Comments are added to the XML document to facilitate reading and understanding, and will not be interpreted by the program or displayed by the browser.
The syntax for comments is as follows:
<!-- Here is the comment information-->
As you can see, it is the same as the comment syntax in HTML, which is very easy. Developing good commenting habits will make your documents easier to maintain, share, and look more professional.
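As a small illustration that comments are not treated as content, the sketch below parses a document containing a comment with Python's standard xml.etree.ElementTree, whose default parser leaves comments out of the resulting tree. The <note> element and its text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a comment sits in the middle of the element content.
SOURCE = "<note>Hello<!-- reviewer: check the spelling --> world</note>"

note = ET.fromstring(SOURCE)

# The default parser drops the comment, so only the surrounding text remains.
print(note.text)
print(len(note))  # number of child nodes kept in the tree
```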
4. Syntax of CDATA
CDATA stands for character data. When writing XML documents, we sometimes need to show characters such as "<" literally, but in XML these characters already have special meanings. What should we do? This is what the CDATA syntax is for. Its format is as follows:
<![CDATA[Place the characters to be displayed here]]>
For example:
<![CDATA[<AUTHOR sex="female">ajie</AUTHOR>]]>
The content displayed on the page will be "<AUTHOR sex="female">ajie</AUTHOR>"
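The effect can be checked with a parser. In this minimal sketch, using Python's standard xml.etree.ElementTree, the markup inside the CDATA section comes back as plain text and no <AUTHOR> element is created. The wrapping <example> element is an assumption, needed only because a CDATA section must sit inside an element.

```python
import xml.etree.ElementTree as ET

# The CDATA example from the text, wrapped in a hypothetical <example> element.
SOURCE = '<example><![CDATA[<AUTHOR sex="female">ajie</AUTHOR>]]></example>'

element = ET.fromstring(SOURCE)

# The parser hands back the CDATA content as text, not as a child element.
print(element.text)
print("child elements:", len(element))
```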
5. Syntax of Namespaces
What is a namespace for? When an XML document uses other people's DTD files, or several DTD files at once, a conflict can arise: because tags in XML are created by their authors, tags in different DTD files may have the same name but different meanings. This may cause data confusion.
For example, in one document <table>wood table</table>, <table> represents a piece of furniture.
In another document <table>namelist</table>, <table> represents a table of data. If I need to work with both documents at the same time, a name conflict will occur.
To solve this problem, XML introduces the concept of namespaces. Namespaces distinguish tags with the same name by associating a URL with the tag name.
A namespace must be declared before it is used, typically on the root element at the beginning of the XML document. The syntax of the declaration is as follows:
<document xmlns:yourname='URL'>
Where yourname is the namespace prefix that you choose, and URL is the URL that identifies the namespace.
Assuming that the furniture <table> document above comes from http://www.zhuozi.com, we can declare it as
<document xmlns:zhuozi='http://www.zhuozi.com'>
Then use the defined namespace in subsequent tags:
<zhuozi:table>wood table</zhuozi:table>
This distinguishes the two <table>s. Note: the URL does not mean that the tag is actually read from that address; it only serves as a distinguishing identifier.
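To see how a processor tells the two tables apart, the sketch below parses a small document with Python's standard xml.etree.ElementTree, which reports each tag name together with its namespace URL in the form {URL}name. Placing both <table> elements inside one <document> root is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Both kinds of table in one hypothetical document; only the first is prefixed.
SOURCE = (
    "<document xmlns:zhuozi='http://www.zhuozi.com'>"
    "<zhuozi:table>wood table</zhuozi:table>"
    "<table>namelist</table>"
    "</document>"
)

root = ET.fromstring(SOURCE)
tags = [child.tag for child in root]

# The prefixed tag carries its namespace URL; the plain one does not.
for child in root:
    print(child.tag, "->", child.text)

# Look up the furniture table by its full namespaced name.
furniture = root.find("{http://www.zhuozi.com}table")
print(furniture.text)
```

Nothing is downloaded from http://www.zhuozi.com; the URL is used purely as a label.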
6. Entity syntax
An entity works much like a "macro" in Word, and can also be understood as a template in DW. You can define an entity once and then reference it many times in one document, or reference the same entity from multiple documents.
An entity can contain characters, text, and so on. The benefits of using entities are: 1. Fewer errors: a part that repeats in the document only needs to be entered once. 2. Easier maintenance: for example, if you have 40 documents that all reference a copyright entity and the copyright needs to change, you do not have to modify every file. You only need to change the original entity definition.
XML defines two types of entities. One is the general entity discussed here, used in XML documents; the other is the parameter entity, used in DTD files.
The definition syntax of entity is:
<!DOCTYPE filename [
<!ENTITY entity-name "entity-content">
]
>
For example, I want to define a piece of copyright information:
<!DOCTYPE copyright [
<!ENTITY copyright "Copyright 2001, Ajie. All rights reserved">
]
>
If the copyright information is kept in a separate XML file that is shared with others, I can reference it externally instead. The syntax is like this:
<!DOCTYPE copyright [
<!ENTITY copyright SYSTEM "http://www.sample.com/copyright.xml">
]
>
The syntax for referencing a defined entity in the document is: &entity-name;
For example, the copyright information defined above is referenced as &copyright;.
The complete example is as follows. You can copy it and save it as copyright.xml to view it:
<?xml version="1.0" encoding="GB2312"?>
<!DOCTYPE myfile [
<!ENTITY copyright "Copyright 2001, Ajie. All rights reserved">
]>
<myfile>
<title>XML</title>
<author>ajie</author>
<email>[email protected]</email>
<date>20010115</date>
&copyright;
</myfile>
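If no browser is at hand, the entity expansion can also be observed with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree, which expands general entities declared inside the document itself. The document is a shortened version of the example, with the document type named after the root element; the XML declaration and some elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example: one internal entity and one reference.
SOURCE = (
    '<!DOCTYPE myfile [\n'
    '<!ENTITY copyright "Copyright 2001, Ajie. All rights reserved">\n'
    ']>\n'
    '<myfile>\n'
    '<title>XML</title>\n'
    '<author>ajie</author>\n'
    '&copyright;\n'
    '</myfile>'
)

root = ET.fromstring(SOURCE)

# Collect all text in the document; the reference is replaced by its content.
text = "".join(root.itertext())
print(text.strip())
```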
7. DTD syntax
A DTD is required for a "valid XML document". We use DTD files to define the rules for the elements and tags in a document and the relationships between them. How do we create a DTD file? Let's learn together:
1. Set elements
Elements are the basic building blocks of XML documents. An element must be declared in the DTD before it is used in the XML document. The syntax of an element declaration is: <!ELEMENT DESCRIPTION (#PCDATA | DEFINITION)*>
Explanation:
"<!ELEMENT" opens an element declaration, indicating that what you are defining is an element;
"DESCRIPTION" after it is the name of the element;
"(#PCDATA | DEFINITION)*>" is the content rule of this element: here, any mix of text and DEFINITION elements. When text (#PCDATA) is mixed with elements, the choices must be separated by "|" rather than ",". Rules define what an element can contain and how its contents relate to each other. The following table outlines the rules for elements:
2. Element rule table: