XML is an open standard that defines a plain-text encoding system for composing documents. XML underlies many publishing systems, including XHTML and the file formats used by Microsoft Office. It is a very useful system that may end up having many other uses.
The XML protocol
The term “XML” stands for the Extensible Markup Language.
This is a very similar name to the Hypertext Markup Language, better known as HTML. HTML is the set of rules that dictates the layout of Web pages, and XML has a similar purpose. In fact, there is a merged system, called XHTML, that combines both standards.
The “markup” word in both XML and HTML is a term that developed in the publishing industry. It refers to handwritten notes in the margins of documents that gave instructions to typesetters.
Editors and proofreaders also write markup notes. Sometimes editors wrote markup symbols around text, and the XML syntax mimics those bracketing and highlighting notes.
The markup languages discussed here trace back to the Standard Generalized Markup Language (SGML). This is an ISO standard that lays down conventions common to the markup languages derived from it, including XML.
XML, HTML, and XHTML are all standards that are maintained by the World Wide Web Consortium.
The XML format
XML isn’t a programming language.
It is a standard that enables the specification of data structures.
The text files containing XML code get interpreted by an XML parser. As it carries out the instructions contained in an XML file, the parser strips out the XML codes. So, the people who read those documents never see the XML code, just the finished text.
As it is a plain text system, XML can be written in any standard text editor, such as Notepad.
XML provides the power to create your own tags. It has a standard for syntax but no semantics. The extensibility of XML makes it a very flexible system.
However, the many layers of language declaration make learning XML very complicated. This guide is intended to be an introduction to XML, not a full tutorial.
The XML standard has its own special syntax and conventions. XML, like many markup languages, indicates formatting instructions with tags. XML tags are very similar to HTML tags. This means that XML documents look very similar to HTML documents.
An XML tag is a keyword that is surrounded by angle brackets (<>). The keyword inside the tag is the element name, and an element is a start tag, its matching end tag, and everything between them.
The format of an XML document works through nested tags. There needs to be a single root element, and everything else in the document is contained between the start and end tags of that element.
Most instructions are implemented in XML format by bracketing text with a start tag and an end tag. By XML convention, the end tag is the same as the start tag but with a forward slash in front of its name.
Some XML tags don’t work in pairs. This is because they don’t indicate actions to perform on blocks of text but denote single features. These solitary tags are called empty-element tags, and they are written with a forward slash before the closing angle bracket.
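The tag conventions described above can be sketched with the XML parser in Python's standard library. The document and its element names (note, to, body, urgent) are made up for illustration; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <note> is the root element, <to> and <body> use
# paired start and end tags, and <urgent/> is an empty-element tag.
DOCUMENT = """
<note>
  <to>Alice</to>
  <body>Meeting at noon</body>
  <urgent/>
</note>
"""

root = ET.fromstring(DOCUMENT)

# Everything in the document is nested inside the root element.
child_names = [child.tag for child in root]

print(root.tag)                   # name of the root element
print(child_names)                # names of the nested elements, in order
print(root.find("to").text)       # text bracketed by <to> and </to>
print(root.find("urgent").text)   # an empty-element tag brackets no text
```

The parser reports only element names and text, which matches the idea that readers of the finished document never see the tags themselves.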
Extensible markup language
The distinctive feature of XML is the “extensible” part of its name. It is possible to make up your own tags with this standard and even create an entire new markup language within its framework. This has led to a lot of variations in the standard – in fact, a lot of new markup languages that have their own standards.
Some examples of new markup languages that are based on XML are:
- XHTML – Extensible Hypertext Markup Language
- RSS – RDF Site Summary or Really Simple Syndication
- SVG – Scalable Vector Graphics
Difference between XML and HTML
HTML was specifically designed for creating Web pages. However, there is no reason why people can’t pass HTML files between them for conveying information.
XML is widely used as the basis for a range of document formatting standards. However, it can also be used within HTML to create sections of Web pages. So, what is the difference between them and why would anyone need to learn both?
XML has no predefined tags; HTML is a library of tags. It is possible to define elements in XML to make it exactly like HTML. However, although very similar in their formats, XML and HTML have different purposes.
The key definition of the difference between HTML and XML is that HTML is designed to display data whereas XML is designed to transfer data.
This is a difficult distinction to conceptualize. XML creates data structures that can be reused. The reuse part of that ability is the key difference. Essentially, a data structure is a type of format or layout and you can make that in HTML. However, if you create a particular structure in HTML, its purpose is to order data in a display on the screen. With XML, that layout is stored as a new data type.
The idea that XML transfers data but HTML does not is undermined by the fact that content is hard-coded into a Web page by embedding it in HTML tags. Thus, the HTML codes transport data with them when they are sent from the server to the browser. Similarly, not all data held in HTML is going to be displayed.
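To make the "transfer" idea more concrete, here is a minimal sketch of a round trip: a record is written out as XML text, and a receiver rebuilds the same record from that text. The record layout and the function names to_xml and from_xml are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Serialize a flat dict as <record><key>value</key>...</record>."""
    root = ET.Element("record")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the dict from the XML text produced by to_xml."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


sent = {"title": "XML basics", "pages": "120"}
wire = to_xml(sent)         # plain text that can be stored or sent anywhere
received = from_xml(wire)   # the receiver recovers the same structure

print(wire)
print(received == sent)
```

Nothing in the XML text says how the record should look on a screen; it only carries the structure and the values.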
When thinking about HTML and XML, stick with the idea that HTML is only really used for Web pages but XML can be used for many different types of documents, such as Microsoft Word and OpenOffice documents as well as Web pages.
XHTML is a blend of XML and HTML. It is officially an application of XML, so it should be regarded as an XML adaptation that integrates HTML characteristics. As you will read in the next section, XML is very easy to customize, and integrating elements that are conventional to other languages is relatively simple.
XML doesn’t have its own library of elements, so it is possible to simply import HTML elements into XML. Essentially, that’s what XHTML is.
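Because XHTML follows XML rules, a general-purpose XML parser can read it. The sketch below feeds a small, made-up XHTML page to Python's standard XML parser; the page content is an assumption, while the namespace address is the one the W3C assigns to XHTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny XHTML page: familiar HTML element names, written to XML rules
# (every tag closed, including the empty-element tag <br/>).
PAGE = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hello</title></head>
  <body>
    <p>First line<br/>second line</p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)
ns = {"h": XHTML_NS}   # the prefix "h" is local to this script

title = root.find("h:head/h:title", ns).text
line_breaks = root.findall(".//h:br", ns)

print(root.tag)          # the parser reports the namespace with the name
print(title)
print(len(line_breaks))
```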
The XML schema is the methodology that enables the XML system to be variable, or “extensible.” There is no point in making up new elements for an XML document if none of the Web browsers or XML parsers in the world are able to know what to do with the new codes you created with your own tags.
As the XML schema describes the standard that is going to be used for the XML document, it has to be precisely defined. The precision of the XML schema definition allows for flexibility in the XML version that is being used. That is, the ability to adapt XML is only possible through the rigidity in the definition of the XML schema. Therefore, there is a language for XML schemas; it is called XML Schema Definition (XSD).
The XML schema is held in a separate document to the XML files that use the extended language defined in it. This allows the same standard to be applied to many XML files. The XML schema file is written in XSD. The XSD allows the creation of new elements, each of which is defined by specifying its structure and data types. An element can be formed by the combination of many components.
Like XML, XSD uses a set of tags. The naming of elements is very flexible. However, the convention is to start each XSD tag name with “xs:” after the opening angle bracket. For example: <xs:element>.
Some people prefer a three-letter identifier, such as “xsd.” In fact, the identifier to use is up to the programmer. It is set up by a namespace declaration, which is one of the first lines in the XML schema file. The namespace concept will be explained further down in this guide.
The first two lines of an XML schema document declare the version of XML to be used for the code and a containing tag for the entire document that explains that this is an XML Schema and the namespace to be used. The declaration line doesn’t need to be paired with a close tag but the schema tag does. So, everything in the XML Schema declaration is enclosed within the schema tag.
The World Wide Web Consortium (W3C) provides the definition of the namespace for XSD, which is identified by the address http://www.w3.org/2001/XMLSchema.
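As a rough illustration of that layout, the sketch below holds a small schema document in a string and reads it with Python's standard XML parser. The book, title, and pages elements are hypothetical. The script only inspects the schema; the standard library does not validate documents against an XSD.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Line 1 is the XML declaration, which has no closing tag. Line 2 opens the
# schema tag, declares the "xs" prefix, and encloses everything else.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema_root = ET.fromstring(SCHEMA)

# Collect every declared element with its data type. "book" has no type
# attribute because it is built from the components nested inside it.
declared = {
    element.get("name"): element.get("type")
    for element in schema_root.iter(f"{{{XS_NS}}}element")
}

print(schema_root.tag)
print(declared)
```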
Document type definition
The XML schema system was solidified in 2001 when it became a recognized standard maintained by the World Wide Web Consortium. Before XML schemas existed, XML used a system called Document Type Definition (DTD).
DTD is a system that is part of the SGML conventions. It creates a set of rules that adapt standard XML. This is the way programmers defined variations to the XML language for a group of XML documents, which is the same job that the XML Schema does, as explained above.
The conventions of DTD allow a definition to be referenced in two ways. A formal public identifier names a published, external standard, and a system identifier gives the location of a specific DTD file.
Like the XML schema, the DTD allows the definition of the language to be stored in a separate file. In fact, a declaration can carry both kinds of identifier: a public identifier is always followed by a system identifier that tells the parser where to find the file.
A public identifier is indicated by the PUBLIC keyword in the !DOCTYPE declaration at the top of the XML file. A SYSTEM keyword in the declaration signifies that only a system identifier follows. The DTD file can be local, in which case only its name needs to be included in the declaration. It can also be hosted on a different server and accessed in the declaration through a URL.
The format of a DTD file does not include a namespace. Instead, the syntax of a DTD element declaration starts every line with !ELEMENT after the opening angle bracket of a tag. The DTD system requires fewer declarations in order to create a new element.
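The !ELEMENT syntax can be sketched as follows. To keep the example self-contained, the declarations are placed inside the document's own !DOCTYPE block rather than in a separate file, which is a choice made for this illustration. The note, to, and body names are hypothetical. Python's standard parser reads the declarations but does not check the document against them.

```python
import xml.etree.ElementTree as ET

# Each rule starts with !ELEMENT after the opening angle bracket:
#   note must contain a "to" followed by a "body";
#   "to" and "body" hold plain text (#PCDATA).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Alice</to>
  <body>Meeting at noon</body>
</note>
"""

# The parser accepts the DTD syntax and returns the document's elements.
root = ET.fromstring(DOCUMENT)
children = [child.tag for child in root]

print(root.tag)
print(children)
```

Three short lines are enough to define the whole structure here, which reflects the point that DTD needs fewer declarations per element than XSD.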
XML Schema or DTD?
The XML schema definition was created in order to provide more functionality than that available in the DTD system. So, it is better to use the XML schema system because it was specifically defined to provide an improvement. However, DTD was not deprecated in the XML definition and so both methods are valid and in circulation side by side.
Whether the XML file is going to use DTD or the XML Schema for its language definition is made clear at the beginning of the XML document.
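The top of an XML file that is going to use the DTD system with a local file for the DTD contains a !DOCTYPE declaration that names the root element and, after the SYSTEM keyword, the .dtd file.
The top of an XML file that will be using an XML Schema contains no !DOCTYPE declaration. Instead, the root element carries xmlns attributes and a reference to the .xsd file.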
So, the words in a file to look out for if you want to work out whether it is using DTD or XML Schema are !DOCTYPE for DTD or xmlns for XML schema. You can also look for the presence of the .dtd or .xsd file extensions.
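That check can be sketched in a few lines. The function name definition_styles and the note.dtd and note.xsd file names are hypothetical. One assumption to flag: instead of searching for the bare word xmlns, this sketch looks for the schema location attributes from the W3C XMLSchema-instance namespace, since those are what actually name an .xsd file.

```python
from xml.dom import minidom

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def definition_styles(xml_text):
    """Return which language-definition mechanisms a document refers to."""
    document = minidom.parseString(xml_text)
    styles = []
    # A !DOCTYPE declaration shows up as the document's doctype node.
    if document.doctype is not None:
        styles.append("DTD")
    # A schema reference shows up as an attribute on the root element.
    root = document.documentElement
    if (root.hasAttributeNS(XSI_NS, "schemaLocation")
            or root.hasAttributeNS(XSI_NS, "noNamespaceSchemaLocation")):
        styles.append("XML Schema")
    return styles


WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Alice</to></note>
"""

WITH_XSD = """<?xml version="1.0"?>
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="note.xsd">
  <to>Alice</to>
</note>
"""

print(definition_styles(WITH_DTD))
print(definition_styles(WITH_XSD))
```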
Earlier in this guide, the “xs” alias was used to indicate the source of an element name. In truth, the use of “xs” for that identifier isn’t set in stone. The programmer can decide to use any combination of letters. That alias is set up by the xmlns attribute. This indicates an XML namespace.
The XSD file is a library of elements. It includes all of the structures and tags that can be used within an XML document. However, it is possible to refer to several XML schemas in an XML file. This functionality provides a great deal of flexibility for organizations that develop XML systems because they can call in pre-written XML schemas and improve efficiency through the reuse of element definitions.
The concepts of XML namespace and XML schema are very closely linked. XML namespaces enable element definitions in XML schemas to be defined independently without worrying about name uniqueness.
For example, what if one XML schema has created a structure called <book> and there is also an element called <book> in another schema that the development team wants to use? Even if the team only intends to use one of these definitions, the XML parser will scan through both XML schemas specified in the XML document in order to discover what that element name means. The prefix that is created by the namespace declaration solves that problem.
In this example, the two XML schemas involved will be invoked through two xmlns statements. So, if one is declared with xmlns:bk and the other is declared with xmlns:tv, the use of <bk:book> makes it clear which XSD file the XML parser should look in for the meaning of “book.”
The prefix for a namespace can be set up at the top of the XML document or it can be embedded within the opening tag of an element. The format for the declaration in the attribute is xmlns:prefix="URI".
As with all XML attributes, the value has to be in quotes.
Strangely, the URL doesn’t have to point to anything, because the XML parser doesn’t access the file at the given address. The address serves only as a unique name for the namespace, but it has to be there. You could just use http://www.w3.org/TR/html4/ or the address of a page on your own website.
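The bk and tv prefixes from the example can be sketched like this. The two example.com addresses and the catalog root element are placeholders invented for the illustration; as noted, the parser never fetches them.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "book" but belong to different
# namespaces, each bound to its own prefix in the root element.
DOCUMENT = """
<catalog xmlns:bk="http://example.com/books"
         xmlns:tv="http://example.com/tv">
  <bk:book>Printed novel</bk:book>
  <tv:book>Booking for a studio slot</tv:book>
</catalog>
"""

root = ET.fromstring(DOCUMENT)

# The parser replaces each prefix with the full namespace name, so the
# two "book" elements can never be confused.
tags = [child.tag for child in root]

# Searching also needs the prefix-to-namespace mapping.
prefixes = {"bk": "http://example.com/books", "tv": "http://example.com/tv"}
printed = root.find("bk:book", prefixes).text
booking = root.find("tv:book", prefixes).text

print(tags)
print(printed)
print(booking)
```

The prefix itself is only shorthand. What distinguishes the two elements is the namespace name behind it.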
Anyone familiar with HTML will understand the concept of style sheets, specifically CSS. Style sheets are an SGML system that is implemented in many different markup languages. In XML, the style sheet convention is called the Extensible Stylesheet Language, or XSL. As the name suggests, this isn’t just a format, but an entire language. So, this gives you more syntax to learn and semantic concepts to implement.
Given that the key characteristic of XML is that it is concerned with the transfer of data and not the display of data, you might think that it is unusual for it to include a style sheet system. After all, a style sheet is concerned with setting the appearance of elements when they appear in a visible page. Well, an XSL file can add presentational instructions into an XML file.
The XSL system works like a “search and replace” function. The XSL system includes patterns to look for and then attributes to add in when those patterns are encountered.
The most important function of XSL is to implement a quick language translation.
XSLT stands for XSL Transformations. This is the most widely used implementation of XSL. As has already been noted, XML is useful for the transfer of data. XSLT makes it possible to convert XML files so that they are compatible with the conventions of other protocols. A very common usage of this is to make XML pages accessible in HTML.
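Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It imitates by hand the kind of XML-to-HTML conversion that an XSLT stylesheet would describe, with a made-up library document and a hypothetical function name, purely to show what "converting XML for another format" means.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<library>
  <book><title>XML basics</title></book>
  <book><title>Schemas in practice</title></book>
</library>
"""


def library_to_html(xml_text):
    """Rewrite <library>/<book>/<title> as an HTML <ul> of <li> items."""
    library = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    # "Search": find every book title. "Replace": emit it as a list item.
    for title in library.findall("book/title"):
        ET.SubElement(ul, "li").text = title.text
    return ET.tostring(ul, encoding="unicode")


html = library_to_html(SOURCE)
print(html)
```

A real XSLT stylesheet would express the same search-and-replace rules declaratively, in an XML file of its own, instead of in program code.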
XPath expressions guide the search-and-replace action of an XSL file so that it skips to specific instances of words and it also has value test capabilities. That means it has some rudimentary programming capabilities because it optionally selects certain locations.
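The selection and value-test behavior can be tried out with the limited XPath subset that Python's standard ElementTree module supports. This is only a subset of XPath, and the library document and its category attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<library>
  <book category="web"><title>XML basics</title><pages>120</pages></book>
  <book category="print"><title>Typesetting</title><pages>300</pages></book>
  <book category="web"><title>Schemas in practice</title></book>
</library>
"""

library = ET.fromstring(SOURCE)

# Value test: only books whose category attribute equals "web".
web_titles = [
    title.text
    for title in library.findall("./book[@category='web']/title")
]

# Existence test: only books that contain a <pages> child element.
with_pages = [
    book.find("title").text
    for book in library.findall("./book[pages]")
]

print(web_titles)
print(with_pages)
```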
Those who get frustrated by the lack of programming elements in XML and HTML will be very happy to encounter Ajax. It enables interaction with a server in the background of a Web page without disturbing the static display of the page. Although this technology was written to include XML as a data management vehicle, this can be replaced with JSON and so in most implementations you encounter, it won’t really have much to do with XML.
Hopefully, this introduction has given you some idea of the power and flexibility of XML. It is a difficult technology to comprehend when its distinction from the functions of HTML is sometimes difficult to see.
Getting to grips with XML is an easier task if it is carried out through experience. So, basically, you need to have a go at writing your own XML system in order to fully understand it. Fortunately, there are plenty of free XML tutorials available on the World Wide Web.
Once you have fully absorbed the use of XML, you can add in a deeper understanding of XML schemas and then move on to try out transformation systems with XSLT.
Finally, you will be ready to try Ajax to add some nice features to your Web pages that both captivate your site visitors and improve the efficiency of your site’s code.