Introduction to XML

Extensible Markup Language (XML) is a standard, simple, self-descriptive way of encoding both text and data so that content can be processed with relatively little human intervention and exchanged across diverse hardware, operating systems, and applications. It is a subset of Standard Generalized Markup Language (SGML), which was first developed in the 1970s as a means of exchanging text files between printers. The primary driving force behind the adoption of XML on the Internet was to simplify the delivery of information.

XML Specification

XML documents consist entirely of Unicode characters. According to the specification, XML documents must be well-formed; that is, they must satisfy a list of syntax rules provided in the specification. The list is fairly extensive; some key points for a well-formed document are:

  • It contains only properly encoded legal Unicode characters.
  • None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
  • The begin, end, and empty-element tags which delimit the elements are correctly nested, with none missing and none overlapping.
  • The element tags are case-sensitive; the beginning and end tags must match exactly.
  • There is a single "root" element which contains all the other elements.
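
The rules above can be tried out with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are hypothetical and chosen only to exercise one rule each.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule from the list above.
samples = {
    "valid": "<note><to>Ann</to><empty/></note>",
    "bare ampersand": "<note>Fish & chips</note>",
    "escaped ampersand": "<note>Fish &amp; chips</note>",
    "overlapping tags": "<a><b></a></b>",
    "case mismatch": "<Note></note>",
    "two roots": "<a/><b/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the "valid" and "escaped ampersand" samples should be accepted; the parser rejects the others before any application code sees the content.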

Example of XML

Here is a small but complete example of an XML document.

This example contains 5 elements: painting, img, caption, and two dates. The date elements are children of caption, which is a child of the root element painting. img has two attributes, src and alt.
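
A document with the structure just described can be sketched and checked as follows. The element and attribute names come from the description; the file name, alt text, caption wording, and years are placeholders, not the values of the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the described structure.
# All attribute values and text content are placeholders.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<painting>
  <img src="painting.jpg" alt="A placeholder description"/>
  <caption>Painted between <date>1900</date> and <date>1901</date>.</caption>
</painting>"""

# Parse from bytes so the encoding declaration is honoured by the parser.
root = ET.fromstring(DOCUMENT.encode("utf-8"))

# Walk the tree in document order and collect every element name.
element_names = [element.tag for element in root.iter()]

# Collect the attribute names carried by the img element.
img_attributes = sorted(root.find("img").attrib)

# The date elements are children of caption, which is a child of the root.
date_parents = [child.tag for child in root.find("caption")]

print(element_names)
print(img_attributes)
print(date_parents)
```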

Example of XML in Typesetting

Benefits of XML

Simplicity

Information coded in XML is easy to read and understand, plus it can be processed easily by computers.

Openness

XML is a W3C standard, endorsed by software industry market leaders.

Extensibility

There is no fixed set of tags. New tags can be created as they are needed.

Self-description

In traditional databases, data records require schemas set up by the database administrator. XML documents can be stored without such definitions, because they contain metadata in the form of tags and attributes.

XML provides a basis for author identification and versioning at the element level. Any XML tag can possess an unlimited number of attributes such as author or version.
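
As a rough illustration of element-level authorship and versioning, the sketch below reads author and version attributes without any schema being defined beforehand. The report and section element names, and the attribute values, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each section carries its own author and version.
REPORT = """<report>
  <section author="alice" version="3">Quarterly figures</section>
  <section author="bob" version="1">Outlook</section>
</report>"""

root = ET.fromstring(REPORT)

# Group section titles and versions by author, using only the metadata
# that the document itself carries.
by_author = {}
for section in root.iter("section"):
    entry = (section.text, int(section.get("version")))
    by_author.setdefault(section.get("author"), []).append(entry)

print(by_author)
```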

Contains machine-readable context information

Tags, attributes and element structure provide context information that can be used to interpret the meaning of content, opening up new possibilities for highly efficient search engines, intelligent data mining, agents, etc.

This is a major advantage over HTML or plain text, where context information is difficult or impossible to evaluate.
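
The difference can be sketched with a small search: a plain-text match cannot tell a person named Washington from a city named Washington, while a query on element structure can. The records, person, city, and name elements below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical records in which the same word appears in two contexts.
RECORDS = """<records>
  <person><name>Washington</name></person>
  <city><name>Washington</name></city>
  <city><name>Lisbon</name></city>
</records>"""

root = ET.fromstring(RECORDS)

# A plain-text search counts every occurrence, whatever it refers to.
text_hits = RECORDS.count("Washington")

# A structural search restricts the match to city elements only.
city_hits = [
    city for city in root.findall("city")
    if city.findtext("name") == "Washington"
]

print(text_hits, len(city_hits))
```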

Separates content from presentation

XML tags describe meaning, not presentation. The motto of HTML is: "I know how it looks", whereas the motto of XML is: "I know what it means, and you tell me how it should look." The look and feel of an XML document can be controlled by XSL style sheets, allowing the look of a document (or of a complete Web site) to be changed without touching the content of the document. Multiple views or presentations of the same content are easily rendered.
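
The idea of several presentations from one source can be sketched without XSL. The example below is an assumption-laden stand-in: plain Python functions play the role that style sheets would normally play, and the article, title, and para element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content document; it says nothing about appearance.
CONTENT = (
    "<article><title>XML &amp; You</title>"
    "<para>Tags describe meaning.</para></article>"
)


def as_plain_text(root):
    """Render the article as plain text with an upper-case title."""
    return f"{root.findtext('title').upper()}\n\n{root.findtext('para')}"


def as_html(root):
    """Render the same article as an HTML fragment."""
    title = escape(root.findtext("title"))
    para = escape(root.findtext("para"))
    return f"<h1>{title}</h1><p>{para}</p>"


root = ET.fromstring(CONTENT)
print(as_plain_text(root))
print(as_html(root))
```

Changing either rendering function leaves CONTENT untouched, which is the separation the paragraph above describes.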

Supports multilingual documents and Unicode

This is important for the internationalization of applications.
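
A minimal sketch of a multilingual document follows. It uses the standard xml:lang attribute to label each language; the greetings and greeting element names and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing three scripts in one file.
GREETINGS = """<greetings>
  <greeting xml:lang="en">Hello</greeting>
  <greeting xml:lang="ja">こんにちは</greeting>
  <greeting xml:lang="el">Γειά σου</greeting>
</greetings>"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(GREETINGS)
by_lang = {greeting.get(XML_LANG): greeting.text for greeting in root}

# Serialise to UTF-8 bytes and parse again to confirm nothing is lost.
encoded = ET.tostring(root, encoding="utf-8")
round_trip = ET.fromstring(encoded)

print(by_lang)
```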

Facilitates the comparison and aggregation of data

The tree structure of XML documents allows documents to be compared and aggregated efficiently element by element.
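
Element-by-element comparison and aggregation might look like the sketch below. The same_tree helper and the orders and order elements are hypothetical, and the comparison deliberately ignores surrounding whitespace and text that follows a closing tag.

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Compare two elements recursively: tag, attributes, text, children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "").strip() == (b.text or "").strip()
        and len(a) == len(b)
        and all(same_tree(x, y) for x, y in zip(a, b))
    )


# Hypothetical order lists from two sources.
first = ET.fromstring("<orders><order id='1'>pen</order></orders>")
second = ET.fromstring("<orders><order id='2'>ink</order></orders>")
reformatted = ET.fromstring("<orders>\n  <order id='1'>pen</order>\n</orders>")

# Aggregate: copy the child elements of both sources under one new root.
merged = ET.Element("orders")
merged.extend(list(first) + list(second))
merged_ids = [order.get("id") for order in merged]

print(same_tree(first, reformatted), same_tree(first, second), merged_ids)
```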

Can embed multiple data types

XML documents can contain any possible data type - from multimedia data (image, sound, video) to active components (Java applets, ActiveX).

Can embed existing data

Mapping existing data structures like file systems or relational databases to XML is simple. XML supports multiple data formats and can cover all existing data structures.

Provides a 'one-server view' for distributed data

XML documents can consist of nested elements that are distributed over multiple remote servers. XML is currently the most sophisticated format for distributed data - the World Wide Web can be seen as one huge XML database.

Rapid adoption by industry

Software AG, IBM, Sun, Microsoft, Netscape, DataChannel, SAP and many others have already announced support for XML. Microsoft will use XML as the exchange format for its Office product line, while both Microsoft's and Netscape's Web browsers support XML. SAP has announced support of XML through the SAP Business Connector with R/3. Software AG supports XML in its Natural product line and provides Tamino, a native XML database.
