Can XML Change the Way We Work?

April 30, 2002

Organisations have continuously sought to streamline their business processes through the identification, evaluation and acquisition of suitable software and Internet products and skills. Historically, the route taken has been dictated by the outcome of previous decisions: organisations have been governed by their chosen hardware and software environments, or have had to justify significant re-investment in a change of direction. This could, however, be about to change radically.

In recent times dot.com mania has forced many organisations to take a long hard look at the way that they do business. The advent of the Internet, and in particular the World Wide Web, whilst fostering more prolific as well as enhanced and different means of communication, has produced a more favourable climate for businesses to collaborate more effectively. The emerging technologies lend themselves to streamlining both internal and external business processes and organisations are able to build on their existing investment and adopt a more evolutionary approach.

However, with the predicted downturn in the economy and the fallout from the events of 11 September, organisations are becoming more cautious and less cavalier about how they invest their time and money. In the search for the winning e-formula capable of catapulting them into the number one position ahead of their competitors, organisations will look seriously at the possibilities heralded by XML (eXtensible Mark-up Language) and many will harness the benefits it has to offer to improve their chances of success.

Enhanced communication will depend on innovation and technical advances both in the telecommunications infrastructure and in more efficient ways of creating, distributing and presenting data content. Despite the complexity of the problem, the answer could be simple. It could be XML.

XML Explained

Historically, the exchange of electronic data between applications running on disparate software and hardware platforms has always been a messy and extremely expensive process due to the incompatibility of proprietary systems. Most of us will have experienced this to a limited degree when converting, for example, WordPerfect files for use with Word or more recently when upgrading files generated under Word 95 for use with Word 97 and, of course, such applications are rarely backwards compatible. A warning is normally displayed to the effect that unsupported formatting may be lost in the conversion.

The ISO standard SGML (Standard Generalised Mark-up Language) ratified in 1986 was developed to provide an environment for developers to create their own mark-up language as a solution to the problems created by proprietary mark-up. Whilst SGML enables developers to create their own applications and languages to describe any type of information, thus ensuring interoperability, it is complicated and costly to implement and as a result has enjoyed limited acceptance.

On the other hand, whilst HTML (Hypertext Mark-up Language), an application of SGML, has facilitated and accelerated the widespread exchange of data via the World Wide Web and through corporate intranets, it is limited to the presentation of data content, not the manipulation of it. Despite the simplistic and open nature of the HTML tag set, both Netscape and Microsoft have added their own tags and are thus guilty of protecting their own proprietary interests by creating incompatible versions of HTML and fuelling the browser wars.

So, if SGML is too complicated and HTML is too simplistic, there had to be a compromise. You’ve guessed it – XML.

XML is a simplified subset of SGML. It was developed for the Web as a very powerful and flexible, yet relatively simple, means of enabling interoperability in the complex world of electronic data exchange. It was designed to offer users most of the power of SGML but with HTML’s ease of use. XML was not designed to replace HTML or SGML but to complement them.

XML offers users the ability to mark up data content using tags akin to database fields for subsequent post-processing, whilst HTML, with its limited tag set and functionality, presides over the presentation of the data. XML also offers the ability to separate the mark-up of content, structure and format, where structure means the logical organisation of information rather than the preformatted layout of a document.

HTML is simple. Although people initially baulk at the prospect of learning a programming language, they soon learn that HTML is not in fact a programming language but merely a simple mark-up language, and once they get to grips with it they are very often astounded at just how easy it is to create a credible hand-coded Web page. The proliferation of HTML editors and authoring packages has enabled novices to self-publish with a degree of confidence whilst having no knowledge of or interest in the underlying technology. For the professional user, such packages also take the pain out of generating huge volumes of Web pages. All of this contributes to the exponential growth of the Web.

Fortunately, Web browsers are quite forgiving – overlooking, as they do, many of the typical mistakes generated by the vast majority of novice Web page publishers who show little regard for prescribed HTML standards. XML, on the other hand, is less forgiving. Because it is extensible and self-describing, with no predefined tag set for software to fall back on, it demands that documents be ‘well-formed’; otherwise a conforming parser must reject them and browsers are unable to open them. For a document to be ‘well-formed’ it must meet a minimum structural requirement: it must contain at least one element, there must be a unique root element under which all other elements are nested, all tags must be properly nested and every end tag must match its opening tag. It should also be remembered that XML is case sensitive. XHTML, the reformulation of HTML in XML ratified by the W3C (the World Wide Web Consortium), adopts stringent rules borrowed from XML requiring HTML to be ‘well-formed’.
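The well-formedness rules can be tried out with any XML parser. The following is a minimal sketch using the parser in Python's standard library; the sample documents and the helper name is_well_formed are invented here purely for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule discussed in the text.
samples = {
    "single root, properly nested": "<course><title>XML: Introduction</title></course>",
    "end tag differs in case": "<course><title>XML: Introduction</Title></course>",
    "overlapping tags": "<course><b><i>XML</b></i></course>",
    "two root elements": "<course/><course/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted. A browser rendering HTML would typically display something for all four.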

XML’s greatest benefit has to be its ability to make information exchange easier. Its extensible, non-proprietary nature enables it to be used by any individual, company or industry regardless of software or hardware environment. Its adoption, however, will necessitate commitment to a more detailed understanding of business processes and to the introduction of a degree of structure to both documents and those processes.

The most important feature XML inherits from SGML is its extensibility. Unlike HTML, XML is not a predefined mark-up language, but a meta-language. It can be used to create your own mark-up language providing the opportunity to assign your own meaning to the data content and, because content and formatting are separated in XML, output can be published as print on paper or electronically, for example, as a Web page, CD-ROM or database file.

A quick look at both HTML and XML mark-up reveals that, whilst describing the same content, the tags perform different functions. The HTML tags are principally concerned with the layout and size of content within the document whereas the XML tags assign meaningful field names to the data such that it can be located for post-processing purposes or even for sorting or filtering prior to generation of the final document.

HTML Mark-up

Working with XML

XML: Introduction

One Day Course Module

£345 + VAT

First day of the 5 day XML For The Web training package.

XML Mark-up

XML: Introduction

One Day Course Module

£345 + VAT

First day of the 5 day XML For The Web training package.
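It is the tags that distinguish the two listings. As a sketch only, the course entry might be marked up as follows. The tag names used here (course, title, duration, price, description) and the choice of HTML elements are assumptions made for illustration, not the original course mark-up. The short Python fragment then shows why field-like tags matter: a program can ask for the price by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML mark-up: the tags describe headings, paragraphs and emphasis.
html_markup = """
<h2>Working with XML</h2>
<h3>XML: Introduction</h3>
<p><b>One Day Course Module</b></p>
<p>£345 + VAT</p>
<p><i>First day of the 5 day XML For The Web training package.</i></p>
"""

# Hypothetical XML mark-up: the tags name the data they enclose.
xml_markup = """
<course>
  <title>XML: Introduction</title>
  <duration>One Day Course Module</duration>
  <price>£345 + VAT</price>
  <description>First day of the 5 day XML For The Web training package.</description>
</course>
"""

course = ET.fromstring(xml_markup)
title = course.findtext("title")   # located by field name, not by position
price = course.findtext("price")
print(f"{title}: {price}")
```

Nothing in the HTML version tells a program which paragraph holds the price; in the XML version the element name does.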

XML is a key component in the evolution of electronic data exchange providing a powerful yet affordable alternative to the EDI (Electronic Data Interchange) projects hitherto reserved for multinational corporations.

Additional Elements to XML

In addition to version 1.0 of the XML specification, which governs the way in which a document is marked up and also defines DTDs, companion specifications such as XML Schema, XSL Formatting Objects and XSL Transformations, and XLink and XPointer have been, or are being, developed by the W3C in order to offer extensible validation, formatting and linking capability.

DTDs and Schemas

As previously stated, the XML specification deals specifically with the data content of a document with XML tags describing the content they enclose. In order for the data content to be output, developers need to define the rules for a document to be considered valid.

A valid XML document is a document that is ‘well-formed’ and that conforms, when compared by a validating parser, to the rules laid down by the DTD (Document Type Definition) or XML Schema.

The DTD, which can either be embedded within the XML document or held separately, defines which elements the XML document contains, what the relationship is between each element and what attributes an element may have. The presence of a DTD will ensure that authors subsequently create documents according to a predefined structure and syntax. XML documents that conform to a DTD may be confidently displayed using CSS (Cascading Style Sheets), XSL or other style sheet languages. DTDs also make it easier to share data with colleagues, other departments or within an industry or broader interest group.
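To give a feel for what a DTD expresses, the sketch below shows a small hypothetical DTD for a course record. The element and attribute names are invented for illustration. Python's standard-library parser checks well-formedness but does not validate, so the same rules are restated by hand in the function conforms; this stands in for, and is far cruder than, a real validating parser.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD, shown for reference only. It is not read by the code below.
COURSE_DTD = """
<!ELEMENT course (title, duration, price, description)>
<!ATTLIST course code CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT duration (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT description (#PCDATA)>
"""

# The same structural rules restated by hand.
REQUIRED_CHILDREN = ["title", "duration", "price", "description"]
REQUIRED_ATTRIBUTES = ["code"]


def conforms(text: str) -> bool:
    """True if `text` is well-formed and matches the hand-coded rules above."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not even well-formed
    children = [child.tag for child in root]
    return (
        root.tag == "course"
        and children == REQUIRED_CHILDREN
        and all(name in root.attrib for name in REQUIRED_ATTRIBUTES)
    )


good = (
    '<course code="X1"><title>XML: Introduction</title>'
    "<duration>One day</duration><price>345</price>"
    "<description>Introductory module</description></course>"
)
# Well-formed, but <duration> and <description> are missing.
bad = '<course code="X1"><title>XML: Introduction</title><price>345</price></course>'

print(conforms(good), conforms(bad))
```

The second document illustrates the distinction drawn above: it is well-formed, yet it would not be valid against this DTD.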

Whilst DTDs have been used to describe the structure of a document for over 10 years, they can be limiting when faced with a large number of data elements and, as a result, applications developers have turned increasingly to XML Schemas. Since XML’s ability to mark-up data lends itself to database-like functions, many large database systems are beginning to incorporate XML functionality.

The XML Schema specification was ratified by the W3C in May 2001 and a number of software packages and XML parsers have since appeared in support of the standard. Developers and many industries are now creating their own Schemas, details of which may be found at www.xml.org and www.schema.net. Microsoft has also developed its own proprietary specification under the brand name BizTalk, and Internet Explorer currently supports a reduced version of this known as XDR (XML-Data Reduced). OASIS, the Organisation for the Advancement of Structured Information Standards, is an XML interoperability consortium committed to the ratification of Schemas such as the BASDA eBis initiative. Please see the OASIS Web site listed at the foot of this article for further details.

XSL Formatting Objects and Transformations

XSL, eXtensible Stylesheet Language, is separated into two parts. XSL Formatting Objects presides over traditional formatting of a document, such as selected font types and weights, whilst XSL Transformations enables the manipulation of content held within an XML document in order to create new documents.

Style sheets, which can be embedded within an XML document or can be linked to it, define the rules by which a document is formatted and are necessary to display the content of an XML document via a Web browser or other output device. Currently, there are three ways to display an XML document via a Web browser – using the browser’s default style, using CSS or using XSL.

XSLT (XSL Transformations) was ratified as a W3C Recommendation in November 1999. XSL Formatting Objects followed in October 2001 but is not yet supported by the mainstream Web browsers. As a result, the most usual way to output an XML document to the Web is using CSS coupled with XSLT.

The main drawback to using CSS is the fact that there is no provision, as there is with XSL, to sort or filter data elements held in an XML document prior to display. Data elements therefore need to be marked up in the XML document in the order in which they are to be output.

XSLT can also be used to transform XML in order to create a new XML document.
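The sort-and-filter step that CSS cannot perform is sketched below. In XSLT it would be expressed declaratively, for example with an xsl:sort instruction; here plain Python stands in for the stylesheet, since Python's standard library has no XSLT processor. The course data and the price threshold are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, deliberately not in price order.
source = """<courses>
  <course><title>XSLT Workshop</title><price>395</price></course>
  <course><title>XML: Introduction</title><price>345</price></course>
  <course><title>Schemas in Depth</title><price>450</price></course>
</courses>"""

root = ET.fromstring(source)

# Filter: keep courses priced under 400. Sort: cheapest first.
selected = [c for c in root.findall("course") if int(c.findtext("price")) < 400]
selected.sort(key=lambda c: int(c.findtext("price")))

# Build a new XML document from the selected, reordered elements.
result = ET.Element("courses")
result.extend(selected)

sorted_titles = [c.findtext("title") for c in result]
print(sorted_titles)
print(ET.tostring(result, encoding="unicode"))
```

The source document is left untouched; the output is a second XML document containing the same elements in a different order.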

XLink and XPointer

As with the XSL specification, the linking specification is divided into two parts – XLink to link externally and XPointer to link within a document. XLink, which became a W3C Recommendation in June 2001, provides for simple, bi-directional and multi-directional links, allowing users to return to the document from which they just came, choose from a list of destinations and open a document in a new browser window. It also enables a document to be assembled from data held in different XML source documents. XPointer, which is still being finalised, will provide the ability to link to any location within a document without the need to have named your target location.
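As a rough illustration, XLink information is carried as attributes in the XLink namespace on otherwise ordinary elements. The document below is hypothetical: the file names are invented, and the fragment after the # is only in the style of the XPointer drafts. The Python code merely reads the link attributes; it does not follow the links.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"  # the XLink namespace

# Hypothetical document with two simple links.
doc = """<courses xmlns:xlink="http://www.w3.org/1999/xlink">
  <course xlink:type="simple" xlink:href="intro.xml" xlink:show="new">XML: Introduction</course>
  <course xlink:type="simple" xlink:href="xslt.xml#xpointer(/course/outline)" xlink:show="replace">XSLT Workshop</course>
</courses>"""

root = ET.fromstring(doc)

# The parser reports namespaced attributes as "{namespace}localname".
href_key = "{" + XLINK_NS + "}href"
show_key = "{" + XLINK_NS + "}show"

links = [(el.text, el.get(href_key), el.get(show_key)) for el in root.findall("course")]
for label, href, show in links:
    print(f"{label} -> {href} (show={show})")
```

The show attribute is how a link asks for its target to be opened in a new window ("new") or in place of the current document ("replace").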

In addition to the specifications mentioned above, two programming interfaces give software applications easy access to content held within XML documents: the W3C’s DOM (Document Object Model), and SAX (the Simple API for XML), a de facto standard developed by members of the XML-DEV mailing list.
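The difference between the two interfaces is easiest to see side by side. The sketch below reads the same small, invented document twice using Python's standard-library implementations: once through a DOM tree held in memory and once through SAX events. The class name TitleHandler is hypothetical.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<courses>"
    "<course><title>XML: Introduction</title></course>"
    "<course><title>XSLT Workshop</title></course>"
    "</courses>"
)

# DOM: the whole document is loaded into a tree that can then be navigated.
dom = minidom.parseString(doc)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# SAX: the parser reports events as it reads; the handler keeps only what it needs.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

DOM is the more convenient of the two when a document must be navigated or modified; SAX suits very large documents because nothing is held in memory unless the handler chooses to keep it.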

XML Searching

As seasoned searchers of professional online services, we have come to expect a degree of precision and accuracy in our search results which cannot currently be matched by any of the first or second-generation search engines. We search, often in vain, for information we are convinced must be available on the Web and can waste valuable time and money in the process of poring over page after page of irrelevant listings trying to pinpoint sites that meet our exact requirement.

Data held in static HTML page format is currently more difficult to locate and extract than information held in the professionally structured database systems we are so used to. The widespread uptake of XML technologies by publishers and ordinary businesses will eventually provide us with a more efficient means of extracting relevant information more quickly from Web-based sources.
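The gain in precision can be sketched with a small, invented catalogue. A keyword search matches any record that mentions the term anywhere, whereas a query against named fields returns only records that are actually about it. The element and attribute names below are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue. The third course mentions XML only in passing.
catalogue = ET.fromstring("""<catalogue>
  <course level="beginner"><title>XML: Introduction</title><topic>XML</topic></course>
  <course level="advanced"><title>XSLT Workshop</title><topic>XSLT</topic></course>
  <course level="beginner"><title>HTML Basics</title><topic>HTML</topic>
    <description>No XML knowledge needed.</description></course>
</catalogue>""")

# Keyword search: any course whose text contains "XML" anywhere.
keyword_hits = [
    c.findtext("title")
    for c in catalogue.findall("course")
    if "XML" in "".join(c.itertext())
]

# Field search: beginner courses whose <topic> is exactly "XML".
field_hits = [
    c.findtext("title")
    for c in catalogue.findall("course[@level='beginner'][topic='XML']")
]

print("keyword:", keyword_hits)
print("field:  ", field_hits)
```

The keyword search returns the HTML course as a false hit; the field search does not.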

Interoperability

The World Wide Web has produced a level playing field whereby, regardless of an organisation’s internal systems, it is able to communicate in an extremely simple manner with the world at large. However, it is no longer enough to have a Web presence – the world and his wife have one of those – organisations now need to be looking at integrating their shop-window with their backend processes. As businesses in general begin to look at interoperability and the possibility of supply chain automation, the creation and implementation of e-business strategies will necessitate a closer look at how an organisation does business.

The XML specification reaches far beyond the World Wide Web and can be used to structure, store and disseminate data between many different computing systems. Workflow analysis and the establishment of basic document management policies, particularly electronic document management policies, will therefore be crucial to this process.

As more and more businesses seek to embrace e-Business solutions, the creation, storage, exchange and retrieval of electronic data is becoming fundamental to all business processes. The scrutiny of individual working practices, corporate culture, industry standards and the way in which information is exchanged is key to progress and a greater understanding of our business environments.

To date, input to and output from the various computing platforms have taken significantly different formats and vast resources have been employed in the physical re-keying or manipulation of data destined for another system. At first sight, the confusing mass of data the process is expected to uncover seems daunting but, with the introduction of XML, elements can be assembled without human intervention in a number of meaningful ways to convey contextual information upon which informed decisions may be based.

The beauty of XML is the fact that it obviates the need to reinvent the wheel. Data can be captured once and repackaged according to the intended purpose, audience or output device – be it Web browser, WAP phone, PDA, WebTV, computer application or whatever else the future throws up.
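A minimal sketch of "capture once, repackage many times" follows. A single hypothetical XML record is turned into an HTML fragment for a browser, a one-line summary for a small screen and a CSV row for another application. The function names and tag names are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# The data is captured once, in a hypothetical course record.
source = ET.fromstring(
    "<course>"
    "<title>XML: Introduction</title>"
    "<duration>One Day Course Module</duration>"
    "<price>£345 + VAT</price>"
    "</course>"
)
fields = {child.tag: child.text for child in source}


def to_html(f: dict) -> str:
    """HTML fragment for a Web browser."""
    return f"<h3>{escape(f['title'])}</h3><p>{escape(f['duration'])}, {escape(f['price'])}</p>"


def to_short_text(f: dict) -> str:
    """One-line summary for a small screen such as a phone or PDA."""
    return f"{f['title']} ({f['price']})"


def to_csv(f: dict) -> str:
    """Comma-separated row for import into another application."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([f["title"], f["duration"], f["price"]])
    return buffer.getvalue().strip()


print(to_html(fields))
print(to_short_text(fields))
print(to_csv(fields))
```

Supporting a further output device means adding one more small function; the captured data itself is not touched.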

HTML dictates the layout of the data whilst XML conveys the meaning. Data stored in the XML format constitutes both ‘smart data’ and a ‘smart document’ enabling applications to carry out data and document processing at the same time. With XML you create your own tags so that they can be applied to any business or industry process rendering the possibilities endless. This degree of flexibility lends itself to applications ranging from science to commerce and from multimedia to the more sophisticated applications such as Microsoft’s .NET.

Since the data is reusable, much time could ultimately be saved in both internal and external day-to-day data exchange operations as well as when migrating to future business applications. This allows effort to be directed in more profitable directions.

Despite the obvious benefits, however, the changeable nature of XML content could compromise the integrity of original documents since XML documents are virtual data repositories and can be used to generate any number of other documents. Conversely, documents can also be assembled from data held in a number of other XML documents. It is essential, therefore, that any future electronic document management system has a means of creating and saving a snapshot in time of an organisation’s critical transaction documents as opposed to saving the base data elements.

The Future of XML

Architects of the original XML specification saw the future of XML as a system-independent document exchange facility and provided for the use of DTDs (Document Type Definitions) in order to validate the format of an XML file. In practice, however – and far more importantly – XML has become a common data exchange system with XML documents being handled by applications directly. This means that, instead of documents being parsed and saved according to rules set out in a DTD, applications with in-built automated parsing require the more powerful XML Schemas to define the structure, content and semantics of XML documents – a precursor to Tim Berners-Lee’s vision of the ‘semantic web’.

After much debate, XML Schema was finally released as a full W3C Recommendation in May 2001 and is available for use by developers and organisations alike. Whilst many are still using DTDs, it is likely that XML Schema as a means of defining data types will become integral to e-business frameworks in the future. A significant number of industry-specific XML Schemas have already been deposited with OASIS (www.xml.org), which intends to create an open registry and provide access to published Schemas on a licensed basis.

Whilst XML-based applications are not confined to use with Internet and Web-related applications, they are widely heralded as key to the next-generation of Web services where XML interfaces will facilitate seamless data interchange.

BIC, the Business Internet Consortium, is a non-profit consortium formed by 20 leading technology firms, including IBM, Intel and Microsoft, to look at e-business technology priorities. The consortium is forming workgroups to address the key issues concerning all aspects of e-business implementation in order to facilitate the transformation of traditional business practices into e-business practices. It views XML as an important technology for enabling the exchange of electronic content via multiple output devices.

XML Skills

For the XML novice, wading through jargon and acronyms is no fun but, for those who are committed to bringing order and structure to a complex world and who can demonstrate the discipline required to impose structure on their information, I believe the prospect of enhanced communication will be the reward.

Thousands of non-computer literate individuals, having already grasped the point and clickability of the Internet, have migrated to the happy band of self-publishers on the Web and, as XML editors and other software tools continue to emerge, I am confident that the Web community will evolve to the point where a knowledge of the XML mark-up language becomes the norm.

Understanding the benefits and drawbacks of what XML has to offer will be crucial to decisions concerning the formulation and implementation of Electronic Document Management policies as well as future e-business strategy.

Organisations will continue to look for ways to streamline their business processes and the widespread uptake of XML will enable them to do so whilst at the same time reducing the need for human intervention in the more mundane administrative processes. XML developers, I believe, will become an indispensable asset for the foreseeable future.

Conclusion

The biggest problem facing businesses today is how to get their staff, customers, suppliers and partners to share information for the greater long-term good and profitability of the organisation. In the final analysis, joining business processes together rather than merely linking documents or Web pages is central to a successful e-business strategy but, as with all things complex, in order to achieve this we need to start with the capture, storage and manipulation of the basic data elements. So far the only universal solution to emerge is XML-enabled technology, which promises to transform enterprise structure as well as relationships within and between companies.

In addition to the promised economies of scale, one of the great benefits will be the ability to export records and all metadata intact for transfer to another record-keeping system – ensuring in-built future proofing since, once input, data elements can be utilised by upgraded existing or future hardware or software platforms as well as re-purposed to meet as yet unimagined requirements. In time this will significantly reduce the cost of system upgrades and will also minimise the risk of getting it wrong.

Since in future documents are likely to become nebulous entities generated on the fly from a database of basic data elements, we should explore the optimum manner of storing data in the most flexible and reusable state. At the same time we should not lose sight of the fact that, for ordinary business, regulatory or legal purposes, certain whole documents, and not just their electronic data components, should be saved securely and remain as easily accessible as their hard-copy counterparts would have been.

Whilst HTML has provided a simplistic but effective means of communicating worldwide, the technology now exists, in the form of XML, to enhance our ability to communicate by the integration of all our business processes and I believe it will not be long before we see its widespread uptake and use.

Keep Informed, www.keepinformed.com, specialises in the provision of Internet related training, consultancy and recruitment. A full training portfolio, including courses on HTML, CSS and XML as well as e-Business Strategy, is available from the company’s Web site.

Useful XML Resources

www.iso.ch/ information on SGML

www.w3.org/MarkUp/ information on HTML

www.xml.org information on XML

www.oasis-open.org OASIS and XML-DEV mailing list

www.biztalk.org Microsoft’s XML Schema

www.xmlsoftware.com links to XML tools