Title: On The Same Page With XML.
Subject(s): XML (Document markup language); APPLICATION software
Source: InformationWeek, 05/08/2000, Issue 785, p207, 4p, 1c
Abstract: Deals with the growing popularity of Extensible Markup Language (XML) in application. Benefit offered by XML in data exchange between applications; How XML can be a central technology for E-commerce; Web sites offering information on XML.
AN: 3083602
ISSN: 8750-6874
Database: Academic Search Elite

Section: APPLICATION DEVELOPMENT EXTENSIBLE MARKUP LANGUAGE

ON THE SAME PAGE WITH XML

FOR THE STANDARD TO REACH ITS POTENTIAL, EVERYONE MUST AGREE ON HOW TO USE IT

As a nonproprietary, standards-based communication medium, the Extensible Markup Language is making inroads into virtually every part of application development, where transport and architectural requirements are increasingly being transformed by the overriding need for dynamic content delivery. Because it is a text-based data type, XML is lightweight and small, making it an efficient way to transport dynamic content, both in distributed applications in general and for Web-based applications in particular.

Data exchange between applications can be done more efficiently with XML. And it enables interactions across platforms that otherwise share no common data formats. This makes it especially useful in distributed application development, in which applications are split into tiers of logically related code that usually execute on different machines, often running different operating systems. Because XML is platform independent, the different tiers can exchange data using its features.

XML is also being used to control and execute code in the different tiers, acting as the glue that integrates application components. In this role, it acts as a gateway between autonomous, heterogeneous systems. (For a look at this use of XML, see Don Box's "Lessons from the Component Wars: An XML Manifesto" at msdn.microsoft.com/xml/articles/xmlmanifesto.asp.)

But for XML to reach its fullest potential, everyone must agree to use the language in the same ways. This means that both the sending and the receiving parties must agree to use a particular schema, a way of describing data using XML notation. To accomplish this, applications will need additional standards from the World Wide Web Consortium (W3C), some of which are just in the early proposal stages.

XML will fully blossom into a valuable technology for E-commerce when industry groups work together to define XML schemas to describe industry data. For example, participants in an industrywide E-marketplace might define a schema with the specific tags needed to encapsulate information on the customers, products, and services that form the basis of their transactions. These schemas would have to be agreed upon and shared by the people and companies that use them.

One such initiative is Microsoft's BizTalk (www.biztalk.org), a platform for launching the company's BizTalk Server. Microsoft is defining the BizTalk Framework, a set of guidelines for publishing schemas in XML and using XML messages to integrate software programs. The BizTalk Server, available as a technology preview, provides Internet application integration using XML and the BizTalk Framework. It includes secure and reliable data delivery, routing and transformation of business documents, and development tools for existing applications.

The Mathematical Markup Language is one of the earliest XML schemas. It was developed to describe mathematical notation, including structure and content, letting content be served, received, and processed on the Web.

The Organization for the Advancement of Structured Information Standards sponsors www.xml.org. This site features a wealth of information about XML and its uses, as well as an impressive catalog of XML schemas by industry group. The site provides a registry and repository for XML schemas and other public resources. Industry groups can register their XML data-exchange specifications, individuals can look for specifications in their areas of interest, and applications can access the XML resources needed to act on an XML document.

XHTML is the latest of recent XML-related standards work published by the W3C (see story, p. 210). In early April, the consortium published the latest working draft of XML Schema, which provides a means for defining the structure, content, and semantics of XML documents, using an inventory of XML markup constructs. This working draft arose from several proposals received by the W3C, including XML-Data, which was spearheaded by Microsoft, Document Content Description for XML, and others.

The XForms 1.0 working draft defines the next generation of Web forms that capture the device-independent data model and logic of form-based Web applications. These new forms will encompass an explicit data model that defines the form as a composite data type with constraints on values. The model also defines the user interface, expressed as a set of presentation controls bound to the data model, and the use of XML and Unicode for exchanging form data with servers.

The Extensible Stylesheet Language, which is used to express style sheets on the Web, is nearing completion as an official W3C recommendation. The two parts of XSL provide a language for transforming XML documents and a vocabulary for specifying formatting.

The W3C has received a number of proposals related to XML that it has published as Notes, which are preliminary documents meant to begin discussion. Recent Notes include XML Japanese Profile for expressing Japanese characters in XML, and the XML Document Navigation Language for content navigation in a variety of devices.

The W3C is likely to stay busy for a long time with work on extensions to XML, a sign many observers construe as a good one for XML's continuing health and longevity.
--Don Kiely

XHTML: A Bridge To The Future

THE W3C'S RECOMMENDATION BLENDS XML AND HTML TO PRODUCE EXTENSIBLE WEB-PAGE FORMATTING

Hypertext Markup Language, an aging, inflexible formatting standard, has fueled the phenomenal growth of the Web. Now a new technology, a flexible data-markup standard called Extensible Markup Language, promises nearly complete flexibility. In a flash of brilliance, the World Wide Web Consortium (W3C) has combined HTML and XML into the new XHTML recommended standard, which reformulates HTML 4.01--the latest version--with XML document type definitions (DTDs).

HTML is the language behind one of the fastest, most widespread technology adoptions ever. Derived from Standard Generalized Markup Language (SGML), HTML is simple to learn and reasonably flexible for formatting text and graphics, but it doesn't have the extensibility to adapt to dynamic Web applications. Almost every site with valuable content is more of a Web application than a Web site, requiring code components, multimedia effects, and other features that strain the limits of HTML.

HTML is typically extended by innovations in a single browser, usually Microsoft's Internet Explorer or America Online's Netscape Communicator, and these changes gradually make their way into other browsers. Inevitably, the implementations are different enough that Web authors have a tough time making their sites viewable from different browsers, much less older versions of those browsers. The more popular extensions eventually make their way into the W3C's HTML standard--frames and scripting languages, for example.

In the last couple of years, XML has been taking the Web by storm. Whereas HTML formats and presents information, XML marks up data so that the individual pieces of information on a Web page are identified as being of a particular type. In a bank's data, for example, $4,562.03 is marked as the outstanding balance of a customer's loan, and $123.90 as the monthly payment, identifying them as particular kinds of data points. Without XML, these would be just two character strings in a sea of text on a Web page. XML provides metadata--data about data.

The most important feature of XML is the "X." HTML has a fixed set of tags, but with XML you can create multiple namespaces that define custom tags. Industries can band together and create namespaces that facilitate the exchange of information.

Continuing the bank example, <Balance> and <Payment> can identify the two character strings as being specific types of information. This facilitates exchanging data between applications and computer systems, limiting the need for expensive, complex data-conversion programs.
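The bank example can be sketched in a few lines of Python using the standard library's XML parser. The <Balance> and <Payment> tags come from the text above; the <Loan> wrapper element, the function name, and the plain-number formatting of the amounts are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical loan record. <Balance> and <Payment> come from the article;
# the <Loan> wrapper and the number formatting are assumptions.
LOAN_XML = """<Loan>
  <Balance>4562.03</Balance>
  <Payment>123.90</Payment>
</Loan>"""


def read_loan(xml_text):
    """Return the balance and payment as Decimal values, looked up by tag name."""
    root = ET.fromstring(xml_text)
    return {
        "balance": Decimal(root.findtext("Balance")),
        "payment": Decimal(root.findtext("Payment")),
    }


loan = read_loan(LOAN_XML)
print(loan["balance"], loan["payment"])  # 4562.03 123.90
```

Because each value is found by its tag name rather than by its position in a stream of text, a receiving application needs no custom conversion code beyond agreeing on the tag names.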

The XHTML recommendation was published by the consortium on Jan. 26, and refers to XHTML as "a bridge to the future." This new standard promises to make the Web more adaptable to existing and future uses, without forcing a wholesale redesign of millions of Web sites. Backward compatibility was a major concern of the W3C committee, and by staying faithful to the HTML 4.0 standard but extending it with XML, the consortium has created a very flexible technology for building Web sites.

According to the W3C, XHTML offers three major advantages to Web-site developers: extensibility, portability, and modularity. XHTML is extensible, as it adds new elements to HTML without altering the entire DTD that the document is based on.

The second major advantage is portability, sometimes referred to as interoperability. Most Internet access is through browsers on desktop computers, though more and different types of devices are being introduced. Some of these devices, such as cell phones, won't have the processing power of a desktop PC, and browsers on them will be less tolerant of any malformed markup used to render a document. XHTML is designed to make Web documents accessible and interoperable across platforms, in part by enforcing a rigorous coding standard.

Modularity made it into the specification late in the process, and acknowledges the growing role that the Web is playing in handheld devices. Browsers on these devices won't need all XHTML elements, so XHTML allows subsets of elements. The primary focus for the next version of XHTML, already called XHTML 1.1, will be modularization.

The semantics of XHTML elements and their attributes are defined by the current HTML 4.01 specification. XHTML 1.0 specifies three XML document types that correspond to the three HTML 4.01 DTDs: Strict, Transitional, and Frameset. These XHTML DTDs are more restrictive than HTML because XML is more restrictive in its syntax. The table below lists the three DTDs and the DOCTYPE declaration for each.

The recommendation describes a number of ways in which XHTML and HTML differ--some due to the relative sloppiness allowed by most browsers when rendering HTML, and others because of the way XML does things. Most of the issues involved with displaying HTML pages as XHTML are a result of XML's validation and conformance requirements. An XHTML document must be structured properly, and elements that HTML doesn't require must be included in an XHTML document.
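The difference in strictness can be seen by feeding markup to an ordinary XML parser. The following Python sketch is only an illustration: the two sample strings and the function name are invented here, and the check covers well-formedness only, not validation against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical "sloppy" HTML: unclosed <p> and <br>, and overlapping <b>/<i> tags.
sloppy_html = "<body><p>First<br><p><b><i>Second</b></i></body>"

# The same content written to XHTML's rules: every element closed, tags nested.
tidy_xhtml = "<body><p>First<br /></p><p><b><i>Second</i></b></p></body>"

print(is_well_formed(sloppy_html))  # False
print(is_well_formed(tidy_xhtml))   # True
```

Most browsers would render both strings the same way, which is why markup of the first kind is so common on existing pages.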

As with any new technology, a Web author has to decide whether to migrate pages from HTML or start from scratch and take full advantage of XHTML. There are a number of benefits to upgrading, as well as some major pitfalls.

Because HTML is a pervasive standard and XML is becoming one, users can view carefully crafted XHTML documents in current versions of many browsers. XHTML documents can be served as any of three main media types that most browsers support: text/html, text/xml, and application/xml. Any scripting code that uses the HTML or XML document object models will work well in the new format.

As this new standard is more widely adopted, Web editors are supporting XHTML; some will automatically convert existing pages. Code translators have long been the Holy Grail of computer science, but there is a reasonable chance that HTML to XHTML tools will actually work reliably. Sloppy HTML code, acceptable to many browsers, will translate poorly in some cases. The new code must then be validated against the HTML 4.01 compatibility guidelines. Following these guidelines means that the same code base can be used with XHTML-compliant browsers as well as those supporting straight HTML, as long as you avoid using new tag definitions. There are various tools listed on the W3C's XHTML Web site, such as HTML Tidy (www.w3.org/People/Raggett/tidy), HTML Kit Web editor, with support for HTML Tidy (www.chami.com/html-kit), and WDG HTML Validator (www.htmlhelp.com/tools/validator/batch.html).

Despite the best efforts of the consortium, HTML has evolved in a less-than-orderly fashion. Because HTML is itself not extensible, browser vendors have added tags rather haphazardly. HTML has evolved at a faster pace than any standards body could possibly keep up with, so the HTML standard is mostly a codification of existing practice, rather than a source of innovation. As a result, any given HTML authoring tool supports--at best--a snapshot of HTML tags in use at a given time, no matter how fast the tool's author tries to keep up.

Unfortunately, this means that today's favorite HTML editing tool may not be the tool of choice tomorrow, when XHTML becomes the norm--unless, of course, you write HTML in a plain text editor; then you're in fine shape to write new code. XHTML is too new for any of the major players to have made a commitment to support it. But with the rapid spread of support for XML, it would be surprising if all the major editors didn't rush to implement support for XHTML.

During the transition to XHTML, validating code will be one of the biggest challenges. Validation is a process that verifies documents against the associated DTD, checking to make sure that the structure, elements, and attributes are consistent with the definitions in the DTD. Validating an XHTML 1.0 document involves verifying its markup against one of the three XHTML DTDs.

The W3C has an HTML Validation Service that's based on an SGML parser, with options such as including Weblint debugging results and displaying the parse tree. The good news is that when HTML compatibility guidelines are followed, XHTML 1.0 documents can be rendered on HTML 4.0-compliant browsers. One way to use the W3C validator is to place a link to validator.w3.org/check/referrer on your Web page. Clicking the link with your page loaded validates your page.

XHTML 1.1 is under development, and should make this next stage of Web technologies even more flexible. XML and HTML have a lot to offer each other. XML isn't the "HTML-killer" it was once touted to be, but when teamed with its alleged victim, it promises to take over the Web.
--Don Kiely

Defining XML Schemas

www.biztalk.org Defining guidelines for publishing schemas in XML

www.xml.org A registry and repository for XML schemas and other public resources

DATA: INFORMATIONWEEK

XHTML Document Type Definitions

XHTML 1.0 Strict

Use when you're doing all of your formatting in cascading style sheets and not using <font> and <table> tags to control how the browser displays your documents.

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">

XHTML 1.0 Transitional

Use when you need presentational markup in your document, so that you don't limit your audience to users with browsers that support cascading style sheets.

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "DTD/xhtml1-transitional.dtd">

XHTML 1.0 Frameset

Use when your documents have frames.

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "DTD/xhtml1-frameset.dtd">

DATA: WORLD WIDE WEB CONSORTIUM

Tags And Attributes In XHTML

Requirements

A strictly conforming XHTML 1.0 document is restricted to tags and attributes from the XHTML 1.0 namespace. Such a document must meet some exacting requirements: It must validate against one of the three document type definitions (DTDs); and the root element of the document must be <html> and designate an XHTML 1.0 namespace using the xmlns attribute. There must be a DOCTYPE declaration in the document prior to the root element and, if present, the public identifier in the DOCTYPE declaration must reference one of the three required DTDs.

Minimal XHTML

The following code, taken from the XHTML proposed recommendation, is an example of a minimal XHTML 1.0 document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>
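Some of the requirements listed above can be checked mechanically. The Python sketch below is one possible approach, not part of the recommendation: the function name and the wording of the messages are invented for illustration, and the code does not validate against a DTD, which needs a validating parser.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}

MINIMAL = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Virtual Library</title></head>
  <body><p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p></body>
</html>
"""


def conformance_problems(document):
    """List the structural requirements the document fails (empty if none)."""
    problems = []
    # The DOCTYPE must appear before the root element and name a known DTD.
    match = re.search(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', document)
    root_at = document.find("<html")
    if match is None or (root_at != -1 and match.start() > root_at):
        problems.append("missing DOCTYPE before root element")
    elif match.group(1) not in PUBLIC_IDS:
        problems.append("unknown public identifier")
    # ElementTree reports a namespaced root as "{namespace}html".
    root = ET.fromstring(document.encode("utf-8"))
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root is not <html> in the XHTML namespace")
    return problems


print(conformance_problems(MINIMAL))  # []
```

Removing the xmlns attribute from the root element, or dropping the DOCTYPE declaration, makes the function report the corresponding problem.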

Some specifics

XHTML documents must be well formed, strictly complying with XML syntax rules.

The <head> and <body> elements cannot be omitted, and the <title> element must be the first element in the <head> element.

Nested tags. Tags must be nested properly and all tags must have closing tags or be written in a form that combines the opening and closing tag.

Empty elements. Empty elements must either have an end tag, or the start tag must end with />. This is sometimes called a self-terminating element.

Element and attribute names. XML is case-sensitive, and the XHTML DTDs define element and attribute names in lower case, so they must be written in lower case. All attribute values must be quoted in single or double quotes.

Nested elements. Elements must also be properly nested, so that closing tags must be in reverse order of the opening tags.

No minimized attributes. XML, and therefore XHTML, does not support attribute minimization.

Script and style tags. Because < and & characters are treated as markup in XHTML, the contents of script and style elements that use those characters must be wrapped in a CDATA section, which tells the parser to ignore characters that would normally be considered markup. The only delimiter that's recognized in a CDATA section is the "]]>" string that ends the section. You can also use external script and style documents to avoid the problem. The following example shows the CDATA approach:

<script type="text/javascript">
//<![CDATA[
// JavaScript code
//]]>
</script>
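Several of the rules above can be demonstrated with a standard XML parser. In this Python sketch, each pair holds a common HTML habit followed by its XHTML equivalent; the sample fragments and names are invented for illustration. The check covers well-formedness only, so rules that depend on the DTD, such as lower-case names, are not exercised.

```python
import xml.etree.ElementTree as ET


def parses(fragment):
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each pair shows an HTML habit that XML rejects, then the XHTML form.
examples = {
    "empty element": ("<p>line<br></p>", "<p>line<br /></p>"),
    "quoted attribute value": ("<td align=center>x</td>", '<td align="center">x</td>'),
    "minimized attribute": ("<input checked />", '<input checked="checked" />'),
    "script with < and &": (
        "<script>if (a < b && c) run();</script>",
        "<script><![CDATA[if (a < b && c) run();]]></script>",
    ),
}

for rule, (html_style, xhtml_style) in examples.items():
    # Prints the rule name, then False for the HTML form and True for the XHTML form.
    print(rule, parses(html_style), parses(xhtml_style))
```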

PHOTO (COLOR):

"X" IS FOR EXTENSIBLE: The opening page at www.cmp.com is shown before and after its validation by HTML Tidy, a free HTML editor and validation tool that converts HTML to XHTML.



Copyright of InformationWeek is the property of CMP Media Inc. and its content may not be copied without the copyright holder's express written permission except for the print or download capabilities of the retrieval software used for access. This content is intended solely for the use of the individual user.