Title: XML Xtends Its Reach.
Subject(s): XML (Document markup language); WELLS Fargo & Co.; HTML (Document markup language)
Source: Computerworld, 10/18/99, Vol. 33 Issue 42, p77, 3p, 1 diagram, 1c
Author(s): Johnson, Amy Helen
Abstract: Deals with the benefit gained by Wells Fargo & Co. in putting its service information into a database of Extensible Markup Language (XML)-tagged documents on the company intranet. Features of XML; Challenges faced in converting Hypertext Markup Language (HTML) into XML version; Purpose of companies for choosing XML option.
AN: 2431398
ISSN: 0010-4841
Database: Academic Search Elite

Section: TECHNOLOGY

FIELD REPORT

XML XTENDS ITS REACH

XML finds favor in many IT shops, but it's still not right for everyone

EMPLOYEES NEED several hundred pages' worth of products, policies and procedures to service customers of San Francisco-based Wells Fargo & Co. But the information in those pages changes frequently, so if it can't be updated easily, it's virtually useless.

Enter XML.

Robert Bean, vice president at Wells Fargo's Minneapolis-based institutional trust division, says the bank solved its updating problems by putting service information into a database of Extensible Markup Language (XML)-tagged documents on the company intranet. An employee who needs the latest policy or form simply aims his Web browser at the online manual. "The most current version is resident in one spot," says Bean. That means employees make fewer mistakes than before.

Content management is one of the things XML does best. Nearly every large company interested in messaging, component technology or the Internet is building XML applications, says Mike Gilpin, an analyst for application-development strategies at Giga Information Group Inc. in Cambridge, Mass. But early adopters are finding that today's XML picture isn't all rosy; the current state of XML standards and applications is about where the Web markup language HTML was years ago, and that's not saying much.

Unlike HTML, XML makes it easy to quickly locate and reuse data. A catalog listing in XML might carry tags denoting the manufacturer's name, product name, product size, composition, shipping weight and price. You not only have the actual data, but you also know what that text means. Other applications can access that catalog and use the same information.

That's a marked contrast to HTML, which can describe only how to display the content. There is no difference among pieces of content.

That's why manipulating content in an HTML environment -- to repurpose it, search it and display it in different formats -- is so difficult. XML offers the self-describing capabilities that can solve that problem. A catalog listing can be repurposed to select all instances of a particular product, select weights and prices of each and perform a cost-per-pound comparison.
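To make the cost-per-pound comparison concrete, here is a minimal Python sketch built on the standard library's ElementTree parser. The catalog, its tag names (item, manufacturer, product, shipping_weight, price) and every figure in it are invented for illustration; a real catalog would define its own tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: tag names and values are assumptions, not a standard.
CATALOG_XML = """
<catalog>
  <item>
    <manufacturer>Acme Mills</manufacturer>
    <product>Rolled oats</product>
    <shipping_weight unit="lb">5</shipping_weight>
    <price currency="USD">4.00</price>
  </item>
  <item>
    <manufacturer>Best Grains</manufacturer>
    <product>Rolled oats</product>
    <shipping_weight unit="lb">10</shipping_weight>
    <price currency="USD">7.00</price>
  </item>
  <item>
    <manufacturer>Acme Mills</manufacturer>
    <product>Wheat flour</product>
    <shipping_weight unit="lb">2</shipping_weight>
    <price currency="USD">1.50</price>
  </item>
</catalog>
"""

def cost_per_pound(catalog_xml, product_name):
    """Return {manufacturer: price per pound} for every listing of one product."""
    root = ET.fromstring(catalog_xml)
    result = {}
    for item in root.findall("item"):
        # The tags say which text is the product name, so no guessing is needed.
        if item.findtext("product") != product_name:
            continue
        weight = float(item.findtext("shipping_weight"))
        price = float(item.findtext("price"))
        result[item.findtext("manufacturer")] = price / weight
    return result

oats = cost_per_pound(CATALOG_XML, "Rolled oats")
for maker, rate in sorted(oats.items(), key=lambda pair: pair[1]):
    print(f"{maker}: ${rate:.2f} per pound")
```

The same lookup against an HTML page would have to guess which table cell holds the weight and which holds the price.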

Wells Fargo's XML application is relatively straightforward. But XML's ability to act as a universal framework for swapping data among applications is a hot topic in information technology shops these days.

Gilpin says corporate moves to XML for application integration are "solidifying quite rapidly." Companies that might have traded data via comma-delimited ASCII files a year ago might use XML today.

Unfortunately, the amount of HTML-to-XML conversion needed to make XML useful as a corporate data repurposing tool is staggering. There are few native XML parsers, the tools that read tags and use the data they contain intelligently, so it's not easy putting XML on the screen. Most applications still need an extra conversion step to translate XML into HTML before use.

That extra step in HTML translation can mean slower performance. And because the object databases that store XML data and tools aren't as well-tuned as their relational-database and HTML cousins, performance can again be a problem.

XML standardization efforts haven't really caught on; organizations find it easier to create custom tags and just map data to achieve interoperability.

At Chipshot.com, a custom golf equipment manufacturer in Sunnyvale, Calif., using XML tags to identify the different pieces of its Web-site content made it easy to create a second site for Japanese-speaking customers. Tagging let Chipshot choose only the items that needed translating, says Nick Mehta, vice president of marketing. Now the company can just as easily create a site in Spanish or German, he adds.

Chipshot chose the XML option because only certain information had to be in Japanese, so it was inefficient to give all the pages to a translator. "If we had to manually maintain two current versions of the site, it wouldn't have been feasible," Mehta says.

In Wells Fargo's case, an XML intranet was the best way to disseminate frequently changing information to 1,200 employees in 21 states. The first attempt at supplementing the binders was to build a static HTML Web site, but information was often outdated by the time it was posted, says Ben Moore, a managing associate at New York-based Micro Modeling Associates Inc., the consulting firm that helped build Wells Fargo's policies and procedures site. And there was yet another twist: Wells Fargo wanted to present a different view of the content based on specific roles in the organization. A branch manager might see activity report forms that go to upper management, while customer service representatives might see blank loan applications. Such dynamic capabilities are better suited to dynamic XML content stores than static HTML, Moore says.
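A role-based view of this kind can be sketched in a few lines of Python. This is only an illustration, not Wells Fargo's implementation: the manual, the doc element and the space-separated roles attribute are all invented conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual: the "roles" attribute is an assumed convention.
MANUAL_XML = """
<manual>
  <doc roles="branch_manager">Activity report form</doc>
  <doc roles="service_rep">Blank loan application</doc>
  <doc roles="branch_manager service_rep">Wire transfer policy</doc>
</manual>
"""

def titles_for_role(manual_xml, role):
    """Return the documents whose roles attribute lists the given role."""
    root = ET.fromstring(manual_xml)
    return [doc.text for doc in root.iter("doc")
            if role in doc.get("roles", "").split()]

print(titles_for_role(MANUAL_XML, "service_rep"))
```

Because the role is stored with the content, a single copy of each document serves every audience.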

Employing consultants like Micro Modeling Associates is common practice among corporations that are starting to use XML, as is the use of third-party products designed for nondevelopers. Enterprises were once stuck with using basic text editors to create XML content, much like the early days of HTML. Nowadays, they can choose from several user-friendly tag editors such as Burlington, Mass.-based Arbortext Inc.'s Adept, San Jose-based Adobe Systems Inc.'s FrameMaker+SGML and Toronto-based SoftQuad Software Inc.'s XMetaL.

Once the tags are added, applications that can use that data are needed, which fuels the rise of another XML product category: content repositories and integrated development environments. In this space are Burlington, Mass.-based Object Design Inc.'s eXcelon (which Wells Fargo uses), Boston-based Inso Corp.'s DynaBase, San Mateo, Calif.-based Poet Software Corp.'s Content Management Suite and San Diego-based Chrystal Software Inc.'s Astoria, among others.

Major vendor support is critical for XML to achieve corporate acceptance, and it's happening faster than many predicted. Microsoft Corp. and IBM, for example, are pushing hard to own the XML market. IBM has a wide range of XML-enabled products, from its WebSphere application server to the XML Productivity Kit for Java, for writing XML processing applications in Java. Microsoft is working on the BizTalk Server -- an XML processor that parses the XML, maps tags and sends the data to an application -- and has enabled a wide range of products such as SQL Server and Office 2000 for XML.

Ingenta Ltd., a vendor of scientific, medical and technical articles in Oxford, England, adopted XML for its online publishing architecture, choosing DynaBase as its repository and development system. Mark Rowse, the company's CEO, says a key benefit to using XML is the ability to pass data objects among different subsystems within the Ingenta service. "There's much less conversion," he says. "It's easier to get modules to talk to each other."

Passing XML data objects is also used to integrate back-office business-to-business systems. Atlanta-based Clarus Corp. sells a commerce suite and procurement system that relies on XML documents and Microsoft's Message Queue Server to share information among customer-facing systems and enterprise resource planning applications.

Portsmouth, N.H.-based Bowstreet focuses on a slightly different business-to-business niche: extranets. Not only do applications within the enterprise need to share data, but a company may want to share data with partners and customers. That's the case at Milpitas, Calif.-based NetRatings Inc., which measures audiences for Internet sites. The company stores its raw data in an Oracle database, generates reports that the Bowstreet package converts into XML, then sends the data objects to its customers.
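The database-to-XML step can be illustrated with a short Python sketch. It is not the Bowstreet product: an in-memory SQLite database stands in for Oracle, and the audience table, its columns, the element names and the figures are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite stands in for the production database; the table,
# columns and figures are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audience (site TEXT, week TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO audience VALUES (?, ?, ?)",
    [("example.com", "1999-W40", 12000), ("example.org", "1999-W40", 8500)],
)

def report_as_xml(connection):
    """Turn each row of the audience table into a self-describing XML element."""
    root = ET.Element("audience_report")
    rows = connection.execute(
        "SELECT site, week, visitors FROM audience ORDER BY site")
    for site, week, visitors in rows:
        entry = ET.SubElement(root, "site", name=site, week=week)
        ET.SubElement(entry, "visitors").text = str(visitors)
    return ET.tostring(root, encoding="unicode")

report = report_as_xml(conn)
print(report)
```

A customer's application can parse the resulting document without knowing anything about the database schema behind it.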

The advantage of XML, says NetRatings CEO Dave Toth, is that his customers -- Internet advertising agencies that track the reach of their online ad campaigns -- can import the audience information into their internal applications for processing. Advertising agencies then massage the data and present reports to their customers.

Software vendor WebMethods Inc. in Fairfax, Va., also uses XML as the common information description language in its business-to-business application integration product. Like other XML packages, it relies on XML standards to avoid interoperability problems. Standards efforts generally focus on two pieces of the XML puzzle: the XML framework, which is the grammar of the language, and industry-specific tags, which form an industry's dictionary of terms used within that framework.

Standards groups abound to deal with these problems from an application-integration and e-commerce perspective: The Organization for the Advancement of Structured Information Standards; Microsoft BizTalk; RosettaNet, a nonprofit organization in Los Angeles; and Open Applications Group are among the largest.

At the same time, HTML itself is getting an overhaul to become more XML-like. Last August, the World Wide Web Consortium (W3C) issued XHTML 1.0 as a proposed new standard. W3C has targeted XHTML as a chief migration tool for bringing the vast archives of HTML documents into the XML world; pages developed in XHTML 1.0 instead of the current HTML 4.0 can be processed by standard XML tools without becoming completely useless to older HTML-only technologies.
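As a rough illustration of what "processed by standard XML tools" means, the Python sketch below feeds a small well-formed XHTML page to a generic XML parser (the standard library's ElementTree) and pulls out the title and links. The page content is invented; the namespace URI is the one the W3C assigns to XHTML.

```python
import xml.etree.ElementTree as ET

# A small well-formed XHTML page, invented for illustration.
XHTML_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Policy manual</title></head>
  <body>
    <h1>Wire transfers</h1>
    <p>See the <a href="forms.html">forms page</a>.</p>
    <p>Questions go to the <a href="help.html">help desk</a>.</p>
  </body>
</html>"""

# ElementTree writes namespaced tag names as {namespace-uri}localname.
XHTML = "{http://www.w3.org/1999/xhtml}"

root = ET.fromstring(XHTML_PAGE)
title = root.findtext(f"{XHTML}head/{XHTML}title")
links = [a.get("href") for a in root.iter(f"{XHTML}a")]
print(title, links)
```

The parser knows nothing about Web pages. It works only because the markup is well-formed XML, a guarantee that ordinary HTML 4.0 pages do not give.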

But many corporations today are building effective XML utilities without waiting for standardization. Moore points out that it's easy to map one application's tag set to another if tagging standards change. "XML is flexible enough, as long as I follow a high-level set of rules," he says.

Gilpin agrees. Companies aren't looking for external definitions of XML tags, he says. They're building their own sets of tags for internal use and translating them to communicate with external partners, he says.
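The tag translation Moore and Gilpin describe can be as simple as a lookup table. The Python sketch below is a minimal illustration under assumed names: the internal tags (order, cust, amt), the partner tags and the sample order are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from in-house tag names to a partner's tag names.
TAG_MAP = {"order": "PurchaseOrder", "cust": "Buyer", "amt": "Total"}

def translate_tags(xml_text, tag_map):
    """Rename every element that appears in tag_map; leave the data untouched."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

internal = "<order><cust>Example Co.</cust><amt>129.95</amt></order>"
external = translate_tags(internal, TAG_MAP)
print(external)
```

If a tagging standard changes, only the table changes; the documents and the applications that produce them stay as they are. Real-world mappings that restructure data, rather than just rename tags, need more than this.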

Although enterprises are gaining benefits from adopting XML, they have also encountered some pretty thorny problems. One of the worst: figuring out how to get the information that's stored in a variety of document formats converted to XML-tagged documents. At Wells Fargo, Moore says, technicians ended up using brute force to bring data into XML; they simply typed everything into an XML format. Such labor-intensive conversion methods can hold back XML adoptions for many IT shops.

XML has barely penetrated the vast established base of Web browsers and other HTML tools, which means XML data may not display properly in all applications. Older browsers don't natively render XML, points out Don Klein, Ingenta's manager of new media services. Ingenta still has to convert XML to HTML on the fly. That's one feature to look for in an XML application server. DynaBase does the conversion for Ingenta, Klein says. "The trick is how well the system does it, and so far we have been pleased."

Then, too, says Klein, overall response time of XML applications slows as the data store grows larger. Object databases, which store XML content, aren't as highly tuned as standard SQL-based relational database management systems. "Using XML is not necessarily proven in terms of performance," he says. Ingenta is trying to speed things up with DynaBase's caching features.

NetRatings also suffers from performance problems. It's an inherent problem when dynamically rendering XML pages, Toth says: "Computers only compute so fast."

Most say these problems will disappear as XML settles in. IT executives like Wells Fargo's Bean are enthusiastic about XML. "We have gotten what we were looking for," he says. "[We have] one place to look up the procedures and forms to do something."

Ingenta Ltd.

Primary goal: Build Web pages dynamically from multiple XML-tagged sources.

Greatest benefit: Repurposing. Using XML allows Ingenta to present the same content in different formats for different audiences.

Tool box: Sun Microsystems Inc.'s Enterprise 450 server running Solaris 7 operating system; Netscape Communications Corp.'s Web server; Boston-based Inso Corp.'s DynaBase XML content management system.

Alternatives considered: Ingenta evaluated many relational database management systems, such as Austin, Texas-based Vignette Corp.'s StoryServer. Ingenta rejected StoryServer, says Don Klein, manager of new-media services, because it didn't work with the multiple media types Ingenta uses.

Biggest hurdle: Content must be converted from XML to HTML to work with many browsers.

Ingenta Ltd., in Oxford, England, has long been the place to go to get electronic versions of medical, technical and scientific research in the U.K. But the front end of the service was written in the days before the Web became the pre-eminent method of publishing online, says CEO Mark Rowse. Its inflexibility was limiting the company's ability to make the interface user-friendly. Plus, Ingenta had new markets it wanted to enter, such as vertical portals for specific scientific disciplines - nutrition, veterinary science and so forth - and electronic communities for scientific societies. The idea, says Rowse, was to comb through its existing stores of abstracts, journal articles, conference proceedings and technical papers for information relevant to a narrow topic and put it on-screen. What the company needed was a way to store and aggregate content that could adapt to different presentation formats.

Ingenta's solution was to mark its presentation-level information (help files, editorials and so forth) with XML tags and store it in DynaBase, a content management system from Inso Corp. All the presentation elements (templates, GIFs, JavaScript) are held in DynaBase's object database, along with the presentation-level content files. When a page is requested by an end user, DynaBase converts the XML content to HTML, combines it with presentation elements and passes the package to Ingenta's Web server for display.
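The request-time step described above (convert the XML content to HTML, then merge it with a presentation template) can be sketched in Python. This is not DynaBase or its scripting language; the helpfile document, its tags and the page template are assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical content file and page template.
CONTENT_XML = """
<helpfile>
  <title>Searching abstracts</title>
  <para>Enter an author &amp; a keyword.</para>
  <para>Results are sorted by date.</para>
</helpfile>
"""

PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)

def render_page(content_xml):
    """Convert the XML content to HTML and merge it with the page template."""
    root = ET.fromstring(content_xml)
    title = html.escape(root.findtext("title", default=""))
    # Each <para> becomes an HTML paragraph; text is escaped for safe display.
    body = "".join(
        f"<p>{html.escape(para.text or '')}</p>" for para in root.findall("para")
    )
    return PAGE_TEMPLATE.substitute(title=title, body=body)

page = render_page(CONTENT_XML)
print(page)
```

Swapping in a different template changes the look of every page without touching the stored content, which is the separation the article describes.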

XML, says Klein, can manage large amounts of content dynamically. Because Ingenta can identify what a particular piece of text is - title, author name, abstract, bibliography, equation, article body - it can pick out particular items from the database and easily repackage the information. Klein says Ingenta can combine information from a general portal on veterinary medicine and a specific one on animal nutrition to create a new one on keeping cats healthy.

Ingenta spent more than six months looking at various options before deciding on an XML-based object-repository system, says Rowse. According to Klein, the most viable alternative to XML was an RDBMS, such as Vignette's StoryServer. An RDBMS works well in high-volume Web sites, but it doesn't offer the flexibility to work with multiple media types, which was a must-have for Ingenta, Klein says. Although the company would put the presentation content into XML - the items that appear on the upper-level Web pages - the meat of the site is print sources from scientific publishers, which send information to Ingenta in a variety of non-XML electronic formats, such as abstracts stored in database files and research journals in PDF.

In order to retrieve this content from non-XML data stores and combine it with the XML-tagged presentation content, Ingenta built an interface into the non-XML data stores using primarily Java and Java Database Connectivity. At this point Ingenta relies on XML messaging to pass search requests and results back and forth between DynaBase and the legacy systems.
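The request-and-response pattern might look like the following sketch. It is written in Python rather than the Java Ingenta used, and the message format (search_request, term, search_results, hit) and the sample records are invented; the article does not describe Ingenta's actual message schema.

```python
import xml.etree.ElementTree as ET

# Stand-in for a legacy abstract store; records are invented.
LEGACY_ABSTRACTS = [
    {"id": "A1", "title": "Feline nutrition basics"},
    {"id": "A2", "title": "Equine dental care"},
    {"id": "A3", "title": "Nutrition in dairy herds"},
]

def build_request(term):
    """Encode a search request as an XML message."""
    request = ET.Element("search_request")
    ET.SubElement(request, "term").text = term
    return ET.tostring(request, encoding="unicode")

def handle_request(request_xml):
    """Legacy-side handler: read the request, return the matches as XML."""
    term = ET.fromstring(request_xml).findtext("term", default="").lower()
    response = ET.Element("search_results", term=term)
    for record in LEGACY_ABSTRACTS:
        if term in record["title"].lower():
            hit = ET.SubElement(response, "hit", id=record["id"])
            hit.text = record["title"]
    return ET.tostring(response, encoding="unicode")

reply = handle_request(build_request("Nutrition"))
hits = [hit.get("id") for hit in ET.fromstring(reply).findall("hit")]
print(hits)
```

Neither side needs to know how the other stores its data; each only has to read and write the agreed message tags.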

Ingenta built its XML system using three full-time people, Klein says: a C++ programmer; a Web developer working on Common Gateway Interface scripting, HTML and the like; and a systems administrator to maintain the server. Klein says that in addition to Java and Java Database Connectivity, Ingenta used DynaBase's scripting language, DSL. He describes it as "quick and simple," similar to Visual Basic and JavaScript in syntax. However, he says he wishes the scripting components could be handled directly by an open-standard language, such as Java or JavaScript.

Klein says XML allows Ingenta to combine published content with user-generated content, such as threads from Ingenta-hosted discussion groups. The company has a custom software utility that takes an e-mail sent to one of its discussion groups and tags the contents with XML before storing it. Often these threads, which consist of exchanges among leading scientific researchers, are an education in themselves. By tagging them with XML, Ingenta can easily access the content and combine it with journal articles for a richer information stream.
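Ingenta's utility is custom and not described in detail, but the idea of tagging an e-mail with XML can be sketched with Python's standard email and ElementTree modules. The message, the addresses and the tag names (post, author, list, subject, body) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

# Invented message; addresses use the reserved example.org domain.
RAW_EMAIL = """\
From: researcher@example.org
To: nutrition-list@example.org
Subject: Taurine levels in dry food

Has anyone compared the 1998 figures?
"""

def email_to_xml(raw_message):
    """Wrap the headers and body of a plain-text e-mail in XML tags."""
    message = message_from_string(raw_message)
    post = ET.Element("post")
    ET.SubElement(post, "author").text = message["From"]
    ET.SubElement(post, "list").text = message["To"]
    ET.SubElement(post, "subject").text = message["Subject"]
    ET.SubElement(post, "body").text = message.get_payload().strip()
    return ET.tostring(post, encoding="unicode")

tagged = email_to_xml(RAW_EMAIL)
print(tagged)
```

Once a post is tagged this way, it can be searched and combined with journal content using the same tools as any other XML document. This sketch handles only single-part, plain-text messages.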

Ingenta has had some problems, though. One major consideration was that few browsers natively render XML, so the Web pages had to be converted into HTML before displaying them for end users. DynaBase takes care of this step, Klein says. Ingenta has also defined most of its own XML tags, using standards only for rendering equations, such as in math and chemistry articles.

Unlike relational database systems, object databases like DynaBase don't have a proven history of high performance, Klein says. But Ingenta had overriding flexibility needs and was willing to serve the page a little slower in order to get the ability to combine content from multiple sources and repurpose it at will. Klein says DynaBase's caching takes care of some performance issues.

Klein says an XML-based package is so vendor-neutral and flexible that Ingenta could change content management applications, add new data sources and launch dozens of portals without making significant changes. "It makes it future-perfect for us," he says.
-- Amy Helen Johnson

JUST THE FACTS

Pros and Cons

Why XML is a good idea:

Its self-describing tags identify what your content is all about

Data is easily repurposed via tags

Creating, using and reusing tags is easy, making XML highly extensible

XML data types map easily among different applications, so it's very interoperable

It makes transferring data easy; simply give it XML tags

Why it may not be such a good idea for your application:

Majority of online browsers still see only HTML; you will need to add an XML-to-HTML translator

Performance is still slower than equivalent HTML documents

The XML-tagged document is still rare; you will likely be doing a lot of conversion of older data

Standard tag sets for different applications and industries aren't in widespread use yet

DIAGRAM: What's the Difference Between HTML and XML? As shown below, either HTML or XML can be used to create what you see in the browser window. And at first glance, HTML might look much simpler. XML's usefulness, however, comes into play when you look at the greater whole. Just think of Web search engines. If you're looking for ice cream with a certain retail price threshold and particular ingredients, an XML search engine could carry out such a request, while an HTML search engine could do only a brute text search. Because XML describes data, it can support a wide variety of uses for that data.

PHOTO (COLOR): BEFORE USING XML, Wells Fargo employees kept information in several binders. Now, Robert Bean, vice president of product development, says, "The most current version is resident in one spot."

By Amy Helen Johnson

Johnson is a freelance writer in Seattle.

