Developments in XML Schema Languages June 2004


A joint meeting of XML UK and W3C Office for UK and Ireland

Developments in XML Schema Languages
Thursday 24 June 2004
Rutherford Appleton Laboratory
Didcot, Oxon UK

Conference Chair: Michael Wilson, W3C Office for UK and Ireland

The W3C XML Schema recommendation published in May 2001 has been adopted to define the validation rules for a wide range of XML documents. In the meantime, alternative validation languages for XML have continued to be developed, notably RELAX NG, Schematron and ISO’s integration of these languages, DSDL. These alternative approaches provide a variety of levels of expressiveness that can complement W3C XML Schemas.

In this seminar, we will be discussing different approaches to defining schema languages for XML, and looking at examples of how XML schema languages are being used in practice.

Provisional Programme

09:00 – 10:00 Registration and Coffee
10:00 – 10:45 W3C XML Schema: Key features, plans and prospects
Henry Thompson
W3C & Univ. of Edinburgh
11:15 – 11:45 ISO’s Document Schema Definition Languages (DSDL)
Martin Bryan
CSW Group Ltd
11:45 – 12:15 Scruffy Validation
Alex Brown
Griffin Brown Digital Publishing Ltd
12:15 – 12:45 Is there still life in DTDs?
Francis Cave
Francis Cave Digital Publishing Ltd
12:45 – 14:00 Lunch and Technology Showcase
14:00 – 14:30 Using RDF to Derive Schema Mappings
Brian Matthews
CCLRC
14:30 – 15:00 RELAX NG in a complex project
Sebastian Rahtz
TEI, University of Oxford
15:00 – 15:30 Metadata Schema Registries
Rachel Heery
UKOLN, University of Bath
15:30 – 16:15 Tea and Technology Showcase
16:15 – 16:45 Panel discussion
Chair: Michael Wilson
W3C Office for UK and Ireland
16:45 CLOSE
16:50 – 18:00 XML UK AGM – MEMBERS ONLY

Venue

The Rutherford Appleton Laboratory of the Council for the Central Laboratory of the Research Councils (CCLRC) is situated in the Oxfordshire countryside some 6 miles south of Didcot. Access by road is straightforward via the M4 and A34, whilst Didcot Station is the closest railway station. See the CCLRC website for further details.

Registration
Registration costs £60/£110/£85/£45 (members/non-members/corporate extras/student members) and includes refreshments and buffet lunch. To register, please download the registration form (PDF download) which contains full details of the event.

Registration is required in advance: owing to security arrangements at CCLRC, we cannot accept registration on the day.

Exhibition
During the day there will be an exhibition area with company stands. If you would like to exhibit, please contact the exhibition organizer, Nigel Bray.

Presentations and Speakers

W3C XML Schema: Key features, plans and prospects
Henry Thompson, University of Edinburgh and W3C
In this talk I’ll concentrate on a number of features of W3C XML Schema which distinguish it from other approaches to document structure definition, and which are proving to be a good fit for user and developer requirements:
Type derivation: for managing language change
Annotation: for managing databinding
Rich PSVI: for Web Services versioning architectures

I’ll also discuss the W3C’s XML Schema Working Group’s plans for the next version of the XML Schema Recommendation.

Henry S. Thompson divides his time between the School of Informatics at the University of Edinburgh, where he is Reader in Artificial Intelligence and Cognitive Science, based in the Language Technology Group of the Human Communication Research Centre, and the World Wide Web Consortium (W3C), where he works in the XML Activity.

He received his Ph.D. in Linguistics from the University of California at Berkeley in 1980. His university education was divided between Linguistics and Computer Science, in which he holds an M.Sc. While still at Berkeley he was affiliated with the Natural Language Research Group at the Xerox Palo Alto Research Center, where he participated in the GUS and KRL projects. His research interests have ranged widely, including natural language parsing, speech recognition, machine translation evaluation, modelling human lexical access mechanisms, the fine structure of human-human dialogue, language resource creation and architectures for linguistic annotation. His current research is focussed on articulating and extending the architectures of XML.

He was a member of the SGML Working Group of the World Wide Web Consortium which designed XML, is the author of XED, the first free XML instance editor, and co-author of the LT XML toolkit, and is currently a member of the XSL and XML Schema Working Groups of the W3C. He is lead editor of the Structures part of the XML Schema W3C Recommendation, for which he co-wrote the first publicly available implementation, XSV. He has presented many papers and tutorials on SGML, DSSSL, XML, XSL and XML Schemas in both industrial and public settings over the last eight years.

ISO’s Document Schema Definition Languages (DSDL)
Martin Bryan, CSW Group Ltd
The multipart ISO 19757 Document Schema Definition Languages (DSDL) will provide an integrated suite of data validation techniques that will include grammar-based validation (e.g. RELAX NG), rule-based validation (e.g. Schematron), namespace-based segmentation of validation candidates (e.g. NVDL), advanced user-customizable datatyping, path-based inter-element validation, character repertoire definition and validation, declarative document architectures and extensions to DTDs to access namespaces, datatypes, etc. The suite will interact through a validation management standard that will be used to control the order in which otherwise separate validation processes are applied and integrated.
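As a rough illustration of what namespace-based segmentation means, the sketch below groups the elements of a document by namespace so that each group could in principle be passed to a different validator. It is not NVDL, which is far richer; the function names and the urn:example namespaces are assumptions made up for this example, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def namespace_of(tag):
    """Return the namespace URI of an ElementTree '{uri}local' tag, or '' if none."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def segment_by_namespace(xml_text):
    """Group element local names by namespace, in document order.

    Hypothetical helper: a real dispatcher would hand whole subtrees,
    not just names, to the validator chosen for each namespace.
    """
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for elem in root.iter():
        local_name = elem.tag.split("}", 1)[-1]
        groups[namespace_of(elem.tag)].append(local_name)
    return dict(groups)


# Invented sample mixing a document vocabulary with a metadata vocabulary.
SAMPLE = """<doc xmlns="urn:example:doc" xmlns:m="urn:example:meta">
  <m:title>Report</m:title>
  <para>Text</para>
</doc>"""

if __name__ == "__main__":
    for ns, names in segment_by_namespace(SAMPLE).items():
        print(ns, names)
```

Grouping by namespace first keeps each validator simple: a grammar for one vocabulary need not know anything about the others embedded alongside it.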

Martin Bryan, a Senior Technical Consultant at CSW Informatics, convenes the ISO working group responsible for the development of DSDL. He represents XML UK on BSI’s IST/41 panel that monitors the work of ISO/IEC JTC1/SC34. A regular contributor to Interchange and a member of ISUG, Martin has for many years promoted the use of structured document standards such as SGML, DSSSL, Topic Maps, XML and XSL throughout Europe.

Scruffy Validation
Alex Brown, Griffin Brown Digital Publishing Ltd
In the real world, many XML documents do not have structures or content that can be adequately tested by neat grammar-centric schema languages such as XML DTDs or RELAX NG schemas. Similarly, it is sometimes the case that ‘validity’ is not a cleanly boolean condition, but one that involves more nuanced judgements about a document’s content.

In this presentation Alex Brown will demonstrate how some current and up-coming validation technologies can be used to assess and report on the state of such ‘scruffy’ documents.
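One way to picture validity that is not cleanly boolean is a set of rules that each carry a severity, so that the outcome is a report rather than a pass or fail. The sketch below is only an illustration of that idea and not the technology demonstrated in the talk; the rules, element names and the assess function are all hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (severity, message, check). A check returns True
# when the document satisfies the rule.
RULES = [
    ("error", "article must have a title",
     lambda root: root.find("title") is not None),
    ("warning", "title should not be empty",
     lambda root: bool((root.findtext("title") or "").strip())),
    ("warning", "article should have at least one para",
     lambda root: root.find("para") is not None),
]


def assess(xml_text):
    """Return a list of (severity, message) findings instead of a yes/no verdict."""
    root = ET.fromstring(xml_text)
    return [(severity, message) for severity, message, check in RULES
            if not check(root)]


if __name__ == "__main__":
    for severity, message in assess("<article><title> </title></article>"):
        print(severity + ": " + message)
```

A caller can then decide how strict to be, for example by rejecting documents with any error while merely logging warnings.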

In 1997 Alex Brown was one of the founding directors of Griffin Brown Digital Publishing Ltd, a UK-based company providing XML-based services and products. He is responsible for leading the company’s XML consulting and implementation, and his work includes advising clients on XML/IT strategy and practice, mentoring clients’ staff, writing DTDs and Schemas, and designing and developing XML software systems in C++, Java and other languages. In 2002, Alex was invited to join the British Standards Institution (BSI) Technical Committee IST/41, where he contributes to ISO/IEC JTC1/SC34 in its formation of the DSDL ISO standard, among other things.

Is there still life in DTDs?
Francis Cave, Francis Cave Digital Publishing Ltd
For many XML users, especially those still primarily concerned with document applications, considerable time and money have been invested over many years in the development and maintenance of DTDs. To such users, the benefits to be gained by switching schema languages may not appear to justify the costs of new tools, new procedures and re-training that would be involved. For those who have learnt to love entities, the lack of entity support in other schema languages is reason enough to be put off such a switch. Must these pioneer adopters of XML, and SGML before that, pay yet again to take advantage of current developments in schema languages? Francis Cave will take a look at what might be done in DSDL to prolong the life of DTDs.

Francis Cave is an independent consultant with over 20 years of experience with markup technologies. He provides a range of XML and SGML consultancy, training and related services to publishers and to other businesses and organisations concerned with their use in publishing and information management and delivery. Francis is chairman of XML UK and of the Technical Committee of BSI responsible for SGML, XML and related standards, and is currently coordinating the drafting of Part 9 of DSDL, which is concerned with extension of the DTD language.

Using RDF to Derive Schema Mappings
Brian Matthews, CCLRC Rutherford Appleton Laboratory
Converting XML data between different schemas remains a problem. Different developers will typically use different structures to represent the same underlying information, which makes the communication of information difficult. In this talk, I shall consider how emerging developments in the Semantic Web, particularly RDF and OWL, might be used to provide a common semantics for different XML Schemas and give rise to a mapping between different XML structures.
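The general idea of deriving a mapping from a common semantics can be sketched without any RDF tooling: if two schemas each state which shared property an element denotes, a mapping between them follows by joining on that property. The sketch below is not the method presented in the talk; plain dictionaries stand in for RDF or OWL statements, and the schemas, the ex: property names and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations: element path in each schema -> shared property.
SCHEMA_A = {"name": "ex:title", "author/fullname": "ex:creator"}
SCHEMA_B = {"title": "ex:title", "creator": "ex:creator"}


def derive_mapping(source, target):
    """Pair source and target paths that denote the same shared property."""
    by_property = {prop: path for path, prop in target.items()}
    return {path: by_property[prop]
            for path, prop in source.items() if prop in by_property}


def convert(xml_text, mapping, root_tag):
    """Rewrite a source document into the target structure.

    Assumes target paths are single element names, to keep the sketch short.
    """
    src = ET.fromstring(xml_text)
    out = ET.Element(root_tag)
    for src_path, dst_tag in mapping.items():
        value = src.findtext(src_path)
        if value is not None:
            ET.SubElement(out, dst_tag).text = value
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    mapping = derive_mapping(SCHEMA_A, SCHEMA_B)
    source = ("<book><name>Dune</name>"
              "<author><fullname>Frank Herbert</fullname></author></book>")
    print(convert(source, mapping, "record"))
```

Because each schema is described once against the shared vocabulary, adding a third schema needs one new set of annotations rather than a hand-written converter for every pair.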

Brian Matthews currently leads a research and development team specialising in information science and engineering within CCLRC’s Rutherford Appleton Laboratory. He has more than 15 years’ experience in computer science research, with interests in formal modelling, scientific metadata, Grid-based distributed systems and security. He has been involved with the W3C since 1997, is currently Deputy Manager of the W3C Office for the UK and Ireland, and is involved in the European project Semantic Web Advanced Development in Europe. He also lectures on Web technologies at Oxford Brookes University.

RELAX NG in a complex project
Sebastian Rahtz, Oxford University Computing Services
This paper will describe the political and technical processes the Text Encoding Initiative has gone through over the last few years in converting to XML schemas. The TEI is maintained in a literate programming system which uses the RELAX NG schema language at its core, and has mechanisms to generate DTD, RELAX NG and W3C Schema as needed. Among the practical issues we will address are the design of the TEI literate programming language; providing an interface for users to design their own view of the TEI; practical tools for RELAX NG; and establishing a relationship with DocBook.

Sebastian Rahtz has been in and around computers, publishing and information for the last 20 years. He is now Information Manager for Oxford University Computing Services, concerning himself with XML-based web sites, portals, and the like. He is also currently seconded part-time to manage the JISC-funded Open Source Advisory Service. As Oxford’s representative on the Text Encoding Initiative Consortium Board of Directors and TEI Technical Council, he spends a fair amount of time working on the TEI’s Guidelines. For the last two years he has been working on changing the TEI to have RELAX NG at its base.

Metadata Schema Registries
Rachel Heery, UKOLN, University of Bath
Metadata schema registries are intended to provide services based on aggregated data from many different schemas. A metadata schema registry provides information about the terms and relationships within metadata vocabularies. The registry allows human users to discover and re-use existing schemas. There is also potential for such registries to support access to machine-processable descriptions of vocabularies, and machine-interpretable mappings between vocabularies. This presentation will review the role of metadata schema registry initiatives in the context of digital libraries. In particular it will focus on some of the challenges facing the current JISC Information Environment Metadata Schema Registry project, which is developing a pilot registry service over the next year.

Rachel Heery works for UKOLN at the University of Bath as Assistant Director leading the Research and Development team. Rachel has been involved in a number of projects in recent years exploring the role of metadata to support digital library services. She brings to this role wide experience of the implementation and development of information management systems in the commercial and library sectors. She has a particular interest in metadata schema registries and metadata application profiles, and has been involved in projects developing prototype registries (CORES, MEG). Rachel has been active in the development of the Dublin Core: she co-chairs the DCMI Registry Working Group and is a member of the Dublin Core Advisory Board.

This presentation was cancelled.