XIB - XML Inline Binary

Jeremie Miller

The Jabber.org Foundation

Cascade IA
52033
US
jeremie@jabber.org

01/15/2002

This document describes a technique for handling inline binary data inside XML documents.


Introduction

The advantages of including binary data inline in XML are left as an exercise for the reader. This technique is one way to accomplish that goal while optimizing for simplicity, compatibility, and efficiency.

XIB has two representations: a base64-encoded form that conforms to the XML specification [1], and a compiled form that differs only in that the binary blocks are included inline in the XML without being encoded or altered in any way.


Encoded Form

Since raw binary data cannot comply with the XML specification, it must be represented differently in a well-formed or valid XML document. An XML Namespace [2] is used to wrap the encoded form of the binary data. This XIB namespace defines one element, named "xib", with three optional attributes: length, id, and encoding. The length attribute gives the size of the decoded binary data in bytes. The only content allowed within the xib element is the binary data, encoded in base64 or in the alternate encoding named by the encoding attribute.

A simple example of the XIB encoded form within an XML document:

<book xmlns="http://book.org/namespace">
  <title>...</title>
  <author>...</author>
  <coverart type="tiff"><xib xmlns="http://.../xib/ns" length="24">Zm9sbG93IHRoZSBibHVlIGVsZXBoYW50</xib></coverart>
  <toc>...</toc>
  <contents>...</contents>
</book>
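The encoded form can be produced and consumed with any XML toolkit. The following Python sketch uses the standard library's xml.etree.ElementTree and base64 modules and handles only the default base64 encoding. The function names encode_xib and decode_xib are illustrative, not part of this specification, and the namespace URI is kept exactly as elided in this document; substitute the real XIB namespace before use.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional

# Kept exactly as elided in this document; substitute the real XIB namespace URI.
XIB_NS = "http://.../xib/ns"
XIB_TAG = "{%s}xib" % XIB_NS


def encode_xib(data: bytes, xib_id: Optional[str] = None) -> ET.Element:
    """Wrap raw bytes in an encoded-form xib element (base64 only)."""
    elem = ET.Element(XIB_TAG, {"length": str(len(data))})
    if xib_id is not None:
        elem.set("id", xib_id)
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def decode_xib(elem: ET.Element) -> bytes:
    """Recover raw bytes from an encoded-form xib element, checking length."""
    encoding = elem.get("encoding", "base64")
    if encoding != "base64":
        raise ValueError("unsupported encoding in this sketch: " + encoding)
    # Drop any whitespace (e.g. line breaks) before strict base64 decoding.
    text = "".join((elem.text or "").split())
    data = base64.b64decode(text, validate=True)
    length = elem.get("length")
    if length is not None and int(length) != len(data):
        raise ValueError("length attribute does not match decoded data")
    return data


# Decode the coverart payload from the book example.
coverart = ET.fromstring(
    '<coverart type="tiff"><xib xmlns="http://.../xib/ns" length="24">'
    "Zm9sbG93IHRoZSBibHVlIGVsZXBoYW50</xib></coverart>"
)
print(decode_xib(coverart.find(XIB_TAG)))  # prints b'follow the blue elephant'
```

Checking the length attribute after decoding is a cheap integrity test; a sketch supporting other values of the encoding attribute would dispatch on it instead of rejecting them.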

XML Schema for the XIB Namespace:

<schema 
  xmlns='http://www.w3.org/2001/XMLSchema' 
  xmlns:xib='http://.../xib/ns'
  targetNamespace='http://.../xib/ns' 
  elementFormDefault='qualified'>
    <element name='xib'>
        <complexType mixed='true'>
            <attribute name='length' type='unsignedShort' use='optional'/>
            <attribute name='id' type='string' use='optional'/>
            <attribute name='encoding' type='string' use='optional'/>
        </complexType>
    </element>
</schema>


Compiled Form

Compiled XIB makes no changes to any other elements, namespaces, or encodings in the XML document. In other words, the XML document is preserved as-is; the only change is in the representation of the XIB namespace. Each xib element is replaced by the Unicode [3] character U+0001 (START OF HEADING, SOH), followed by a 16-bit unsigned integer in network byte order (big-endian) giving the number of bytes in the binary frame that follows, which limits a single frame to 65,535 bytes. Immediately after the last byte of the frame, the XML document continues as-is. The id attribute is not directly represented: it is optional, and can be generated on the fly if the encoded form requires one.
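Building a compiled frame takes only three bytes of framing. The Python sketch below assumes the surrounding document is UTF-8 (or another ASCII-compatible encoding), so that U+0001 occupies the single byte 0x01; a UTF-16 document would encode it differently. The name pack_frame is hypothetical.

```python
import struct

SOH = b"\x01"        # U+0001 as a single byte, assuming a UTF-8 document
MAX_FRAME = 0xFFFF   # the largest value a 16-bit length field can hold


def pack_frame(data: bytes) -> bytes:
    """Return SOH, the big-endian 16-bit length, then the raw bytes."""
    if len(data) > MAX_FRAME:
        raise ValueError("binary frame exceeds %d bytes" % MAX_FRAME)
    # "!H" means network byte order, unsigned short.
    return SOH + struct.pack("!H", len(data)) + data


frame = pack_frame(b"follow the blue elephant")
print(frame[:3])  # b'\x01\x00\x18': SOH, then 24 as 0x0018
```

This sketch rejects payloads larger than 65,535 bytes, since their length cannot be expressed in the 16-bit field.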

An approximate example of the compiled form, with non-printable bytes written as escapes:

<auth xmlns="http://auth.org/namespace">
  <user>...</user>
  <token>...</token>
  <secret>\x01\x00\x18follow the blue elephant</secret>
</auth>

This representation provides a simple mechanism for including any binary data within the data structures that XML offers. There is no encoding cost in space or time (base64 enlarges data by roughly one third and must be decoded), and the compiled form remains easy to process with minimal additional overhead. When stored or transmitted, the compiled form should be typed and flagged accordingly so that it cannot be confused with the well-formed encoded form, for example by giving the file an alternate .xib extension or by using an appropriate alternate MIME type.
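To make the space comparison concrete, the sketch below counts the bytes needed for an n-byte payload: a compiled frame adds exactly three bytes (SOH plus the 16-bit length), while padded base64 text needs four characters for every three input bytes, before counting the xib element's own markup. The function names are illustrative.

```python
import base64


def compiled_frame_size(n: int) -> int:
    # SOH byte + 2-byte length + the raw payload itself
    return 1 + 2 + n


def base64_text_size(n: int) -> int:
    # Padded base64: 4 output characters per (possibly partial) 3-byte group.
    # The surrounding <xib ...>...</xib> markup is not counted.
    return 4 * ((n + 2) // 3)


for n in (24, 65535):
    # Cross-check the formula against the standard library encoder.
    assert base64_text_size(n) == len(base64.b64encode(bytes(n)))
    print(n, compiled_frame_size(n), base64_text_size(n))
```

For the 24-byte payload used in the examples this gives 27 bytes compiled versus 32 base64 characters; for a maximal 65,535-byte frame, 65,538 versus 87,380.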


Compiled Processing Techniques

If possible, the easiest approach is to add support directly inside an existing XML parser, but that is often impossible or undesirable. The alternative is still simple: pre-process the XML and extract the binary blocks before they are delivered to an existing parser. Some parsers expose their exact parsing state externally, and the pre-processor can use that state to ensure the integrity of the XML at the location of each binary frame. When the parsing state is not available, or the parser performs internal caching, the pre-processor should instead insert a namespace-qualified xib element in place of each binary frame. The application can then handle that xib element and use it as a reference to the extracted binary frame.
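A minimal sketch of the pre-processing approach in Python follows. It assumes a UTF-8 document, where a 0x01 byte can only be U+0001, and assumes that frames occur only in element content; a real pre-processor would track the parsing state to rule out frames inside tags or attribute values. The function name extract_frames and the id scheme xib0, xib1, and so on are hypothetical, and the namespace URI is kept as elided in this document.

```python
import struct
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Kept exactly as elided in this document; substitute the real XIB namespace URI.
XIB_NS = "http://.../xib/ns"


def extract_frames(compiled: bytes) -> Tuple[bytes, Dict[str, bytes]]:
    """Replace every compiled frame with an empty, namespace-qualified xib
    element whose id refers to the extracted bytes.

    Assumes UTF-8 input and that frames appear only in element content."""
    out = bytearray()
    frames: Dict[str, bytes] = {}
    pos = 0
    while True:
        soh = compiled.find(b"\x01", pos)
        if soh < 0:
            out += compiled[pos:]
            break
        out += compiled[pos:soh]
        if soh + 3 > len(compiled):
            raise ValueError("truncated frame header at offset %d" % soh)
        (length,) = struct.unpack_from("!H", compiled, soh + 1)
        start = soh + 3
        end = start + length
        if end > len(compiled):
            raise ValueError("truncated binary frame at offset %d" % soh)
        frame_id = "xib%d" % len(frames)  # hypothetical id scheme
        frames[frame_id] = compiled[start:end]
        out += ('<xib xmlns="%s" id="%s" length="%d"/>'
                % (XIB_NS, frame_id, length)).encode("ascii")
        # Resume after the payload, so 0x01 bytes inside it are never scanned.
        pos = end
    return bytes(out), frames


compiled = (b'<auth xmlns="http://auth.org/namespace"><user>...</user>'
            b"<secret>\x01\x00\x18follow the blue elephant</secret></auth>")
xml_bytes, frames = extract_frames(compiled)
root = ET.fromstring(xml_bytes)  # an ordinary parser now accepts the document
ref = next(root.iter("{%s}xib" % XIB_NS))
print(frames[ref.get("id")])  # prints b'follow the blue elephant'
```

Skipping each payload by its declared length, rather than searching it, is what lets the binary data contain arbitrary bytes, including 0x01 and markup characters.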

Notes

[1]

The W3C XML Recommendation http://www.w3.org/TR/REC-xml

[2]

XML Namespaces http://www.w3.org/TR/REC-xml-names/

[3]

Unicode Standard http://www.unicode.org/unicode/standard/standard.html