% -*- mode: LaTeX; fill-column: 75; -*-
%
% $Id: libxds.tex,v 1.11 2001/08/09 15:23:28 simons Exp $
%
\documentclass[a4paper,10pt,pointlessnumbers,bibtotoc]{scrartcl}
\usepackage[dvips,xdvi]{graphicx}
\usepackage{fancyvrb}
\typearea[2cm]{12}
\fussy

\begin{document}

\titlehead{Cable \& Wireless Deutschland GmbH\\Application Services\\Development Team}
\title{OSSP XDS ---\\eXtensible Data Serialization}
\author{Peter Simons $<$simons@computer.org$>$}
\date{2001-08-01}
\maketitle

\section{Introduction}

In today's networked world, computer systems of all brands and flavours
communicate with each other. Unfortunately, these systems are far from
being identical: Many systems use different internal representations for
the same thing. Look at the (hexadecimal) number \$1234 for instance: On a
big endian machine, this number will be stored in memory the way you'd
intuitively expect: \$12~\$34, where the more significant byte precedes the
less significant one. On a little endian machine, though, the number \$1234
will be stored like this: \$34~\$12 --- exactly the other way round.

As a result, you cannot just write the number \$1234 to a socket and expect
the other end to understand it correctly, because if the byte orders differ,
the reader will read a different number than the writer sent. Things will
get even more complicated when you start exchanging floating point numbers,
for which about a dozen different encodings exist!
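
The byte order problem can be made concrete with a few lines of C. The
following sketch is not part of XDS; the helper names
\textsf{put\_be16()} and \textsf{get\_be16()} are made up for this
illustration. It stores a 16-bit value with the more significant byte
first by using shifts, so the result is the same on any host:

\begin{quote}
\begin{verbatim}
#include <stdint.h>

/* Illustration only, not part of XDS: store a 16-bit value with
   the more significant byte first, independent of the host's
   byte order. */
void put_be16(unsigned char out[2], uint16_t value)
    {
    out[0] = (unsigned char)(value >> 8);    /* 0x12 for 0x1234 */
    out[1] = (unsigned char)(value & 0xff);  /* 0x34 for 0x1234 */
    }

/* Reverse operation: rebuild the value from its two bytes. */
uint16_t get_be16(const unsigned char in[2])
    {
    return (uint16_t)(((unsigned)in[0] << 8) | in[1]);
    }
\end{verbatim}
\end{quote}

A reader that uses \textsf{get\_be16()} recovers the value the writer
passed to \textsf{put\_be16()}, whatever the native byte order of either
machine is.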

\begin{figure}[tbh]
    \begin{center}
        \includegraphics[width=\textwidth]{data-exchange.eps}
        \caption{Data exchange using XDS}
        \label{data exchange}
    \end{center}
\end{figure}

Solving these problems is the domain of XDS; its purpose is to encode data
in a way that allows this data to be exchanged between computer systems of
different types. Assume you'd want to reliably transfer the value \$1234
from host A to host B. Then you would encode the value using XDS, transfer
the encoded data over the network, and decode the value again at the other
end. Every application that follows this process will read the correct
value no matter what native representation its hosting platform uses
internally.

There is a rich variety of applications for such a functionality: XDS
may be used to encode data before it is written to disk or read from the
disk, it may be used to encode data to be exchanged between processes over
the network, etc. Because of this variety, special attention has been paid
to the library design.

\paragraph{The library has been designed to be extensible.}
The functionality is split into a generic encoding and decoding framework
and a set of encoding and decoding engines. These engines can be plugged
into the framework at run-time to actually do the encoding and decoding of
data. Because of this architecture, XDS can be customized to deploy any
data format the developer sees fit. Included in the distribution are
engines for the XDR format specified in \cite{xdr} and for the XML format
specified in \cite{xml}.

\paragraph{The library is convenient to use.}
An arbitrary number of variables can be encoded or decoded with one single
function call. All memory management is done by XDS; the developer
need not bother to allocate or manage buffers for the encoded or
decoded data. For maximum performance, though, automatic buffer management
can be switched off at run-time.

\paragraph{Performance.}
Since all transferred data has to pass through XDS, the library has been
written to encode and decode with maximum performance. The generic encoding
framework adds almost no run-time overhead to the encoding process. If
non-automatic buffer management has been selected, hardly anything but the
actual encoding/decoding engines is executed.

\paragraph{Robustness.}
In order to verify that the library is working correctly, a set of
regression tests is included in the distribution. Among other things, the
test suites encode known values and compare the result with the
expected (correct) values. Running them on a new platform shows whether
XDS works correctly there.

\paragraph{Use standard formats.}
The supported XDR and XML formats are widely known and accepted, meaning
that they are interoperable with other marshaling implementations. For XDR
for instance, it would be possible to encode data with XDS and to decode it
with an entirely different XDR implementation or vice versa.

\paragraph{Portability.}
XDS has been written with portability in mind. Development took place on
FreeBSD, Linux and Solaris; other platforms were used to test the results.
It is expected that XDS will compile and function on virtually any
POSIX.1-compliant system with a moderately modern ISO-C compiler. GNU's
C~Compiler~(gcc) is known to compile the library just fine. For maximum
portability, GNU Autoconf has been used to determine the target system's
properties.

\section{Architecture of XDS}

\begin{figure}[htb]
    \begin{center}
        \includegraphics[width=\textwidth]{architecture.eps}
        \caption{Components of XDS}
        \label{XDS components}
    \end{center}
\end{figure}

The architecture of XDS is illustrated in figure~\ref{XDS components}. XDS
consists of three components: The generic encoding and decoding framework,
a set of engines to encode and decode values in a certain format, and a
run-time context, which is used to manage buffers, registered engines, etc.

In order to use the library, the first thing the developer has to do is to
create a valid XDS context by calling \textsf{xds\_init()}. The routine
requires one parameter that determines whether to operate in encoding or
decoding mode. A context can be used for encoding or decoding only; it is
not possible to use the same context for both operations. Once a valid XDS
context has been obtained, the routine \textsf{xds\_register()} can be used to
register an arbitrary number of encoding or decoding engines within the
context.

A set of engines has been included in the library. These routines will
handle any elementary datatype included in the ISO-C language such as
32-bit integers, 64-bit integers, unsigned integers (of both 32- and
64-bit), floating point numbers, strings and octet streams.

Once all required encoding/decoding engines are registered, the routines
\textsf{xds\_encode()} or \textsf{xds\_\-decode()} may be used to actually
perform the encoding or decoding process. Any data type for which an engine
has been registered can be handled by the library.

This means that it is possible for the developer to write custom engines
for any data type he desires to use and to register them in the context as
long as these engines adhere to the \textsf{xds\_engine\_t} interface defined
in \textsf{xds.h}.

In particular, it is possible to register meta engines. A meta engine is
designed to encode or decode structures, that is, data types which consist
of several elementary data types. The engine for such a structure will
simply re-use the existing engines in order to encode or decode the whole
structure. The beauty of this is that the meta engine doesn't even need to
know \emph{which} low-level engines are registered in order to use them.
Hence, a meta engine may format the whole structure in XDR, XML, or any
other format without needing to know anything about the details.
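
The idea of referring to engines by name only can be sketched with a tiny
name-to-function table. Everything in this sketch is hypothetical: the
\textsf{toy\_} names, the simplified engine signature, and the fixed-size
table are made up for illustration and say nothing about how XDS stores
its engines internally.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Illustration only: a made-up, simplified engine type. */
typedef int (*toy_engine_t)(int value);

struct toy_entry
    {
    const char*  name;
    toy_engine_t engine;
    };

#define TOY_MAX 8
static struct toy_entry toy_table[TOY_MAX];
static size_t toy_count = 0;

/* Remember an engine under the given name. */
int toy_register(const char* name, toy_engine_t engine)
    {
    if (name == NULL || engine == NULL || toy_count == TOY_MAX)
        return -1;
    toy_table[toy_count].name   = name;
    toy_table[toy_count].engine = engine;
    ++toy_count;
    return 0;
    }

/* Find an engine by name; the caller never sees which function
   was registered under that name. */
toy_engine_t toy_lookup(const char* name)
    {
    size_t i;
    for (i = 0; i < toy_count; ++i)
        if (strcmp(toy_table[i].name, name) == 0)
            return toy_table[i].engine;
    return NULL;
    }

/* Two interchangeable "formats" for the same name. */
int toy_double(int v) { return 2 * v; }
int toy_negate(int v) { return -v; }
\end{verbatim}
\end{quote}

Code that calls \textsf{toy\_lookup("int32")} works the same way whether
\textsf{toy\_double()} or \textsf{toy\_negate()} has been registered under
that name, which is the property a meta engine relies on.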

This topic is addressed in great detail in section~\ref{meta engines} of
this document, but before we come to that rather advanced topic, let us
start by studying two simple examples of how data is encoded and decoded
using XDS.

\section{Using the XDS library}

\subsection{Encoding}

The following example program will encode three variables using the XDR
engines. The result of the process will then be written to the standard
output stream, which can be redirected to a file or piped into the decoding
program described in the next section. Take a look at the source code
for a moment; we will then go on to discuss all relevant sections line by
line.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <xds.h>

static void error_exit(int rc, const char* msg, ...)
    {
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
    }

int main()
    {
    xds_t* xds;
    char*  buffer;
    size_t buffer_size;

    xds_int32_t  int32  = -42;
    xds_uint32_t uint32 = 0x12345678;
    const char*  string = "This is a test.";

    xds = xds_init(XDS_ENCODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32",  &xdr_encode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_encode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_encode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my encoding engines!\n");

    if (xds_encode(xds, "int32 uint32 string", int32, uint32, string) != XDS_OK)
        error_exit(1, "xds_encode() failed!\n");

    if (xds_getbuffer(xds, XDS_GIFT, (void**)&buffer, &buffer_size) != XDS_OK)
        error_exit(1, "getbuffer() failed.\n");

    xds_destroy(xds);

    write(STDOUT_FILENO, buffer, buffer_size);

    free(buffer);

    fprintf(stderr, "Encoded data:\n");
    fprintf(stderr, "\tint32   = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    return 0;
    }
\end{Verbatim}

\paragraph{Lines 1--5.}
The program starts by including several system headers, which define the
prototypes for some routines we use. The most interesting header in our
case is of course \textsf{xds.h} --- the header of XDS. Please note that
all declarations required to use XDS are included in that file.

\paragraph{Lines 7--14.}
The \textsf{error\_exit()} routine is not relevant for the example; we just
define it to make the rest of the source code shorter and easier to read.

\paragraph{Lines 16--53.}
This is the interesting part: The \textsf{main()} routine. This function will
create the variables to be encoded on the stack, assign values to them,
initialize the XDS library, use it to encode the values, and write the
result of the encoding process to the standard output stream. Read on for
further details.

\paragraph{Lines 26--28.}
First of all we have to obtain an XDS context for all further operations.
This is done by calling \textsf{xds\_init()}. Since we intend to \emph{encode}
data, we initialize the context in encoding mode. The only other mode of
operation would be decoding mode, but this is demonstrated in the next
section.

Most routines in XDS return a code from a small list of return codes defined
in \textsf{xds.h}, but \textsf{xds\_init()} is different: It will return a
pointer to an \textsf{xds\_t} in case of success and \textsf{NULL} in case
of failure. One reason why \textsf{xds\_init()} would fail is because it
can't allocate the memory required to initialize the context. In this case,
the system variable \textsf{errno} is set to \textsf{ENOMEM}. Another
reason why \textsf{xds\_init()} would fail is because the mode parameter is
invalid, in which case \textsf{errno} would be set to \textsf{EINVAL}. If
XDS has been compiled with assertions enabled, such an error would result
in an assertion error, terminating the program with a diagnostic message
immediately.

\paragraph{Lines 30--33.}
Once we have obtained a valid XDS context, we register the engines we need.
In this example, we'll encode a signed 32-bit integer, an unsigned 32-bit
integer, and a string. We'll be using XDR encoding in this case, so the
engines to register are \textsf{xdr\_encode\_int32()},
\textsf{xdr\_encode\_uint32()}, and \textsf{xdr\_encode\_string()}. (A
complete list of available engines can be found in \textsf{xds.h}, in
sections~\ref{xdr}~and~\ref{xml}, or in the manual pages for the library.)
Please note that we could switch the deployed encoding format simply by
using the corresponding \textsf{xml\_encode\_XXX()} engines here. We could
even mix XDR and XML encoding as we see fit, but it's hard to think of a
case where this would make sense.

As you can see in the code, the developer is free to choose a name he'd
like to register the engine under. These names may only contain
alphanumerical characters plus the hyphen (``\verb#-#'') and the underscore
(``\verb#_#''). You can choose any name you want, but it is recommended to
follow the naming scheme of the corresponding engine. Why this is
recommended will be seen in section~\ref{meta engines}.

\paragraph{Lines 35--36.}
This is the place where the actual encoding takes place. As parameters,
\textsf{xds\_encode()} requires a valid encoding context plus a format string
that describes how the following parameters are to be interpreted. While
the concept is obviously identical to \textsf{sprintf()}, the syntax is
different. The format string may contain an arbitrary number of names,
which are delimited by an arbitrary number of any character that is not a
legal character for engine names. Thus you can delimit the names by colons,
blanks, or whatever you like.
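
The delimiter rule can be illustrated with a small scanner. This is a
sketch of the rule as described above, not the parser used inside
\textsf{xds\_encode()}; the function names \textsf{is\_name\_char()} and
\textsf{count\_names()} are made up for this illustration.

\begin{quote}
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Illustration only: non-zero if c may appear in an engine name
   (alphanumerical characters, hyphen, underscore). */
int is_name_char(int c)
    {
    return isalnum((unsigned char)c) || c == '-' || c == '_';
    }

/* Count the engine names in a format string. Every run of legal
   name characters is one name; everything else is a delimiter. */
size_t count_names(const char* fmt)
    {
    size_t n = 0;
    while (*fmt != '\0')
        {
        while (*fmt != '\0' && !is_name_char(*fmt))
            ++fmt;                  /* skip delimiters */
        if (*fmt != '\0')
            ++n;                    /* found the start of a name */
        while (is_name_char(*fmt))
            ++fmt;                  /* skip the name itself */
        }
    return n;
    }
\end{verbatim}
\end{quote}

With this rule, \textsf{"int32 uint32 string"} and
\textsf{"int32:uint32,, string"} both contain three names.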

For each valid engine name in the format string, a corresponding parameter
must follow. What these parameters mean depends on the engine you're using.
The engines provided with the XDS library will expect the value to encode,
but theoretically developers are free to write encoding and decoding
engines that expect virtually any kind of information here. More about this
will be explained in section~\ref{meta engines}.

\paragraph{Lines 38--39.}
We have encoded all values we wanted to encode; now we can get the result
from the library. This happens by calling \textsf{xds\_getbuffer()}. The
routine will store the buffer's address and length at the locations we
provided as parameters. Please note that we can choose whether we want the
buffer as a ``gift'' (\textsf{XDS\_GIFT}) or as a ``loan'' (\textsf{XDS\_LOAN}).

The buffer being a ``loan'' means that the buffer is still owned by the
library; we're only allowed to peek at it. Any call to an XDS
routine may potentially modify the buffer or even change the buffer's
location. Hence the result of an \textsf{xds\_getbuffer()} call with loan
semantics is only valid until the next XDS routine is called. After
that, it is invalid.

If we choose the gift semantics, the buffer we receive will be owned by us;
the library will not touch the buffer again. This means, of course, that
we're responsible for \textsf{free()}ing the buffer when we don't need it
anymore.

\paragraph{Line 41.}
Destroy the XDS context and all data associated with it. This is possible
because we requested the buffer as ``gift''; the buffer is not associated
with XDS anymore.

\paragraph{Line 43.}
Write the buffer with the encoded data to the standard output stream.

\paragraph{Line 45.}
Now that we don't need the buffer anymore, we have to return the memory it
uses to the system. XDS won't do that for us.

\paragraph{Lines 47--50.}
Write a short report of what we have done to the standard error channel.

\bigskip
Finally, let us compile and execute the example program shown above. For
convenience, it is included in the distribution under the name
\textsf{docs/encode.c}. You can compile and execute the program as follows:

\begin{quote}
\begin{verbatim}
$ gcc -I.. encode.c -o encode -L.. -lxds
$ ./encode >output
Encoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
$ ls -l output
-rw-r--r--  1 simons  simons  28 Aug  2 15:21 output
\end{verbatim}
\end{quote}

The result of executing the program, the file \textsf{output}, can be
displayed with \textsf{hexdump(1)} or \textsf{od(1)} and should look like this:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ hexdump -C output
00000000  ff ff ff d6 12 34 56 78  00 00 00 0f 54 68 69 73  |.....4Vx....This|
00000010  20 69 73 20 61 20 74 65  73 74 2e 00              | is a test..|
0000001c
\end{Verbatim}
\end{quote}

\noindent
We will also re-use this file in the next section, where we'll read it and
decode those values again.
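
The size of the file can be checked by hand. XDR stores a 32-bit integer
as four bytes with the most significant byte first, and a string as a
four-byte length followed by the characters, padded with zero bytes to a
multiple of four. The following sketch reproduces that arithmetic; the
helper names are made up for this illustration and are not part of XDS.

\begin{quote}
\begin{verbatim}
#include <stddef.h>

/* Illustration only: number of bytes an XDR string of the given
   length occupies, that is, a 4-byte length word plus the
   characters rounded up to the next multiple of four. */
size_t padded_string_bytes(size_t len)
    {
    return 4 + ((len + 3) / 4) * 4;
    }

/* Size of the example output: one int32, one uint32, and the
   15 character string "This is a test.". */
size_t example_output_bytes(void)
    {
    return 4 + 4 + padded_string_bytes(15);
    }
\end{verbatim}
\end{quote}

For the 15 character string this yields $4 + 16 = 20$ bytes, and 28 bytes
in total, which matches the file size reported by \textsf{ls} above. The
single zero byte at the end of the dump is the padding.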


\subsection{Decoding}

The following example program will read the result of the encoding example
shown in the previous section and decode the values back into the native
representation. Then it will print those values to the standard error
stream so that the user can see the values are correct. Please take a look
at the source now; we'll discuss all relevant details in the following
paragraphs.

\begin{Verbatim}[numbers=left,fontsize=\small,frame=lines]
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <xds.h>

static void error_exit(int rc, const char* msg, ...)
    {
    va_list args;
    va_start(args, msg);
    vfprintf(stderr, msg, args);
    va_end(args);
    exit(rc);
    }

int main()
    {
    xds_t* xds;
    char   buffer[1024];
    size_t buffer_len;
    int rc;

    xds_int32_t  int32;
    xds_uint32_t uint32;
    char*        string;

    buffer_len = 0;
    do
        {
        rc = read(STDIN_FILENO, buffer + buffer_len, sizeof(buffer) - buffer_len);
        if (rc < 0)
            error_exit(1, "read() failed: %s\n", strerror(errno));
        else if (rc > 0)
            buffer_len += rc;
        }
    while (rc > 0 && buffer_len < sizeof(buffer));

    if (buffer_len >= sizeof(buffer))
        error_exit(1, "Too much input data for our buffer.\n");

    xds = xds_init(XDS_DECODE);
    if (xds == NULL)
        error_exit(1, "Failed to initialize XDS context: %s\n", strerror(errno));

    if (xds_register(xds, "int32",  &xdr_decode_int32, NULL) != XDS_OK ||
        xds_register(xds, "uint32", &xdr_decode_uint32, NULL) != XDS_OK ||
        xds_register(xds, "string", &xdr_decode_string, NULL) != XDS_OK)
        error_exit(1, "Failed to register my decoding engines!\n");

    if (xds_setbuffer(xds, XDS_LOAN, buffer, buffer_len) != XDS_OK)
        error_exit(1, "setbuffer() failed.\n");

    if (xds_decode(xds, "int32 uint32 string", &int32, &uint32, &string) != XDS_OK)
        error_exit(1, "xds_decode() failed!\n");

    xds_destroy(xds);

    fprintf(stderr, "Decoded data:\n");
    fprintf(stderr, "\tint32   = %d\n", int32);
    fprintf(stderr, "\tuint32 = 0x%x\n", uint32);
    fprintf(stderr, "\tstring = \"%s\"\n", string);

    free(string);

    return 0;
    }
\end{Verbatim}

\paragraph{Lines 1--25.}
Include the required header files, define the \textsf{error\_exit()} helper
function, and create the required variables on the stack.

\paragraph{Lines 27--39.}
These instructions will read an unspecified number of bytes from the
standard input stream --- as long as the input does not exceed the size of
the \textsf{buffer} variable. In order to provide the program with the
appropriate input, redirect the standard input stream to the file
\textsf{output} created in the previous section or connect the encoding and
decoding programs directly by a pipe.

\paragraph{Lines 41--43.}
Create a context for decoding the values. The semantics are identical to
those described in the previous section.

\paragraph{Lines 45--48.}
Register the decoding engines in the context. Please note that obviously
the decoding engines must correspond to the encoding engines used to create
the data we're about to process. Using, say, an XML engine to decode XDR
data will at best return with an error --- in the worst case, it will
return incorrect results!

\paragraph{Lines 50--51.}
Here we do not get a buffer from the library, we \emph{set} the buffer
we've read earlier in the context for decoding. Please note that we use
loan semantics in this case, not gift semantics. This is necessary because
\textsf{buffer} has not been allocated by \textsf{malloc()} --- the variable
lives on the stack. This means that we cannot give it to XDS because
XDS expects to be able to \textsf{free()} the buffer when the context is
destroyed.

Loan semantics are fine, though; all we have to do is to take care that we
don't erase or modify the contents of \textsf{buffer} while XDS operates on
it. The library itself will never touch the buffer in decode mode, no
matter whether loan or gift semantics have been chosen.

\paragraph{Lines 53--54.}
Here comes the actual decoding of the buffer's contents using
\textsf{xds\_decode()}. The syntax is identical to \textsf{xds\_encode()}'s; the
only difference is that the decoding engines do not expect the values, as
the encoding engines did, but the locations where to store the values.
Thus we pass the addresses of the appropriate variables here. If the routine
returns with \textsf{XDS\_OK}, the decoded values will have been stored in
those locations.

It should be noted that the decoded string cannot trivially be returned
this way. Instead, \textsf{xds\_decode()} will use \textsf{malloc()} to allocate
a buffer just large enough to hold the string. The address of that buffer
is then stored in the pointer \textsf{string}. Of course this means that the
application has to \textsf{free()} the string once it's not required anymore.

\paragraph{Line 56.}
We don't need the context anymore, so we destroy it and free all used
resources. This does not affect \textsf{buffer} in any way because we used
loan semantics.

\paragraph{Lines 58--61.}
Print the decoded values to the standard error stream for the user to take
a look at them.

\paragraph{Line 63.}
Now that we don't need the contents of \textsf{string} anymore, we must return
the buffer allocated in \textsf{xds\_decode()} to the system.

\bigskip
Like the encoding program described earlier, the source code to this
program is included in the library distribution as \textsf{docs/decode.c}.
You can compile and execute the program like this:

\begin{quote}
\begin{verbatim}
$ gcc -I.. decode.c -o decode -L.. -lxds
$ ./decode <output
Decoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{verbatim}
\end{quote}

Of course we assume that the \textsf{output} file has been created as
described in the previous section, otherwise you cannot trivially use the
example program. Alternatively, you could execute both programs like this:

\begin{quote}
\begin{Verbatim}[fontsize=\small]
$ ./encode | ./decode
Encoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
Decoded data:
        int32   = -42
        uint32 = 0x12345678
        string = "This is a test."
\end{Verbatim}
\end{quote}

\noindent
This will encode and decode the values without the need for a temporary
file.

\section{Extending the XDS library}
\label{meta engines}

Now that we know how primitive data types can be encoded and decoded, let's
write a ``meta engine'' that will handle complex data structures. For the
example, we'll use the structure ``mystruct'', which is defined as follows:

\begin{quote}
\begin{verbatim}
struct mystruct
    {
    xds_int32_t   small;
    xds_int64_t   big;
    xds_uint32_t  positive;
    char          text[16];
    };
\end{verbatim}
\end{quote}

Some readers might wonder why the structure is defined using these weird
data types rather than the familiar ones like \textsf{int}, \textsf{long},
etc. The reason is that the size of the familiar types is not fixed by the
language. A \textsf{long} variable will have, say, 32 bits when compiled on
the average 32-bit Unix machine, but when the same source is compiled on a
64-bit system, like Tru64 UNIX, it will have a size of 64 bits. That is a
problem when those structures have to be exchanged between entirely
different systems, because the structures are binary incompatible, which
is something even XDS cannot remedy.
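
The difference between fixed and platform-dependent widths can be checked
with a few lines of C. The sketch below assumes a compiler that provides
the exact-width types of \textsf{<stdint.h>}; how \textsf{xds.h} defines
its own \textsf{xds\_int32\_t} and friends is not shown here, and the
names \textsf{BITS\_OF} and \textsf{widths\_are\_fixed()} are made up for
this illustration.

\begin{quote}
\begin{verbatim}
#include <limits.h>
#include <stdint.h>

/* Illustration only: width in bits of the given type. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* The exact-width types have the same width on every platform
   that provides them. For the plain integer types, the result
   of BITS_OF() depends on the platform. */
int widths_are_fixed(void)
    {
    return BITS_OF(int32_t) == 32 && BITS_OF(int64_t) == 64;
    }
\end{verbatim}
\end{quote}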

Anyway, in order to encode an instance of this structure, we write an
encoding engine:

\begin{quote}
\begin{verbatim}
static int encode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
    {
    struct mystruct* ms;
    ms = va_arg(*args, struct mystruct*);
    return xds_encode(xds, "int32 int64 uint32 octetstream",
                      ms->small, ms->big, ms->positive,
                      ms->text, sizeof(ms->text));
    }
\end{verbatim}
\end{quote}

This engine does nothing but take the address of the ``mystruct'' instance
from the stack and then use xds\_encode() to handle all elements of
``mystruct'' separately, which is fine, because these data types are
supported by XDS already. It is worth noting, though, that we refer to
the other engines by name, meaning that these engines must be registered in
``xds'' by that name!

What is very nice, though, is the fact that this encoding engine does not
even need to know which engines are used to encode the actual
values! If the user registers the XDR engines under the appropriate names,
``mystruct'' will be encoded in XDR. If the user registers the XML engines
under the appropriate names, ``mystruct'' will be encoded in XML. Because of
that property we call such an engine a ``meta engine''.

Of course you need not necessarily implement an engine like that: Rather
than going through xds\_encode(), it would be possible to execute the
appropriate encoding engines directly. This would have the advantage of not
depending on those engines being registered at all, but it would make the
meta engine depend on the elementary engines unnecessarily.

One more word about the engine syntax and semantics: As has been mentioned
earlier, any function that adheres to the interface shown above is
potentially an engine. The parameters have the following meaning:

\begin{itemize}

\item xds --- This is the XDS context that was originally provided to the
xds\_encode() call, which in turn executed the engine. It may be used, for
example, for executing xds\_en\-code() again like we did in the example
engine shown before.

\item engine\_context --- The engine context can be used by the engine to
store any type of internal information. The value the engine will receive
must have been provided when the engine was registered by xds\_register().
Engines may obviously neglect this parameter if they don't need a context
of their own --- all engines included in this distribution do so.

\item buffer --- This parameter points to the buffer in which the encoded
data should be written. In decoding mode, ``buffer'' points to the encoded
data, which should be decoded; the location where the results should be
stored at can be found on the stack then.

\item buffer\_size --- The number of bytes that are available in
``buffer''. In encoding mode, this means ``free space'', in decoding mode,
``buffer\_size'' determines how many bytes of encoded data are available in
``buffer'' for consumption.

\item used\_buffer\_size --- This parameter points to a variable, which the
callback must set before returning in order to let the framework know how
many bytes it actually used in ``buffer''. A callback encoding, say, an
int32 number into an 8-byte text representation would set the
used\_buffer\_size to 8, for instance:
\begin{quote}
\begin{verbatim}
*used_buffer_size = 8;
\end{verbatim}
\end{quote}
In encoding mode, this variable determines how many bytes the engine has
written into ``buffer''; in decoding mode the variable determines how many
bytes the engine has read from ``buffer''.

\item args --- This pointer points to an initialized variadic argument
list. Use the standard C macro va\_arg() to fetch the actual data.

\end{itemize}

A callback may return any of the following return codes as defined in
\textsf{xds.h}:

\begin{itemize}
\item XDS\_OK --- No error.

\item XDS\_ERR\_NO\_MEM --- Failed to allocate required memory.

\item XDS\_ERR\_OVERFLOW --- The buffer is too small to hold all encoded
data. The callback may set ``*used\_buffer\_size'' to the number of bytes
it needs in ``buffer'', thereby giving the framework a hint by how many
bytes it should enlarge the buffer before trying the engine again, but just
leaving ``*used\_buffer\_size'' alone will work fine, too, it may just be a
bit less efficient in some cases. Obviously this return code does not make
much sense in decoding mode.

\item XDS\_ERR\_INVALID\_ARG --- Unexpected parameters.

\item XDS\_ERR\_TYPE\_MISMATCH --- This return code should be returned in
decoding mode in case the decoding engine realizes that the data it is
decoding does not fit what it's expecting. Not all encoding formats
allow this to be detected at all. XDR, for example, does not.

\item XDS\_ERR\_UNDERFLOW --- In decode mode, this error should be returned
when an engine needs, say, 4 bytes of data in order to decode a value but
``buffer''/``buffer\_size'' provides less.

\item XDS\_ERR\_UNKNOWN --- Any other reason to fail than those listed
before.

\end{itemize}
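
To see how the parameters and return codes play together, consider the
following toy engine. It is written so that it compiles without
\textsf{xds.h}: the type \textsf{toy\_xds\_t} and the codes
\textsf{TOY\_OK} and \textsf{TOY\_ERR\_OVERFLOW} are stand-ins with
made-up values, and \textsf{toy\_call()} is a hypothetical helper that
builds the argument list the way a framework might. Only the shape of the
callback follows the interface described above.

\begin{quote}
\begin{verbatim}
#include <stdarg.h>
#include <stddef.h>

/* Stand-ins so that the sketch compiles without xds.h. The real
   definitions live in xds.h; the values below are made up. */
typedef struct toy_xds toy_xds_t;
enum { TOY_OK = 0, TOY_ERR_OVERFLOW = 1 };

/* Encode one unsigned 32-bit value as four bytes, most
   significant byte first. */
int toy_encode_uint32(toy_xds_t* xds, void* engine_context,
                      void* buffer, size_t buffer_size,
                      size_t* used_buffer_size, va_list* args)
    {
    unsigned char* out = buffer;
    unsigned long  value;

    (void)xds;                      /* not needed by this engine */
    (void)engine_context;

    *used_buffer_size = 4;          /* bytes needed, or bytes used */
    if (buffer_size < 4)
        return TOY_ERR_OVERFLOW;    /* hint: we need four bytes */

    value  = va_arg(*args, unsigned long);
    out[0] = (unsigned char)((value >> 24) & 0xff);
    out[1] = (unsigned char)((value >> 16) & 0xff);
    out[2] = (unsigned char)((value >>  8) & 0xff);
    out[3] = (unsigned char)(value & 0xff);
    return TOY_OK;
    }

/* Hypothetical helper for trying the engine by hand. The value
   to encode must be passed as an unsigned long. */
int toy_call(void* buffer, size_t buffer_size, size_t* used, ...)
    {
    va_list args;
    int rc;
    va_start(args, used);
    rc = toy_encode_uint32(NULL, NULL, buffer, buffer_size,
                           used, &args);
    va_end(args);
    return rc;
    }
\end{verbatim}
\end{quote}

Note that the engine sets \textsf{*used\_buffer\_size} in both cases: on
success it reports the bytes written, and on overflow it reports the
bytes it would need.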

Let's take a look at the decoding counterpart of encode\_mystruct() now:

\begin{quote}
\begin{verbatim}
static int decode_mystruct(xds_t* xds, void* engine_context,
                           void* buffer, size_t buffer_size,
                           size_t* used_buffer_size,
                           va_list* args)
    {
    struct mystruct* ms;
    size_t i;
    char*  tmp;
    int rc;
    ms = va_arg(*args, struct mystruct*);
    rc = xds_decode(xds, "int32 int64 uint32 octetstream",
                    &(ms->small), &(ms->big), &(ms->positive),
                    &tmp, &i);
    if (rc == XDS_OK)
        {
        assert(i == sizeof(ms->text));
        memmove(ms->text, tmp, i);
        free(tmp);
        }
    return rc;
    }
\end{verbatim}
\end{quote}

The engine simply calls xds\_decode() to handle the separate data types.
The only complication is that the octet stream decoding engines return a
pointer to a \textsf{malloc()}ed buffer, which is not what we need. Thus we
have to manually copy the contents of that buffer into the right place in
the structure and free the (now unused) buffer again.
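
The engine above relies on \textsf{assert()} to guard the copy, and
assertions disappear when the program is compiled with \textsf{NDEBUG}
defined. A variation that checks the length at run-time could use a
helper like the following. This is only a sketch; \textsf{copy\_fixed()}
is a made-up name and is not part of XDS or of the example program in the
distribution.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Illustration only: copy a decoded octet stream into a
   fixed-size field. Returns 0 on success and -1 if the sizes
   differ, in which case the destination is left untouched. */
int copy_fixed(void* dst, size_t dst_size,
               const void* src, size_t src_size)
    {
    if (src_size != dst_size)
        return -1;
    memcpy(dst, src, dst_size);
    return 0;
    }
\end{verbatim}
\end{quote}

A decoding engine could map the failure case to
\textsf{XDS\_ERR\_TYPE\_MISMATCH} after freeing the temporary buffer.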

A complete example program encoding and decoding ``mystruct'' can be found
at \textsf{docs/\-extended.c} in the distribution.

\section{The XDS Framework}
\label{xds}

\subsection{xds\_t* xds\_init(xds\_mode\_t~\underline{mode});}

This routine creates and initializes a context for use with the XDS
library. The ``mode'' parameter may be either \textsf{XDS\_ENCODE} or
\textsf{XDS\_DECODE}, depending on whether you want to encode or to decode
data. If successful, xds\_init() returns a pointer to the XDS context
structure. In case of failure, though, xds\_init() will return
\textsf{NULL} and set \textsf{errno} to ENOMEM (failed to allocate internal
memory buffers) or EINVAL (``mode'' parameter was invalid).

A context obtained from xds\_init() should be destroyed by calling
xds\_destroy() when it is not needed any more.

\subsection{void xds\_destroy(xds\_t*~\underline{xds});}

xds\_destroy() will destroy an XDS context created by xds\_init(). Doing so
will return all resources associated with this context --- most notably the
memory used to buffer the results of encoding or decoding any values. A
context may not be used anymore after it has been destroyed.

\subsection{int xds\_register(xds\_t*~\underline{xds}, const~char*~\underline{name}, xds\_engine\_t~\underline{engine}, void*~\underline{engine\_context});}

This routine will register an engine in the provided XDS context. An
``engine'' is potentially any function that fulfills the following
interface:

\begin{quote}
\begin{verbatim}
int engine(xds_t* xds, void* engine_context,
           void* buffer, size_t buffer_size, size_t* used_buffer_size,
           va_list* args);
\end{verbatim}
\end{quote}

By calling xds\_register(), the engine ``engine'' will be registered under
the name ``name'' in the XDS context ``xds''. The last parameter
``engine\_context'' may be specified as the user sees fit: It will be
passed when the engine is actually called and may be used to implement an
engine-specific context. Most engines will not need a context of their own,
in which case \textsf{NULL} should be used here.
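
As a rough illustration of this interface, the following sketch shows what a
simple encoding engine might look like. It is not taken from the library:
The names my\_encode\_uint32() and call\_encode\_engine(), the local
stand-ins for the declarations from \textsf{xds.h}, the numeric values of
the return codes, and the way a too-small buffer is reported are all
assumptions made to keep the example self-contained.

\begin{quote}
\begin{verbatim}
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for declarations that normally come from xds.h. The
 * numeric values are assumptions made for this sketch only. */
typedef struct xds_context xds_t;
enum { XDS_OK = 0, XDS_ERR_OVERFLOW = -2, XDS_ERR_INVALID_ARG = -5 };

/* Hypothetical engine: store one unsigned 32-bit value in big
 * endian byte order, whatever the byte order of the host is. */
int my_encode_uint32(xds_t* xds, void* engine_context,
                     void* buffer, size_t buffer_size,
                     size_t* used_buffer_size, va_list* args)
{
    unsigned char* out = buffer;
    uint32_t value;

    (void)xds;             /* unused by this engine */
    (void)engine_context;  /* unused by this engine */

    if (buffer == NULL || used_buffer_size == NULL || args == NULL)
        return XDS_ERR_INVALID_ARG;

    /* Report the required space, then check that it is there. */
    *used_buffer_size = 4;
    if (buffer_size < 4)
        return XDS_ERR_OVERFLOW;

    /* Fetch the value from the argument list and write it out
     * with the most significant byte first. */
    value = va_arg(*args, uint32_t);
    out[0] = (unsigned char)((value >> 24) & 0xff);
    out[1] = (unsigned char)((value >> 16) & 0xff);
    out[2] = (unsigned char)((value >> 8) & 0xff);
    out[3] = (unsigned char)(value & 0xff);
    return XDS_OK;
}

/* Hypothetical helper: hand variadic arguments to the engine in
 * the same way a variadic front end would. */
int call_encode_engine(void* buffer, size_t buffer_size,
                       size_t* used, ...)
{
    va_list args;
    int rc;

    va_start(args, used);
    rc = my_encode_uint32(NULL, NULL, buffer, buffer_size,
                          used, &args);
    va_end(args);
    return rc;
}
\end{verbatim}
\end{quote}

Called with a four-byte buffer and the value \$12345678, this sketch stores
the bytes \$12~\$34~\$56~\$78 and reports four used bytes, no matter
whether the host is a big or a little endian machine.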

Please note that until the user calls xds\_register() for an XDS context he
obtained from xds\_init(), no engines are registered for that context. Even
the engines included in the library distribution are not registered
automatically.

For engine names, any combination of the characters ``a--z'', ``A--Z'',
``0--9'', ``-'', and ``\_'' may be used; anything else is not a legal
engine name component.
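
The rule above can be checked with a few lines of C. The following sketch
is merely an illustration and not the library's own code: The function name
is\_legal\_engine\_name() is hypothetical, and treating an empty name as
illegal is an assumption.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

/* Return 1 if "name" is non-empty and consists only of the
 * characters a-z, A-Z, 0-9, '-' and '_'; return 0 otherwise. */
int is_legal_engine_name(const char* name)
{
    static const char legal[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-_";

    if (name == NULL || *name == '\0')
        return 0;

    /* strspn() yields the length of the leading run of legal
     * characters; the name is legal if that run covers it all. */
    return name[strspn(name, legal)] == '\0';
}
\end{verbatim}
\end{quote}

With this sketch, ``int32'' and ``my-struct\_2'' are accepted, whereas
``int 32'' and ``int32:'' are rejected.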

xds\_register() may return the following return codes: \textsf{XDS\_OK}
        (everything went fine; the engine is registered now),
\textsf{XDS\_ERR\_INVALID\_ARG} (``xds'', ``name'', or ``engine''
        is \textsf{NULL}, or ``name'' contains illegal characters for an engine
        name), or
\textsf{XDS\_ERR\_NO\_MEM} (failed to allocate internally required
        buffers).

\subsection{int xds\_unregister(xds\_t*~\underline{xds}, const~char*~\underline{name});}

xds\_unregister() will remove the engine ``name'' from XDS context ``xds''.
The function will return \textsf{XDS\_OK} in case everything went fine,
\textsf{XDS\_ERR\_UNKNOWN\_ENGINE} in case the engine ``name'' is not
registered in ``xds'', or \textsf{XDS\_ERR\_INVALID\_ARG} if ``xds'' or
``name'' is \textsf{NULL} or ``name'' contains illegal characters for
an engine name.

\subsection{int xds\_setbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void*~\underline{buffer}, size\_t~\underline{buffer\_len});}

\begin{figure}[tbh]
    \begin{center}
        \includegraphics[width=\textwidth]{setbuffer-logic.eps}
        \caption{xds\_setbuffer() modes of operation}
        \label{setbuffer logic}
    \end{center}
\end{figure}

This routine allows the user to control XDS' buffer handling: Calling it
will replace the buffer currently used in ``xds''. The address and size of
that buffer are passed to xds\_setbuffer() via the ``buffer'' and
``buffer\_len'' parameters. The ``xds'' parameter determines for which XDS
context the new buffer will be set. Furthermore, you can set ``flag'' to
either \textsf{XDS\_GIFT} or \textsf{XDS\_LOAN}.

A setting of \textsf{XDS\_GIFT} will tell XDS that the provided buffer
is now owned by the library and that it may be resized by calling
\textsf{realloc(3)}. Furthermore, the buffer is \textsf{free(3)}ed when
``xds'' is destroyed. If ``flag'' is \textsf{XDS\_GIFT} and ``buffer'' is
\textsf{NULL}, xds\_setbuffer() will simply allocate a buffer of its own
to be set in ``xds''. Please note that a buffer given to XDS as a gift
\emph{must} have been allocated using \textsf{malloc(3)}; it may not
live on the stack because XDS will try to free or to resize the buffer
as it sees fit.

Passing \textsf{XDS\_LOAN} via ``flag'' tells xds\_setbuffer() that the
buffer is owned by the application and that XDS must neither free nor
resize the buffer under any circumstances. In this mode, passing
\textsf{NULL} as the buffer will result in an invalid-argument error.

\subsection{int xds\_getbuffer(xds\_t*~\underline{xds}, xds\_scope\_t~\underline{flag}, void**~\underline{buffer}, size\_t*~\underline{buffer\_len});}

This routine is the counterpart to xds\_setbuffer(): It will get the buffer
currently used in the XDS context ``xds''. The address of that buffer is
stored in the location ``buffer'' points to; the length of the buffer's
content will be stored in the location ``buffer\_len'' points to.

The ``flag'' argument may be set to either \textsf{XDS\_GIFT} or
\textsf{XDS\_LOAN}. The first setting means that the buffer is now owned by
the application and that XDS must not use it after this xds\_getbuffer()
call anymore; the library will instead allocate a new buffer for itself. Of
course this also means that the buffer will not be freed in xds\_destroy():
The application has to \textsf{free(3)} the buffer itself when it is not
needed anymore.

Setting ``flag'' to \textsf{XDS\_LOAN} tells XDS that the application
just wishes to peek into the buffer and will not modify it. The buffer is
still owned (and used) by XDS. Please note that the loaned address
returned by xds\_getbuffer() may become invalid after any other
xds\_xxx() function call! If you need a reliable address, use
\textsf{XDS\_GIFT} mode.

The routine will return \textsf{XDS\_OK} (everything went fine) or
\textsf{XDS\_ERR\_INVALID\_ARG} (``xds'', ``buffer'' or ``buffer\_len'' are
\textsf{NULL} or ``flag'' is invalid) signifying success or failure
respectively.

Please note: It is perfectly legal for xds\_getbuffer() to return a buffer
of \textsf{NULL} and a buffer length of 0! This happens when
xds\_getbuffer() is called for an XDS context before a buffer has been
allocated.

\subsection{int xds\_vencode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});}

This routine will encode one or several values using the appropriate
encoding engines registered in XDS context ``xds''. The parameter ``fmt''
contains a \textsf{sprintf(3)}-like description of the values to be
encoded; the actual values are provided in the variadic parameter ``args''.

The format for ``fmt'' is simple: Just provide the names of the engines to
be used to encode the corresponding values in ``args''. Any character that
is not legal in an engine name may be used as a delimiter. In order to
encode two
32-bit integers followed by a 64-bit integer, the format string
\begin{quote}
\begin{verbatim}
int32 int32 int64
\end{verbatim}
\end{quote}
could be used. In case you don't like the blank, use the colon instead:
\begin{quote}
\begin{verbatim*}
int32:int32:int64
\end{verbatim*}
\end{quote}

Of course the names to be used here have to correspond to the names used to
register the engines in ``xds'' earlier.
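
To make the delimiter rule concrete, the following sketch splits a format
string into engine names. It only illustrates the rule described above and
is not the parser used by the library; the function names
next\_engine\_name() and count\_engine\_names() as well as the limit of 63
characters per name are assumptions.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <string.h>

static const char legal_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_";

/* Copy the next engine name found in *fmt into "name" and advance
 * *fmt past it. Returns 1 if a name was found, 0 at the end of
 * the format string, and -1 if the name does not fit. */
int next_engine_name(const char** fmt, char* name, size_t name_size)
{
    const char* p = *fmt;
    size_t len;

    p += strcspn(p, legal_chars);   /* skip any delimiters     */
    len = strspn(p, legal_chars);   /* length of the next name */
    if (len == 0) {
        *fmt = p;
        return 0;
    }
    if (len >= name_size)
        return -1;
    memcpy(name, p, len);
    name[len] = '\0';
    *fmt = p + len;
    return 1;
}

/* Count the engine names in "fmt"; returns -1 if one is too long. */
int count_engine_names(const char* fmt)
{
    char name[64];
    int count = 0;
    int rc;

    while ((rc = next_engine_name(&fmt, name, sizeof name)) == 1)
        count++;
    return (rc < 0) ? -1 : count;
}
\end{verbatim}
\end{quote}

Both format strings shown above yield the same three names with this
sketch, because the blank and the colon are equally illegal in an engine
name.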

Every time xds\_vencode() is called, it will append the encoded data at the
end of the internal buffer stored in ``xds''. Thus, you can call
xds\_vencode() several times in order to encode several values, but you'll
still get all encoded values stored in one buffer. Calling xds\_setbuffer()
or xds\_getbuffer() at any point during the encoding will reset the buffer
to the beginning. All values that have been encoded into that buffer
already will eventually be overwritten when xds\_encode() is called again.
Hence: Don't call xds\_setbuffer() or xds\_getbuffer() unless you actually
want to access the data stored in the buffer.

Also it should be noted that the data you have to provide for ``args''
depends entirely on what the deployed engines expect to find on the stack
--- there is no ``standard'' on what should be put on the stack here. The
XML and XDR engines included in the distribution simply expect to find the
value to be encoded on the stack, but other engines may act
differently. See section~\ref{meta engines} for an example of such an
engine.

xds\_vencode() will return any of the following return codes:
\textsf{XDS\_OK} (everything worked fine), \textsf{XDS\_ERR\_NO\_MEM}
(failed to allocate or to resize the internal buffer),
\textsf{XDS\_ERR\_OVER\-FLOW} (the internal buffer is too small but is not
owned by us), \textsf{XDS\_ERR\_INVALID\_ARG} (``xds'' or ``fmt'' are
\textsf{NULL}), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name
specified in ``fmt'' is not registered in ``xds''),
\textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' is initialized in decode mode),
or \textsf{XDS\_ERR\_UNKNOWN} (the engine returned an unspecified error).

\subsection{int xds\_encode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});}

This routine is identical to xds\_vencode(), except that it takes the
values to be encoded as a variable argument list instead of a
\textsf{va\_list}.

\subsection{int xds\_vdecode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, va\_list~\underline{args});}

This routine is almost identical to xds\_vencode(): It expects an XDS
context, a format string and a set of parameters for the engines, but
xds\_vdecode() does not encode any data, it decodes the data back into the
native format. The format string again determines which engines are to be
called by the framework in order to decode the values contained in the
buffer. The native values will then be stored at the locations found in the
corresponding ``args'' entry. But please note that the exact behavior of
the decoding engines is not specified! The XML and XDR engines included in
this distribution expect a pointer to a location where to store the decoded
value, but other engines may vary.
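
A decoding engine following the pointer convention described above might
look roughly like the following sketch. Again, this is not the library's
code: The names my\_decode\_uint32() and call\_decode\_engine(), the
stand-ins for \textsf{xds.h}, the numeric values of the return codes, and
the way missing data is reported are assumptions.

\begin{quote}
\begin{verbatim}
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for declarations that normally come from xds.h. The
 * numeric values are assumptions made for this sketch only. */
typedef struct xds_context xds_t;
enum { XDS_OK = 0, XDS_ERR_UNDERFLOW = -3, XDS_ERR_INVALID_ARG = -5 };

/* Hypothetical engine: read four bytes in big endian order and
 * store the value at the location found in the argument list. */
int my_decode_uint32(xds_t* xds, void* engine_context,
                     void* buffer, size_t buffer_size,
                     size_t* used_buffer_size, va_list* args)
{
    const unsigned char* in = buffer;
    uint32_t* value;

    (void)xds;             /* unused by this engine */
    (void)engine_context;  /* unused by this engine */

    if (buffer == NULL || used_buffer_size == NULL || args == NULL)
        return XDS_ERR_INVALID_ARG;

    /* Report the number of bytes needed, then check for them. */
    *used_buffer_size = 4;
    if (buffer_size < 4)
        return XDS_ERR_UNDERFLOW;

    value = va_arg(*args, uint32_t*);
    if (value == NULL)
        return XDS_ERR_INVALID_ARG;

    *value = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
           | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    return XDS_OK;
}

/* Hypothetical helper: hand variadic arguments to the engine. */
int call_decode_engine(void* buffer, size_t buffer_size,
                       size_t* used, ...)
{
    va_list args;
    int rc;

    va_start(args, used);
    rc = my_decode_uint32(NULL, NULL, buffer, buffer_size,
                          used, &args);
    va_end(args);
    return rc;
}
\end{verbatim}
\end{quote}

Given a buffer holding the bytes \$12~\$34~\$56~\$78 and a pointer to an
unsigned 32-bit variable, the sketch stores the value \$12345678 in that
variable.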

xds\_vdecode() may return any of the following return codes:
\textsf{XDS\_OK} (everything went fine), \textsf{XDS\_ERR\_INVALID\_ARG}
(``xds'' or ``fmt'' are \textsf{NULL}), \textsf{XDS\_ERR\_TYPE\_MISMATCH}
(the format string says the next value is of type $A$, but that's not what
we found in the buffer), \textsf{XDS\_ERR\_UNKNOWN\_ENGINE} (an engine name
specified in ``fmt'' is not registered in ``xds''),
\textsf{XDS\_ERR\_INVALID\_MODE} (``xds'' has been initialized in encode
mode), \textsf{XDS\_ERR\_UNDER\-FLOW} (an engine tried to read $n$ bytes from
the buffer, but we don't have that much data left), or
\textsf{XDS\_ERR\_UNKNOWN} (an engine returned an unspecified error).

\subsection{int xds\_decode(xds\_t*~\underline{xds}, const~char*~\underline{fmt}, \dots{});}

This routine is identical to xds\_vdecode(), except that it takes the
locations for the decoded values as a variable argument list instead of a
\textsf{va\_list}.

\section{The XDR Engines}
\label{xdr}

\begin{tabular}{|c|c|c|c|} \hline
\bf Function Name     & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\ \hline
xdr\_encode\_uint32()      & xds\_uint32\_t   & 4 bytes  & 4 bytes  \\
xdr\_decode\_uint32()      & xds\_uint32\_t*  & 4 bytes  & 4 bytes  \\[1ex]
xdr\_encode\_int32()       & xds\_int32\_t    & 4 bytes  & 4 bytes  \\
xdr\_decode\_int32()       & xds\_int32\_t*   & 4 bytes  & 4 bytes  \\[1ex]
xdr\_encode\_uint64()      & xds\_uint64\_t   & 8 bytes  & 8 bytes  \\
xdr\_decode\_uint64()      & xds\_uint64\_t*  & 8 bytes  & 8 bytes  \\[1ex]
xdr\_encode\_int64()       & xds\_int64\_t    & 8 bytes  & 8 bytes  \\
xdr\_decode\_int64()       & xds\_int64\_t*   & 8 bytes  & 8 bytes  \\[1ex]
xdr\_encode\_double()      & xds\_double\_t   & ?? bytes & ?? bytes \\
xdr\_decode\_double()      & xds\_double\_t*  & ?? bytes & ?? bytes \\[1ex]
xdr\_encode\_octetstream() & void*, size\_t   & variable & variable \\
xdr\_decode\_octetstream() & void**, size\_t* & variable & variable \\[1ex]
xdr\_encode\_string()      & char*            & variable & variable \\
xdr\_decode\_string()      & char**           & variable & variable \\ \hline
\end{tabular}
\medskip

Please note that the routines xdr\_decode\_octetstream() and
xdr\_decode\_string() return a pointer to a buffer holding the decoded
data. This buffer has been allocated with \textsf{malloc()} and must be
\textsf{free()}ed by the application when it is not required anymore. All
other callbacks write the decoded value into the location found on the
stack, but these behave differently because the length of the decoded data
is not known in advance and the application cannot provide a buffer that's
guaranteed to suffice.

\section{The XML Engines}
\label{xml}

\begin{tabular}{|c|c|c|c|} \hline
\bf Function Name     & \bf Expected ``args'' Datatype & \bf Input & \bf Output \\ \hline
xml\_encode\_uint32()      & xds\_uint32\_t   & 4 bytes      & 18--27 bytes \\
xml\_decode\_uint32()      & xds\_uint32\_t*  & 18--27 bytes & 4 bytes      \\[1ex]
xml\_encode\_int32()       & xds\_int32\_t    & 4 bytes      & 16--26 bytes \\
xml\_decode\_int32()       & xds\_int32\_t*   & 16--26 bytes & 4 bytes      \\[1ex]
xml\_encode\_uint64()      & xds\_uint64\_t   & 8 bytes      & 18--37 bytes \\
xml\_decode\_uint64()      & xds\_uint64\_t*  & 18--37 bytes & 8 bytes      \\[1ex]
xml\_encode\_int64()       & xds\_int64\_t    & 8 bytes      & 16--36 bytes \\
xml\_decode\_int64()       & xds\_int64\_t*   & 16--36 bytes & 8 bytes      \\[1ex]
xml\_encode\_double()      & xds\_double\_t   & ?? bytes     & ?? bytes     \\
xml\_decode\_double()      & xds\_double\_t*  & ?? bytes     & ?? bytes     \\[1ex]
xml\_encode\_octetstream() & void*, size\_t   & variable     & variable     \\
xml\_decode\_octetstream() & void**, size\_t* & variable     & variable     \\[1ex]
xml\_encode\_string()      & char*            & variable     & variable     \\
xml\_decode\_string()      & char**           & variable     & variable     \\ \hline
\end{tabular}
\medskip

Please note that the routines xml\_decode\_octetstream() and
xml\_decode\_string() return a pointer to a buffer holding the decoded
data. This buffer has been allocated with \textsf{malloc()} and must be
\textsf{free()}ed by the application when it is not required anymore. All
other callbacks write the decoded value into the location found on the
stack, but these behave differently because the length of the decoded data
is not known in advance and the application cannot provide a buffer that's
guaranteed to suffice.

\section{Frequently Asked Questions}

\subsection{Why do we have separate encoding and decoding modes?}

Some users complained about having to maintain separate XDS contexts for
encoding and decoding. They wondered why it is not possible to encode and
decode with a single XDS context. The reason is that this limitation makes
the XDS context structure and the programmer API for XDS much simpler. If
we were able to use a single context for encoding and decoding, we would
have to maintain \emph{two} lists of registered engines per XDS context:
One set of encoding engines and one set of decoding engines. Consequently,
the \textsf{xds\_register()} function would need to take an additional
parameter, which determines whether you're registering an encoding or a
decoding engine. All this is not necessary in the current design, because
one list of registered engines suffices.

Another important topic is buffer management. The buffer handling in
encoding mode is subtly different from that in decoding mode: The XDS
context contains a buffer, the size of that buffer and a kind of ``current
position'' pointer. When an engine stores, say, 8 bytes of encoded data in
the buffer, \textsf{xds\_vencode()} will increase the ``current position''
by 8 bytes --- the next encoding engine will append its encoded data at the
end of the buffer. If the ``current position'' reaches the end of the
buffer, the buffer is reallocated with an appropriately bigger size.

In decoding mode, the same variables in the XDS context have a different
meaning: Since the buffer is never going to be resized, the buffer size
does not correspond to the size of the memory chunk that constitutes the
buffer, it says how many bytes of information the buffer contains; it's the
length of the contents. The ``current position'' is initialized at the
beginning of the buffer and every time an engine claims to have decoded,
say, 8 bytes from the buffer, the ``current position'' is increased by 8
bytes towards the end of the buffer. If the ``current position'' reaches
the end of the buffer's contents, an \textsf{XDS\_ERR\_UNDERFLOW} error is
returned.

In short, buffer handling differs between the two modes as follows: In
encoding mode, the initial buffer is empty and the current position
moves with the end of the content, determining where new data should be
appended. In decoding mode, the initial buffer is filled and the current
position wanders from the beginning to the end of the content.
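
The two interpretations of the same three variables can be sketched as
follows. The structure and function names are hypothetical and do not
mirror the actual XDS context; the growth strategy (doubling, starting at
16 bytes) is an assumption as well.

\begin{quote}
\begin{verbatim}
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the buffer fields of a context. */
struct cursor {
    unsigned char* buffer;
    size_t size;  /* encode: bytes allocated; decode: content length */
    size_t pos;   /* encode: end of content; decode: next unread byte */
};

/* Encoding: append n bytes at the current position, growing the
 * buffer if necessary. Returns 0, or -1 if out of memory. */
int cursor_append(struct cursor* c, const void* data, size_t n)
{
    if (n == 0)
        return 0;
    if (n > c->size - c->pos) {
        size_t new_size = c->size ? c->size : 16;
        unsigned char* p;

        while (n > new_size - c->pos)
            new_size *= 2;
        p = realloc(c->buffer, new_size);
        if (p == NULL)
            return -1;
        c->buffer = p;
        c->size = new_size;
    }
    memcpy(c->buffer + c->pos, data, n);
    c->pos += n;
    return 0;
}

/* Decoding: copy n bytes from the current position; the buffer is
 * never resized. Returns 0, or -1 (underflow) if fewer than n
 * bytes of content are left. */
int cursor_consume(struct cursor* c, void* out, size_t n)
{
    if (n > c->size - c->pos)
        return -1;
    memcpy(out, c->buffer + c->pos, n);
    c->pos += n;
    return 0;
}
\end{verbatim}
\end{quote}

In the encoding case the cursor starts with an empty buffer and ``pos''
marks where the next value is appended; in the decoding case ``size'' is
set to the length of the received data and ``pos'' walks towards it.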

Thus, if an XDS context were to be used for both encoding and decoding, the
library would have to manage two different buffers because the encoding and
decoding buffers have different semantics. In turn, the
\textsf{xds\_setbuffer()} and \textsf{xds\_getbuffer()} routines would need
an additional parameter in order to set the two buffers independently.

Considering all that, we found that the current design greatly reduces the
complexity of the implementation and of the API while putting the user only
through minimum inconvenience.

\subsection{What are those xds\_int-something types good for?}
\label{xds int stuff}

The XDS library uses the data types \textsf{xds\_int32\_t}, etc. rather
than \textsf{int}. This is necessary because we need to have a definite
size for each data type. In ISO~C, though, the actual size of an
\textsf{int} is implementation-defined. In theory, the system header
\textsf{sys/types.h} defines types with fixed sizes, but unfortunately the
names of these data types vary from vendor to vendor. To solve that, we
defined our own data types. The application programmer might want to take a
look at the top few lines of the \textsf{xds.h} include file to see how the
actual data types are mapped to the \textsf{xds\_xxx\_t} variant.
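
To give an idea of what such a mapping can look like, here is a sketch for
a platform that provides the C99 header \textsf{stdint.h}. This is an
assumption made for illustration only; the real \textsf{xds.h} may map the
types differently, and the macro XDS\_CHECK\_BITS() is hypothetical.

\begin{quote}
\begin{verbatim}
#include <limits.h>
#include <stdint.h>

/* Assumed mapping for a C99 platform. */
typedef int32_t  xds_int32_t;
typedef uint32_t xds_uint32_t;
typedef int64_t  xds_int64_t;
typedef uint64_t xds_uint64_t;

/* Compile-time check: the array size becomes negative, and the
 * compilation fails, unless "type" has exactly "bits" bits. */
#define XDS_CHECK_BITS(type, bits) \
    typedef char type##_has_##bits##_bits \
        [(sizeof(type) * CHAR_BIT == (bits)) ? 1 : -1]

XDS_CHECK_BITS(xds_int32_t, 32);
XDS_CHECK_BITS(xds_uint32_t, 32);
XDS_CHECK_BITS(xds_int64_t, 64);
XDS_CHECK_BITS(xds_uint64_t, 64);
\end{verbatim}
\end{quote}

The check costs nothing at run time: If a type had the wrong size on some
platform, the program would simply fail to compile there.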

\subsection{Why do I have to register all the engines manually?}

One idea that came up during the design of the API was to provide a way to
register all elementary XML or XDR engines with a single function call,
something like this:
 %
\begin{quote}
\begin{verbatim}
xds = xds_init(XDS_ENCODE, XDS_XML);   /* Use the XML engines. */
xds = xds_init(XDS_ENCODE, XDS_XDR);   /* Use the XDR engines. */
\end{verbatim}
\end{quote}
 %
The advantage of this approach is that the application developer does not
need to bother about registering some obscure functions like
\textsf{xdr\_encode\_octetstream()}. We dismissed the idea nonetheless for
the following reasons:

\begin{itemize}

\item Since the library is meant to be extensible, \textsf{xds\_init()}
has no (good) way of knowing which engines actually exist for an encoding
scheme. Suppose someone writes a whole set of engines that implement the
CORBA format, then he would not be able to register his engines without
re-writing \textsf{xds\_init()}.

\item On a similar note, \textsf{xds\_init()} would not know about the
meta engines required by the application developer. The call outlined above
would only register the engines for the elementary data types, for the very
good reason that the meta engines do not even ``exist'' when the XDS
library is compiled.

\item This approach would make it hard to mix engines from different
formats. If all engines are registered manually, the application programmer
may choose to use the XDR format for encoding all kinds of integers, but to
use the XML format for encoding strings, octet streams, or floating point
numbers.

\item If one routine referenced all engines of an encoding format, all
engines of that format would be linked into the binary as soon as the
application uses that routine. It would not be possible to, say,
register the engines dealing with integers without pulling the floating
point engines into the program too, even though nobody uses them.

The author of this document wishes to remark, though, that this property of
the library was later uh \dots{} removed by the decision of the team leader
to merge all engines into one source module per format. Sorry.

\end{itemize}

\begin{thebibliography}{xxx}

\bibitem{xdr} RFC 1832: ``XDR: External Data Representation Standard'',
R.~Srinivasan, August~1995

\bibitem{xml} XML-RPC Home Page: \textsf{http://www.xmlrpc.org/}

\end{thebibliography}

\end{document}