Typed Variables in DAVE-ML

TypedVars-v0.01.doc - 03/10/04
Giovanni A. Cignoni

[Converted from the original Microsoft Word® document]

This is a draft proposal for supporting the specification of types (int, float, bool, ...) in DAVE-ML variable definitions. In general, a type specification provides more information about the variable. In particular, when the XML source is used to generate code in a language that supports types (C, C++, Java, ...), the specification can be exploited to generate more efficient or more accurate code.

In the current DAVEfunc.dtd (v. 1.7b1) the variableDef element is specified as:

<!ELEMENT variableDef
    (description?, calculation?, isOutput?, uncertainty?)
>
<!ATTLIST variableDef
    name CDATA   #REQUIRED
    varID ID #REQUIRED
    units CDATA   #REQUIRED
    axisSystem CDATA   #IMPLIED
    sign CDATA   #IMPLIED
    alias CDATA   #IMPLIED
    symbol CDATA   #IMPLIED
    initialValue CDATA   #IMPLIED
>
    
<!ELEMENT variableRef EMPTY>
<!ATTLIST variableRef
    varID IDREF   #REQUIRED
>

A straightforward solution is to add a new attribute to variableDef:

type CDATA #IMPLIED

In the DTD the type attribute is declared optional (#IMPLIED), so existing files remain valid. In any case, it is the application reading the XML that decides whether to use the type information or simply ignore it.
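
As a minimal sketch of how a reading application might treat the optional attribute, the Python fragment below parses two variableDef elements, one without and one with type, using the standard xml.etree.ElementTree module. The sample variables, the helper name read_variable and the use of None for a missing type are illustrative assumptions, not part of DAVE-ML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first variableDef predates the proposal (no type),
# the second carries the proposed type attribute.
SAMPLE = """
<DAVEfunc>
  <variableDef name="angleOfAttack" varID="alpha" units="deg"/>
  <variableDef name="flapSetting" varID="flaps" units="deg" type="int16"/>
</DAVEfunc>
"""


def read_variable(element):
    """Return (varID, type) for one variableDef; type is None when absent."""
    # Element.get returns None for a missing attribute, so files written
    # before the type attribute existed are read without any special case.
    return element.get("varID"), element.get("type")


root = ET.fromstring(SAMPLE)
variables = [read_variable(v) for v in root.iter("variableDef")]
print(variables)
```

An application that does not care about types can simply discard the second item of each pair.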

The order of the attributes is not significant. To improve human readability, however, it would probably be useful to suggest a conventional order, perhaps varID (since it is important as a reference), name, type, units, initialValue, although this is of course a matter of personal taste.

It remains to be discussed whether the value of the type attribute should be free-form or constrained to an enumerated set of values. The main problem with an enumerated set is backward compatibility: changing the set in the future may invalidate a number of existing files. It is therefore important to define a "good" initial set so that future changes, if any, only add new types.
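
The difference between the two options can be sketched from the point of view of a reading application. In the Python fragment below, the set KNOWN_TYPES (filled with the values proposed in this draft), the function name check_type and its strict flag are assumptions made for illustration only.

```python
# Assumed initial set, taken from the values proposed in this draft.
KNOWN_TYPES = frozenset({
    "int", "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float", "float32", "float64", "float96", "float128",
    "bigint", "bigratio", "bigfloat",
    "bool", "text8", "text16",
})


def check_type(value, strict=True):
    """Return True if value is acceptable as a type attribute.

    strict=True mimics an enumerated attribute: only known values pass.
    strict=False mimics free-form CDATA: any non-empty string passes.
    """
    if value is None:  # attribute omitted, always allowed (#IMPLIED)
        return True
    if strict:
        return value in KNOWN_TYPES
    return bool(value.strip())
```

Under the strict policy, adding a value to the set never rejects a file that was accepted before, whereas removing or renaming a value does; this is the compatibility concern described above.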

The following table proposes a possible set of values for the type attribute and, as examples, their interpretation in C/C++ and Java.

*Attribute value*	*Interpretation*	*Example C/C++*	*Example Java*
int	Default integer	int	int
int8	Signed 8 bit integer	char	byte
int16	Signed 16 bit integer	short	short
int32	Signed 32 bit integer	int or long	int
int64	Signed 64 bit integer	long	long
uint8	Unsigned 8 bit integer	unsigned char	short
uint16	Unsigned 16 bit integer	unsigned short	int
uint32	Unsigned 32 bit integer	unsigned int or unsigned long	long
uint64	Unsigned 64 bit integer	unsigned long	long *
float	Default floating point	float	float
float32	32 bit floating point	float	float
float64	64 bit floating point	double	double
float96	96 bit floating point	long double *	double *
float128	128 bit floating point	long double *	double *
bigint	Multiple precision integer, such as in the GMP library	mpz_t (GMP)	-
bigratio	Multiple precision fraction, such as in the GMP library	mpq_t (GMP)	-
bigfloat	Floating point with arbitrary precision mantissa and limited precision exponent, such as in the GMP library	mpf_t (GMP)	-
bool	Boolean	bool (std. typedef)	boolean
text8	Text made of ASCII characters	std::string (std. C++ lib)	String
text16	Text made of characters in an extended coding (e.g. Unicode)	-	String
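
As a hedged illustration of how a code generator could use the table, the Python sketch below maps a subset of the attribute values to the C/C++ type names listed above and emits a declaration. The names CPP_TYPES, DEFAULT_CPP_TYPE and cpp_declaration, and the choice of double as fallback, are assumptions and not part of the proposal.

```python
# Subset of the table above, mapping attribute values to C/C++ type names.
# Where the table lists alternatives ("int or long"), the first one is chosen
# here arbitrarily; a real generator would choose per target platform.
CPP_TYPES = {
    "int": "int",
    "int8": "char",
    "int16": "short",
    "int32": "int",
    "uint8": "unsigned char",
    "uint16": "unsigned short",
    "uint32": "unsigned int",
    "float": "float",
    "float32": "float",
    "float64": "double",
    "bool": "bool",
}

# Assumption: fallback used when the type is absent or not in the mapping.
DEFAULT_CPP_TYPE = "double"


def cpp_declaration(var_id, type_value=None, initial_value=None):
    """Build a C/C++ declaration string for one variable."""
    cpp_type = CPP_TYPES.get(type_value, DEFAULT_CPP_TYPE)
    if initial_value is None:
        return f"{cpp_type} {var_id};"
    return f"{cpp_type} {var_id} = {initial_value};"
```

For example, cpp_declaration("flaps", "int16", "0") yields the string "short flaps = 0;", while a variable without a type falls back to the assumed default.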

Notes

Apart from the general types with unspecified bit size (int and float), the size is explicit in the type name. This seems the best way to express the intent of the model's author clearly. float96 and float128 are included because some architectures implement extended precision in three words (or slightly less) and others in four.

Big numbers, booleans and text types are included in the list for completeness. Whether they are useful has yet to be decided; they could be discarded.

In some cases (marked with a *) the translation into a specific language may result in a loss of data. For instance, Java does not have unsigned integers, so an unsigned long cannot always be converted correctly into a long. This, however, is a limitation of the target language.
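
The loss can be shown with a small numeric sketch. The Python fragment below uses the standard struct module to reinterpret the 8 bytes of an unsigned 64-bit value as a signed 64-bit integer, which is the range a Java long offers; the helper name as_signed_64 and the sample values are illustrative assumptions.

```python
import struct


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    # Pack as unsigned ("Q"), then unpack the same 8 bytes as signed ("q").
    return struct.unpack("<q", struct.pack("<Q", value))[0]


small = 42
large = 2**63  # fits in uint64, but exceeds the largest signed 64-bit value

print(as_signed_64(small))  # 42: small values survive unchanged
print(as_signed_64(large))  # -9223372036854775808: the value is corrupted
```

Values below 2**63 survive the conversion, while larger ones turn negative.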