databene

 

XML Schema support

As of version 0.5.0 benerator provides support for serious XML generation:

  • benerator imports data and constraint definitions from XML schema files
  • exported XML data files are formatted according to an XML schema
  • benerator setup can be included in XML schema annotations without interfering with other applications

Introduction

You can provide benerator with an XML schema file and have it automatically create XML files that match the schema.

First of all, make sure your schema definition declares the proper namespaces:

<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ben="http://databene.org/benerator-0.6.0.xsd"
    xmlns="http://myCompany/myProduct/mySchema-version.xsd"
    targetNamespace="http://myCompany/myProduct/mySchema-version.xsd"
    elementFormDefault="qualified">

 

The 'ben' namespace will be used for benerator-specific schema annotations later.

Assume you would use an XML schema file 'demo/shop/product-simple.xsd' with the following definitions:

...

<xs:simpleType name="ean13-type">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{13}" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="price-type">
    <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:totalDigits value="8" />
        <xs:fractionDigits value="2" />
    </xs:restriction>
</xs:simpleType>

<xs:element name="product">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="ean_code" type="ean13-type" use="required"/>
        <xs:attribute name="price" type="price-type" use="required"/>
        <xs:attribute name="manufacturer" type="xs:string" use="required"/>
    </xs:complexType>
</xs:element>
...

You could then create an XML file from it by invoking

 createXML demo/shop/product-simple.xsd product product-{0}.xml 1 

from the root directory of your distribution, which would result in a file 'product-1.xml' like this:

<?xml version="1.0" encoding="UTF-8"?>
<product elementFormDefault="unqualified"
        ean_code="1600604358820"
        price="5"
        manufacturer="GLMCKFTXLIIOSGIUNORKTLCUQ">
     <name>PMOQFMXJLZUIHTQTW</name>
</product>
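
As a cross-check, the attribute values of such a generated file can be tested against the facets of 'ean13-type' and 'price-type'. The following Python sketch is only an illustration and not part of benerator: the function names are hypothetical, and the decimal check is a simplification of the XML Schema facet rules (it inspects the literal as written).

```python
import re
from decimal import Decimal, InvalidOperation

# xs:pattern facet of ean13-type
EAN13_PATTERN = re.compile(r"[0-9]{13}")


def matches_ean13_type(text: str) -> bool:
    """True if the text satisfies the pattern facet of ean13-type."""
    return EAN13_PATTERN.fullmatch(text) is not None


def matches_price_type(text: str) -> bool:
    """Simplified check of minInclusive=0, totalDigits=8, fractionDigits=2."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    _, digits, exponent = value.as_tuple()
    fraction_digits = max(0, -exponent)
    total_digits = len(digits) + max(0, exponent)
    return value >= 0 and total_digits <= 8 and fraction_digits <= 2


# Values taken from the generated 'product-1.xml' shown above
print(matches_ean13_type("1600604358820"))
print(matches_price_type("5"))
```

Both sample values pass, which shows that schema validity alone says little about whether the data is realistic.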

The generated file is valid according to the XML schema, but its random values are not meaningful for an application. To get more realistic data, you can add generator setup in XML schema annotations, e.g.:

<xs:simpleType name="ean13-type">
    <xs:annotation><xs:appinfo>
        <ben:type generator="org.databene.domain.product.EAN13Generator"/>
    </xs:appinfo></xs:annotation>

    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{13}" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="price-type">
    <xs:annotation><xs:appinfo>
        <ben:type min="0.49" max="99.99" precision="0.10" distribution="cumulated"/>
    </xs:appinfo></xs:annotation>

    <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:totalDigits value="8" />
        <xs:fractionDigits value="2" />
    </xs:restriction>
</xs:simpleType>

<xs:element name="product">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name" type="xs:string">
                <xs:annotation><xs:appinfo>
                   <ben:type values="Apples,Bananas,Cherries"/>
                </xs:appinfo></xs:annotation>

            </xs:element>
        </xs:sequence>
        <xs:attribute name="ean_code" type="ean13-type" use="required"/>
        <xs:attribute name="price" type="price-type" use="required"/>
        <xs:attribute name="manufacturer" type="xs:string" use="required">
            <xs:annotation><xs:appinfo>
                <ben:part pattern="[BDFGH][aeiou][lr][tpmn] (Inc\.|Corp\.)"/>
            </xs:appinfo></xs:annotation>

        </xs:attribute>
    </xs:complexType>
</xs:element>

 

 

Because the benerator setup lives in xs:appinfo annotations, you can annotate even your production XML schemas with benerator configuration without interfering with other applications.

The annotations configure the following:

  • any value of type ean13-type is created by a custom generator of class 'org.databene.domain.product.EAN13Generator'
  • any generated price has the form 0.49 + x * 0.10 for a non-negative integer x, up to the maximum value of 99.99. The probability distribution is 'cumulated', which means that values in the middle of the range (around 50) are more frequent than values near the ends.
  • for the product element, all name elements have one of the values: Apples, Bananas, Cherries
  • each generated manufacturer attribute matches the regular expression '[BDFGH][aeiou][lr][tpmn] (Inc\.|Corp\.)'

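The price rule can be made concrete with a small Python sketch. It is an illustration under assumptions, not benerator's implementation: the names are hypothetical, and the 'cumulated' distribution is approximated here by averaging two uniform draws, which merely produces the described shape with more values in the middle than at the ends.

```python
import random
from decimal import Decimal

# min, max and precision from the ben:type annotation of price-type
MIN, MAX, STEP = Decimal("0.49"), Decimal("99.99"), Decimal("0.10")

# Largest x for which 0.49 + x * 0.10 does not exceed 99.99
STEPS = int((MAX - MIN) / STEP)


def price_at(x: int) -> Decimal:
    """Price for grid index x, i.e. 0.49 + x * 0.10."""
    return MIN + x * STEP


def is_on_grid(price: Decimal) -> bool:
    """True if the price lies in [MIN, MAX] and is a whole number of steps above MIN."""
    return MIN <= price <= MAX and (price - MIN) % STEP == 0


def sample_price(rng: random.Random) -> Decimal:
    """Assumed stand-in for 'cumulated': the mean of two uniform indices."""
    x = (rng.randint(0, STEPS) + rng.randint(0, STEPS)) // 2
    return price_at(x)


print(STEPS, price_at(0), price_at(STEPS))
print(sample_price(random.Random()))
```

Using Decimal instead of float keeps the grid exact, so a value such as 50.69 can be tested for membership without rounding errors.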
Running benerator with the new schema file creates, for example:

<?xml version="1.0" encoding="UTF-8"?>
<product elementFormDefault="unqualified"
        ean_code="7693659353226"
        price="50.69"
        manufacturer="Bert Corp.">
     <name>Cherries</name>
</product>
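
The difference between pattern-based and generator-based values can be checked with a short Python sketch. It relies on the standard EAN-13 check digit rule (weights 1 and 3 alternating over the first twelve digits); that 'EAN13Generator' always emits a valid check digit is an assumption here, although the sample above is consistent with it. The function names are hypothetical.

```python
import re

# Pattern from the annotation of the manufacturer attribute
MANUFACTURER = re.compile(r"[BDFGH][aeiou][lr][tpmn] (Inc\.|Corp\.)")


def ean13_check_digit(first_twelve: str) -> int:
    """Check digit for the first twelve digits of an EAN-13 code."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def has_valid_check_digit(code: str) -> bool:
    """True if code has 13 ASCII digits and the last one is the correct check digit."""
    if not (len(code) == 13 and code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


# ean_code from the annotated schema vs. the purely pattern-based one
print(has_valid_check_digit("7693659353226"))
print(has_valid_check_digit("1600604358820"))
print(MANUFACTURER.fullmatch("Bert Corp.") is not None)
```

The code from the annotated schema carries a correct check digit, while the one generated from the '[0-9]{13}' pattern alone does not, even though both match the pattern.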

Download the distribution and have a look at the file demo/shop/shop.xsd for a more detailed impression. Check the file format documentation for a more complete overview of configuration options.

 

Limitations

  • No support for recursion of the same element type, e.g. categories containing other categories
  • No support for mixed content. benerator is concerned with the generation of data structures, while mixed content is generally used for natural-language documents.
  • Groups are not supported yet
  • sequences must not have maxOccurs > 1 yet
  • namespace support is only basic, problems may arise on different types with equal names
  • schema include is not supported yet