overview

There are quite different scenarios in which you might use benerator:

  • Synthesizing production data for preparing load and performance tests
  • Creating, anonymizing and restoring production database snapshots
  • Creating batch files for performance tests
  • Feeding a load runner with useful client request data
  • Performing any generation or ETL task by writing programs that use the benerator API

    For the first three scenarios you can run benerator from the command line.

installation

  • download the latest version of databene benerator from the download page
  • unzip the downloaded file into a directory. Create an environment variable BENERATOR_HOME that points to this path.
  • if you want to use a database, download the JDBC driver appropriate for your database and put it into a directory, e.g. root/lib
  • edit the file root/benerator_common[.bat] and add the driver file to the classpath setup, e.g. on Windows:
    set LOCALCLASSPATH=%LOCALCLASSPATH%;lib\mydriver.jar
  • under Unix/Linux/Mac, open a shell and call chmod a+x bin/*.sh

benerator

benerator can execute a setup file from the command line, e.g.

benerator my_benerator_job.ben.xml

As a naming convention, all benerator setup files should have the suffix .ben.xml. For information on the setup file format, see the file format page.

quick start

If you want to get a quick impression of what benerator is good for and how it is used:

  1. check the presentation
  2. read this document
  3. read the demo files in the distribution's demo directory
  4. check the API doc for detailed documentation of the provided generators, especially the domain packages

db snapshot tool

The db snapshot tool creates a database snapshot in DbUnit XML file format. You can use it to easily create a basic configuration for your load test system: create the database tables with content, make a snapshot and import it during data generation by using the DbUnitEntityImporter.

If you have run the populate_db.oracle.xml demo file, you can extract a snapshot with the shell script snapshot:

snapshot -Ddb.url=jdbc:oracle:thin:@localhost:1521:XE \
  -Ddb.driver=oracle.jdbc.driver.OracleDriver \
  -Ddb.user=benerator -Ddb.password=benerator -Dfile.encoding=UTF-8

xml file generation

You can provide benerator with an XML schema file and have it automatically create XML files that match the schema.

Assume you would use an XML schema file 'demo/shop/product-simple.xsd' with the following definitions:

<xs:simpleType name="ean13-type">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{13}" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="price-type">
    <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:totalDigits value="8" />
        <xs:fractionDigits value="2" />
    </xs:restriction>
</xs:simpleType>

<xs:element name="product">
    <xs:complexType>
        <xs:attribute name="ean_code" type="ean13-type" use="required"/>
        <xs:attribute name="name" type="xs:string" use="required"/>
        <xs:attribute name="price" type="price-type" use="required"/>
        <xs:attribute name="manufacturer" type="xs:string" use="required"/>
    </xs:complexType>
</xs:element>

You could then create an XML file from it by invoking

 bin/createXML demo/shop/product-simple.xsd product product-{0}.xml 1 

from the root directory of your distribution, which would result in a file 'product-1.xml' like this

<?xml version="1.0" encoding="UTF-8"?>
<product elementFormDefault="unqualified"
        ean_code="1600604358820" 
        price="5" 
        manufacturer="GLMCKFTXLIIOSGIUNORKTLCUQ">
    <name>PMOQFMXJLZUIHTQTW</name>
</product>

This is valid according to the XML schema, but the random values are not meaningful for an application. You can add generator setup in XML schema annotations, e.g.:

<xs:simpleType name="ean13-type">
    <xs:annotation><xs:appinfo>
        <ben:type generator="org.databene.domain.product.EAN13Generator"/>
    </xs:appinfo></xs:annotation>
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{13}" />
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="price-type">
    <xs:annotation><xs:appinfo>
        <ben:type min="0.49" max="99.99" precision="0.10" distribution="cumulated"/>
    </xs:appinfo></xs:annotation>
    <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:totalDigits value="8" />
        <xs:fractionDigits value="2" />
    </xs:restriction>
</xs:simpleType>

<xs:element name="product">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="name" type="xs:string">
                <xs:annotation><xs:appinfo>
                    <ben:type values="Apples,Bananas,Cherries"/>
                </xs:appinfo></xs:annotation>
            </xs:element>
        </xs:sequence>
        <xs:attribute name="ean_code" type="ean13-type" use="required"/>
        <xs:attribute name="price" type="price-type" use="required"/>
        <xs:attribute name="manufacturer" type="xs:string" use="required">
            <xs:annotation><xs:appinfo>
                <ben:part pattern="[BDFGH][aeiou][lr][tpmn] (Inc\.|Corp\.)"/>
            </xs:appinfo></xs:annotation>
        </xs:attribute>
    </xs:complexType>
</xs:element>

With this approach, you can safely annotate even your production XML schemas with benerator configuration: the settings live in xs:appinfo annotations, which do not affect schema validation, so there is no danger of interfering with other applications.

The annotated schema sets the following configuration:

  • any value of type ean13-type is created by a custom generator of class 'org.databene.domain.product.EAN13Generator'
  • any created price has the form 0.49 + x * 0.10, with a maximum value of 99.99. The probability distribution is 'cumulated', which means that values in the middle of the range (around 50) are more frequent than values at the ends.
  • for the product element, all name elements have one of the values Apples, Bananas, Cherries
  • Each generated manufacturer attribute matches the regular expression '[BDFGH][aeiou][lr][tpmn] (Inc\.|Corp\.)'
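
The price rule above can be checked with a few lines of plain Java. This is only a sketch, not benerator code: the class and method names are made up for illustration, and the bell-shaped index (the average of three uniform draws) is an assumption that merely imitates a distribution favoring the middle of the range. Benerator's actual 'cumulated' implementation may work differently.

```java
import java.math.BigDecimal;
import java.util.Random;

public class PriceGridSketch {
    static final BigDecimal MIN = new BigDecimal("0.49");
    static final BigDecimal MAX = new BigDecimal("99.99");
    static final BigDecimal STEP = new BigDecimal("0.10");

    /** Largest x for which MIN + x * STEP does not exceed MAX. */
    public static int maxIndex() {
        return MAX.subtract(MIN).divideToIntegralValue(STEP).intValueExact();
    }

    /** The x-th value on the price grid: MIN + x * STEP. */
    public static BigDecimal priceAt(int x) {
        return MIN.add(STEP.multiply(BigDecimal.valueOf(x)));
    }

    /** Assumed stand-in for a bell shape: averaging three uniform draws favors the middle. */
    public static int bellShapedIndex(Random random) {
        int n = maxIndex() + 1;
        return (random.nextInt(n) + random.nextInt(n) + random.nextInt(n)) / 3;
    }

    public static void main(String[] args) {
        System.out.println("grid: " + priceAt(0) + " ... " + priceAt(maxIndex()));
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(priceAt(bellShapedIndex(random)));
        }
    }
}
```

Using BigDecimal instead of double keeps every grid value exact, so the last step lands on 99.99 without rounding noise.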

Running benerator with the new schema file creates e.g.

<?xml version="1.0" encoding="UTF-8"?>
<product elementFormDefault="unqualified"
        ean_code="7693659353226"
        price="50.69"
        manufacturer="Bert Corp.">
    <name>Cherries</name>
</product>
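
One reason a pattern such as [0-9]{13} is not enough for an application is the EAN-13 check digit: the last digit must fit the first twelve. The following sketch applies the standard EAN-13 rule (weights 1 and 3 alternating, starting with 1) to the two ean_code values shown in the sample outputs above. The class is hypothetical and says nothing about how EAN13Generator works internally.

```java
public class Ean13Check {

    /** Computes the EAN-13 check digit for the first twelve digits. */
    public static int checkDigit(String first12) {
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int digit = first12.charAt(i) - '0';
            // digits at even index (1st, 3rd, ...) weigh 1, the others weigh 3
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return (10 - sum % 10) % 10;
    }

    /** True if the text consists of 13 digits and the last one is the correct check digit. */
    public static boolean isValid(String ean13) {
        return ean13.matches("[0-9]{13}")
                && checkDigit(ean13.substring(0, 12)) == ean13.charAt(12) - '0';
    }

    public static void main(String[] args) {
        System.out.println(isValid("1600604358820")); // pattern-only sample: prints false
        System.out.println(isValid("7693659353226")); // sample from the annotated schema: prints true
    }
}
```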

Download the distribution and have a look at the file demo/shop/shop.xsd for a more detailed impression. Check the XML and file format documentation for a complete overview.

flat file generation sample

To generate flat files, run benerator with a setup file. See the following example:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE setup SYSTEM "http://databene.org/benerator-0.3.dtd">
  <setup>
      <create-entities name="product" count="3">
          <attribute name="id" type="long" min="1" max="1000000000" distribution="step"/>
          <attribute name="ean_code" generator="org.databene.domain.product.EANGenerator"/>
          <attribute name="name" type="string" minLength="5"/>
          <attribute name="price" type="big_decimal" min="0.49" max="99.99" precision="0.10"/>
          <consumer class="org.databene.platform.flat.FlatFileEntityExporter">
              <property name="uri" value="products.flat"/>
              <property name="properties" value="id[8r0],ean_code[13],name[20],price[8r0]"/>
          </consumer>
      </create-entities>
  </setup>

The file has a central create-entities element which tells benerator to create three entities of type 'product'. Entity is the word that benerator uses for business objects or, more generally, composite data objects. Entities are composed of attributes.

The attributes are the data components of an entity. The attribute elements above describe how the attributes should be generated:

  • id is generated as consecutive long values from 1 to 1000000000
  • ean_code is created by a distinct 'Generator' class. This is the first place where you can insert plugins. You will learn more about it later.
  • name is an arbitrary string with a minimum length of 5
  • price is a big decimal value between 0.49 and 99.99 (both inclusive), with a precision of 0.10. This means the values 0.49, 0.59, 0.69, ..., 99.79, 99.89, 99.99

The consumer element tells benerator to instantiate the JavaBean class org.databene.platform.flat.FlatFileEntityExporter and have it process each generated entity. The entities will be written to a file 'products.flat' using a flat file format, rendering the columns with fixed width, alignment and padding character:

id[8r0] means that the attribute 'id' is padded to eight characters, aligned to the right and padded with '0' characters.

ean_code[13] means padding to 13 characters using default alignment (left) and character (' ')

(For a more detailed explanation of the file format, see file_format.html )
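
To make the column notation concrete, here is a minimal Java sketch that pads a value the way the two descriptions above read: width first, then an optional alignment letter, then an optional padding character. The class, its parsing rules and the error handling for overlong values are assumptions for illustration. It is not the FlatFileEntityExporter implementation.

```java
public class FlatColumnSketch {

    /**
     * Pads a value according to a column spec such as "id[8r0]" or "ean_code[13]":
     * width, then an optional alignment letter (l or r), then an optional pad character.
     */
    public static String format(String spec, String value) {
        String options = spec.substring(spec.indexOf('[') + 1, spec.lastIndexOf(']'));
        int pos = 0;
        while (pos < options.length() && Character.isDigit(options.charAt(pos))) {
            pos++;
        }
        int width = Integer.parseInt(options.substring(0, pos));
        boolean rightAligned = false; // default alignment: left
        if (pos < options.length() && (options.charAt(pos) == 'l' || options.charAt(pos) == 'r')) {
            rightAligned = options.charAt(pos) == 'r';
            pos++;
        }
        char pad = pos < options.length() ? options.charAt(pos) : ' '; // default pad: space
        if (value.length() > width) {
            // assumption of this sketch: overlong values are rejected
            throw new IllegalArgumentException("Value too long for column: " + value);
        }
        StringBuilder padding = new StringBuilder();
        for (int i = value.length(); i < width; i++) {
            padding.append(pad);
        }
        return rightAligned ? padding + value : value + padding;
    }

    public static void main(String[] args) {
        System.out.println(format("id[8r0]", "1"));                    // prints 00000001
        System.out.println("[" + format("ean_code[13]", "123") + "]"); // value followed by ten spaces
        System.out.println(format("price[8r0]", "9.85"));              // prints 00009.85
    }
}
```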

So, when running the example from the root directory of your benerator installation on a Unix system via

 bin/benerator demo/file/create_flat.ben.xml

or, on Windows, via

 bin\benerator demo\file\create_flat.ben.xml

benerator will create a flat file 'transactions.flat' like this:

 00000001800035300638600009.850006
 00000002800035000334000002.490018
 00000003800035300638600009.850010
 00000004807680000008500000.890022
 00000005807680000008500000.890024
 00000006807680000008500000.890024
 ...

(When running offline, you will need to remove the DOCTYPE declaration from the file demo/file/create_flat.ben.xml)

database population sample

A shop demo shows how to fill database schemas based on a setup file.

The shop database schema can be found in the benerator distribution. It supports seven major databases:

  • Oracle (oracle)
  • DB2 (db2)
  • MS SQL Server (sql_server)
  • MySQL (mysql)
  • PostgreSQL (postgres)
  • HSQL (hsql)
  • Derby (derby)

If you have a look at the directory demo/shop of your distribution, you will find

  • the main file shop.ben.xml and sub directories named with the database identifiers above (e.g. postgres for PostgreSQL).
  • the stage-specific files shop.<stage>.properties, e.g. shop.development.properties and shop.perftest.properties
  • sub directories like oracle with three files each:
    • shop.<database>.properties with the setup variables specific for this database
    • create_tables.<database>.sql
    • drop_tables.<database>.sql

Running the demo

Time to get going:

  • Choose a database for your first steps, e.g. oracle
  • Install benerator, the database and an appropriate jdbc driver
  • start the database and create or select a schema
  • edit the file BENERATOR_HOME/demo/shop/oracle/shop.oracle.properties and set the correct database url, driver, schema, user and password:
     db_uri=jdbc:oracle:thin:@localhost:1521:XE
     db_driver=oracle.jdbc.driver.OracleDriver
     db_user=benerator
     db_password=benerator
     db_schema=benerator
     id_strategy=seqhilo
     id_param=seq_id_gen
  • now go to the root directory of your benerator installation and run (from the command line):
    Windows:
      set BENERATOR_OPTS=-Dstage=development -Ddatabase=oracle
      bin\benerator demo\shop\shop.ben.xml
    Unix/Linux/Mac:
      export BENERATOR_OPTS="-Dstage=development -Ddatabase=oracle"
      bin/benerator demo/shop/shop.ben.xml

    If you've done everything right, benerator will now populate your schema with some entities.

    If you want to test performance, you can use the perftest stage, editing the file shop.perftest.properties and starting benerator with -Dstage=perftest. In the shop.perftest.properties you can adapt the settings:

     product_count=5000
     customer_count=10000
     orders_per_customer=3
     items_per_order=3

If you have a look at the file shop.ben.xml, you see how the pieces are put together:

First, the -Dstage and -Ddatabase parameters are evaluated for importing the corresponding properties files:

<include uri="{demo/shop/${database}/shop.${database}.properties}" />
<include uri="{demo/shop/shop.${stage}.properties}" />
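
As an illustration of what these two lines resolve to, the following Java sketch substitutes ${...} placeholders in a path. It is a simplified stand-in written for this page (hypothetical class and method names) and only covers plain variable names, not the full script expressions benerator can evaluate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} in the template with its value from the map. */
    public static String resolve(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined variable: " + matcher.group(1));
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<String, String>();
        settings.put("database", "oracle");   // corresponds to -Ddatabase=oracle
        settings.put("stage", "development"); // corresponds to -Dstage=development
        // prints demo/shop/oracle/shop.oracle.properties
        System.out.println(resolve("demo/shop/${database}/shop.${database}.properties", settings));
        // prints demo/shop/shop.development.properties
        System.out.println(resolve("demo/shop/shop.${stage}.properties", settings));
    }
}
```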

The following echo elements print some settings to the console, e.g.

<echo message="{  ${product_count + 6} products}" />

After that, the database is defined with the id 'db'.

A Task class is instantiated by JavaBean mechanisms and executed, running an SQL script on the database 'db'.

<run-task class="org.databene.platform.db.RunSqlScriptTask">
        <property name="uri" value="{demo/shop/${database}/create_tables.${database}.sql}" />
        <property name="db" ref="db" />
</run-task>

A DbUnit file is used for creating a basic predefined setup that may serve for unit tests and regression tests:

<create-entities source="demo/shop/shop.dbunit.xml" consumer="db" />

Let's skip the following examples; you have already got the idea from the flat file generation sample.

Just one more create-entities example (the last one of the file and the most sophisticated):

<!-- create order items -->
<create-entities name="db_order_item"
        count="{ftl:${customer_count * orders_per_customer * items_per_order}}" 
        consumer="db">
    <variable name="product" source="db" selector="select ean_code, price from db_product" 
        distribution="cumulated" />
    <id name="id" strategy="{${id_strategy}}" source="db" param="{${id_param}}" />
    <attribute name="number_of_items" min="1" max="27" distribution="cumulated" />
    <attribute name="order_id" source="db" selector="select id from db_order" cyclic="true" />
    <attribute name="product_ean_code" script="{${product[0]}}" />
    <attribute name="total_price" 
        script="{${(product[1] * db_order_item.number_of_items)?c}}" />
</create-entities> 

The attribute count="{ftl:${customer_count * orders_per_customer * items_per_order}}" tells benerator to use a scripting engine (FreeMarker, registered as 'ftl') for calculating the number of entities to create.
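
For the perftest settings listed earlier, this expression evaluates to 10000 * 3 * 3 = 90000 order items. The same arithmetic in plain Java, as a sketch with a hypothetical class name that reads the settings through java.util.Properties rather than through benerator:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class OrderItemCountSketch {

    /** Multiplies the three settings exactly as the count expression does. */
    public static long orderItemCount(Properties settings) {
        long customers = Long.parseLong(settings.getProperty("customer_count"));
        long ordersPerCustomer = Long.parseLong(settings.getProperty("orders_per_customer"));
        long itemsPerOrder = Long.parseLong(settings.getProperty("items_per_order"));
        return customers * ordersPerCustomer * itemsPerOrder;
    }

    public static void main(String[] args) throws IOException {
        // same content as the perftest settings shown above
        String text = "product_count=5000\n"
                + "customer_count=10000\n"
                + "orders_per_customer=3\n"
                + "items_per_order=3\n";
        Properties settings = new Properties();
        settings.load(new StringReader(text));
        System.out.println(orderItemCount(settings)); // prints 90000
    }
}
```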

The id is generated by one of the predefined strategies, selected by ${id_strategy} and ${id_param}, parameters which are set in the database properties file.

A variable product is used for querying the db_product database table, extracting ean_code and price from all rows, buffering them in memory and providing them with a bell-shaped probability distribution: the middle products will be used frequently, the first and last products only rarely.

The attribute total_price is determined by evaluating the script '${(product[1] * db_order_item.number_of_items)?c}' with the default script engine. This multiplies column 1 of the 'product' variable (the price column) by the number_of_items attribute of the current entity and renders it for a 'computer audience' because of '?c'. Otherwise, you might get locale-dependent formatting, which would confuse benerator.
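
The formatting problem can be reproduced in plain Java. The sketch below is not FreeMarker code and the class name is made up. It only contrasts a locale-aware rendering, which may use grouping separators and a decimal comma, with a plain rendering of the same number.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

public class NumberRenderingSketch {

    /** Locale-dependent rendering, meant for human readers. */
    public static String forHumans(BigDecimal value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    /** Plain rendering with '.' as decimal separator and no grouping. */
    public static String forComputers(BigDecimal value) {
        return value.toPlainString();
    }

    public static void main(String[] args) {
        // price 50.69 times 27 items
        BigDecimal total = new BigDecimal("50.69").multiply(BigDecimal.valueOf(27));
        System.out.println(forHumans(total, Locale.GERMANY)); // prints 1.368,63
        System.out.println(forHumans(total, Locale.US));      // prints 1,368.63
        System.out.println(forComputers(total));              // prints 1368.63
    }
}
```

A consumer that expects 1368.63 cannot parse 1.368,63 reliably, which is the confusion the '?c' suffix avoids.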

benerator api

If your task is very special, you might need to learn the benerator API and plumb things together as you need.

A Generator is an object responsible for creating objects. You can instantiate most predefined Generators by the GeneratorFactory:

Generator<String> salutation = GeneratorFactory.getSampleGenerator("Hi", "Hello", "Howdy");
Generator<String> name = GeneratorFactory.getSampleGenerator("Alice", "Bob", "Charly");
for (int i = 0; i < 5; i++)
    System.out.println(salutation.generate() + " " + name.generate());
salutation.close();
name.close();

First, two generators are created by the GeneratorFactory, then each is used five times by invoking generate(), and finally both are closed.

This code will print out something similar to the following text:

Hi Charly
Howdy Bob
Hello Alice
Hi Charly
Hello Bob

PersonGenerator

This is an example of a domain generator. This one creates person data with title, address and phone numbers:

run org.databene.benerator.demo.PersonDemo

PersonGenerator generator = new PersonGenerator();
for (int i = 0; i < 3; i++)
    System.out.println(generator.generate());
generator.close();

It evaluates the default locale, finds out the country it belongs to, and creates and prints addresses for this country, all with consistent salutation, given name, zip code, city and phone number!

Here's an example output for Germany:

Frau Helga Schmidt, *25.07.1956
Herr Dr. Johann Weber, *18.10.1975
Herr Prof. Dr. Richard Lange, *18.05.1936

Domains support localization: When using

PersonGenerator generator = new PersonGenerator(Country.UNITED_KINGDOM, Locale.ENGLISH);

the output looks like this:

  Mrs. Dr. Ellie Thomas, *18.04.1991
  Mr. Harry Smith, *05.05.1961
  Mr. Harry Thompson, *07.10.1905

AddressGenerator

Here's an example of using the address generator:

  public class AddressDemo {
      public static void main(String[] args) {
          AddressGenerator generator = new AddressGenerator(Country.US);
          for (int i = 0; i < 3; i++) {
              Address address = generator.generate();
              System.out.println(address);
              System.out.println("phone: " + address.getPrivatePhone());
              System.out.println("fax: " + address.getFax());
              System.out.println();
          }
      }
  }

It creates addresses:

35 College Street
Loysville, PA 17047
United States
phone: +1-717-3534478
fax: +1-717-5975441

11 Fourth Street
Triangle, VA 22172
United States
phone: +1-703-6005196
fax: +1-703-5068030

33 Church Street
Elkhorn, WI 53121
United States
phone: +1-414-5832000
fax: +1-414-7840306

Currently only US and German addresses are supported.

Regex Generator

Strings can be created by providing a regular expression, e.g. for creating phone numbers:

run org.databene.benerator.demo.RegexDemo

Here's an excerpt from the RegexDemo:

String PHONE_PATTERN = "\\+[1-9][0-9]{1,2}/[1-9][0-9]{0,4}/[1-9][0-9]{4,8}";
Generator<String> phoneGenerator = GeneratorFactory.getRegexStringGenerator(PHONE_PATTERN, 1, 16, null, 0);
for (int i = 0; i < 5; i++)
    System.out.println(phoneGenerator.generate());
phoneGenerator.close();

which creates output like this:

+802/85/810794
+92/7261/622937
+61/95/31527
+30/258/71172783
+59/755/8861307
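
If you want to convince yourself that generated strings really match the expression, java.util.regex can check them. This small sketch (hypothetical class name, no benerator classes involved) validates the five sample lines above against the same PHONE_PATTERN.

```java
import java.util.regex.Pattern;

public class PhonePatternCheck {
    static final String PHONE_PATTERN = "\\+[1-9][0-9]{1,2}/[1-9][0-9]{0,4}/[1-9][0-9]{4,8}";

    /** True if the whole text matches the phone pattern. */
    public static boolean isPhone(String text) {
        return Pattern.matches(PHONE_PATTERN, text);
    }

    public static void main(String[] args) {
        String[] samples = {
            "+802/85/810794", "+92/7261/622937", "+61/95/31527",
            "+30/258/71172783", "+59/755/8861307"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPhone(sample)); // each sample matches
        }
        System.out.println(isPhone("+0/12/34567")); // prints false: country code must not start with 0
    }
}
```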

custom file builder

The PersonXMLBuilderDemo demonstrates the usage of the FileBuilder and the ScriptBasedDocumentWriter for creating xml files with person data.

Running the demo

run org.databene.benerator.demo.PersonXMLBuilderDemo

will cause creation of the file persons.xml with the following content:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <persons>
    <person number="1">
      <salutation>Frau</salutation>
      <title>Dr.</title>
      <givenName>Renate</givenName>
      <familyName>Walter</familyName>
    </person>
    <person number="2">
      <salutation>Herr</salutation>
      <title></title>
      <givenName>Markus</givenName>
      <familyName>Schneider</familyName>
    </person>
  </persons>

Now let's see how this is achieved: the core code of PersonXMLBuilderDemo creates a ScriptBasedDocumentWriter that combines three FreeMarker templates (*.ftl) and a Writer (out). This object is used by the FileBuilder for creating two objects with the PersonGenerator and serializing them to the writer out.

Writer out = new BufferedWriter(new FileWriter("persons.xml"));
ScriptBasedDocumentWriter<Person> writer = new ScriptBasedDocumentWriter<Person>(
    out,
    "org/databene/benerator/demo/xmlHeader.ftl",
    "org/databene/benerator/demo/xmlPart.ftl",
    "org/databene/benerator/demo/xmlFooter.ftl"
);
FileBuilder.build(new PersonGenerator(), 2, writer);
IOUtil.close(out);

The templates that rendered the output are:

org/databene/benerator/demo/xmlHeader.ftl

  <?xml version="1.0" encoding="iso-8859-1"?>
  <persons>

org/databene/benerator/demo/xmlPart.ftl

<person number="${var.part_index + 1}">
    <salutation>${part.salutation}</salutation>
    <title>${part.title!}</title>
    <givenName>${part.givenName}</givenName>
    <familyName>${part.familyName}</familyName>
</person>

org/databene/benerator/demo/xmlFooter.ftl

  </persons>
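
The division of labor between the three templates (header once, part once per generated object, footer once) can be imitated without FreeMarker. The following Java sketch uses a hypothetical class and hard-coded markup purely to show that structure, including the one-based numbering that the part template derives from the part index. It is not how ScriptBasedDocumentWriter is implemented.

```java
import java.util.Arrays;
import java.util.List;

public class HeaderPartFooterSketch {

    /** Renders a header once, one part per item (numbered from 1), and a footer once. */
    public static String render(List<String> familyNames) {
        StringBuilder out = new StringBuilder();
        out.append("<persons>\n"); // header
        for (int partIndex = 0; partIndex < familyNames.size(); partIndex++) {
            // part: the zero-based index is shown as a one-based number
            out.append("  <person number=\"").append(partIndex + 1).append("\">\n");
            out.append("    <familyName>").append(familyNames.get(partIndex)).append("</familyName>\n");
            out.append("  </person>\n");
        }
        out.append("</persons>\n"); // footer
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList("Walter", "Schneider")));
    }
}
```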

distribution demo

Running the distribution demo visualizes the predefined sequences - the generated values are displayed from left to right, the amounts are indicated vertically. See org.databene.benerator.demo.DistributionDemo