general

What is benerator good for?

databene generator was initially built to generate data for load tests, but has turned out to be the Swiss Army knife of data creation:

You can

  • significantly reduce the effort of creating or updating test definitions when data specifications change.
  • run load tests early in development and get the chance to change critical elements before the project runs out of time.
  • save time by reusing data definitions for unit, integration and load testing.
  • set up showcases within hours (and let your working students do more useful work in the time saved).
  • get more realistic performance measurements with significantly less work.
  • get the chance to perform load tests even in lower-budget or short-term projects.

Why yet another data generator?

There are many data generators available (see similar products), but each one lacked features I needed, especially the open source products.

So I started developing benerator in June 2006 with the following goals (and USPs):

  • Creating and importing data in various formats, supporting csv, xml and flat files as well as databases or other systems (this includes import and anonymization of production data).
  • Supporting all major database products out-of-the-box.
  • Automatically importing data constraints from (e.g. database) systems or setup files.
  • Supporting complex constraints.
  • Creating realistic data. Realistic data must obey business constraints as well as statistical characteristics.
  • Creating mass data. Creating millions of data sets efficiently raises special issues.
  • Defining, bundling and reusing generators for data that is specific to a business domain (e.g. addresses, finance).
  • Strongly reducing data definition effort in software testing by using benerator as a common data (generator) repository for functional tests, integration tests, load tests and showcase setup.
  • Allowing for easy extension of generators, supported systems and file formats and more.
  • Exporting generated data in a way that test runners (like JMeter) can easily use it.

Try to find another test data generator that provides these features. You can start with this list. If you really do find one, please tell me and surprise me.

The best thing is: you get all this as open source! Admittedly, the license costs of a commercial generator hardly matter compared to personnel costs, but with benerator you can get support from its developers, and you can easily fix problems in the source code or customize it to your individual needs.

What is a load test?

Load testing is the process of testing an application's behavior under the load conditions expected in production. These conditions are (1) the expected database volume and (2) the amount of user activity. Load testing is done in three steps:

  1. The system is set up and filled with data that makes it look as if it had been running for months or years. databene generator is intended to perform this step.
  2. A stress test client is used to generate user load. The same tool or another one measures and logs the behavior of the tested system. JMeter is a popular example of a stress test client. databene generator can support stress test tools by providing appropriate data for the requests.
  3. The logged system behavior is evaluated against the performance requirements.
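
The three steps can be illustrated with a deliberately tiny Python sketch. It is not how benerator or JMeter work internally: an in-memory SQLite database stands in for the tested system, and the customer table, the single-threaded request loop and the 10 ms 95th-percentile requirement are all hypothetical choices made for illustration.

```python
import random
import sqlite3
import statistics
import time


def fill_database(conn, n_customers):
    """Step 1: fill the system with data that looks like months of production use."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
    rows = ((i, f"customer_{i}", round(random.uniform(0, 10_000), 2))
            for i in range(n_customers))
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def simulate_user_load(conn, n_requests, n_customers):
    """Step 2: act as a (single-threaded) stress test client and log response times."""
    latencies = []
    for _ in range(n_requests):
        customer_id = random.randrange(n_customers)  # request data for one call
        start = time.perf_counter()
        conn.execute("SELECT balance FROM customer WHERE id = ?", (customer_id,)).fetchone()
        latencies.append(time.perf_counter() - start)
    return latencies


def evaluate(latencies, max_p95_seconds):
    """Step 3: compare the logged behavior with a performance requirement."""
    p95 = statistics.quantiles(latencies, n=20)[-1]  # last of 19 cut points = 95th percentile
    return p95, p95 <= max_p95_seconds


conn = sqlite3.connect(":memory:")
fill_database(conn, n_customers=10_000)
latencies = simulate_user_load(conn, n_requests=1_000, n_customers=10_000)
p95, ok = evaluate(latencies, max_p95_seconds=0.01)
print(f"95th percentile: {p95 * 1000:.3f} ms, requirement met: {ok}")
```

A real stress test client would run many concurrent users over a network; the sketch only shows how the three steps hand data to each other.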

What systems does benerator support?

benerator supports the following databases:

  • Oracle
  • DB2
  • MS SQL Server
  • MySQL
  • PostgreSQL
  • HSQL
  • Derby

If your database is not listed here, try whether benerator works with it anyway. Please report any problems with your database to benerator@databene.org.

The following data file formats can be used for import or export:

  • CSV
  • Flat file
  • DBUnit (input only)
  • Script (output only, e.g. for formatting XML)
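
To illustrate the difference between the CSV and flat file formats, here is a small Python sketch that writes the same hypothetical entity both ways. It assumes that "flat file" means fixed-width columns, as is common; the entity and the column widths are made up, and the code is not benerator's exporter.

```python
import csv
import io

# One hypothetical entity and hypothetical column widths for the flat file.
entity = {"id": 42, "name": "Alice", "city": "Berlin"}
widths = {"id": 5, "name": 10, "city": 8}


def to_csv_line(entity, separator=","):
    """CSV: values separated by a separator character, quoted when needed."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=separator, lineterminator="\n").writerow(entity.values())
    return buffer.getvalue()


def to_flat_line(entity, widths):
    """Flat file: each value left-aligned, padded (or truncated) to a fixed width."""
    return "".join(str(entity[key]).ljust(width)[:width] for key, width in widths.items()) + "\n"


print(repr(to_csv_line(entity)))           # '42,Alice,Berlin\n'
print(repr(to_flat_line(entity, widths)))  # '42   Alice     Berlin  \n'
```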

How fast is benerator?

These numbers were measured on a common developer notebook (Dell Latitude 620, Dual Core, 2 GHz, 2 GB RAM) with a local database instance running in its own process (MySQL/Oracle):

  • Oracle export (on the local system): 3 million entities per hour
  • CSV export: 60 million entities per hour
  • Flat file export: 60 million entities per hour
  • XML export (via FreeMarker script): 45 million entities per hour
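
For planning, it can help to turn these hourly figures into per-second rates and rough durations. The following Python arithmetic uses only the numbers listed above and assumes, as a simplification, that throughput stays constant regardless of data volume and hardware.

```python
# Throughput figures from the list above, in entities per hour.
ENTITIES_PER_HOUR = {
    "oracle": 3_000_000,
    "csv": 60_000_000,
    "flat file": 60_000_000,
    "xml": 45_000_000,
}


def per_second(target):
    """Convert an hourly rate into entities per second."""
    return ENTITIES_PER_HOUR[target] / 3600


def hours_needed(entity_count, target):
    """Rough duration estimate, assuming throughput stays constant as volume grows."""
    return entity_count / ENTITIES_PER_HOUR[target]


for target in ENTITIES_PER_HOUR:
    print(f"{target:>9}: {per_second(target):>8,.0f} entities/s")
print(f"10 million Oracle rows: about {hours_needed(10_000_000, 'oracle'):.1f} h")
```

This gives roughly 833 entities per second for Oracle, about 16,667 for CSV and flat files, and 12,500 for XML.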

Why don't you provide a GUI yet?

benerator is at an early stage of development, and much time is still needed to improve and finalize its functionality. A GUI will be added once the benerator engine design is stable (not before release 0.4).

license

Why did you choose this license?

On the one hand, the availability of open source software strongly boosts productivity (you can extend the work done by others at no charge). On the other hand, creating a useful, easy-to-use yet sophisticated tool means a lot of work: not only creating and using it, but also testing, documenting, promoting, planning, bug tracking and more.

If you are very lucky, you get paid for that work. Well, I am not, so I had to make a decision: should I develop benerator with the least effort needed for my own use in performance testing, or publish it for the common good and do all the extra work in my spare time? And what if a company likes benerator, repackages it and resells it, or integrates it into its $10,000 product for free, while I struggle to pay my rent or my developers' salaries?

I chose this solution:

  • Choosing the GPL license, which permits everybody to use and extend benerator free of charge while preventing anyone from reselling it as a proprietary product.
  • Requiring a commercial license for some activities around benerator (see below).
  • Registering the names benerator and databene as a safety net.
  • Offering services for benerator, such as support and training, to fund further development.

What does the license mean for me?

The GNU General Public License fully applies. For example, you are welcome to do the following free of charge:

  • use and extend benerator without any restriction, regardless of the license that applies to the tested system. You may also use it in consulting services for third parties.
  • redistribute benerator as is, or as part of a product or derivative work, provided that it is redistributed under the GPL only.
  • publish articles, introductions and tutorials about benerator in printed or electronic form.

If you want to redistribute benerator (or derivative works) or sell services in a way that is not covered by the GPL, you can obtain a commercial license. Each such license is negotiated individually; a standard guideline for the fee is 10% of the gross revenue from your product, service or book.

usage

Can I run benerator offline?

Yes, if you change the DTD location in the DOCTYPE declaration of your benerator XML file so that it points to a local copy of the DTD:

<!DOCTYPE setup SYSTEM "local_path/benerator-0.3.dtd">
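
If you have many descriptor files, rewriting the DOCTYPE can be scripted. The following Python sketch is a hypothetical helper (not part of benerator); the online DTD URL in the sample is a placeholder, and the regular expression only handles the `<!DOCTYPE setup SYSTEM "...">` form.

```python
import re

# Matches the system identifier of a DOCTYPE declaration for the <setup> root element.
DOCTYPE_RE = re.compile(r'(<!DOCTYPE\s+setup\s+SYSTEM\s+")[^"]*(")')


def localize_doctype(xml_text, local_dtd_path):
    """Hypothetical helper: point a descriptor's DOCTYPE at a local DTD file."""
    # A function replacement avoids backslash escaping issues with Windows paths.
    new_text, count = DOCTYPE_RE.subn(
        lambda m: m.group(1) + local_dtd_path + m.group(2), xml_text, count=1)
    if count == 0:
        raise ValueError("no '<!DOCTYPE setup SYSTEM ...>' declaration found")
    return new_text


descriptor = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE setup SYSTEM "http://example.org/benerator-0.3.dtd">
<setup/>'''
print(localize_doctype(descriptor, "local_path/benerator-0.3.dtd"))
```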

Data generation is too slow. What can I do?

With default settings and database export, each entity is persisted in its own transaction. You can increase the number of entities created per transaction by setting the pagesize attribute of the create-entities element.

You can also run benerator multithreaded with the threads attribute, e.g.

<create-entities name="db_user" count="10" pagesize="1000" threads="2"/>
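
The effect of pagesize can be sketched with Python's sqlite3 module: the loop below commits once per page instead of once per entity, so far fewer transactions are needed. This is only an illustration, not benerator's persistence code; the in-memory database and the db_user columns are hypothetical, and the threads setting is not modeled.

```python
import sqlite3


def new_db():
    """In-memory stand-in for the target database (hypothetical db_user table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE db_user (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def persist_in_pages(conn, rows, pagesize):
    """Insert rows, committing after every `pagesize` rows; returns the number of commits."""
    commits = pending = 0
    for row in rows:
        conn.execute("INSERT INTO db_user (id, name) VALUES (?, ?)", row)
        pending += 1
        if pending == pagesize:
            conn.commit()  # closes one transaction covering `pagesize` rows
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # the last, partially filled page
        commits += 1
    return commits


rows = [(i, f"user_{i}") for i in range(2500)]
print(persist_in_pages(new_db(), rows, pagesize=1))     # 2500: one transaction per entity
print(persist_in_pages(new_db(), rows, pagesize=1000))  # 3: pages of 1000, 1000 and 500
```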

How to generate strings with a prefix or suffix?

Use an appropriate regular expression, e.g.

pattern="pre_[a-z]{1,7}_post"
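
The pattern describes the fixed prefix pre_, one to seven lowercase letters and the fixed suffix _post. A minimal Python sketch that builds such strings by hand and checks them against the same regular expression; the helper random_prefixed is hypothetical and is not how benerator's generator is implemented.

```python
import random
import re
import string

PATTERN = r"pre_[a-z]{1,7}_post"  # the pattern attribute value from above


def random_prefixed(prefix="pre_", suffix="_post", min_len=1, max_len=7):
    """Hypothetical helper: prefix + 1..7 random lowercase letters + suffix."""
    length = random.randint(min_len, max_len)  # inclusive on both ends, like {1,7}
    middle = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return prefix + middle + suffix


samples = [random_prefixed() for _ in range(5)]
for sample in samples:
    assert re.fullmatch(PATTERN, sample), sample  # every sample matches the pattern
print(samples)
```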

Why do I get the error message 'Don't know how to handle descriptor'?

As of release 0.3.01, benerator needs at least one of the following pieces of meta information for choosing a generator:

  • type
  • source
  • generator
  • values (if no 'type' is specified, they will be used as strings)

For example, a regular expression 'pattern' alone is not enough for setting up string generation. You need to state the type explicitly, e.g.
type="string" pattern="[A-Z]{4,8}"

As of release 0.3.02, specifying a pattern alone implies the string type.
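
The rule can be summarized as a small decision function. The Python sketch below is a hypothetical re-implementation of the behavior described above, for illustration only; resolve_descriptor and the release tuples are assumptions, not benerator's internal API.

```python
GENERATOR_HINTS = ("type", "source", "generator", "values")


def resolve_descriptor(attrs, release=(0, 3, 2)):
    """Hypothetical: return the effective attributes, or raise if no generator can be chosen."""
    attrs = dict(attrs)
    if release >= (0, 3, 2) and "pattern" in attrs and "type" not in attrs:
        attrs["type"] = "string"  # 0.3.02 and later: a pattern alone implies a string
    if not any(key in attrs for key in GENERATOR_HINTS):
        raise ValueError("Don't know how to handle descriptor")
    if "values" in attrs and "type" not in attrs:
        attrs["type"] = "string"  # values without a type are treated as strings
    return attrs


print(resolve_descriptor({"type": "string", "pattern": "[A-Z]{4,8}"}, release=(0, 3, 1)))
print(resolve_descriptor({"pattern": "[A-Z]{4,8}"}, release=(0, 3, 2)))
```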

How to use another separator for csv i/o?

Define the source or processor as a JavaBean and set its 'separator' property explicitly. Then use it by referencing the JavaBean id in the 'source' or 'processor' attribute.
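
The JavaBean approach boils down to: create an object, set its properties, then let benerator use it by id. As a rough analogy only, here is a Python sketch of a hypothetical CsvEntitySource class with a separator property; it is not benerator's Java class, and the real class name and bean syntax must be taken from the benerator documentation.

```python
import csv
import io


class CsvEntitySource:
    """Hypothetical bean-style source: configured through properties, then iterated."""

    def __init__(self, text):
        self.text = text
        self.separator = ","  # default value, overridable like a bean property

    def __iter__(self):
        # Each CSV row becomes one entity (a dict keyed by the header columns).
        yield from csv.DictReader(io.StringIO(self.text), delimiter=self.separator)


source = CsvEntitySource("id;name\n1;Alice\n2;Bob\n")
source.separator = ";"  # set the 'separator' property explicitly
for entity in source:
    print(entity["id"], entity["name"])
```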
