features
generic approach
benerator is not limited to a particular kind of system or platform: XML and relational (database) data are already supported, and Java annotation support will follow soon. The abstract data model can be applied to virtually any specific technology for representing data. The long-term goal is to support all major standards and standard applications (web services, SAP, Siebel, ...).
easy design, implementation and usage of data providers for load testing: The process of designing and generating valid, complex load test data is reduced from weeks to days. Even better: smaller-budget projects now get the chance to load-test at all!
provides for high volume processing: benerator is designed to process and create resources of unlimited size.
efficient operation:
- the minimum requirement for any generation feature is to generate at least one million objects per hour on common development hardware.
- benerator can run multithreaded, making efficient use of multi-core systems.
- benerator's database access is highly optimized, supporting persistence of several thousand rows per second.
domain packages provide for easy localized and regionalized creation of commonly used entities:
- address: street, house number, zip code, city name, country, phone number.
- person: names, titles, salutations, address.
- further domain packages are planned and will be developed on demand or as the opportunity arises.
data quality assurance
- supports single and multi-field constraints, e.g. generating consistent values for a person's gender, salutation and first name.
- ability to validate generated data: Data is generated according to the constraint definitions. If the tested application relies on internal knowledge for input validation that the constraint definitions cannot express, a custom validator may be plugged in to filter out inadequate data sets, e.g. for validating addresses against a postal database.
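The two points above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `ConsistentPersonSketch`, `Person`, `Validator` and the sample names are hypothetical and are not benerator classes. The sketch picks the gender first and derives salutation and first name from it, then filters the candidates through a pluggable validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these types are benerator classes.
class ConsistentPersonSketch {

    /** A generated person whose three fields must agree with each other. */
    static final class Person {
        final String gender;
        final String salutation;
        final String firstName;

        Person(String gender, String salutation, String firstName) {
            this.gender = gender;
            this.salutation = salutation;
            this.firstName = firstName;
        }
    }

    /** Plug-in point for checks that the generator itself cannot know about. */
    interface Validator {
        boolean valid(Person person);
    }

    private static final String[] FEMALE_NAMES = {"Alice", "Maria", "Sophie"};
    private static final String[] MALE_NAMES = {"Bob", "Peter", "Thomas"};

    /** Picks the gender first, then derives the dependent fields from it. */
    static Person generate(Random random) {
        boolean female = random.nextBoolean();
        String[] names = female ? FEMALE_NAMES : MALE_NAMES;
        String firstName = names[random.nextInt(names.length)];
        return new Person(female ? "female" : "male", female ? "Mrs." : "Mr.", firstName);
    }

    /**
     * Generates candidates until 'count' of them have passed the validator.
     * Note: this loops forever if the validator rejects every candidate.
     */
    static List<Person> generateValid(int count, Random random, Validator validator) {
        List<Person> result = new ArrayList<Person>();
        while (result.size() < count) {
            Person candidate = generate(random);
            if (validator.valid(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

A validator that checks addresses against a postal database would implement the same single-method interface; only the body of `valid` would differ.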
ease of use for programmers: APIs are provided or planned for the following purposes:
- dynamic data creation or access for stress test applications (planned).
- command line invocation for continuous integration (planned).
- providing an initial database setup for application deployment (planned).
- providing and ensuring consistent data for unit tests (planned).
component-based, easily extensible API
- Predefined generators provide generation of simple data types, arrays, collections and strings that match regular expressions.
- extensibility by custom generators: A clear component contract for generators provides for easy implementation of custom generators and clean life cycle and resource management.
- internationalization: Generated data can be rendered in different formats (e.g. for time values) or in different languages (e.g. for salutations or titles).
- dataset concept: Data can be categorized and grouped hierarchically (e.g. cities of a state, country or continent).
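To make the idea of a component contract with life cycle management more concrete, here is a minimal sketch. The interface `SketchGenerator` and its four methods are assumptions made up for this example; benerator's actual generator interface may look different.

```java
import java.util.NoSuchElementException;

// Hypothetical contract for illustration; benerator's real interface may differ.
interface SketchGenerator<T> {
    void init();          // acquire resources and reset internal state
    boolean available();  // true while another value can be generated
    T generate();         // produce the next value
    void close();         // release resources
}

/** A custom generator that counts from 'min' to 'max' in steps of one. */
class CountingGenerator implements SketchGenerator<Integer> {
    private final int min;
    private final int max;
    private long next;     // long, so that counting up to Integer.MAX_VALUE cannot overflow
    private boolean open;

    CountingGenerator(int min, int max) {
        this.min = min;
        this.max = max;
    }

    public void init() {
        next = min;
        open = true;
    }

    public boolean available() {
        return open && next <= max;
    }

    public Integer generate() {
        if (!available()) {
            throw new NoSuchElementException("generator is closed or exhausted");
        }
        int value = (int) next;
        next++;
        return value;
    }

    public void close() {
        open = false;
    }
}
```

A caller would invoke `init()` once, loop while `available()` returns true, and call `close()` when done, so that a generator backed by a file or database connection can release it reliably.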
customizability
- accepts input in multiple formats from multiple sources: Specifying a data model is easy. A multitude of generator mechanisms is provided, like file or database import, regular expression generators, sample lists, distribution functions and different input formats.
- provides output in multiple formats at the same time (planned): Since generated information may later not be retrievable from the target systems (e.g. PIN numbers), simultaneous output to multiple targets should be provided (e.g. users into a database and a CSV file). A plugin mechanism for data output should be provided to store data in other systems (e.g. LDAP) or file formats (e.g. proprietary formats).
- import of complex data (planned): Import of entities (or, better, entity graphs) from databases and files.
- offers powerful randomization options and is extensible by custom ones.
- supports grouping of data into hierarchical data sets. Data sets may overlap and form several parallel types of hierarchy.
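The notion of overlapping, hierarchical data sets can be illustrated with a small sketch. The class `DatasetSketch` and the set names used below are assumptions for this example only and do not reflect how benerator stores its data sets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; class and set names are not part of benerator.
class DatasetSketch {
    // Maps a composite data set to its direct members; names without an entry are atomic.
    private final Map<String, Set<String>> children = new LinkedHashMap<String, Set<String>>();

    void define(String parent, String... members) {
        children.put(parent, new LinkedHashSet<String>(Arrays.asList(members)));
    }

    /** Returns all atomic sets reachable from 'name', visiting each set only once. */
    Set<String> atoms(String name) {
        Set<String> result = new LinkedHashSet<String>();
        collect(name, result, new LinkedHashSet<String>());
        return result;
    }

    private void collect(String name, Set<String> result, Set<String> visited) {
        if (!visited.add(name)) {
            return; // already handled; this also guards against cyclic definitions
        }
        Set<String> members = children.get(name);
        if (members == null) {
            result.add(name); // atomic set
        } else {
            for (String member : members) {
                collect(member, result, visited);
            }
        }
    }
}
```

For instance, after `define("dach", "DE", "AT", "CH")`, `define("europe", "dach", "FR")` and `define("german_speaking", "dach", "LI")`, the set `dach` belongs to two parallel hierarchies, and `atoms("europe")` resolves to DE, AT, CH and FR.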
data generation from scratch
import and anonymization of production data: Existing data can be imported and anonymized by overwriting certain attributes with generated data.
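As a rough sketch of what overwriting certain attributes can look like: the class `AnonymizerSketch`, the attribute names `lastName` and `pin`, and the sample values are all assumptions chosen for illustration, not benerator's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; attribute names and sample values are assumptions.
class AnonymizerSketch {
    private static final String[] LAST_NAMES = {"Miller", "Smith", "Jones"};

    /** Returns a copy of 'row' in which the sensitive attributes hold generated values. */
    static Map<String, Object> anonymize(Map<String, Object> row, Random random) {
        Map<String, Object> copy = new LinkedHashMap<String, Object>(row);
        if (copy.containsKey("lastName")) {
            copy.put("lastName", LAST_NAMES[random.nextInt(LAST_NAMES.length)]);
        }
        if (copy.containsKey("pin")) {
            // four random digits, zero-padded
            copy.put("pin", String.format("%04d", random.nextInt(10000)));
        }
        return copy;
    }
}
```

All other attributes, such as a primary key, are copied unchanged, so references between imported rows stay intact.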
few dependencies on external libraries: For maximum compatibility with the runtime environment, the use of third-party tools is avoided where possible.
- FreeMarker is not required for operation (unless you really need to use FreeMarker templates).
- commons-logging is required, but it actually increases platform independence by allowing benerator to plug in to different logging infrastructures.
Support for all major databases
- all common SQL data types are supported (for a list of unsupported types, see limitations)
- benerator was tested with and provides examples for:
  - Oracle 10g (thin driver)
  - DB2
  - MS SQL Server
  - MySQL 5
  - PostgreSQL 8.2
  - HSQL 1.8
  - Derby 10.3
- At least Java 5.0 required
limitations
- The following SQL types are not supported:
  - Types.ARRAY
  - Types.DISTINCT
  - Types.NULL
  - Types.STRUCT
- The API is not final.
- Database persistence supports only inserts, no updates of pre-existing or previously persisted data.
- The sequence concept is not final yet.
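A schema can be checked against the unsupported SQL types listed above before a generation run. The following sketch uses the standard `java.sql.Types` constants; the class `SqlTypeCheckSketch` is hypothetical, and treating every other type code as supported is an assumption of this example, not a statement about benerator.

```java
import java.sql.Types;

// Sketch only: the four constants come from the list above; the helper class
// itself is hypothetical and not part of benerator.
class SqlTypeCheckSketch {

    /** Returns true if the JDBC type code is one of the four listed as unsupported. */
    static boolean isListedAsUnsupported(int sqlType) {
        switch (sqlType) {
            case Types.ARRAY:
            case Types.DISTINCT:
            case Types.NULL:
            case Types.STRUCT:
                return true;
            default:
                return false;
        }
    }
}
```

The type code of a column can be obtained from JDBC metadata, for example via `ResultSetMetaData.getColumnType(int)`.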