The Syntho Engine’s supported data

Preserve referential integrity in an entire relational data ecosystem

Syntho supports any form of tabular data

Syntho supports any form of tabular data and also supports complex data types. Tabular data is a type of structured data that is organized in rows and columns, typically in the form of a table. Most of the times, you see this type of data in databases, spreadsheets, and other data management systems.

Time series data

Generate synthetic datasets to simulate time-based patterns for analysis and forecasting.

Multi-table datasets & databases

Create synthetic datasets across multiple tables to replicate complex database structures.

Any language (Dutch, English etc.)

Generate data in any language for diverse applications.

Any alphabet (English, Chinese, Japanese etc.)

Produce data in various alphabets for global use.

Geographical location data (like GPS)

Generate realistic GPS and location-based data for geospatial applications.

Check our User Documentation here

Large multi-table datasets & databases

Syntho supports for large multi-table datasets and databases. Also for multi-table datasets and databases, we maximize the data accuracy for every synthetic data generation job and demonstrate this via our data quality report. In addition, the SAS data experts assessed and approved our synthetic data from an external point of view.

We optimized our platform to minimize computational requirements (e.g. no GPU required), without compromising on the data accuracy. In addition, we support auto scaling, so that one can synthesize huge databases.

Specifically for multi-table datasets and databases, we automatically detect the data types, schemas and formats to maximize data accuracy. For multi-table database, we support automatic table relationship inference and synthesis to preserve the referential integrity. Finally, we support for comprehensive table and column operations so that you can configure your synthetic data generation job, also for multi-table datasets and databases.

Preserved referential integrity

Syntho supports automatic table relationship inference and synthesis. We automatically infer and generate primary and foreign keys that reflect your source tables and safeguard relationships throughout your databases and across different systems to preserve the referential integrity. Foreign key relationships are automatically captured from your database to preserve referential integrity. Alternatively, one can run a scan to scan for potential foreign key relations (when foreign keys are not defined in the database, but for example in the application layer) or one can add them manually.

Comprehensive table and column operations

Synthesize, duplicate or exclude tables or columns to your preference. When you synthesize a database with multiple tables, one typically would like to be able to configure the synthetic data generation job to include and / or exclude the desired combination of tables.

Table modes:

  • Synthesize: Use AI to synthesize the table
  • Duplicate: De-identify personally identifiable information (PII) or duplicate the table 
  • Exclude: Exclude the table from the target database 
Syntho Engine platform table modes

Time series data support

Syntho supports also for time series data. time series data is a type of data that is collected and organized in chronological order, with each data point representing a specific point in time. This type of data is commonly used in many sectors. This could for example be in finance (for example with customers making transactions) or in healthcare (where patients undergo procedures), and many others where trends and patterns over time are important to understand.

Time series data can be collected at regular or irregular intervals. The data can be univariate, consisting of a single variable such as temperature, or multivariate, consisting of multiple variables that are measured over time, such as a stock portfolio’s value or a company’s revenue and expenses.

Analyzing time series data often involves identifying patterns, trends, and seasonal fluctuations over time, as well as making predictions about future values based on past data. The insights gained from analyzing time series data can be used for a wide range of applications, such as forecasting sales, predicting the weather, or detecting anomalies in a network. Hence, supporting for time series data is often required when synthesizing data.

Supported types of time series data

Auto-correlations are included in our quality assurance report

Overview of data that Syntho supports

Data Type Description Example
Integer A whole number without any decimal places, either positive or negative 42
Float A decimal number with either a finite or infinite number of decimal places, either positive or negative 3,14
Boolean A binary value True or false, yes or no etc.
String A sequence of characters, such as letters, digits, symbols, or spaces, that represent text, categories or other data "Hello, world!"
Date/Time A value representing a specific point in time, either a date, a time, or both (any data/time format is supported) 2023-02-18 13:45:00
Object A complex data type that can contain multiple values and properties, also known as a dictionary, map, or hash table { "name": "John", "age": 30, "address": "123 Main St." }
Array An ordered collection of values of the same type, also known as a list or vector [1, 2, 3, 4, 5]
Null A special value representing the absence of any data, often used to indicate a missing or unknown value null
Character A single character, such as a letter, digit, or symbol 'A'
Any other Any other form of tabular data is supported

User documentation

Access Syntho’s User Documentation!