The Syntho Engine’s supported data
Preserve referential integrity in an entire relational data ecosystem
Syntho supports any form of tabular data
Syntho supports any form of tabular data and also supports complex data types. Tabular data is a type of structured data that is organized in rows and columns, typically in the form of a table. Most of the times, you see this type of data in databases, spreadsheets, and other data management systems.
Time series data
Generate synthetic datasets to simulate time-based patterns for analysis and forecasting.
Multi-table datasets & databases
Create synthetic datasets across multiple tables to replicate complex database structures.
Any language (Dutch, English etc.)
Generate data in any language for diverse applications.
Any alphabet (English, Chinese, Japanese etc.)
Produce data in various alphabets for global use.
Geographical location data (like GPS)
Generate realistic GPS and location-based data for geospatial applications.
Check our User Documentation here
Large multi-table datasets & databases
Syntho supports for large multi-table datasets and databases. Also for multi-table datasets and databases, we maximize the data accuracy for every synthetic data generation job and demonstrate this via our data quality report. In addition, the SAS data experts assessed and approved our synthetic data from an external point of view.
We optimized our platform to minimize computational requirements (e.g. no GPU required), without compromising on the data accuracy. In addition, we support auto scaling, so that one can synthesize huge databases.
Specifically for multi-table datasets and databases, we automatically detect the data types, schemas and formats to maximize data accuracy. For multi-table database, we support automatic table relationship inference and synthesis to preserve the referential integrity. Finally, we support for comprehensive table and column operations so that you can configure your synthetic data generation job, also for multi-table datasets and databases.
Preserved referential integrity
Syntho supports automatic table relationship inference and synthesis. We automatically infer and generate primary and foreign keys that reflect your source tables and safeguard relationships throughout your databases and across different systems to preserve the referential integrity. Foreign key relationships are automatically captured from your database to preserve referential integrity. Alternatively, one can run a scan to scan for potential foreign key relations (when foreign keys are not defined in the database, but for example in the application layer) or one can add them manually.
Comprehensive table and column operations
Synthesize, duplicate or exclude tables or columns to your preference. When you synthesize a database with multiple tables, one typically would like to be able to configure the synthetic data generation job to include and / or exclude the desired combination of tables.
Table modes:
- Synthesize: Use AI to synthesize the table
- Duplicate: De-identify personally identifiable information (PII) or duplicate the table
- Exclude: Exclude the table from the target database
Time series data support
Syntho supports also for time series data. time series data is a type of data that is collected and organized in chronological order, with each data point representing a specific point in time. This type of data is commonly used in many sectors. This could for example be in finance (for example with customers making transactions) or in healthcare (where patients undergo procedures), and many others where trends and patterns over time are important to understand.
Time series data can be collected at regular or irregular intervals. The data can be univariate, consisting of a single variable such as temperature, or multivariate, consisting of multiple variables that are measured over time, such as a stock portfolio’s value or a company’s revenue and expenses.
Analyzing time series data often involves identifying patterns, trends, and seasonal fluctuations over time, as well as making predictions about future values based on past data. The insights gained from analyzing time series data can be used for a wide range of applications, such as forecasting sales, predicting the weather, or detecting anomalies in a network. Hence, supporting for time series data is often required when synthesizing data.
Supported types of time series data
- Time series with equal interval
- Time series with un-equal interval
- Time series with equal lengths
- Time series with un-equal lengths
- Time series with missing observations
- Huge time serie strings
Auto-correlations are included in our quality assurance report
Overview of data that Syntho supports
Data Type | Description | Example |
---|---|---|
Integer | A whole number without any decimal places, either positive or negative | 42 |
Float | A decimal number with either a finite or infinite number of decimal places, either positive or negative | 3,14 |
Boolean | A binary value | True or false, yes or no etc. |
String | A sequence of characters, such as letters, digits, symbols, or spaces, that represent text, categories or other data | "Hello, world!" |
Date/Time | A value representing a specific point in time, either a date, a time, or both (any data/time format is supported) | 2023-02-18 13:45:00 |
Object | A complex data type that can contain multiple values and properties, also known as a dictionary, map, or hash table | { "name": "John", "age": 30, "address": "123 Main St." } |
Array | An ordered collection of values of the same type, also known as a list or vector | [1, 2, 3, 4, 5] |
Null | A special value representing the absence of any data, often used to indicate a missing or unknown value | null |
Character | A single character, such as a letter, digit, or symbol | 'A' |
Any other | Any other form of tabular data is supported |
Access Syntho’s User Documentation!
- Getting started
- Deployment and connectors
- User interface
- Features
- User roles and support