Syntho supports large multi-table datasets and databases. For every synthetic data generation job, we maximize data accuracy and demonstrate this via our data quality report. In addition, SAS data experts assessed and approved our synthetic data from an external point of view. We optimized our platform to minimize computational requirements (e.g. no GPU required) without compromising data accuracy, and we support auto-scaling so that you can synthesize huge databases. Specifically for multi-table datasets and databases, we automatically detect data types, schemas and formats to maximize data accuracy. For multi-table databases, we support automatic table relationship inference and synthesis to preserve referential integrity. Finally, we support comprehensive table and column operations so that you can configure your synthetic data generation job, also for multi-table datasets and databases.
Preserved referential integrity

Syntho supports automatic table relationship inference and synthesis. We automatically infer and generate primary and foreign keys that reflect your source tables and safeguard relationships throughout your databases and across different systems, preserving referential integrity. Foreign key relationships are captured automatically from your database. Alternatively, when foreign keys are not defined in the database itself but, for example, in the application layer, you can run a scan for potential foreign key relations or add them manually.
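To give an idea of how such a scan might work, here is a minimal, illustrative sketch (not Syntho's actual algorithm): it flags a column as a candidate foreign key when all of its values are contained in another table's primary key column. All table names, column names and the `scan_foreign_keys` helper are hypothetical.

```python
# Naive foreign-key scan sketch: a child column is a candidate FK if
# every one of its values appears among a parent table's primary keys.
# This is an illustration only, not Syntho's implementation.

def scan_foreign_keys(tables, primary_keys):
    """tables: {table_name: {column_name: list_of_values}}
       primary_keys: {table_name: primary_key_column_name}
       Returns a list of (child_table, child_column, parent_table, pk_column)."""
    candidates = []
    for parent, pk_col in primary_keys.items():
        pk_values = set(tables[parent][pk_col])
        for child, columns in tables.items():
            if child == parent:
                continue
            for col, values in columns.items():
                # Candidate FK: every value in the child column exists
                # among the parent's primary key values.
                if values and set(values) <= pk_values:
                    candidates.append((child, col, parent, pk_col))
    return candidates

tables = {
    "customers": {"customer_id": [1, 2, 3], "name": ["a", "b", "c"]},
    "orders": {"order_id": [10, 11], "customer_id": [1, 3]},
}
primary_keys = {"customers": "customer_id", "orders": "order_id"}
print(scan_foreign_keys(tables, primary_keys))
# [('orders', 'customer_id', 'customers', 'customer_id')]
```

A real scan would also weigh column names, data types and value overlap ratios to avoid false positives, since value containment alone can occur by coincidence.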
Comprehensive table and column operations

Synthesize, duplicate or exclude tables or columns to your preference. When you synthesize a database with multiple tables, you typically want to configure the synthetic data generation job to include and/or exclude the desired combination of tables.

Table modes:
Synthesize: use AI to synthesize the table
Duplicate: de-identify personally identifiable information (PII) or duplicate the table
Exclude: exclude the table from the target database
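Conceptually, such a job configuration maps each table to one of the three modes. The sketch below is hypothetical (the configuration format and `validate_config` helper are illustrative, not Syntho's API):

```python
# Hypothetical per-table job configuration: each table gets one of the
# three table modes described above. Names are illustrative only.
job_config = {
    "customers": "synthesize",  # use AI to synthesize the table
    "countries": "duplicate",   # de-identify PII or duplicate the table
    "audit_log": "exclude",     # exclude the table from the target database
}

VALID_MODES = {"synthesize", "duplicate", "exclude"}

def validate_config(config):
    """Reject unknown table modes before a generation job starts."""
    for table, mode in config.items():
        if mode not in VALID_MODES:
            raise ValueError(f"Unknown mode {mode!r} for table {table!r}")
    return True

print(validate_config(job_config))  # True
```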
Time series data is commonly used in many sectors, for example in finance (customers making transactions) or in healthcare (patients undergoing procedures), and in many others where trends and patterns over time are important to understand. Time series data can be collected at regular or irregular intervals. The data can be univariate, consisting of a single variable such as temperature, or multivariate, consisting of multiple variables measured over time, such as a stock portfolio's value or a company's revenue and expenses. Analyzing time series data often involves identifying patterns, trends and seasonal fluctuations, as well as predicting future values based on past data. These insights can be used for a wide range of applications, such as forecasting sales, predicting the weather or detecting anomalies in a network. Hence, support for time series data is often required when synthesizing data.
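As a small, generic illustration of the trend analysis mentioned above (unrelated to any specific Syntho feature), a simple moving average can smooth a univariate series, such as daily sales figures, to expose its underlying trend. The data and `moving_average` helper below are purely illustrative.

```python
# Simple moving average over a univariate time series: a basic,
# standard-library-only tool for spotting trends in the data.

def moving_average(series, window):
    """Average each run of `window` consecutive values."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Illustrative daily sales figures (regular intervals, one value per day).
sales = [10, 12, 11, 15, 18, 17, 21]
smoothed = moving_average(sales, window=3)
print(smoothed)  # rises steadily, revealing the upward trend
```

For multivariate series, such as revenue and expenses recorded together, the same idea applies per variable; the point is that patterns emerge once day-to-day noise is averaged out.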