Healthcare urgently needs data-driven insights, because it is understaffed and under heavy pressure, yet has the potential to save lives. However, healthcare data is the most privacy-sensitive data and is therefore locked away. This privacy-sensitive data:
This is problematic, as our goal for this hackathon is to predict deterioration and mortality as part of cancer research for a leading hospital. That is why Syntho and SAS collaborate for this hospital: Syntho unlocks data with synthetic data, and SAS realizes data insights with SAS Viya, the leading analytics platform.
Your guide into synthetic data generation
Our Syntho Engine generates completely new, artificially generated data. The key difference is that we apply AI to mimic the characteristics of real-world data in the synthetic data, to such an extent that it can even be used for analytics. That's why we call it a synthetic data twin. It is as good as real and statistically identical to the original data, but without the privacy risks.
During this hackathon, we integrated the Syntho Engine API into SAS Viya as a step. Before starting the cancer research, we tested this integrated approach on an open dataset and used various validation methods in SAS Viya to check whether the synthetic data is indeed as good as real:
The correlations, the relations between variables, are preserved.
The area under the curve (AUC), a measure of model performance, is preserved.
And even the variable importance, the predictive power of variables for a model, holds when we compare the original data with the synthetic data.
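As an illustration of the first check, the sketch below compares pairwise correlations between an original and a synthetic table and reports the largest gap. It is a minimal example in plain Python, not the validation that was run in SAS Viya: the function names `pearson` and `correlation_gaps`, the column names, and all numbers are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_gaps(original, synthetic):
    """Absolute difference in correlation for every pair of columns.

    Both arguments map a column name to a list of numeric values.
    """
    gaps = {}
    for a, b in combinations(sorted(original), 2):
        r_orig = pearson(original[a], original[b])
        r_syn = pearson(synthetic[a], synthetic[b])
        gaps[(a, b)] = abs(r_orig - r_syn)
    return gaps


# Toy, made-up numbers purely for illustration (not hackathon data).
original = {
    "age": [34, 51, 47, 62, 70, 45],
    "tumor_size": [1.2, 2.8, 2.1, 3.5, 4.0, 2.0],
    "stay_days": [2, 6, 4, 9, 11, 5],
}
synthetic = {
    "age": [36, 49, 50, 60, 68, 44],
    "tumor_size": [1.4, 2.6, 2.3, 3.3, 3.9, 1.9],
    "stay_days": [3, 5, 5, 8, 10, 4],
}

gaps = correlation_gaps(original, synthetic)
worst_pair = max(gaps, key=gaps.get)
print(f"largest correlation gap: {worst_pair} -> {gaps[worst_pair]:.3f}")
```

A small largest gap means the relations between variables carry over to the synthetic data. What counts as "small enough" depends on the use case and is not specified here.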
Hence, we can conclude that synthetic data generated by the Syntho Engine in SAS Viya is indeed as good as real and can be used for model development. With that validated, we could start the cancer research to predict deterioration and mortality.
Here, we used the integrated Syntho Engine as a step in SAS Viya to unlock this privacy-sensitive data with synthetic data.
The result is an AUC of 0.74 and a model that is able to predict deterioration and mortality.
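To make the AUC figure concrete: AUC is the probability that the model gives a randomly chosen positive case (for example, a patient who deteriorated) a higher score than a randomly chosen negative case, so 0.5 is chance level and 1.0 is a perfect ranking. The sketch below computes it directly from that definition. The function name `auc`, the labels and the scores are hypothetical and unrelated to the hospital data.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model scores, higher meaning "more likely positive".
    Tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy, made-up labels and scores purely for illustration.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.6, 0.8]
print(f"AUC = {auc(labels, scores):.2f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; analytics platforms compute the same quantity with rank-based methods.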
As a result of using synthetic data, we were able to unlock this healthcare data with less risk, more data and faster data access.
This is possible not only within a single hospital; data from multiple hospitals can also be combined. Hence, the next step was to synthesize data from multiple hospitals. Relevant data from different hospitals was synthesized via the Syntho Engine as input for the model in SAS Viya. Here, we realized an AUC of 0.78, up from 0.74, indicating that more data improves the predictive power of these models.
And these are the results from this hackathon:
Next steps are to
This is how Syntho and SAS unlock data and realize data-driven insights in healthcare, to help make sure healthcare is well staffed and under normal pressure, so it can save lives.