MOSTLY AI Unveils Open-Source Toolkit for Synthetic Data Generation
Feb. 7, 2025 — MOSTLY AI recently announced the launch of the first industry-grade open-source synthetic data toolkit (SDK), enabling any organization to easily generate high-quality, privacy-safe synthetic datasets from sensitive proprietary data, all within their own compute infrastructure. By eliminating data-sharing hurdles, this open-source release clears the path for the next wave of AI innovation, fueled by previously inaccessible data.
AI is recognized as a foundational technology, akin to electricity. Yet, the shortage of relevant training data is starting to hamper its further development. Governments and analysts alike emphasize synthetic data as the next frontier:
- The UK Government’s most recent AI Opportunities Action Plan stresses the need for access to high-quality data, which it describes as “the lifeblood of modern AI”, and explicitly calls for “exploring the use of synthetic data generation techniques to construct privacy-preserving versions of highly sensitive datasets.”
- Industry analyst Gartner predicts 75% of businesses will generate synthetic customer data by 2026, up from less than 5% in 2023.
- The European Commission’s JRC deems synthetic data a key enabler for AI development and data democratization.
MOSTLY AI’s mission has always been to democratize data. With the open-source release of MOSTLY AI’s industry-proven synthetic data toolkit, every business and every agency can now be empowered to finally harness their proprietary data with zero compromises on privacy.
The new SDK delivers state-of-the-art accuracy, differential privacy, best-in-class compute efficiency, and broad data support. Its fully permissive license fosters tighter integrations with leading AI and cloud platforms, creating a seamless ecosystem for synthetic data at scale.
Importantly, any synthetic data generator built with the SDK is fully compatible with MOSTLY AI’s Enterprise Platform, enabling instant sharing, analysis, and AI-assisted data exploration. This unlocks data insights for everyone in an organization, independent of their background, driving true democratization of knowledge.
Synthetic data is set to drive the next wave of AI adoption. Many organizations realize that AI trained solely on public data falls short when it comes to context and relevance. Proprietary business data — rich in behavioral insights and domain expertise — holds the key to more effective AI applications. However, privacy constraints often lock this data out of AI training.
MOSTLY AI’s Synthetic Data SDK removes that barrier, empowering organizations to safely harness their proprietary data without putting privacy compliance at risk. With recent advances in GPT models, organizations can empower every employee with AI, not just expert users. But without unlocked training data, that AI is like a high-end device with no power source. By safely fueling AI with proprietary synthetic data, businesses can now turn their AI models from a novelty into an indispensable force for innovation.
Availability
The Synthetic Data SDK is available as a standalone Python package under the fully permissive Apache v2 license. Join MOSTLY AI’s burgeoning community by installing, using, and integrating the SDK, and help shape the future of privacy-safe synthetic data. Share your questions, star the repository, request features, and showcase your use cases.
About MOSTLY AI
MOSTLY AI was founded in 2017 in Vienna, Austria, by Michael Platzer, Klaudius Kalcher and Roland Boubela, three distinguished data scientists. They realized early on the potential of using AI to generate structured business data and to create what we now call synthetic data. Back then this was not much more than an idea. It was unclear how the process was going to work, since no previous research or competitors existed in the space.
The inspiration came from the unstructured data domain, where the first synthetic images were produced. The three co-founders had experienced the challenges companies were facing with traditional data anonymization. These challenges only increased when GDPR was introduced in Europe in 2018. MOSTLY AI released the first version of its Synthetic Data Platform at the same time and proved to the world that synthetic data has vast potential.
Source: Alexandra Ebert, MOSTLY AI