
    Agile Data Preparation

    Turn your data islands into a powerful Data Factory.

    The world of analytics is changing. Marketers, product makers and politicians alike are tapping into new sources of information to gain valuable and immediate insights. But to ask meaningful questions of the data, analysts, data stewards and architects first need to get it into a format their tools can understand.


Preparing a large variety of data for analysis can be challenging. Big data projects often fail because they cannot integrate and mobilize critical information in a timely fashion. Agility is all about how fast you can extract value from mountains of data stored in disparate systems and how quickly that information can be turned into action. It's about rapid, agile creation of curated and trusted datasets for use by applications or desktop analytics tools.

The Reactive Data Platform™ lets you discover and create composite data models and build scalable, high-performance pipelines for data preparation and analysis. It is an ideal tool for data architects and citizen data scientists, enhancing their understanding of data and simplifying the preparation of large and complex datasets.


Enterprise analytics is hard. High-value information is often scattered across disparate data stores and cloud sources like Salesforce or Google Analytics. While self-service data prep technologies solve certain integration problems, most are little more than desktop tools fed by a data lake or warehouse, resulting in more data copies, stale data and synchronization issues.

StreamScape's Dataspaces™ are a unified computing abstraction for modeling and working with in-memory data, files, relational (SQL) and unstructured (NoSQL) data stores. Dataspaces make it easy to integrate and query any combination of data sources and provide a common environment for batch, real-time and streaming analytics.
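
To make the idea concrete, here is a minimal sketch in plain Python of what a unified abstraction over heterogeneous sources looks like: two collections, one backed by a CSV file and one held in memory, registered under a single namespace and queried through one interface. The Dataspace class and its methods below are illustrative only and are not the StreamScape API.

```python
import csv
import io

# Illustrative only: a toy "dataspace" that fronts heterogeneous sources
# behind one query interface. Not the StreamScape API.
class Dataspace:
    def __init__(self):
        self._sources = {}

    def register(self, name, loader):
        """Register a collection under a logical name."""
        self._sources[name] = loader

    def query(self, name, predicate=lambda row: True):
        """Iterate rows of a collection, regardless of where they live."""
        return [row for row in self._sources[name]() if predicate(row)]

# Source 1: a CSV "file" (inlined here so the example is self-contained)
ORDERS_CSV = "order_id,account_id,total\n1,A1,120.50\n2,A2,75.00\n"

def load_orders():
    return list(csv.DictReader(io.StringIO(ORDERS_CSV)))

# Source 2: an in-memory collection, e.g. rows pulled from a cloud CRM
ACCOUNTS = [{"account_id": "A1", "name": "Acme"},
            {"account_id": "A2", "name": "Globex"}]

ds = Dataspace()
ds.register("orders", load_orders)
ds.register("accounts", lambda: ACCOUNTS)

print(ds.query("orders", lambda r: float(r["total"]) > 100))
print(ds.query("accounts"))
```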

A dataspace provides powerful data virtualization features that allow users to build cross-data-source schemas from the bottom up, letting data engineers discover and declare data relationships on the fly. Reference links, URI and FLOB data types make it possible to define data graphs and advanced relationships between files, documents or tables stored across a variety of data stores. Dataspaces let non-technical knowledge workers create transient data models that can be used by applications and analytics tools, regardless of the type or location of the actual data source.
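
The sketch below illustrates the kind of relationship a reference link expresses: a structured record pointing at an external document by URI, so the two can be traversed as parts of one data graph. The record type, field names and resolve helper are hypothetical; in the platform such links are declared in the dataspace itself rather than in application code.

```python
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical illustration of a cross-source reference link:
# a relational row that points at a file (document) by URI.
@dataclass
class ClaimRecord:
    claim_id: str
    patient_id: str
    scan_uri: str  # e.g. "file:///archive/scans/claim-42.pdf"

def resolve(uri: str) -> bytes:
    """Follow a file:// reference link and return the document's bytes."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return Path(parsed.path).read_bytes()

claim = ClaimRecord("42", "P-7", "file:///archive/scans/claim-42.pdf")
# Traversing the link joins structured and unstructured data in one step:
# document = resolve(claim.scan_uri)
```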

The Reactive Data Platform™ offers big in-memory performance, advanced access control, flexible execution and web enablement when you are ready to take your business intelligence and analytics to the next level. Native instrumentation provides built-in Operational Intelligence, allowing DevOps teams to monitor and manage how business users curate and prepare data for analysis.


Recent innovation in data storage technologies offers new and efficient ways to store and organize data. Although big data storage engines like Hadoop and NoSQL databases simplify the task of collecting and managing information, asking meaningful questions of such data remains a challenge. Without a robust data language, every query becomes a program for parsing and creating more data copies.

StreamScape's query engine lets you move computation to the data or use streaming analytics to process data without storing it, reducing data copies and allowing users to process an unbounded amount of data with limited hardware resources. The engine supports user-defined functions (UDFs) written in SQL, Java, Python or RPL that can access local files, remote databases or in-memory datasets for optimal performance. Built-in support for parallelism, transactions, SQL, unstructured data and native file system storage accelerates the creation of curated data.
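
To make the "process data without storing it" idea concrete, here is a small Python sketch of a streaming aggregation: a running average computed over an incoming stream in constant memory. A function like this could be packaged as a Python UDF; the registration mechanism is not shown because it is specific to the platform.

```python
from typing import Iterable, Iterator

def running_average(values: Iterable[float]) -> Iterator[float]:
    """Emit the average seen so far for each incoming value.

    Only a count and a running total are kept in memory, so the input
    stream can be arbitrarily large without exhausting resources.
    """
    count, total = 0, 0.0
    for v in values:
        count += 1
        total += v
        yield total / count

# Example: feed a stream of readings and observe the rolling result.
readings = (x * 0.5 for x in range(1, 6))   # stand-in for a live feed
for avg in running_average(readings):
    print(f"running average: {avg:.2f}")
```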

The engine provides dozens of functions for data aggregation, transformation and analysis that allow business users to easily work with semi-structured, multi-structured and time-series data, including automatic transformation of XML, JSON, Web Queries and Text into structured datasets. Creating data mash-ups and blending together information from various sources becomes an easy and intuitive task.
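
As an illustration of the kind of flattening involved, the snippet below turns a JSON document and an XML fragment into uniform rows using only the Python standard library. The sample payloads are invented for the example; in the platform this transformation happens automatically.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample payloads standing in for real feeds.
JSON_DOC = '{"orders": [{"id": 1, "total": 120.5}, {"id": 2, "total": 75.0}]}'
XML_DOC = '<orders><order id="3" total="42.0"/><order id="4" total="18.9"/></orders>'

# Flatten JSON: each element of the "orders" array becomes one row.
json_rows = [
    {"id": o["id"], "total": o["total"]}
    for o in json.loads(JSON_DOC)["orders"]
]

# Flatten XML: each <order> element becomes one row with typed fields.
xml_rows = [
    {"id": int(e.get("id")), "total": float(e.get("total"))}
    for e in ET.fromstring(XML_DOC).iter("order")
]

rows = json_rows + xml_rows   # a single structured dataset, ready to query
print(rows)
```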

StreamScape lets you define how data collections from disparate sources relate to each other, creating a unified model and a single dataspace. The engine then uses that information to join collections, documents or objects as needed, allowing data visualization tools to make use of these relationships for query and reporting. Any complex join logic can be expressed with StreamScape's DSQL. Visualization and analysis tools such as Tableau, Power BI, Excel or Qlik can then be used to present blended and curated data to business users.
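
DSQL syntax is not reproduced here, but the join it expresses is easy to picture. The Python sketch below blends two hypothetical collections on a shared key; in the platform this logic runs inside the engine and is surfaced to visualization tools as a single curated dataset.

```python
# Hypothetical collections: CRM accounts and web analytics sessions.
accounts = [
    {"account_id": "A1", "name": "Acme", "segment": "Enterprise"},
    {"account_id": "A2", "name": "Globex", "segment": "SMB"},
]
sessions = [
    {"account_id": "A1", "page_views": 42},
    {"account_id": "A1", "page_views": 17},
    {"account_id": "A2", "page_views": 5},
]

# Join sessions to accounts on account_id (the declared relationship),
# producing one blended, report-ready dataset.
by_id = {a["account_id"]: a for a in accounts}
blended = [
    {**by_id[s["account_id"]], "page_views": s["page_views"]}
    for s in sessions
    if s["account_id"] in by_id
]
print(blended)
```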


Pipeline Processing lets you link a sequence of data processing tasks into a process that automates data preparation. Common tasks like transformation, cleansing and validation can be made part of a pipeline, as can data analysis logic that turns data into computation results used for financial reporting or scientific analysis. Pipelines can be triggered by the arrival of new data, scheduled to run at specific times or called from web applications and visualization tools. Pipelines can be used to accumulate data in the background, remove duplicates, errors or inconsistencies, and automate the process of answering critical business questions in dynamically changing environments.
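
A conceptual sketch of the pattern in plain Python: cleansing, deduplication and aggregation steps are chained and run over a batch of records. In StreamScape, pipelines are declared in the platform and can be event-triggered or scheduled; the step names and sample records here are illustrative.

```python
from functools import reduce

# Illustrative pipeline steps; real pipelines are declared in the platform.
def cleanse(records):
    """Drop records with missing amounts and strip whitespace from names."""
    return [
        {**r, "name": r["name"].strip()}
        for r in records
        if r.get("amount") is not None
    ]

def deduplicate(records):
    """Keep the first record seen for each id."""
    seen, out = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out

def total_by_name(records):
    """Aggregate amounts per name for downstream reporting."""
    totals = {}
    for r in records:
        totals[r["name"]] = totals.get(r["name"], 0) + r["amount"]
    return totals

pipeline = [cleanse, deduplicate, total_by_name]
raw = [
    {"id": 1, "name": " Acme ", "amount": 100},
    {"id": 1, "name": " Acme ", "amount": 100},   # duplicate
    {"id": 2, "name": "Globex", "amount": None},  # invalid
    {"id": 3, "name": "Acme", "amount": 50},
]
result = reduce(lambda data, step: step(data), pipeline, raw)
print(result)   # {'Acme': 150}
```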


The engine automatically exposes Functions, Services and Queries as JDBC sources, OData-compliant collections or REST-based APIs that can be accessed natively by desktop analytics tools, applications or any mobile device. No coding or other prep work is needed. Structured or unstructured data can be made to look like tables, JSON or XML documents based on application needs. Users can describe, prepare and present data using a single tool, regardless of data format or complexity. Powerful Schema-On-Read capabilities let you describe data on the fly without the need for complex modeling or up-front schema design.
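
As a sketch of what consuming one of these auto-exposed endpoints could look like from Python, the example below fetches rows with the requests library. The host name, collection name and query shape are placeholders; OData services conventionally return their rows in a "value" array.

```python
import requests

# Placeholder endpoint: an OData collection exposed by the platform.
BASE_URL = "https://streamscape.example.com/odata"   # hypothetical host
COLLECTION = "CuratedOrders"                         # hypothetical collection

def fetch_rows(top=100):
    """Fetch up to `top` rows from the collection as plain dicts."""
    resp = requests.get(
        f"{BASE_URL}/{COLLECTION}",
        params={"$top": top, "$format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    # OData responses carry their rows in a "value" array.
    return resp.json().get("value", [])

if __name__ == "__main__":
    for row in fetch_rows(top=10):
        print(row)
```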

Get started with the Reactive Data Platform™ today!


Semantic Types let you create reusable data objects that represent real-world things such as Addresses, Patient Records or Financial information. Types can be used to model query results, map file contents or as part of data collections to describe documents and unstructured data. Data dictionaries can be organized into Ontologies that allow the user to discover or configure relationships between user-defined data elements. Popular tools like Tableau, Qlik, Power BI and Excel can automatically make use of the declared relationships to guide users in the data discovery process.
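
As a rough code analogy, a semantic type plays the role of a typed, reusable record. The Python dataclasses below mimic an Address type reused inside a PatientRecord; in StreamScape these definitions live in the data dictionary and ontology rather than in application code, and the field names here are invented.

```python
from dataclasses import dataclass, field
from typing import List

# Invented fields; a rough code analogy for reusable semantic types.
@dataclass
class Address:
    street: str
    city: str
    postal_code: str

@dataclass
class PatientRecord:
    patient_id: str
    name: str
    home_address: Address              # the Address type is reused here
    visit_dates: List[str] = field(default_factory=list)

record = PatientRecord(
    patient_id="P-7",
    name="J. Doe",
    home_address=Address("1 Main St", "Springfield", "01101"),
    visit_dates=["2024-03-01"],
)
print(record)
```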

