Data Processing

Powerful Data Processing Engine

"Having built routers that process data at terabits per second, performance has always been the core differentiator of API AutoFlow"
Co-Founder / Chief Architect, David Jung

Raw data, like unrefined gold buried deep in a mine, is a precious resource for modern businesses.

API AutoFlow Data Transformation
Transformations: Joining, Filtering, Splitting, Cleansing, Aggregation, Validation, Key Restructuring, Bucketing & Binning, Derivation, Deduplication, Integration, Format Revision, Z-Score Normalization and Max-Min Scaling
Data sources: data warehouse, RDBMS, files, SaaS, mainframe, apps

Simply drag and drop actions to work flexibly with strings, arrays, objects, and more.

Bucketing/Binning

Grouping values by range, i.e., categorizing a large number of continuous values into a smaller number of bins (buckets), say, from {1,3,5…} to {1-5, 8-10, 14-18…}.

The Object and String actions are commonly used to perform bucketing.
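To make the idea concrete, here is a minimal sketch in plain Python rather than in API AutoFlow actions. The function names `bucket_label` and `bucketize`, the bin width of 5, and the choice to start bins at 1 are all assumptions made for illustration, and the sketch only handles positive integers.

```python
from collections import defaultdict

def bucket_label(value, width=5):
    """Return a label such as '1-5' for the fixed-width bin containing value."""
    # Assumption: bins start at 1 and are `width` wide: 1-5, 6-10, 11-15, ...
    low = ((value - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

def bucketize(values, width=5):
    """Group values under the label of the bin they fall into."""
    bins = defaultdict(list)
    for value in values:
        bins[bucket_label(value, width)].append(value)
    return dict(bins)

readings = [1, 3, 5, 8, 10, 14, 18]  # illustrative data
buckets = bucketize(readings)
```

With this input, `buckets` groups the seven readings into four bins labelled `1-5`, `6-10`, `11-15`, and `16-20`.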

Data Aggregation

The process of gathering data and expressing it in summary form for statistical analysis and visualization. For example, data can be aggregated over a given time period to provide statistics such as average, minimum, maximum, sum, and count. The aggregated data can then be analyzed to gain insights about particular resources or resource groups.

The Iteration/Map, Array, and String actions are commonly used to perform aggregation.
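The statistics named above can be sketched in a few lines of plain Python. This is only an illustration, not API AutoFlow's implementation; the helper name `aggregate` and the sample response times are hypothetical.

```python
from statistics import mean

def aggregate(values):
    """Summarize a non-empty list of numbers (min/max/mean fail on empty input)."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "average": mean(values),
    }

response_times_ms = [120, 80, 100, 140]  # illustrative data for one time window
summary = aggregate(response_times_ms)
```

Running the same function over each time window produces a compact series that is easier to chart or compare than the raw values.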

Data Cleansing

The process of analyzing, identifying, and correcting messy, raw data. Also referred to as data scrubbing and data cleaning, data cleansing relies on the careful analysis of datasets to provide the most accurate data possible.

The Iteration/Filter, Data/Set, and String actions are commonly used to perform cleansing.
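As one possible illustration in plain Python (not API AutoFlow's action syntax), the sketch below trims and collapses stray whitespace in string fields and turns empty strings into `None`. The function name `cleanse` and the sample record are assumptions; real cleansing rules depend on the dataset.

```python
def cleanse(record):
    """Return a copy of record with tidied string fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            # Collapse runs of whitespace and strip leading/trailing spaces.
            value = " ".join(value.split())
            # Treat empty strings as missing values.
            if value == "":
                value = None
        cleaned[key] = value
    return cleaned

raw = {"name": "  Ada   Lovelace ", "email": "", "age": 36}  # illustrative data
clean = cleanse(raw)
```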

Data Deduplication

The process of eliminating redundant copies of data, which significantly decreases storage capacity requirements. Also referred to as single-instance storage, intelligent compression, commonality factoring, or data reduction, deduplication allows you to store one unique copy of data in your data warehouse or database.

The Array/Unique and Conditional/IF actions are commonly used to perform deduplication.
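A minimal sketch of the idea in plain Python, assuming records are dictionaries and that two records count as duplicates when they share the same value for a chosen field. The helper name `deduplicate` and the sample data are hypothetical.

```python
def deduplicate(records, key):
    """Keep the first record seen for each distinct value of `key`."""
    seen = set()
    unique = []
    for record in records:
        marker = record[key]
        if marker not in seen:  # conditional check: first time we see this value?
            seen.add(marker)
            unique.append(record)
    return unique

records = [  # illustrative data
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},
]
unique = deduplicate(records, "email")
```

Keeping the first occurrence is a design choice; keeping the most recent record instead would only require iterating in reverse.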

Data Derivation

The creation of special rules to “derive” the specific information you want from the data source. For example, let's say you have revenue data from sales, but you want the profit figures derived after subtracting costs and tax liabilities.

The Iteration/ForEach and Data/Expression actions are commonly used to perform derivation.
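The profit example can be sketched in plain Python as follows. The field names `revenue`, `costs`, and `tax`, and the function `derive_profit`, are assumptions for illustration; the rule itself is simply profit = revenue - costs - tax.

```python
def derive_profit(sale):
    """Return a copy of the sale with a derived `profit` field added."""
    profit = sale["revenue"] - sale["costs"] - sale["tax"]
    return {**sale, "profit": profit}

sales = [  # illustrative data
    {"id": 1, "revenue": 1000, "costs": 600, "tax": 100},
    {"id": 2, "revenue": 500, "costs": 450, "tax": 20},
]
with_profit = [derive_profit(sale) for sale in sales]
```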

Data Filtering

The process of refining datasets. The goal is to eliminate repeated, irrelevant, or overly sensitive data.

The Iteration/Filter action is commonly used to perform filtering.
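As a rough plain-Python illustration (not API AutoFlow's own syntax), the sketch below drops irrelevant records with a predicate and strips sensitive fields from the records that remain. The set `SENSITIVE_FIELDS`, the function `filter_records`, and the sample users are all hypothetical.

```python
SENSITIVE_FIELDS = {"ssn", "password"}  # assumption: fields that must not pass through

def filter_records(records, predicate):
    """Keep records matching predicate and remove sensitive fields from them."""
    return [
        {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
        for record in records
        if predicate(record)
    ]

users = [  # illustrative data
    {"name": "Ada", "active": True, "ssn": "000-00-0000"},
    {"name": "Bob", "active": False, "ssn": "111-11-1111"},
]
active_users = filter_records(users, lambda record: record["active"])
```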

Data Integration & Merge

The process of merging the data into the same structure or schema. Used for data warehousing purposes, data integration supports the processing of massive data sets by merging multiple data sources into an easy-to-analyze whole.

The Iteration/ForEach, String/Join, and Database actions are commonly used to perform integration.
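One way to picture merging into a single schema, sketched in plain Python: each source gets a small mapping function that converts its rows to the shared structure. The two sources, their field names, and the target schema (`customer_id`, `email`) are invented for this example.

```python
def from_crm(row):
    """Map a row from a hypothetical CRM export to the shared schema."""
    return {"customer_id": row["id"], "email": row["email_address"].lower()}

def from_billing(row):
    """Map a row from a hypothetical billing system to the shared schema."""
    return {"customer_id": row["cust_no"], "email": row["mail"].lower()}

crm_rows = [{"id": 1, "email_address": "A@Example.com"}]      # illustrative data
billing_rows = [{"cust_no": 2, "mail": "b@example.com"}]      # illustrative data

# After mapping, rows from both sources share one structure and can be combined.
merged = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```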

Data Joining

The process of connecting two or more datasets. Joining establishes a relationship between the datasets, typically through a shared key, and merges their data so you can access correlated data from multiple sources.

The Iteration/Map and String/Join actions are commonly used to perform joining.
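A minimal sketch of an inner join in plain Python, assuming both datasets are lists of dictionaries that share a key field. The function name `inner_join` and the customer and order data are hypothetical; other join types (left, outer) would treat unmatched rows differently.

```python
def inner_join(left, right, key):
    """Combine rows from left and right that share the same value for `key`."""
    # Index the right-hand rows by key so each lookup is fast.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

customers = [  # illustrative data
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [  # illustrative data
    {"customer_id": 1, "total": 50},
    {"customer_id": 1, "total": 20},
    {"customer_id": 3, "total": 5},
]
joined = inner_join(customers, orders, "customer_id")
```

Here only customer 1 has matching orders, so the result contains two rows; customer 2 and the order for customer 3 are left out.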

Data Splitting

The process of dividing data into smaller parts, such as a single column into multiple columns. Splitting is also useful for developing "training" and "testing" sets, and for breaking up a large amount of data gathered over a period of time.

The Array/Split, String/Split, and Object/Split actions are commonly used to perform splitting.
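Both kinds of splitting can be sketched in plain Python. The helpers `split_name` and `train_test_split`, the `full_name` field, and the 80/20 ratio are assumptions for illustration; in particular, this simple split does not shuffle the rows, which a real training/testing split usually would.

```python
def split_name(record):
    """Split a single `full_name` field into `first_name` and `last_name`."""
    first, _, last = record["full_name"].partition(" ")
    rest = {k: v for k, v in record.items() if k != "full_name"}
    return {**rest, "first_name": first, "last_name": last}

def train_test_split(rows, train_fraction=0.8):
    """Split rows into two lists at the given fraction, without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

person = split_name({"id": 7, "full_name": "Ada Lovelace"})  # illustrative data
train, test = train_test_split(list(range(10)))
```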

Data Summarization

Similar to data aggregation, it refers to presenting a summary of generated data in an easily comprehensible and informative manner. For example, you might sum the total revenue of all sales, or of an individual salesperson; then, for the Head of Sales, you create sales metrics that reveal total sales in a given time period.

The Iteration/Map, Data/Expression, and String/Join actions are commonly used to perform summarization.
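The sales example might look like this in plain Python. The function `total_by`, the field names, and the figures are hypothetical; the point is only to show totals per salesperson, a grand total, and a readable one-line report.

```python
from collections import defaultdict

def total_by(rows, group_key, value_key):
    """Sum `value_key` for each distinct value of `group_key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

sales = [  # illustrative data
    {"salesperson": "Kim", "amount": 200},
    {"salesperson": "Lee", "amount": 150},
    {"salesperson": "Kim", "amount": 50},
]
per_person = total_by(sales, "salesperson", "amount")
grand_total = sum(per_person.values())
# Join the per-person totals into a short, human-readable line.
report = ", ".join(f"{name}: {total}" for name, total in sorted(per_person.items()))
```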

Data Validation

The process of creating rules for the system to handle different data issues. A common use case is to ensure the accuracy and quality of the data you transform.

The Condition/Match and Iteration/Any-is-true actions are commonly used to perform validation.
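A small rule-based validator, sketched in plain Python, shows the general shape. The two rules, the field names, and the deliberately simple email pattern are assumptions for illustration and are not a complete email check.

```python
import re

# Assumption: a loose pattern that only checks for "something@something.something".
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(record):
    """Return a list of rule violations; an empty list means the record passed."""
    errors = []
    age = record.get("age")
    if not isinstance(age, int) or age < 0:
        errors.append("age must be a non-negative integer")
    if not EMAIL_PATTERN.match(str(record.get("email", ""))):
        errors.append("email is not well formed")
    return errors

good = validate({"age": 36, "email": "ada@example.com"})  # illustrative data
bad = validate({"age": -1, "email": "nope"})              # illustrative data
has_problem = any(bad)  # true if any rule failed
```

Returning a list of messages rather than a single boolean makes it possible to report every issue at once.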

Format Revision

The process of fixing problems that stem from fields having different data types. For example, some fields might be numeric, and others might be text.

The String/to-Integer, Array/to-Object, JSON/Decode, and DateTime/Unix-to-UTC actions are commonly used to perform format revision.
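The kinds of conversions involved can be illustrated with Python's standard library. This is not API AutoFlow's implementation; the payload, its field names, and the sample values are invented for the example.

```python
import json
from datetime import datetime, timezone

# Decode a JSON string into a Python object.
payload = json.loads('{"quantity": "42", "created": 86400}')  # illustrative data

# Text field that should be numeric: convert string to integer.
quantity = int(payload["quantity"])

# Unix timestamp (seconds) to an ISO 8601 UTC string.
created_utc = datetime.fromtimestamp(payload["created"], tz=timezone.utc).isoformat()

# Array of [key, value] pairs to an object (dictionary).
pairs = [["a", 1], ["b", 2]]
as_object = dict(pairs)
```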

Key Restructuring

The process of transforming keys with built-in meanings into generic keys, whether in a database or a data object. When keys carry built-in meanings, serious problems can develop if a key needs to be changed.

The Iteration/Find-Index, Object/Keys, and Database actions are commonly used to perform key restructuring.
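One common reading of "generic keys" is a surrogate key: a meaningless sequential identifier that stands in for a meaningful one. The plain-Python sketch below takes that interpretation as an assumption; the function `assign_surrogate_keys`, the `sku` field, and the sample products are hypothetical.

```python
def assign_surrogate_keys(records, natural_key):
    """Add a generic sequential `id` to each record, based on its natural key."""
    mapping = {}          # natural key value -> generic id
    restructured = []
    for record in records:
        natural = record[natural_key]
        if natural not in mapping:
            mapping[natural] = len(mapping) + 1
        restructured.append({"id": mapping[natural], **record})
    return restructured, mapping

products = [  # illustrative data: the SKU encodes a region, which may change
    {"sku": "US-EAST-0042", "name": "Widget"},
    {"sku": "EU-WEST-0007", "name": "Gadget"},
    {"sku": "US-EAST-0042", "name": "Widget"},
]
restructured, mapping = assign_surrogate_keys(products, "sku")
```

Because other data can refer to the generic `id`, the meaningful `sku` can later change without breaking those references.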

Z-Score Normalization and Max-Min Scaling

Though not a common use case for API AutoFlow, Z-Score normalization and min-max scaling are increasingly used by data scientists. By applying the formula once in the `Data/Expression` action, incoming data can be normalized in real time.

The Data/Expression action is commonly used to perform Z-Score normalization.
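For reference, the two formulas are z = (x - mean) / standard deviation and scaled = (x - min) / (max - min). The plain-Python sketch below applies them to a small list; the function names and sample data are hypothetical, and using the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z-Score normalization: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # zero if all values are equal, which would fail below
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max scaling to the range 0..1: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)  # hi == lo would cause a division by zero
    return [(v - lo) / (hi - lo) for v in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data: mean 5, standard deviation 2
normalized = z_scores(data)
scaled = min_max(data)
```

For this input the first value maps to a z-score of -1.5 and the last to 2.0, while min-max scaling maps them to 0.0 and 1.0.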