Data engineering is the practice of designing, building, and maintaining systems that collect, store, process, and serve large volumes of data from many sources. It draws on computer science, software engineering, and domain knowledge to build robust, scalable, and efficient data pipelines.
## ETL / ELT
| Name | OSS | Comment |
|---|---|---|
| Apache Airflow | 👍 | Programmatically author, schedule and monitor workflows. |
| Airbyte | 👍 | ETL Pipelines |
| Dagster | 👍 | Cloud-native orchestrator for data pipelines |
| Meltano | 👍 | ELT platform built on the Singer taps/targets ecosystem |
| Stitch | | Move data from multiple sources into a data warehouse |
| Hevo | | Automated Data Pipelines to Redshift, BigQuery, Snowflake |
| Rivery | | ETL Pipelines |
| Fivetran | | ETL Pipelines |
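The tools above all automate variants of the extract, transform, load pattern. A minimal sketch of that pattern in plain Python (hypothetical CSV source and in-memory SQLite target, not tied to any tool listed here) shows what such pipelines do under the hood:

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: parse rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: normalize fields and drop incomplete records."""
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r.get("name") and r.get("amount")
    ]

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (:name, :amount)", rows)
    conn.commit()

# Hypothetical source data; the third row is incomplete and gets filtered out.
source = "name,amount\nalice,10.5\nbob,3\n,99\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
# -> (2, 13.5)
```

Orchestrators such as Airflow or Dagster add what this sketch lacks: scheduling, retries, dependency graphs between steps, and monitoring.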
## Event Streaming & Data Streams
| Name | OSS | Comment |
|---|---|---|
| Apache Spark | 👍 | Unified engine for large-scale data analytics |
| Apache Beam | 👍 | Unified model for batch and streaming data processing |
| Dask | 👍 | Scale the Python tools you love |
| dbt Core | 👍 | Transform data in the warehouse with SQL |
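Engines like Spark and Beam expose one API over both bounded (batch) and unbounded (streaming) data. A toy illustration of that unified-model idea in plain Python generators (hypothetical event shape, no real engine involved): the same transformation consumes a finite list or an endless stream alike, because it iterates lazily.

```python
from typing import Iterable, Iterator

def clean_events(events: Iterable[dict]) -> Iterator[dict]:
    """One transformation for batch and streaming input: it consumes
    any iterable lazily, so a finite list (batch) and an endless
    generator (stream) are handled by the same code."""
    for e in events:
        if e.get("value") is not None:
            yield {"user": e["user"], "value": float(e["value"])}

# Batch: a bounded, in-memory dataset.
batch = [{"user": "a", "value": "1"}, {"user": "b", "value": None}]
print(list(clean_events(batch)))  # -> [{'user': 'a', 'value': 1.0}]

# Streaming: an unbounded source, consumed one event at a time.
def sensor_stream():
    i = 0
    while True:
        yield {"user": "sensor", "value": i}
        i += 1

stream = clean_events(sensor_stream())
print(next(stream))  # -> {'user': 'sensor', 'value': 0.0}
```

Real engines add the hard parts this sketch omits: distribution across machines, fault tolerance, and windowing over event time.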
## Notebooks & Visualizations

## More resources