
Data Engineering

Data engineering is the practice of designing, building, and maintaining large-scale data systems that collect, store, process, and analyze massive amounts of data from diverse sources. It draws on computer science, mathematics, and domain-specific knowledge to create robust, scalable, and efficient data processing systems.
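The collect–store–process–analyze cycle described above can be sketched as a minimal extract-transform-load (ETL) pipeline using only the Python standard library. The CSV payload and table schema are invented for illustration; real pipelines read from APIs, logs, or object storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from an API, a log, or a file.
RAW_CSV = """user_id,amount
1,10.50
2,3.25
1,7.00
"""

def extract(raw: str) -> list[dict]:
    """Extract: parse raw CSV text into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple[int, float]]:
    """Transform: cast types and aggregate spend per user."""
    totals: dict[int, float] = {}
    for row in rows:
        uid = int(row["user_id"])
        totals[uid] = totals.get(uid, 0.0) + float(row["amount"])
    return sorted(totals.items())

def load(records: list[tuple[int, float]]) -> sqlite3.Connection:
    """Load: write the aggregates into a queryable store."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE spend (user_id INTEGER PRIMARY KEY, total REAL)")
    con.executemany("INSERT INTO spend VALUES (?, ?)", records)
    return con

con = load(transform(extract(RAW_CSV)))
print(con.execute("SELECT user_id, total FROM spend").fetchall())
# → [(1, 17.5), (2, 3.25)]
```

The tools in the tables below industrialize exactly these three stages: connectors for extraction, engines for transformation, and warehouses for loading.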

Leading Platforms

| Name | OSS | Comment |
| --- | --- | --- |
| Snowflake | | |
| Talend | | |
| Databricks | | |
| SingleStore | | |

ETL / ELT

| Name | OSS | Comment |
| --- | --- | --- |
| Apache Airflow | 👍 | Programmatically author, schedule, and monitor workflows |
| Airbyte | 👍 | ETL pipelines |
| Dagster | 👍 | Cloud-native orchestrator for data pipelines |
| Meltano | 👍 | |
| Stitch | | Move data from multiple sources into a data warehouse |
| Hevo | | Automated data pipelines to Redshift, BigQuery, Snowflake |
| Rivery | | ETL pipelines |
| Fivetran | | ETL pipelines |
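Orchestrators such as Airflow and Dagster model a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its upstream dependencies finish. A minimal sketch of that idea with the standard library's `graphlib`; the task names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: task -> set of upstream dependencies,
# mirroring how orchestrators like Airflow declare a DAG.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"validate"},
    "notify": {"load"},
}

def run(dag: dict[str, set[str]]) -> list[str]:
    """Execute tasks in dependency order, as a scheduler would."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        print(f"running {task}")  # a real orchestrator runs an operator here
    return order

run(dag)
# → extract, transform, validate, load, notify
```

Real orchestrators add what this sketch omits: scheduling, retries, backfills, and parallel execution of independent branches.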

Event Streaming & Data Streams

| Name | OSS | Comment |
| --- | --- | --- |
| Apache Kafka | 👍 | Distributed event streaming platform |
| Apache Flink | 👍 | Stateful computations over data streams |
| Confluent | | Data in motion |
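The core pattern behind these platforms is producers appending events to a log while consumers process them in order, keeping running state as they go. A toy in-process stand-in using a thread-safe queue (the event names and sentinel are invented for illustration; Kafka topics are durable, partitioned, and unbounded):

```python
import queue
import threading

# Minimal in-process stand-in for an event log such as a Kafka topic.
topic: queue.Queue = queue.Queue()
SENTINEL = object()  # marks end of stream; real event logs are unbounded

def producer(events):
    """Append events to the log, then signal completion."""
    for event in events:
        topic.put(event)
    topic.put(SENTINEL)

def consumer(results):
    """Consume events in arrival order, keeping running state per key
    (the kind of stateful computation Flink performs over streams)."""
    while True:
        event = topic.get()
        if event is SENTINEL:
            break
        results[event] = results.get(event, 0) + 1

counts: dict[str, int] = {}
t_prod = threading.Thread(target=producer, args=(["click", "view", "click"],))
t_cons = threading.Thread(target=consumer, args=(counts,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(counts)  # → {'click': 2, 'view': 1}
```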

Data Development Tools

| Name | OSS | Comment |
| --- | --- | --- |
| Apache Spark | 👍 | Unified engine for large-scale data analytics |
| Apache Beam | 👍 | Unified model for batch and streaming data processing |
| Dask | 👍 | Scale the Python tools you love |
| dbt Core | 👍 | Transform data |
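Engines like Spark and Beam distribute work by splitting data into partitions, computing partial results per partition (map), and merging them (reduce). The classic word count in pure Python shows the shape of that model; the corpus is invented and everything runs in one process:

```python
from collections import Counter
from functools import reduce

# Toy corpus; a real job would read partitions from distributed storage.
lines = ["data engineering at scale", "data pipelines", "scale out"]

# Map phase: each line (a "partition") yields partial (word, count) tallies.
partials = [Counter(line.split()) for line in lines]

# Reduce phase: merge the partial tallies, as Spark/Beam do across workers.
totals = reduce(lambda a, b: a + b, partials, Counter())
print(totals)
```

The distributed engines add fault tolerance, shuffles, and cluster scheduling on top of this same map/reduce skeleton.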

Notebooks & Visualizations

| Name | OSS | Comment |
| --- | --- | --- |
| Jupyter | 👍 | |
| Observable | | |
| Deepnote | | |
| Hex | | |
| Streamlit | 👍 | Build and share data apps in Python |
| Steep | | |
| Hashboard | | |
| Metabase | 👍 | |

More resources