gchoy/awesome-data-engineering

Awesome Data Engineering

A curated list of data engineering tools for software developers

Contents

  1. Databases
  2. Data Ingestion
  3. File System
  4. File Format
  5. Stream Processing
  6. Batch Processing
  7. Front End
  8. Frameworks

Databases

Data Ingestion

File System

File Format

  • Apache Avro Apache Avro™ is a data serialization system
  • Apache Parquet Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
  • Apache Thrift The Apache Thrift software framework, for scalable cross-language services development
  • ProtoBuf Protocol Buffers - Google's data interchange format
  • SequenceFile SequenceFile is a flat file consisting of binary key/value pairs. It is extensively used in MapReduce as an input/output format.
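The formats above differ mainly in how records are laid out. As a rough illustration only (plain Python standard library, not the API of any project listed here), the sketch below contrasts a flat file of length-prefixed binary key/value records, in the spirit of SequenceFile, with a row-to-column pivot, the basic idea behind columnar formats such as Parquet. The record layout and the helpers `write_kv_records`, `read_kv_records` and `to_columns` are hypothetical; real SequenceFile and Parquet files add headers, sync markers or row groups, encodings and compression.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, List, Tuple

# Hypothetical toy layout, NOT the real Hadoop SequenceFile format:
# each record is [key length][value length][key bytes][value bytes],
# with both lengths stored as big-endian unsigned 32-bit integers.
_HEADER = struct.Struct(">II")


def write_kv_records(stream: BinaryIO, records: List[Tuple[bytes, bytes]]) -> None:
    """Write binary key/value pairs back to back (a flat, row-oriented file)."""
    for key, value in records:
        stream.write(_HEADER.pack(len(key), len(value)))
        stream.write(key)
        stream.write(value)


def read_kv_records(stream: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Read records sequentially until the stream is exhausted.

    A truncated header raises struct.error; this sketch does no recovery.
    """
    while True:
        header = stream.read(_HEADER.size)
        if not header:
            return
        key_len, value_len = _HEADER.unpack(header)
        yield stream.read(key_len), stream.read(value_len)


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot rows into one list per column, the basic idea behind columnar
    formats such as Parquet (without their encodings or compression).

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


if __name__ == "__main__":
    buf = io.BytesIO()
    write_kv_records(buf, [(b"user:1", b"alice"), (b"user:2", b"bob")])
    buf.seek(0)
    print(list(read_kv_records(buf)))
    # [(b'user:1', b'alice'), (b'user:2', b'bob')]
    print(to_columns([{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]))
    # {'id': [1, 2], 'name': ['alice', 'bob']}
```

A row-oriented file is cheap to append to and scan record by record, while a columnar layout lets a query that needs only one column skip the others entirely.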

Stream Processing

  • Spark Streaming Spark Streaming makes it easy to build scalable fault-tolerant streaming applications.
  • Apache Flink Apache Flink is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.
  • Apache Storm Apache Storm is a free and open source distributed realtime computation system
    • Pyleus Pyleus is a Python framework for developing and launching Storm topologies.
    • Streamparse Streamparse, by Parse.ly, lets you run Python code against real-time streams of data with Apache Storm.
  • Apache Samza Apache Samza is a distributed stream processing framework
  • Apache NiFi Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.
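Several of the engines above express work as operators over an unbounded stream, and a common operator is a windowed aggregation. The following is a minimal single-process sketch of a tumbling-window word count in plain Python. `tumbling_window_counts` is a hypothetical helper, not part of Spark Streaming, Flink, Storm, Samza or NiFi, and it assumes events arrive in timestamp order. Real engines add distribution, fault-tolerant state, and handling for late or out-of-order events (for example, watermarks in Flink).

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Counter]]:
    """Count words in fixed, non-overlapping time windows.

    Assumes events arrive in non-decreasing timestamp order. A window is
    emitted as (window_start, counts) as soon as an event from a later
    window arrives; the final window is flushed when the input ends.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    current_start: Optional[int] = None
    counts: Counter = Counter()
    for timestamp, word in events:
        # Align each timestamp to the start of its window, e.g. 14 -> 10 for 10 s windows.
        window_start = timestamp - timestamp % window_seconds
        if current_start is not None and window_start != current_start:
            yield current_start, counts  # the previous window is complete
            counts = Counter()
        current_start = window_start
        counts[word] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last, possibly partial, window


if __name__ == "__main__":
    events = [(0, "spark"), (3, "flink"), (4, "spark"), (12, "storm"), (14, "spark")]
    for start, counts in tumbling_window_counts(events, window_seconds=10):
        print(start, dict(counts))
    # 0 {'spark': 2, 'flink': 1}
    # 10 {'storm': 1, 'spark': 1}
```

Emitting a window only when a later event arrives keeps just one open window in memory, at the cost of delaying each result until the stream moves past it.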

Batch Processing

Front End

Frameworks

ELK (Elasticsearch, Logstash, Kibana)

Docker

  • Flocker Easily manage Docker containers & their data

Datasets

Realtime

Data Dumps

Cheers to The Data Engineering Ecosystem: An Interactive Map

Inspired by the awesome list. Created by Insight Data Engineering fellows.

License

CC0

To the extent possible under law, Igor Barinov has waived all copyright and related or neighboring rights to this work.
