This repository provides a complete setup for a modern data engineering stack built on powerful open-source tools. The stack supports real-time and batch data processing, orchestration, and storage, with seamless containerized deployment.
## Tech Stack

### Apache Airflow
- Workflow orchestration and scheduling.
- Manages end-to-end data pipelines.
- DAG-based execution for automation.
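A DAG in this style can be sketched as a small pipeline definition. This is an illustrative sketch, not code from this repository: the `beehiiv_ingest` dag_id, the schedule, and the task are assumptions, and the Airflow imports are kept inside a factory function so the module also loads where Airflow is not installed.

```python
from datetime import datetime


def extract_events() -> list:
    # Placeholder extract step; a real task would read from Kafka or an API.
    return [{"event": "subscribe", "user_id": 1}]


def build_dag():
    # Airflow imports stay inside the factory so this file remains
    # importable in environments without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="beehiiv_ingest",          # assumed name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@hourly",      # assumed cadence
        catchup=False,
    ) as dag:
        PythonOperator(task_id="extract_events", python_callable=extract_events)
    return dag
```

Airflow discovers such a DAG once the file is placed in the `dags/` folder and the scheduler is running.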
### Python
- Primary programming language for data processing and orchestration.
- Used in ETL scripts, Kafka consumers, and data transformation tasks.
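As a concrete sketch of the kind of transformation such scripts perform (the field names `email` and `ts` are illustrative assumptions, not taken from this repository):

```python
import json


def transform_event(raw: bytes) -> dict:
    """Normalize one raw JSON event (e.g. read from Kafka) into a flat record.

    The field names ("email", "ts") are illustrative assumptions.
    """
    event = json.loads(raw)
    return {
        "email": event.get("email", "").strip().lower(),
        "ts": event.get("ts"),
        "source": "beehiiv",  # constant tag added during the transform
    }
```

Keeping the transformation a pure function like this makes it easy to unit-test before wiring it into a consumer or an Airflow task.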
### Apache Kafka
- Distributed event streaming platform for real-time data ingestion.
- Handles high-throughput and low-latency data streams.
- Integrates seamlessly with Spark and ClickHouse.
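A consumer loop for this stack might look like the sketch below; the `kafka-python` client, topic name, and port are assumptions (the repository may use a different client library), and the client import is deferred so the decoding helper can be tested without a running broker.

```python
import json


def decode_message(value: bytes) -> dict:
    # Deserialize one Kafka message payload, assumed to be UTF-8 JSON.
    return json.loads(value.decode("utf-8"))


def consume(topic: str = "beehiiv-events", servers: str = "localhost:9092"):
    # kafka-python is an assumption; imported lazily so decode_message
    # remains usable without a broker available.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, bootstrap_servers=servers,
                             auto_offset_reset="earliest")
    for message in consumer:
        print(decode_message(message.value))
```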
### Apache ZooKeeper
- Manages and coordinates Kafka brokers.
- Provides leader election and distributed synchronization.
### Apache Spark
- Distributed data processing engine for real-time and batch workloads.
- Utilized for transformations, aggregations, and analytics.
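A batch aggregation in this style can be sketched as follows; the input path, `date` column, and app name are assumptions, and a plain-Python helper mirrors the Spark `groupBy` so the logic can be checked without a cluster.

```python
def daily_counts(rows: list) -> dict:
    """Count events per date in plain Python; mirrors the Spark groupBy below."""
    counts = {}
    for row in rows:
        counts[row["date"]] = counts.get(row["date"], 0) + 1
    return counts


def run_spark_job(input_path: str):
    # pyspark imports are deferred; the "date" column and app name
    # are assumptions, not taken from this repository.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("beehiiv-batch").getOrCreate()
    df = spark.read.json(input_path)
    df.groupBy("date").agg(F.count("*").alias("events")).show()
    spark.stop()
```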
### ClickHouse
- Columnar database optimized for fast analytical queries.
- Stores structured and semi-structured data efficiently.
### PostgreSQL
- Relational database used for transactional workloads.
- Acts as metadata storage for Airflow and other applications.
### Docker
- Containerization for seamless deployment of all components.
- Ensures portability and reproducibility of the data stack.
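The wiring between these services is expressed in Docker Compose. The fragment below is only an illustrative sketch (image versions, service names, and ports are assumptions); the repository's own `docker-compose.yml` is authoritative.

```yaml
# Illustrative fragment only; see this repository's docker-compose.yml
# for the real service definitions.
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
```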
## Prerequisites

Ensure you have the following installed:

- Docker & Docker Compose
- Python 3.x
- Kafka & Zookeeper dependencies
## Getting Started

- Clone this repository:

      git clone https://github.com/Theglassofdata/Beehiiv-realtime.git
      cd Beehiiv-realtime
- Start the services using Docker Compose:

      docker-compose up -d
- Verify the Airflow setup:

      docker-compose exec airflow-webserver airflow dags list
- Access services:
  - Airflow UI: http://localhost:8080
  - Kafka broker: `localhost:9092` (this is the broker port, not a web UI)
  - ClickHouse: connect via `clickhouse-client`
  - PostgreSQL: connect via `psql` or an admin tool
## Contributing

Feel free to open issues and contribute improvements to this stack.
## License

This project is licensed under the MIT License.