Kafka - Syllabus

Mastering Apache Kafka: Real-Time Data Streaming with Python

Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and applications. This book provides a hands-on approach to mastering Kafka, covering setup, message processing, fault tolerance, and integrations with MinIO, Apache Spark, and Airflow.

Module 1: Introduction to Apache Kafka and Event Streaming

What is Apache Kafka? Key features and use cases
Kafka’s architecture: Brokers, Topics, Producers, Consumers
Understanding partitions, offsets, and replication
Installing and setting up Kafka on local and cloud environments
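
As a rough illustration of partitions and offsets, the sketch below models a topic as a set of append-only lists. It is a toy in-memory stand-in, not a Kafka API: the `ToyTopic` class and its methods are hypothetical names, and CRC32 is used for key hashing only because it ships with the standard library (Kafka's default partitioner uses a murmur2 hash).

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a Kafka topic (illustration only, not a Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # Each partition is an append-only list; the list index plays the role of the offset.
        self.partitions = defaultdict(list)

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 replaces Kafka's murmur2-based key hashing for simplicity.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> list[tuple[str, str]]:
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
first = topic.append("sensor-1", "21.5")
second = topic.append("sensor-1", "21.7")
# Records with the same key land in the same partition, and the offset
# within that partition grows by one per append.
```

Ordering is only meaningful within one partition, which is why records that must stay in order are usually given the same key.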

Module 2: Producing and Consuming Data in Kafka

Writing Kafka Producers with Python
Writing Kafka Consumers with Python
Understanding message serialization (JSON, Avro, Protobuf)
Optimizing producer-consumer performance
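
Kafka stores message keys and values as raw bytes, so producers and consumers have to agree on an encoding. The sketch below shows one possible JSON round trip using only the standard library. The function names and the sample event are hypothetical. Client libraries such as kafka-python accept callables like these through their `value_serializer` and `value_deserializer` options, but no Kafka client is imported here.

```python
import json


def serialize_event(event: dict) -> bytes:
    """Encode a dict as compact UTF-8 JSON bytes with a stable key order."""
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")


def deserialize_event(payload: bytes) -> dict:
    """Decode UTF-8 JSON bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


event = {"user_id": 42, "action": "login"}  # hypothetical event
payload = serialize_event(event)
restored = deserialize_event(payload)
```

JSON is easy to inspect but carries no schema. Avro and Protobuf trade that readability for compact binary encoding and explicit schemas.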

Module 3: Kafka Topics, Partitions, and Message Retention

Creating and managing Kafka topics
Configuring partitions for scalability
Message retention policies and log compaction
Handling duplicate and out-of-order messages
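
The idea behind log compaction can be sketched in a few lines: for each key, only the most recent record survives, and a record with a null value (a tombstone) marks the key as deleted. The `compact` function below is a hypothetical, simplified model. In a real cluster compaction runs in the background on closed log segments, and tombstones are retained for a configurable period before they are removed.

```python
from typing import Optional

# (key, value); a value of None stands for a tombstone.
Record = tuple[str, Optional[str]]


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record per key, in log order, dropping tombstoned keys."""
    latest_index = {}
    for index, (key, _value) in enumerate(log):
        latest_index[key] = index  # later records overwrite earlier ones
    return [
        record
        for index, record in enumerate(log)
        if latest_index[record[0]] == index and record[1] is not None
    ]


log = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u2", None), ("u3", "d")]
compacted = compact(log)
```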

Module 4: Kafka Connect and Data Integration

Introduction to Kafka Connect for external system integration
Connecting Kafka to databases, APIs, and object storage (MinIO, PostgreSQL)
Configuring source and sink connectors
Custom connector development with Python
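
Connectors are typically registered by sending a JSON document with a `name` and a `config` map to the Kafka Connect REST API. The sketch below only builds such a payload. It assumes the Confluent JDBC sink connector plugin is installed on the Connect worker, and the connector name, topic, and JDBC URL are hypothetical placeholders.

```python
import json


def build_jdbc_sink_payload(name: str, topic: str, jdbc_url: str) -> dict:
    """Build the JSON body for registering a JDBC sink connector."""
    return {
        "name": name,
        "config": {
            # Assumption: the Confluent JDBC connector plugin is available.
            "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "connection.url": jdbc_url,
            "auto.create": "true",  # let the connector create the target table
        },
    }


payload = build_jdbc_sink_payload(
    name="orders-sink",  # hypothetical connector name
    topic="orders",  # hypothetical topic
    jdbc_url="jdbc:postgresql://localhost:5432/shop",  # hypothetical database
)
body = json.dumps(payload, indent=2)
```

The resulting `body` would then be sent with a POST request to the `/connectors` endpoint of a Connect worker. Credentials are left out here and should not be hard-coded in practice.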

Module 5: Stream Processing with Kafka Streams and PySpark

Introduction to Kafka Streams API
Processing Kafka messages in real-time with Spark Structured Streaming
Stateful transformations, windowing, and joins in Kafka Streams
Handling backpressure and stream optimization
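
Windowing groups an unbounded stream into finite buckets so that aggregates can be computed. The sketch below shows a tumbling (fixed, non-overlapping) window count in plain Python. It is not Kafka Streams or Spark code: the function name, the integer timestamps in seconds, and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp_in_seconds, key) pairs; hypothetical sample data.
events = [(0, "a"), (4, "a"), (9, "b"), (10, "a"), (19, "b"), (20, "a")]
result = tumbling_window_counts(events, window_seconds=10)
```

A streaming engine does the same bucketing incrementally and also has to decide how long to wait for late events, which this batch-style sketch ignores.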

Module 6: Data Pipeline Orchestration with Apache Airflow

Automating Kafka workflows with Airflow DAGs
Using operators and sensors from Airflow's Apache Kafka provider for event-driven workflows
Managing ETL pipelines with Kafka, Spark, and Airflow
Implementing failure recovery and monitoring

Module 7: Kafka Security and Fault Tolerance

Securing Kafka with SSL/TLS and SASL authentication
Implementing ACLs and Role-Based Access Control (RBAC)
Kafka disaster recovery and multi-cluster replication
Monitoring Kafka clusters with Prometheus and Grafana
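
One of the most commonly monitored Kafka metrics is consumer lag: how far a consumer group's committed offset trails the end of each partition. The sketch below computes it from two plain dictionaries. The function name and sample offsets are hypothetical, and treating a partition with no committed offset as starting from 0 is an assumption of this example.

```python
def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag = log end offset - committed offset (never negative)."""
    return {
        # Assumption: a partition with no committed offset counts from offset 0.
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


end_offsets = {0: 120, 1: 95, 2: 40}  # hypothetical log end offsets
committed = {0: 120, 1: 80}  # hypothetical committed offsets
lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

A lag that keeps growing usually means consumers cannot keep up with producers, which makes it a natural candidate for a dashboard panel or an alert.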

Module 8: Deploying Kafka in Production

Running Kafka on Kubernetes
Deploying Kafka in AWS, GCP, and Azure
Scaling Kafka clusters for high availability
Best practices for Kafka performance tuning

Hands-On Projects

Project 1: Real-Time Log Processing with Kafka and MinIO

Stream logs into Kafka from multiple sources
Store and retrieve event logs in MinIO
Analyze log patterns in real-time with Spark Structured Streaming

Project 2: Fraud Detection System Using Kafka and Spark

Ingest real-time transaction data into Kafka
Process transactions for fraud detection using PySpark
Deploy an alerting system for anomalies

Project 3: Building a Real-Time ETL Pipeline with Kafka and Airflow

Automate data ingestion from APIs using Kafka Producers
Process and store data using Kafka Connect and PostgreSQL
Schedule and monitor ETL jobs with Apache Airflow

Project 4: IoT Sensor Data Streaming with Kafka

Simulate IoT sensors producing real-time data
Process and visualize IoT data using Kafka and Grafana
Implement real-time anomaly detection for sensor failures
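
A simple starting point for sensor anomaly detection is to compare each reading with the mean and standard deviation of the readings just before it. The sketch below is one such rolling z-score check in plain Python. The function name, window size, threshold, and sample readings are all assumptions chosen for illustration, not a recommended configuration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the previous
    `window` readings by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(readings):
        if len(recent) == window:  # only judge once the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Hypothetical temperature readings with one obvious spike.
readings = [20.0, 20.5, 19.5, 20.0, 20.5, 19.5, 35.0, 20.0]
anomalies = detect_anomalies(readings)
```

In this sketch the spike itself enters the window afterwards and inflates the standard deviation for the next few readings. A more robust version might exclude flagged values or use the median instead.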

Project 5: Deploying a Scalable Kafka Cluster on Kubernetes

Set up Kafka in a Kubernetes environment
Implement Kafka Streams for data transformation
Secure and monitor Kafka with industry best practices

Datascience

Rizki Sasri Dwitama
