Data Engineering | Lijju Mathew

PySpark Cheat Sheet

A quick guide to PySpark and AWS Glue for efficient data processing and transformation.

A guide to configuring Amazon EMR clusters, focusing on instance types, auto-scaling, and spot instance strategies.

Explore how to build an efficient and scalable data lake on AWS using key services and best practices.

This post explores AWS Glue’s powerful ETL capabilities, focusing on Glue Catalog, Glue Jobs, and practical tips to optimize your data workflows.

A comparison of three AWS ETL services (EMR, Glue, and Lambda), highlighting their strengths, use cases, and best practices.