Building Scalable ETL Pipelines: Best Practices
Learn essential best practices for designing and implementing scalable ETL pipelines using AWS services and modern data engineering patterns.
Data Engineer with 5+ years of experience designing and optimizing scalable data platforms for analytics and AI/ML initiatives within the AWS ecosystem. Proficient in delivering high-performance solutions using Python, PySpark, Databricks, and AWS services (S3, Redshift, Glue, Lambda). Proven expertise in automating infrastructure with Terraform and CI/CD to enhance data quality and support business intelligence. Familiar with multi-cloud environments, including foundational knowledge of Microsoft Azure data services.
Designing and optimizing scalable data platforms for analytics and AI/ML initiatives.
Delivering high-performance solutions using AWS services and automating infrastructure with Terraform.
Engineering and scaling AI-ready data pipelines to power real-time BI dashboards and support machine learning model training.
Orchestrated AWS infrastructure deployment using Terraform, designed and implemented data processing pipelines with Databricks and PySpark, and developed Buildkite CI/CD pipelines. Engineered and scaled AI-ready data pipelines to power real-time BI dashboards.
Architected and automated serverless ETL pipelines using AWS Glue and Lambda. Developed data ingestion pipelines for an AWS-based data lake. Led the strategic planning and management of data infrastructure, ensuring high levels of data quality and availability.
Designed, developed, and tested features using Python, the Serverless Framework, and AWS services. Refined infrastructure deployment using AWS CloudFormation and Bamboo CI/CD, and migrated an existing system from on-premises infrastructure to AWS services.
Developed ETL mappings using Informatica PowerCenter, and wrote stored procedures, database triggers, and SQL queries. Applied best practices and tuned SQL code for performance. Participated in code reviews and user acceptance testing (UAT).
Established and maintained quality management systems in production. Prepared regular quality reports, and developed and monitored performance metrics for all production processes.
ETL, CI/CD, Automation, Scripting, Data Warehousing, Data Cleaning, AI/ML
Databricks, Terraform, Buildkite, JIRA, Informatica PowerCenter, VS Code, SQL Developer
Architected and automated serverless ETL pipelines using AWS Glue and Lambda for real-time data processing.
Engineered and scaled data pipelines to power real-time BI dashboards and support ML model training.
Orchestrated AWS infrastructure deployment using Terraform with automated CI/CD pipelines.
Developed data ingestion pipelines for an AWS-based data lake with high data quality and availability.
Migrated an existing system from on-premises infrastructure to AWS services using CloudFormation.
Developed ETL mappings and optimized SQL code for enterprise data warehousing solutions.
A comprehensive guide to automating AWS infrastructure deployment using Terraform, including best practices and real-world examples.
Explore architectural patterns and considerations for building a robust, scalable data lake on AWS using S3, Glue, and Athena.
Sharing my career transition story, lessons learned, and advice for aspiring data engineers looking to break into the field.
Discover how to build cost-effective, serverless data processing pipelines using AWS Lambda, S3, and EventBridge.
A beginner-friendly tutorial on using PySpark in Databricks for large-scale data processing and transformation tasks.
Whether you want to work together, have a question, or just want to say hi, I’ll try my best to get back to you!