Building Data Pipelines.

Reliable
Scalable
Portable

10+ years designing reliable, scalable data platforms across enterprise data warehouses and modern cloud-native pipelines on AWS, Snowflake, and Apache Iceberg.

Get in touch →
42M+ Records migrated to Apache Iceberg
4× Pipeline performance improvement
Merger-scale Migration across United + Continental Airlines
10+ Years of production data engineering
About

Data engineer.
Pipeline architect.
Problem solver.

I'm Nguyen Le. I design data pipelines that are reliable, scalable, and portable across Snowflake and Redshift from a single dbt codebase.

My background spans the full data stack, from raw ingestion and orchestration to dimensional modeling, dbt transformation layers, and stakeholder-facing dashboards. I have delivered at scale inside United and Continental Airlines through a major merger, and independently through ZenClarity Consulting.

I specialize in modernizing legacy pipelines, replacing brittle ETL/ELT with cost-aware, idempotent, observable systems built on AWS, Snowflake, Airflow, and Apache Iceberg.

AWS Certified Data Analytics – Specialty
B.A. MIS — Cal State Fullerton
United + Continental Airlines — merger-scale data engineering
SOX-aligned reporting and Enterprise Data Warehouse operations
Open to full-time and contract roles
Orange County, CA — hybrid Southern California and remote
Featured Work

Projects

ZenClarity UrbanFlow V2
End-to-End Data Pipeline with Iceberg Migration Framework
Production · AWS · Snowflake · Redshift

End-to-end data engineering platform built on AWS, featuring a production-grade Iceberg Migration Framework as the core ingestion layer. V2 introduces cost-aware engine routing between Glue and EMR based on data volume, idempotent orchestration via Airflow with a DynamoDB audit trail, and a full dbt transformation stack on Snowflake and Redshift with 35.6M clean records across staging, intermediate, and mart layers.
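
Two of those mechanisms are worth making concrete: volume-based engine routing and the DynamoDB-backed idempotency gate. The Python sketch below is a minimal illustration under stated assumptions, not the framework's actual code; the byte threshold, table name, and key schema are all hypothetical.

```python
# Illustrative sketch only: the threshold, table name, and key schema
# below are assumptions, not the framework's real configuration.
import boto3

EMR_THRESHOLD_BYTES = 5 * 1024**3  # hypothetical cutoff: ~5 GiB

dynamodb = boto3.resource("dynamodb")
audit_table = dynamodb.Table("pipeline_run_audit")  # hypothetical table


def choose_engine(batch_size_bytes: int) -> str:
    """Cost-aware routing: Glue for small incremental batches, EMR for
    large backfills where the cluster spin-up cost pays for itself."""
    return "emr" if batch_size_bytes >= EMR_THRESHOLD_BYTES else "glue"


def already_processed(batch_id: str) -> bool:
    """Idempotency gate: a rerun becomes a no-op if the audit trail
    already records a successful run for this batch."""
    item = audit_table.get_item(Key={"batch_id": batch_id}).get("Item")
    return bool(item and item.get("status") == "SUCCEEDED")


def record_success(batch_id: str, engine: str) -> None:
    # Written only after the load commits, so partial runs stay retryable.
    audit_table.put_item(
        Item={"batch_id": batch_id, "engine": engine, "status": "SUCCEEDED"}
    )
```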

Full medallion architecture implemented as staging, intermediate, and mart layers in dbt, with a dedicated DQ quarantine layer for multi-reason failure tracking and a clean separation between infrastructure and modeling concerns. The dbt transformation layer is architected for multi-engine portability: a single codebase deploys to both Snowflake and Redshift using target-aware Jinja conditionals. A delta ingestion layer handles ongoing monthly loads, completing the full ingest-to-mart pipeline cycle.
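
The portability mechanism is dbt's built-in target variable: wrapping dialect-specific SQL in conditionals such as {% if target.type == 'snowflake' %} ... {% endif %} lets one model compile for either warehouse. The quarantine idea is sketched below in PySpark as a minimal example; the table names, columns, and rules are illustrative assumptions, not the project's actual checks.

```python
# Minimal PySpark sketch of a multi-reason DQ quarantine split.
# Table names, columns, and rules are hypothetical stand-ins.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_quarantine_sketch").getOrCreate()
df = spark.table("staging.trips")  # hypothetical staging relation

# Each rule yields its reason string when a row fails it; a row can
# accumulate several reasons, which is the "multi-reason" part.
rules = {
    "null_trip_id": F.col("trip_id").isNull(),
    "negative_fare": F.col("fare_amount") < 0,
    "dropoff_before_pickup": F.col("dropoff_ts") < F.col("pickup_ts"),
}

# Build an array of failure reasons, dropping nulls from passing rules.
reasons = F.filter(
    F.array(*[F.when(cond, F.lit(name)) for name, cond in rules.items()]),
    lambda x: x.isNotNull(),
)
flagged = df.withColumn("dq_failure_reasons", reasons)

# Clean rows continue toward the mart layer; failures land in quarantine
# with every reason attached, so fixes can be triaged by failure type.
clean = flagged.filter(F.size("dq_failure_reasons") == 0).drop("dq_failure_reasons")
quarantine = flagged.filter(F.size("dq_failure_reasons") > 0)

clean.write.mode("overwrite").saveAsTable("intermediate.trips_clean")
quarantine.write.mode("overwrite").saveAsTable("dq.trips_quarantine")
```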

Apache Iceberg · AWS EMR · AWS Glue · Airflow · DynamoDB · Snowflake · Redshift · dbt · PySpark · S3
View on GitHub →
Records ingested 42M+
Clean mart records 35.6M
EMR runtime 1.5 min
Performance gain 4× faster
Engine routing Benchmark-driven
dbt tests All passing
dbt layers Staging · Int · Mart
Idempotency DynamoDB audit
United + Continental Airlines
Enterprise Data Engineering
Enterprise · Merger-scale · Global

Delivered data engineering solutions across mission-critical airline operations, including Cargo Operations and Revenue Management, through the United-Continental merger. Led the migration of 170+ data landing zones to a secure SFTP platform, consolidating file shares, FTP connections, and database links from internal teams, external partners, and global vendors into a compliant, standardized ingestion layer.

Maintained SOX-aligned audit reporting infrastructure and operated the Enterprise Data Warehouse at 99.9% availability. Drove ETL performance improvements of 30–65% across critical operational systems, including the progressive offload of Teradata workloads to Hadoop and EMR/Spark on S3.

Merger-scale migration United + Continental
ETL performance gain 30–65%
Compliance SOX-aligned reporting
EDW availability 99.9%
Scale Enterprise · Global
Stack

Tools I build with.

Cloud and Storage
AWS S3 · AWS Glue · AWS EMR · DynamoDB · Apache Iceberg · Redshift
Transform and Model
dbt · Snowflake · SQL · PySpark · Dimensional Modeling · SCD Type 1 + 2
Orchestrate and Ops
Apache Airflow · AWS Step Functions · GitHub Actions · Docker · CI/CD · IAM + RBAC
Visualize and Report
Streamlit · QuickSight · Tableau · Python · Bash / Linux · Git / GitHub
Contact

Let's build
something reliable.

Available for Senior Data Engineering and Analytics Engineering roles in Southern California and remote. Open to full-time and contract opportunities.

Currently open to new opportunities.

Actively exploring Senior Data Engineer and Analytics Engineer roles where I can bring production-grade pipeline architecture, modern data stack expertise, and a track record of delivering at enterprise scale. Based in Orange County, CA. Available for hybrid Southern California and remote positions.