
Hi there!

👋 I’m Thoai. I work in the cloud and data platform space, mostly around Kubernetes, Kafka, Spark, and the tools that make modern data systems run smoothly.

I enjoy digging into how data actually flows through systems, how storage works under the hood (pages, blocks, execution…), and how to turn a bunch of scattered services into a clean, maintainable pipeline.

Recently, I’ve been focusing more on Data Engineering, including:

  • designing reliable and scalable data platforms

  • lakehouse stacks such as Iceberg, Trino, Spark, and related tooling

  • data modeling techniques

  • building ETL/ELT pipelines

  • and more

I created this blog/docs site to capture what I learn, what I experiment with, and the mistakes I run into along the way — hopefully it helps someone else, or at least helps future me.

If you’re into data, distributed systems, or just want to debug a burning pipeline together, feel free to reach out.


My Experiences

Ho Chi Minh City, Vietnam

🧰 Responsibilities
  • Designed and integrated a unified feature store based on Feast into a legacy data platform, supporting batch and real-time feature engineering and API-based online serving, with GitOps-style governance and versioning; operated at scale with 14M MAUs, ~2M streaming events/day, and <200 ms feature retrieval latency.

  • Built batch and streaming ETL / feature engineering pipelines using Spark and Airflow; developed internal frameworks and libraries to define & automate streaming pipeline deployment and management, allowing ML Engineers and Analysts to focus on business logic instead of infrastructure complexity.

  • Owned and improved the Risk data & ML platform (Spark, Airflow, HDFS, and related systems), ensuring stable daily operation of 50–60 Spark applications, each processing up to 500M–1B records per run, through performance tuning, monitoring, and operational best practices.

🚀 Stacks

Ho Chi Minh City, Vietnam

🧰 Responsibilities

Contributed as a core member of the Data Platform team at a cloud service provider, delivering a platform-as-a-service (PaaS) for real-time ingestion, distributed processing, governed access, and self-service analytics to enterprise clients:

  • Engineered a full-fledged lakehouse platform with comprehensive data governance for Spark and Trino, integrating OAuth2-based identity propagation, fine-grained access control and dynamic data masking through Apache Ranger, automated lineage tracking via OpenMetadata, and standardized encryption at rest with S3 SSE-C.

  • Developed high-throughput CDC pipelines (100GB/day, 5K TPS) using Kafka Connect & Debezium, migrating 500+ PostgreSQL tables to ClickHouse, Iceberg, and S3 to enable low-latency analytics and historical storage.

  • Built a self-service Spark environment on JupyterHub by developing a custom Profile Manager with secure session provisioning, lakehouse integration (Iceberg on S3), and dynamic environment configuration.

  • Enhanced Spark orchestration experiences by creating custom Airflow plugins integrated with Spark Operator, allowing modular job submission, runtime tracking, and real-time log streaming via Airflow UI.

  • Built unified monitoring dashboards using Prometheus and Grafana to track pipeline SLAs (latency, throughput, error rate) and detect anomalies across Spark, Kafka, and Airflow.

  • Collaborated on provisioning and operating key data platform components (Airflow, Trino, Superset, Kafka Connect) on Kubernetes, and supported deployment automation via FastAPI-based internal tools.

🚀 Stacks

Hanoi, Vietnam

🧰 Responsibilities
  • Researched Kafka architecture and deployment feasibility, and designed Kafka-as-a-Service solutions on both VMs and Kubernetes.

  • Deployed Kafka on Kubernetes using Strimzi and implemented end-to-end monitoring with JMX, Telegraf, Prometheus, and Grafana, with alerting through Telegram.

  • Built Kong plugins and integrated the API gateway into microservices running on Kubernetes.

🚀 Stacks

My Education

Hanoi, Vietnam


The program was a five-year engineering track (Russian system), internationally mapped to a B.Sc. but locally recognized as an engineer's degree.


My Contributions

1

Feast

  • Optimized MySQL Online Store write performance by implementing batch insert and transaction grouping, significantly reducing write latency. #5699

  • Introduced an HDFS Registry backend, allowing teams to manage Feast feature definitions on Hadoop-compatible file systems. #5655

  • Added HDFS staging support for the Spark Offline Store, enabling distributed materialization and more efficient large-scale feature computation. #5635

2

  • Refactored Hikari connection pool logic to prevent NPEs, avoid memory leaks, and improve thread safety using ConcurrentHashMap. #1048

3

  • Revived and upgraded an abandoned Helm chart to fully support NiFi 2.x, redesigning its clustering, state management, and configuration system so it runs natively and reliably on Kubernetes without ZooKeeper.

Feel free to contact me at

Or just download my resume
