Ritesh Sonawane

Site Reliability Engineer / Platform Engineer

Led infrastructure for 10+ startups across 17+ Kubernetes environments on AWS, GCP, and Azure. Specialized in reliability, observability, and air-gapped enterprise deployments.

SRE, Platform Engineering, Golang, Kubernetes, Observability

Blog

Recent posts

Practical writing on Kubernetes, observability, platform engineering, migrations, and production infrastructure.

NVIDIA DSX

For years, building an AI data center meant assembling a puzzle from dozens of vendors, each speaking a different language. Chips from one company, …

ndots in Kubernetes

Let’s see how the ndots option works in Kubernetes. In Kubernetes, we connect to running pods either directly or via a Kubernetes Service. This post …

Celery to Argo Workflows

This post is based on my work at CloudRaft. AI jobs often run for long periods on expensive hardware like GPUs. When a job fails halfway, you don’t just …

Skills

What I work with