Mahendra Rao — DevOps · SRE

// 01 — About

The engineer
behind the uptime.

I'm a DevOps & SRE professional with over 11 years of experience operating production-grade SaaS platforms, cloud-native infrastructure, and B2B API ecosystems — at the precise intersection where engineering depth meets customer trust.

I specialize in Kubernetes, CI/CD automation, OpenTelemetry-based observability, and incident ownership, helping teams move from reactive firefighting to proactive platform intelligence. Whether designing a GitHub Actions pipeline that deploys to GCP Cloud Run, debugging a latency anomaly through distributed traces, or translating a crash-loop backtrace into a board-level risk narrative, I bridge deep technical craft with business impact.

Most recently at Middleware.io, I owned end-to-end observability support for 50+ enterprise accounts — becoming the primary escalation point for complex APM, trace, and metric debugging. Before that, I built CI/CD pipelines and led a 24/7 L3 SRE team at Ola Electric, and delivered Fortune 500 technical account success at Sprinklr.

11+ years in the field
35% MTTR reduction at Middleware.io
40% deploy velocity gain at Ola Electric
50+ enterprise accounts owned

Core Stack

Kubernetes GCP / GKE Docker OpenTelemetry Prometheus Grafana GitHub Actions Azure DevOps Linux

SRE & DevOps Practices

Incident Management RCA / Post-mortems SLO / SLI CI/CD Pipelines On-call Ownership IaC (Terraform)

Currently Targeting

Google Cloud TSE DevOps Engineer SRE Solutions Engineering Platform Engineering

// 02 — Tech Stack

What I
build with.

Infrastructure

Kubernetes Docker GCP (GKE · Cloud Run) Azure Terraform Helm Linux

CI/CD & Delivery

GitHub Actions Azure DevOps Jenkins GitOps Infrastructure as Code

Observability

OpenTelemetry Prometheus Grafana Middleware.io Dynatrace New Relic Splunk ELK

Languages

Python Bash TypeScript Node.js YAML JSON

// production cloud: GCP & Azure · currently learning: AWS (SAA-C03)

// 03 — Experience

Where I've
built my craft.

March 2025 — December 2025

Middleware.io

Technical Support Engineer · Cloud Observability

Owned production observability and troubleshooting for 50+ global enterprise accounts across APM, logs, metrics, traces, and RUM. Debugged complex distributed systems — latency spikes, CPU/memory pressure, trace gaps — using OpenTelemetry, Prometheus, and Grafana, reducing MTTR by 35%. Reproduced customer workloads in Node.js, Python, Kubernetes, and Cloud Run to isolate root causes and drive engineering fixes. Translated production issues into actionable product feedback, directly influencing the OpsAI and Continuous Profiling roadmap. Authored technical runbooks and best-practice guides that improved customer self-service and platform stability.

OpenTelemetry APM Distributed Tracing Prometheus Grafana Kubernetes Cloud Run Python Node.js

April 2023 — February 2025

Career Development & Upskilling

Full-time parent · Active upskilling in DevOps & Cloud

Took a planned career break for full-time parenting while building hands-on DevOps skills. Shipped GitHub Actions pipelines deploying to GCP Cloud Run, built Prometheus/Grafana monitoring stacks with Slack-integrated alerting, and practiced infrastructure-as-code with Terraform and Docker Compose. Completed the Microsoft Power BI Data Analyst certification and progressed toward the Google Cloud Professional Cloud DevOps Engineer, CKA, and CKAD certifications.

GitHub Actions GCP Cloud Run Prometheus Grafana Docker Terraform Kubernetes

April 2022 — April 2023

Ola Electric

L3 Product Support Engineer · SRE / DevOps

Owned production reliability and incident response for an OTA vehicle software platform running large-scale 24/7 deployments. Designed CI/CD pipelines on Azure DevOps — improving deployment velocity by 40% and cutting manual release steps by 80%. Built observability dashboards across Grafana, Dynatrace, and Splunk, and managed API Gateway integrations via Apigee. Led incident management and RCA workflows, reducing MTTR by 40% through automated runbooks and post-mortem process improvements. Scaled and mentored the 24/7 L3 support team from 7 to 20 engineers — establishing on-call rotations, SLA frameworks, and training programs.

Azure DevOps CI/CD Grafana Dynatrace Splunk Apigee SRE Incident Management RCA

May 2019 — April 2022

Sprinklr

Technical Account Manager (TAM)

Served as technical SPOC for Fortune 500 enterprise accounts — Apple, Microsoft, Dell, Samsung, UBS — maintaining 95%+ CSAT. Led B2B API integrations across Salesforce, Slack, Zendesk, and Adobe Analytics, ensuring zero-downtime migrations. Managed a GetSatisfaction → Sprinklr Communities platform migration with SSO/SAML configurations for 100K+ users. Conducted RCAs for critical incidents, mentored junior engineers, and improved operational best practices across the team.

Salesforce API OAuth2 / SAML B2B Integrations SSO Enterprise SaaS Fortune 500 95%+ CSAT

2014 — 2019

Plivo · Movate · IBM · Aditya Birla

Enterprise APIs · IVR/VoIP · Production Support

5 years building a strong foundation across enterprise API support, IVR/VoIP platform operations, and production troubleshooting at scale. Worked across telecom, IT services, and conglomerate verticals — developing the systematic debugging mindset and customer-facing technical depth that powers everything since.

Enterprise APIs IVR / VoIP Production Troubleshooting Telecom

// 04 — Skills

Tools of
the trade.

📡

Observability & APM

End-to-end telemetry — metrics, logs, traces, RUM. Deep expertise in OpenTelemetry, Prometheus, Grafana, Dynatrace, New Relic, Splunk, and ELK at enterprise scale.

☁️

Google Cloud Platform

GCE, GKE, Cloud Run, Cloud Logging & Monitoring, IAM. Comfortable across the full Compute stack and cloud-native architecture patterns.

🐳

Kubernetes

Pod debugging, resource constraints, HPA, PVCs, node affinity, DaemonSets. Skilled at diagnosing cluster-level failures and translating them into stakeholder-legible narratives.

⚙️

CI/CD & DevOps

GitHub Actions, Azure DevOps, Jenkins. Designed pipelines that improved deployment velocity by 40% and eliminated 80% of manual release steps at production scale.

🐧

Linux Systems

Performance analysis, process management, filesystem diagnostics, systemd, cgroups, kernel parameters. Comfortable operating in production at any hour.

🌐

Networking

TCP/IP, DNS, HTTP/S, TLS, VPCs, load balancers, firewalls. Can trace a packet from browser to backend — and explain exactly where and why it got dropped.

🐍

Python & Scripting

Python for automation, API testing, monitoring integrations, and tooling. Bash for ops scripts, diagnostic pipelines, and environment bootstrapping.

🔐

Security & IAM

OAuth2, OIDC, SAML, SSO. Experience with API Gateway configurations (Apigee), enterprise SSO migrations for 100K+ users, and zero-downtime auth integrations.

🤝

Customer Success & SRE

QBRs, adoption metrics, churn prevention, escalation leadership. Scaled a 24/7 L3 SRE team from 7 to 20 engineers. Proven track record of turning at-risk accounts into long-term champions.

// 05 — Projects & Open Source

Things I've
actually shipped.

GitHub · HTML / Three.js · Live

This Portfolio

A production-grade personal portfolio site with a Three.js 3D hero, scroll-triggered animations, an amber/charcoal dark theme, and full mobile responsiveness. Built from scratch, with no templates.

Three.js CSS Animations Vercel

↗ github.com/MahendraRao/mahendra-portfolio

GitHub · Three.js · Live

3D Humpty — Interactive Character

An interactive 3D character experience built with Three.js. A dancing Humpty Dumpty you can control — built to sharpen my 3D rendering and real-time animation skills in the browser.

Three.js 3D Animation Interactive

↗ github.com/MahendraRao/3d-humpty

GitHub · HTML / JS · ★ 5 Stars

Consciousness

An exploratory creative coding experiment — pushing the boundaries of what pure HTML and JavaScript can express. One of my most starred personal projects.

Creative Coding HTML Canvas Generative

↗ github.com/MahendraRao/consciousness

GitHub · React + TypeScript · Live MVP

ClawBridge

A GitHub-hosted AI wrapper for OpenClaw, built as a React + Vite frontend with a local Express + TypeScript API. It simplifies installation, system diagnostics, provider setup (OpenAI / Anthropic / Ollama / custom), and first-run onboarding for non-coders. Working MVP today; Electron packaging, OpenTelemetry, and Docker/K8s integration are on the roadmap.

React Vite TypeScript Express AI Tooling

↗ github.com/MahendraRao/clawbridge

Coming Soon · Python / OTel · In Progress

OTel Debug Companion

A CLI tool that interrogates your OpenTelemetry pipeline — checking collector health, trace ingestion rates, span drop rates, and SDK configuration — and surfaces actionable diagnostics. The runbook I always wished existed during enterprise escalations.

OpenTelemetry Python Observability CLI

↗ Coming soon on GitHub

Coming Soon · GitHub Actions · In Progress

GCP Cloud Run Pipeline Template

A production-ready GitHub Actions workflow template for deploying containerized apps to GCP Cloud Run — with built-in secret management, smoke tests, rollback on failure, and Slack notifications. Built during upskilling, now being polished for open source.

GitHub Actions GCP Cloud Run Docker

↗ Coming soon on GitHub

Observability · Middleware.io

Zero-to-OpenTelemetry Enterprise Onboarding

Designed and executed a structured onboarding program for enterprise customers adopting OpenTelemetry from scratch. Reduced time-to-value from weeks to days by building instrumentation guides, debug runbooks, and live troubleshooting playbooks for Go, Python, and Node.js stacks.

OpenTelemetry Onboarding Time-to-Value

Process Design · Ola Electric

Escalation Framework Built from the Ground Up

Built a tiered escalation framework and knowledge infrastructure at a company scaling at breakneck speed. Defined SLAs, trained L1/L2 agents, and created engineering feedback loops that measurably reduced repeat incidents across the product lifecycle.

Escalation Design SLA Training

Enterprise Success · Sprinklr

Fortune 500 Account Success Program

Managed strategic success programs for Fortune 500 accounts on the Sprinklr CXM platform. Delivered impactful QBRs, drove feature adoption, and acted as the technical-business bridge during critical escalations — converting at-risk relationships into multi-year renewals.

QBR Adoption Renewal

// 06 — Writing

Thoughts on
the craft.

Coming Soon

OpenTelemetry in Production: What Nobody Tells You

Lessons from instrumenting real production systems — the gotchas, the sampling edge cases, and why your traces are probably lying to you.

Let's build
something together.

Open to DevOps Engineer, SRE, Cloud TSE, Solutions Engineer, and Platform Engineering roles — especially in observability, cloud infrastructure, or enterprise SaaS. Based in Bengaluru · Open to remote & hybrid.

nkneelkumar [at] gmail [dot] com

↗ LinkedIn ↗ GitHub ↗ Twitter / X ↗ Resume (PDF)
