Open to DevOps · SRE · Cloud TSE roles

Mahendra
Rao.

"Turning system noise into customer confidence — one signal at a time."

// 01 — About

The engineer
behind the uptime.

I'm a DevOps & SRE professional with over 11 years of experience operating production-grade SaaS platforms, cloud-native infrastructure, and B2B API ecosystems — at the precise intersection where engineering depth meets customer trust.

I specialize in Kubernetes, CI/CD automation, OpenTelemetry-based observability, and incident ownership, helping teams move from reactive firefighting to proactive platform intelligence. Whether designing a GitHub Actions pipeline that deploys to GCP Cloud Run, debugging a latency anomaly through distributed traces, or translating a crash-loop backtrace into a board-level risk narrative, I bridge deep technical craft with business impact.

Most recently at Middleware.io, I owned end-to-end observability support for 50+ enterprise accounts — becoming the primary escalation point for complex APM, trace, and metric debugging. Before that, I built CI/CD pipelines and led a 24/7 L3 SRE team at Ola Electric, and delivered Fortune 500 technical account success at Sprinklr.

11+ · Years in the field
35% · MTTR reduction at Middleware
40% · Deploy velocity gain at Ola
50+ · Enterprise accounts owned

Core Stack

Kubernetes · GCP / GKE · Docker · OpenTelemetry · Prometheus · Grafana · GitHub Actions · Azure DevOps · Linux

SRE & DevOps Practices

Incident Management · RCA / Post-mortems · SLO / SLI · CI/CD Pipelines · On-call Ownership · IaC (Terraform)

Currently Targeting

Google Cloud TSE · DevOps Engineer · SRE · Solutions Engineering · Platform Engineering

// 02 — Tech Stack

What I
build with.

Infrastructure
Kubernetes · Docker · GCP (GKE · Cloud Run) · Azure · Terraform · Helm · Linux
CI/CD & Delivery
GitHub Actions · Azure DevOps · Jenkins · GitOps · Infrastructure as Code
Observability
OpenTelemetry · Prometheus · Grafana · Middleware.io · Dynatrace · New Relic · Splunk · ELK
Languages
Python · Bash · TypeScript · Node.js · YAML · JSON
// production cloud: GCP & Azure · currently learning: AWS (SAA-C03)

// 03 — Experience

Where I've
built my craft.

March 2025 — December 2025
Middleware.io
Technical Support Engineer · Cloud Observability

Owned production observability and troubleshooting for 50+ global enterprise accounts across APM, logs, metrics, traces, and RUM. Debugged complex distributed systems — latency spikes, CPU/memory pressure, trace gaps — using OpenTelemetry, Prometheus, and Grafana, reducing MTTR by 35%. Reproduced customer workloads in Node.js, Python, Kubernetes, and Cloud Run to isolate root causes and drive engineering fixes. Translated production issues into actionable product feedback, directly influencing the OpsAI and Continuous Profiling roadmap. Authored technical runbooks and best-practice guides that improved customer self-service and platform stability.

OpenTelemetry · APM · Distributed Tracing · Prometheus · Grafana · Kubernetes · Cloud Run · Python · Node.js
April 2023 — February 2025
Career Development & Upskilling
Full-time parent · Active upskilling in DevOps & Cloud

Took a planned career break for full-time parenting while actively building hands-on DevOps skills. Shipped GitHub Actions pipelines deploying to GCP Cloud Run, built Prometheus/Grafana monitoring stacks with Slack-integrated alerting, and practiced infrastructure-as-code with Terraform and Docker Compose. Completed the Microsoft Power BI Data Analyst certification and made progress toward the Google Cloud Professional DevOps Engineer, CKA, and CKAD certifications.

GitHub Actions · GCP Cloud Run · Prometheus · Grafana · Docker · Terraform · Kubernetes
April 2022 — April 2023
Ola Electric
L3 Product Support Engineer · SRE / DevOps

Owned production reliability and incident response for an OTA vehicle software platform running large-scale 24/7 deployments. Designed CI/CD pipelines on Azure DevOps — improving deployment velocity by 40% and cutting manual release steps by 80%. Built observability dashboards across Grafana, Dynatrace, and Splunk, and managed API Gateway integrations via Apigee. Led incident management and RCA workflows, reducing MTTR by 40% through automated runbooks and post-mortem process improvements. Scaled and mentored the 24/7 L3 support team from 7 to 20 engineers — establishing on-call rotations, SLA frameworks, and training programs.

Azure DevOps · CI/CD · Grafana · Dynatrace · Splunk · Apigee · SRE · Incident Management · RCA
May 2019 — April 2022
Sprinklr
Technical Account Manager (TAM)

Served as technical SPOC for Fortune 500 enterprise accounts — Apple, Microsoft, Dell, Samsung, UBS — maintaining 95%+ CSAT. Led B2B API integrations across Salesforce, Slack, Zendesk, and Adobe Analytics, ensuring zero-downtime migrations. Managed a GetSatisfaction → Sprinklr Communities platform migration with SSO/SAML configurations for 100K+ users. Conducted RCAs for critical incidents, mentored junior engineers, and improved operational best practices across the team.

Salesforce API · OAuth2 / SAML · B2B Integrations · SSO · Enterprise SaaS · Fortune 500 · 95%+ CSAT
2014 — 2019
Plivo · Movate · IBM · Aditya Birla
Enterprise APIs · IVR/VoIP · Production Support

Five years building a strong foundation across enterprise API support, IVR/VoIP platform operations, and production troubleshooting at scale. Worked across telecom, IT services, and conglomerate verticals, developing the systematic debugging mindset and customer-facing technical depth that have powered everything since.

Enterprise APIs · IVR / VoIP · Production Troubleshooting · Telecom

// 04 — Skills

Tools of
the trade.

📡
Observability & APM
End-to-end telemetry — metrics, logs, traces, RUM. Deep expertise in OpenTelemetry, Prometheus, Grafana, Dynatrace, New Relic, Splunk, and ELK at enterprise scale.
☁️
Google Cloud Platform
GCE, GKE, Cloud Run, Cloud Logging & Monitoring, IAM. Comfortable across the full Compute stack and cloud-native architecture patterns.
🐳
Kubernetes
Pod debugging, resource constraints, HPA, PVCs, node affinity, DaemonSets. Skilled at diagnosing cluster-level failures and translating them into stakeholder-legible narratives.
⚙️
CI/CD & DevOps
GitHub Actions, Azure DevOps, Jenkins. Designed pipelines that improved deployment velocity by 40% and eliminated 80% of manual release steps at production scale.
🐧
Linux Systems
Performance analysis, process management, filesystem diagnostics, systemd, cgroups, kernel parameters. Comfortable operating in production at any hour.
🌐
Networking
TCP/IP, DNS, HTTP/S, TLS, VPCs, load balancers, firewalls. Can trace a packet from browser to backend — and explain exactly where and why it got dropped.
🐍
Python & Scripting
Python for automation, API testing, monitoring integrations, and tooling. Bash for ops scripts, diagnostic pipelines, and environment bootstrapping.
🔐
Security & IAM
OAuth2, OIDC, SAML, SSO. Experience with API Gateway configurations (Apigee), enterprise SSO migrations for 100K+ users, and zero-downtime auth integrations.
🤝
Customer Success & SRE
QBRs, adoption metrics, churn prevention, escalation leadership. Scaled an L3 SRE team from 7 to 20 engineers. Proven track record of turning at-risk accounts into long-term champions.

// 05 — Projects & Open Source

Things I've
actually shipped.

GitHub · HTML / Three.js · Live
This Portfolio

A production-grade personal portfolio site with a Three.js 3D hero, scroll-triggered animations, amber/charcoal dark theme, and full mobile responsiveness. Built from scratch — no templates.

Three.js · CSS Animations · Vercel
↗ github.com/MahendraRao/mahendra-portfolio
GitHub · Three.js · Live
3D Humpty — Interactive Character

An interactive 3D character experience built with Three.js. A dancing Humpty Dumpty you can control — built to sharpen my 3D rendering and real-time animation skills in the browser.

Three.js · 3D Animation · Interactive
↗ github.com/MahendraRao/3d-humpty
GitHub · HTML / JS · ★ 5 Stars
Consciousness

An exploratory creative coding experiment — pushing the boundaries of what pure HTML and JavaScript can express. One of my most starred personal projects.

Creative Coding · HTML Canvas · Generative
↗ github.com/MahendraRao/consciousness
GitHub · React + TypeScript · Live MVP
ClawBridge

A GitHub-hosted AI wrapper for OpenClaw, built as a React + Vite frontend and a local Express + TypeScript API. Simplifies installation, system diagnostics, provider setup (OpenAI / Anthropic / Ollama / custom), and first-run onboarding for non-coders. Working MVP today; Electron packaging, OpenTelemetry, and Docker/K8s integration are on the roadmap.

React · Vite · TypeScript · Express · AI Tooling
↗ github.com/MahendraRao/clawbridge
Coming Soon · Python / OTel · In Progress
OTel Debug Companion

A CLI tool that interrogates your OpenTelemetry pipeline — checking collector health, trace ingestion rates, span drop rates, and SDK configuration — and surfaces actionable diagnostics. The runbook I always wished existed during enterprise escalations.

OpenTelemetry · Python · Observability · CLI
↗ Coming soon on GitHub
Coming Soon · GitHub Actions · In Progress
GCP Cloud Run Pipeline Template

A production-ready GitHub Actions workflow template for deploying containerized apps to GCP Cloud Run — with built-in secret management, smoke tests, rollback on failure, and Slack notifications. Built during upskilling, now being polished for open source.

GitHub Actions · GCP · Cloud Run · Docker
↗ Coming soon on GitHub
Observability · Middleware.io
Zero-to-OpenTelemetry Enterprise Onboarding

Designed and executed a structured onboarding program for enterprise customers adopting OpenTelemetry from scratch. Reduced time-to-value from weeks to days by building instrumentation guides, debug runbooks, and live troubleshooting playbooks for Go, Python, and Node.js stacks.

OpenTelemetry · Onboarding · Time-to-Value
Process Design · Ola Electric
Escalation Framework Built from Ground Zero

Built a tiered escalation framework and knowledge infrastructure at a company scaling at breakneck speed. Defined SLAs, trained L1/L2 agents, and created engineering feedback loops that led to a measurable reduction in repeat incidents across the product lifecycle.

Escalation Design · SLA · Training
Enterprise Success · Sprinklr
Fortune 500 Account Success Program

Managed strategic success programs for Fortune 500 accounts on the Sprinklr CXM platform. Delivered impactful QBRs, drove feature adoption, and acted as the technical-business bridge during critical escalations — converting at-risk relationships into multi-year renewals.

QBR · Adoption · Renewal

// 06 — Writing

Thoughts on
the craft.

Coming Soon
OpenTelemetry in Production: What Nobody Tells You

Lessons from instrumenting real production systems — the gotchas, the sampling edge cases, and why your traces are probably lying to you.

Coming Soon
The CI/CD Pipeline That Cut 80% of Our Manual Release Steps

How we redesigned the OTA software deployment pipeline at Ola Electric — and what most teams get wrong about release velocity vs. release safety.

Coming Soon
The Kubernetes Debugging Playbook I Wish Existed on Day One

A field-tested guide to pod failures, resource starvation, and network black holes — written from a support engineer's perspective, not a textbook author's.

// 07 — Contact

Let's build
something together.

Open to DevOps Engineer, SRE, Cloud TSE, Solutions Engineer, and Platform Engineering roles — especially in observability, cloud infrastructure, or enterprise SaaS. Based in Bengaluru · Open to remote & hybrid.

nkneelkumar [at] gmail [dot] com
↗ LinkedIn ↗ GitHub ↗ Twitter / X ↗ Resume (PDF)