Status: Operational // System: KRN-01

Kiran Garud [DevOps & Platform Engineer]

Building and operating reliable, automated, and scalable cloud infrastructure and deployment systems.

Value Statement

Focused on platform engineering, automated deployment pipelines, and observability to maintain system reliability and operational stability. I work with a production-first approach, prioritizing infrastructure automation, clear system boundaries, and service availability across all layers.

Mindset & Intent

ENGINEERING IDENTITY

Professional Profile & Technical Background

SYS_SPEC: DEVOPS-PLATFORM

A DevOps and Platform Engineer focused on building reliable, automated cloud infrastructure and scalable deployment systems.

My technical foundation is built on hands-on infrastructure work, covering cloud architecture, container orchestration, deployment automation, and system monitoring across real environments.

I build with a focus on cloud-native patterns, CI/CD integration, observability pipelines, and access security — with attention to how components behave together under load.

I approach distributed systems by breaking down complexity, understanding how things fail, and learning to fix them to maintain consistent service availability.

Every deployment decision considers observability, scalability, and long-term maintainability — ensuring systems remain operable as workloads and requirements evolve.

I reduce manual work by building reliable automation workflows that improve deployment consistency and reduce intervention across the infrastructure lifecycle.

I apply the same engineering standards to development and test environments as to production — because system behavior in validation directly affects production outcomes.

Engineering Philosophy

"Reliable systems are built through infrastructure automation, continuous observability, and consistent operational discipline."

Operational_Bias

Focused on building platform systems that support service reliability, security controls, and consistent scalability — starting from strong fundamentals and growing from there.

Journey_Mindset

Technical skills develop through implementation, hands-on testing, and iterative refinement across varied infrastructure and deployment environments.

Engineering Capabilities

OPERATIONAL SCOPE

Core Skills & Technical Expertise

01

AWS Cloud Infrastructure & DevOps Engineering

  • Cloud Architecture: Designing and provisioning VPC environments, EKS clusters, and cloud networking on AWS.
  • Kubernetes Orchestration: Deploying and managing containerized workloads and microservices across Kubernetes clusters.
  • CI/CD Automation: Building GitOps-based deployment pipelines to automate the software delivery process.
  • Infrastructure as Code: Provisioning and managing cloud resources using version-controlled IaC with Terraform.
  • Architectural Validation: Applying AWS architecture best practices as an AWS Certified Solutions Architect – Associate.
02

AI Systems & LLM Infrastructure

  • Private AI Deployment: Hosting and running self-managed LLM platforms on Kubernetes with optimized compute configurations.
  • Resource-Optimized Infrastructure: Designing compute environments suited to resource-constrained and bare-metal deployments.
  • Workflow Automation: Building custom tooling and AI-assisted workflows to support software development processes.
  • System Analysis: Using AI-assisted approaches to document, review, and improve complex system designs.
03

Distributed Systems & Application Development

  • Platform Interfaces: Building responsive frontends using Next.js to interact with cloud service backends.
  • Backend Services: Developing API layers and service communication using NestJS and Node.js.
  • Data Layer: Configuring relational and NoSQL databases with caching strategies for persistent, scalable storage.
  • End-to-End Reliability: Connecting application layers with infrastructure to maintain consistent system behavior and user experience.
04

Cloud Security & DevSecOps

  • Network Security: Configuring network segmentation, security groups, and encrypted communication between services.
  • Identity & Access Management: Applying IAM policies and role-based access controls across cloud environments.
  • Compliance & Governance: Structuring infrastructure to meet security standards and operational requirements.
05

Observability & Site Reliability Engineering

  • System Monitoring: Implementing metrics, dashboards, and health checks for distributed service environments.
  • Centralized Logging: Configuring log aggregation pipelines for visibility into application and infrastructure behavior.
  • Reliability Operations: Monitoring system performance, identifying bottlenecks, and contributing to service availability.
SYS_NOTE

These areas reflect a consistent focus on infrastructure automation, deployment reliability, and platform operations. The goal is building systems that are observable, maintainable, and built to scale.

03

TECHNOLOGY STACK

01

CLOUD PLATFORMS

Working with AWS, GCP, and Azure to provision cloud environments, manage networking, and configure core platform services.

AWS, GCP, Azure
02

CONTAINERS & ORCHESTRATION

Building and running containerized applications using Docker, managing workload deployment and scaling with Kubernetes.

Docker, Kubernetes, Helm
03

CI/CD & AUTOMATION

Designing deployment pipelines and infrastructure automation workflows to support continuous integration and delivery.

Jenkins, GitHub Actions, ArgoCD, Terraform
04

OBSERVABILITY

Setting up monitoring, logging, and alerting systems to track service health and investigate issues in distributed environments.

Prometheus, Grafana, ELK, CloudWatch
05

SECURITY & ACCESS

Applying IAM policies, secret management, and security scanning to control access and maintain a secure infrastructure posture.

IAM, IRSA, Secrets Manager, Image Scanning
06

AI & ENGINEERING TOOLS

Using AI-assisted tools to support infrastructure planning, code review, documentation, and development efficiency.

ChatGPT, Gemini, Copilot, Cursor

Certifications & Learning

VALIDATION LAYERS

Tools, Certifications & Tech Stack

Learning_Mindset

My technical knowledge is developed through a combination of structured study and hands-on implementation. Industry certifications provide a reference framework, which is then validated through actual infrastructure deployments and system-level projects.

I build operational understanding iteratively: concepts are prototyped in sandbox environments, tested under realistic conditions, and refined through practical troubleshooting.

Beyond certification coverage, I focus on understanding how systems work at the component level — including failure behavior, performance characteristics, and operational trade-offs.

I stay current by following evolving DevOps practices and cloud-native patterns through continuous self-directed learning and hands-on experimentation.

Applied Methodology

I follow an applied learning methodology — infrastructure patterns are reinforced through direct implementation, continuous testing, and iterative refinement rather than passive study.

Each engineering cycle covers: system design, infrastructure provisioning, failure analysis, performance review, and documentation of findings.

I test platforms under simulated failure conditions and varied observability configurations to build a practical understanding of system behavior under stress.

This approach develops an SRE-aligned perspective — focused on system reliability, operational visibility, and maintainable infrastructure design.

AWS Certified Solutions Architect – Associate

Amazon Web Services // 2024

Validates knowledge of AWS services, cloud architecture patterns, secure networking, and designing scalable, cost-effective cloud solutions.

AWS & DevOps Professional Training Program

Structured Certification Track

Completed a structured training program covering infrastructure automation, CI/CD pipeline design, container orchestration, and core DevOps practices.

Diploma in AWS with Python

Academic Certification Program

Completed an AWS-focused program covering cloud infrastructure fundamentals, Python-based automation scripting, and cloud resource management.

Upcoming Infrastructure Validation Queue

Active Focus Areas

Active learning areas include SRE practices, security automation, Kubernetes cluster management, and distributed systems reliability.

Featured Projects

ENGINEERING WORK

Projects & Hands-On Implementations

LearnSphere

Foundation Infrastructure — Single-AZ AWS Setup

Problem Statement

How do you validate enterprise-grade infrastructure patterns — Kubernetes orchestration, GitOps delivery, and observability — without production-level budgets?

Sole DevOps Engineer: designed, built, and deployed the complete infrastructure and CI/CD pipelines
4 months · Local development + AWS testing
48 Services Designed
60+ Active Pod Targets
1-AZ AWS Availability Zone
4mo Development Cycle

LearnSphere is a self-hosted learning platform built to validate core infrastructure patterns in a controlled environment. The system uses a microservices architecture where each service runs in its own container, enabling focused testing of infrastructure components including identity management, media processing, and service observability.

The infrastructure is scoped to a single AWS Availability Zone to cover the full deployment lifecycle while managing cost. This setup supported hands-on work with Kubernetes cluster configuration, VPC networking, CI/CD pipeline design, and monitoring, without the complexity of multi-AZ failover.

A key focus was automating infrastructure provisioning and deployment workflows. The project includes an event-driven media processing pipeline using serverless functions, and all cloud resources are managed using Infrastructure as Code to maintain consistency across environments.

ARCHITECTURE VISUALIZATION

This visualization maps the core traffic flow from the internet gateway through application load balancers into isolated Kubernetes namespaces. It highlights the separation of stateful and stateless components, internal service routing, and the GitOps deployment synchronization layer.

AWS Architecture

DEPLOYMENT PIPELINES — 4 TOTAL

PIPELINE 01

Jenkins Shared Library — CI

Centralizes CI automation across all microservices. Multi-stage pipeline handles compilation, SAST security scans, image building, and vulnerability checks before pushing validated images to ECR. Ensures only tested, secure container artifacts reach deployment.

  • 01 Code Commit (GitHub): Push to feature branch
  • 02 Jenkins Triggered (Webhook): Pipeline auto-triggered
  • 03 Compilation (Node.js): Code compilation check
  • 04 Dependency Check (npm audit): Package validation
  • 05 Code Analysis (SonarQube): Quality & standards
  • 06 Security Scan (Trivy / Snyk): SAST security check
  • 07 Vuln Check (OWASP): Known CVE detection
  • 08 Multi-Stage Build (Docker/Podman): Optimized image size
  • 09 Image Scan (Trivy): Container vuln scan
  • 10 Push Registry (ECR / Artifact): Verified image pushed
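
The gating behavior described above, where an image reaches the registry only after every earlier stage passes, can be sketched as a fail-fast stage runner. This is a minimal illustration and not the Jenkins Shared Library itself: the `Stage` class, `run_pipeline` function, and the lambda checks are hypothetical stand-ins for the real tool invocations.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail-fast)."""
    completed: List[str] = []
    for stage in stages:
        if not stage.check():
            # Later stages, including the registry push, never run.
            return False, completed
        completed.append(stage.name)
    return True, completed


# Hypothetical stand-ins: in a real pipeline each check would wrap a tool call.
stages = [
    Stage("compile", lambda: True),
    Stage("dependency-check", lambda: True),
    Stage("image-scan", lambda: False),  # simulate a failed vulnerability scan
    Stage("push-registry", lambda: True),
]

ok, done = run_pipeline(stages)
print(ok, done)
```

Because the push is simply the last stage in the list, a failed scan prevents it without any extra branching logic.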

PIPELINE 02

ArgoCD GitOps — CD

Establishes declarative delivery and drift detection. Polls Git for state changes, renders Helm charts, validates manifests, and syncs to EKS. Eliminates manual deployment errors, provides audit trail, and enables instant rollback.

  • 01 Image Tagged (ECR): New image available
  • 02 Helm Updated (Helm Chart): Image tag in values.yaml
  • 03 Git Commit (GitHub): Manifest committed
  • 04 Drift Detected (ArgoCD): Cluster vs Git diff
  • 05 Final Image Scan (Trivy): Last security check
  • 06 Helm Validate (Helm): Chart render & check
  • 07 Deploy to EKS (Kubernetes): Pods to namespace
  • 08 Health Check (K8s Probes): Liveness & readiness
  • 09 Auto Rollback (ArgoCD): Revert if failed
  • 10 Sync Confirmed (ArgoCD): Cluster = Git state
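
The "Cluster vs Git diff" idea behind drift detection can be illustrated with a small comparison of desired and live state. This is a simplified sketch, not how ArgoCD is implemented: it compares flat dictionaries, and the `diff_state` function, the field names, and the image tags are hypothetical.

```python
from typing import Any, Dict, Tuple


def diff_state(desired: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (desired_value, live_value)} for every field that differs."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(desired) | set(live)):
        want, have = desired.get(key), live.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical values: Git declares a newer image tag than the cluster is running.
desired = {"image": "course-service:1.4.2", "replicas": 2}
live = {"image": "course-service:1.4.1", "replicas": 2}

drift = diff_state(desired, live)
in_sync = not drift  # an empty diff means the cluster matches Git
print(drift, in_sync)
```

A non-empty result is the signal to sync; an empty one corresponds to the final "Cluster = Git state" step.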

PIPELINE 03

Database Container Pipeline — 4 Instances

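Provisions four data stores (PostgreSQL, MongoDB, Redis, DynamoDB), each serving a specific domain to minimize cross-service data coupling. The containerized databases run as StatefulSets with dedicated PVCs and automated backup policies; DynamoDB itself is an AWS-managed service rather than an in-cluster workload.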

  • 01 Schema Definition (Prisma / SQL): Per service domain
  • 02 PostgreSQL (Instance 1): Users, courses, enrollments
  • 03 MongoDB (Instance 2): Content, media metadata
  • 04 Redis (Instance 3): Cache, sessions, API
  • 05 DynamoDB (Instance 4): Events, notifications
  • 06 PVC Configured (Kubernetes): Per DB persistent storage
  • 07 StatefulSet Deploy (Kubernetes): Stable pod identity
  • 08 Backup Policy (AWS Backup): Auto schedule per instance
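
To show how the StatefulSet and per-database PVC steps fit together, here is a minimal sketch that builds a StatefulSet manifest as a Python dictionary. The `statefulset_manifest` helper, the image tag, the storage size, and the mount path are assumptions for illustration, not the project's actual values.

```python
import json
from typing import Any, Dict


def statefulset_manifest(name: str, image: str, storage: str, mount_path: str) -> Dict[str, Any]:
    """Build a single-replica StatefulSet with one PVC created from a claim template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                    }]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


# Assumed example values for the PostgreSQL instance.
manifest = statefulset_manifest("postgres", "postgres:16", "10Gi", "/var/lib/postgresql/data")
print(json.dumps(manifest, indent=2))
```

The claim template name must match the `volumeMounts` name, which is what ties each pod's stable identity to its own persistent volume.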

PIPELINE 04

Video Processing Pipeline — S3 + Lambda

Automates media processing via event-driven serverless architecture. S3 uploads trigger EventBridge, invoking Lambda functions for FFmpeg transcoding, thumbnail generation, and metadata extraction. Delivers multi-resolution HLS/DASH streams.

  • 01 Video Upload (S3 bucket): Raw video input bucket
  • 02 S3 Trigger (EventBridge): Upload event fired
  • 03 Validator (Lambda): Format, size, codec check
  • 04 Transcoder (Lambda + FFmpeg): 1080p, 720p, 480p
  • 05 Thumbnail Gen (Lambda): Key frame thumbnails
  • 06 Metadata Extract (Lambda): Duration, resolution, codec
  • 07 Output to S3 (S3 Bucket): HLS/DASH segments
  • 08 Notifier (Lambda + SNS): Downstream services notified
  • 09 Metadata Update (DynamoDB): Video record written
  • 10 Job Complete (DynamoDB): Status tracked end-to-end
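
The Validator step can be sketched as a Lambda-style handler that reads the bucket, key, and size from an EventBridge "Object Created" event for S3. This is an illustrative sketch only: the allowed extensions, the size limit, and the bucket and key names are assumptions, and the codec check is omitted because it would require probing the file contents.

```python
import os
from typing import Any, Dict, List

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv"}  # assumed accepted formats
MAX_SIZE_BYTES = 5 * 1024**3                   # assumed 5 GiB upload limit


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Validate an uploaded object using only the metadata carried in the event."""
    detail = event["detail"]
    key = detail["object"]["key"]
    size = detail["object"]["size"]

    errors: List[str] = []
    extension = os.path.splitext(key)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported extension: {extension or '(none)'}")
    if size > MAX_SIZE_BYTES:
        errors.append(f"file too large: {size} bytes")

    return {
        "bucket": detail["bucket"]["name"],
        "key": key,
        "valid": not errors,
        "errors": errors,
    }


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "raw-videos"},
        "object": {"key": "uploads/lesson-01.MP4", "size": 1048576},
    },
}

result = handler(sample_event)
print(result)
```

Rejecting bad uploads before the Transcoder runs avoids spending FFmpeg time on files that would fail anyway.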

MICROSERVICES ARCHITECTURE

48 Services Designed

Core Platform

Manages identity, access control, and centralized API routing.

api-gateway, auth-service, user-service, rbac-service, admin-service, organization-service

Learning & Content

Handles educational workflows, curriculum delivery, and student progress tracking.

course-service, content-service, enrollment-service, progress-service, assessment-service, assignment-service, certificate-service, review-service, gamification-service

Media & Streaming

Powers video delivery, document processing, and live streaming capabilities.

media-service, video-streaming-service, video-processing-service, live-streaming-service, doc-mgmt-service

Communication

Facilitates messaging, notifications, and community interactions.

notification-service, email-service, chat-service, forum-service, support-service

Payments & Business

Handles payment processing, subscription management, and billing logic.

payment-service, billing-service, subscription-service, cart-service, coupon-service, affiliate-service, waitlist-service

Analytics & Intelligence

Aggregates telemetry, user behavior data, and powers internal search.

analytics-service, reporting-service, export-service, search-service, recommendation-service, ai-service, translation-service

Platform & Operations

Supports feature toggling, audit logging, and cross-system integrations.

moderation-service, audit-service, webhook-service, integration-service, feature-flag-service, marketing-service, calendar-service, survey-service, code-exec-service

ARCHITECTURE DECISIONS

01

Single-AZ over Multi-AZ Setup

Opted for single Availability Zone to focus on validating orchestration patterns without the cost overhead of multi-AZ replication. This reduced AWS spend significantly while still demonstrating full CI/CD, monitoring, and deployment workflows.

02

GitOps Strategy over Imperative kubectl

Selected ArgoCD-driven GitOps to enforce infrastructure immutability. Every cluster state change is version-controlled in Git, providing a full audit trail, preventing configuration drift, and enabling deterministic rollbacks.

03

Jenkins Pipeline over Managed CI/CD

Self-hosted Jenkins with a custom Shared Library provided full control over build environments, plugin selection, and local container caching — trade-offs worth the additional compute management overhead.

04

Terraform Modularity over Monolithic Config

Decoupled infrastructure into separate Terraform modules (VPC, EKS, NodeGroups, IAM). This enabled rapid iteration — individual components could be modified and tested independently without affecting the rest of the stack.
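
One way to reason about modular infrastructure is as a dependency graph: modules are applied in dependency order, destroyed in reverse, and a change only affects modules downstream of it. The sketch below uses Python's standard `graphlib` to illustrate that idea. The dependency edges between the four modules are assumptions for illustration, not taken from the project's Terraform code.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Assumed dependencies: each module maps to the modules it needs first.
MODULE_DEPS: Dict[str, Set[str]] = {
    "vpc": set(),
    "iam": set(),
    "eks": {"vpc", "iam"},
    "nodegroups": {"eks", "iam"},
}


def apply_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order in which modules can be applied so dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


def destroy_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Teardown runs in reverse so dependents are removed before their dependencies."""
    return list(reversed(apply_order(deps)))


def affected_by(module: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Modules that directly or transitively depend on the given module."""
    affected: Set[str] = set()
    frontier = [module]
    while frontier:
        current = frontier.pop()
        for name, needs in deps.items():
            if current in needs and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


print(apply_order(MODULE_DEPS))
print(destroy_order(MODULE_DEPS))
print(affected_by("eks", MODULE_DEPS))
```

Under these assumed edges, a change to the node group module affects nothing else, which is the kind of isolation that makes independent iteration practical.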

CHALLENGES & SOLUTIONS

Observability across 48 services

Challenge

Operating dozens of services created fragmented logs and isolated metrics, making incident triage difficult.

Solution

Deployed a centralized Prometheus + Grafana stack with per-namespace dashboards and alerting rules.

Outcome

Enabled quick correlation of health probe failures and network issues.

Controlling AWS costs during testing

Challenge

Dense microservice deployments escalated cloud spend during validation windows.

Solution

Constrained the cluster to single-AZ, integrated EC2 Spot instances, and built automated Terraform destroy scripts for post-testing cleanup.

Outcome

Kept total runtime costs under budget.

Inter-service DNS resolution failures

Challenge

Complex network policies blocked legitimate API calls across namespaces.

Solution

Debugged CoreDNS resolution chains and recalibrated overly restrictive Calico network policies.

Outcome

Restored internal service discovery while maintaining security boundaries.
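
A common cause of this kind of failure is an egress policy that blocks DNS lookups, so cross-namespace service names never resolve. The sketch below shows the in-cluster name format and a standard Kubernetes NetworkPolicy that allows DNS egress on port 53. It is an assumption-based illustration rather than the policy used in the project: the namespace and service names are hypothetical, and it assumes the cluster DNS runs in `kube-system` and that namespaces carry the automatic `kubernetes.io/metadata.name` label.

```python
from typing import Any, Dict


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name for a Service in another namespace."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def allow_dns_egress(namespace: str) -> Dict[str, Any]:
    """NetworkPolicy letting every pod in the namespace reach cluster DNS on port 53."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{
                    "namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
                    }
                }],
                "ports": [
                    {"protocol": "UDP", "port": 53},
                    {"protocol": "TCP", "port": 53},
                ],
            }],
        },
    }


# Hypothetical service and namespace names.
fqdn = service_fqdn("auth-service", "core")
policy = allow_dns_egress("learning")
print(fqdn)
```

Allowing only DNS to `kube-system` keeps the rest of the egress rules restrictive while restoring name resolution.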

GitOps drift from manual debugging

Challenge

Ad-hoc manual cluster changes during debugging caused sync conflicts with the Git state.

Solution

Enforced a strict GitOps-only policy, removing all direct cluster mutation permissions.

Outcome

Eliminated undocumented drift entirely.

KEY LEARNINGS

Resource governance is non-negotiable at scale

Running many services reinforced that stability depends on proper namespace segmentation, RBAC policies, network policies, and resource quotas. Constraints must be set proactively, not reactively after failures.

GitOps changes how you think about deployments

With ArgoCD, the workflow shifts from 'deploying to a cluster' to 'declaring desired state in Git.' This made deployments predictable, auditable, and safely reversible.

Observability is a prerequisite, not an add-on

Managing distributed services without centralized metrics and logging is effectively flying blind. Prometheus + Grafana integration must be part of the initial architecture, not bolted on later.

Modular IaC accelerates iteration

Decoupling Terraform into modules meant I could tear down and rebuild individual components (networking, compute, IAM) independently. This dramatically sped up staging validation cycles.

FULL TECH STACK

Cloud & Infrastructure
AWS, EKS, EC2 Spot, ECR, S3, IAM, VPC, EventBridge, Lambda
DevOps & Orchestration
Kubernetes, Terraform, Docker, Jenkins, ArgoCD, Helm, FFmpeg
Observability
Prometheus, Grafana, CloudWatch, SNS
Backend & Data
NestJS, Node.js, TypeScript, PostgreSQL, MongoDB, Redis, DynamoDB, RabbitMQ, Prisma
LAB_EXPERIMENTS // OVERVIEW

Build. Deploy. Learn. Repeat.

localhost:5173
LIVE
EXP_01

Learning Hub

REACT / NODE

Full-stack containerized app with microservices architecture and reverse proxy routing via Nginx.

Docker, Nginx, Node.js, React

Deployment

Docker + Nginx

Learned

  • Container networking
  • Reverse proxy config
  • Multi-service Docker

Project 01 of 06
interior.dev
LIVE
EXP_02

Interior Designer

NEXT.JS

Server-side rendered application with automated preview deployments triggered on each pull request via Vercel.

Vercel, CI/CD, SSR, Next.js

Deployment

Vercel CI/CD

Learned

  • Automated deployments
  • Preview environments
  • Server-side rendering

Project 02 of 06
restaurant.local
LIVE
EXP_03

Restaurant Site

HTML / JS

Static site hosted on S3 with CloudFront distribution, custom domain, and SSL certificate configured.

AWS S3, CloudFront, SSL, CDN

Deployment

S3 + CloudFront

Learned

  • S3 Static Hosting
  • CDN Distribution
  • SSL/TLS Management

Project 03 of 06
saas.example.com
LIVE
EXP_04

SaaS Prototype

NEXT.JS / POSTGRES

Serverless container deployment using ECS Fargate with a managed RDS backend, ALB routing, and auto-scaling.

ECS, Fargate, RDS, ALB

Deployment

ECS Fargate

Learned

  • Serverless containers
  • Managed databases
  • Load balancer routing

Project 04 of 06
portfolio-v1.dev
EXP_05
Portfolio v1 deployment screenshot

Portfolio v1

GATSBY

JAMstack site with an automated GitHub Actions pipeline handling builds and deployments on each commit.

GitHub Actions, CI/CD, JAMstack

Deployment

GitHub Pages

Learned

  • GitHub Actions
  • JAMstack concepts
  • Automated releases

Project 05 of 06
portfolio-v2.dev
EXP_06
Portfolio v2 deployment screenshot

Portfolio v2

NEXT.JS / TAILWIND

Edge-deployed Next.js site using incremental static regeneration and image optimization for fast global delivery.

Vercel, Edge, ISR, Next.js

Deployment

Vercel Edge

Learned

  • Edge computing
  • Image optimization
  • ISR caching strategies

Project 06 of 06

Momentum

CURRENT SPRINT

01

Platform Reliability

Reviewing and improving platform configurations to support consistent service availability and operational stability.

02

Deployment Safety

Applying staged rollout strategies, canary deployments, and automated rollback to reduce risk during releases.
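
A canary rollout with automated rollback comes down to a decision rule: compare the canary's error rate with the stable baseline and either promote, roll back, or wait for more traffic. The sketch below illustrates one such rule. The `canary_decision` function and its thresholds (`max_ratio`, `min_requests`, and the 1% floor) are assumptions chosen for illustration.

```python
def canary_decision(
    baseline_errors: int,
    baseline_total: int,
    canary_errors: int,
    canary_total: int,
    max_ratio: float = 1.5,   # assumed: canary may be up to 1.5x the baseline error rate
    min_requests: int = 100,  # assumed: minimum canary traffic before deciding
) -> str:
    """Return 'wait', 'promote', or 'rollback' for the current canary window."""
    if canary_total < min_requests:
        return "wait"  # not enough traffic to judge the canary

    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total

    # A 1% floor keeps a near-zero baseline from forcing rollback on a single error.
    threshold = max(baseline_rate * max_ratio, 0.01)
    return "rollback" if canary_rate > threshold else "promote"


print(canary_decision(10, 1000, 1, 50))
print(canary_decision(10, 1000, 5, 500))
print(canary_decision(10, 1000, 20, 500))
```

Separating the "wait" outcome from "promote" avoids promoting a release on too little evidence, which is the main risk with short canary windows.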

03

Kubernetes Operations

Building deeper knowledge of cluster management, scheduling, and networking across Kubernetes environments.

04

CI/CD Feedback Loops

Improving build and test feedback cycles to detect issues earlier and speed up the delivery process.

Collaboration

ECOSYSTEM

Professional Presence & Platforms

Participating in DevOps and cloud infrastructure communities, reviewing architectural decisions, and studying real-world production incidents and post-mortems.

#DevOps_Community #Architecture_Reviews #Cloud_Native

Growth Strategy

  • Learn, implement, review: applying concepts through direct practice before moving forward.
  • Revisiting cloud networking and security fundamentals to build a stronger operational foundation.
  • Testing infrastructure patterns in sandbox environments before applying them to larger systems.

INITIATE CONTACT

Open to DevOps, Platform Engineering, Cloud Infrastructure, and Site Reliability Engineering roles.