v1.35 · Complete Reference

KubernetesEncyclopedia

// container orchestration at scale — every component, every concept, every YAML

150+ Topics
50+ YAML Examples
18 Core Areas

01 / Overview
📖 Official Docs: Kubernetes Overview Cluster Architecture Components Kubernetes API

Cluster Architecture

// A Kubernetes cluster consists of a control plane and one or more worker nodes. The control plane manages the cluster state; worker nodes run containerized workloads.

// kubernetes cluster topology
⚙ Control Plane
kube-apiserver
etcd
kube-scheduler
kube-controller-manager
cloud-controller-manager
CoreDNS
⬡ Worker Node(s)
kubelet
kube-proxy
Container Runtime (CRI)
Pods
CNI Plugin
CSI Driver
// request lifecycle
👤
kubectl / Client
🔐
API Server Auth
💾
etcd Write
📋
Scheduler
🖥
Node / kubelet
📦
Container Running

02 / Core Components
📖 Official Docs: Cluster Components kube-apiserver etcd kube-scheduler kube-controller-manager kubelet kube-proxy

Control Plane Components

// The components that form the cluster's "brain" — responsible for global decisions and responding to cluster events.

🌐
kube-apiserver
API Server
The front-end to the Kubernetes control plane. Exposes the Kubernetes API over HTTPS. All communication goes through it — kubectl, controllers, kubelets. Horizontally scalable. Validates and processes REST requests, then updates etcd.
REST API Authentication Authorization Admission Control
🗄️
etcd
Distributed Key-Value Store
Consistent, highly-available key-value store used as the backing store for all cluster data. Stores the entire cluster state. Uses Raft consensus algorithm. Only the API server communicates with etcd directly. Must be backed up regularly.
Raft Consensus Cluster State HA
📋
kube-scheduler
Scheduler
Watches for newly created Pods with no assigned node. Selects a node for them to run on. Factors in: resource requirements, hardware/software constraints, affinity/anti-affinity, data locality, inter-workload interference, and deadlines.
Node Selection Affinity Taints & Tolerations
🔄
kube-controller-manager
Controller Manager
Runs controller processes. Logically each controller is a separate process, but compiled into a single binary. Includes: Node controller, Replication controller, Endpoints controller, Service Account & Token controllers, Job controller.
Reconciliation Control Loop Multiple Controllers
☁️
cloud-controller-manager
Cloud Controller Manager
Embeds cloud-specific control logic. Lets you link your cluster into your cloud provider's API. Only runs controllers specific to your cloud: Node, Route, and Service controllers. Separates cloud-specific code from core Kubernetes code.
AWS / GCP / Azure Load Balancers

Node Components

// Run on every node, maintaining running pods and providing the Kubernetes runtime environment.

🤖
kubelet
Node Agent
An agent that runs on each node. Ensures containers in a Pod are running and healthy. Takes a set of PodSpecs (provided through API server) and ensures the described containers are running. Does NOT manage containers not created by Kubernetes.
PodSpec Health Checks CRI Interface
🔗
kube-proxy
Network Proxy
A network proxy that runs on each node, implementing part of the Kubernetes Service concept. Maintains network rules that allow communication to Pods from inside or outside the cluster. Uses OS packet filtering layer (iptables/ipvs) if available.
iptables IPVS Service Proxy
📦
Container Runtime
CRI (Container Runtime Interface)
Software responsible for running containers. Kubernetes supports any runtime implementing the CRI. Options: containerd (most common), CRI-O, Docker Engine (via cri-dockerd). Handles image pulling, container lifecycle, and resource isolation.
containerd CRI-O OCI Standard

Addons

🌍
CoreDNS
Cluster DNS
A flexible, extensible DNS server that can serve as cluster DNS. Automatically assigns DNS names to Services and Pods. Every container inherits the cluster DNS server. Supports stub zones, forwarding, and plugin architecture.
Service Discovery · DNS Resolution
📊
Metrics Server
Resource Metrics
Scalable, efficient source of container resource metrics (CPU, memory). Used by HPA and VPA for autoscaling decisions. Provides the metrics API. Not for monitoring or alerting purposes — use Prometheus for that.
HPA · VPA · Autoscaling
🖥
Dashboard
Web UI
General-purpose web-based UI for Kubernetes clusters. Manage applications running in the cluster, troubleshoot them, and manage the cluster itself. Deploy containerized applications and get an overview of cluster resources.
Web UI · Monitoring

03 / Workloads & Deployments
📖 Official Docs: Workloads Pods Deployments StatefulSets DaemonSets Jobs CronJobs Init Containers Pod Lifecycle

Workload Resources

// Kubernetes offers several built-in workload resources for different application patterns and deployment strategies.

Resource Type Use Case Key Features Scaling
Pod ATOMIC Smallest deployable unit. One or more containers sharing network and storage. Shared IP, shared volumes, sidecar pattern, init containers Manual
Deployment STATELESS Web servers, APIs, microservices. Rolling updates and rollbacks. ReplicaSet management, rolling update, rollback history HPA / Manual
StatefulSet STATEFUL Databases (MySQL, PostgreSQL), Kafka, Zookeeper. Needs stable identity. Stable network identity, ordered rollout, persistent volumes per pod Manual
DaemonSet DAEMON Node-level agents: log collectors, monitoring agents, network plugins. One pod per node, runs on all/selected nodes, node affinity Node count
ReplicaSet STATELESS Maintain a stable set of replica Pods. Usually managed by Deployments. Pod template, replica count, label selectors HPA / Manual
Job BATCH One-time batch processing tasks. Database migrations, report generation. Completion guarantee, parallelism, retry on failure Fixed
CronJob SCHEDULED Recurring tasks. Backups, cleanup jobs, scheduled reports. Cron schedule syntax, concurrency policy, history limits Schedule-based

Pod Internals

🔬
Init Containers
Initialization Logic
Specialized containers that run and complete before app containers start. Used for setup tasks: waiting for dependencies, loading config, running migrations. Each init container must complete successfully before the next starts.
Sequential · Setup Tasks
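A minimal sketch of the init-container pattern: a Pod that waits for a dependency to resolve in DNS before the app container starts (the service name `db` and image `myapp:1.0` are placeholders).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-db            # must exit 0 before app containers start
    image: busybox:1.36
    command: ['sh', '-c', 'until nslookup db; do echo waiting for db; sleep 2; done']
  containers:
  - name: app
    image: myapp:1.0             # placeholder image
```

Multiple init containers run sequentially, in the order listed.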
🚦
Probes
Health Checks
Liveness: Restarts container if it fails. Readiness: Removes from Service endpoints if not ready. Startup: Disables other probes until app starts. Types: HTTP, TCP, exec command, gRPC.
HTTP · TCP · Exec · gRPC
📏
Resources
Requests & Limits
Requests: Minimum guaranteed resources (used for scheduling). Limits: Maximum allowed usage. CPU is compressible (throttled), Memory is not (OOMKilled). QoS classes: Guaranteed, Burstable, BestEffort.
CPU · Memory · QoS
🏷
Labels & Selectors
Object Metadata
Labels are key/value pairs attached to objects. Selectors filter objects by labels. Services use selectors to route traffic. Deployments use selectors to manage ReplicaSets. Annotations store non-identifying metadata (build info, contact info, etc.).
Filtering · Grouping · Annotations
🧲
Affinity & Anti-Affinity
Pod Placement Rules
Node Affinity: schedule pods on specific nodes by labels. Pod Affinity: co-locate pods on same node. Pod Anti-Affinity: spread pods across nodes for HA. Required (hard) vs Preferred (soft) rules. Topology spread constraints for even distribution.
Node Affinity · Pod Affinity
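A pod-spec fragment illustrating both rule strengths — a hard requirement plus a soft preference (the `disktype` label and zone value are example assumptions):

```yaml
# Pod spec fragment — schedules only on nodes labeled disktype=ssd,
# preferring (but not requiring) zone us-east-1a
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      preferredDuringSchedulingIgnoredDuringExecution:   # soft rule
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
```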
🚫
Taints & Tolerations
Node Restrictions
Taints allow nodes to repel certain pods. Tolerations allow pods to schedule onto tainted nodes. Effects: NoSchedule, PreferNoSchedule, NoExecute. Use for dedicated nodes (GPU, SSD), node isolation, eviction of existing pods.
Dedicated Nodes · Isolation
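A sketch of the taint/toleration pair for a dedicated GPU node (node name and `gpu=true` key/value are examples):

```yaml
# Taint the node — repels any pod without a matching toleration:
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
#
# Pod spec fragment tolerating that taint:
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

Note the toleration only *permits* scheduling on the tainted node; combine it with nodeAffinity to *require* it.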

04 / Networking
📖 Official Docs: Services & Networking Service Ingress Ingress Controllers NetworkPolicy DNS for Pods and Services EndpointSlices Gateway API

Kubernetes Networking Model

// Every Pod gets its own IP. All Pods can communicate with all other Pods without NAT. Nodes can communicate with all Pods. The IP a Pod sees itself as is the same IP others see it as.

🔵
ClusterIP Service
Default service type. Exposes service on an internal IP in the cluster. Only reachable from within the cluster. kube-proxy manages iptables rules to forward traffic to pod endpoints.
  • Virtual IP — never bound to any interface
  • DNS: my-svc.my-namespace.svc.cluster.local
  • Stable endpoint for pod-to-pod communication
  • Load balances across healthy endpoint pods
🟡
NodePort Service
Exposes the service on each Node's IP at a static port (30000–32767). Routes external traffic from NodeIP:NodePort to ClusterIP:Port. Useful for development and testing, not production.
  • Port range: 30000–32767 by default
  • Creates ClusterIP automatically
  • Accessible: <NodeIP>:<NodePort>
  • All nodes expose the port, even without pods
🟢
LoadBalancer Service
Exposes the Service externally using a cloud provider's load balancer. Creates NodePort and ClusterIP automatically. Provisions external IP via cloud-controller-manager. Works on AWS (NLB/ALB), GCP, Azure.
  • Provisions cloud load balancer automatically
  • Gets external-ip once provisioned
  • Supports annotations for cloud-specific config
  • MetalLB for bare-metal clusters
🔗
ExternalName Service
Maps a service to a DNS name (not a selector). Returns a CNAME record with the configured external name. No proxying. Useful for accessing external databases or services with a stable in-cluster DNS name.
  • CNAME mapping only — no proxy
  • Access external services by cluster-internal name
  • Useful for legacy system migration
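The entire ExternalName manifest is a few lines — a sketch, with `db.example.com` as a placeholder external host:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: legacy-db
  namespace: production
spec:
  type: ExternalName
  externalName: db.example.com   # returned as a CNAME — no proxying
```

In-cluster clients can then use `legacy-db.production.svc.cluster.local` and be redirected via DNS.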
🌐
Ingress
Manages external access to services in a cluster (usually HTTP/HTTPS). Provides load balancing, SSL/TLS termination, and name-based virtual hosting. Requires an Ingress controller (NGINX, Traefik, HAProxy, AWS ALB).
  • Path-based and host-based routing
  • TLS termination with cert-manager
  • NGINX, Traefik, Contour, Kong controllers
  • Annotations for controller-specific features
🛡
Network Policies
Specification of how groups of pods are allowed to communicate. Ingress and Egress rules using label selectors, namespace selectors, CIDR blocks. Default: all traffic allowed. With policies: deny all then allow selectively (zero trust).
  • Requires CNI plugin support (Calico, Cilium)
  • Ingress: which sources may send traffic to pods
  • Egress: which destinations pods may reach
  • Namespace isolation best practice
CNI Plugins
Container Network Interface — standard for configuring network interfaces in Linux containers. Kubernetes uses CNI plugins to set up pod networking. Each plugin implements the Kubernetes network model differently.
  • Calico — BGP, NetworkPolicy, eBPF
  • Cilium — eBPF-based, L7 policies, service mesh
  • Flannel — simple overlay, VXLAN
  • Weave Net — mesh networking
🔍
DNS & Service Discovery
CoreDNS provides DNS-based service discovery. Every Service gets a DNS A/AAAA record. Pods can resolve services by short name within same namespace, or FQDN across namespaces.
  • FQDN: <service>.<namespace>.svc.cluster.local
  • Headless services: returns pod IPs directly
  • SRV records for named ports
  • ndots:5 search path configuration
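A headless Service — the backing mechanism for the per-pod DNS records mentioned above — is an ordinary Service with `clusterIP: None` (the `postgres` app label is assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None        # headless — DNS returns pod IPs directly
  selector:
    app: postgres
  ports:
  - port: 5432
# StatefulSet pods then resolve individually, e.g.:
#   postgres-0.postgres-headless.<namespace>.svc.cluster.local
```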
🔀
Service Mesh
Layer on top of Kubernetes networking providing advanced features: mutual TLS, observability, traffic management, circuit breaking. Popular options: Istio, Linkerd, Consul Connect. Uses sidecar proxy pattern (Envoy).
  • Mutual TLS (mTLS) between services
  • Traffic shifting, canary deployments
  • Distributed tracing & telemetry
  • Circuit breaker, retry policies

05 / Security
📖 Official Docs: Security Authentication Authorization RBAC Pod Security Standards Secrets Best Practices Encryption at Rest Audit Logging

Kubernetes Security Model

// The 4C's of Cloud Native Security: Code, Container, Cluster, Cloud. Defense in depth — security at every layer.

01
Authentication (AuthN)
Who Are You?
Kubernetes has no User objects. Human users managed externally. Strategies: X.509 client certs, Bearer tokens, OpenID Connect (OIDC), Webhook token auth, Authentication proxy. ServiceAccounts for in-cluster processes with auto-mounted JWT tokens.
02
Authorization (AuthZ)
What Can You Do?
RBAC (Role-Based Access Control): Role + RoleBinding (namespaced), ClusterRole + ClusterRoleBinding (cluster-wide). Verbs: get, list, watch, create, update, patch, delete. Principle of least privilege. ABAC and Webhook modes also available.
03
RBAC
Role-Based Access Control
Fine-grained permissions on API resources. Roles define permissions, RoleBindings assign them to subjects (users, groups, ServiceAccounts). ClusterRoles cover cluster-scoped resources (Nodes, PersistentVolumes) and actions like namespace creation. Aggregate ClusterRoles using labels.
04
Admission Control
Policy Enforcement
Intercepts requests after auth/authz, before persistence. Mutating admission webhooks (modify), Validating admission webhooks (allow/deny). Built-in controllers: NamespaceLifecycle, LimitRanger, ResourceQuota, PodSecurity. OPA Gatekeeper, Kyverno for policy-as-code.
05
Secrets Management
Sensitive Data
Kubernetes Secrets store sensitive data: passwords, tokens, keys. Base64 encoded (NOT encrypted by default). Enable encryption at rest with EncryptionConfiguration. Use external secret stores: HashiCorp Vault, AWS Secrets Manager (ESO, CSI Driver).
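A sketch of the EncryptionConfiguration mentioned above, passed to kube-apiserver via `--encryption-provider-config` (the key material is a placeholder you must generate):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:                # first provider is used for writes
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}           # fallback for reading not-yet-encrypted data
```

After enabling it, re-write existing Secrets (e.g. `kubectl get secrets -A -o json | kubectl replace -f -`) so they are stored encrypted.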
06
Pod Security
Container Isolation
Pod Security Standards (PSS): Privileged, Baseline, Restricted profiles. Pod Security Admission (PSA) enforces them per namespace. Security context: runAsNonRoot, runAsUser, fsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, seccompProfile, capabilities.
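Pod Security Admission is configured entirely through namespace labels — a sketch enforcing the `restricted` profile:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject violating pods
    pod-security.kubernetes.io/warn: restricted      # warn the client
    pod-security.kubernetes.io/audit: restricted     # record in audit log
```

Using `warn`/`audit` at a stricter level than `enforce` is a common way to trial a profile before enforcing it.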
07
Network Policies
Zero-Trust Networking
Default-deny all ingress and egress, then explicitly allow required traffic. Namespace isolation: deny all cross-namespace traffic by default. Label-based selectors for fine-grained control. Requires CNI support (Calico, Cilium, Weave).
08
Image Security
Supply Chain
Use specific image tags or SHA digests (never :latest in production). Scan images with Trivy, Snyk, Grype. Sign images with Cosign (Sigstore). ImagePolicyWebhook to enforce image admission policies. Private registries with imagePullSecrets.
09
Audit Logging
Compliance & Forensics
Records chronological actions taken in the cluster. Audit policy defines: None, Metadata, Request, RequestResponse levels per resource/verb. Backend: log files, webhooks. Essential for compliance (SOC2, PCI-DSS, HIPAA), incident response, and forensics.
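A minimal audit Policy sketch, referenced by kube-apiserver via `--audit-policy-file` — full payloads for Secret mutations, metadata only for everything else:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse          # capture full request + response bodies
  resources:
  - group: ""                     # core API group
    resources: ["secrets"]
  verbs: ["create", "update", "patch", "delete"]
- level: Metadata                 # who / what / when for all other requests
  omitStages: ["RequestReceived"]
```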

06 / Storage
📖 Official Docs: Storage Volumes Persistent Volumes Storage Classes Volume Snapshots Dynamic Provisioning Ephemeral Volumes

Storage Resources

// Kubernetes abstracts storage from compute. Volumes attach to pods, PersistentVolumes are cluster-level resources, PersistentVolumeClaims are user requests for storage.

💾
Volume
Pod-Scoped Storage
Exists as long as the Pod exists. Types: emptyDir (ephemeral scratch space), hostPath (node filesystem, use sparingly), configMap/secret (mount configs/secrets), projected (multiple sources), downwardAPI (expose pod metadata to containers).
emptyDir · hostPath · configMap
🗃
PersistentVolume (PV)
Cluster Storage Resource
A piece of storage provisioned by an admin or dynamically. Independent of pod lifecycle. Access modes: ReadWriteOnce (RWO), ReadOnlyMany (ROX), ReadWriteMany (RWX), ReadWriteOncePod (RWOP). Reclaim policies: Retain, Delete (Recycle is deprecated).
RWO · RWX · Retain/Delete
📝
PersistentVolumeClaim (PVC)
Storage Request
Request for storage by a user. Pods reference PVCs to use persistent storage. Kubernetes binds PVCs to matching PVs. Supports dynamic provisioning via StorageClasses. PVC can request specific storage class, access mode, and size.
Binding · Dynamic Provisioning
StorageClass
Dynamic Provisioning
Enables dynamic provisioning of PersistentVolumes. Defines the provisioner (e.g., ebs.csi.aws.com), parameters, and reclaim policy. VolumeBindingMode: Immediate vs WaitForFirstConsumer. Default StorageClass auto-assigned to PVCs that don't specify one.
Dynamic · Provisioner · Parameters
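A sketch of the dynamic-provisioning pair — a StorageClass (AWS EBS CSI driver shown as an example) plus a PVC that requests it:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com              # example CSI provisioner
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # delay binding until pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```

With WaitForFirstConsumer the volume is created in the same zone as the pod that first uses the claim.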
🔌
CSI Drivers
Container Storage Interface
Standard interface for container orchestrators to expose storage systems to containerized workloads. CSI drivers provided by storage vendors: AWS EBS, GCP Persistent Disk, Azure Disk, Ceph RBD, Longhorn, OpenEBS. Supports snapshots, cloning, resizing.
AWS EBS · Ceph · Longhorn
📸
VolumeSnapshot
Point-in-Time Backup
Create snapshots of PersistentVolumeClaims. VolumeSnapshotClass defines the snapshot driver. VolumeSnapshot creates the snapshot. VolumeSnapshotContent is the actual snapshot resource. Restore by creating PVC from snapshot. Requires CSI driver support.
Backup · Restore · CSI Required
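A snapshot-and-restore sketch (the VolumeSnapshotClass name and PVC names are assumptions; your CSI driver must support snapshots):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed class name
  source:
    persistentVolumeClaimName: data-claim
---
# Restore: a new PVC sourced from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restored
spec:
  dataSource:
    name: data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```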

07 / Configuration
📖 Official Docs: Configuration ConfigMaps Secrets Resource Quotas LimitRange HPA Autoscaling

Configuration Resources

🗂
ConfigMap
Non-Secret Configuration
Store non-confidential data as key-value pairs. Mount as files, environment variables, or command-line arguments. Decouples application config from container images. Max size 1MB. Not encrypted — use Secrets for sensitive data. Immutable ConfigMaps for performance.
Env Vars · Volume Mount · Immutable
🔑
Secret
Sensitive Configuration
Stores sensitive data (passwords, API keys, TLS certs). Types: Opaque, kubernetes.io/tls, kubernetes.io/dockerconfigjson, kubernetes.io/service-account-token. Base64 encoded. Enable EncryptionConfiguration for encryption at rest. Avoid committing to git.
TLS · Opaque · Encryption at Rest
📊
ResourceQuota
Namespace Resource Limits
Limits total resource consumption in a namespace. Covers compute (CPU, memory), storage (PVC count, storage size), object count (pods, services, secrets). LimitRange sets default requests/limits per container/pod. Prevents noisy-neighbor problem.
CPU/Memory · Object Count
🏢
Namespace
Virtual Cluster Isolation
Provides a mechanism for isolating groups of resources. Names must be unique within a namespace, but may repeat across namespaces. Not all objects are namespaced (Nodes, PVs, StorageClasses, ClusterRoles). Default namespaces: default, kube-system, kube-public, kube-node-lease.
Isolation · Multi-tenancy
📈
HPA
Horizontal Pod Autoscaler
Automatically scales number of pod replicas based on observed metrics. Default: CPU/memory via Metrics Server. Custom metrics via Prometheus Adapter. External metrics from cloud providers. ScaleDown stabilization window prevents flapping.
CPU · Custom Metrics · KEDA
📦
VPA
Vertical Pod Autoscaler
Automatically adjusts CPU and memory requests/limits for containers. Modes: Off (recommend only), Initial (set on pod creation), Auto (update running pods). Analyzes historical usage patterns. Cannot be used with HPA on same metric simultaneously.
Rightsizing · Recommendations
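A VPA sketch in recommend-only mode (assumes the VPA components are installed in the cluster; the target Deployment name is an example):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"            # recommend only — inspect before enabling Auto
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```

Read the recommendations with `kubectl describe vpa web-app-vpa` before switching updateMode to Initial or Auto.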

08 / YAML Examples
📖 Official Docs: Deployment YAML StatefulSet YAML Ingress YAML API Reference

Example Manifests

// Production-ready YAML configurations for all major Kubernetes resources. Copy, adapt, deploy.

Web Application Deployment Deployment
Production-ready Deployment with resource limits, probes, security context, and rolling update strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
    version: v1.2.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
      - name: web-app
        image: myapp:1.2.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        envFrom:
        - configMapRef:
            name: web-app-config
        - secretRef:
            name: web-app-secrets
PostgreSQL StatefulSet StatefulSet
Stateful database deployment with persistent volume per replica and stable network identity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
Service & Ingress Service / Ingress
ClusterIP service with NGINX Ingress, TLS termination, and path-based routing.
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
  namespace: production
spec:
  selector:
    app: web-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app-svc
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 8080
Network Policy (Zero Trust) NetworkPolicy
Deny all ingress/egress by default, then selectively allow required traffic between namespaces.
# Deny all ingress and egress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow web-app to receive traffic from ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
---
# Allow egress to database namespace only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
    ports:
    - protocol: TCP
      port: 5432
  - to: # Allow DNS
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
RBAC — Role & Binding RBAC
ServiceAccount, Role with least-privilege permissions, and RoleBinding for a microservice.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123:role/web-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["web-app-secrets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io
HPA + PDB Autoscaling
Horizontal Pod Autoscaler with CPU/memory targets and PodDisruptionBudget for HA during updates.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
CronJob — Scheduled Backup CronJob
Scheduled job that runs a database backup every night at 2 AM with retry policy.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
  namespace: production
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            # assumes an image providing both pg_dump and the aws CLI —
            # postgres:15-alpine alone does not include aws; use a custom image
            image: postgres:15-alpine
            command:
            - /bin/sh
            - -c
            - pg_dump -h postgres-0.postgres-headless -U postgres mydb | gzip | aws s3 cp - s3://backups/$(date +%Y%m%d).sql.gz
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
DaemonSet — Log Collector DaemonSet
Fluent Bit DaemonSet running on every node to collect and forward container logs to Elasticsearch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      tolerations: # Run on control-plane too
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
ConfigMap & Secret Config
Application configuration and secrets with multiple data entries and multi-line values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-app-config
  namespace: production
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
  app.properties: |
    server.port=8080
    cache.ttl=300
    feature.flags=auth,metrics
immutable: false
---
apiVersion: v1
kind: Secret
metadata:
  name: web-app-secrets
  namespace: production
  annotations:
    reloader.stakater.com/match: "true"
type: Opaque
stringData: # plain text (auto base64)
  DATABASE_URL: postgresql://user:pass@postgres:5432/db
  API_KEY: supersecretapikey123
  JWT_SECRET: myverysecretjwtkey
ResourceQuota & LimitRange Quota
Namespace-level resource limits with per-container defaults using LimitRange.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "100"
    services: "20"
    services.loadbalancers: "2"
    secrets: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 4Gi
    min:
      cpu: 10m
      memory: 32Mi

09 / Scheduling
📖 Official Docs: Scheduling & Eviction kube-scheduler Assign Pod to Node Topology Spread Priority & Preemption Taints & Tolerations Resource Bin Packing

Advanced Scheduling

// The kube-scheduler places Pods onto nodes through a filtering + scoring pipeline. Advanced controls let you influence exactly where and how workloads run.

🏆
PriorityClass
Pod Priority & Preemption
Assign integer priority values to Pods. Higher-priority Pods can preempt (evict) lower-priority ones when cluster resources are tight. System-node-critical and system-cluster-critical are built-in high-priority classes. PreemptionPolicy controls whether preemption is allowed.
Preemption · QoS · System Critical
🌐
Topology Spread Constraints
Even Distribution
Spread Pods evenly across failure domains: zones, regions, nodes, custom topologies. Controls maxSkew (max allowed imbalance), topologyKey (the node label to spread across), and whenUnsatisfiable (DoNotSchedule or ScheduleAnyway). Often a better fit than podAntiAffinity for even spreading.
Zone Spread · HA · maxSkew
🎛
Scheduler Profiles & Plugins
Custom Scheduling Logic
The scheduler framework exposes extension points: PreFilter, Filter, PostFilter, PreScore, Score, Reserve, Permit, PreBind, Bind. Multiple scheduler profiles can run in one binary. Deploy custom schedulers or use scheduler-plugins project for advanced features like coscheduling and capacity scheduling.
Extension Points · Plugins
Pod Overhead
Runtime Overhead Accounting
Accounts for resources consumed by the Pod sandbox (e.g. kata containers VM overhead) in addition to container resource requests/limits. Defined in RuntimeClass. Included in scheduling decisions, quota accounting, and kubelet cgroup management.
RuntimeClass · Sandbox Overhead
🔒
Node Selector & Node Name
Simple Node Targeting
nodeSelector: simplest form of node constraint — map of label key-values that must match. nodeName: bypasses the scheduler entirely, directly binds Pod to a specific node by name. Both are less flexible than nodeAffinity but simpler to reason about.
nodeSelector · nodeName
📦
RuntimeClass
Container Runtime Selection
Select different container runtimes (and runtime configurations) per Pod. Use cases: stronger isolation with gVisor or Kata Containers for untrusted workloads, while standard workloads use containerd/runc. Specify in pod spec via runtimeClassName field.
gVisor · Kata · Isolation
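A RuntimeClass sketch for gVisor, assuming the `runsc` handler is already configured in containerd on the nodes:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                 # CRI handler name configured on the nodes
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor     # run sandboxed via gVisor
  containers:
  - name: app
    image: myapp:1.0           # placeholder image
```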
PriorityClass & Topology Spread Scheduling
PriorityClass definition and a Deployment using topology spread constraints across availability zones.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Critical production workloads"
---
# Deployment with topology spread + priority
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      priorityClassName: high-priority
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: critical-app
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: critical-app
      containers:
      - name: app
        image: myapp:latest

10 / Extending Kubernetes
📖 Official Docs: Extending Kubernetes Custom Resources Operator Pattern Admission Controllers Webhook Admission API Aggregation Gateway API Garbage Collection

Extending the Platform

// Kubernetes is designed to be extensible. Add new resource types, custom controllers, admission logic, and API endpoints without modifying core code.

📐
CRD
Custom Resource Definitions
Extend the Kubernetes API with your own resource types. Define schema with OpenAPI v3 validation. CRDs become first-class API objects: storable in etcd, accessible via kubectl, RBAC-protected. The foundation for Operators. Versions, conversion webhooks, and status subresources supported.
Custom APIOpenAPI SchemaOperators
🤖
Operators
Controller Pattern
Operators extend Kubernetes to manage stateful applications using domain-specific knowledge. Built on CRDs + custom controllers. Implement the reconcile loop: observe current state → compare to desired state → act. Build them with frameworks such as Operator SDK or Kubebuilder. Examples: Prometheus Operator, Strimzi (Kafka), CloudNativePG.
Reconcile LoopkubebuilderOperator SDK
🔀
Admission Webhooks
Mutating & Validating
Mutating: Modify objects before persistence (inject sidecars, set defaults, add labels). Validating: Allow or reject requests after mutation (enforce policies). Called via HTTPS webhook. failurePolicy: Fail (fail closed — safe) or Ignore (fail open). Tools: OPA Gatekeeper, Kyverno, custom webhooks.
OPA GatekeeperKyvernoSidecar Inject
🔌
API Aggregation Layer
Extension API Servers
Register additional API servers that serve under /apis/<group>/<version>. The core API server proxies matching requests to your extension server. More powerful than CRDs: custom storage backends, non-standard REST semantics. Used by metrics-server (metrics.k8s.io API) and the now-archived Service Catalog.
APIServiceCustom Backend
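Registration happens through an APIService object — roughly what metrics-server installs (service name and namespace follow the upstream manifests):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:                      # API server proxies /apis/metrics.k8s.io/v1beta1 here
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true   # upstream default; prefer a caBundle in production
```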
🗑
Finalizers & Owner References
Garbage Collection
Finalizers: keys on objects that prevent deletion until external cleanup is done. Controller removes the finalizer after cleanup. Owner References: parent-child relationships. When parent is deleted, children are garbage collected (cascade). Foreground vs Background deletion propagation policies.
Cascade DeleteCleanup Hooks
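Both mechanisms live in object metadata. A sketch — the finalizer key is a placeholder and the owner UID must match the live parent object's UID:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: child-config
  finalizers:
  - mycompany.io/cleanup        # deletion blocks until a controller removes this key
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: parent-app
    uid: 1234abcd-...           # placeholder — copy from the parent's metadata.uid
    controller: true
    blockOwnerDeletion: true
```

Deleting `parent-app` garbage-collects this ConfigMap; `kubectl delete --cascade=foreground` forces foreground propagation.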
🌐
Gateway API
Next-Gen Ingress
Successor to Ingress — more expressive, role-oriented, extensible. Resources: GatewayClass, Gateway, HTTPRoute, GRPCRoute, TCPRoute, TLSRoute. Separates infrastructure (Gateway) from application routing (Routes). Supported by Cilium, Istio, Envoy Gateway, NGINX, Traefik.
HTTPRouteGRPCRouteRole-Oriented
📋
EndpointSlices
Scalable Endpoints
Replacement for Endpoints objects — more scalable for large clusters. Each slice holds up to 100 endpoints by default. Conditions: Ready, Serving, Terminating. Supports IPv4, IPv6, FQDN. Required for topology-aware routing and traffic policies (Local, Cluster). Automatically managed by endpoint-slice controller.
ScalableIPv4/IPv6Topology Aware
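What the endpoint-slice controller writes for a Service — a hand-written sketch for illustration (slices are normally managed automatically, never created by hand):

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-app-abc12
  labels:
    kubernetes.io/service-name: web-app   # ties the slice to its Service
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 8080
endpoints:
- addresses: ["10.244.1.17"]
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: worker-01
  zone: us-east-1a                        # enables topology-aware routing
```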
🎭
Lease Objects
Leader Election & Heartbeats
Lightweight objects in the coordination.k8s.io API group. Used for: node heartbeats (kubelet updates Lease every 10s, reduces etcd load), leader election in controllers (only one replica acts as leader), and distributed locking in custom controllers.
Leader ElectionHeartbeat
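A node-heartbeat Lease as the kubelet maintains it (node name and timestamp are illustrative):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: worker-01
  namespace: kube-node-lease        # one Lease per node lives here
spec:
  holderIdentity: worker-01
  leaseDurationSeconds: 40          # node treated as unhealthy once this lapses
  renewTime: "2024-01-01T12:00:00.000000Z"
```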
Custom Resource Definition CRD
A CRD defining a custom "Database" resource with OpenAPI v3 schema validation and status subresource.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io
spec:
  group: mycompany.io
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine", "version"]
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 5
              storageGB:
                type: integer
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Phase
      type: string
      jsonPath: .status.phase
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
---
# Custom Resource instance
apiVersion: mycompany.io/v1
kind: Database
metadata:
  name: my-postgres
  namespace: production
spec:
  engine: postgres
  version: "15"
  replicas: 3
  storageGB: 50
Validating Admission Webhook Webhook
Register a validating webhook that enforces policy — e.g. all Deployments must have resource limits set.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: resource-limits-enforcer
webhooks:
- name: check-limits.mycompany.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  namespaceSelector:
    matchLabels:
      admission-webhook: enabled
  clientConfig:
    service:
      name: webhook-service
      namespace: webhook-system
      path: /validate
      port: 443
    caBundle: LS0t... # base64 CA cert
  timeoutSeconds: 5
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector
webhooks:
- name: inject.istio.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      name: istiod
      namespace: istio-system
      path: /inject
Gateway API — HTTPRoute Gateway API
Next-gen Ingress using Gateway API with canary traffic splitting between stable and canary versions.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: production
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      certificateRefs:
      - name: app-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-app-route
  namespace: production
spec:
  parentRefs:
  - name: prod-gateway
  hostnames: ["app.example.com"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: web-app-stable
      port: 80
      weight: 90
    - name: web-app-canary
      port: 80
      weight: 10  # 10% canary traffic

11 / Cluster Administration
📖 Official Docs: Cluster Administration kubeadm Drain Nodes etcd Backup TLS Certificates Container Images Lifecycle Hooks Ephemeral Containers

Cluster Administration

// Day-2 operations: bootstrapping, node lifecycle management, upgrades, etcd backup, multi-tenancy, and cluster-level policies.

🛠
kubeadm
Cluster Bootstrap Tool
Official tool to bootstrap a production-grade Kubernetes cluster. Commands: kubeadm init (control plane), kubeadm join (worker nodes), kubeadm upgrade (upgrade cluster), kubeadm reset (tear down), kubeadm token (manage bootstrap tokens), kubeadm certs (certificate management).
kubeadm initkubeadm joinUpgrade
🔄
Node Lifecycle
Cordon, Drain & Delete
cordon: Mark node as unschedulable (no new pods). drain: Evict all pods (respects PDBs), then cordon — use before maintenance. uncordon: Restore scheduling. Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable.
cordondrainMaintenance
💾
etcd Backup & Restore
Disaster Recovery
etcd is the source of truth — back it up regularly. Use etcdctl snapshot save to create snapshots and etcdutl snapshot restore (etcdctl in releases before 3.5) to restore into a new data directory. Back up before every cluster upgrade. Store snapshots off-cluster (S3, GCS). Velero for application-level backups including PV data.
etcdctlsnapshotVelero
🔐
Certificate Management
PKI & TLS
Kubernetes uses PKI certificates for all internal communication. Certs: ca.crt, apiserver.crt, apiserver-kubelet-client.crt, front-proxy-ca.crt, etcd/ca.crt. Default 1-year expiry. Use kubeadm certs renew to rotate. cert-manager automates issuance from Let's Encrypt, Vault, or internal CA.
cert-managerLet's EncryptPKI
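With cert-manager installed, issuance is declarative. A sketch assuming a ClusterIssuer named `letsencrypt-prod` already exists:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls            # cert-manager writes the signed keypair here
  dnsNames:
  - app.example.com
  duration: 2160h                # 90-day certificate lifetime
  renewBefore: 360h              # renew 15 days before expiry
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```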
🏢
Multi-Tenancy
Cluster Sharing Patterns
Soft multi-tenancy: namespaces + RBAC + NetworkPolicy + ResourceQuota. Hard multi-tenancy: separate clusters per tenant (vcluster for virtual clusters). Hierarchical namespaces (HNC) for team org structures. Node selectors/taints for dedicated node pools per team. Capsule, Loft for enterprise multi-tenancy.
vclusterCapsuleHNC
🌍
Cluster Upgrades
Version Management
Upgrade the control plane first, then worker nodes. Kubernetes skew policy: kubelet may be up to three minor versions older than kube-apiserver (two before v1.28). Rolling node upgrades: drain → upgrade → uncordon. Managed services (EKS, GKE, AKS) handle upgrades via their APIs. Always step through minor versions sequentially — never skip one.
Skew PolicyRollingMinor-by-Minor
🌐
Windows Nodes
Heterogeneous Clusters
Kubernetes supports Windows Server worker nodes alongside Linux nodes. Windows pods run Windows containers only (no Linux containers on Windows nodes). Use node selectors (kubernetes.io/os: windows) to target Windows workloads. Limitations: no hostNetwork, no Linux-style privileged containers (HostProcess containers fill that role), no Linux capabilities.
Windows ServerMixed OS
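Targeting Windows nodes combines the well-known OS label with a matching toleration — the `os=windows:NoSchedule` taint shown is a common convention, not something Kubernetes applies automatically:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: win-app
spec:
  nodeSelector:
    kubernetes.io/os: windows    # well-known label set by the kubelet
  tolerations:
  - key: os
    operator: Equal
    value: windows
    effect: NoSchedule
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
```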
📦
Image Management
Registry & Pull Policy
imagePullPolicy: Always (re-pull on every start), Never (local cache only), IfNotPresent (pull only if not cached — the default, except that :latest or untagged images default to Always). Use image digests (sha256:...) instead of tags for reproducibility. imagePullSecrets for private registries — create a docker-registry secret. Image garbage collection via kubelet: imageGCHighThresholdPercent / Low.
imagePullSecretsDigest PinningGC
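The three knobs together — digest pinning, an explicit pull policy, and a private-registry secret. The `regcred` secret name and the truncated digest are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-image
spec:
  imagePullSecrets:
  - name: regcred                # kubectl create secret docker-registry regcred ...
  containers:
  - name: app
    imagePullPolicy: IfNotPresent
    image: registry.mycompany.io/app@sha256:4bcf...   # digest, immune to tag re-pushes
```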
Container Lifecycle Hooks
postStart & preStop
postStart: Executes immediately after container starts (async — not guaranteed to run before entrypoint). preStop: Called before container is terminated — use for graceful shutdown (drain connections, flush buffers). Pair with terminationGracePeriodSeconds for zero-downtime deploys.
postStartpreStopGraceful Shutdown
🔬
Ephemeral Containers
Live Debugging
Temporary containers added to a running Pod for debugging purposes. Cannot be removed once added. Share the Pod's namespaces (network, PID). Use kubectl debug to inject a debug container (e.g. busybox, netshoot) into a Pod whose images are distroless or whose containers are crash-looping, for live diagnosis.
kubectl debugDistrolessLive Debug
Container Lifecycle Hooks Pod Spec
postStart and preStop hooks for graceful startup and zero-downtime shutdown.
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: myapp:1.0
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - echo "Container started" >> /var/log/lifecycle.log
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
              # Graceful shutdown: stop accepting new connections,
              # wait for in-flight requests to complete
              kill -SIGTERM 1
              sleep 30
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
Pod Security Standards PSA / Security
Namespace-level Pod Security Admission enforcement using labels and a restricted pod example.
# Label namespace to enforce security standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Enforce: reject violating pods
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.35
    # Audit: log violations
    pod-security.kubernetes.io/audit: restricted
    # Warn: show warnings
    pod-security.kubernetes.io/warn: restricted
---
# Restricted-compliant pod (all security requirements met)
apiVersion: v1
kind: Pod
metadata:
  name: restricted-pod
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]

12 / Observability
📖 Official Docs: Logging Architecture Debugging System Metrics Cluster Troubleshooting Application Debugging Metrics Reference

Monitoring, Logging & Tracing

// The three pillars of observability in Kubernetes: metrics for dashboards & alerting, logs for debugging, traces for distributed request flows.

📊
Metrics Architecture
Layered metrics system — core metrics for autoscaling, full metrics for monitoring.
  • Metrics Server → HPA / VPA (CPU, memory)
  • Prometheus → custom metrics + alerting
  • kube-state-metrics → object state metrics
  • node-exporter → OS/hardware metrics
  • Grafana → dashboards (kube-prometheus-stack)
  • PrometheusRule CRD for alerting rules
📋
Logging Architecture
Kubernetes doesn't provide native log storage — logs flow to external systems.
  • Node-level: DaemonSet log agents (Fluent Bit)
  • Sidecar: dedicated log container per pod
  • kubectl logs: direct container log access
  • Backends: Elasticsearch, Loki, CloudWatch
  • Structured logging (JSON) for queryability
  • Log rotation: containerLogMaxSize / MaxFiles
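A node-level agent in DaemonSet form — a minimal Fluent Bit sketch (image tag and mount layout vary by setup; the official Helm chart is the usual install path):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels: {app: fluent-bit}
  template:
    metadata:
      labels: {app: fluent-bit}
    spec:
      tolerations:
      - operator: Exists             # run on every node, tainted or not
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.0
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true             # tail container logs from the host
      volumes:
      - name: varlog
        hostPath: {path: /var/log}
```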
🔍
Distributed Tracing
Track requests across multiple microservices using trace context propagation.
  • OpenTelemetry (OTel) — CNCF standard SDK
  • Jaeger / Zipkin — trace storage & UI
  • W3C Trace Context propagation headers
  • Automatic injection via service mesh (Istio)
  • OTel Collector DaemonSet / sidecar pattern
🚨
Alerting
Proactive notification when cluster or application health degrades.
  • Alertmanager — route, deduplicate, silence alerts
  • PrometheusRule CRDs for alert definitions
  • Receivers: Slack, PagerDuty, email, webhooks
  • Inhibition rules to suppress downstream alerts
  • kube-prometheus-stack bundles everything
🩺
Cluster Health Events
Kubernetes Events provide real-time cluster activity and troubleshooting info.
  • kubectl get events -n <ns> --sort-by=.lastTimestamp
  • Types: Normal, Warning
  • Events expire after 1 hour by default
  • Event exporter → persist to Elasticsearch/BigQuery
  • KubeWatch, Robusta for event-driven alerting
📈
Resource Monitoring
Key metrics to monitor for cluster and workload health.
  • Node: CPU/mem utilization, disk I/O, network
  • Pod: container restarts, OOMKilled events
  • API server: request latency, error rate (4xx/5xx)
  • etcd: DB size, leader elections, write latency
  • Scheduler: scheduling latency, pending pods
ServiceMonitor (Prometheus Operator) Monitoring
Prometheus Operator ServiceMonitor to automatically scrape metrics from a Service's /metrics endpoint.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: web-app
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - production
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-app-alerts
  namespace: production
spec:
  groups:
  - name: web-app.rules
    rules:
    - alert: HighErrorRate
      expr: |
        rate(http_requests_total{status=~"5.."}[5m])
        / rate(http_requests_total[5m]) > 0.05
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: High HTTP error rate on web-app
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 5m
      labels:
        severity: warning
OpenTelemetry Collector Tracing
OTel Collector as a DaemonSet receiving traces/metrics and exporting to Jaeger and Prometheus.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: observability
spec:
  mode: DaemonSet
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
      memory_limiter:
        limit_mib: 400
    exporters:
      jaeger:
        endpoint: jaeger-collector:14250
        tls:
          insecure: true
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheusremotewrite]

13 / kubectl Reference
📖 Official Docs: kubectl Reference kubectl Cheat Sheet kubectl Conventions JSONPath in kubectl Install kubectl

kubectl CLI Reference

// The primary command-line tool for interacting with Kubernetes clusters. Essential commands, flags, and patterns for daily operations.

📋
Get & Describe
Inspect Resources
Core commands for viewing cluster state. Use -o wide, -o yaml, -o json for different output formats. --watch (-w) for live updates. --all-namespaces (-A) across all namespaces. -l for label selector filtering. --field-selector for field-based filtering.
getdescribe-o yaml
⚙️
Apply & Create
Manage Resources
kubectl apply -f: declarative resource management (preferred). kubectl create: imperative creation. kubectl delete: remove resources. kubectl replace: full resource replacement. kubectl patch: partial update (strategic merge, JSON merge, JSON patch). kubectl edit: open in $EDITOR.
applypatchdelete
🐛
Debug & Exec
Troubleshooting
kubectl exec: run commands in containers. kubectl logs: view container logs (-f to follow, --previous for crashed containers). kubectl debug: ephemeral debug containers. kubectl port-forward: local port to pod/service. kubectl cp: copy files to/from containers. kubectl top: resource usage.
execlogsdebug
🌐
Context & Config
kubeconfig Management
kubectl config view: show kubeconfig. config use-context: switch active cluster. config get-contexts: list contexts. config set-context: modify context. KUBECONFIG env var or --kubeconfig flag for custom config paths. kubectx / kubens tools for fast switching. k9s for TUI interface.
contextkubectxk9s
🔄
Rollout Management
Deployment Operations
kubectl rollout status: watch rollout progress. rollout history: view revision history. rollout undo: rollback (--to-revision=N for specific). rollout pause / resume: canary-style pausing. rollout restart: rolling restart of all pods (triggers new ReplicaSet). scale: change replica count.
rollout undorollout restart
🏷
Label & Annotate
Metadata Operations
kubectl label: add/modify/remove labels on resources. kubectl annotate: the same for annotations. Append - to a key to remove it (kubectl label pod foo env-). --overwrite to replace existing values. Label nodes for affinity, cordon, and drain workflows. JSONPath and Go template output formats.
labelannotateJSONPath
kubectl Cheat Sheet CLI Reference
Most-used kubectl commands for daily cluster operations, debugging, and administration.
# ── CONTEXT & CLUSTER ────────────────────────────────────────
kubectl config get-contexts                    # list all contexts
kubectl config use-context my-cluster          # switch context
kubectl config set-context --current --namespace=prod  # set default ns
kubectl cluster-info                           # cluster endpoints
kubectl api-resources                          # all resource types
kubectl api-versions                           # all API versions

# ── GET / INSPECT ─────────────────────────────────────────────
kubectl get pods -A -o wide                    # all pods, all namespaces
kubectl get pod my-pod -o yaml                 # full pod spec
kubectl describe pod my-pod                    # events + status detail
kubectl get events --sort-by=.lastTimestamp    # sorted events
kubectl get all -n production                  # all resources in ns
kubectl top nodes                              # node resource usage
kubectl top pods --containers                  # container-level usage

# ── APPLY / MANAGE ────────────────────────────────────────────
kubectl apply -f manifest.yaml                 # declarative apply
kubectl apply -f ./k8s/                        # apply whole directory
kubectl delete -f manifest.yaml                # delete from file
kubectl patch deploy my-app -p '{"spec":{"replicas":5}}'
kubectl scale deploy my-app --replicas=5
kubectl set image deploy/my-app app=myapp:2.0  # update image
kubectl label node node1 disktype=ssd          # label a node

# ── ROLLOUTS ─────────────────────────────────────────────────
kubectl rollout status deploy/my-app
kubectl rollout history deploy/my-app
kubectl rollout undo deploy/my-app
kubectl rollout undo deploy/my-app --to-revision=3
kubectl rollout restart deploy/my-app          # rolling restart
kubectl rollout pause deploy/my-app            # pause rollout

# ── DEBUG / TROUBLESHOOT ──────────────────────────────────────
kubectl logs my-pod -c my-container -f         # follow logs
kubectl logs my-pod --previous                 # crashed container logs
kubectl exec -it my-pod -- /bin/sh             # interactive shell
kubectl exec my-pod -- env                     # list env vars
kubectl debug my-pod -it --image=busybox       # ephemeral debug container
kubectl port-forward svc/my-svc 8080:80        # local port forward
kubectl cp my-pod:/app/logs.txt ./logs.txt     # copy from pod

# ── NODE MANAGEMENT ───────────────────────────────────────────
kubectl cordon node1                           # mark unschedulable
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon node1                         # re-enable scheduling
kubectl taint nodes node1 key=val:NoSchedule   # add taint
kubectl taint nodes node1 key=val:NoSchedule-  # remove taint

# ── GENERATING YAML ──────────────────────────────────────────
kubectl create deploy my-app --image=nginx --dry-run=client -o yaml
kubectl create svc clusterip my-svc --tcp=80:8080 --dry-run=client -o yaml
kubectl create secret generic my-secret --from-literal=key=val --dry-run=client -o yaml

# ── USEFUL OUTPUT FORMATS ────────────────────────────────────
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[-1].type'
kubectl get pods --sort-by='.status.startTime'

14 / Additional YAML Examples
📖 Official Docs: Policies etcd Backup Pod Security Admission Volume Snapshots

Policies, Multi-tenancy & Operations

// Additional production patterns: Kyverno policies, etcd backup jobs, namespace setup, and node management.

Kyverno Policy Policy-as-Code
Kyverno ClusterPolicy to enforce image registry restrictions and require resource limits on all pods.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-registry-and-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: restrict-image-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Only images from approved registries allowed"
      pattern:
        spec:
          containers:
          - image: "registry.mycompany.io/* | gcr.io/*"
  - name: require-resource-limits
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "CPU and memory limits are required"
      pattern:
        spec:
          containers:
          - resources:
              limits:
                cpu: "?*"
                memory: "?*"
  - name: add-default-labels  # mutate rule
    match:
      any:
      - resources:
          kinds: ["Deployment"]
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            managed-by: kyverno
etcd Backup CronJob Cluster Admin
Automated etcd snapshot backup every 6 hours, uploaded to S3 using etcdctl.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          containers:
          - name: etcd-backup
            image: bitnami/etcd:3.5  # note: the aws CLI used below is not in this image — bake it in or swap in your own backup image
            command:
            - /bin/sh
            - -c
            - |
                BACKUP_FILE="/tmp/etcd-$(date +%Y%m%d-%H%M%S).db"
                ETCDCTL_API=3 etcdctl snapshot save $BACKUP_FILE \
                  --endpoints=https://127.0.0.1:2379 \
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
                aws s3 cp $BACKUP_FILE s3://my-etcd-backups/
                echo "Backup complete: $BACKUP_FILE"
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
Namespace + Full Tenant Setup Multi-Tenancy
Complete namespace setup for a team: namespace, RBAC, ResourceQuota, LimitRange, and NetworkPolicy isolation.
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    pod-security.kubernetes.io/enforce: baseline
---
# 2. Team RBAC - developers get edit rights
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-developers
  namespace: team-alpha
subjects:
- kind: Group
  name: team-alpha-devs
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
# 3. Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
# 4. Namespace isolation network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: alpha  # only same-team ns
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
VolumeSnapshot & Restore Storage
Create a VolumeSnapshot from a PVC, then restore it by creating a new PVC from that snapshot.
# VolumeSnapshotClass (CSI driver dependent)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Delete
---
# Take a snapshot of existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-20240101
  namespace: database
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: data-postgres-0
---
# Restore: create new PVC from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: database
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: postgres-snapshot-20240101
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

15 / Cluster Setup
📖 Official Docs: Getting Started / Setup kubeadm Install Create Cluster (kubeadm) Install Tools Container Runtimes

Setting Up Kubernetes

// Multiple ways to run Kubernetes depending on your use case — local development, bare metal, or cloud-managed. Choose the right tool for the right environment.

Tool Best For Complexity Notes
k3d Local dev, CI/CD testing LOW k3s in Docker containers — fastest spin-up (<30s)
kind Local dev, e2e testing LOW Kubernetes IN Docker — used by k8s upstream CI
minikube Local dev, learning LOW Single-node, many drivers (Docker, VM, Podman)
k3s Edge, IoT, bare-metal, RPi MEDIUM Lightweight k8s — 40MB binary, SQLite or etcd
kubeadm Self-managed production clusters HIGH Official bootstrap tool — full control, manual upgrades
RKE2 Enterprise, FIPS-compliant MEDIUM Rancher's hardened k8s distribution
EKS / GKE / AKS Cloud-managed, teams LOW Managed control plane — pay for worker nodes only

k3d — Local Development

// k3d wraps k3s (a lightweight Kubernetes) inside Docker containers. Create full multi-node clusters on your laptop in seconds.

k3d Overview
k3s in Docker
k3d uses Docker to run k3s nodes as containers. Each "node" is a Docker container — control plane and workers alike. Supports multi-node clusters, LoadBalancer Services via k3s's built-in ServiceLB (Klipper), Ingress via bundled Traefik, persistent volumes via local-path-provisioner. Perfect for local development and CI pipelines.
Docker RequiredMulti-NodeTraefik LB
📦
k3d Install
Prerequisites & Installation
Requires: Docker Desktop or Docker Engine, kubectl. Install k3d via curl script, Homebrew (macOS), or Chocolatey (Windows). Very small binary (~15MB). Works on Linux, macOS, Windows (WSL2). No VM required — pure Docker networking.
curl | bashbrewchoco
🔧
k3d Config File
Declarative Cluster Config
k3d supports a YAML config file for reproducible cluster creation. Define server count, agent count, port mappings, volume mounts, extra k3s args, image registry mirrors, and environment variables. Commit to git for team-shared dev environments.
Config as CodeReproducible
k3d Setup — Complete Walkthrough k3d
Full k3d installation and cluster creation commands for local development.
# ── INSTALL k3d ──────────────────────────────────────────────
# Linux / macOS
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# macOS via Homebrew
brew install k3d

# Verify installation
k3d version
kubectl version --client

# ── CREATE CLUSTERS ──────────────────────────────────────────
# Simple single-server cluster
k3d cluster create mycluster

# Production-like: 1 server + 3 agents + port mappings
k3d cluster create devcluster \
  --servers 1 \
  --agents 3 \
  --port "80:80@loadbalancer" \
  --port "443:443@loadbalancer" \
  --api-port 6550 \
  --k3s-arg "--disable=traefik@server:0"  # disable built-in Traefik

# With local registry (for custom images without pushing to remote)
k3d registry create myregistry --port 5050
k3d cluster create devcluster \
  --registry-use k3d-myregistry:5050 \
  --agents 2

# ── CLUSTER MANAGEMENT ───────────────────────────────────────
k3d cluster list                    # list all clusters
k3d cluster stop devcluster         # stop cluster (keep state)
k3d cluster start devcluster        # restart cluster
k3d cluster delete devcluster       # delete cluster
k3d node list                       # list all nodes
k3d node add --cluster devcluster   # add a worker node

# ── KUBECONFIG ───────────────────────────────────────────────
# Automatically merged into ~/.kube/config
kubectl config use-context k3d-devcluster
kubectl get nodes

# ── LOAD IMAGES INTO CLUSTER ─────────────────────────────────
# Build locally and import into k3d (no registry push needed)
docker build -t myapp:dev .
k3d image import myapp:dev --cluster devcluster
k3d Config File · k3d YAML
Declarative k3d cluster configuration file — commit to git for reproducible dev environments.
# k3d-config.yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: dev-cluster

servers: 1
agents: 2

kubeAPI:
  hostPort: "6550"

ports:
- port: 8080:80
  nodeFilters: [loadbalancer]
- port: 8443:443
  nodeFilters: [loadbalancer]

volumes:
- volume: /tmp/k3dvol:/data
  nodeFilters: ["server:*", "agent:*"]

registries:
  use: [k3d-myregistry:5050]
  config: |
    mirrors:
      "docker.io":
        endpoint:
          - "https://mirror.gcr.io"

options:
  k3s:
    extraArgs:
    - arg: --disable=traefik
      nodeFilters: ["server:*"]
    - arg: --cluster-cidr=10.20.0.0/16
      nodeFilters: ["server:*"]

# Create from config file:
# k3d cluster create --config k3d-config.yaml

kubeadm — Production Cluster

// kubeadm is the official Kubernetes cluster bootstrapping tool. Use it to set up production-grade clusters on bare metal, VMs, or cloud instances.

🏗
Prerequisites
Node Requirements
Each node: 2+ CPUs, 2GB+ RAM, unique hostname/MAC/product_uuid, swap disabled, full network connectivity between nodes. Ports open: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet), 10257/10259 (controller/scheduler). Container runtime installed (containerd recommended).
No Swap · Port 6443 · containerd
🌐
HA Control Plane
High Availability Setup
Stacked etcd: etcd runs on same nodes as control plane (simpler, 3+ control-plane nodes). External etcd: separate etcd cluster (more resilient, more nodes). Need a load balancer in front of multiple API servers — HAProxy or cloud LB. VIP or DNS round-robin for API server endpoint.
3 Control Planes · HAProxy · Stacked etcd
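The load-balancer piece above can be sketched as a minimal HAProxy fragment in TCP mode, passing TLS through to three API servers (hostnames and IPs are illustrative):

```
# /etc/haproxy/haproxy.cfg (fragment) — TCP passthrough to the API servers
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.10:6443 check fall 3 rise 2
    server cp2 192.168.1.11:6443 check fall 3 rise 2
    server cp3 192.168.1.12:6443 check fall 3 rise 2
```

TCP mode matters here: the API server terminates its own TLS, so the load balancer must not.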
📋
kubeadm Config
ClusterConfiguration
Use a kubeadm config file instead of flags for reproducibility. Defines API server endpoint, pod/service CIDRs, feature gates, etcd config, image repository, encryption provider, audit policy, scheduler/controller-manager extra args.
ClusterConfiguration · InitConfiguration
kubeadm — Full Production Setup · kubeadm
Complete step-by-step kubeadm cluster setup on Ubuntu 22.04 — control plane + worker nodes.
## ═══════════════════════════════════════════════════
##  RUN ON ALL NODES (control-plane + workers)
## ═══════════════════════════════════════════════════

# 1. Disable swap (required by kubelet)
swapoff -a
sed -i '/swap/d' /etc/fstab

# 2. Enable required kernel modules
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# 3. Kernel networking params
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# 4. Install containerd runtime
apt-get install -y containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
# Enable SystemdCgroup (critical!)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
  /etc/containerd/config.toml
systemctl restart containerd && systemctl enable containerd

# 5. Install kubeadm, kubelet, kubectl
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /' \
  | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.35.0-1.1 kubeadm=1.35.0-1.1 kubectl=1.35.0-1.1
apt-mark hold kubelet kubeadm kubectl  # prevent auto-upgrade

## ═══════════════════════════════════════════════════
##  RUN ON CONTROL-PLANE NODE ONLY
## ═══════════════════════════════════════════════════

# 6. Initialize the cluster
kubeadm init \
  --control-plane-endpoint "k8s-api.example.com:6443" \
  --pod-network-cidr "10.244.0.0/16" \
  --service-cidr "10.96.0.0/12" \
  --upload-certs  # needed for HA: share certs with other control-planes

# 7. Set up kubeconfig for root
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# 8. Install CNI plugin (Calico)
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml

# 9. Verify control plane is ready
kubectl get nodes
kubectl get pods -n kube-system

## ═══════════════════════════════════════════════════
##  RUN ON EACH WORKER NODE
## ═══════════════════════════════════════════════════

# 10. Join worker nodes (token from kubeadm init output)
kubeadm join k8s-api.example.com:6443 \
  --token abc123.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash>

# Regenerate join token if expired (24h TTL)
kubeadm token create --print-join-command
kubeadm Config File (HA) · ClusterConfig
Production kubeadm configuration file for HA cluster with encryption at rest and audit logging.
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.10
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
  - name: node-labels
    value: "node-role=control-plane"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
clusterName: production
kubernetesVersion: v1.35.0
controlPlaneEndpoint: "k8s-api.example.com:6443"
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
  dnsDomain: cluster.local
etcd:
  local:
    dataDir: /var/lib/etcd
    extraArgs:
    - name: auto-compaction-retention
      value: "8"
    - name: quota-backend-bytes
      value: "8589934592"  # 8Gi
apiServer:
  certSANs:
  - k8s-api.example.com
  - 192.168.1.10
  - 192.168.1.11
  - 127.0.0.1
  extraArgs:
  - name: audit-log-path
    value: /var/log/kubernetes/audit.log
  - name: audit-policy-file
    value: /etc/kubernetes/audit-policy.yaml
  - name: encryption-provider-config
    value: /etc/kubernetes/encryption.yaml
  - name: enable-admission-plugins
    value: NodeRestriction,PodSecurity
  extraVolumes:
  - name: audit-logs
    hostPath: /var/log/kubernetes
    mountPath: /var/log/kubernetes
controllerManager:
  extraArgs:
  - name: bind-address
    value: "0.0.0.0"
scheduler:
  extraArgs:
  - name: bind-address
    value: "0.0.0.0"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
containerLogMaxSize: 100Mi
containerLogMaxFiles: 5
maxPods: 110
kubeReserved:
  cpu: 200m
  memory: 500Mi
systemReserved:
  cpu: 200m
  memory: 500Mi
evictionHard:
  memory.available: "300Mi"
  nodefs.available: "10%"

# Run: kubeadm init --config kubeadm-config.yaml --upload-certs

k3s — Lightweight Kubernetes

// k3s is a CNCF-certified, fully conformant Kubernetes distribution packaged as a single binary. Ideal for edge computing, IoT, Raspberry Pi, and resource-constrained environments.

k3s Setup — Server + Agents · k3s
Full k3s cluster setup with a server (control plane) node and multiple agent (worker) nodes.
# ── INSTALL k3s SERVER (Control Plane) ───────────────────────
# Single command install — runs as systemd service
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san k3s.example.com \
  --disable traefik \
  --disable servicelb \
  --flannel-backend=none \
  --write-kubeconfig-mode 644
# note: flannel is disabled above; install a CNI such as Calico separately

# Get node token (needed for agents to join)
cat /var/lib/rancher/k3s/server/node-token

# Get kubeconfig
cat /etc/rancher/k3s/k3s.yaml

# ── JOIN AGENT NODES ─────────────────────────────────────────
# Run on each worker node
curl -sfL https://get.k3s.io | K3S_URL=https://k3s.example.com:6443 \
  K3S_TOKEN=<node-token> sh -

# ── HA k3s WITH EMBEDDED etcd ────────────────────────────────
# First server (bootstraps etcd)
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --token my-shared-secret

# Additional control plane servers join the cluster
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://first-server:6443 \
  --token my-shared-secret

# ── k3s MANAGEMENT ───────────────────────────────────────────
kubectl get nodes                   # k3s bundles kubectl
systemctl status k3s                # service status
systemctl restart k3s               # restart server
k3s kubectl get pods -A             # alternative kubectl

# ── UNINSTALL ────────────────────────────────────────────────
/usr/local/bin/k3s-uninstall.sh     # server
/usr/local/bin/k3s-agent-uninstall.sh  # agent

16 / Certificates & PKI
📖 Official Docs: Certificates & PKI Managing TLS CSR Reference PKI Best Practices

Kubernetes PKI & Certificate Management

// Kubernetes uses TLS everywhere for secure communication between all components. Understanding the PKI is essential for troubleshooting, rotating certs, and securing clusters.

🏛
Kubernetes PKI
Certificate Authority Hierarchy
kubeadm creates a PKI under /etc/kubernetes/pki/. Three CAs: the Kubernetes cluster CA (signs component certs), the etcd CA (signs etcd server and peer certs), and the front-proxy CA (for API aggregation). Each CA signs specific leaf certificates. Self-signed by default — use your own CA for enterprise.
CA Hierarchy · /etc/kubernetes/pki
📜
Certificate Files
What Gets Created
Cluster CA: ca.crt / ca.key

API Server: apiserver.crt (SANs: hostname, IPs, DNS), apiserver-kubelet-client.crt, apiserver-etcd-client.crt

etcd: etcd/ca.crt, etcd/server.crt, etcd/peer.crt, etcd/healthcheck-client.crt

Front Proxy: front-proxy-ca.crt, front-proxy-client.crt

SA Keys: sa.key / sa.pub (service account token signing)
1 year expiry · Auto-renewed on upgrade
Certificate Expiry
Rotation & Renewal
kubeadm-issued certs expire after 1 year (CA: 10 years). kubeadm upgrade auto-renews certs. Manual renewal: kubeadm certs renew all. Check expiry: kubeadm certs check-expiration. Kubelet client certs auto-rotate when kubelet --rotate-certificates=true (default). Set up monitoring for cert expiry.
1yr Expiry · Auto-Rotate · Monitor Expiry
🔐
CertificateSigningRequest
Kubernetes CSR API
Kubernetes has a built-in CSR API for issuing certificates. Users/services submit CSR objects, admins approve them (kubectl certificate approve), and Kubernetes signs them with the cluster CA. Used for: adding new users, kubelet bootstrap, custom components needing cluster-trusted certs.
CSR API · kubectl certificate
🤖
cert-manager
Automated Certificate Lifecycle
The de-facto certificate controller for Kubernetes. Issues and renews certs from: Let's Encrypt (ACME), HashiCorp Vault, Venafi, self-signed, or cluster CA issuers. Stores certs as Kubernetes Secrets. Automatically renews before expiry. Used for Ingress TLS, mTLS between services, webhook server certs.
Let's Encrypt · Vault · Auto-Renew
🔑
kubeconfig & User Auth
Client Certificate Auth
kubeconfig contains: cluster CA cert, client cert, client key (or token). Create new user cert: generate key → create CSR → submit K8s CSR → approve → download cert → add to kubeconfig. CN becomes username, O becomes group. Bind to RBAC roles using the username/group.
CN=username · O=group · kubeconfig
Certificate Operations — Full Reference · Certificates
All essential certificate management commands: inspection, renewal, rotation, and user creation.
# ── CHECK CERTIFICATE EXPIRY ─────────────────────────────────
kubeadm certs check-expiration

# Manual check with openssl
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A2 "Subject Alternative"

# Check all certs in /etc/kubernetes/pki
for cert in /etc/kubernetes/pki/*.crt; do
  echo "=== $cert ==="
  openssl x509 -in "$cert" -noout -subject -dates 2>/dev/null
done

# ── RENEW ALL CERTIFICATES ───────────────────────────────────
# Renew all control plane certs (run on control-plane node)
kubeadm certs renew all

# Renew specific cert
kubeadm certs renew apiserver
kubeadm certs renew apiserver-kubelet-client
kubeadm certs renew front-proxy-client

# After renewal: restart control plane components
kubectl -n kube-system delete pod -l component=kube-apiserver
kubectl -n kube-system delete pod -l component=kube-controller-manager
kubectl -n kube-system delete pod -l component=kube-scheduler

# Refresh local kubeconfig after renewal (admin.conf is renewed too)
cp /etc/kubernetes/admin.conf ~/.kube/config

# ── CREATE A NEW USER WITH CERT AUTH ─────────────────────────
# Step 1: Generate user private key
openssl genrsa -out alice.key 4096

# Step 2: Create CSR (CN=username, O=group)
openssl req -new -key alice.key \
  -subj "/CN=alice/O=team-alpha" \
  -out alice.csr

# Step 3: Submit as Kubernetes CSR object
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: alice
spec:
  request: $(cat alice.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # 24h
  usages:
  - client auth
EOF

# Step 4: Approve the CSR
kubectl certificate approve alice

# Step 5: Download signed cert
kubectl get csr alice -o jsonpath='{.status.certificate}' | \
  base64 -d > alice.crt

# Step 6: Add to kubeconfig
kubectl config set-credentials alice \
  --client-certificate=alice.crt \
  --client-key=alice.key \
  --embed-certs=true

kubectl config set-context alice-context \
  --cluster=my-cluster \
  --user=alice \
  --namespace=team-alpha
cert-manager — Issuer & Certificate · cert-manager
cert-manager ClusterIssuer with Let's Encrypt + Certificate resource for automatic TLS.
# Install cert-manager
# kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# ── ClusterIssuer: Let's Encrypt Production ───────────────────
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:      # HTTP-01 challenge via Ingress
        ingress:
          ingressClassName: nginx
    - dns01:       # DNS-01 for wildcard certs
        route53:
          region: us-east-1
          hostedZoneID: YOURZONEID
---
# ── Internal CA Issuer ────────────────────────────────────────
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-key-pair
---
# ── Certificate Resource ──────────────────────────────────────
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls
  duration: 2160h    # 90 days
  renewBefore: 360h  # renew 15 days before expiry
  subject:
    organizations: [MyCompany]
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages: [server auth, client auth]
  dnsNames:
  - app.example.com
  - www.app.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io

17 / Helm
📖 Official Docs: Helm Docs Chart Templates Charts Hooks Artifact Hub Kustomize

Helm — Kubernetes Package Manager

// Helm is the package manager for Kubernetes. Charts are packages of pre-configured Kubernetes resources. Releases track deployed instances. Repositories store and share charts.

📦
Core Concepts
Charts, Releases, Repos
Chart: Package of K8s resource templates + default values. Release: A running instance of a chart in the cluster. Repo: HTTP server hosting chart packages. Values: Configuration injected at install time. Revision: Versioned history of a release for rollbacks.
Charts · Releases · Repositories
📁
Chart Structure
Directory Layout
Chart.yaml: metadata (name, version, appVersion, description, dependencies). values.yaml: default config values. templates/: Go template files for K8s manifests. templates/NOTES.txt: post-install instructions. charts/: sub-chart dependencies. .helmignore: files to exclude from packaging.
Chart.yaml · values.yaml · templates/
🔧
Templating Engine
Go Templates + Sprig
Templates use Go templating with Sprig function library. Built-in objects: .Values (from values.yaml), .Release (name, namespace, revision), .Chart (metadata), .Files (non-template files), .Capabilities (API versions). Named templates via define/include. Hooks: pre-install, post-install, pre-upgrade, pre-delete.
Go Templates · Sprig · Hooks
📊
Values Override
Configuration Layers
Values precedence (lowest → highest): chart defaults (values.yaml) → parent chart values → user values file (-f values.yaml) → --set flags. Use values files per environment (values-prod.yaml, values-staging.yaml). --set-string for string type enforcement. --set-file for file contents. Secrets: use helm-secrets plugin.
-f values.yaml · --set · helm-secrets
🔄
Release Lifecycle
Install, Upgrade, Rollback
helm install: deploy a chart. helm upgrade: update a release (--install for upsert). helm rollback: revert to previous revision. helm uninstall: remove release + resources. helm history: view revision history. helm status: current state of release. Atomic installs: --atomic rolls back on failure.
upgrade --install · rollback · --atomic
🧪
Testing & Linting
Chart Quality
helm lint: validate chart for errors. helm template: render templates locally without installing. helm test: run test pods (annotated with helm.sh/hook: test). helm diff plugin: show what would change before upgrade. ct (chart-testing) for CI validation. unittest Helm plugin for TDD of templates.
helm lint · helm template · helm diff
Helm CLI — Complete Reference · Helm Commands
Essential Helm commands for managing charts, releases, repositories, and debugging.
# ── INSTALLATION ─────────────────────────────────────────────
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
brew install helm   # macOS
helm version

# ── REPOSITORIES ─────────────────────────────────────────────
helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update                          # fetch latest chart versions
helm repo list                            # list configured repos
helm repo remove bitnami                  # remove repo
helm search repo nginx                    # search in repos
helm search hub wordpress                 # search artifact hub

# ── INSTALL / DEPLOY ─────────────────────────────────────────
helm install my-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --version 4.10.0 \
  -f values-prod.yaml \
  --set controller.replicaCount=2

# Upsert (install or upgrade if exists)
helm upgrade --install my-app ./my-chart \
  --namespace production \
  --create-namespace \
  --atomic \
  --timeout 5m \
  -f values.yaml \
  --set image.tag=v1.2.3

# ── INSPECT BEFORE INSTALLING ────────────────────────────────
helm show chart bitnami/postgresql        # chart metadata
helm show values bitnami/postgresql       # all default values
helm template my-release ./my-chart \
  -f values.yaml > rendered.yaml         # render locally
helm install my-release ./my-chart --dry-run --debug

# ── MANAGE RELEASES ──────────────────────────────────────────
helm list -A                              # all releases, all namespaces
helm status my-app -n production          # release status
helm history my-app -n production         # revision history
helm rollback my-app 2 -n production      # rollback to revision 2
helm uninstall my-app -n production       # remove release
helm get values my-app -n production      # get user-supplied values
helm get manifest my-app -n production    # get rendered manifests

# ── CHART DEVELOPMENT ────────────────────────────────────────
helm create my-chart                      # scaffold new chart
helm lint ./my-chart                      # validate chart
helm package ./my-chart                   # create .tgz package
helm push my-chart-1.0.0.tgz oci://registry.example.com/charts  # push to OCI

# ── PLUGINS ──────────────────────────────────────────────────
helm plugin install https://github.com/databus23/helm-diff
helm diff upgrade my-app ./my-chart -f values.yaml  # show diff before upgrade
helm plugin install https://github.com/jkroepke/helm-secrets
helm secrets upgrade my-app ./my-chart -f secrets.enc.yaml
Chart.yaml + values.yaml · Chart Files
Chart metadata and default values file for a web application chart with dependency management.
# Chart.yaml
apiVersion: v2
name: web-app
description: A Helm chart for web-app microservice
type: application
version: 1.4.2       # chart version (semver)
appVersion: "2.1.0"  # app version (informational)
keywords: [web, api, microservice]
maintainers:
- name: Platform Team
  email: [email protected]
dependencies:
- name: postgresql
  version: "~14.x.x"
  repository: https://charts.bitnami.com/bitnami
  condition: postgresql.enabled
- name: redis
  version: "~18.x.x"
  repository: https://charts.bitnami.com/bitnami
  condition: redis.enabled
---
# values.yaml
replicaCount: 2

image:
  repository: registry.example.com/web-app
  pullPolicy: IfNotPresent
  tag: ""  # overridden by CI with .Chart.AppVersion

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  host: app.example.com
  tls: true

resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits:   { cpu: 500m, memory: 256Mi }

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

postgresql:
  enabled: true
  auth:
    database: mydb
    existingSecret: postgres-secret
Helm Template — Deployment · Go Template
A production-quality Helm template for a Deployment using values, conditionals, helpers, and hooks.
{{/* templates/deployment.yaml */}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "web-app.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "web-app.labels" . | nindent 4 }}
  annotations:
    app.kubernetes.io/managed-by: {{ .Release.Service }}
    helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "web-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "web-app.selectorLabels" . | nindent 8 }}
      annotations:
        {{/* Force pod restart when configmap changes */}}
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - containerPort: {{ .Values.service.targetPort }}
        {{- with .Values.resources }}
        resources:
          {{- toYaml . | nindent 10 }}
        {{- end }}
        {{- if .Values.envFrom }}
        envFrom:
          {{- toYaml .Values.envFrom | nindent 10 }}
        {{- end }}
---
{{/* templates/_helpers.tpl */}}
{{- define "web-app.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "web-app.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
{{ include "web-app.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "web-app.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

18 / Expert Level
📖 Official Docs: API Priority & Fairness Server-Side Apply Mixed-Version Proxy Node Resource Managers Kubelet Checkpoint Kubernetes Blog

Expert Kubernetes Patterns

// Advanced topics for senior engineers and platform teams: performance tuning, GitOps, multi-cluster, eBPF networking, cost optimization, and production hardening.

🔥
GitOps
Declarative CD with ArgoCD / Flux
GitOps: Git is the single source of truth for cluster state. A GitOps operator continuously reconciles cluster state to match git. ArgoCD: pull-based, rich UI, multi-cluster, App of Apps pattern. Flux v2: GitOps Toolkit, Kustomization + HelmRelease CRDs, image automation. Both support progressive delivery (Argo Rollouts, Flagger).
ArgoCD · Flux v2 · Reconciliation
🌍
Multi-Cluster
Federation & Management
Patterns: Hub-spoke (one management cluster controls workload clusters), fleet management (Cluster API, Rancher Fleet, ArgoCD ApplicationSets). Service mesh federation (Istio multicluster, Cilium Cluster Mesh) for cross-cluster communication. KubeVela, Liqo for workload placement. Submariner for cross-cluster networking.
Cluster API · Cilium Mesh · Fleet
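As a sketch of pull-based fleet management, an ArgoCD ApplicationSet with the cluster generator stamps out one Application per cluster registered in ArgoCD (the repo URL and path are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-app-fleet
  namespace: argocd
spec:
  generators:
  - clusters: {}                 # one Application per registered cluster
  template:
    metadata:
      name: 'web-app-{{name}}'   # cluster name substituted by the generator
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/k8s-gitops
        targetRevision: HEAD
        path: apps/web-app
      destination:
        server: '{{server}}'     # cluster API endpoint from the generator
        namespace: production
      syncPolicy:
        automated: {prune: true, selfHeal: true}
```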
eBPF Networking
Cilium & Kernel-Level Observability
Cilium uses eBPF to implement networking, security, and observability directly in the Linux kernel — bypassing iptables entirely. Enables: L7 NetworkPolicies (HTTP, gRPC, Kafka-aware), transparent encryption (WireGuard/IPSec), Hubble for network flow visibility, per-call latency metrics, DNS security policies.
eBPF · No iptables · Hubble
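To make the L7 capability concrete, here is a sketch of a CiliumNetworkPolicy that allows only HTTP GETs on /orders from the frontend (labels, port, and path are illustrative):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: orders-l7-allow
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: orders-api        # policy applies to these pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend        # only the frontend may connect
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:                # L7 filtering, enforced in the datapath
        - method: GET
          path: "/orders/.*"
```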
💰
Cost Optimization
FinOps for Kubernetes
Rightsizing: VPA recommendations, Goldilocks tool. Spot/Preemptible nodes with node pools + PodDisruptionBudgets. Karpenter (AWS) / Cluster Autoscaler for dynamic node provisioning. OpenCost / Kubecost for cost visibility per namespace/team. Bin-packing with balanced resource allocation. Pod topology constraints to avoid cross-AZ traffic costs.
Karpenter · Kubecost · Spot Nodes
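A sketch of a spot-only node pool, assuming Karpenter's v1 API on AWS (the EC2NodeClass name and CPU limit are placeholders):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]           # spot capacity only
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumed pre-existing EC2NodeClass
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # bin-pack aggressively
  limits:
    cpu: "200"                     # cap total provisioned CPU
```

Pair spot pools with PodDisruptionBudgets so consolidation and spot reclaims stay safe.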
🚀
Progressive Delivery
Canary, Blue/Green, A/B
Argo Rollouts: CRD-based Deployment replacement with canary, blue/green, analysis runs (automated rollback if metrics degrade). Flagger: Istio/Linkerd-based canary automation. Gateway API traffic splitting for canary without service mesh. Feature flags (LaunchDarkly, Flagd) for application-level A/B testing independent of deployments.
Argo Rollouts · Flagger · Analysis Runs
📐
Kustomize
Template-Free Config Management
Kustomize uses overlays and patches instead of templating. Base: common resources. Overlays: environment-specific patches (dev, staging, prod). Strategic merge patches, JSON patches, image transformers, namespace transformers. Built into kubectl (kubectl apply -k). Pairs well with Flux and ArgoCD for GitOps.
Overlays · Patches · kubectl -k
🛡
Supply Chain Security
SLSA & Sigstore
SBOM (Software Bill of Materials) generation with Syft. Image signing with Cosign (keyless via Sigstore/Fulcio). Policy enforcement with Kyverno verifyImages or Connaisseur. SLSA framework for build provenance attestations. In-toto for supply chain integrity. Rekor transparency log for tamper-evident signing events.
Cosign · SBOM · SLSA
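As a sketch of admission-time enforcement, a Kyverno ClusterPolicy requiring keyless Cosign signatures on images from a private registry (the registry, subject, and issuer values are placeholders):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce   # block unsigned images
  webhookTimeoutSeconds: 30
  rules:
  - name: require-cosign-signature
    match:
      any:
      - resources:
          kinds: [Pod]
    verifyImages:
    - imageReferences:
      - "registry.example.com/*"
      attestors:
      - entries:
        - keyless:                   # Sigstore keyless (Fulcio) identity
            subject: "https://github.com/myorg/*"
            issuer: "https://token.actions.githubusercontent.com"
```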
⚙️
Performance Tuning
Large-Scale Optimization
API server: increase --max-requests-inflight, tune --event-ttl. etcd: SSD storage, tune heartbeat-interval, snapshot-count. Scheduler: tune qps/burst. Kubelet: reduce --sync-frequency, tune image GC. Use node local DNS cache to reduce CoreDNS load. Watch vs List for controllers. Large clusters (>1000 nodes): use Kube-OVN or Calico with BGP.
API Server Tuning · etcd SSD · DNS Cache
🔒
Zero-Trust Security
Defence in Depth
Workload Identity: SPIFFE/SPIRE for cryptographic workload identity. mTLS everywhere with Istio/Cilium. OPA/Gatekeeper for policy-as-code. Seccomp profiles (RuntimeDefault or custom). AppArmor/SELinux profiles. Falco for runtime threat detection (syscall monitoring). Encrypted etcd at rest. Regular CIS Benchmark scanning with kube-bench.
SPIFFE/SPIRE · Falco · kube-bench
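Several of these controls land directly in the Pod spec. A minimal hardened Pod sketch (the image name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault       # default syscall filter
  containers:
  - name: app
    image: registry.example.com/app:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]            # start from zero capabilities
```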
🤝
KEDA
Event-Driven Autoscaling
Kubernetes Event-Driven Autoscaling — extends HPA to scale on external events. 60+ built-in scalers: Kafka lag, RabbitMQ queue depth, Redis lists, AWS SQS/Kinesis, GCP Pub/Sub, Azure Service Bus, Prometheus metrics, cron schedules. Scales to zero, which plain HPA cannot do. ScaledObject and ScaledJob CRDs.
Scale to Zero · Kafka · 60+ Scalers
🧱
Platform Engineering
Internal Developer Platform
Building an IDP on top of Kubernetes: Backstage (service catalog, software templates), Crossplane (infrastructure as K8s CRDs — provision cloud resources), Port.io (developer portal), Kratix (platform promises), vcluster for on-demand dev namespaces. Golden paths: self-service templates that encode best practices.
Backstage · Crossplane · IDP
🔁
Chaos Engineering
Resilience Testing
Chaos Mesh: CRD-based chaos experiments — PodChaos (kill pods), NetworkChaos (delay/loss/partition), StressChaos (CPU/memory), IOChaos (disk failures), TimeChaos. LitmusChaos: CNCF project with experiment hub. k6: load testing from within cluster. Always run chaos in staging first; gate production chaos behind GameDays.
Chaos Mesh · LitmusChaos · Resilience
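A minimal Chaos Mesh experiment sketch: kill one random pod of an app in staging (namespace and labels are illustrative):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-one-web-pod
  namespace: chaos-testing
spec:
  action: pod-kill       # terminate the selected pod
  mode: one              # pick exactly one matching pod at random
  selector:
    namespaces: [staging]
    labelSelectors:
      app: web-app
```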
ArgoCD Application · GitOps
ArgoCD Application resource deploying from a Helm chart in git with auto-sync and self-healing.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-production
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/k8s-gitops
    targetRevision: HEAD
    path: apps/web-app
    helm:
      valueFiles:
      - values-prod.yaml
      parameters:
      - name: image.tag
        value: v2.1.0
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from git
      selfHeal: true  # revert manual cluster changes
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - ApplyOutOfSyncOnly=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m
        factor: 2
  revisionHistoryLimit: 10
Argo Rollouts — Canary · Progressive Delivery
Argo Rollouts canary strategy with automated analysis — rolls back if error rate exceeds threshold.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app-rollout
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: myapp:v2.0.0
  strategy:
    canary:
      canaryService: web-app-canary
      stableService: web-app-stable
      trafficRouting:
        nginx:
          stableIngress: web-app-ingress
      steps:
      - setWeight: 5    # 5% traffic to canary
      - pause: {duration: 5m}
      - analysis:      # run automated analysis
          templates:
          - templateName: success-rate
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 1m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{status!~"5.."}[5m]))
          / sum(rate(http_requests_total[5m]))
KEDA ScaledObject · Event-Driven Autoscaling
KEDA ScaledObject scaling a consumer Deployment based on Kafka consumer group lag.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0  # scale to zero!
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-consumer-group
      topic: orders
      lagThreshold: "100"  # 1 replica per 100 messages lag
      offsetResetPolicy: latest
---
# ScaledJob: scale Jobs (not Deployments) for batch processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: sqs-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: myprocessor:latest
        restartPolicy: Never
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123/my-queue
      targetQueueLength: "1"
      awsRegion: us-east-1
Crossplane — Cloud Infrastructure · Platform Engineering
Crossplane Composite Resource Definition — provision an AWS RDS instance as a Kubernetes resource.
# Crossplane lets you provision cloud infra as K8s resources
# Developers request infrastructure via K8s objects

apiVersion: database.example.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-db
  namespace: production
spec:
  parameters:
    storageGB: 20
    size: db.t3.medium
    version: "15"
  compositionSelector:
    matchLabels:
      provider: aws
      env: production
  writeConnectionSecretToRef:
    name: my-db-conn   # K8s Secret with DB connection string
---
# Kustomize overlay structure example
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- configmap.yaml
---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
namePrefix: prod-
resources:
- ../../base
patches:
- patch: |
    - op: replace
      path: /spec/replicas
      value: 5
  target:
    kind: Deployment
images:
- name: myapp
  newTag: v2.1.0
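Given the base/overlay layout above, the overlay can be rendered and applied with kubectl's built-in Kustomize support (directory paths follow the example structure):

```shell
# Render the production overlay to stdout without touching the cluster
kubectl kustomize overlays/production

# Preview the change against live state, then apply
# (-k points at a directory containing kustomization.yaml)
kubectl diff -k overlays/production
kubectl apply -k overlays/production
```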

19 / Service Mesh
📖 Official Docs: Services & Networking Istio Docs Linkerd Docs Cilium Service Mesh Gateway API SIG Istio Security Istio Traffic Mgmt

Service Mesh — Deep Dive

// A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It provides traffic management, security (mTLS), and observability — without changing application code.

// service mesh architecture — data plane vs control plane
⚙ Control Plane (Istiod)
Pilot — service discovery & traffic rules
Citadel — certificate authority (mTLS)
Galley — config validation & distribution
Telemetry — metrics, traces, logs
Webhook — sidecar auto-injection
⬡ Data Plane (Envoy Sidecars)
Envoy Proxy — intercepts all traffic
Inbound listener — incoming requests
Outbound listener — egress routing
mTLS termination & origination
Metrics, traces, access logs
// request flow through mesh
📦
Pod A App
🔀
Sidecar (Envoy) Outbound
🔐
mTLS Encrypt
🔐
mTLS Decrypt
🔀
Sidecar (Envoy) Inbound
📦
Pod B App

Core Concepts

🔐
Mutual TLS (mTLS)
Zero-Trust Service Identity
Both client and server authenticate each other using X.509 certificates. Certificates issued automatically by the mesh CA (Citadel/Istiod). Modes: DISABLE, PERMISSIVE (accepts both TLS and plaintext — migration mode), STRICT (TLS only). Identity based on SPIFFE/SPIRE — URI SANs tied to ServiceAccount. Eliminates need for application-level auth between services.
STRICT mode SPIFFE Auto Cert Rotation PERMISSIVE
🔀
Traffic Management
Fine-Grained Routing Control
Route traffic based on: HTTP headers, URI, method, weight (canary), source labels. VirtualService defines routing rules. DestinationRule defines subsets (versions) and load balancing policy. Timeout and retry configuration at the mesh level — no code changes. Supports A/B testing, canary deployments, blue/green deployments, mirroring (shadow traffic).
VirtualService DestinationRule Canary Mirroring
🛡
Resilience Patterns
Circuit Breaking & Retries
Retries: Automatic retry on 5xx errors, configurable attempts and retry conditions. Timeout: Per-route request timeout. Circuit Breaker: Outlier detection ejects unhealthy hosts from load balancing pool. Bulkhead: Connection pool limits prevent cascade failures. All configured in DestinationRule without code changes.
Circuit Breaker Retries Timeout Outlier Detection
📊
Observability
Automatic Telemetry
Every sidecar automatically emits: L7 metrics (request rate, error rate, latency percentiles — the Golden Signals), distributed traces (Zipkin/Jaeger compatible), and access logs. No application instrumentation needed. Kiali — service mesh topology UI. Grafana dashboards included in istio addons. Trace context propagated via B3 or W3C headers.
Golden Signals Kiali Auto Tracing Zero Instrumentation
🌐
Ingress & Egress Gateways
Mesh Edge Traffic
IngressGateway: Entry point for external traffic into the mesh — replaces traditional Ingress. Terminates TLS, enforces policies before traffic enters mesh. EgressGateway: Controls all outbound traffic leaving the mesh to external services. Enforce policy: which services can access external APIs. Register external services with ServiceEntry.
Gateway ServiceEntry Egress Control
🔑
Authorization Policies
L7 Access Control
Fine-grained L4-L7 access control based on service identity, request properties. Define: which source (from), what target (to), which conditions (when). Example: allow only frontend to call backend on /api/* with GET. Works alongside Kubernetes RBAC (different layer). PeerAuthentication for mTLS mode per namespace/workload.
AuthorizationPolicy PeerAuthentication L7 ACL
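Assuming Istio's standard Prometheus metrics (istio_requests_total and istio_request_duration_milliseconds, which every sidecar emits by default) and a placeholder workload name web-app, the Golden Signals from the Observability card reduce to three queries:

```promql
# Traffic: request rate to web-app
sum(rate(istio_requests_total{destination_workload="web-app"}[5m]))

# Errors: share of 5xx responses
sum(rate(istio_requests_total{destination_workload="web-app",response_code=~"5.."}[5m]))
/ sum(rate(istio_requests_total{destination_workload="web-app"}[5m]))

# Latency: p99 from the duration histogram
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="web-app"}[5m])) by (le))
```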

Service Mesh Comparison

// Major service mesh implementations — choose based on complexity tolerance, performance needs, and feature requirements.

Mesh Data Plane Architecture Strengths Trade-offs Best For
Istio Envoy Sidecar + Istiod control plane Most features, rich traffic mgmt, large community, Gateway API support High resource overhead, complexity, steep learning curve Large enterprises needing full feature set
Linkerd Rust proxy Sidecar + linkerd-control-plane Ultra-lightweight, simple install, excellent performance, CNCF graduated Fewer advanced features than Istio, no Envoy ecosystem Teams wanting simplicity and low overhead
Cilium eBPF No sidecar — kernel-level eBPF Zero sidecar overhead, highest performance, L3-L7, NetworkPolicy, Gateway API Requires Linux kernel ≥5.10, newer project Performance-critical, CNI + mesh in one
Consul Connect Envoy Sidecar + Consul server Multi-platform (VMs + K8s), HashiCorp ecosystem, service catalog Requires Consul cluster, more ops burden Hybrid cloud / VM + Kubernetes environments
AWS App Mesh Envoy Sidecar + AWS managed CP Native AWS integration, managed control plane, no CP ops AWS-only, less flexible, fewer features AWS-native teams wanting managed option
Kuma / Kong Envoy Sidecar or sidecarless Multi-zone mesh, universal (K8s + VMs), Kong ecosystem integration Smaller community than Istio Multi-zone deployments, Kong API Gateway users

Feature Deep Dive

🔄
Load Balancing Algorithms
Mesh-level LB goes beyond kube-proxy's basic random/round-robin backend selection:
  • ROUND_ROBIN — default, even distribution
  • LEAST_CONN — route to the host with the fewest active connections
  • RANDOM — random endpoint selection
  • PASSTHROUGH — forward to original destination
  • Consistent hash — sticky sessions (cookie, header, source IP)
  • Locality-aware — prefer same zone/region
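A minimal DestinationRule fragment sketching the locality-aware bullet, assuming Istio's localityLbSetting API (region names are placeholders; locality failover only activates when outlier detection is also configured):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: web-app-locality
spec:
  host: web-app
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
      localityLbSetting:
        enabled: true
        failover:            # prefer local zone; on failure, shift east → west
        - from: us-east-1
          to: us-west-2
    outlierDetection:        # required for locality failover to take effect
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```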
🕶
Traffic Mirroring (Shadow)
Send a copy of live traffic to a shadow service for testing:
  • 100% of live requests duplicated to shadow
  • Shadow responses are discarded (fire-and-forget)
  • Test new versions with real traffic — no user impact
  • Validate performance, correctness, error rates
  • Configured via VirtualService mirror field
  • mirrorPercentage to control mirrored volume
🎚
Canary & Traffic Splitting
Gradually shift traffic between service versions:
  • Weight-based: 90% v1 → 10% v2
  • Header-based: route beta users to v2
  • Progressive delivery with Flagger + Argo Rollouts
  • Automated rollback on metric degradation
  • A/B testing by user segment
  • Blue/green with instant cutover
🔍
Fault Injection
Deliberately inject failures to test resilience:
  • HTTP delay injection (artificial latency)
  • HTTP abort injection (return 5xx errors)
  • Test circuit breaker behavior
  • Chaos engineering without Chaos tools
  • Configured per-route in VirtualService
  • Percentage-based — affect % of requests
🏷
Service Entry
Register external services inside the mesh:
  • Add external APIs (AWS S3, Stripe, etc.) to mesh
  • Apply mTLS, retries, timeouts to external calls
  • Block all external traffic — allowlist via ServiceEntry
  • MESH_INTERNAL vs MESH_EXTERNAL location
  • Supports TCP, HTTP, HTTPS, gRPC protocols
  • Enable Egress Gateway enforcement
🏗
Sidecar Resource
Limit sidecar proxy scope to reduce memory and CPU:
  • By default Envoy holds config for ALL services
  • Sidecar resource scopes proxy to only needed services
  • Reduces config push size — faster convergence
  • Define egress hosts the workload actually calls
  • Critical for large clusters (1000+ services)
  • Namespace-scoped or workload-specific

Istio YAML Examples

VirtualService — Traffic Splitting Traffic Management
Route 90% traffic to stable v1, 10% to canary v2, with header-based override for testers.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web-app-vs
  namespace: production
spec:
  hosts:
  - web-app
  http:
  - # Testers with X-Canary header go to v2
    match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: web-app
        subset: v2
  - # Everyone else: 90/10 split
    route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
      retryOn: gateway-error,connect-failure,retriable-4xx
    mirror:
      host: web-app-shadow
      subset: v2
    mirrorPercentage:
      value: 10.0  # Mirror 10% to shadow
DestinationRule — Circuit Breaker Resilience
Define subsets (v1/v2) and circuit breaker with connection pool limits and outlier detection.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: web-app-dr
  namespace: production
spec:
  host: web-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:  # Circuit breaker
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    loadBalancer:
      simple: LEAST_CONN
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        consistentHash:
          httpHeaderName: x-user-id  # Sticky by user
PeerAuthentication & AuthorizationPolicy Security
Enforce strict mTLS for a namespace and define L7 access control — only frontend can call backend.
# Enforce STRICT mTLS for entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Allow frontend → backend on /api/* (GET and POST only)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend-sa
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
---
# Deny all other traffic to backend (default deny)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-deny-all
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: DENY
  rules:
  - from:
    - source:
        notPrincipals:
        - cluster.local/ns/production/sa/frontend-sa
Istio Gateway + ServiceEntry Edge Traffic
Ingress Gateway for external traffic entry and ServiceEntry to allow egress to an external API.
# Istio Ingress Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: app-tls-cert
    hosts:
    - app.example.com
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true  # Force HTTPS
    hosts:
    - app.example.com
---
# Allow egress to external Stripe API
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: stripe-api
  namespace: production
spec:
  hosts:
  - api.stripe.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
  location: MESH_EXTERNAL
Fault Injection Testing Chaos / Resilience
Inject latency and HTTP errors into requests to test application resilience and circuit breaker behavior.
# Inject 3s delay for 10% of requests to ratings service
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-fault-injection
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10.0
        fixedDelay: 3s
      abort:
        percentage:
          value: 5.0
        httpStatus: 503
    route:
    - destination:
        host: ratings
        subset: v1
---
# Sidecar resource scoping for large clusters
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: web-app-sidecar
  namespace: production
spec:
  workloadSelector:
    labels:
      app: web-app
  egress:
  - hosts:
    - ./backend      # same namespace
    - ./postgres
    - istio-system/*
    - monitoring/prometheus
    # Only these — Envoy won't load other services
Linkerd — Annotations & Profile Linkerd
Linkerd mesh injection via annotations and ServiceProfile for per-route metrics and retries.
# Enable Linkerd injection for a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled
---
# Per-pod injection control
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/proxy-cpu-request: "10m"
        config.linkerd.io/proxy-memory-request: "20Mi"
---
# ServiceProfile for per-route observability & retries
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: web-app.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /api/users
    condition:
      method: GET
      pathRegex: /api/users.*
    isRetryable: true
    timeout: 30s
  - name: POST /api/orders
    condition:
      method: POST
      pathRegex: /api/orders
    isRetryable: false  # Non-idempotent
    timeout: 60s

When to Use a Service Mesh

Scenario Without Mesh With Mesh Verdict
mTLS between services Manual cert management per service Automatic, zero-config mTLS USE MESH
Canary deployments Needs two services + Ingress hacks VirtualService weight split USE MESH
Distributed tracing Instrument every app with SDK Automatic from sidecar USE MESH
Small cluster (1–5 services) Simple, low overhead Adds sidecar memory plus a few ms of per-hop latency and operational complexity SKIP MESH
Circuit breaking Implement per-service (Hystrix, Resilience4j) One DestinationRule for all USE MESH
Compliance (SOC2, PCI-DSS) Hard to prove in-transit encryption mTLS + audit logs prove it USE MESH
Resource-constrained edge/IoT Direct pod communication Sidecar doubles memory per pod SKIP MESH

20 / Kubernetes Plugins
📖 Official Docs: kubectl Plugins Krew Docs Plugin Index kubectl Reference

kubectl Plugins & Krew Ecosystem

// kubectl plugins extend the CLI with new commands. Krew is the official plugin manager — over 200 community plugins available. Any executable named kubectl-* in your PATH becomes a kubectl subcommand.

// how kubectl plugins work
① Discovery
kubectl scans every directory in your $PATH for executables starting with kubectl-. No registration needed.
② Naming
File kubectl-ns-switch → command kubectl ns switch. Hyphens in filenames become spaces in the command; use an underscore (kubectl-ns_switch) to get a hyphen in the command name (kubectl ns-switch). Must be executable.
③ Language
Write plugins in any language — Bash, Python, Go, Ruby. Go is most common (uses client-go). Shell scripts great for simple wrappers.
④ Krew
Package manager for plugins. Cross-platform. Curated index with 200+ plugins. Auto-update. kubectl krew install <name>
Krew — Install & Basic Usage Plugin Manager
Install Krew, then discover and manage kubectl plugins from the community index.
# ── INSTALL KREW (macOS/Linux) ────────────────────────────────
(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed 's/x86_64/amd64/;s/arm.*/arm/;s/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

# Add to shell profile (.bashrc / .zshrc)
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"

# ── KREW COMMANDS ─────────────────────────────────────────────
kubectl krew version                    # show krew version
kubectl krew update                     # update plugin index
kubectl krew search                     # list all available plugins
kubectl krew search <keyword>           # search plugins by keyword
kubectl krew info <plugin>              # details about a plugin
kubectl krew install <plugin>           # install a plugin
kubectl krew install <p1> <p2> <p3>    # install multiple at once
kubectl krew upgrade                    # upgrade all installed plugins
kubectl krew upgrade <plugin>           # upgrade specific plugin
kubectl krew uninstall <plugin>         # remove a plugin
kubectl krew list                       # list installed plugins

# ── DISCOVER ALL PLUGINS (without krew) ──────────────────────
kubectl plugin list                     # show all plugins in PATH

# ── INSTALL ESSENTIAL PLUGINS IN ONE GO ──────────────────────
kubectl krew install \
  ctx ns stern neat tree \
  who-can access-matrix rbac-view \
  resource-capacity node-shell \
  images outdated \
  konfig view-secret \
  doctor popeye

Essential Plugin Categories

// Context, namespace & cluster navigation

🔀
kubectx / ctx
Fastest way to switch between Kubernetes contexts (clusters). Fuzzy search supported. Works with fzf for interactive selection.
  • kubectl ctx — list all contexts
  • kubectl ctx prod — switch to prod
  • kubectl ctx - — switch to previous
  • kubectl ctx -d old-ctx — delete context
  • Install: krew install ctx
📁
kubens / ns
Instant namespace switching with auto-completion. Sets the default namespace so you don't need -n on every command.
  • kubectl ns — list all namespaces
  • kubectl ns production — switch namespace
  • kubectl ns - — switch to previous ns
  • Pairs perfectly with kubectx
  • Install: krew install ns
🗂
konfig
Merge, split, and import kubeconfig files. Essential when working with many clusters. Safely combines configs without conflicts.
  • kubectl konfig merge a.yaml b.yaml
  • kubectl konfig split — split into files
  • kubectl konfig import --save cfg.yaml
  • Non-destructive — validates before merging
  • Install: krew install konfig

// Logs, debugging & troubleshooting

📜
stern
Multi-pod log tailing with color coding per pod. Regex filtering. Tail logs from all pods matching a pattern simultaneously.
  • stern web-app — tail all pods matching name
  • stern . -n prod — all pods in namespace
  • stern web --since 15m — last 15 minutes
  • stern web -c sidecar — specific container
  • Install: krew install stern
🌳
tree
Show ownership hierarchy of Kubernetes objects. Visualise which resources belong to a Deployment/ReplicaSet/Pod chain.
  • kubectl tree deploy web-app
  • Shows: Deployment → ReplicaSet → Pods
  • Works with any resource (StatefulSet, Job…)
  • Shows owner references as a tree
  • Install: krew install tree
🔬
neat
Remove clutter from kubectl output. Strips managed fields, status, and default values to make YAML readable and reusable.
  • kubectl get pod web -o yaml | kubectl neat
  • Removes: managedFields, creationTimestamp
  • Removes: status, default annotations
  • Perfect for exporting clean manifests
  • Install: krew install neat
🏥
doctor
Cluster health checker — scans for common misconfigurations, missing resources, and best-practice violations.
  • kubectl doctor — full cluster scan
  • Checks: deprecated API versions
  • Checks: pods without resource limits
  • Checks: services without endpoints
  • Install: krew install doctor
💥
node-shell
Open an interactive shell directly on a Kubernetes node — without SSH. Spins up a privileged pod with host PID/network access.
  • kubectl node-shell node1
  • Runs a temporary privileged pod on the target node
  • Full node access: systemd, journalctl, crictl
  • Auto-cleans up the debug pod after exit
  • Install: krew install node-shell
🔎
popeye
Live Kubernetes cluster sanitizer. Scans resources and reports issues with graded severity across your whole cluster or namespace.
  • kubectl popeye — full scan with report
  • kubectl popeye -n production
  • Grades: A (clean) to F (critical issues)
  • Checks: resource limits, probes, RBAC, images
  • Install: krew install popeye

// Security, RBAC & access auditing

🔐
who-can
Show which users and ServiceAccounts can perform specific actions. Essential for RBAC auditing and security reviews.
  • kubectl who-can get pods
  • kubectl who-can delete secrets -n prod
  • kubectl who-can create deployments
  • Shows: users, groups, serviceaccounts
  • Install: krew install who-can
🗝
access-matrix
Show an RBAC access matrix for all resources and verbs for a user or ServiceAccount. Visual permission grid.
  • kubectl access-matrix
  • kubectl access-matrix --sa mysa -n prod
  • Grid: resources × verbs (get/list/create…)
  • Color coded: ✓ allowed, ✗ denied
  • Install: krew install access-matrix
📊
rbac-view
Web UI for visualizing RBAC permissions in your cluster. Launches a local server with an interactive permission explorer.
  • kubectl rbac-view — opens browser UI
  • Visual graph of Role → Subject bindings
  • Filter by namespace, subject, or resource
  • Export reports as HTML
  • Install: krew install rbac-view
🕵
view-secret
Decode and view Kubernetes Secret values directly — no manual base64 decoding required.
  • kubectl view-secret my-secret
  • kubectl view-secret my-secret key
  • Automatically base64 decodes all values
  • Shows all keys in a secret at once
  • Install: krew install view-secret

// Resources, capacity & image management

📈
resource-capacity
Overview of resource requests, limits, and live utilization per node and pod. Better than kubectl top.
  • kubectl resource-capacity — node overview
  • kubectl resource-capacity --pods
  • kubectl resource-capacity --util — live usage
  • Shows: requests, limits, % utilization
  • Install: krew install resource-capacity
🖼
images
List all container images running across your cluster. Filter by namespace, show image digests, or list unique images only.
  • kubectl images — all images cluster-wide
  • kubectl images -n prod
  • kubectl images --no-trunc — full names
  • Great for auditing image versions
  • Install: krew install images
outdated
Scan your cluster for containers running outdated images. Checks Docker Hub and other registries for newer tags.
  • kubectl outdated — full cluster scan
  • Shows: current tag vs latest available
  • Flags: patch / minor / major updates
  • Supports private registries with credentials
  • Install: krew install outdated
🔄
pv-migrate
Migrate PersistentVolumeClaim data between storage classes, namespaces, or clusters. Uses rsync under the hood.
  • kubectl pv-migrate migrate src-pvc dst-pvc
  • Cross-namespace migration supported
  • Cross-cluster migration with kubeconfig
  • Handles live vs stopped workloads
  • Install: krew install pv-migrate

// Networking, certificates & cluster tools

🌐
ingress-nginx
Interact with and inspect NGINX Ingress controller deployments. Debug routing, backends, and configuration.
  • kubectl ingress-nginx backends
  • kubectl ingress-nginx conf --host foo.com
  • kubectl ingress-nginx logs
  • kubectl ingress-nginx exec -- nginx -T
  • Install: krew install ingress-nginx
🔒
cert-manager
Interact with cert-manager resources directly. Trigger renewals, inspect certificate status, debug issuance failures.
  • kubectl cert-manager status certificate tls
  • kubectl cert-manager renew tls-cert
  • kubectl cert-manager inspect secret tls
  • Shows: expiry, issuer, SANs, renewal status
  • Install: krew install cert-manager
🗺
np-viewer
Visualize NetworkPolicy rules in a human-readable format. Explains what traffic is allowed or blocked for selected pods.
  • kubectl np-viewer -n production
  • Shows ingress and egress rules per pod
  • Highlights overlapping / conflicting policies
  • Exports as diagram or text table
  • Install: krew install np-viewer
📋
view-cert
Decode and display TLS certificate details stored in Kubernetes Secrets. Inspect expiry, issuer, and SANs without openssl.
  • kubectl view-cert my-tls-secret
  • Shows: subject, issuer, SANs, expiry date
  • Warns if cert expires soon
  • Works with any kubernetes.io/tls secret
  • Install: krew install view-cert

Writing Your Own Plugin

// Plugins can be simple shell scripts or full Go binaries. Any executable named kubectl-* in your PATH works instantly.

Shell Script Plugin Bash Plugin
A simple bash plugin: kubectl podfull — shows detailed pod info with node, IP, age, and resource usage in one view.
#!/usr/bin/env bash
# File: kubectl-podfull  (chmod +x, place in PATH)
# Usage: kubectl podfull [namespace]

set -euo pipefail

NS="${1:--A}"  # default: all namespaces
FLAG="--all-namespaces"
[ "$NS" != "-A" ] && FLAG="-n $NS"

echo ""
# kubectl prints its own header row for the custom columns below
kubectl get pods $FLAG \
  -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,IP:.status.podIP,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount' \
  --sort-by='.status.phase' 2>/dev/null

echo ""
echo "Resource usage:"
kubectl top pods $FLAG 2>/dev/null || \
  echo "(metrics-server not available)"
Python Plugin — ns-cleanup Python Plugin
Python plugin to find and optionally delete completed/failed pods and evicted pods from a namespace.
#!/usr/bin/env python3
# File: kubectl-ns-cleanup  (chmod +x, place in PATH)
# Usage: kubectl ns-cleanup [namespace] [--delete]

import subprocess, sys, json

ns = sys.argv[1] if len(sys.argv) > 1 else "default"
do_delete = "--delete" in sys.argv

result = subprocess.run(
    ["kubectl", "get", "pods", "-n", ns,
     "-o", "json"],
    capture_output=True, text=True
)
pods = json.loads(result.stdout)["items"]

to_clean = []
for pod in pods:
    phase  = pod["status"].get("phase", "")
    reason = pod["status"].get("reason", "")
    name   = pod["metadata"]["name"]
    # Evicted pods report phase=Failed with reason=Evicted, so one
    # combined check avoids adding the same pod twice
    if phase in ("Succeeded", "Failed") or reason == "Evicted":
        to_clean.append(name)

print(f"Found {len(to_clean)} pods to clean in '{ns}':")
for p in to_clean:
    print(f"  - {p}")

if do_delete and to_clean:
    for p in to_clean:
        subprocess.run(
            ["kubectl", "delete", "pod", p, "-n", ns],
            check=True
        )
    print(f"\n✓ Deleted {len(to_clean)} pods")
Go Plugin — kubectl-whoami Go Plugin
Go-based plugin using client-go to show who you are authenticated as and your permissions summary.
// File: kubectl-whoami/main.go
// Build: go build -o kubectl-whoami && mv to PATH

package main

import (
    "context"
    "fmt"
    "os"

    authv1 "k8s.io/api/authentication/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubeconfig := os.Getenv("KUBECONFIG")
    if kubeconfig == "" {
        kubeconfig = os.Getenv("HOME") + "/.kube/config"
    }

    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil { panic(err) }

    client, err := kubernetes.NewForConfig(config)
    if err != nil { panic(err) }

    // SelfSubjectReview — who am I?
    review, err := client.AuthenticationV1().
        SelfSubjectReviews().
        Create(context.TODO(),
            &authv1.SelfSubjectReview{},
            metav1.CreateOptions{})
    if err != nil { panic(err) }

    fmt.Printf("Username : %s\n",
        review.Status.UserInfo.Username)
    fmt.Printf("Groups   : %v\n",
        review.Status.UserInfo.Groups)
    fmt.Printf("UID      : %s\n",
        review.Status.UserInfo.UID)
}
Plugin Manifest for Krew Krew Distribution
Krew plugin manifest (.yaml) to package and distribute your plugin on the Krew index for all platforms.
apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: podfull
spec:
  version: v1.0.0
  homepage: https://github.com/myuser/kubectl-podfull
  shortDescription: Show detailed pod info with nodes and resources
  description: |
    kubectl-podfull displays comprehensive pod information
    including node placement, IPs, status, restart counts,
    and live resource usage in a single command.
  platforms:
  - selector:
      matchLabels:
        os: linux
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_linux_amd64.tar.gz
    sha256: abc123...
    bin: kubectl-podfull
  - selector:
      matchLabels:
        os: darwin
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_darwin_amd64.tar.gz
    sha256: def456...
    bin: kubectl-podfull
  - selector:
      matchLabels:
        os: windows
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_windows_amd64.zip
    sha256: ghi789...
    bin: kubectl-podfull.exe

Plugin Quick Reference

Plugin Category Key Command What It Solves Install
ctx / kubectx NAVIGATION kubectl ctx prod Fast cluster/context switching krew install ctx
ns / kubens NAVIGATION kubectl ns staging Fast namespace switching krew install ns
stern LOGS stern web-app -n prod Multi-pod log tailing krew install stern
neat DEBUG kubectl get pod x -oyaml | kubectl neat Clean up noisy kubectl YAML output krew install neat
tree DEBUG kubectl tree deploy app Visualize resource ownership tree krew install tree
popeye AUDIT kubectl popeye -n prod Cluster health & best-practice scan krew install popeye
who-can SECURITY kubectl who-can delete pods RBAC: who can do what krew install who-can
access-matrix SECURITY kubectl access-matrix --sa mysa Full RBAC permission grid krew install access-matrix
view-secret SECURITY kubectl view-secret my-secret Decode Secrets without manual base64 krew install view-secret
resource-capacity RESOURCES kubectl resource-capacity --util CPU/memory requests, limits & usage krew install resource-capacity
node-shell DEBUG kubectl node-shell node1 SSH-less shell on any node krew install node-shell
images RESOURCES kubectl images -n prod List all container images in cluster krew install images
outdated RESOURCES kubectl outdated Detect stale/outdated container images krew install outdated
konfig NAVIGATION kubectl konfig merge a.yaml b.yaml Merge & manage kubeconfig files krew install konfig
np-viewer NETWORKING kubectl np-viewer -n prod Visualize NetworkPolicy rules krew install np-viewer
cert-manager SECURITY kubectl cert-manager renew cert Manage cert-manager certificates krew install cert-manager
pv-migrate STORAGE kubectl pv-migrate migrate src dst Migrate PVC data between namespaces/clusters krew install pv-migrate
ingress-nginx NETWORKING kubectl ingress-nginx backends Debug NGINX Ingress configuration krew install ingress-nginx

21 / CRD — Custom Resource Definitions
📖 Official Docs: Custom Resources CRD Tasks CRD Versioning CEL in Kubernetes Kubebuilder Book Operator SDK OLM

Custom Resource Definitions (CRD)

// CRDs let you extend the Kubernetes API with your own resource types. Once registered, your custom objects are stored in etcd, managed by kubectl, protected by RBAC, and can drive custom controllers — exactly like built-in resources.

// CRD lifecycle — from definition to running controller
📐
① Define CRD
Register schema with API server via kubectl apply
🌐
② New API Endpoint
API server exposes /apis/group/version/plural
📦
③ Create CR Instances
kubectl apply custom objects — stored in etcd
🔄
④ Controller Watches
Controller reconcile loop reacts to CR changes
⑤ Desired State
Controller creates/updates/deletes child resources

CRD Core Concepts

📐
CustomResourceDefinition
Schema Registration
A CRD is itself a Kubernetes resource that tells the API server about a new type. It defines: the group (mycompany.io), versions (v1, v1alpha1), plural/singular/kind names, scope (Namespaced or Cluster), and an OpenAPI v3 JSON Schema for validation. Once applied, the new type is immediately usable.
apiextensions.k8s.io/v1 · OpenAPI v3 · etcd backed
📦
Custom Resource (CR)
Instances of Your Type
Once a CRD is registered, you create Custom Resources — instances of that type. They behave exactly like built-in resources: kubectl get, describe, delete, label, annotate all work. They are namespaced (or cluster-scoped), RBAC-protected, and watchable by controllers. The spec is defined by you; status is updated by your controller.
kubectl get mytype · RBAC protected · Watchable
Schema Validation
OpenAPI v3 Structural Schema
CRDs use OpenAPI v3 JSON Schema to validate Custom Resource fields at admission time. Supports: type, enum, pattern, minimum/maximum, required fields, default values, nullable, x-kubernetes-int-or-string. Validation is enforced server-side — invalid CRs are rejected by the API server before reaching etcd.
required fields · enum · defaults · Server-side
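The less common validation keywords named above rarely show up in full examples. A minimal schema fragment, with illustrative field names, might look like:

```yaml
# Illustrative schema fragment: field names are hypothetical
properties:
  replicas:
    x-kubernetes-int-or-string: true   # accepts 3 or "3"; note no "type" is set
  owner:
    type: string
    nullable: true                     # explicit null is accepted
  tier:
    type: string
    enum: [free, pro, enterprise]      # other values rejected at admission
    default: free
```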
📊
Status Subresource
Spec vs Status Separation
Enabling the status subresource creates a separate /status endpoint. Users update spec; controllers update status. This separation prevents users from accidentally overwriting controller-managed state. Controllers use client.Status().Update(). Printer columns can surface status fields in kubectl output.
spec write · status write · Separation
📋
Printer Columns
kubectl get Output
additionalPrinterColumns customize what kubectl get shows. Use JSONPath to extract fields from spec or status. Built-in: NAME, NAMESPACE, AGE. Add Phase, Replicas, Version, Ready. type: string, integer, boolean, date. priority: 0 = default view, >0 = wide only (-o wide).
JSONPath · Custom columns · -o wide
🔢
Versioning
API Evolution
CRDs support multiple versions simultaneously. One is the storage version. Conversion webhooks handle translating between versions. Use CEL validation rules (x-kubernetes-validations) for cross-field validation since v1.25+. Mark old versions deprecated before removal. Follow: v1alpha1 → v1beta1 → v1.
Conversion Webhook · Storage Version · CEL Rules
CEL Validation Rules
Cross-Field Validation (v1.25+)
Common Expression Language rules allow complex cross-field validation inside the CRD schema — no webhook needed. Defined with x-kubernetes-validations. Access self (current object) and oldSelf (for update rules). Example: ensure spec.maxReplicas >= spec.minReplicas. Runs in the API server process — fast and reliable.
x-kubernetes-validations · No webhook · self / oldSelf
🔄
Scale Subresource
HPA Integration
Enabling scale subresource exposes a /scale endpoint. Allows kubectl scale to work on your CRD and lets HPA automatically scale it. Define specReplicasPath and statusReplicasPath to map to your fields. Optional labelSelectorPath for pod selection by HPA.
HPA compatible · kubectl scale · /scale endpoint
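With the scale subresource enabled, a standard autoscaling/v2 HPA can target the custom resource directly. A sketch, assuming the WebApp type used in the examples in this section:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
  namespace: production
spec:
  scaleTargetRef:               # points at the CR, not a Deployment
    apiVersion: apps.mycompany.io/v1
    kind: WebApp
    name: my-web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

The HPA discovers the CR's pods through the selector exposed at labelSelectorPath, so Resource metrics only work when that path is defined in the scale subresource.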
🏗
Controller / Operator Pattern
Reconciliation Loop
A CRD without a controller is just stored data. Controllers bring CRDs to life: watch CR events, compare current vs desired state, act to reconcile. Built with kubebuilder or Operator SDK. The reconcile loop is the heart of every Kubernetes Operator — observe, compare, act, report.
kubebuilder · controller-runtime · Reconcile loop

CRD Anatomy — Field by Field

Field | Location | Required | Description | Example
group | spec | YES | API group — use a domain you own. Reverse DNS style. | mycompany.io
versions[].name | spec.versions | YES | Version string. Follow: v1alpha1 → v1beta1 → v1 | v1, v1beta1
versions[].served | spec.versions | YES | Whether this version is served by the API. False = deprecated. | true / false
versions[].storage | spec.versions | YES | Exactly ONE version must be the storage version. | true (only one)
openAPIV3Schema | spec.versions[].schema | YES | Full structural schema for validation. Required for all served versions. | type: object, properties: ...
scope | spec | YES | Namespaced (per ns) or Cluster (global like Nodes/PVs). | Namespaced / Cluster
names.kind | spec.names | YES | CamelCase singular name used in YAML kind: field. | Database
names.plural | spec.names | YES | Lowercase plural — used in URL path and kubectl get. | databases
names.shortNames | spec.names | NO | Short aliases for kubectl. Like po=pods, svc=services. | ["db", "dbs"]
names.categories | spec.names | NO | Group into categories. kubectl get all uses "all" category. | ["all"]
subresources.status | spec.versions[].subresources | NO | Enables separate /status endpoint. Best practice for all CRDs. | status: {}
subresources.scale | spec.versions[].subresources | NO | Enables HPA + kubectl scale support. | specReplicasPath, statusReplicasPath
additionalPrinterColumns | spec.versions[] | NO | Custom kubectl get columns via JSONPath. | name: Phase, jsonPath: .status.phase
x-kubernetes-validations | schema properties | NO | CEL rules for cross-field validation (v1.25+). | rule: self.max >= self.min
conversion.strategy | spec.conversion | NO | None (no conversion) or Webhook (call conversion webhook). | Webhook

CRD YAML Examples

Full CRD — WebApp Resource CRD Definition
Complete production CRD with schema validation, CEL rules, status + scale subresources, and printer columns.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.apps.mycompany.io  # plural.group
spec:
  group: apps.mycompany.io
  scope: Namespaced
  names:
    plural: webapps
    singular: webapp
    kind: WebApp
    shortNames: [wa]
    categories: [all, mycompany]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
      scale:
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.readyReplicas
        labelSelectorPath: .status.selector
    additionalPrinterColumns:
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Ready
      type: integer
      jsonPath: .status.readyReplicas
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Image
      type: string
      jsonPath: .spec.image
      priority: 1  # Only with -o wide
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["image"]
            x-kubernetes-validations:
            - rule: self.maxReplicas >= self.minReplicas
              message: maxReplicas must be >= minReplicas
            - rule: self.replicas >= self.minReplicas && self.replicas <= self.maxReplicas
              message: replicas must be between min and max
            properties:
              image:
                type: string
              replicas:
                type: integer
                minimum: 0
                maximum: 100
                default: 1
              minReplicas:
                type: integer
                default: 1
              maxReplicas:
                type: integer
                default: 10
              port:
                type: integer
                minimum: 1
                maximum: 65535
                default: 8080
              env:
                type: array
                items:
                  type: object
                  required: ["name", "value"]
                  properties:
                    name:
                      type: string
                    value:
                      type: string
              ingress:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: false
                  host:
                    type: string
                  tlsEnabled:
                    type: boolean
                    default: true
          status:
            type: object
            properties:
              phase:
                type: string
                enum: [Pending, Running, Degraded, Failed]
              readyReplicas:
                type: integer
              selector:
                type: string
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
Custom Resource Instance + RBAC CR + RBAC
Create a WebApp CR instance and define RBAC for controllers (full access) and developers (no delete).
# Create a WebApp custom resource instance
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-web-app
  namespace: production
  labels:
    team: frontend
spec:
  image: registry.mycompany.io/webapp:v2.1.0
  replicas: 3
  minReplicas: 2
  maxReplicas: 10
  port: 8080
  env:
  - name: LOG_LEVEL
    value: info
  ingress:
    enabled: true
    host: myapp.example.com
    tlsEnabled: true
---
# ClusterRole for the controller
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webapp-controller
rules:
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps/status"]  # status subresource
  verbs: ["get", "update", "patch"]
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps/finalizers"]
  verbs: ["update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
---
# ClusterRole for developers — no delete
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webapp-developer
rules:
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
Multi-Version CRD with Conversion Webhook Versioning
Two served versions (v1 and v1beta1) with a conversion webhook and deprecation warning on the old version.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.apps.mycompany.io
spec:
  group: apps.mycompany.io
  scope: Namespaced
  names:
    plural: webapps
    kind: WebApp
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: webapp-conversion-webhook
          namespace: webapp-system
          path: /convert
          port: 443
        caBundle: LS0t...
  versions:
  - name: v1
    served: true
    storage: true   # Storage version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              containerPort:  # Renamed from v1beta1 "port"
                type: integer
  - name: v1beta1
    served: true
    storage: false  # Not storage version
    deprecated: true
    deprecationWarning: "v1beta1 deprecated, migrate to v1"
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              port:             # Old field name
                type: integer
kubebuilder Controller — Reconciler Go Controller
Complete kubebuilder reconciler for the WebApp CRD — watches CR, creates Deployment, sets owner reference, updates status.
// controllers/webapp_controller.go
package controllers

import (
    "context"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    myv1 "mycompany.io/webapp-operator/api/v1"
)

type WebAppReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete

func (r *WebAppReconciler) Reconcile(
    ctx context.Context, req ctrl.Request,
) (ctrl.Result, error) {

    // 1. Fetch the WebApp CR
    webapp := &myv1.WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Define the desired Deployment
    replicas := int32(webapp.Spec.Replicas)
    desired := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      webapp.Name,
            Namespace: webapp.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": webapp.Name},
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{"app": webapp.Name},
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "webapp",
                        Image: webapp.Spec.Image,
                    }},
                },
            },
        },
    }
    // Set WebApp as owner — GC when CR deleted
    if err := ctrl.SetControllerReference(webapp, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // 3. Create or Update
    existing := &appsv1.Deployment{}
    if err := r.Get(ctx, req.NamespacedName, existing); err != nil {
        // Not found yet (or unreadable): try to create it
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, err
        }
    } else {
        existing.Spec = desired.Spec
        if err := r.Update(ctx, existing); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 4. Update status (use Status().Update, not Update)
    webapp.Status.Phase = "Running"
    if err := r.Status().Update(ctx, webapp); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&myv1.WebApp{}).
        Owns(&appsv1.Deployment{}).  // Watch child Deployments
        Complete(r)
}
CEL Validation Rules Validation (v1.25+)
Advanced CEL — cross-field constraints, immutability rules, and pattern validation without admission webhooks.
spec:
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-validations:
            - # Cross-field: max must be >= min
              rule: self.maxReplicas >= self.minReplicas
              message: maxReplicas must be >= minReplicas
            - # Immutability: minReplicas cannot decrease
              rule: oldSelf.minReplicas <= self.minReplicas
              message: minReplicas cannot be decreased
              fieldPath: .minReplicas
            - # CPU threshold must be reasonable
              rule: self.targetCPUPercent >= 10 && self.targetCPUPercent <= 95
              message: targetCPUPercent must be between 10 and 95
            properties:
              minReplicas:
                type: integer
                minimum: 1
              maxReplicas:
                type: integer
                maximum: 1000
              targetCPUPercent:
                type: integer
                default: 70
              scaleDownCooldown:
                type: string
                pattern: '^[0-9]+(s|m|h)$'
                default: "5m"
                x-kubernetes-validations:
                - rule: self.matches('^[0-9]+(s|m|h)$')
                  message: Must be like 30s, 5m, or 1h
Finalizer Pattern + kubectl CRD Commands Lifecycle & CLI
Finalizer for external cleanup and essential kubectl commands for CRD and CR management.
# Finalizer in a CR — prevents immediate deletion
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-app
  finalizers:
  - webapps.apps.mycompany.io/finalizer
spec:
  image: myapp:latest
  replicas: 3

---
# kubectl CRD commands cheat sheet

# Discover
kubectl get crds
kubectl api-resources --api-group=apps.mycompany.io
kubectl explain webapp.spec               # field docs
kubectl explain webapp.spec.ingress

# Manage instances
kubectl get webapps -A -o wide
kubectl get wa                            # shortName
kubectl describe webapp my-web-app
kubectl get webapp my-web-app -o jsonpath='{.status.phase}'

# Patch spec
kubectl patch webapp my-web-app \
  --type=merge -p '{"spec":{"replicas":5}}'

# Patch status subresource (controller pattern)
kubectl patch webapp my-web-app \
  --subresource=status --type=merge \
  -p '{"status":{"phase":"Running"}}'

# Scale (if scale subresource enabled)
kubectl scale webapp my-web-app --replicas=5

# Cleanup — WARNING: deletes ALL CRs!
kubectl delete crd webapps.apps.mycompany.io

# kubebuilder bootstrap
kubebuilder init --domain mycompany.io
kubebuilder create api --group apps --version v1 --kind WebApp
make generate && make manifests && make install
make run
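One lifecycle gotcha worth knowing: if the controller that owns a finalizer is gone, the CR stays in Terminating forever. A merge patch (resource name is illustrative) clears the finalizers so deletion can complete; whatever external cleanup the finalizer guarded will NOT run:

```yaml
# remove-finalizers.yaml: merge patch that clears all finalizers
metadata:
  finalizers: []
# Apply with:
#   kubectl patch webapp my-app --type=merge --patch-file remove-finalizers.yaml
```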

CRD Best Practices

01
Naming Convention
Use a Domain You Own
Always use a domain you control as the API group (mycompany.io). CRD name must be plural.group. Kind uses CamelCase. Version follows alpha→beta→stable. Avoid generic group names that may conflict with other operators in the cluster.
02
Enable Status Subresource
Separate Spec from Status
Always enable subresources.status: {} in every CRD. This prevents race conditions between users writing spec and controllers writing status. Controllers must use client.Status().Update(), not client.Update(), to write status fields only.
03
Strict Schema Validation
Validate at the Gate
Define a complete structural schema with required fields, types, and defaults. Use CEL rules for cross-field constraints. Avoid x-kubernetes-preserve-unknown-fields: true globally — it disables pruning and field validation, leading to garbage data in etcd.
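If one field really must carry arbitrary user data, x-kubernetes-preserve-unknown-fields can be scoped to that single subtree instead of the whole object, keeping the rest of the schema strict. A sketch (the config field is illustrative):

```yaml
properties:
  config:
    type: object
    x-kubernetes-preserve-unknown-fields: true  # only this subtree is kept verbatim
  image:
    type: string                                # everything else stays pruned and validated
```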
04
Conditions Pattern
Standardize Status Reporting
Use the standard Kubernetes conditions pattern in status: type, status (True/False/Unknown), reason (CamelCase), message (human readable), lastTransitionTime. This matches built-in resources and integrates with tooling like kubectl wait --for=condition=Ready.
05
Versioning Strategy
Never Remove Fields
Never remove fields from a served version — it breaks existing clients. Add new optional fields with defaults. When restructuring, add a new version with a conversion webhook. Mark old versions deprecated before removing. Keep at least one previous version during migration windows.
06
Owner References
Automatic Child Cleanup
Always set owner references on resources created by your controller using ctrl.SetControllerReference(). When the CR is deleted, Kubernetes garbage collects all owned child resources automatically. Add finalizers only when you need to clean up external resources before deletion.
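For reference, this is roughly what ctrl.SetControllerReference() produces on the child object; the uid value is illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  namespace: production
  ownerReferences:
  - apiVersion: apps.mycompany.io/v1
    kind: WebApp
    name: my-web-app
    uid: 3f2a9c1e-0000-0000-0000-000000000000  # illustrative UID of the owning CR
    controller: true             # marks the managing controller
    blockOwnerDeletion: true     # foreground deletion waits for this child
```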

Advanced CRD Patterns

🔄
Conversion Webhooks
Multi-Version Translation
When a CRD has multiple versions, a Conversion Webhook translates objects between versions on the fly. The API server calls your HTTPS webhook with a ConversionReview request. You convert and return the new version. The storage version is always used in etcd — all other versions are converted on read. Required for zero-downtime API evolution across versions.
ConversionReview · storage version · Zero downtime · HTTPS webhook
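The ConversionReview exchange looks roughly like this (uid and field values are illustrative); the webhook must echo request.uid in its response:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
request:
  uid: "705ab4f5-0000-0000-0000-000000000000"   # illustrative
  desiredAPIVersion: apps.mycompany.io/v1
  objects:
  - apiVersion: apps.mycompany.io/v1beta1
    kind: WebApp
    metadata: {name: my-app}
    spec: {image: myapp:latest, port: 8080}
---
# The webhook's reply: same uid, converted objects in the same order
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
response:
  uid: "705ab4f5-0000-0000-0000-000000000000"
  result: {status: Success}
  convertedObjects:
  - apiVersion: apps.mycompany.io/v1
    kind: WebApp
    metadata: {name: my-app}
    spec: {image: myapp:latest, containerPort: 8080}
```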
🏗
Kubebuilder
Operator Scaffold Framework
The official Go framework for building Kubernetes operators. Uses controller-runtime under the hood. Generates: CRD manifests from Go struct tags, RBAC ClusterRole markers, Webhook boilerplate, Makefile targets (generate, manifests, install, deploy). Marker annotations (// +kubebuilder:) in Go code drive all generation automatically.
controller-runtime · code-gen markers · Scaffold · Go
🛠
Operator SDK
Multi-Language Operator Framework
Red Hat's operator framework. Supports Go (uses Kubebuilder internally), Ansible (reconcile logic in Ansible playbooks — no Go needed), and Helm (wrap Helm charts as Operators). operator-sdk scorecard for testing bundles. OLM (Operator Lifecycle Manager) for packaging and distribution on OperatorHub.io.
Go / Ansible / Helm · OLM · OperatorHub
🌡
Conditions Pattern
Standard Status Reporting
Use the Kubernetes Conditions pattern in all CRD status fields. Each condition has: Type (Ready, Available, Progressing), Status (True/False/Unknown), Reason (CamelCase machine-readable), Message (human-readable), LastTransitionTime. Enables kubectl wait --for=condition=Ready, standard tooling integration, and clear operational visibility.
kubectl wait · True/False/Unknown · metav1.Condition
🗺
Server-Side Apply
Field Ownership Tracking
Server-Side Apply (SSA) tracks field ownership per manager. Controllers and users can co-own different fields of a CR without conflicts. Use Apply with field manager names. Enables: partial updates without reading the full object, merge conflict detection, and safe multi-actor management of CR fields across teams.
fieldManager · SSA · Partial update
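A sketch of SSA from the CLI, assuming the WebApp type used in this section; each manager applies only the fields it owns:

```yaml
# replicas-only.yaml: the "autoscaler" manager owns just spec.replicas
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-web-app
  namespace: production
spec:
  replicas: 5
# Apply server-side under a distinct field manager:
#   kubectl apply --server-side --field-manager=autoscaler -f replicas-only.yaml
# Another manager that tries to change spec.replicas gets a conflict
# unless it passes --force-conflicts.
```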
🔍
Informers and Watch Cache
Efficient Event Processing
Controllers use Informers (client-go cache layer) to efficiently watch CRs. Informer maintains a local cache and delivers Add/Update/Delete events to handlers. Work queue decouples event receipt from processing with rate limiting and exponential backoff on requeues. Never call the API server directly from reconcile — always use the local cache for reads.
Informer cache · WorkQueue · Rate limiting
📡
Owns and Watches
Cascade Reconciliation
Controllers can watch secondary resources and trigger reconciliation of the parent CR. builder.Owns(&appsv1.Deployment{}) watches Deployments and requeues the owning CR on changes. builder.Watches() with custom handler for more complex trigger logic. Critical for reactivity: when a child resource changes state, the parent CR reconciles to restore desired state.
Owns() · Watches() · Cascade trigger
🧪
envtest
Integration Testing for Controllers
controller-runtime's envtest runs a real API server and etcd binary for integration tests — no cluster needed. Register your CRDs, create objects, run the reconciler, assert the resulting state. Much more reliable than unit tests with mocks. Kubebuilder generates a suite_test.go with envtest setup included. Run with: go test ./... in your operator project.
Real API server · No cluster needed · Integration test

Advanced CRD YAML Examples

Multi-Version CRD with Conversion Webhook Versioning
CRD with v1alpha1 (deprecated, still served) and v1 (storage version), connected to a conversion webhook for seamless translation between versions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.storage.mycompany.io
spec:
  group: storage.mycompany.io
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: database-operator-webhook
          namespace: operators
          path: /convert
  versions:
  - name: v1alpha1
    served: true   # still served but deprecated
    storage: false
    deprecated: true
    deprecationWarning: "v1alpha1 deprecated, migrate to v1"
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              dbType:        # old field name
                type: string
  - name: v1
    served: true
    storage: true  # current storage version
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine"]
            properties:
              engine:        # renamed from dbType
                type: string
                enum: ["postgres", "mysql", "redis"]
              version:
                type: string
              replicas:
                type: integer
                default: 1
                minimum: 1
CRD with CEL Validation Rules CEL / v1.25+
Advanced cross-field validation using Common Expression Language — immutability enforcement, range checks, and format rules without any webhook.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusters.infra.mycompany.io
spec:
  group: infra.mycompany.io
  scope: Namespaced
  names:
    plural: clusters
    kind: Cluster
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-validations:
        - rule: "self.spec.maxNodes >= self.spec.minNodes"
          message: "maxNodes must be >= minNodes"
        - rule: "!(self.spec.highAvailability && self.spec.minNodes < 3)"
          message: "HA clusters require at least 3 nodes"
        properties:
          spec:
            type: object
            required: ["region", "minNodes", "maxNodes"]
            x-kubernetes-validations:
            - rule: "self.region == oldSelf.region"
              message: "region is immutable after creation"
            properties:
              region:
                type: string
                x-kubernetes-validations:
                - rule: "self.matches('^[a-z]+-[a-z]+-[0-9]+$')"
                  message: "region must match pattern like us-east-1"
              minNodes:
                type: integer
                minimum: 1
              maxNodes:
                type: integer
                maximum: 1000
              highAvailability:
                type: boolean
                default: false
              nodeType:
                type: string
                enum: ["standard", "memory", "compute"]
                default: standard
Kubebuilder Go Types and Markers kubebuilder / Go
Go struct with Kubebuilder marker annotations that auto-generate CRD YAML, RBAC ClusterRoles, and webhook configuration at build time.
// types.go — markers drive CRD YAML generation

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.readyReplicas
// +kubebuilder:resource:scope=Namespaced,shortName=wa,categories=all
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`
type WebApp struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   WebAppSpec   `json:"spec,omitempty"`
    Status WebAppStatus `json:"status,omitempty"`
}

type WebAppSpec struct {
    // +kubebuilder:validation:Required
    // +kubebuilder:validation:MinLength=1
    Image string `json:"image"`

    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=50
    // +kubebuilder:default=1
    Replicas int32 `json:"replicas,omitempty"`

    // +kubebuilder:validation:Enum=RollingUpdate;Recreate
    // +kubebuilder:default=RollingUpdate
    Strategy string `json:"strategy,omitempty"`
}

type WebAppStatus struct {
    ReadyReplicas int32  `json:"readyReplicas,omitempty"`
    Phase         string `json:"phase,omitempty"`
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// Reconciler with RBAC markers — make manifests generates ClusterRole
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete

func (r *WebAppReconciler) Reconcile(ctx context.Context,
    req reconcile.Request) (reconcile.Result, error) {
    var webapp WebApp // the WebApp type defined above
    if err := r.Get(ctx, req.NamespacedName, &webapp); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }
    // reconcile logic here...
    // Write status via status subresource (not regular Update)
    webapp.Status.Phase = "Running"
    if err := r.Status().Update(ctx, &webapp); err != nil {
        return reconcile.Result{}, err
    }
    return reconcile.Result{RequeueAfter: time.Minute * 5}, nil
}
CRD Status with Conditions Pattern Status Pattern
Full CRD schema with standard Kubernetes conditions in status — enables kubectl wait and integration with monitoring tooling.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: pipelines.ci.mycompany.io
spec:
  group: ci.mycompany.io
  scope: Namespaced
  names:
    plural: pipelines
    kind: Pipeline
    shortNames: [pl]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Ready
      type: string
      jsonPath: .status.conditions[?(@.type=="Ready")].status
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["repository"]
            properties:
              repository:
                type: string
              branch:
                type: string
                default: main
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Pending","Running","Succeeded","Failed"]
              conditions:
                type: array
                items:
                  type: object
                  required: ["type", "status"]
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                      enum: ["True","False","Unknown"]
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
---
# Wait for condition to become True
kubectl wait pipeline my-pipeline \
  --for=condition=Ready=True \
  --timeout=120s
Cluster-Scoped CRD with RBAC Cluster-Scoped
Cluster-scoped resource (like Nodes, PVs) with controller ClusterRole and a viewer ClusterRole for team members.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: cloudproviders.infra.mycompany.io
spec:
  group: infra.mycompany.io
  scope: Cluster  # NOT Namespaced
  names:
    plural: cloudproviders
    kind: CloudProvider
    shortNames: [cp]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["provider", "region"]
            properties:
              provider:
                type: string
                enum: ["aws", "gcp", "azure"]
              region:
                type: string
---
# Controller ClusterRole (cluster-scoped needs ClusterRole)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudprovider-controller
rules:
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders"]
  verbs: ["get","list","watch","update","patch"]
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders/status"]
  verbs: ["get","update","patch"]
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders/finalizers"]
  verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudprovider-viewer
rules:
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders"]
  verbs: ["get","list","watch"]
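The viewer ClusterRole above still needs a binding before team members can actually use it. A minimal sketch — the `platform-team` group name is an illustrative assumption and would come from your identity provider:

```yaml
# Bind the viewer role to a group (group name is illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloudprovider-viewers
subjects:
- kind: Group
  name: platform-team            # assumed group from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cloudprovider-viewer
  apiGroup: rbac.authorization.k8s.io
```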
Kubebuilder Full Dev Workflow Operator Development
Complete Kubebuilder workflow from project init to cluster deployment — including code generation, testing, image build, and OLM bundle packaging.
# ── PROJECT INIT ─────────────────────────────────────────────
mkdir webapp-operator && cd webapp-operator
kubebuilder init \
  --domain mycompany.io \
  --repo github.com/mycompany/webapp-operator

# ── CREATE API (generates types.go + controller skeleton) ────
kubebuilder create api \
  --group apps --version v1 --kind WebApp \
  --resource --controller

# ── CREATE WEBHOOK ───────────────────────────────────────────
# --defaulting generates a MutatingWebhook, --programmatic-validation a ValidatingWebhook
kubebuilder create webhook \
  --group apps --version v1 --kind WebApp \
  --defaulting --programmatic-validation

# ── GENERATE ─────────────────────────────────────────────────
make generate    # generate DeepCopyObject methods
make manifests   # generate CRD YAML + RBAC from markers

# ── INSTALL CRD into cluster ─────────────────────────────────
make install     # kubectl apply -f config/crd/bases/

# ── RUN LOCALLY (out-of-cluster for development) ─────────────
make run         # runs controller process locally

# ── INTEGRATION TESTS with envtest ───────────────────────────
make test        # downloads envtest binaries, runs tests
go test ./... -v -run TestReconcile

# ── BUILD AND PUSH IMAGE ─────────────────────────────────────
make docker-build docker-push IMG=registry.io/webapp-op:v1.0.0

# ── DEPLOY TO CLUSTER ────────────────────────────────────────
make deploy IMG=registry.io/webapp-op:v1.0.0

# ── OLM BUNDLE for OperatorHub distribution ──────────────────
make bundle IMG=registry.io/webapp-op:v1.0.0
make bundle-build BUNDLE_IMG=registry.io/webapp-op-bundle:v1.0.0
operator-sdk scorecard bundle/  # validate bundle quality

# ── USEFUL RUNTIME COMMANDS ──────────────────────────────────
kubectl get crds
kubectl explain webapp.spec --api-version=apps.mycompany.io/v1
kubectl get webapps -A -o wide
kubectl scale webapp my-app --replicas=5       # requires the scale subresource on the CRD
kubectl wait webapp my-app --for=condition=Ready --timeout=60s
kubectl describe crd webapps.apps.mycompany.io

Extension Mechanism Comparison

// When to use CRDs vs API Aggregation vs other patterns — pick the right extension point.

Mechanism                     | Storage                 | Validation            | Custom Logic        | kubectl Support | Best For
CRD                           | etcd (via API server)   | OpenAPI v3 + CEL      | External controller | FULL            | 90% of cases — domain objects, Operators, config
API Aggregation (AA)          | Own backend (any store) | Custom — full control | Built-in to server  | FULL            | Custom storage, non-standard REST (metrics-server)
Built-in Resource (upstream)  | etcd                    | Hardcoded             | Core controllers    | FULL            | Contributing new features to Kubernetes itself
ConfigMap (workaround)        | etcd                    | None                  | App reads directly  | LIMITED         | Simple config only — avoid for structured domain data
Annotations / Labels          | etcd (on existing obj)  | None                  | Controller reads    | LIMITED         | Small metadata additions on existing resources
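The comparison above lists "OpenAPI v3 + CEL" as CRD validation: beyond the structural schema, CRDs can embed CEL expressions via `x-kubernetes-validations` (GA since v1.29). A hedged sketch — the `minReplicas`/`maxReplicas` fields are illustrative, not from the CRDs above:

```yaml
# CEL validation rules embedded in a CRD schema (fields are illustrative)
spec:
  type: object
  properties:
    minReplicas:
      type: integer
    maxReplicas:
      type: integer
  x-kubernetes-validations:
  - rule: "self.minReplicas <= self.maxReplicas"
    message: "minReplicas must not exceed maxReplicas"
```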

22 / Component Commands
📖 Official Docs: kubectl Reference Cheat Sheet

kubectl Commands by Component

// Every essential kubectl command organized by Kubernetes component — pods, deployments, services, configmaps, secrets, nodes, namespaces, and more.

📦 Pods CORE
The smallest deployable unit in Kubernetes. A Pod runs one or more containers sharing network and storage.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get pods                               # list pods in current ns
kubectl get pods -A                            # all namespaces
kubectl get pods -o wide                       # show node, IP
kubectl get pods -l app=nginx                  # filter by label
kubectl get pods --field-selector=status.phase=Running
kubectl get pod my-pod -o yaml                 # full YAML spec
kubectl get pod my-pod -o jsonpath='{.status.podIP}'
kubectl describe pod my-pod                    # detailed info + events

# ── CREATE & DELETE ───────────────────────────────────────────
kubectl run nginx --image=nginx                # quick pod creation
kubectl run nginx --image=nginx --dry-run=client -o yaml  # generate YAML
kubectl run tmp --image=busybox --rm -it -- sh # temp interactive pod
kubectl delete pod my-pod                      # delete a pod
kubectl delete pod my-pod --grace-period=0 --force  # force delete
kubectl delete pods -l app=old-app             # delete by label

# ── LOGS & DEBUG ──────────────────────────────────────────────
kubectl logs my-pod                            # view logs
kubectl logs my-pod -c sidecar                 # specific container
kubectl logs my-pod -f                         # follow / stream
kubectl logs my-pod --previous                 # previous crashed container
kubectl logs my-pod --since=1h                 # last hour
kubectl logs my-pod --tail=100                 # last 100 lines
kubectl logs -l app=nginx --all-containers     # logs by label

# ── EXEC & INTERACT ───────────────────────────────────────────
kubectl exec my-pod -- ls /app                 # run command
kubectl exec -it my-pod -- /bin/sh             # interactive shell
kubectl exec -it my-pod -c sidecar -- bash     # specific container
kubectl cp my-pod:/var/log/app.log ./app.log   # copy from pod
kubectl cp ./config.yaml my-pod:/etc/config/   # copy to pod
kubectl port-forward my-pod 8080:80            # forward local port
kubectl debug my-pod -it --image=busybox       # ephemeral debug container
kubectl top pod my-pod --containers            # resource usage
🚀 Deployments WORKLOAD
Manages ReplicaSets and provides declarative updates, rolling deployments, and rollbacks for stateless applications.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get deployments                        # list deployments
kubectl get deploy -A                          # all namespaces
kubectl get deploy my-app -o yaml              # full spec
kubectl describe deploy my-app                 # detail + events

# ── CREATE & UPDATE ───────────────────────────────────────────
kubectl create deploy my-app --image=nginx     # imperative create
kubectl create deploy my-app --image=nginx --replicas=3 --dry-run=client -o yaml
kubectl apply -f deployment.yaml               # declarative apply
kubectl set image deploy/my-app app=nginx:1.25 # update image
kubectl scale deploy my-app --replicas=5       # scale up/down
kubectl autoscale deploy my-app --min=2 --max=10 --cpu-percent=80
kubectl patch deploy my-app -p '{"spec":{"replicas":3}}'

# ── ROLLOUTS ──────────────────────────────────────────────────
kubectl rollout status deploy/my-app           # watch rollout
kubectl rollout history deploy/my-app          # revision history
kubectl rollout history deploy/my-app --revision=3  # specific revision
kubectl rollout undo deploy/my-app             # rollback to previous
kubectl rollout undo deploy/my-app --to-revision=2  # rollback to specific
kubectl rollout restart deploy/my-app          # rolling restart
kubectl rollout pause deploy/my-app            # pause rollout
kubectl rollout resume deploy/my-app           # resume rollout

# ── DELETE ────────────────────────────────────────────────────
kubectl delete deploy my-app                   # delete deployment
kubectl delete -f deployment.yaml              # delete from file
🌐 Services NETWORKING
Stable network endpoint for accessing a set of Pods. Types: ClusterIP, NodePort, LoadBalancer, ExternalName.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get services                           # list services
kubectl get svc -A                             # all namespaces
kubectl get svc my-svc -o yaml                 # full spec
kubectl describe svc my-svc                    # detail + endpoints
kubectl get endpoints my-svc                   # backing pod IPs

# ── CREATE ────────────────────────────────────────────────────
kubectl expose deploy my-app --port=80 --target-port=8080  # ClusterIP
kubectl expose deploy my-app --port=80 --type=NodePort
kubectl expose deploy my-app --port=80 --type=LoadBalancer
kubectl create svc clusterip my-svc --tcp=80:8080 --dry-run=client -o yaml
kubectl create svc nodeport my-svc --tcp=80:8080 --node-port=30080

# ── ACCESS & DEBUG ────────────────────────────────────────────
kubectl port-forward svc/my-svc 8080:80        # local access
kubectl run curl --image=curlimages/curl --rm -it -- curl my-svc:80  # test from cluster

# ── DELETE ────────────────────────────────────────────────────
kubectl delete svc my-svc
📝 ConfigMaps CONFIG
Store non-confidential configuration data as key-value pairs. Consumed as env vars, command args, or mounted config files.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get configmaps                         # list configmaps
kubectl get cm -A                              # all namespaces
kubectl get cm my-config -o yaml               # view data
kubectl describe cm my-config

# ── CREATE ────────────────────────────────────────────────────
kubectl create configmap my-config --from-literal=key1=val1 --from-literal=key2=val2
kubectl create cm my-config --from-file=config.properties
kubectl create cm my-config --from-file=app-config=./config.yaml
kubectl create cm my-config --from-env-file=.env
kubectl create cm my-config --from-literal=key=val --dry-run=client -o yaml

# ── UPDATE & DELETE ────────────────────────────────────────────
kubectl edit cm my-config                      # edit in $EDITOR
kubectl patch cm my-config -p '{"data":{"key1":"newval"}}'
kubectl delete cm my-config
🔐 Secrets CONFIG
Store sensitive data (passwords, tokens, keys). Base64-encoded by default, not encrypted; pair with RBAC and encryption at rest for real security.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get secrets                            # list secrets
kubectl get secret my-secret -o yaml           # view (base64 encoded)
kubectl get secret my-secret -o jsonpath='{.data.password}' | base64 -d  # decode
kubectl describe secret my-secret              # metadata only

# ── CREATE ────────────────────────────────────────────────────
kubectl create secret generic my-secret --from-literal=user=admin --from-literal=pass=s3cret
kubectl create secret generic my-secret --from-file=ssh-key=~/.ssh/id_rsa
kubectl create secret docker-registry regcred \
  --docker-server=registry.io --docker-username=user \
  --docker-password=pass --docker-email=user@example.com
kubectl create secret tls my-tls --cert=tls.crt --key=tls.key
kubectl create secret generic my-secret --from-literal=key=val --dry-run=client -o yaml

# ── UPDATE & DELETE ────────────────────────────────────────────
kubectl edit secret my-secret
kubectl delete secret my-secret
📁 Namespaces CORE
Virtual clusters within a physical cluster. Provide scope for names, resource quotas, and access control boundaries.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get namespaces                         # list all namespaces
kubectl get ns                                 # shorthand
kubectl describe ns my-namespace

# ── CREATE & SWITCH ───────────────────────────────────────────
kubectl create namespace staging
kubectl create ns staging --dry-run=client -o yaml
kubectl config set-context --current --namespace=staging  # set default

# ── DELETE ────────────────────────────────────────────────────
kubectl delete ns staging                      # deletes ALL resources in ns
🖥️ Nodes CLUSTER
Worker machines (physical or virtual) that run Pods. Managed by the control plane via kubelet.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get nodes                              # list nodes
kubectl get nodes -o wide                      # IPs, OS, kernel, runtime
kubectl describe node node-1                   # capacity, conditions, pods
kubectl top nodes                              # CPU & memory usage
kubectl get node node-1 -o jsonpath='{.status.allocatable}'

# ── LABELS & TAINTS ───────────────────────────────────────────
kubectl label node node-1 disktype=ssd         # add label
kubectl label node node-1 disktype-             # remove label
kubectl taint nodes node-1 dedicated=gpu:NoSchedule
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-  # remove taint

# ── MAINTENANCE ───────────────────────────────────────────────
kubectl cordon node-1                          # mark unschedulable
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon node-1                        # re-enable scheduling
🔁 ReplicaSets WORKLOAD
Ensures a specified number of pod replicas are running. Usually managed by Deployments — rarely created directly.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get replicasets                        # list replicasets
kubectl get rs -A                              # all namespaces
kubectl get rs my-rs -o yaml                   # full spec
kubectl describe rs my-rs                      # detail + events

# ── SCALE & DELETE ────────────────────────────────────────────
kubectl scale rs my-rs --replicas=5            # scale (prefer deploy)
kubectl delete rs my-rs                        # delete replicaset
kubectl delete rs my-rs --cascade=orphan       # keep pods running
💾 StatefulSets WORKLOAD
Manages stateful applications with stable network identities, persistent storage, and ordered deployment/scaling.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get statefulsets                       # list statefulsets
kubectl get sts -A                             # all namespaces
kubectl get sts my-db -o yaml                  # full spec
kubectl describe sts my-db

# ── SCALE & ROLLOUT ───────────────────────────────────────────
kubectl scale sts my-db --replicas=5
kubectl rollout status sts/my-db
kubectl rollout history sts/my-db
kubectl rollout undo sts/my-db
kubectl rollout restart sts/my-db
kubectl patch sts my-db -p '{"spec":{"replicas":3}}'

# ── DELETE ────────────────────────────────────────────────────
kubectl delete sts my-db                       # deletes pods too
kubectl delete sts my-db --cascade=orphan      # keep pods
🔧 DaemonSets WORKLOAD
Ensures a copy of a Pod runs on all (or selected) nodes. Used for log collectors, monitoring agents, and network plugins.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get daemonsets                         # list daemonsets
kubectl get ds -A                              # all namespaces
kubectl get ds my-agent -o yaml
kubectl describe ds my-agent

# ── ROLLOUT ───────────────────────────────────────────────────
kubectl rollout status ds/my-agent
kubectl rollout history ds/my-agent
kubectl rollout undo ds/my-agent
kubectl rollout restart ds/my-agent

# ── DELETE ────────────────────────────────────────────────────
kubectl delete ds my-agent
⏱️ Jobs & CronJobs WORKLOAD
Jobs run tasks to completion. CronJobs schedule Jobs on a recurring cron-based schedule.
# ── JOBS ──────────────────────────────────────────────────────
kubectl get jobs                               # list jobs
kubectl get job my-job -o yaml
kubectl describe job my-job
kubectl create job my-job --image=busybox -- echo "hello"
kubectl create job my-job --from=cronjob/my-cron  # manual trigger
kubectl logs job/my-job                        # view job logs
kubectl delete job my-job

# ── CRONJOBS ──────────────────────────────────────────────────
kubectl get cronjobs                           # list cronjobs
kubectl get cj -A
kubectl get cj my-cron -o yaml
kubectl describe cj my-cron
kubectl create cronjob my-cron --image=busybox --schedule="*/5 * * * *" -- echo "tick"
kubectl patch cj my-cron -p '{"spec":{"suspend":true}}'   # suspend
kubectl patch cj my-cron -p '{"spec":{"suspend":false}}'  # resume
kubectl delete cj my-cron
🔀 Ingress NETWORKING
HTTP/HTTPS routing rules that expose Services externally. Requires an Ingress Controller (nginx, traefik, etc.).
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get ingress                            # list ingress resources
kubectl get ing -A
kubectl get ing my-ingress -o yaml
kubectl describe ing my-ingress

# ── CREATE & DELETE ───────────────────────────────────────────
kubectl create ingress my-ingress \
  --rule="myapp.example.com/=my-svc:80" \
  --annotation nginx.ingress.kubernetes.io/rewrite-target=/
kubectl create ingress my-ingress \
  --rule="myapp.example.com/*=my-svc:80,tls=my-tls-secret"
kubectl delete ing my-ingress
💿 PersistentVolumes & Claims STORAGE
PVs are cluster-level storage resources. PVCs are user requests for storage that bind to PVs. StorageClasses enable dynamic provisioning.
# ── PERSISTENT VOLUMES ────────────────────────────────────────
kubectl get pv                                 # list persistent volumes
kubectl get pv my-pv -o yaml
kubectl describe pv my-pv

# ── PERSISTENT VOLUME CLAIMS ──────────────────────────────────
kubectl get pvc                                # list claims
kubectl get pvc -A
kubectl get pvc my-claim -o yaml
kubectl describe pvc my-claim
kubectl delete pvc my-claim

# ── STORAGE CLASSES ───────────────────────────────────────────
kubectl get storageclass                       # list storage classes
kubectl get sc
kubectl describe sc standard
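PVCs have no imperative `kubectl create` shortcut; they are applied from YAML. A minimal dynamically-provisioned claim — the `standard` StorageClass name matches the example above but depends on what your cluster actually offers:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # must exist in your cluster
  resources:
    requests:
      storage: 10Gi
```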
🛡️ RBAC SECURITY
Role-Based Access Control — Roles, ClusterRoles, RoleBindings, ClusterRoleBindings, and ServiceAccounts.
# ── ROLES & CLUSTERROLES ──────────────────────────────────────
kubectl get roles -A                           # list roles
kubectl get clusterroles                       # list cluster roles
kubectl describe role my-role -n my-ns
kubectl describe clusterrole admin
kubectl create role pod-reader --verb=get,list,watch --resource=pods
kubectl create clusterrole node-reader --verb=get,list --resource=nodes

# ── BINDINGS ──────────────────────────────────────────────────
kubectl get rolebindings -A
kubectl get clusterrolebindings
kubectl create rolebinding my-rb --role=pod-reader --user=jane -n my-ns
kubectl create clusterrolebinding my-crb --clusterrole=node-reader --user=jane

# ── SERVICE ACCOUNTS ──────────────────────────────────────────
kubectl get serviceaccounts                    # list service accounts
kubectl get sa -A
kubectl create sa my-sa
kubectl describe sa my-sa
kubectl create token my-sa                     # generate token (v1.24+)

# ── AUTH CHECK ────────────────────────────────────────────────
kubectl auth can-i create pods                 # check own permissions
kubectl auth can-i get pods --as=jane          # impersonate user
kubectl auth can-i '*' '*' --as=system:serviceaccount:default:my-sa
kubectl auth whoami                            # current identity (v1.27+)
🔒 NetworkPolicies NETWORKING
Control traffic flow between Pods at the IP/port level. Requires a CNI plugin that supports NetworkPolicy (Calico, Cilium, etc.).
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get networkpolicies                    # list network policies
kubectl get netpol -A
kubectl get netpol my-policy -o yaml
kubectl describe netpol my-policy

# ── DELETE ────────────────────────────────────────────────────
kubectl delete netpol my-policy
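There is no imperative create command for NetworkPolicies; they are applied from YAML. A common starting point is a default-deny-ingress policy for a namespace — a sketch, with an illustrative namespace name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-ns               # illustrative namespace
spec:
  podSelector: {}                # empty selector = every pod in the namespace
  policyTypes: ["Ingress"]       # no ingress rules listed, so all ingress is denied
```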
📊 ResourceQuotas & LimitRanges ADMIN
ResourceQuotas limit total resource consumption per namespace. LimitRanges set default/min/max constraints per Pod or Container.
# ── RESOURCE QUOTAS ───────────────────────────────────────────
kubectl get resourcequotas                     # list quotas
kubectl get quota -A
kubectl describe quota my-quota
kubectl create quota my-quota --hard=pods=10,requests.cpu=4,requests.memory=8Gi

# ── LIMIT RANGES ──────────────────────────────────────────────
kubectl get limitranges
kubectl get limits -A
kubectl describe limits my-limits
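LimitRanges also lack an imperative create command. A sketch setting per-container defaults and caps — the values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-limits
spec:
  limits:
  - type: Container
    default:             # applied when a container sets no limits
      cpu: 500m
      memory: 256Mi
    defaultRequest:      # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    max:                 # hard ceiling per container
      cpu: "2"
      memory: 1Gi
```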
📈 HPA & Autoscaling SCALING
HorizontalPodAutoscaler automatically scales workloads based on CPU, memory, or custom metrics.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get hpa                                # list autoscalers
kubectl get hpa -A
kubectl describe hpa my-hpa

# ── CREATE & MANAGE ───────────────────────────────────────────
kubectl autoscale deploy my-app --min=2 --max=10 --cpu-percent=80
kubectl patch hpa my-hpa -p '{"spec":{"maxReplicas":20}}'
kubectl delete hpa my-hpa
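`kubectl autoscale` only covers CPU; for memory or multiple metrics, apply an `autoscaling/v2` manifest. A sketch with illustrative target values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```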