v1.35 · Complete Reference

KubernetesEncyclopedia

// container orchestration at scale — every component, every concept, every YAML

150+ Topics
50+ YAML Examples
18 Core Areas

01 / Overview
📖 Official Docs: Kubernetes Overview Cluster Architecture Components Kubernetes API

Cluster Architecture

// A Kubernetes cluster consists of a control plane and one or more worker nodes. The control plane manages the cluster state; worker nodes run containerized workloads.

// kubernetes cluster topology
⚙ Control Plane
kube-apiserver
etcd
kube-scheduler
kube-controller-manager
cloud-controller-manager
CoreDNS
⬡ Worker Node(s)
kubelet
kube-proxy
Container Runtime (CRI)
Pods
CNI Plugin
CSI Driver
// request lifecycle
👤
kubectl / Client
🔐
API Server Auth
💾
etcd Write
📋
Scheduler
🖥
Node / kubelet
📦
Container Running

02 / Core Components
📖 Official Docs: Cluster Components kube-apiserver etcd kube-scheduler kube-controller-manager kubelet kube-proxy

Control Plane Components

// The components that form the cluster's "brain" — responsible for global decisions and responding to cluster events.

🌐
kube-apiserver
API Server
The front-end to the Kubernetes control plane. Exposes the Kubernetes API over HTTPS. All communication goes through it — kubectl, controllers, kubelets. Horizontally scalable. Validates and processes REST requests, then updates etcd.
REST API Authentication Authorization Admission Control
🗄️
etcd
Distributed Key-Value Store
Consistent, highly-available key-value store used as the backing store for all cluster data. Stores the entire cluster state. Uses Raft consensus algorithm. Only the API server communicates with etcd directly. Must be backed up regularly.
Raft Consensus Cluster State HA
📋
kube-scheduler
Scheduler
Watches for newly created Pods with no assigned node. Selects a node for them to run on. Factors in: resource requirements, hardware/software constraints, affinity/anti-affinity, data locality, inter-workload interference, and deadlines.
Node Selection Affinity Taints & Tolerations
🔄
kube-controller-manager
Controller Manager
Runs controller processes. Logically each controller is a separate process, but compiled into a single binary. Includes: Node controller, Replication controller, Endpoints controller, Service Account & Token controllers, Job controller.
Reconciliation Control Loop Multiple Controllers
☁️
cloud-controller-manager
Cloud Controller Manager
Embeds cloud-specific control logic. Lets you link your cluster into your cloud provider's API. Only runs controllers specific to your cloud: Node, Route, and Service controllers. Separates cloud-specific code from core Kubernetes code.
AWS / GCP / Azure Load Balancers

Node Components

// Run on every node, maintaining running pods and providing the Kubernetes runtime environment.

🤖
kubelet
Node Agent
An agent that runs on each node. Ensures containers in a Pod are running and healthy. Takes a set of PodSpecs (provided through API server) and ensures the described containers are running. Does NOT manage containers not created by Kubernetes.
PodSpec Health Checks CRI Interface
🔗
kube-proxy
Network Proxy
A network proxy that runs on each node, implementing part of the Kubernetes Service concept. Maintains network rules that allow communication to Pods from inside or outside the cluster. Uses OS packet filtering layer (iptables/ipvs) if available.
iptables IPVS Service Proxy
📦
Container Runtime
CRI (Container Runtime Interface)
Software responsible for running containers. Kubernetes supports any runtime implementing the CRI. Options: containerd (most common), CRI-O, Docker Engine (via cri-dockerd). Handles image pulling, container lifecycle, and resource isolation.
containerd CRI-O OCI Standard

Addons

🌍
CoreDNS
Cluster DNS
A flexible, extensible DNS server that can serve as cluster DNS. Automatically assigns DNS names to Services and Pods. Every container inherits the cluster DNS server. Supports stub zones, forwarding, and plugin architecture.
Service Discovery · DNS Resolution
📊
Metrics Server
Resource Metrics
Scalable, efficient source of container resource metrics (CPU, memory). Used by HPA and VPA for autoscaling decisions. Provides the metrics API. Not for monitoring or alerting purposes — use Prometheus for that.
HPA · VPA · Autoscaling
🖥
Dashboard
Web UI
General-purpose web-based UI for Kubernetes clusters. Manage applications running in the cluster, troubleshoot them, and manage the cluster itself. Deploy containerized applications and get an overview of cluster resources.
Web UI · Monitoring

03 / Workloads & Deployments
📖 Official Docs: Workloads Pods Deployments StatefulSets DaemonSets Jobs CronJobs Init Containers Pod Lifecycle

Workload Resources

// Kubernetes offers several built-in workload resources for different application patterns and deployment strategies.

Resource Type Use Case Key Features Scaling
Pod ATOMIC Smallest deployable unit. One or more containers sharing network and storage. Shared IP, shared volumes, sidecar pattern, init containers Manual
Deployment STATELESS Web servers, APIs, microservices. Rolling updates and rollbacks. ReplicaSet management, rolling update, rollback history HPA / Manual
StatefulSet STATEFUL Databases (MySQL, PostgreSQL), Kafka, Zookeeper. Needs stable identity. Stable network identity, ordered rollout, persistent volumes per pod Manual
DaemonSet DAEMON Node-level agents: log collectors, monitoring agents, network plugins. One pod per node, runs on all/selected nodes, node affinity Node count
ReplicaSet STATELESS Maintain a stable set of replica Pods. Usually managed by Deployments. Pod template, replica count, label selectors HPA / Manual
Job BATCH One-time batch processing tasks. Database migrations, report generation. Completion guarantee, parallelism, retry on failure Fixed
CronJob SCHEDULED Recurring tasks. Backups, cleanup jobs, scheduled reports. Cron schedule syntax, concurrency policy, history limits Schedule-based

Pod Internals

🔬
Init Containers
Initialization Logic
Specialized containers that run and complete before app containers start. Used for setup tasks: waiting for dependencies, loading config, running migrations. Each init container must complete successfully before the next starts.
Sequential · Setup Tasks
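A minimal sketch of the init-container pattern: a Pod that waits for a dependency to resolve in DNS before the app container starts (the service name `db` and image `myapp:1.0` are placeholders).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  - name: wait-for-db            # must exit 0 before app containers start
    image: busybox:1.36
    command: ['sh', '-c', 'until nslookup db; do echo waiting for db; sleep 2; done']
  containers:
  - name: app
    image: myapp:1.0             # placeholder image
```

Multiple init containers run sequentially, in the order listed.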
🚦
Probes
Health Checks
Liveness: Restarts container if it fails. Readiness: Removes from Service endpoints if not ready. Startup: Disables other probes until app starts. Types: HTTP, TCP, exec command, gRPC.
HTTP · TCP · Exec · gRPC
📏
Resources
Requests & Limits
Requests: Minimum guaranteed resources (used for scheduling). Limits: Maximum allowed usage. CPU is compressible (throttled), Memory is not (OOMKilled). QoS classes: Guaranteed, Burstable, BestEffort.
CPU · Memory · QoS
🏷
Labels & Selectors
Object Metadata
Labels are key/value pairs attached to objects. Selectors filter objects by labels. Services use selectors to route traffic. Deployments use selectors to manage ReplicaSets. Annotations store non-identifying metadata (build info, contact info, etc.).
Filtering · Grouping · Annotations
🧲
Affinity & Anti-Affinity
Pod Placement Rules
Node Affinity: schedule pods on specific nodes by labels. Pod Affinity: co-locate pods on same node. Pod Anti-Affinity: spread pods across nodes for HA. Required (hard) vs Preferred (soft) rules. Topology spread constraints for even distribution.
Node Affinity · Pod Affinity
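A pod-spec fragment illustrating both rule strengths — a hard requirement plus a soft preference (the `disktype` label and zone value are example assumptions):

```yaml
# Pod spec fragment — schedules only on nodes labeled disktype=ssd,
# preferring (but not requiring) zone us-east-1a
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard rule
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      preferredDuringSchedulingIgnoredDuringExecution:   # soft rule
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
```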
🚫
Taints & Tolerations
Node Restrictions
Taints allow nodes to repel certain pods. Tolerations allow pods to schedule onto tainted nodes. Effects: NoSchedule, PreferNoSchedule, NoExecute. Use for dedicated nodes (GPU, SSD), node isolation, eviction of existing pods.
Dedicated Nodes · Isolation
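A sketch of the taint/toleration pair for a dedicated GPU node (node name and `gpu=true` key/value are examples):

```yaml
# Taint the node — repels any pod without a matching toleration:
#   kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
#
# Pod spec fragment tolerating that taint:
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

Note the toleration only *permits* scheduling on the tainted node; combine it with nodeAffinity to *require* it.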

04 / Networking
📖 Official Docs: Services & Networking Service Ingress Ingress Controllers NetworkPolicy DNS for Pods and Services EndpointSlices Gateway API

Kubernetes Networking Model

// Every Pod gets its own IP. All Pods can communicate with all other Pods without NAT. Nodes can communicate with all Pods. The IP a Pod sees itself as is the same IP others see it as.

🔵
ClusterIP Service
Default service type. Exposes service on an internal IP in the cluster. Only reachable from within the cluster. kube-proxy manages iptables rules to forward traffic to pod endpoints.
  • Virtual IP — never bound to any interface
  • DNS: my-svc.my-namespace.svc.cluster.local
  • Stable endpoint for pod-to-pod communication
  • Load balances across healthy endpoint pods
🟡
NodePort Service
Exposes the service on each Node's IP at a static port (30000–32767). Routes external traffic from NodeIP:NodePort to ClusterIP:Port. Useful for development and testing, not production.
  • Port range: 30000–32767 by default
  • Creates ClusterIP automatically
  • Accessible: <NodeIP>:<NodePort>
  • All nodes expose the port, even without pods
🟢
LoadBalancer Service
Exposes the Service externally using a cloud provider's load balancer. Creates NodePort and ClusterIP automatically. Provisions external IP via cloud-controller-manager. Works on AWS (NLB/ALB), GCP, Azure.
  • Provisions cloud load balancer automatically
  • Gets external-ip once provisioned
  • Supports annotations for cloud-specific config
  • MetalLB for bare-metal clusters
🔗
ExternalName Service
Maps a service to a DNS name (not a selector). Returns a CNAME record with the configured external name. No proxying. Useful for accessing external databases or services with a stable in-cluster DNS name.
  • CNAME mapping only — no proxy
  • Access external services by cluster-internal name
  • Useful for legacy system migration
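The entire ExternalName manifest is a few lines — a sketch, with `db.example.com` as a placeholder external host:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: legacy-db
  namespace: production
spec:
  type: ExternalName
  externalName: db.example.com   # returned as a CNAME — no proxying
```

In-cluster clients can then use `legacy-db.production.svc.cluster.local` and be redirected via DNS.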
🌐
Ingress
Manages external access to services in a cluster (usually HTTP/HTTPS). Provides load balancing, SSL/TLS termination, and name-based virtual hosting. Requires an Ingress controller (NGINX, Traefik, HAProxy, AWS ALB).
  • Path-based and host-based routing
  • TLS termination with cert-manager
  • NGINX, Traefik, Contour, Kong controllers
  • Annotations for controller-specific features
🛡
Network Policies
Specification of how groups of pods are allowed to communicate. Ingress and Egress rules using label selectors, namespace selectors, CIDR blocks. Default: all traffic allowed. With policies: deny all then allow selectively (zero trust).
  • Requires CNI plugin support (Calico, Cilium)
  • Ingress: which sources may send traffic to pods
  • Egress: which destinations pods may reach
  • Namespace isolation best practice
CNI Plugins
Container Network Interface — standard for configuring network interfaces in Linux containers. Kubernetes uses CNI plugins to set up pod networking. Each plugin implements the Kubernetes network model differently.
  • Calico — BGP, NetworkPolicy, eBPF
  • Cilium — eBPF-based, L7 policies, service mesh
  • Flannel — simple overlay, VXLAN
  • Weave Net — mesh networking
🔍
DNS & Service Discovery
CoreDNS provides DNS-based service discovery. Every Service gets a DNS A/AAAA record. Pods can resolve services by short name within same namespace, or FQDN across namespaces.
  • FQDN: <service>.<namespace>.svc.cluster.local
  • Headless services: returns pod IPs directly
  • SRV records for named ports
  • ndots:5 search path configuration
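A headless Service — the backing mechanism for the per-pod DNS records mentioned above — is an ordinary Service with `clusterIP: None` (the `postgres` app label is assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None        # headless — DNS returns pod IPs directly
  selector:
    app: postgres
  ports:
  - port: 5432
# StatefulSet pods then resolve individually, e.g.:
#   postgres-0.postgres-headless.<namespace>.svc.cluster.local
```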
🔀
Service Mesh
Layer on top of Kubernetes networking providing advanced features: mutual TLS, observability, traffic management, circuit breaking. Popular options: Istio, Linkerd, Consul Connect. Uses sidecar proxy pattern (Envoy).
  • Mutual TLS (mTLS) between services
  • Traffic shifting, canary deployments
  • Distributed tracing & telemetry
  • Circuit breaker, retry policies

05 / Security
📖 Official Docs: Security Authentication Authorization RBAC Pod Security Standards Secrets Best Practices Encryption at Rest Audit Logging

Kubernetes Security Model

// The 4C's of Cloud Native Security: Code, Container, Cluster, Cloud. Defense in depth — security at every layer.

01
Authentication (AuthN)
Who Are You?
Kubernetes has no User objects. Human users managed externally. Strategies: X.509 client certs, Bearer tokens, OpenID Connect (OIDC), Webhook token auth, Authentication proxy. ServiceAccounts for in-cluster processes with auto-mounted JWT tokens.
02
Authorization (AuthZ)
What Can You Do?
RBAC (Role-Based Access Control): Role + RoleBinding (namespaced), ClusterRole + ClusterRoleBinding (cluster-wide). Verbs: get, list, watch, create, update, patch, delete. Principle of least privilege. ABAC and Webhook modes also available.
03
RBAC
Role-Based Access Control
Fine-grained permissions on API resources. Roles define permissions, RoleBindings assign them to subjects (users, groups, ServiceAccounts). ClusterRoles cover cluster-scoped resources (Nodes, PersistentVolumes) and actions like namespace creation. Aggregate ClusterRoles using labels.
04
Admission Control
Policy Enforcement
Intercepts requests after auth/authz, before persistence. Mutating admission webhooks (modify), Validating admission webhooks (allow/deny). Built-in controllers: NamespaceLifecycle, LimitRanger, ResourceQuota, PodSecurity. OPA Gatekeeper, Kyverno for policy-as-code.
05
Secrets Management
Sensitive Data
Kubernetes Secrets store sensitive data: passwords, tokens, keys. Base64 encoded (NOT encrypted by default). Enable encryption at rest with EncryptionConfiguration. Use external secret stores: HashiCorp Vault, AWS Secrets Manager (ESO, CSI Driver).
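A sketch of the EncryptionConfiguration mentioned above, passed to kube-apiserver via `--encryption-provider-config` (the key material is a placeholder you must generate):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:                # first provider is used for writes
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}           # fallback for reading not-yet-encrypted data
```

After enabling it, re-write existing Secrets (e.g. `kubectl get secrets -A -o json | kubectl replace -f -`) so they are stored encrypted.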
06
Pod Security
Container Isolation
Pod Security Standards (PSS): Privileged, Baseline, Restricted profiles. Pod Security Admission (PSA) enforces them per namespace. Security context: runAsNonRoot, runAsUser, fsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, seccompProfile, capabilities.
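Pod Security Admission is configured entirely through namespace labels — a sketch enforcing the `restricted` profile:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject violating pods
    pod-security.kubernetes.io/warn: restricted      # warn the client
    pod-security.kubernetes.io/audit: restricted     # record in audit log
```

Using `warn`/`audit` at a stricter level than `enforce` is a common way to trial a profile before enforcing it.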
07
Network Policies
Zero-Trust Networking
Default-deny all ingress and egress, then explicitly allow required traffic. Namespace isolation: deny all cross-namespace traffic by default. Label-based selectors for fine-grained control. Requires CNI support (Calico, Cilium, Weave).
08
Image Security
Supply Chain
Use specific image tags or SHA digests (never :latest in production). Scan images with Trivy, Snyk, Grype. Sign images with Cosign (Sigstore). ImagePolicyWebhook to enforce image admission policies. Private registries with imagePullSecrets.
09
Audit Logging
Compliance & Forensics
Records chronological actions taken in the cluster. Audit policy defines: None, Metadata, Request, RequestResponse levels per resource/verb. Backend: log files, webhooks. Essential for compliance (SOC2, PCI-DSS, HIPAA), incident response, and forensics.
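A minimal audit Policy sketch, referenced by kube-apiserver via `--audit-policy-file` — full payloads for Secret mutations, metadata only for everything else:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: RequestResponse          # capture full request + response bodies
  resources:
  - group: ""                     # core API group
    resources: ["secrets"]
  verbs: ["create", "update", "patch", "delete"]
- level: Metadata                 # who / what / when for all other requests
  omitStages: ["RequestReceived"]
```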

06 / Storage
📖 Official Docs: Storage Volumes Persistent Volumes Storage Classes Volume Snapshots Dynamic Provisioning Ephemeral Volumes

Storage Resources

// Kubernetes abstracts storage from compute. Volumes attach to pods, PersistentVolumes are cluster-level resources, PersistentVolumeClaims are user requests for storage.

💾
Volume
Pod-Scoped Storage
Exists as long as the Pod exists. Types: emptyDir (ephemeral scratch space), hostPath (node filesystem, use sparingly), configMap/secret (mount configs/secrets), projected (multiple sources), downwardAPI (expose pod metadata to containers).
emptyDir · hostPath · configMap
🗃
PersistentVolume (PV)
Cluster Storage Resource
A piece of storage provisioned by an admin or dynamically. Independent of pod lifecycle. Access modes: ReadWriteOnce (RWO), ReadOnlyMany (ROX), ReadWriteMany (RWX), ReadWriteOncePod (RWOP). Reclaim policies: Retain, Delete (Recycle is deprecated).
RWO · RWX · Retain/Delete
📝
PersistentVolumeClaim (PVC)
Storage Request
Request for storage by a user. Pods reference PVCs to use persistent storage. Kubernetes binds PVCs to matching PVs. Supports dynamic provisioning via StorageClasses. PVC can request specific storage class, access mode, and size.
Binding · Dynamic Provisioning
StorageClass
Dynamic Provisioning
Enables dynamic provisioning of PersistentVolumes. Defines the provisioner (e.g., ebs.csi.aws.com), parameters, and reclaim policy. VolumeBindingMode: Immediate vs WaitForFirstConsumer. Default StorageClass auto-assigned to PVCs that don't specify one.
Dynamic · Provisioner · Parameters
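A sketch of the dynamic-provisioning pair — a StorageClass (AWS EBS CSI driver shown as an example) plus a PVC that requests it:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com              # example CSI provisioner
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # delay binding until pod is scheduled
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```

With WaitForFirstConsumer the volume is created in the same zone as the pod that first uses the claim.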
🔌
CSI Drivers
Container Storage Interface
Standard interface for container orchestrators to expose storage systems to containerized workloads. CSI drivers provided by storage vendors: AWS EBS, GCP Persistent Disk, Azure Disk, Ceph RBD, Longhorn, OpenEBS. Supports snapshots, cloning, resizing.
AWS EBS · Ceph · Longhorn
📸
VolumeSnapshot
Point-in-Time Backup
Create snapshots of PersistentVolumeClaims. VolumeSnapshotClass defines the snapshot driver. VolumeSnapshot creates the snapshot. VolumeSnapshotContent is the actual snapshot resource. Restore by creating PVC from snapshot. Requires CSI driver support.
Backup · Restore · CSI Required
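A snapshot-and-restore sketch (the VolumeSnapshotClass name and PVC names are assumptions; your CSI driver must support snapshots):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed class name
  source:
    persistentVolumeClaimName: data-claim
---
# Restore: a new PVC sourced from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restored
spec:
  dataSource:
    name: data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```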

07 / Configuration
📖 Official Docs: Configuration ConfigMaps Secrets Resource Quotas LimitRange HPA Autoscaling

Configuration Resources

🗂
ConfigMap
Non-Secret Configuration
Store non-confidential data as key-value pairs. Mount as files, environment variables, or command-line arguments. Decouples application config from container images. Max size 1MB. Not encrypted — use Secrets for sensitive data. Immutable ConfigMaps for performance.
Env Vars · Volume Mount · Immutable
🔑
Secret
Sensitive Configuration
Stores sensitive data (passwords, API keys, TLS certs). Types: Opaque, kubernetes.io/tls, kubernetes.io/dockerconfigjson, kubernetes.io/service-account-token. Base64 encoded. Enable EncryptionConfiguration for encryption at rest. Avoid committing to git.
TLS · Opaque · Encryption at Rest
📊
ResourceQuota
Namespace Resource Limits
Limits total resource consumption in a namespace. Covers compute (CPU, memory), storage (PVC count, storage size), object count (pods, services, secrets). LimitRange sets default requests/limits per container/pod. Prevents noisy-neighbor problem.
CPU/Memory · Object Count
🏢
Namespace
Virtual Cluster Isolation
Provides a mechanism for isolating groups of resources. Names must be unique within a namespace, but may repeat across namespaces. Not all objects are namespaced (Nodes, PVs, StorageClasses, ClusterRoles). Default namespaces: default, kube-system, kube-public, kube-node-lease.
Isolation · Multi-tenancy
📈
HPA
Horizontal Pod Autoscaler
Automatically scales number of pod replicas based on observed metrics. Default: CPU/memory via Metrics Server. Custom metrics via Prometheus Adapter. External metrics from cloud providers. ScaleDown stabilization window prevents flapping.
CPU · Custom Metrics · KEDA
📦
VPA
Vertical Pod Autoscaler
Automatically adjusts CPU and memory requests/limits for containers. Modes: Off (recommend only), Initial (set on pod creation), Auto (update running pods). Analyzes historical usage patterns. Cannot be used with HPA on same metric simultaneously.
Rightsizing · Recommendations
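A VPA sketch in recommend-only mode (assumes the VPA components are installed in the cluster; the target Deployment name is an example):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"            # recommend only — inspect before enabling Auto
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```

Read the recommendations with `kubectl describe vpa web-app-vpa` before switching updateMode to Initial or Auto.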

08 / YAML Examples
📖 Official Docs: Deployment YAML StatefulSet YAML Ingress YAML API Reference

Example Manifests

// Production-ready YAML configurations for all major Kubernetes resources. Copy, adapt, deploy.

Web Application Deployment Deployment
Production-ready Deployment with resource limits, probes, security context, and rolling update strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
    version: v1.2.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
      - name: web-app
        image: myapp:1.2.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        envFrom:
        - configMapRef:
            name: web-app-config
        - secretRef:
            name: web-app-secrets
PostgreSQL StatefulSet StatefulSet
Stateful database deployment with persistent volume per replica and stable network identity.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
Service & Ingress Service / Ingress
ClusterIP service with NGINX Ingress, TLS termination, and path-based routing.
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
  namespace: production
spec:
  selector:
    app: web-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app-svc
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-svc
            port:
              number: 8080
Network Policy (Zero Trust) NetworkPolicy
Deny all ingress/egress by default, then selectively allow required traffic between namespaces.
# Deny all ingress and egress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow web-app to receive traffic from ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
---
# Allow egress to database namespace only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
    ports:
    - protocol: TCP
      port: 5432
  - to: # Allow DNS
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
RBAC — Role & Binding RBAC
ServiceAccount, Role with least-privilege permissions, and RoleBinding for a microservice.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123:role/web-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["web-app-secrets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io
HPA + PDB Autoscaling
Horizontal Pod Autoscaler with CPU/memory targets and PodDisruptionBudget for HA during updates.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
CronJob — Scheduled Backup CronJob
Scheduled job that runs a database backup every night at 2 AM with retry policy.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
  namespace: production
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      backoffLimit: 3
      activeDeadlineSeconds: 3600
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            # assumes an image providing both pg_dump and the aws CLI —
            # postgres:15-alpine alone does not include aws; use a custom image
            image: postgres:15-alpine
            command:
            - /bin/sh
            - -c
            - pg_dump -h postgres-0.postgres-headless -U postgres mydb | gzip | aws s3 cp - s3://backups/$(date +%Y%m%d).sql.gz
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
DaemonSet — Log Collector DaemonSet
Fluent Bit DaemonSet running on every node to collect and forward container logs to Elasticsearch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      tolerations: # Run on control-plane too
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
ConfigMap & Secret Config
Application configuration and secrets with multiple data entries and multi-line values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-app-config
  namespace: production
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
  app.properties: |
    server.port=8080
    cache.ttl=300
    feature.flags=auth,metrics
immutable: false
---
apiVersion: v1
kind: Secret
metadata:
  name: web-app-secrets
  namespace: production
  annotations:
    reloader.stakater.com/match: "true"
type: Opaque
stringData: # plain text (auto base64)
  DATABASE_URL: postgresql://user:pass@postgres:5432/db
  API_KEY: supersecretapikey123
  JWT_SECRET: myverysecretjwtkey
ResourceQuota & LimitRange Quota
Namespace-level resource limits with per-container defaults using LimitRange.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "100"
    services: "20"
    services.loadbalancers: "2"
    secrets: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 4Gi
    min:
      cpu: 10m
      memory: 32Mi

09 / Scheduling
📖 Official Docs: Scheduling & Eviction kube-scheduler Assign Pod to Node Topology Spread Priority & Preemption Taints & Tolerations Resource Bin Packing

Advanced Scheduling

// The kube-scheduler places Pods onto nodes through a filtering + scoring pipeline. Advanced controls let you influence exactly where and how workloads run.

🏆
PriorityClass
Pod Priority & Preemption
Assign integer priority values to Pods. Higher-priority Pods can preempt (evict) lower-priority ones when cluster resources are tight. System-node-critical and system-cluster-critical are built-in high-priority classes. PreemptionPolicy controls whether preemption is allowed.
Preemption · QoS · System Critical
🌐
Topology Spread Constraints
Even Distribution
Spread Pods evenly across failure domains: zones, regions, nodes, custom topologies. Controls maxSkew (max allowed imbalance), topologyKey (the node label to spread across), and whenUnsatisfiable (DoNotSchedule or ScheduleAnyway). Often a better fit than podAntiAffinity for even spreading.
Zone Spread · HA · maxSkew
🎛
Scheduler Profiles & Plugins
Custom Scheduling Logic
The scheduler framework exposes extension points: PreFilter, Filter, PostFilter, PreScore, Score, Reserve, Permit, PreBind, Bind. Multiple scheduler profiles can run in one binary. Deploy custom schedulers or use scheduler-plugins project for advanced features like coscheduling and capacity scheduling.
Extension Points · Plugins
Pod Overhead
Runtime Overhead Accounting
Accounts for resources consumed by the Pod sandbox (e.g. kata containers VM overhead) in addition to container resource requests/limits. Defined in RuntimeClass. Included in scheduling decisions, quota accounting, and kubelet cgroup management.
RuntimeClass · Sandbox Overhead
🔒
Node Selector & Node Name
Simple Node Targeting
nodeSelector: simplest form of node constraint — map of label key-values that must match. nodeName: bypasses the scheduler entirely, directly binds Pod to a specific node by name. Both are less flexible than nodeAffinity but simpler to reason about.
nodeSelector · nodeName
📦
RuntimeClass
Container Runtime Selection
Select different container runtimes (and runtime configurations) per Pod. Use cases: stronger isolation with gVisor or Kata Containers for untrusted workloads, while standard workloads use containerd/runc. Specify in pod spec via runtimeClassName field.
gVisor · Kata · Isolation
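A RuntimeClass sketch for gVisor, assuming the `runsc` handler is already configured in containerd on the nodes:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                 # CRI handler name configured on the nodes
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor     # run sandboxed via gVisor
  containers:
  - name: app
    image: myapp:1.0           # placeholder image
```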
PriorityClass & Topology Spread Scheduling
PriorityClass definition and a Deployment using topology spread constraints across availability zones.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Critical production workloads"
---
# Deployment with topology spread + priority
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      priorityClassName: high-priority
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: critical-app
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: critical-app
      containers:
      - name: app
        image: myapp:latest

10 / Extending Kubernetes
📖 Official Docs: Extending Kubernetes Custom Resources Operator Pattern Admission Controllers Webhook Admission API Aggregation Gateway API Garbage Collection

Extending the Platform

// Kubernetes is designed to be extensible. Add new resource types, custom controllers, admission logic, and API endpoints without modifying core code.

📐
CRD
Custom Resource Definitions
Extend the Kubernetes API with your own resource types. Define schema with OpenAPI v3 validation. CRDs become first-class API objects: storable in etcd, accessible via kubectl, RBAC-protected. The foundation for Operators. Versions, conversion webhooks, and status subresources supported.
Custom APIOpenAPI SchemaOperators
🤖
Operators
Controller Pattern
Operators extend Kubernetes to manage stateful applications using domain-specific knowledge. Built on CRDs + custom controllers. Implement the reconcile loop: observe current state → compare to desired state → act. Build them with frameworks such as Operator SDK or Kubebuilder. Examples: Prometheus Operator, Strimzi (Kafka), CloudNativePG.
Reconcile LoopkubebuilderOperator SDK
🔀
Admission Webhooks
Mutating & Validating
Mutating: Modify objects before persistence (inject sidecars, set defaults, add labels). Validating: Allow or reject requests after mutation (enforce policies). Called via HTTPS webhook. failurePolicy: Fail (fail closed — safe) or Ignore (fail open). Tools: OPA Gatekeeper, Kyverno, custom webhooks.
OPA GatekeeperKyvernoSidecar Inject
🔌
API Aggregation Layer
Extension API Servers
Register additional API servers that serve under /apis/<group>/<version>. The core API server proxies matching requests to your extension server. More powerful than CRDs: custom storage backends, non-standard REST semantics. Used by metrics-server (metrics.k8s.io API) and the now-archived Service Catalog.
APIServiceCustom Backend
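Registration happens through an APIService object — roughly what metrics-server installs (service name and namespace follow the upstream manifests):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  groupPriorityMinimum: 100
  versionPriority: 100
  service:                      # API server proxies /apis/metrics.k8s.io/v1beta1 here
    name: metrics-server
    namespace: kube-system
  insecureSkipTLSVerify: true   # upstream default; prefer a caBundle in production
```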
🗑
Finalizers & Owner References
Garbage Collection
Finalizers: keys on objects that prevent deletion until external cleanup is done. Controller removes the finalizer after cleanup. Owner References: parent-child relationships. When parent is deleted, children are garbage collected (cascade). Foreground vs Background deletion propagation policies.
Cascade DeleteCleanup Hooks
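Both mechanisms live in object metadata. A sketch — the finalizer key is a placeholder and the owner UID must match the live parent object's UID:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: child-config
  finalizers:
  - mycompany.io/cleanup        # deletion blocks until a controller removes this key
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: parent-app
    uid: 1234abcd-...           # placeholder — copy from the parent's metadata.uid
    controller: true
    blockOwnerDeletion: true
```

Deleting `parent-app` garbage-collects this ConfigMap; `kubectl delete --cascade=foreground` forces foreground propagation.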
🌐
Gateway API
Next-Gen Ingress
Successor to Ingress — more expressive, role-oriented, extensible. Resources: GatewayClass, Gateway, HTTPRoute, GRPCRoute, TCPRoute, TLSRoute. Separates infrastructure (Gateway) from application routing (Routes). Supported by Cilium, Istio, Envoy Gateway, NGINX, Traefik.
HTTPRouteGRPCRouteRole-Oriented
📋
EndpointSlices
Scalable Endpoints
Replacement for Endpoints objects — more scalable for large clusters. Each slice holds up to 100 endpoints by default. Conditions: Ready, Serving, Terminating. Supports IPv4, IPv6, FQDN. Required for topology-aware routing and traffic policies (Local, Cluster). Automatically managed by endpoint-slice controller.
ScalableIPv4/IPv6Topology Aware
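What the endpoint-slice controller writes for a Service — a hand-written sketch for illustration (slices are normally managed automatically, never created by hand):

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: web-app-abc12
  labels:
    kubernetes.io/service-name: web-app   # ties the slice to its Service
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 8080
endpoints:
- addresses: ["10.244.1.17"]
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: worker-01
  zone: us-east-1a                        # enables topology-aware routing
```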
🎭
Lease Objects
Leader Election & Heartbeats
Lightweight objects in the coordination.k8s.io API group. Used for: node heartbeats (kubelet updates Lease every 10s, reduces etcd load), leader election in controllers (only one replica acts as leader), and distributed locking in custom controllers.
Leader ElectionHeartbeat
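A node-heartbeat Lease as the kubelet maintains it (node name and timestamp are illustrative):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: worker-01
  namespace: kube-node-lease        # one Lease per node lives here
spec:
  holderIdentity: worker-01
  leaseDurationSeconds: 40          # node treated as unhealthy once this lapses
  renewTime: "2024-01-01T12:00:00.000000Z"
```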
Custom Resource Definition CRD
A CRD defining a custom "Database" resource with OpenAPI v3 schema validation and status subresource.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io
spec:
  group: mycompany.io
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine", "version"]
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 5
              storageGB:
                type: integer
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Phase
      type: string
      jsonPath: .status.phase
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
---
# Custom Resource instance
apiVersion: mycompany.io/v1
kind: Database
metadata:
  name: my-postgres
  namespace: production
spec:
  engine: postgres
  version: "15"
  replicas: 3
  storageGB: 50
Validating Admission Webhook Webhook
Register a validating webhook that enforces policy — e.g. all Deployments must have resource limits set.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: resource-limits-enforcer
webhooks:
- name: check-limits.mycompany.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  namespaceSelector:
    matchLabels:
      admission-webhook: enabled
  clientConfig:
    service:
      name: webhook-service
      namespace: webhook-system
      path: /validate
      port: 443
    caBundle: LS0t... # base64 CA cert
  timeoutSeconds: 5
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector
webhooks:
- name: inject.istio.io
  admissionReviewVersions: ["v1"]
  sideEffects: None
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      name: istiod
      namespace: istio-system
      path: /inject
Gateway API — HTTPRoute Gateway API
Next-gen Ingress using Gateway API with canary traffic splitting between stable and canary versions.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: production
spec:
  gatewayClassName: cilium
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      certificateRefs:
      - name: app-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-app-route
  namespace: production
spec:
  parentRefs:
  - name: prod-gateway
  hostnames: ["app.example.com"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: web-app-stable
      port: 80
      weight: 90
    - name: web-app-canary
      port: 80
      weight: 10  # 10% canary traffic

11 / Cluster Administration
📖 Official Docs: Cluster Administration kubeadm Drain Nodes etcd Backup TLS Certificates Container Images Lifecycle Hooks Ephemeral Containers

Cluster Administration

// Day-2 operations: bootstrapping, node lifecycle management, upgrades, etcd backup, multi-tenancy, and cluster-level policies.

🛠
kubeadm
Cluster Bootstrap Tool
Official tool to bootstrap a production-grade Kubernetes cluster. Commands: kubeadm init (control plane), kubeadm join (worker nodes), kubeadm upgrade (upgrade cluster), kubeadm reset (tear down), kubeadm token (manage bootstrap tokens), kubeadm certs (certificate management).
kubeadm initkubeadm joinUpgrade
🔄
Node Lifecycle
Cordon, Drain & Delete
cordon: Mark node as unschedulable (no new pods). drain: Evict all pods (respects PDBs), then cordon — use before maintenance. uncordon: Restore scheduling. Node conditions: Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable.
cordondrainMaintenance
💾
etcd Backup & Restore
Disaster Recovery
etcd is the source of truth — back it up regularly. Use etcdctl snapshot save to create snapshots and etcdutl snapshot restore (etcdctl in releases before 3.5) to restore into a new data directory. Back up before every cluster upgrade. Store snapshots off-cluster (S3, GCS). Velero for application-level backups including PV data.
etcdctlsnapshotVelero
🔐
Certificate Management
PKI & TLS
Kubernetes uses PKI certificates for all internal communication. Certs: ca.crt, apiserver.crt, apiserver-kubelet-client.crt, front-proxy-ca.crt, etcd/ca.crt. Default 1-year expiry. Use kubeadm certs renew to rotate. cert-manager automates issuance from Let's Encrypt, Vault, or internal CA.
cert-managerLet's EncryptPKI
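With cert-manager installed, issuance is declarative. A sketch assuming a ClusterIssuer named `letsencrypt-prod` already exists:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls            # cert-manager writes the signed keypair here
  dnsNames:
  - app.example.com
  duration: 2160h                # 90-day certificate lifetime
  renewBefore: 360h              # renew 15 days before expiry
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```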
🏢
Multi-Tenancy
Cluster Sharing Patterns
Soft multi-tenancy: namespaces + RBAC + NetworkPolicy + ResourceQuota. Hard multi-tenancy: separate clusters per tenant (vcluster for virtual clusters). Hierarchical namespaces (HNC) for team org structures. Node selectors/taints for dedicated node pools per team. Capsule, Loft for enterprise multi-tenancy.
vclusterCapsuleHNC
🌍
Cluster Upgrades
Version Management
Upgrade the control plane first, then worker nodes. Kubernetes skew policy: kubelet may be up to three minor versions older than kube-apiserver (two before v1.28). Rolling node upgrades: drain → upgrade → uncordon. Managed services (EKS, GKE, AKS) handle upgrades via their APIs. Always step through minor versions sequentially — never skip one.
Skew PolicyRollingMinor-by-Minor
🌐
Windows Nodes
Heterogeneous Clusters
Kubernetes supports Windows Server worker nodes alongside Linux nodes. Windows pods run Windows containers only (no Linux containers on Windows nodes). Use node selectors (kubernetes.io/os: windows) to target Windows workloads. Limitations: no hostNetwork, no Linux-style privileged containers (HostProcess containers fill that role), no Linux capabilities.
Windows ServerMixed OS
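Targeting Windows nodes combines the well-known OS label with a matching toleration — the `os=windows:NoSchedule` taint shown is a common convention, not something Kubernetes applies automatically:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: win-app
spec:
  nodeSelector:
    kubernetes.io/os: windows    # well-known label set by the kubelet
  tolerations:
  - key: os
    operator: Equal
    value: windows
    effect: NoSchedule
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
```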
📦
Image Management
Registry & Pull Policy
imagePullPolicy: Always (re-pull on every start), Never (local cache only), IfNotPresent (pull only if not cached — the default, except that :latest or untagged images default to Always). Use image digests (sha256:...) instead of tags for reproducibility. imagePullSecrets for private registries — create a docker-registry secret. Image garbage collection via kubelet: imageGCHighThresholdPercent / Low.
imagePullSecretsDigest PinningGC
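The three knobs together — digest pinning, an explicit pull policy, and a private-registry secret. The `regcred` secret name and the truncated digest are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-image
spec:
  imagePullSecrets:
  - name: regcred                # kubectl create secret docker-registry regcred ...
  containers:
  - name: app
    imagePullPolicy: IfNotPresent
    image: registry.mycompany.io/app@sha256:4bcf...   # digest, immune to tag re-pushes
```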
Container Lifecycle Hooks
postStart & preStop
postStart: Executes immediately after container starts (async — not guaranteed to run before entrypoint). preStop: Called before container is terminated — use for graceful shutdown (drain connections, flush buffers). Pair with terminationGracePeriodSeconds for zero-downtime deploys.
postStartpreStopGraceful Shutdown
🔬
Ephemeral Containers
Live Debugging
Temporary containers added to a running Pod for debugging purposes. Cannot be removed once added. Share the Pod's namespaces (network, PID). Use kubectl debug to inject a debug container (e.g. busybox, netshoot) into a Pod whose images are distroless or whose containers are crash-looping, for live diagnosis.
kubectl debugDistrolessLive Debug
Container Lifecycle Hooks Pod Spec
postStart and preStop hooks for graceful startup and zero-downtime shutdown.
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: myapp:1.0
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - -c
          - echo "Container started" >> /var/log/lifecycle.log
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
              # Graceful shutdown: stop accepting new connections,
              # wait for in-flight requests to complete
              kill -SIGTERM 1
              sleep 30
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
Pod Security Standards PSA / Security
Namespace-level Pod Security Admission enforcement using labels and a restricted pod example.
# Label namespace to enforce security standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Enforce: reject violating pods
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.35
    # Audit: log violations
    pod-security.kubernetes.io/audit: restricted
    # Warn: show warnings
    pod-security.kubernetes.io/warn: restricted
---
# Restricted-compliant pod (all security requirements met)
apiVersion: v1
kind: Pod
metadata:
  name: restricted-pod
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]

12 / Observability
📖 Official Docs: Logging Architecture Debugging System Metrics Cluster Troubleshooting Application Debugging Metrics Reference

Monitoring, Logging & Tracing

// The three pillars of observability in Kubernetes: metrics for dashboards & alerting, logs for debugging, traces for distributed request flows.

📊
Metrics Architecture
Layered metrics system — core metrics for autoscaling, full metrics for monitoring.
  • Metrics Server → HPA / VPA (CPU, memory)
  • Prometheus → custom metrics + alerting
  • kube-state-metrics → object state metrics
  • node-exporter → OS/hardware metrics
  • Grafana → dashboards (kube-prometheus-stack)
  • PrometheusRule CRD for alerting rules
📋
Logging Architecture
Kubernetes doesn't provide native log storage — logs flow to external systems.
  • Node-level: DaemonSet log agents (Fluent Bit)
  • Sidecar: dedicated log container per pod
  • kubectl logs: direct container log access
  • Backends: Elasticsearch, Loki, CloudWatch
  • Structured logging (JSON) for queryability
  • Log rotation: containerLogMaxSize / MaxFiles
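A node-level agent in DaemonSet form — a minimal Fluent Bit sketch (image tag and mount layout vary by setup; the official Helm chart is the usual install path):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels: {app: fluent-bit}
  template:
    metadata:
      labels: {app: fluent-bit}
    spec:
      tolerations:
      - operator: Exists             # run on every node, tainted or not
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:3.0
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true             # tail container logs from the host
      volumes:
      - name: varlog
        hostPath: {path: /var/log}
```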
🔍
Distributed Tracing
Track requests across multiple microservices using trace context propagation.
  • OpenTelemetry (OTel) — CNCF standard SDK
  • Jaeger / Zipkin — trace storage & UI
  • W3C Trace Context propagation headers
  • Automatic injection via service mesh (Istio)
  • OTel Collector DaemonSet / sidecar pattern
🚨
Alerting
Proactive notification when cluster or application health degrades.
  • Alertmanager — route, deduplicate, silence alerts
  • PrometheusRule CRDs for alert definitions
  • Receivers: Slack, PagerDuty, email, webhooks
  • Inhibition rules to suppress downstream alerts
  • kube-prometheus-stack bundles everything
🩺
Cluster Health Events
Kubernetes Events provide real-time cluster activity and troubleshooting info.
  • kubectl get events -n <ns> --sort-by=.lastTimestamp
  • Types: Normal, Warning
  • Events expire after 1 hour by default
  • Event exporter → persist to Elasticsearch/BigQuery
  • KubeWatch, Robusta for event-driven alerting
📈
Resource Monitoring
Key metrics to monitor for cluster and workload health.
  • Node: CPU/mem utilization, disk I/O, network
  • Pod: container restarts, OOMKilled events
  • API server: request latency, error rate (4xx/5xx)
  • etcd: DB size, leader elections, write latency
  • Scheduler: scheduling latency, pending pods
ServiceMonitor (Prometheus Operator) Monitoring
Prometheus Operator ServiceMonitor to automatically scrape metrics from a Service's /metrics endpoint.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  namespace: production
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: web-app
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - production
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-app-alerts
  namespace: production
spec:
  groups:
  - name: web-app.rules
    rules:
    - alert: HighErrorRate
      expr: |
        rate(http_requests_total{status=~"5.."}[5m])
        / rate(http_requests_total[5m]) > 0.05
      for: 2m
      labels:
        severity: critical
      annotations:
        summary: High HTTP error rate on web-app
    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
      for: 5m
      labels:
        severity: warning
OpenTelemetry Collector Tracing
OTel Collector as a DaemonSet receiving traces/metrics and exporting to Jaeger and Prometheus.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: observability
spec:
  mode: DaemonSet
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
      memory_limiter:
        limit_mib: 400
    exporters:
      jaeger:
        endpoint: jaeger-collector:14250
        tls:
          insecure: true
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [prometheusremotewrite]

13 / kubectl Reference
📖 Official Docs: kubectl Reference kubectl Cheat Sheet kubectl Conventions JSONPath in kubectl Install kubectl

kubectl CLI Reference

// The primary command-line tool for interacting with Kubernetes clusters. Essential commands, flags, and patterns for daily operations.

📋
Get & Describe
Inspect Resources
Core commands for viewing cluster state. Use -o wide, -o yaml, -o json for different output formats. --watch (-w) for live updates. --all-namespaces (-A) across all namespaces. -l for label selector filtering. --field-selector for field-based filtering.
getdescribe-o yaml
⚙️
Apply & Create
Manage Resources
kubectl apply -f: declarative resource management (preferred). kubectl create: imperative creation. kubectl delete: remove resources. kubectl replace: full resource replacement. kubectl patch: partial update (strategic merge, JSON merge, JSON patch). kubectl edit: open in $EDITOR.
applypatchdelete
🐛
Debug & Exec
Troubleshooting
kubectl exec: run commands in containers. kubectl logs: view container logs (-f to follow, --previous for crashed containers). kubectl debug: ephemeral debug containers. kubectl port-forward: local port to pod/service. kubectl cp: copy files to/from containers. kubectl top: resource usage.
execlogsdebug
🌐
Context & Config
kubeconfig Management
kubectl config view: show kubeconfig. config use-context: switch active cluster. config get-contexts: list contexts. config set-context: modify context. KUBECONFIG env var or --kubeconfig flag for custom config paths. kubectx / kubens tools for fast switching. k9s for TUI interface.
contextkubectxk9s
🔄
Rollout Management
Deployment Operations
kubectl rollout status: watch rollout progress. rollout history: view revision history. rollout undo: rollback (--to-revision=N for specific). rollout pause / resume: canary-style pausing. rollout restart: rolling restart of all pods (triggers new ReplicaSet). scale: change replica count.
rollout undorollout restart
🏷
Label & Annotate
Metadata Operations
kubectl label: add/modify/remove labels on resources. kubectl annotate: the same for annotations. Append - to a key to remove it (kubectl label pod foo env-). --overwrite to replace existing values. Label nodes for affinity, cordon, and drain workflows. JSONPath and Go template output formats.
labelannotateJSONPath
kubectl Cheat Sheet CLI Reference
Most-used kubectl commands for daily cluster operations, debugging, and administration.
# ── CONTEXT & CLUSTER ────────────────────────────────────────
kubectl config get-contexts                    # list all contexts
kubectl config use-context my-cluster          # switch context
kubectl config set-context --current --namespace=prod  # set default ns
kubectl cluster-info                           # cluster endpoints
kubectl api-resources                          # all resource types
kubectl api-versions                           # all API versions

# ── GET / INSPECT ─────────────────────────────────────────────
kubectl get pods -A -o wide                    # all pods, all namespaces
kubectl get pod my-pod -o yaml                 # full pod spec
kubectl describe pod my-pod                    # events + status detail
kubectl get events --sort-by=.lastTimestamp    # sorted events
kubectl get all -n production                  # all resources in ns
kubectl top nodes                              # node resource usage
kubectl top pods --containers                  # container-level usage

# ── APPLY / MANAGE ────────────────────────────────────────────
kubectl apply -f manifest.yaml                 # declarative apply
kubectl apply -f ./k8s/                        # apply whole directory
kubectl delete -f manifest.yaml                # delete from file
kubectl patch deploy my-app -p '{"spec":{"replicas":5}}'
kubectl scale deploy my-app --replicas=5
kubectl set image deploy/my-app app=myapp:2.0  # update image
kubectl label node node1 disktype=ssd          # label a node

# ── ROLLOUTS ─────────────────────────────────────────────────
kubectl rollout status deploy/my-app
kubectl rollout history deploy/my-app
kubectl rollout undo deploy/my-app
kubectl rollout undo deploy/my-app --to-revision=3
kubectl rollout restart deploy/my-app          # rolling restart
kubectl rollout pause deploy/my-app            # pause rollout

# ── DEBUG / TROUBLESHOOT ──────────────────────────────────────
kubectl logs my-pod -c my-container -f         # follow logs
kubectl logs my-pod --previous                 # crashed container logs
kubectl exec -it my-pod -- /bin/sh             # interactive shell
kubectl exec my-pod -- env                     # list env vars
kubectl debug my-pod -it --image=busybox       # ephemeral debug container
kubectl port-forward svc/my-svc 8080:80        # local port forward
kubectl cp my-pod:/app/logs.txt ./logs.txt     # copy from pod

# ── NODE MANAGEMENT ───────────────────────────────────────────
kubectl cordon node1                           # mark unschedulable
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon node1                         # re-enable scheduling
kubectl taint nodes node1 key=val:NoSchedule   # add taint
kubectl taint nodes node1 key=val:NoSchedule-  # remove taint

# ── GENERATING YAML ──────────────────────────────────────────
kubectl create deploy my-app --image=nginx --dry-run=client -o yaml
kubectl create svc clusterip my-svc --tcp=80:8080 --dry-run=client -o yaml
kubectl create secret generic my-secret --from-literal=key=val --dry-run=client -o yaml

# ── USEFUL OUTPUT FORMATS ────────────────────────────────────
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[-1].type'
kubectl get pods --sort-by='.status.startTime'

14 / Additional YAML Examples
📖 Official Docs: Policies etcd Backup Pod Security Admission Volume Snapshots

Policies, Multi-tenancy & Operations

// Additional production patterns: Kyverno policies, etcd backup jobs, namespace setup, and node management.

Kyverno Policy Policy-as-Code
Kyverno ClusterPolicy to enforce image registry restrictions and require resource limits on all pods.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-registry-and-limits
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: restrict-image-registries
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "Only images from approved registries allowed"
      pattern:
        spec:
          containers:
          - image: "registry.mycompany.io/* | gcr.io/*"
  - name: require-resource-limits
    match:
      any:
      - resources:
          kinds: ["Pod"]
    validate:
      message: "CPU and memory limits are required"
      pattern:
        spec:
          containers:
          - resources:
              limits:
                cpu: "?*"
                memory: "?*"
  - name: add-default-labels  # mutate rule
    match:
      any:
      - resources:
          kinds: ["Deployment"]
    mutate:
      patchStrategicMerge:
        metadata:
          labels:
            managed-by: kyverno
etcd Backup CronJob Cluster Admin
Automated etcd snapshot backup every 6 hours, uploaded to S3 using etcdctl.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */6 * * *"  # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          containers:
          - name: etcd-backup
            image: bitnami/etcd:3.5  # note: the aws CLI used below is not in this image — bake it in or swap in your own backup image
            command:
            - /bin/sh
            - -c
            - |
                BACKUP_FILE="/tmp/etcd-$(date +%Y%m%d-%H%M%S).db"
                ETCDCTL_API=3 etcdctl snapshot save $BACKUP_FILE \
                  --endpoints=https://127.0.0.1:2379 \
                  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
                  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
                aws s3 cp $BACKUP_FILE s3://my-etcd-backups/
                echo "Backup complete: $BACKUP_FILE"
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
Namespace + Full Tenant Setup Multi-Tenancy
Complete namespace setup for a team: namespace, RBAC, ResourceQuota, LimitRange, and NetworkPolicy isolation.
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    pod-security.kubernetes.io/enforce: baseline
---
# 2. Team RBAC - developers get edit rights
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-alpha-developers
  namespace: team-alpha
subjects:
- kind: Group
  name: team-alpha-devs
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
---
# 3. Quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
---
# 4. Namespace isolation network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: alpha  # only same-team ns
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
VolumeSnapshot & Restore Storage
Create a VolumeSnapshot from a PVC, then restore it by creating a new PVC from that snapshot.
# VolumeSnapshotClass (CSI driver dependent)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Delete
---
# Take a snapshot of existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-20240101
  namespace: database
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: data-postgres-0
---
# Restore: create new PVC from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: database
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: postgres-snapshot-20240101
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

15 / Cluster Setup
📖 Official Docs: Getting Started / Setup kubeadm Install Create Cluster (kubeadm) Install Tools Container Runtimes

Setting Up Kubernetes

// Multiple ways to run Kubernetes depending on your use case — local development, bare metal, or cloud-managed. Choose the right tool for the right environment.

Tool Best For Complexity Notes
k3d Local dev, CI/CD testing LOW k3s in Docker containers — fastest spin-up (<30s)
kind Local dev, e2e testing LOW Kubernetes IN Docker — used by k8s upstream CI
minikube Local dev, learning LOW Single-node, many drivers (Docker, VM, Podman)
k3s Edge, IoT, bare-metal, RPi MEDIUM Lightweight k8s — 40MB binary, SQLite or etcd
kubeadm Self-managed production clusters HIGH Official bootstrap tool — full control, manual upgrades
RKE2 Enterprise, FIPS-compliant MEDIUM Rancher's hardened k8s distribution
EKS / GKE / AKS Cloud-managed, teams LOW Managed control plane — pay for worker nodes only

k3d — Local Development

// k3d wraps k3s (a lightweight Kubernetes) inside Docker containers. Create full multi-node clusters on your laptop in seconds.

k3d Overview
k3s in Docker
k3d uses Docker to run k3s nodes as containers. Each "node" is a Docker container — control plane and workers alike. Supports multi-node clusters, LoadBalancer Services via k3s's built-in ServiceLB (Klipper), Ingress via bundled Traefik, persistent volumes via local-path-provisioner. Perfect for local development and CI pipelines.
Docker RequiredMulti-NodeTraefik LB
📦
k3d Install
Prerequisites & Installation
Requires: Docker Desktop or Docker Engine, kubectl. Install k3d via curl script, Homebrew (macOS), or Chocolatey (Windows). Very small binary (~15MB). Works on Linux, macOS, Windows (WSL2). No VM required — pure Docker networking.
curl | bashbrewchoco
🔧
k3d Config File
Declarative Cluster Config
k3d supports a YAML config file for reproducible cluster creation. Define server count, agent count, port mappings, volume mounts, extra k3s args, image registry mirrors, and environment variables. Commit to git for team-shared dev environments.
Config as CodeReproducible
k3d Setup — Complete Walkthrough k3d
Full k3d installation and cluster creation commands for local development.
# ── INSTALL k3d ──────────────────────────────────────────────
# Linux / macOS
curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash

# macOS via Homebrew
brew install k3d

# Verify installation
k3d version
kubectl version --client

# ── CREATE CLUSTERS ──────────────────────────────────────────
# Simple single-server cluster
k3d cluster create mycluster

# Production-like: 1 server + 3 agents + port mappings
k3d cluster create devcluster \
  --servers 1 \
  --agents 3 \
  --port "80:80@loadbalancer" \
  --port "443:443@loadbalancer" \
  --api-port 6550 \
  --k3s-arg "--disable=traefik@server:0"  # disable built-in Traefik

# With local registry (for custom images without pushing to remote)
k3d registry create myregistry --port 5050
k3d cluster create devcluster \
  --registry-use k3d-myregistry:5050 \
  --agents 2

# ── CLUSTER MANAGEMENT ───────────────────────────────────────
k3d cluster list                    # list all clusters
k3d cluster stop devcluster         # stop cluster (keep state)
k3d cluster start devcluster        # restart cluster
k3d cluster delete devcluster       # delete cluster
k3d node list                       # list all nodes
k3d node add --cluster devcluster   # add a worker node

# ── KUBECONFIG ───────────────────────────────────────────────
# Automatically merged into ~/.kube/config
kubectl config use-context k3d-devcluster
kubectl get nodes

# ── LOAD IMAGES INTO CLUSTER ─────────────────────────────────
# Build locally and import into k3d (no registry push needed)
docker build -t myapp:dev .
k3d image import myapp:dev --cluster devcluster
k3d Config File · k3d YAML
Declarative k3d cluster configuration file — commit to git for reproducible dev environments.
# k3d-config.yaml
apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: dev-cluster

servers: 1
agents: 2

kubeAPI:
  hostPort: "6550"

ports:
- port: 8080:80
  nodeFilters: [loadbalancer]
- port: 8443:443
  nodeFilters: [loadbalancer]

volumes:
- volume: /tmp/k3dvol:/data
  nodeFilters: ["server:*", "agent:*"]

registries:
  use: [k3d-myregistry:5050]
  config: |
    mirrors:
      "docker.io":
        endpoint:
          - "https://mirror.gcr.io"

options:
  k3s:
    extraArgs:
    - arg: --disable=traefik
      nodeFilters: ["server:*"]
    - arg: --cluster-cidr=10.20.0.0/16
      nodeFilters: ["server:*"]

# Create from config file:
# k3d cluster create --config k3d-config.yaml

kubeadm — Production Cluster

// kubeadm is the official Kubernetes cluster bootstrapping tool. Use it to set up production-grade clusters on bare metal, VMs, or cloud instances.

🏗
Prerequisites
Node Requirements
Each node: 2+ CPUs, 2GB+ RAM, unique hostname/MAC/product_uuid, swap disabled, full network connectivity between nodes. Ports open: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet), 10257/10259 (controller/scheduler). Container runtime installed (containerd recommended).
No Swap · Port 6443 · containerd
🌐
HA Control Plane
High Availability Setup
Stacked etcd: etcd runs on same nodes as control plane (simpler, 3+ control-plane nodes). External etcd: separate etcd cluster (more resilient, more nodes). Need a load balancer in front of multiple API servers — HAProxy or cloud LB. VIP or DNS round-robin for API server endpoint.
3 Control Planes · HAProxy · Stacked etcd
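The load-balancer piece above can be sketched as a minimal HAProxy fragment in TCP mode, passing TLS through to three API servers (hostnames and IPs are illustrative):

```
# /etc/haproxy/haproxy.cfg (fragment) — TCP passthrough to the API servers
frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    option tcp-check
    server cp1 192.168.1.10:6443 check fall 3 rise 2
    server cp2 192.168.1.11:6443 check fall 3 rise 2
    server cp3 192.168.1.12:6443 check fall 3 rise 2
```

TCP mode matters here: the API server terminates its own TLS, so the load balancer must not.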
📋
kubeadm Config
ClusterConfiguration
Use a kubeadm config file instead of flags for reproducibility. Defines API server endpoint, pod/service CIDRs, feature gates, etcd config, image repository, encryption provider, audit policy, scheduler/controller-manager extra args.
ClusterConfiguration · InitConfiguration
kubeadm — Full Production Setup · kubeadm
Complete step-by-step kubeadm cluster setup on Ubuntu 22.04 — control plane + worker nodes.
## ═══════════════════════════════════════════════════
##  RUN ON ALL NODES (control-plane + workers)
## ═══════════════════════════════════════════════════

# 1. Disable swap (required by kubelet)
swapoff -a
sed -i '/swap/d' /etc/fstab

# 2. Enable required kernel modules
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# 3. Kernel networking params
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# 4. Install containerd runtime
apt-get install -y containerd
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
# Enable SystemdCgroup (critical!)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
  /etc/containerd/config.toml
systemctl restart containerd && systemctl enable containerd

# 5. Install kubeadm, kubelet, kubectl
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /' \
  | tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet=1.35.0-1.1 kubeadm=1.35.0-1.1 kubectl=1.35.0-1.1
apt-mark hold kubelet kubeadm kubectl  # prevent auto-upgrade

## ═══════════════════════════════════════════════════
##  RUN ON CONTROL-PLANE NODE ONLY
## ═══════════════════════════════════════════════════

# 6. Initialize the cluster
kubeadm init \
  --control-plane-endpoint "k8s-api.example.com:6443" \
  --pod-network-cidr "10.244.0.0/16" \
  --service-cidr "10.96.0.0/12" \
  --upload-certs  # needed for HA: share certs with other control-planes

# 7. Set up kubeconfig for root
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# 8. Install CNI plugin (Calico)
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml

# 9. Verify control plane is ready
kubectl get nodes
kubectl get pods -n kube-system

## ═══════════════════════════════════════════════════
##  RUN ON EACH WORKER NODE
## ═══════════════════════════════════════════════════

# 10. Join worker nodes (token from kubeadm init output)
kubeadm join k8s-api.example.com:6443 \
  --token abc123.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash>

# Regenerate join token if expired (24h TTL)
kubeadm token create --print-join-command
kubeadm Config File (HA) · ClusterConfig
Production kubeadm configuration file for HA cluster with encryption at rest and audit logging.
# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.10
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  kubeletExtraArgs:
  - name: node-labels
    value: "node-role=control-plane"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
clusterName: production
kubernetesVersion: v1.35.0
controlPlaneEndpoint: "k8s-api.example.com:6443"
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
  dnsDomain: cluster.local
etcd:
  local:
    dataDir: /var/lib/etcd
    extraArgs:
    - name: auto-compaction-retention
      value: "8"
    - name: quota-backend-bytes
      value: "8589934592"  # 8Gi
apiServer:
  certSANs:
  - k8s-api.example.com
  - 192.168.1.10
  - 192.168.1.11
  - 127.0.0.1
  extraArgs:
  - name: audit-log-path
    value: /var/log/kubernetes/audit.log
  - name: audit-policy-file
    value: /etc/kubernetes/audit-policy.yaml
  - name: encryption-provider-config
    value: /etc/kubernetes/encryption.yaml
  - name: enable-admission-plugins
    value: NodeRestriction,PodSecurity
  extraVolumes:
  - name: audit-logs
    hostPath: /var/log/kubernetes
    mountPath: /var/log/kubernetes
controllerManager:
  extraArgs:
  - name: bind-address
    value: "0.0.0.0"
scheduler:
  extraArgs:
  - name: bind-address
    value: "0.0.0.0"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
containerLogMaxSize: 100Mi
containerLogMaxFiles: 5
maxPods: 110
kubeReserved:
  cpu: 200m
  memory: 500Mi
systemReserved:
  cpu: 200m
  memory: 500Mi
evictionHard:
  memory.available: "300Mi"
  nodefs.available: "10%"

# Run: kubeadm init --config kubeadm-config.yaml --upload-certs

k3s — Lightweight Kubernetes

// k3s is a CNCF-certified, fully conformant Kubernetes distribution packaged as a single binary. Ideal for edge computing, IoT, Raspberry Pi, and resource-constrained environments.

k3s Setup — Server + Agents · k3s
Full k3s cluster setup with a server (control plane) node and multiple agent (worker) nodes.
# ── INSTALL k3s SERVER (Control Plane) ───────────────────────
# Single command install — runs as systemd service
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san k3s.example.com \
  --disable traefik \
  --disable servicelb \
  --flannel-backend=none \
  --write-kubeconfig-mode 644
# note: flannel is disabled above; install a CNI such as Calico separately

# Get node token (needed for agents to join)
cat /var/lib/rancher/k3s/server/node-token

# Get kubeconfig
cat /etc/rancher/k3s/k3s.yaml

# ── JOIN AGENT NODES ─────────────────────────────────────────
# Run on each worker node
curl -sfL https://get.k3s.io | K3S_URL=https://k3s.example.com:6443 \
  K3S_TOKEN=<node-token> sh -

# ── HA k3s WITH EMBEDDED etcd ────────────────────────────────
# First server (bootstraps etcd)
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --token my-shared-secret

# Additional control plane servers join the cluster
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://first-server:6443 \
  --token my-shared-secret

# ── k3s MANAGEMENT ───────────────────────────────────────────
kubectl get nodes                   # k3s bundles kubectl
systemctl status k3s                # service status
systemctl restart k3s               # restart server
k3s kubectl get pods -A             # alternative kubectl

# ── UNINSTALL ────────────────────────────────────────────────
/usr/local/bin/k3s-uninstall.sh     # server
/usr/local/bin/k3s-agent-uninstall.sh  # agent

16 / Certificates & PKI
📖 Official Docs: Certificates & PKI Managing TLS CSR Reference PKI Best Practices

Kubernetes PKI & Certificate Management

// Kubernetes uses TLS everywhere for secure communication between all components. Understanding the PKI is essential for troubleshooting, rotating certs, and securing clusters.

🏛
Kubernetes PKI
Certificate Authority Hierarchy
kubeadm creates a PKI under /etc/kubernetes/pki/. Three CAs: the Kubernetes cluster CA (signs component certs), the etcd CA (signs etcd server and peer certs), and the front-proxy CA (for API aggregation). Each CA signs specific leaf certificates. Self-signed by default — use your own CA for enterprise.
CA Hierarchy · /etc/kubernetes/pki
📜
Certificate Files
What Gets Created
Cluster CA: ca.crt / ca.key

API Server: apiserver.crt (SANs: hostname, IPs, DNS), apiserver-kubelet-client.crt, apiserver-etcd-client.crt

etcd: etcd/ca.crt, etcd/server.crt, etcd/peer.crt, etcd/healthcheck-client.crt

Front Proxy: front-proxy-ca.crt, front-proxy-client.crt

SA Keys: sa.key / sa.pub (service account token signing)
1 year expiry · Auto-renewed on upgrade
Certificate Expiry
Rotation & Renewal
kubeadm-issued certs expire after 1 year (CA: 10 years). kubeadm upgrade auto-renews certs. Manual renewal: kubeadm certs renew all. Check expiry: kubeadm certs check-expiration. Kubelet client certs auto-rotate when kubelet --rotate-certificates=true (default). Set up monitoring for cert expiry.
1yr Expiry · Auto-Rotate · Monitor Expiry
🔐
CertificateSigningRequest
Kubernetes CSR API
Kubernetes has a built-in CSR API for issuing certificates. Users/services submit CSR objects, admins approve them (kubectl certificate approve), and Kubernetes signs them with the cluster CA. Used for: adding new users, kubelet bootstrap, custom components needing cluster-trusted certs.
CSR API · kubectl certificate
🤖
cert-manager
Automated Certificate Lifecycle
The de-facto certificate controller for Kubernetes. Issues and renews certs from: Let's Encrypt (ACME), HashiCorp Vault, Venafi, self-signed, or cluster CA issuers. Stores certs as Kubernetes Secrets. Automatically renews before expiry. Used for Ingress TLS, mTLS between services, webhook server certs.
Let's Encrypt · Vault · Auto-Renew
🔑
kubeconfig & User Auth
Client Certificate Auth
kubeconfig contains: cluster CA cert, client cert, client key (or token). Create new user cert: generate key → create CSR → submit K8s CSR → approve → download cert → add to kubeconfig. CN becomes username, O becomes group. Bind to RBAC roles using the username/group.
CN=username · O=group · kubeconfig
Certificate Operations — Full Reference · Certificates
All essential certificate management commands: inspection, renewal, rotation, and user creation.
# ── CHECK CERTIFICATE EXPIRY ─────────────────────────────────
kubeadm certs check-expiration

# Manual check with openssl
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A2 "Subject Alternative"

# Check all certs in /etc/kubernetes/pki
for cert in /etc/kubernetes/pki/*.crt; do
  echo "=== $cert ==="
  openssl x509 -in "$cert" -noout -subject -dates 2>/dev/null
done

# ── RENEW ALL CERTIFICATES ───────────────────────────────────
# Renew all control plane certs (run on control-plane node)
kubeadm certs renew all

# Renew specific cert
kubeadm certs renew apiserver
kubeadm certs renew apiserver-kubelet-client
kubeadm certs renew front-proxy-client

# After renewal: restart control plane components
kubectl -n kube-system delete pod -l component=kube-apiserver
kubectl -n kube-system delete pod -l component=kube-controller-manager
kubectl -n kube-system delete pod -l component=kube-scheduler

# Refresh local kubeconfig after renewal (admin.conf is renewed too)
cp /etc/kubernetes/admin.conf ~/.kube/config

# ── CREATE A NEW USER WITH CERT AUTH ─────────────────────────
# Step 1: Generate user private key
openssl genrsa -out alice.key 4096

# Step 2: Create CSR (CN=username, O=group)
openssl req -new -key alice.key \
  -subj "/CN=alice/O=team-alpha" \
  -out alice.csr

# Step 3: Submit as Kubernetes CSR object
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: alice
spec:
  request: $(cat alice.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # 24h
  usages:
  - client auth
EOF

# Step 4: Approve the CSR
kubectl certificate approve alice

# Step 5: Download signed cert
kubectl get csr alice -o jsonpath='{.status.certificate}' | \
  base64 -d > alice.crt

# Step 6: Add to kubeconfig
kubectl config set-credentials alice \
  --client-certificate=alice.crt \
  --client-key=alice.key \
  --embed-certs=true

kubectl config set-context alice-context \
  --cluster=my-cluster \
  --user=alice \
  --namespace=team-alpha
cert-manager — Issuer & Certificate · cert-manager
cert-manager ClusterIssuer with Let's Encrypt + Certificate resource for automatic TLS.
# Install cert-manager
# kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# ── ClusterIssuer: Let's Encrypt Production ───────────────────
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:      # HTTP-01 challenge via Ingress
        ingress:
          ingressClassName: nginx
    - dns01:       # DNS-01 for wildcard certs
        route53:
          region: us-east-1
          hostedZoneID: YOURZONEID
---
# ── Internal CA Issuer ────────────────────────────────────────
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca-key-pair
---
# ── Certificate Resource ──────────────────────────────────────
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: production
spec:
  secretName: app-tls
  duration: 2160h    # 90 days
  renewBefore: 360h  # renew 15 days before expiry
  subject:
    organizations: [MyCompany]
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages: [server auth, client auth]
  dnsNames:
  - app.example.com
  - www.app.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io

17 / Helm
📖 Official Docs: Helm Docs Chart Templates Charts Hooks Artifact Hub Kustomize

Helm — Kubernetes Package Manager

// Helm is the package manager for Kubernetes. Charts are packages of pre-configured Kubernetes resources. Releases track deployed instances. Repositories store and share charts.

📦
Core Concepts
Charts, Releases, Repos
Chart: Package of K8s resource templates + default values. Release: A running instance of a chart in the cluster. Repo: HTTP server hosting chart packages. Values: Configuration injected at install time. Revision: Versioned history of a release for rollbacks.
Charts · Releases · Repositories
📁
Chart Structure
Directory Layout
Chart.yaml: metadata (name, version, appVersion, description, dependencies). values.yaml: default config values. templates/: Go template files for K8s manifests. templates/NOTES.txt: post-install instructions. charts/: sub-chart dependencies. .helmignore: files to exclude from packaging.
Chart.yaml · values.yaml · templates/
🔧
Templating Engine
Go Templates + Sprig
Templates use Go templating with Sprig function library. Built-in objects: .Values (from values.yaml), .Release (name, namespace, revision), .Chart (metadata), .Files (non-template files), .Capabilities (API versions). Named templates via define/include. Hooks: pre-install, post-install, pre-upgrade, pre-delete.
Go Templates · Sprig · Hooks
📊
Values Override
Configuration Layers
Values precedence (lowest → highest): chart defaults (values.yaml) → parent chart values → user values file (-f values.yaml) → --set flags. Use values files per environment (values-prod.yaml, values-staging.yaml). --set-string for string type enforcement. --set-file for file contents. Secrets: use helm-secrets plugin.
-f values.yaml · --set · helm-secrets
🔄
Release Lifecycle
Install, Upgrade, Rollback
helm install: deploy a chart. helm upgrade: update a release (--install for upsert). helm rollback: revert to previous revision. helm uninstall: remove release + resources. helm history: view revision history. helm status: current state of release. Atomic installs: --atomic rolls back on failure.
upgrade --install · rollback · --atomic
🧪
Testing & Linting
Chart Quality
helm lint: validate chart for errors. helm template: render templates locally without installing. helm test: run test pods (annotated with helm.sh/hook: test). helm diff plugin: show what would change before upgrade. ct (chart-testing) for CI validation. unittest Helm plugin for TDD of templates.
helm lint · helm template · helm diff
Helm CLI — Complete Reference · Helm Commands
Essential Helm commands for managing charts, releases, repositories, and debugging.
# ── INSTALLATION ─────────────────────────────────────────────
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
brew install helm   # macOS
helm version

# ── REPOSITORIES ─────────────────────────────────────────────
helm repo add stable https://charts.helm.sh/stable
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update                          # fetch latest chart versions
helm repo list                            # list configured repos
helm repo remove bitnami                  # remove repo
helm search repo nginx                    # search in repos
helm search hub wordpress                 # search artifact hub

# ── INSTALL / DEPLOY ─────────────────────────────────────────
helm install my-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace \
  --version 4.10.0 \
  -f values-prod.yaml \
  --set controller.replicaCount=2

# Upsert (install or upgrade if exists)
helm upgrade --install my-app ./my-chart \
  --namespace production \
  --create-namespace \
  --atomic \
  --timeout 5m \
  -f values.yaml \
  --set image.tag=v1.2.3

# ── INSPECT BEFORE INSTALLING ────────────────────────────────
helm show chart bitnami/postgresql        # chart metadata
helm show values bitnami/postgresql       # all default values
helm template my-release ./my-chart \
  -f values.yaml > rendered.yaml         # render locally
helm install my-release ./my-chart --dry-run --debug

# ── MANAGE RELEASES ──────────────────────────────────────────
helm list -A                              # all releases, all namespaces
helm status my-app -n production          # release status
helm history my-app -n production         # revision history
helm rollback my-app 2 -n production      # rollback to revision 2
helm uninstall my-app -n production       # remove release
helm get values my-app -n production      # get user-supplied values
helm get manifest my-app -n production    # get rendered manifests

# ── CHART DEVELOPMENT ────────────────────────────────────────
helm create my-chart                      # scaffold new chart
helm lint ./my-chart                      # validate chart
helm package ./my-chart                   # create .tgz package
helm push my-chart-1.0.0.tgz oci://registry.example.com/charts  # push to OCI

# ── PLUGINS ──────────────────────────────────────────────────
helm plugin install https://github.com/databus23/helm-diff
helm diff upgrade my-app ./my-chart -f values.yaml  # show diff before upgrade
helm plugin install https://github.com/jkroepke/helm-secrets
helm secrets upgrade my-app ./my-chart -f secrets.enc.yaml
Chart.yaml + values.yaml · Chart Files
Chart metadata and default values file for a web application chart with dependency management.
# Chart.yaml
apiVersion: v2
name: web-app
description: A Helm chart for web-app microservice
type: application
version: 1.4.2       # chart version (semver)
appVersion: "2.1.0"  # app version (informational)
keywords: [web, api, microservice]
maintainers:
- name: Platform Team
  email: [email protected]
dependencies:
- name: postgresql
  version: "~14.x.x"
  repository: https://charts.bitnami.com/bitnami
  condition: postgresql.enabled
- name: redis
  version: "~18.x.x"
  repository: https://charts.bitnami.com/bitnami
  condition: redis.enabled
---
# values.yaml
replicaCount: 2

image:
  repository: registry.example.com/web-app
  pullPolicy: IfNotPresent
  tag: ""  # overridden by CI with .Chart.AppVersion

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  host: app.example.com
  tls: true

resources:
  requests: { cpu: 100m, memory: 128Mi }
  limits:   { cpu: 500m, memory: 256Mi }

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

postgresql:
  enabled: true
  auth:
    database: mydb
    existingSecret: postgres-secret
Helm Template — Deployment · Go Template
A production-quality Helm template for a Deployment using values, conditionals, helpers, and hooks.
{{/* templates/deployment.yaml */}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "web-app.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "web-app.labels" . | nindent 4 }}
  annotations:
    app.kubernetes.io/managed-by: {{ .Release.Service }}
    helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "web-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "web-app.selectorLabels" . | nindent 8 }}
      annotations:
        {{/* Force pod restart when configmap changes */}}
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - containerPort: {{ .Values.service.targetPort }}
        {{- with .Values.resources }}
        resources:
          {{- toYaml . | nindent 10 }}
        {{- end }}
        {{- if .Values.envFrom }}
        envFrom:
          {{- toYaml .Values.envFrom | nindent 10 }}
        {{- end }}
---
{{/* templates/_helpers.tpl */}}
{{- define "web-app.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "web-app.labels" -}}
helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version }}
{{ include "web-app.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{- define "web-app.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

18 / Expert Level
📖 Official Docs: API Priority & Fairness Server-Side Apply Mixed-Version Proxy Node Resource Managers Kubelet Checkpoint Kubernetes Blog

Expert Kubernetes Patterns

// Advanced topics for senior engineers and platform teams: performance tuning, GitOps, multi-cluster, eBPF networking, cost optimization, and production hardening.

🔥
GitOps
Declarative CD with ArgoCD / Flux
GitOps: Git is the single source of truth for cluster state. A GitOps operator continuously reconciles cluster state to match git. ArgoCD: pull-based, rich UI, multi-cluster, App of Apps pattern. Flux v2: GitOps Toolkit, Kustomization + HelmRelease CRDs, image automation. Both support progressive delivery (Argo Rollouts, Flagger).
ArgoCD · Flux v2 · Reconciliation
🌍
Multi-Cluster
Federation & Management
Patterns: Hub-spoke (one management cluster controls workload clusters), fleet management (Cluster API, Rancher Fleet, ArgoCD ApplicationSets). Service mesh federation (Istio multicluster, Cilium Cluster Mesh) for cross-cluster communication. KubeVela, Liqo for workload placement. Submariner for cross-cluster networking.
Cluster API · Cilium Mesh · Fleet
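As a sketch of pull-based fleet management, an ArgoCD ApplicationSet with the cluster generator stamps out one Application per cluster registered in ArgoCD (the repo URL and path are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: web-app-fleet
  namespace: argocd
spec:
  generators:
  - clusters: {}                 # one Application per registered cluster
  template:
    metadata:
      name: 'web-app-{{name}}'   # cluster name substituted by the generator
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/k8s-gitops
        targetRevision: HEAD
        path: apps/web-app
      destination:
        server: '{{server}}'     # cluster API endpoint from the generator
        namespace: production
      syncPolicy:
        automated: {prune: true, selfHeal: true}
```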
eBPF Networking
Cilium & Kernel-Level Observability
Cilium uses eBPF to implement networking, security, and observability directly in the Linux kernel — bypassing iptables entirely. Enables: L7 NetworkPolicies (HTTP, gRPC, Kafka-aware), transparent encryption (WireGuard/IPSec), Hubble for network flow visibility, per-call latency metrics, DNS security policies.
eBPF · No iptables · Hubble
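To make the L7 capability concrete, here is a sketch of a CiliumNetworkPolicy that allows only HTTP GETs on /orders from the frontend (labels, port, and path are illustrative):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: orders-l7-allow
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: orders-api        # policy applies to these pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend        # only the frontend may connect
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:                # L7 filtering, enforced in the datapath
        - method: GET
          path: "/orders/.*"
```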
💰
Cost Optimization
FinOps for Kubernetes
Rightsizing: VPA recommendations, Goldilocks tool. Spot/Preemptible nodes with node pools + PodDisruptionBudgets. Karpenter (AWS) / Cluster Autoscaler for dynamic node provisioning. OpenCost / Kubecost for cost visibility per namespace/team. Bin-packing with balanced resource allocation. Pod topology constraints to avoid cross-AZ traffic costs.
Karpenter · Kubecost · Spot Nodes
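A sketch of a spot-only node pool, assuming Karpenter's v1 API on AWS (the EC2NodeClass name and CPU limit are placeholders):

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot"]           # spot capacity only
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumed pre-existing EC2NodeClass
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # bin-pack aggressively
  limits:
    cpu: "200"                     # cap total provisioned CPU
```

Pair spot pools with PodDisruptionBudgets so consolidation and spot reclaims stay safe.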
🚀
Progressive Delivery
Canary, Blue/Green, A/B
Argo Rollouts: CRD-based Deployment replacement with canary, blue/green, analysis runs (automated rollback if metrics degrade). Flagger: Istio/Linkerd-based canary automation. Gateway API traffic splitting for canary without service mesh. Feature flags (LaunchDarkly, Flagd) for application-level A/B testing independent of deployments.
Argo Rollouts · Flagger · Analysis Runs
📐
Kustomize
Template-Free Config Management
Kustomize uses overlays and patches instead of templating. Base: common resources. Overlays: environment-specific patches (dev, staging, prod). Strategic merge patches, JSON patches, image transformers, namespace transformers. Built into kubectl (kubectl apply -k). Pairs well with Flux and ArgoCD for GitOps.
Overlays · Patches · kubectl -k
🛡
Supply Chain Security
SLSA & Sigstore
SBOM (Software Bill of Materials) generation with Syft. Image signing with Cosign (keyless via Sigstore/Fulcio). Policy enforcement with Kyverno verifyImages or Connaisseur. SLSA framework for build provenance attestations. In-toto for supply chain integrity. Rekor transparency log for tamper-evident signing events.
Cosign · SBOM · SLSA
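As a sketch of admission-time enforcement, a Kyverno ClusterPolicy requiring keyless Cosign signatures on images from a private registry (the registry, subject, and issuer values are placeholders):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce   # block unsigned images
  webhookTimeoutSeconds: 30
  rules:
  - name: require-cosign-signature
    match:
      any:
      - resources:
          kinds: [Pod]
    verifyImages:
    - imageReferences:
      - "registry.example.com/*"
      attestors:
      - entries:
        - keyless:                   # Sigstore keyless (Fulcio) identity
            subject: "https://github.com/myorg/*"
            issuer: "https://token.actions.githubusercontent.com"
```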
⚙️
Performance Tuning
Large-Scale Optimization
API server: increase --max-requests-inflight, tune --event-ttl. etcd: SSD storage, tune heartbeat-interval, snapshot-count. Scheduler: tune qps/burst. Kubelet: reduce --sync-frequency, tune image GC. Use node local DNS cache to reduce CoreDNS load. Watch vs List for controllers. Large clusters (>1000 nodes): use Kube-OVN or Calico with BGP.
API Server Tuning · etcd SSD · DNS Cache
🔒
Zero-Trust Security
Defence in Depth
Workload Identity: SPIFFE/SPIRE for cryptographic workload identity. mTLS everywhere with Istio/Cilium. OPA/Gatekeeper for policy-as-code. Seccomp profiles (RuntimeDefault or custom). AppArmor/SELinux profiles. Falco for runtime threat detection (syscall monitoring). Encrypted etcd at rest. Regular CIS Benchmark scanning with kube-bench.
SPIFFE/SPIRE · Falco · kube-bench
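Several of these controls land directly in the Pod spec. A minimal hardened Pod sketch (the image name is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault       # default syscall filter
  containers:
  - name: app
    image: registry.example.com/app:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]            # start from zero capabilities
```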
🤝
KEDA
Event-Driven Autoscaling
Kubernetes Event-Driven Autoscaling — extends HPA to scale on external events. 60+ built-in scalers: Kafka lag, RabbitMQ queue depth, Redis lists, AWS SQS/Kinesis, GCP Pub/Sub, Azure Service Bus, Prometheus metrics, cron schedules. Scales to zero, which plain HPA cannot do. ScaledObject and ScaledJob CRDs.
Scale to Zero · Kafka · 60+ Scalers
🧱
Platform Engineering
Internal Developer Platform
Building an IDP on top of Kubernetes: Backstage (service catalog, software templates), Crossplane (infrastructure as K8s CRDs — provision cloud resources), Port.io (developer portal), Kratix (platform promises), vcluster for on-demand dev namespaces. Golden paths: self-service templates that encode best practices.
Backstage · Crossplane · IDP
🔁
Chaos Engineering
Resilience Testing
Chaos Mesh: CRD-based chaos experiments — PodChaos (kill pods), NetworkChaos (delay/loss/partition), StressChaos (CPU/memory), IOChaos (disk failures), TimeChaos. LitmusChaos: CNCF project with experiment hub. k6: load testing from within cluster. Always run chaos in staging first; gate production chaos behind GameDays.
Chaos Mesh · LitmusChaos · Resilience
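A minimal Chaos Mesh experiment sketch: kill one random pod of an app in staging (namespace and labels are illustrative):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-one-web-pod
  namespace: chaos-testing
spec:
  action: pod-kill       # terminate the selected pod
  mode: one              # pick exactly one matching pod at random
  selector:
    namespaces: [staging]
    labelSelectors:
      app: web-app
```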
ArgoCD Application · GitOps
ArgoCD Application resource deploying from a Helm chart in git with auto-sync and self-healing.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-production
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/k8s-gitops
    targetRevision: HEAD
    path: apps/web-app
    helm:
      valueFiles:
      - values-prod.yaml
      parameters:
      - name: image.tag
        value: v2.1.0
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # delete resources removed from git
      selfHeal: true  # revert manual cluster changes
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - ApplyOutOfSyncOnly=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m
        factor: 2
  revisionHistoryLimit: 10
Argo Rollouts — Canary · Progressive Delivery
Argo Rollouts canary strategy with automated analysis — rolls back if error rate exceeds threshold.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app-rollout
  namespace: production
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: myapp:v2.0.0
  strategy:
    canary:
      canaryService: web-app-canary
      stableService: web-app-stable
      trafficRouting:
        nginx:
          stableIngress: web-app-ingress
      steps:
      - setWeight: 5    # 5% traffic to canary
      - pause: {duration: 5m}
      - analysis:      # run automated analysis
          templates:
          - templateName: success-rate
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 100
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 1m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{status!~"5.."}[5m]))
          / sum(rate(http_requests_total[5m]))
KEDA ScaledObject · Event-Driven Autoscaling
KEDA ScaledObject scaling a consumer Deployment based on Kafka consumer group lag.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0  # scale to zero!
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-consumer-group
      topic: orders
      lagThreshold: "100"  # 1 replica per 100 messages lag
      offsetResetPolicy: latest
---
# ScaledJob: scale Jobs (not Deployments) for batch processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: sqs-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
        - name: processor
          image: myprocessor:latest
        restartPolicy: Never
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123/my-queue
      targetQueueLength: "1"
      awsRegion: us-east-1
Crossplane — Cloud Infrastructure · Platform Engineering
Crossplane Composite Resource Definition — provision an AWS RDS instance as a Kubernetes resource.
# Crossplane lets you provision cloud infra as K8s resources
# Developers request infrastructure via K8s objects

apiVersion: database.example.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-db
  namespace: production
spec:
  parameters:
    storageGB: 20
    size: db.t3.medium
    version: "15"
  compositionSelector:
    matchLabels:
      provider: aws
      env: production
  writeConnectionSecretToRef:
    name: my-db-conn   # K8s Secret with DB connection string
---
# Kustomize overlay structure example
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- configmap.yaml
---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
namePrefix: prod-
resources:
- ../../base
patches:
- patch: |
    - op: replace
      path: /spec/replicas
      value: 5
  target:
    kind: Deployment
images:
- name: myapp
  newTag: v2.1.0
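Given the base/overlay layout above, the overlay can be rendered and applied with kubectl's built-in Kustomize support (directory paths follow the example structure):

```shell
# Render the production overlay to stdout without touching the cluster
kubectl kustomize overlays/production

# Preview the change against live state, then apply
# (-k points at a directory containing kustomization.yaml)
kubectl diff -k overlays/production
kubectl apply -k overlays/production
```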

19 / Service Mesh
📖 Official Docs: Services & Networking Istio Docs Linkerd Docs Cilium Service Mesh Gateway API SIG Istio Security Istio Traffic Mgmt

Service Mesh — Deep Dive

// A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It provides traffic management, security (mTLS), and observability — without changing application code.

// service mesh architecture — data plane vs control plane
⚙ Control Plane (Istiod)
Pilot — service discovery & traffic rules
Citadel — certificate authority (mTLS)
Galley — config validation & distribution
Telemetry — metrics, traces, logs
Webhook — sidecar auto-injection
⬡ Data Plane (Envoy Sidecars)
Envoy Proxy — intercepts all traffic
Inbound listener — incoming requests
Outbound listener — egress routing
mTLS termination & origination
Metrics, traces, access logs
// request flow through mesh
📦
Pod A App
🔀
Sidecar (Envoy) Outbound
🔐
mTLS Encrypt
🔐
mTLS Decrypt
🔀
Sidecar (Envoy) Inbound
📦
Pod B App

Core Concepts

🔐
Mutual TLS (mTLS)
Zero-Trust Service Identity
Both client and server authenticate each other using X.509 certificates. Certificates issued automatically by the mesh CA (Citadel/Istiod). Modes: DISABLE, PERMISSIVE (accepts both TLS and plaintext — migration mode), STRICT (TLS only). Identity based on SPIFFE/SPIRE — URI SANs tied to ServiceAccount. Eliminates need for application-level auth between services.
STRICT mode SPIFFE Auto Cert Rotation PERMISSIVE
🔀
Traffic Management
Fine-Grained Routing Control
Route traffic based on: HTTP headers, URI, method, weight (canary), source labels. VirtualService defines routing rules. DestinationRule defines subsets (versions) and load balancing policy. Timeout and retry configuration at the mesh level — no code changes. Supports A/B testing, canary deployments, blue/green deployments, mirroring (shadow traffic).
VirtualService DestinationRule Canary Mirroring
🛡
Resilience Patterns
Circuit Breaking & Retries
Retries: Automatic retry on 5xx errors, configurable attempts and retry conditions. Timeout: Per-route request timeout. Circuit Breaker: Outlier detection ejects unhealthy hosts from load balancing pool. Bulkhead: Connection pool limits prevent cascade failures. All configured in DestinationRule without code changes.
Circuit Breaker Retries Timeout Outlier Detection
📊
Observability
Automatic Telemetry
Every sidecar automatically emits: L7 metrics (request rate, error rate, latency percentiles — the Golden Signals), distributed traces (Zipkin/Jaeger compatible), and access logs. No application instrumentation needed. Kiali — service mesh topology UI. Grafana dashboards included in istio addons. Trace context propagated via B3 or W3C headers.
Golden Signals Kiali Auto Tracing Zero Instrumentation
🌐
Ingress & Egress Gateways
Mesh Edge Traffic
IngressGateway: Entry point for external traffic into the mesh — replaces traditional Ingress. Terminates TLS, enforces policies before traffic enters mesh. EgressGateway: Controls all outbound traffic leaving the mesh to external services. Enforce policy: which services can access external APIs. Register external services with ServiceEntry.
Gateway ServiceEntry Egress Control
🔑
Authorization Policies
L7 Access Control
Fine-grained L4-L7 access control based on service identity, request properties. Define: which source (from), what target (to), which conditions (when). Example: allow only frontend to call backend on /api/* with GET. Works alongside Kubernetes RBAC (different layer). PeerAuthentication for mTLS mode per namespace/workload.
AuthorizationPolicy PeerAuthentication L7 ACL
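Assuming Istio's standard Prometheus metrics (istio_requests_total and istio_request_duration_milliseconds, which every sidecar emits by default) and a placeholder workload name web-app, the Golden Signals from the Observability card reduce to three queries:

```promql
# Traffic: request rate to web-app
sum(rate(istio_requests_total{destination_workload="web-app"}[5m]))

# Errors: share of 5xx responses
sum(rate(istio_requests_total{destination_workload="web-app",response_code=~"5.."}[5m]))
/ sum(rate(istio_requests_total{destination_workload="web-app"}[5m]))

# Latency: p99 from the duration histogram
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_workload="web-app"}[5m])) by (le))
```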

Service Mesh Comparison

// Major service mesh implementations — choose based on complexity tolerance, performance needs, and feature requirements.

Mesh Data Plane Architecture Strengths Trade-offs Best For
Istio Envoy Sidecar + Istiod control plane Most features, rich traffic mgmt, large community, Gateway API support High resource overhead, complexity, steep learning curve Large enterprises needing full feature set
Linkerd Rust proxy Sidecar + linkerd-control-plane Ultra-lightweight, simple install, excellent performance, CNCF graduated Fewer advanced features than Istio, no Envoy ecosystem Teams wanting simplicity and low overhead
Cilium eBPF No sidecar — kernel-level eBPF Zero sidecar overhead, highest performance, L3-L7, NetworkPolicy, Gateway API Requires Linux kernel ≥5.10, newer project Performance-critical, CNI + mesh in one
Consul Connect Envoy Sidecar + Consul server Multi-platform (VMs + K8s), HashiCorp ecosystem, service catalog Requires Consul cluster, more ops burden Hybrid cloud / VM + Kubernetes environments
AWS App Mesh Envoy Sidecar + AWS managed CP Native AWS integration, managed control plane, no CP ops AWS-only, less flexible, fewer features AWS-native teams wanting managed option
Kuma / Kong Envoy Sidecar or sidecarless Multi-zone mesh, universal (K8s + VMs), Kong ecosystem integration Smaller community than Istio Multi-zone deployments, Kong API Gateway users

Feature Deep Dive

🔄
Load Balancing Algorithms
Mesh-level LB goes beyond kube-proxy's basic random/round-robin backend selection:
  • ROUND_ROBIN — default, even distribution
  • LEAST_CONN — route to the host with the fewest active connections
  • RANDOM — random endpoint selection
  • PASSTHROUGH — forward to original destination
  • Consistent hash — sticky sessions (cookie, header, source IP)
  • Locality-aware — prefer same zone/region
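A minimal DestinationRule fragment sketching the locality-aware bullet, assuming Istio's localityLbSetting API (region names are placeholders; locality failover only activates when outlier detection is also configured):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: web-app-locality
spec:
  host: web-app
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
      localityLbSetting:
        enabled: true
        failover:            # prefer local zone; on failure, shift east → west
        - from: us-east-1
          to: us-west-2
    outlierDetection:        # required for locality failover to take effect
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```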
🕶
Traffic Mirroring (Shadow)
Send a copy of live traffic to a shadow service for testing:
  • 100% of live requests duplicated to shadow
  • Shadow responses are discarded (fire-and-forget)
  • Test new versions with real traffic — no user impact
  • Validate performance, correctness, error rates
  • Configured via VirtualService mirror field
  • mirrorPercentage to control mirrored volume
🎚
Canary & Traffic Splitting
Gradually shift traffic between service versions:
  • Weight-based: 90% v1 → 10% v2
  • Header-based: route beta users to v2
  • Progressive delivery with Flagger + Argo Rollouts
  • Automated rollback on metric degradation
  • A/B testing by user segment
  • Blue/green with instant cutover
🔍
Fault Injection
Deliberately inject failures to test resilience:
  • HTTP delay injection (artificial latency)
  • HTTP abort injection (return 5xx errors)
  • Test circuit breaker behavior
  • Chaos engineering without Chaos tools
  • Configured per-route in VirtualService
  • Percentage-based — affect % of requests
🏷
Service Entry
Register external services inside the mesh:
  • Add external APIs (AWS S3, Stripe, etc.) to mesh
  • Apply mTLS, retries, timeouts to external calls
  • Block all external traffic — allowlist via ServiceEntry
  • MESH_INTERNAL vs MESH_EXTERNAL location
  • Supports TCP, HTTP, HTTPS, gRPC protocols
  • Enable Egress Gateway enforcement
🏗
Sidecar Resource
Limit sidecar proxy scope to reduce memory and CPU:
  • By default Envoy holds config for ALL services
  • Sidecar resource scopes proxy to only needed services
  • Reduces config push size — faster convergence
  • Define egress hosts the workload actually calls
  • Critical for large clusters (1000+ services)
  • Namespace-scoped or workload-specific

Istio YAML Examples

VirtualService — Traffic Splitting Traffic Management
Route 90% traffic to stable v1, 10% to canary v2, with header-based override for testers.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: web-app-vs
  namespace: production
spec:
  hosts:
  - web-app
  http:
  - # Testers with X-Canary header go to v2
    match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: web-app
        subset: v2
  - # Everyone else: 90/10 split
    route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
      retryOn: gateway-error,connect-failure,retriable-4xx
    mirror:
      host: web-app-shadow
      subset: v2
    mirrorPercentage:
      value: 10.0  # Mirror 10% to shadow
DestinationRule — Circuit Breaker Resilience
Define subsets (v1/v2) and circuit breaker with connection pool limits and outlier detection.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: web-app-dr
  namespace: production
spec:
  host: web-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:  # Circuit breaker
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
    loadBalancer:
      simple: LEAST_CONN
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        consistentHash:
          httpHeaderName: x-user-id  # Sticky by user
PeerAuthentication & AuthorizationPolicy Security
Enforce strict mTLS for a namespace and define L7 access control — only frontend can call backend.
# Enforce STRICT mTLS for entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Allow frontend → backend on /api/* (GET and POST only)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend-sa
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
---
# Deny all other traffic to backend (default deny)
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-deny-all
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: DENY
  rules:
  - from:
    - source:
        notPrincipals:
        - cluster.local/ns/production/sa/frontend-sa
Istio Gateway + ServiceEntry Edge Traffic
Ingress Gateway for external traffic entry and ServiceEntry to allow egress to an external API.
# Istio Ingress Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: app-tls-cert
    hosts:
    - app.example.com
  - port:
      number: 80
      name: http
      protocol: HTTP
    tls:
      httpsRedirect: true  # Force HTTPS
    hosts:
    - app.example.com
---
# Allow egress to external Stripe API
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: stripe-api
  namespace: production
spec:
  hosts:
  - api.stripe.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
  location: MESH_EXTERNAL
Fault Injection Testing Chaos / Resilience
Inject latency and HTTP errors into requests to test application resilience and circuit breaker behavior.
# Inject 3s delay for 10% of requests to ratings service
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings-fault-injection
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 10.0
        fixedDelay: 3s
      abort:
        percentage:
          value: 5.0
        httpStatus: 503
    route:
    - destination:
        host: ratings
        subset: v1
---
# Sidecar resource scoping for large clusters
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: web-app-sidecar
  namespace: production
spec:
  workloadSelector:
    labels:
      app: web-app
  egress:
  - hosts:
    - ./backend      # same namespace
    - ./postgres
    - istio-system/*
    - monitoring/prometheus
    # Only these — Envoy won't load other services
Linkerd — Annotations & Profile Linkerd
Linkerd mesh injection via annotations and ServiceProfile for per-route metrics and retries.
# Enable Linkerd injection for a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled
---
# Per-pod injection control
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/proxy-cpu-request: "10m"
        config.linkerd.io/proxy-memory-request: "20Mi"
---
# ServiceProfile for per-route observability & retries
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: web-app.production.svc.cluster.local
  namespace: production
spec:
  routes:
  - name: GET /api/users
    condition:
      method: GET
      pathRegex: /api/users.*
    isRetryable: true
    timeout: 30s
  - name: POST /api/orders
    condition:
      method: POST
      pathRegex: /api/orders
    isRetryable: false  # Non-idempotent
    timeout: 60s

When to Use a Service Mesh

Scenario Without Mesh With Mesh Verdict
mTLS between services Manual cert management per service Automatic, zero-config mTLS USE MESH
Canary deployments Needs two services + Ingress hacks VirtualService weight split USE MESH
Distributed tracing Instrument every app with SDK Automatic from sidecar USE MESH
Small cluster (1–5 services) Simple, low overhead Adds sidecar memory plus a few ms of per-hop latency and operational complexity SKIP MESH
Circuit breaking Implement per-service (Hystrix, Resilience4j) One DestinationRule for all USE MESH
Compliance (SOC2, PCI-DSS) Hard to prove in-transit encryption mTLS + audit logs prove it USE MESH
Resource-constrained edge/IoT Direct pod communication Sidecar doubles memory per pod SKIP MESH

20 / Kubernetes Plugins
📖 Official Docs: kubectl Plugins Krew Docs Plugin Index kubectl Reference

kubectl Plugins & Krew Ecosystem

// kubectl plugins extend the CLI with new commands. Krew is the official plugin manager — over 200 community plugins available. Any executable named kubectl-* in your PATH becomes a kubectl subcommand.

// how kubectl plugins work
① Discovery
kubectl scans every directory in your $PATH for executables starting with kubectl-. No registration needed.
② Naming
File kubectl-ns-switch → command kubectl ns switch. Hyphens in filenames become spaces in the command; use an underscore (kubectl-ns_switch) to get a hyphen in the command name (kubectl ns-switch). Must be executable.
③ Language
Write plugins in any language — Bash, Python, Go, Ruby. Go is most common (uses client-go). Shell scripts great for simple wrappers.
④ Krew
Package manager for plugins. Cross-platform. Curated index with 200+ plugins. Auto-update. kubectl krew install <name>
Krew — Install & Basic Usage Plugin Manager
Install Krew, then discover and manage kubectl plugins from the community index.
# ── INSTALL KREW (macOS/Linux) ────────────────────────────────
(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed 's/x86_64/amd64/;s/arm.*/arm/;s/aarch64$/arm64/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

# Add to shell profile (.bashrc / .zshrc)
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"

# ── KREW COMMANDS ─────────────────────────────────────────────
kubectl krew version                    # show krew version
kubectl krew update                     # update plugin index
kubectl krew search                     # list all available plugins
kubectl krew search <keyword>           # search plugins by keyword
kubectl krew info <plugin>              # details about a plugin
kubectl krew install <plugin>           # install a plugin
kubectl krew install <p1> <p2> <p3>    # install multiple at once
kubectl krew upgrade                    # upgrade all installed plugins
kubectl krew upgrade <plugin>           # upgrade specific plugin
kubectl krew uninstall <plugin>         # remove a plugin
kubectl krew list                       # list installed plugins

# ── DISCOVER ALL PLUGINS (without krew) ──────────────────────
kubectl plugin list                     # show all plugins in PATH

# ── INSTALL ESSENTIAL PLUGINS IN ONE GO ──────────────────────
kubectl krew install \
  ctx ns stern neat tree \
  who-can access-matrix rbac-view \
  resource-capacity node-shell \
  images outdated \
  konfig view-secret \
  doctor popeye

Essential Plugin Categories

// Context, namespace & cluster navigation

🔀
kubectx / ctx
Fastest way to switch between Kubernetes contexts (clusters). Fuzzy search supported. Works with fzf for interactive selection.
  • kubectl ctx — list all contexts
  • kubectl ctx prod — switch to prod
  • kubectl ctx - — switch to previous
  • kubectl ctx -d old-ctx — delete context
  • Install: krew install ctx
📁
kubens / ns
Instant namespace switching with auto-completion. Sets the default namespace so you don't need -n on every command.
  • kubectl ns — list all namespaces
  • kubectl ns production — switch namespace
  • kubectl ns - — switch to previous ns
  • Pairs perfectly with kubectx
  • Install: krew install ns
🗂
konfig
Merge, split, and import kubeconfig files. Essential when working with many clusters. Safely combines configs without conflicts.
  • kubectl konfig merge a.yaml b.yaml
  • kubectl konfig split — split into files
  • kubectl konfig import --save cfg.yaml
  • Non-destructive — validates before merging
  • Install: krew install konfig

// Logs, debugging & troubleshooting

📜
stern
Multi-pod log tailing with color coding per pod. Regex filtering. Tail logs from all pods matching a pattern simultaneously.
  • stern web-app — tail all pods matching name
  • stern . -n prod — all pods in namespace
  • stern web --since 15m — last 15 minutes
  • stern web -c sidecar — specific container
  • Install: krew install stern
🌳
tree
Show ownership hierarchy of Kubernetes objects. Visualise which resources belong to a Deployment/ReplicaSet/Pod chain.
  • kubectl tree deploy web-app
  • Shows: Deployment → ReplicaSet → Pods
  • Works with any resource (StatefulSet, Job…)
  • Shows owner references as a tree
  • Install: krew install tree
🔬
neat
Remove clutter from kubectl output. Strips managed fields, status, and default values to make YAML readable and reusable.
  • kubectl get pod web -o yaml | kubectl neat
  • Removes: managedFields, creationTimestamp
  • Removes: status, default annotations
  • Perfect for exporting clean manifests
  • Install: krew install neat
🏥
doctor
Cluster health checker — scans for common misconfigurations, missing resources, and best-practice violations.
  • kubectl doctor — full cluster scan
  • Checks: deprecated API versions
  • Checks: pods without resource limits
  • Checks: services without endpoints
  • Install: krew install doctor
💥
node-shell
Open an interactive shell directly on a Kubernetes node — without SSH. Spins up a privileged pod with host PID/network access.
  • kubectl node-shell node1
  • Runs a temporary privileged pod on the target node
  • Full node access: systemd, journalctl, crictl
  • Auto-cleans up the debug pod after exit
  • Install: krew install node-shell
🔎
popeye
Live Kubernetes cluster sanitizer. Scans resources and reports issues with graded severity across your whole cluster or namespace.
  • kubectl popeye — full scan with report
  • kubectl popeye -n production
  • Grades: A (clean) to F (critical issues)
  • Checks: resource limits, probes, RBAC, images
  • Install: krew install popeye

// Security, RBAC & access auditing

🔐
who-can
Show which users and ServiceAccounts can perform specific actions. Essential for RBAC auditing and security reviews.
  • kubectl who-can get pods
  • kubectl who-can delete secrets -n prod
  • kubectl who-can create deployments
  • Shows: users, groups, serviceaccounts
  • Install: krew install who-can
🗝
access-matrix
Show an RBAC access matrix for all resources and verbs for a user or ServiceAccount. Visual permission grid.
  • kubectl access-matrix
  • kubectl access-matrix --sa mysa -n prod
  • Grid: resources × verbs (get/list/create…)
  • Color coded: ✓ allowed, ✗ denied
  • Install: krew install access-matrix
📊
rbac-view
Web UI for visualizing RBAC permissions in your cluster. Launches a local server with an interactive permission explorer.
  • kubectl rbac-view — opens browser UI
  • Visual graph of Role → Subject bindings
  • Filter by namespace, subject, or resource
  • Export reports as HTML
  • Install: krew install rbac-view
🕵
view-secret
Decode and view Kubernetes Secret values directly — no manual base64 decoding required.
  • kubectl view-secret my-secret
  • kubectl view-secret my-secret key
  • Automatically base64 decodes all values
  • Shows all keys in a secret at once
  • Install: krew install view-secret

// Resources, capacity & image management

📈
resource-capacity
Overview of resource requests, limits, and live utilization per node and pod. Better than kubectl top.
  • kubectl resource-capacity — node overview
  • kubectl resource-capacity --pods
  • kubectl resource-capacity --util — live usage
  • Shows: requests, limits, % utilization
  • Install: krew install resource-capacity
🖼
images
List all container images running across your cluster. Filter by namespace, show image digests, or list unique images only.
  • kubectl images — all images cluster-wide
  • kubectl images -n prod
  • kubectl images --no-trunc — full names
  • Great for auditing image versions
  • Install: krew install images
outdated
Scan your cluster for containers running outdated images. Checks Docker Hub and other registries for newer tags.
  • kubectl outdated — full cluster scan
  • Shows: current tag vs latest available
  • Flags: patch / minor / major updates
  • Supports private registries with credentials
  • Install: krew install outdated
🔄
pv-migrate
Migrate PersistentVolumeClaim data between storage classes, namespaces, or clusters. Uses rsync under the hood.
  • kubectl pv-migrate migrate src-pvc dst-pvc
  • Cross-namespace migration supported
  • Cross-cluster migration with kubeconfig
  • Handles live vs stopped workloads
  • Install: krew install pv-migrate

// Networking, certificates & cluster tools

🌐
ingress-nginx
Interact with and inspect NGINX Ingress controller deployments. Debug routing, backends, and configuration.
  • kubectl ingress-nginx backends
  • kubectl ingress-nginx conf --host foo.com
  • kubectl ingress-nginx logs
  • kubectl ingress-nginx exec -- nginx -T
  • Install: krew install ingress-nginx
🔒
cert-manager
Interact with cert-manager resources directly. Trigger renewals, inspect certificate status, debug issuance failures.
  • kubectl cert-manager status certificate tls
  • kubectl cert-manager renew tls-cert
  • kubectl cert-manager inspect secret tls
  • Shows: expiry, issuer, SANs, renewal status
  • Install: krew install cert-manager
🗺
np-viewer
Visualize NetworkPolicy rules in a human-readable format. Explains what traffic is allowed or blocked for selected pods.
  • kubectl np-viewer -n production
  • Shows ingress and egress rules per pod
  • Highlights overlapping / conflicting policies
  • Exports as diagram or text table
  • Install: krew install np-viewer
📋
view-cert
Decode and display TLS certificate details stored in Kubernetes Secrets. Inspect expiry, issuer, and SANs without openssl.
  • kubectl view-cert my-tls-secret
  • Shows: subject, issuer, SANs, expiry date
  • Warns if cert expires soon
  • Works with any kubernetes.io/tls secret
  • Install: krew install view-cert

Writing Your Own Plugin

// Plugins can be simple shell scripts or full Go binaries. Any executable named kubectl-* in your PATH works instantly.

Shell Script Plugin Bash Plugin
A simple bash plugin: kubectl podfull — shows detailed pod info with node, IP, age, and resource usage in one view.
#!/usr/bin/env bash
# File: kubectl-podfull  (chmod +x, place in PATH)
# Usage: kubectl podfull [namespace]

set -euo pipefail

NS="${1:--A}"  # default: all namespaces
FLAG="--all-namespaces"
[ "$NS" != "-A" ] && FLAG="-n $NS"

echo ""
# kubectl prints its own header row for the custom columns below
kubectl get pods $FLAG \
  -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,IP:.status.podIP,STATUS:.status.phase,RESTARTS:.status.containerStatuses[0].restartCount' \
  --sort-by='.status.phase' 2>/dev/null

echo ""
echo "Resource usage:"
kubectl top pods $FLAG 2>/dev/null || \
  echo "(metrics-server not available)"
Python Plugin — ns-cleanup Python Plugin
Python plugin to find and optionally delete completed/failed pods and evicted pods from a namespace.
#!/usr/bin/env python3
# File: kubectl-ns-cleanup  (chmod +x, place in PATH)
# Usage: kubectl ns-cleanup [namespace] [--delete]

import subprocess, sys, json

ns = sys.argv[1] if len(sys.argv) > 1 else "default"
do_delete = "--delete" in sys.argv

result = subprocess.run(
    ["kubectl", "get", "pods", "-n", ns,
     "-o", "json"],
    capture_output=True, text=True
)
pods = json.loads(result.stdout)["items"]

to_clean = []
for pod in pods:
    phase  = pod["status"].get("phase", "")
    reason = pod["status"].get("reason", "")
    name   = pod["metadata"]["name"]
    # Evicted pods report phase=Failed with reason=Evicted, so one
    # combined check avoids adding the same pod twice
    if phase in ("Succeeded", "Failed") or reason == "Evicted":
        to_clean.append(name)

print(f"Found {len(to_clean)} pods to clean in '{ns}':")
for p in to_clean:
    print(f"  - {p}")

if do_delete and to_clean:
    for p in to_clean:
        subprocess.run(
            ["kubectl", "delete", "pod", p, "-n", ns],
            check=True
        )
    print(f"\n✓ Deleted {len(to_clean)} pods")
Go Plugin — kubectl-whoami Go Plugin
Go-based plugin using client-go to show who you are authenticated as and your permissions summary.
// File: kubectl-whoami/main.go
// Build: go build -o kubectl-whoami && mv to PATH

package main

import (
    "context"
    "fmt"
    "os"

    authv1 "k8s.io/api/authentication/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubeconfig := os.Getenv("KUBECONFIG")
    if kubeconfig == "" {
        kubeconfig = os.Getenv("HOME") + "/.kube/config"
    }

    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil { panic(err) }

    client, err := kubernetes.NewForConfig(config)
    if err != nil { panic(err) }

    // SelfSubjectReview — who am I?
    review, err := client.AuthenticationV1().
        SelfSubjectReviews().
        Create(context.TODO(),
            &authv1.SelfSubjectReview{},
            metav1.CreateOptions{})
    if err != nil { panic(err) }

    fmt.Printf("Username : %s\n",
        review.Status.UserInfo.Username)
    fmt.Printf("Groups   : %v\n",
        review.Status.UserInfo.Groups)
    fmt.Printf("UID      : %s\n",
        review.Status.UserInfo.UID)
}
Plugin Manifest for Krew Krew Distribution
Krew plugin manifest (.yaml) to package and distribute your plugin on the Krew index for all platforms.
apiVersion: krew.googlecontainertools.github.com/v1alpha2
kind: Plugin
metadata:
  name: podfull
spec:
  version: v1.0.0
  homepage: https://github.com/myuser/kubectl-podfull
  shortDescription: Show detailed pod info with nodes and resources
  description: |
    kubectl-podfull displays comprehensive pod information
    including node placement, IPs, status, restart counts,
    and live resource usage in a single command.
  platforms:
  - selector:
      matchLabels:
        os: linux
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_linux_amd64.tar.gz
    sha256: abc123...
    bin: kubectl-podfull
  - selector:
      matchLabels:
        os: darwin
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_darwin_amd64.tar.gz
    sha256: def456...
    bin: kubectl-podfull
  - selector:
      matchLabels:
        os: windows
        arch: amd64
    uri: https://github.com/myuser/kubectl-podfull/releases/download/v1.0.0/kubectl-podfull_windows_amd64.zip
    sha256: ghi789...
    bin: kubectl-podfull.exe

Plugin Quick Reference

Plugin Category Key Command What It Solves Install
ctx / kubectx NAVIGATION kubectl ctx prod Fast cluster/context switching krew install ctx
ns / kubens NAVIGATION kubectl ns staging Fast namespace switching krew install ns
stern LOGS stern web-app -n prod Multi-pod log tailing krew install stern
neat DEBUG kubectl get pod x -oyaml | kubectl neat Clean up noisy kubectl YAML output krew install neat
tree DEBUG kubectl tree deploy app Visualize resource ownership tree krew install tree
popeye AUDIT kubectl popeye -n prod Cluster health & best-practice scan krew install popeye
who-can SECURITY kubectl who-can delete pods RBAC: who can do what krew install who-can
access-matrix SECURITY kubectl access-matrix --sa mysa Full RBAC permission grid krew install access-matrix
view-secret SECURITY kubectl view-secret my-secret Decode Secrets without manual base64 krew install view-secret
resource-capacity RESOURCES kubectl resource-capacity --util CPU/memory requests, limits & usage krew install resource-capacity
node-shell DEBUG kubectl node-shell node1 SSH-less shell on any node krew install node-shell
images RESOURCES kubectl images -n prod List all container images in cluster krew install images
outdated RESOURCES kubectl outdated Detect stale/outdated container images krew install outdated
konfig NAVIGATION kubectl konfig merge a.yaml b.yaml Merge & manage kubeconfig files krew install konfig
np-viewer NETWORKING kubectl np-viewer -n prod Visualize NetworkPolicy rules krew install np-viewer
cert-manager SECURITY kubectl cert-manager renew cert Manage cert-manager certificates krew install cert-manager
pv-migrate STORAGE kubectl pv-migrate migrate src dst Migrate PVC data between namespaces/clusters krew install pv-migrate
ingress-nginx NETWORKING kubectl ingress-nginx backends Debug NGINX Ingress configuration krew install ingress-nginx

21 / CRD — Custom Resource Definitions
📖 Official Docs: Custom Resources CRD Tasks CRD Versioning CEL in Kubernetes Kubebuilder Book Operator SDK OLM

Custom Resource Definitions (CRD)

// CRDs let you extend the Kubernetes API with your own resource types. Once registered, your custom objects are stored in etcd, managed by kubectl, protected by RBAC, and can drive custom controllers — exactly like built-in resources.

// CRD lifecycle — from definition to running controller
📐
① Define CRD
Register schema with API server via kubectl apply
🌐
② New API Endpoint
API server exposes /apis/group/version/plural
📦
③ Create CR Instances
kubectl apply custom objects — stored in etcd
🔄
④ Controller Watches
Controller reconcile loop reacts to CR changes
⑤ Desired State
Controller creates/updates/deletes child resources

CRD Core Concepts

📐
CustomResourceDefinition
Schema Registration
A CRD is itself a Kubernetes resource that tells the API server about a new type. It defines: the group (mycompany.io), versions (v1, v1alpha1), plural/singular/kind names, scope (Namespaced or Cluster), and an OpenAPI v3 JSON Schema for validation. Once applied, the new type is immediately usable.
apiextensions.k8s.io/v1 · OpenAPI v3 · etcd backed
📦
Custom Resource (CR)
Instances of Your Type
Once a CRD is registered, you create Custom Resources — instances of that type. They behave exactly like built-in resources: kubectl get, describe, delete, label, annotate all work. They are namespaced (or cluster-scoped), RBAC-protected, and watchable by controllers. The spec is defined by you; status is updated by your controller.
kubectl get mytype · RBAC protected · Watchable
Schema Validation
OpenAPI v3 Structural Schema
CRDs use OpenAPI v3 JSON Schema to validate Custom Resource fields at admission time. Supports: type, enum, pattern, minimum/maximum, required fields, default values, nullable, x-kubernetes-int-or-string. Validation is enforced server-side — invalid CRs are rejected by the API server before reaching etcd.
required fields · enum · defaults · Server-side
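The less common validation keywords named above rarely show up in full examples. A minimal schema fragment, with illustrative field names, might look like:

```yaml
# Illustrative schema fragment: field names are hypothetical
properties:
  replicas:
    x-kubernetes-int-or-string: true   # accepts 3 or "3"; note no "type" is set
  owner:
    type: string
    nullable: true                     # explicit null is accepted
  tier:
    type: string
    enum: [free, pro, enterprise]      # other values rejected at admission
    default: free
```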
📊
Status Subresource
Spec vs Status Separation
Enabling the status subresource creates a separate /status endpoint. Users update spec; controllers update status. This separation prevents users from accidentally overwriting controller-managed state. Controllers use client.Status().Update(). Printer columns can surface status fields in kubectl output.
spec write · status write · Separation
📋
Printer Columns
kubectl get Output
additionalPrinterColumns customize what kubectl get shows. Use JSONPath to extract fields from spec or status. Built-in: NAME, NAMESPACE, AGE. Add Phase, Replicas, Version, Ready. type: string, integer, boolean, date. priority: 0 = default view, >0 = wide only (-o wide).
JSONPath · Custom columns · -o wide
🔢
Versioning
API Evolution
CRDs support multiple versions simultaneously. One is the storage version. Conversion webhooks handle translating between versions. Use CEL validation rules (x-kubernetes-validations) for cross-field validation since v1.25+. Mark old versions deprecated before removal. Follow: v1alpha1 → v1beta1 → v1.
Conversion Webhook · Storage Version · CEL Rules
CEL Validation Rules
Cross-Field Validation (v1.25+)
Common Expression Language rules allow complex cross-field validation inside the CRD schema — no webhook needed. Defined with x-kubernetes-validations. Access self (current object) and oldSelf (for update rules). Example: ensure spec.maxReplicas >= spec.minReplicas. Runs in the API server process — fast and reliable.
x-kubernetes-validations · No webhook · self / oldSelf
🔄
Scale Subresource
HPA Integration
Enabling scale subresource exposes a /scale endpoint. Allows kubectl scale to work on your CRD and lets HPA automatically scale it. Define specReplicasPath and statusReplicasPath to map to your fields. Optional labelSelectorPath for pod selection by HPA.
HPA compatible · kubectl scale · /scale endpoint
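With the scale subresource enabled, a standard autoscaling/v2 HPA can target the custom resource directly. A sketch, assuming the WebApp type used in the examples in this section:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-web-app-hpa
  namespace: production
spec:
  scaleTargetRef:               # points at the CR, not a Deployment
    apiVersion: apps.mycompany.io/v1
    kind: WebApp
    name: my-web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

The HPA discovers the CR's pods through the selector exposed at labelSelectorPath, so Resource metrics only work when that path is defined in the scale subresource.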
🏗
Controller / Operator Pattern
Reconciliation Loop
A CRD without a controller is just stored data. Controllers bring CRDs to life: watch CR events, compare current vs desired state, act to reconcile. Built with kubebuilder or Operator SDK. The reconcile loop is the heart of every Kubernetes Operator — observe, compare, act, report.
kubebuilder · controller-runtime · Reconcile loop

CRD Anatomy — Field by Field

Field | Location | Required | Description | Example
group | spec | YES | API group — use a domain you own. Reverse DNS style. | mycompany.io
versions[].name | spec.versions | YES | Version string. Follow: v1alpha1 → v1beta1 → v1 | v1, v1beta1
versions[].served | spec.versions | YES | Whether this version is served by the API. False = deprecated. | true / false
versions[].storage | spec.versions | YES | Exactly ONE version must be the storage version. | true (only one)
openAPIV3Schema | spec.versions[].schema | YES | Full structural schema for validation. Required for all served versions. | type: object, properties: ...
scope | spec | YES | Namespaced (per ns) or Cluster (global like Nodes/PVs). | Namespaced / Cluster
names.kind | spec.names | YES | CamelCase singular name used in YAML kind: field. | Database
names.plural | spec.names | YES | Lowercase plural — used in URL path and kubectl get. | databases
names.shortNames | spec.names | NO | Short aliases for kubectl. Like po=pods, svc=services. | ["db", "dbs"]
names.categories | spec.names | NO | Group into categories. kubectl get all uses "all" category. | ["all"]
subresources.status | spec.versions[].subresources | NO | Enables separate /status endpoint. Best practice for all CRDs. | status: {}
subresources.scale | spec.versions[].subresources | NO | Enables HPA + kubectl scale support. | specReplicasPath, statusReplicasPath
additionalPrinterColumns | spec.versions[] | NO | Custom kubectl get columns via JSONPath. | name: Phase, jsonPath: .status.phase
x-kubernetes-validations | schema properties | NO | CEL rules for cross-field validation (v1.25+). | rule: self.max >= self.min
conversion.strategy | spec.conversion | NO | None (no conversion) or Webhook (call conversion webhook). | Webhook

CRD YAML Examples

Full CRD — WebApp Resource CRD Definition
Complete production CRD with schema validation, CEL rules, status + scale subresources, and printer columns.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.apps.mycompany.io  # plural.group
spec:
  group: apps.mycompany.io
  scope: Namespaced
  names:
    plural: webapps
    singular: webapp
    kind: WebApp
    shortNames: [wa]
    categories: [all, mycompany]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
      scale:
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.readyReplicas
        labelSelectorPath: .status.selector
    additionalPrinterColumns:
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Ready
      type: integer
      jsonPath: .status.readyReplicas
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Image
      type: string
      jsonPath: .spec.image
      priority: 1  # Only with -o wide
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["image"]
            x-kubernetes-validations:
            - rule: self.maxReplicas >= self.minReplicas
              message: maxReplicas must be >= minReplicas
            - rule: self.replicas >= self.minReplicas && self.replicas <= self.maxReplicas
              message: replicas must be between min and max
            properties:
              image:
                type: string
              replicas:
                type: integer
                minimum: 0
                maximum: 100
                default: 1
              minReplicas:
                type: integer
                default: 1
              maxReplicas:
                type: integer
                default: 10
              port:
                type: integer
                minimum: 1
                maximum: 65535
                default: 8080
              env:
                type: array
                items:
                  type: object
                  required: ["name", "value"]
                  properties:
                    name:
                      type: string
                    value:
                      type: string
              ingress:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: false
                  host:
                    type: string
                  tlsEnabled:
                    type: boolean
                    default: true
          status:
            type: object
            properties:
              phase:
                type: string
                enum: [Pending, Running, Degraded, Failed]
              readyReplicas:
                type: integer
              selector:
                type: string
              conditions:
                type: array
                items:
                  type: object
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
Custom Resource Instance + RBAC CR + RBAC
Create a WebApp CR instance and define RBAC for controllers (full access) and developers (no delete).
# Create a WebApp custom resource instance
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-web-app
  namespace: production
  labels:
    team: frontend
spec:
  image: registry.mycompany.io/webapp:v2.1.0
  replicas: 3
  minReplicas: 2
  maxReplicas: 10
  port: 8080
  env:
  - name: LOG_LEVEL
    value: info
  ingress:
    enabled: true
    host: myapp.example.com
    tlsEnabled: true
---
# ClusterRole for the controller
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webapp-controller
rules:
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps/status"]  # status subresource
  verbs: ["get", "update", "patch"]
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps/finalizers"]
  verbs: ["update"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
---
# ClusterRole for developers — no delete
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webapp-developer
rules:
- apiGroups: ["apps.mycompany.io"]
  resources: ["webapps"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
Multi-Version CRD with Conversion Webhook Versioning
Two served versions (v1 and v1beta1) with a conversion webhook and deprecation warning on the old version.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.apps.mycompany.io
spec:
  group: apps.mycompany.io
  scope: Namespaced
  names:
    plural: webapps
    kind: WebApp
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: webapp-conversion-webhook
          namespace: webapp-system
          path: /convert
          port: 443
        caBundle: LS0t...
  versions:
  - name: v1
    served: true
    storage: true   # Storage version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              containerPort:  # Renamed from v1beta1 "port"
                type: integer
  - name: v1beta1
    served: true
    storage: false  # Not storage version
    deprecated: true
    deprecationWarning: "v1beta1 deprecated, migrate to v1"
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              port:             # Old field name
                type: integer
kubebuilder Controller — Reconciler Go Controller
Complete kubebuilder reconciler for the WebApp CRD — watches CR, creates Deployment, sets owner reference, updates status.
// controllers/webapp_controller.go
package controllers

import (
    "context"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    myv1 "mycompany.io/webapp-operator/api/v1"
)

type WebAppReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete

func (r *WebAppReconciler) Reconcile(
    ctx context.Context, req ctrl.Request,
) (ctrl.Result, error) {

    // 1. Fetch the WebApp CR
    webapp := &myv1.WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Define the desired Deployment
    replicas := int32(webapp.Spec.Replicas)
    desired := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      webapp.Name,
            Namespace: webapp.Namespace,
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{"app": webapp.Name},
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{"app": webapp.Name},
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "webapp",
                        Image: webapp.Spec.Image,
                    }},
                },
            },
        },
    }
    // Set WebApp as owner — GC when CR deleted
    if err := ctrl.SetControllerReference(webapp, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // 3. Create or Update
    existing := &appsv1.Deployment{}
    if err := r.Get(ctx, req.NamespacedName, existing); err != nil {
        // Not found yet (or unreadable): try to create it
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, err
        }
    } else {
        existing.Spec = desired.Spec
        if err := r.Update(ctx, existing); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 4. Update status (use Status().Update, not Update)
    webapp.Status.Phase = "Running"
    if err := r.Status().Update(ctx, webapp); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

func (r *WebAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&myv1.WebApp{}).
        Owns(&appsv1.Deployment{}).  // Watch child Deployments
        Complete(r)
}
CEL Validation Rules Validation (v1.25+)
Advanced CEL — cross-field constraints, immutability rules, and pattern validation without admission webhooks.
spec:
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-validations:
            - # Cross-field: max must be >= min
              rule: self.maxReplicas >= self.minReplicas
              message: maxReplicas must be >= minReplicas
            - # Immutability: minReplicas cannot decrease
              rule: oldSelf.minReplicas <= self.minReplicas
              message: minReplicas cannot be decreased
              fieldPath: .minReplicas
            - # CPU threshold must be reasonable
              rule: self.targetCPUPercent >= 10 && self.targetCPUPercent <= 95
              message: targetCPUPercent must be between 10 and 95
            properties:
              minReplicas:
                type: integer
                minimum: 1
              maxReplicas:
                type: integer
                maximum: 1000
              targetCPUPercent:
                type: integer
                default: 70
              scaleDownCooldown:
                type: string
                pattern: '^[0-9]+(s|m|h)$'
                default: "5m"
                x-kubernetes-validations:
                - rule: self.matches('^[0-9]+(s|m|h)$')
                  message: Must be like 30s, 5m, or 1h
Finalizer Pattern + kubectl CRD Commands Lifecycle & CLI
Finalizer for external cleanup and essential kubectl commands for CRD and CR management.
# Finalizer in a CR — prevents immediate deletion
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-app
  finalizers:
  - webapps.apps.mycompany.io/finalizer
spec:
  image: myapp:latest
  replicas: 3

---
# kubectl CRD commands cheat sheet

# Discover
kubectl get crds
kubectl api-resources --api-group=apps.mycompany.io
kubectl explain webapp.spec               # field docs
kubectl explain webapp.spec.ingress

# Manage instances
kubectl get webapps -A -o wide
kubectl get wa                            # shortName
kubectl describe webapp my-web-app
kubectl get webapp my-web-app -o jsonpath='{.status.phase}'

# Patch spec
kubectl patch webapp my-web-app \
  --type=merge -p '{"spec":{"replicas":5}}'

# Patch status subresource (controller pattern)
kubectl patch webapp my-web-app \
  --subresource=status --type=merge \
  -p '{"status":{"phase":"Running"}}'

# Scale (if scale subresource enabled)
kubectl scale webapp my-web-app --replicas=5

# Cleanup — WARNING: deletes ALL CRs!
kubectl delete crd webapps.apps.mycompany.io

# kubebuilder bootstrap
kubebuilder init --domain mycompany.io
kubebuilder create api --group apps --version v1 --kind WebApp
make generate && make manifests && make install
make run
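One lifecycle gotcha worth knowing: if the controller that owns a finalizer is gone, the CR stays in Terminating forever. A merge patch (resource name is illustrative) clears the finalizers so deletion can complete; whatever external cleanup the finalizer guarded will NOT run:

```yaml
# remove-finalizers.yaml: merge patch that clears all finalizers
metadata:
  finalizers: []
# Apply with:
#   kubectl patch webapp my-app --type=merge --patch-file remove-finalizers.yaml
```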

CRD Best Practices

01
Naming Convention
Use a Domain You Own
Always use a domain you control as the API group (mycompany.io). CRD name must be plural.group. Kind uses CamelCase. Version follows alpha→beta→stable. Avoid generic group names that may conflict with other operators in the cluster.
02
Enable Status Subresource
Separate Spec from Status
Always enable subresources.status: {} in every CRD. This prevents race conditions between users writing spec and controllers writing status. Controllers must use client.Status().Update(), not client.Update(), to write status fields only.
03
Strict Schema Validation
Validate at the Gate
Define a complete structural schema with required fields, types, and defaults. Use CEL rules for cross-field constraints. Avoid x-kubernetes-preserve-unknown-fields: true globally — it disables pruning and field validation, leading to garbage data in etcd.
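If one field really must carry arbitrary user data, x-kubernetes-preserve-unknown-fields can be scoped to that single subtree instead of the whole object, keeping the rest of the schema strict. A sketch (the config field is illustrative):

```yaml
properties:
  config:
    type: object
    x-kubernetes-preserve-unknown-fields: true  # only this subtree is kept verbatim
  image:
    type: string                                # everything else stays pruned and validated
```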
04
Conditions Pattern
Standardize Status Reporting
Use the standard Kubernetes conditions pattern in status: type, status (True/False/Unknown), reason (CamelCase), message (human readable), lastTransitionTime. This matches built-in resources and integrates with tooling like kubectl wait --for=condition=Ready.
05
Versioning Strategy
Never Remove Fields
Never remove fields from a served version — it breaks existing clients. Add new optional fields with defaults. When restructuring, add a new version with a conversion webhook. Mark old versions deprecated before removing. Keep at least one previous version during migration windows.
06
Owner References
Automatic Child Cleanup
Always set owner references on resources created by your controller using ctrl.SetControllerReference(). When the CR is deleted, Kubernetes garbage collects all owned child resources automatically. Add finalizers only when you need to clean up external resources before deletion.
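For reference, this is roughly what ctrl.SetControllerReference() produces on the child object; the uid value is illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  namespace: production
  ownerReferences:
  - apiVersion: apps.mycompany.io/v1
    kind: WebApp
    name: my-web-app
    uid: 3f2a9c1e-0000-0000-0000-000000000000  # illustrative UID of the owning CR
    controller: true             # marks the managing controller
    blockOwnerDeletion: true     # foreground deletion waits for this child
```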

Advanced CRD Patterns

🔄
Conversion Webhooks
Multi-Version Translation
When a CRD has multiple versions, a Conversion Webhook translates objects between versions on the fly. The API server calls your HTTPS webhook with a ConversionReview request. You convert and return the new version. The storage version is always used in etcd — all other versions are converted on read. Required for zero-downtime API evolution across versions.
ConversionReview · storage version · Zero downtime · HTTPS webhook
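The ConversionReview exchange looks roughly like this (uid and field values are illustrative); the webhook must echo request.uid in its response:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
request:
  uid: "705ab4f5-0000-0000-0000-000000000000"   # illustrative
  desiredAPIVersion: apps.mycompany.io/v1
  objects:
  - apiVersion: apps.mycompany.io/v1beta1
    kind: WebApp
    metadata: {name: my-app}
    spec: {image: myapp:latest, port: 8080}
---
# The webhook's reply: same uid, converted objects in the same order
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
response:
  uid: "705ab4f5-0000-0000-0000-000000000000"
  result: {status: Success}
  convertedObjects:
  - apiVersion: apps.mycompany.io/v1
    kind: WebApp
    metadata: {name: my-app}
    spec: {image: myapp:latest, containerPort: 8080}
```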
🏗
Kubebuilder
Operator Scaffold Framework
The official Go framework for building Kubernetes operators. Uses controller-runtime under the hood. Generates: CRD manifests from Go struct tags, RBAC ClusterRole markers, Webhook boilerplate, Makefile targets (generate, manifests, install, deploy). Marker annotations (// +kubebuilder:) in Go code drive all generation automatically.
controller-runtime · code-gen markers · Scaffold · Go
🛠
Operator SDK
Multi-Language Operator Framework
Red Hat's operator framework. Supports Go (uses Kubebuilder internally), Ansible (reconcile logic in Ansible playbooks — no Go needed), and Helm (wrap Helm charts as Operators). operator-sdk scorecard for testing bundles. OLM (Operator Lifecycle Manager) for packaging and distribution on OperatorHub.io.
Go / Ansible / Helm · OLM · OperatorHub
🌡
Conditions Pattern
Standard Status Reporting
Use the Kubernetes Conditions pattern in all CRD status fields. Each condition has: Type (Ready, Available, Progressing), Status (True/False/Unknown), Reason (CamelCase machine-readable), Message (human-readable), LastTransitionTime. Enables kubectl wait --for=condition=Ready, standard tooling integration, and clear operational visibility.
kubectl wait · True/False/Unknown · metav1.Condition
🗺
Server-Side Apply
Field Ownership Tracking
Server-Side Apply (SSA) tracks field ownership per manager. Controllers and users can co-own different fields of a CR without conflicts. Use Apply with field manager names. Enables: partial updates without reading the full object, merge conflict detection, and safe multi-actor management of CR fields across teams.
fieldManager · SSA · Partial update
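A sketch of SSA from the CLI, assuming the WebApp type used in this section; each manager applies only the fields it owns:

```yaml
# replicas-only.yaml: the "autoscaler" manager owns just spec.replicas
apiVersion: apps.mycompany.io/v1
kind: WebApp
metadata:
  name: my-web-app
  namespace: production
spec:
  replicas: 5
# Apply server-side under a distinct field manager:
#   kubectl apply --server-side --field-manager=autoscaler -f replicas-only.yaml
# Another manager that tries to change spec.replicas gets a conflict
# unless it passes --force-conflicts.
```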
🔍
Informers and Watch Cache
Efficient Event Processing
Controllers use Informers (client-go cache layer) to efficiently watch CRs. Informer maintains a local cache and delivers Add/Update/Delete events to handlers. Work queue decouples event receipt from processing with rate limiting and exponential backoff on requeues. Never call the API server directly from reconcile — always use the local cache for reads.
Informer cache · WorkQueue · Rate limiting
📡
Owns and Watches
Cascade Reconciliation
Controllers can watch secondary resources and trigger reconciliation of the parent CR. builder.Owns(&appsv1.Deployment{}) watches Deployments and requeues the owning CR on changes. builder.Watches() with custom handler for more complex trigger logic. Critical for reactivity: when a child resource changes state, the parent CR reconciles to restore desired state.
Owns() · Watches() · Cascade trigger
🧪
envtest
Integration Testing for Controllers
controller-runtime's envtest runs a real API server and etcd binary for integration tests — no cluster needed. Register your CRDs, create objects, run the reconciler, assert the resulting state. Much more reliable than unit tests with mocks. Kubebuilder generates a suite_test.go with envtest setup included. Run with: go test ./... in your operator project.
Real API server · No cluster needed · Integration test

Advanced CRD YAML Examples

Multi-Version CRD with Conversion Webhook Versioning
CRD with v1alpha1 (deprecated, still served) and v1 (storage version), connected to a conversion webhook for seamless translation between versions.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.storage.mycompany.io
spec:
  group: storage.mycompany.io
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          name: database-operator-webhook
          namespace: operators
          path: /convert
  versions:
  - name: v1alpha1
    served: true   # still served but deprecated
    storage: false
    deprecated: true
    deprecationWarning: "v1alpha1 deprecated, migrate to v1"
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              dbType:        # old field name
                type: string
  - name: v1
    served: true
    storage: true  # current storage version
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine"]
            properties:
              engine:        # renamed from dbType
                type: string
                enum: ["postgres", "mysql", "redis"]
              version:
                type: string
              replicas:
                type: integer
                default: 1
                minimum: 1
CRD with CEL Validation Rules CEL / v1.25+
Advanced cross-field validation using Common Expression Language — immutability enforcement, range checks, and format rules without any webhook.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clusters.infra.mycompany.io
spec:
  group: infra.mycompany.io
  scope: Namespaced
  names:
    plural: clusters
    kind: Cluster
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-validations:
        - rule: "self.spec.maxNodes >= self.spec.minNodes"
          message: "maxNodes must be >= minNodes"
        - rule: "!(self.spec.highAvailability && self.spec.minNodes < 3)"
          message: "HA clusters require at least 3 nodes"
        properties:
          spec:
            type: object
            required: ["region", "minNodes", "maxNodes"]
            x-kubernetes-validations:
            - rule: "self.region == oldSelf.region"
              message: "region is immutable after creation"
            properties:
              region:
                type: string
                x-kubernetes-validations:
                - rule: "self.matches('^[a-z]+-[a-z]+-[0-9]+$')"
                  message: "region must match pattern like us-east-1"
              minNodes:
                type: integer
                minimum: 1
              maxNodes:
                type: integer
                maximum: 1000
              highAvailability:
                type: boolean
                default: false
              nodeType:
                type: string
                enum: ["standard", "memory", "compute"]
                default: standard
Kubebuilder Go Types and Markers kubebuilder / Go
Go struct with Kubebuilder marker annotations that auto-generate CRD YAML, RBAC ClusterRoles, and webhook configuration at build time.
// types.go — markers drive CRD YAML generation

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.readyReplicas
// +kubebuilder:resource:scope=Namespaced,shortName=wa,categories=all
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`
type WebApp struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec   WebAppSpec   `json:"spec,omitempty"`
    Status WebAppStatus `json:"status,omitempty"`
}

type WebAppSpec struct {
    // +kubebuilder:validation:Required
    // +kubebuilder:validation:MinLength=1
    Image string `json:"image"`

    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=50
    // +kubebuilder:default=1
    Replicas int32 `json:"replicas,omitempty"`

    // +kubebuilder:validation:Enum=RollingUpdate;Recreate
    // +kubebuilder:default=RollingUpdate
    Strategy string `json:"strategy,omitempty"`
}

type WebAppStatus struct {
    ReadyReplicas int32  `json:"readyReplicas,omitempty"`
    Phase         string `json:"phase,omitempty"`
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// Reconciler with RBAC markers — make manifests generates ClusterRole
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps.mycompany.io,resources=webapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete

func (r *WebAppReconciler) Reconcile(ctx context.Context,
    req reconcile.Request) (reconcile.Result, error) {
    var webapp WebApp // the WebApp type defined above
    if err := r.Get(ctx, req.NamespacedName, &webapp); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }
    // reconcile logic here...
    // Write status via status subresource (not regular Update)
    webapp.Status.Phase = "Running"
    if err := r.Status().Update(ctx, &webapp); err != nil {
        return reconcile.Result{}, err
    }
    return reconcile.Result{RequeueAfter: time.Minute * 5}, nil
}
CRD Status with Conditions Pattern Status Pattern
Full CRD schema with standard Kubernetes conditions in status — enables kubectl wait and integration with monitoring tooling.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: pipelines.ci.mycompany.io
spec:
  group: ci.mycompany.io
  scope: Namespaced
  names:
    plural: pipelines
    kind: Pipeline
    shortNames: [pl]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Ready
      type: string
      jsonPath: .status.conditions[?(@.type=="Ready")].status
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["repository"]
            properties:
              repository:
                type: string
              branch:
                type: string
                default: main
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Pending","Running","Succeeded","Failed"]
              conditions:
                type: array
                items:
                  type: object
                  required: ["type", "status"]
                  properties:
                    type:
                      type: string
                    status:
                      type: string
                      enum: ["True","False","Unknown"]
                    reason:
                      type: string
                    message:
                      type: string
                    lastTransitionTime:
                      type: string
                      format: date-time
---
# Wait for condition to become True
kubectl wait pipeline my-pipeline \
  --for=condition=Ready=True \
  --timeout=120s
Cluster-Scoped CRD with RBAC Cluster-Scoped
Cluster-scoped resource (like Nodes, PVs) with controller ClusterRole and a viewer ClusterRole for team members.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: cloudproviders.infra.mycompany.io
spec:
  group: infra.mycompany.io
  scope: Cluster  # NOT Namespaced
  names:
    plural: cloudproviders
    kind: CloudProvider
    shortNames: [cp]
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["provider", "region"]
            properties:
              provider:
                type: string
                enum: ["aws", "gcp", "azure"]
              region:
                type: string
---
# Controller ClusterRole (cluster-scoped needs ClusterRole)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudprovider-controller
rules:
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders"]
  verbs: ["get","list","watch","update","patch"]
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders/status"]
  verbs: ["get","update","patch"]
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders/finalizers"]
  verbs: ["update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloudprovider-viewer
rules:
- apiGroups: ["infra.mycompany.io"]
  resources: ["cloudproviders"]
  verbs: ["get","list","watch"]
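The viewer ClusterRole above still needs a binding before team members can actually use it. A minimal sketch — the `platform-team` group name is an illustrative assumption and would come from your identity provider:

```yaml
# Bind the viewer role to a group (group name is illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloudprovider-viewers
subjects:
- kind: Group
  name: platform-team            # assumed group from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cloudprovider-viewer
  apiGroup: rbac.authorization.k8s.io
```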
Kubebuilder Full Dev Workflow Operator Development
Complete Kubebuilder workflow from project init to cluster deployment — including code generation, testing, image build, and OLM bundle packaging.
# ── PROJECT INIT ─────────────────────────────────────────────
mkdir webapp-operator && cd webapp-operator
kubebuilder init \
  --domain mycompany.io \
  --repo github.com/mycompany/webapp-operator

# ── CREATE API (generates types.go + controller skeleton) ────
kubebuilder create api \
  --group apps --version v1 --kind WebApp \
  --resource --controller

# ── CREATE WEBHOOK ───────────────────────────────────────────
# --defaulting generates a MutatingWebhook, --programmatic-validation a ValidatingWebhook
kubebuilder create webhook \
  --group apps --version v1 --kind WebApp \
  --defaulting --programmatic-validation

# ── GENERATE ─────────────────────────────────────────────────
make generate    # generate DeepCopyObject methods
make manifests   # generate CRD YAML + RBAC from markers

# ── INSTALL CRD into cluster ─────────────────────────────────
make install     # kubectl apply -f config/crd/bases/

# ── RUN LOCALLY (out-of-cluster for development) ─────────────
make run         # runs controller process locally

# ── INTEGRATION TESTS with envtest ───────────────────────────
make test        # downloads envtest binaries, runs tests
go test ./... -v -run TestReconcile

# ── BUILD AND PUSH IMAGE ─────────────────────────────────────
make docker-build docker-push IMG=registry.io/webapp-op:v1.0.0

# ── DEPLOY TO CLUSTER ────────────────────────────────────────
make deploy IMG=registry.io/webapp-op:v1.0.0

# ── OLM BUNDLE for OperatorHub distribution ──────────────────
make bundle IMG=registry.io/webapp-op:v1.0.0
make bundle-build BUNDLE_IMG=registry.io/webapp-op-bundle:v1.0.0
operator-sdk scorecard bundle/  # validate bundle quality

# ── USEFUL RUNTIME COMMANDS ──────────────────────────────────
kubectl get crds
kubectl explain webapp.spec --api-version=apps.mycompany.io/v1
kubectl get webapps -A -o wide
kubectl scale webapp my-app --replicas=5       # requires the scale subresource on the CRD
kubectl wait webapp my-app --for=condition=Ready --timeout=60s
kubectl describe crd webapps.apps.mycompany.io

Extension Mechanism Comparison

// When to use CRDs vs API Aggregation vs other patterns — pick the right extension point.

Mechanism                     | Storage                 | Validation            | Custom Logic        | kubectl Support | Best For
CRD                           | etcd (via API server)   | OpenAPI v3 + CEL      | External controller | FULL            | 90% of cases — domain objects, Operators, config
API Aggregation (AA)          | Own backend (any store) | Custom — full control | Built-in to server  | FULL            | Custom storage, non-standard REST (metrics-server)
Built-in Resource (upstream)  | etcd                    | Hardcoded             | Core controllers    | FULL            | Contributing new features to Kubernetes itself
ConfigMap (workaround)        | etcd                    | None                  | App reads directly  | LIMITED         | Simple config only — avoid for structured domain data
Annotations / Labels          | etcd (on existing obj)  | None                  | Controller reads    | LIMITED         | Small metadata additions on existing resources
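The comparison above lists "OpenAPI v3 + CEL" as CRD validation: beyond the structural schema, CRDs can embed CEL expressions via `x-kubernetes-validations` (GA since v1.29). A hedged sketch — the `minReplicas`/`maxReplicas` fields are illustrative, not from the CRDs above:

```yaml
# CEL validation rules embedded in a CRD schema (fields are illustrative)
spec:
  type: object
  properties:
    minReplicas:
      type: integer
    maxReplicas:
      type: integer
  x-kubernetes-validations:
  - rule: "self.minReplicas <= self.maxReplicas"
    message: "minReplicas must not exceed maxReplicas"
```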

22 / Component Commands
📖 Official Docs: kubectl Reference Cheat Sheet

kubectl Commands by Component

// Every essential kubectl command organized by Kubernetes component — pods, deployments, services, configmaps, secrets, nodes, namespaces, and more.

📦 Pods CORE
The smallest deployable unit in Kubernetes. A Pod runs one or more containers sharing network and storage.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get pods                               # list pods in current ns
kubectl get pods -A                            # all namespaces
kubectl get pods -o wide                       # show node, IP
kubectl get pods -l app=nginx                  # filter by label
kubectl get pods --field-selector=status.phase=Running
kubectl get pod my-pod -o yaml                 # full YAML spec
kubectl get pod my-pod -o jsonpath='{.status.podIP}'
kubectl describe pod my-pod                    # detailed info + events

# ── CREATE & DELETE ───────────────────────────────────────────
kubectl run nginx --image=nginx                # quick pod creation
kubectl run nginx --image=nginx --dry-run=client -o yaml  # generate YAML
kubectl run tmp --image=busybox --rm -it -- sh # temp interactive pod
kubectl delete pod my-pod                      # delete a pod
kubectl delete pod my-pod --grace-period=0 --force  # force delete
kubectl delete pods -l app=old-app             # delete by label

# ── LOGS & DEBUG ──────────────────────────────────────────────
kubectl logs my-pod                            # view logs
kubectl logs my-pod -c sidecar                 # specific container
kubectl logs my-pod -f                         # follow / stream
kubectl logs my-pod --previous                 # previous crashed container
kubectl logs my-pod --since=1h                 # last hour
kubectl logs my-pod --tail=100                 # last 100 lines
kubectl logs -l app=nginx --all-containers     # logs by label

# ── EXEC & INTERACT ───────────────────────────────────────────
kubectl exec my-pod -- ls /app                 # run command
kubectl exec -it my-pod -- /bin/sh             # interactive shell
kubectl exec -it my-pod -c sidecar -- bash     # specific container
kubectl cp my-pod:/var/log/app.log ./app.log   # copy from pod
kubectl cp ./config.yaml my-pod:/etc/config/   # copy to pod
kubectl port-forward my-pod 8080:80            # forward local port
kubectl debug my-pod -it --image=busybox       # ephemeral debug container
kubectl top pod my-pod --containers            # resource usage
🚀 Deployments WORKLOAD
Manages ReplicaSets and provides declarative updates, rolling deployments, and rollbacks for stateless applications.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get deployments                        # list deployments
kubectl get deploy -A                          # all namespaces
kubectl get deploy my-app -o yaml              # full spec
kubectl describe deploy my-app                 # detail + events

# ── CREATE & UPDATE ───────────────────────────────────────────
kubectl create deploy my-app --image=nginx     # imperative create
kubectl create deploy my-app --image=nginx --replicas=3 --dry-run=client -o yaml
kubectl apply -f deployment.yaml               # declarative apply
kubectl set image deploy/my-app app=nginx:1.25 # update image
kubectl scale deploy my-app --replicas=5       # scale up/down
kubectl autoscale deploy my-app --min=2 --max=10 --cpu-percent=80
kubectl patch deploy my-app -p '{"spec":{"replicas":3}}'

# ── ROLLOUTS ──────────────────────────────────────────────────
kubectl rollout status deploy/my-app           # watch rollout
kubectl rollout history deploy/my-app          # revision history
kubectl rollout history deploy/my-app --revision=3  # specific revision
kubectl rollout undo deploy/my-app             # rollback to previous
kubectl rollout undo deploy/my-app --to-revision=2  # rollback to specific
kubectl rollout restart deploy/my-app          # rolling restart
kubectl rollout pause deploy/my-app            # pause rollout
kubectl rollout resume deploy/my-app           # resume rollout

# ── DELETE ────────────────────────────────────────────────────
kubectl delete deploy my-app                   # delete deployment
kubectl delete -f deployment.yaml              # delete from file
🌐 Services NETWORKING
Stable network endpoint for accessing a set of Pods. Types: ClusterIP, NodePort, LoadBalancer, ExternalName.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get services                           # list services
kubectl get svc -A                             # all namespaces
kubectl get svc my-svc -o yaml                 # full spec
kubectl describe svc my-svc                    # detail + endpoints
kubectl get endpoints my-svc                   # backing pod IPs

# ── CREATE ────────────────────────────────────────────────────
kubectl expose deploy my-app --port=80 --target-port=8080  # ClusterIP
kubectl expose deploy my-app --port=80 --type=NodePort
kubectl expose deploy my-app --port=80 --type=LoadBalancer
kubectl create svc clusterip my-svc --tcp=80:8080 --dry-run=client -o yaml
kubectl create svc nodeport my-svc --tcp=80:8080 --node-port=30080

# ── ACCESS & DEBUG ────────────────────────────────────────────
kubectl port-forward svc/my-svc 8080:80        # local access
kubectl run curl --image=curlimages/curl --rm -it -- curl my-svc:80  # test from cluster

# ── DELETE ────────────────────────────────────────────────────
kubectl delete svc my-svc
📝 ConfigMaps CONFIG
Store non-confidential configuration data as key-value pairs. Consumed as env vars, command args, or mounted config files.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get configmaps                         # list configmaps
kubectl get cm -A                              # all namespaces
kubectl get cm my-config -o yaml               # view data
kubectl describe cm my-config

# ── CREATE ────────────────────────────────────────────────────
kubectl create configmap my-config --from-literal=key1=val1 --from-literal=key2=val2
kubectl create cm my-config --from-file=config.properties
kubectl create cm my-config --from-file=app-config=./config.yaml
kubectl create cm my-config --from-env-file=.env
kubectl create cm my-config --from-literal=key=val --dry-run=client -o yaml

# ── UPDATE & DELETE ────────────────────────────────────────────
kubectl edit cm my-config                      # edit in $EDITOR
kubectl patch cm my-config -p '{"data":{"key1":"newval"}}'
kubectl delete cm my-config
🔐 Secrets CONFIG
Store sensitive data (passwords, tokens, keys). Base64-encoded by default, not encrypted; pair with RBAC and encryption at rest for real security.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get secrets                            # list secrets
kubectl get secret my-secret -o yaml           # view (base64 encoded)
kubectl get secret my-secret -o jsonpath='{.data.password}' | base64 -d  # decode
kubectl describe secret my-secret              # metadata only

# ── CREATE ────────────────────────────────────────────────────
kubectl create secret generic my-secret --from-literal=user=admin --from-literal=pass=s3cret
kubectl create secret generic my-secret --from-file=ssh-key=~/.ssh/id_rsa
kubectl create secret docker-registry regcred \
  --docker-server=registry.io --docker-username=user \
  --docker-password=pass --docker-email=user@example.com
kubectl create secret tls my-tls --cert=tls.crt --key=tls.key
kubectl create secret generic my-secret --from-literal=key=val --dry-run=client -o yaml

# ── UPDATE & DELETE ────────────────────────────────────────────
kubectl edit secret my-secret
kubectl delete secret my-secret
📁 Namespaces CORE
Virtual clusters within a physical cluster. Provide scope for names, resource quotas, and access control boundaries.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get namespaces                         # list all namespaces
kubectl get ns                                 # shorthand
kubectl describe ns my-namespace

# ── CREATE & SWITCH ───────────────────────────────────────────
kubectl create namespace staging
kubectl create ns staging --dry-run=client -o yaml
kubectl config set-context --current --namespace=staging  # set default

# ── DELETE ────────────────────────────────────────────────────
kubectl delete ns staging                      # deletes ALL resources in ns
🖥️ Nodes CLUSTER
Worker machines (physical or virtual) that run Pods. Managed by the control plane via kubelet.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get nodes                              # list nodes
kubectl get nodes -o wide                      # IPs, OS, kernel, runtime
kubectl describe node node-1                   # capacity, conditions, pods
kubectl top nodes                              # CPU & memory usage
kubectl get node node-1 -o jsonpath='{.status.allocatable}'

# ── LABELS & TAINTS ───────────────────────────────────────────
kubectl label node node-1 disktype=ssd         # add label
kubectl label node node-1 disktype-             # remove label
kubectl taint nodes node-1 dedicated=gpu:NoSchedule
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-  # remove taint

# ── MAINTENANCE ───────────────────────────────────────────────
kubectl cordon node-1                          # mark unschedulable
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
kubectl uncordon node-1                        # re-enable scheduling
🔁 ReplicaSets WORKLOAD
Ensures a specified number of pod replicas are running. Usually managed by Deployments — rarely created directly.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get replicasets                        # list replicasets
kubectl get rs -A                              # all namespaces
kubectl get rs my-rs -o yaml                   # full spec
kubectl describe rs my-rs                      # detail + events

# ── SCALE & DELETE ────────────────────────────────────────────
kubectl scale rs my-rs --replicas=5            # scale (prefer deploy)
kubectl delete rs my-rs                        # delete replicaset
kubectl delete rs my-rs --cascade=orphan       # keep pods running
💾 StatefulSets WORKLOAD
Manages stateful applications with stable network identities, persistent storage, and ordered deployment/scaling.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get statefulsets                       # list statefulsets
kubectl get sts -A                             # all namespaces
kubectl get sts my-db -o yaml                  # full spec
kubectl describe sts my-db

# ── SCALE & ROLLOUT ───────────────────────────────────────────
kubectl scale sts my-db --replicas=5
kubectl rollout status sts/my-db
kubectl rollout history sts/my-db
kubectl rollout undo sts/my-db
kubectl rollout restart sts/my-db
kubectl patch sts my-db -p '{"spec":{"replicas":3}}'

# ── DELETE ────────────────────────────────────────────────────
kubectl delete sts my-db                       # deletes pods too
kubectl delete sts my-db --cascade=orphan      # keep pods
🔧 DaemonSets WORKLOAD
Ensures a copy of a Pod runs on all (or selected) nodes. Used for log collectors, monitoring agents, and network plugins.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get daemonsets                         # list daemonsets
kubectl get ds -A                              # all namespaces
kubectl get ds my-agent -o yaml
kubectl describe ds my-agent

# ── ROLLOUT ───────────────────────────────────────────────────
kubectl rollout status ds/my-agent
kubectl rollout history ds/my-agent
kubectl rollout undo ds/my-agent
kubectl rollout restart ds/my-agent

# ── DELETE ────────────────────────────────────────────────────
kubectl delete ds my-agent
⏱️ Jobs & CronJobs WORKLOAD
Jobs run tasks to completion. CronJobs schedule Jobs on a recurring cron-based schedule.
# ── JOBS ──────────────────────────────────────────────────────
kubectl get jobs                               # list jobs
kubectl get job my-job -o yaml
kubectl describe job my-job
kubectl create job my-job --image=busybox -- echo "hello"
kubectl create job my-job --from=cronjob/my-cron  # manual trigger
kubectl logs job/my-job                        # view job logs
kubectl delete job my-job

# ── CRONJOBS ──────────────────────────────────────────────────
kubectl get cronjobs                           # list cronjobs
kubectl get cj -A
kubectl get cj my-cron -o yaml
kubectl describe cj my-cron
kubectl create cronjob my-cron --image=busybox --schedule="*/5 * * * *" -- echo "tick"
kubectl patch cj my-cron -p '{"spec":{"suspend":true}}'   # suspend
kubectl patch cj my-cron -p '{"spec":{"suspend":false}}'  # resume
kubectl delete cj my-cron
🔀 Ingress NETWORKING
HTTP/HTTPS routing rules that expose Services externally. Requires an Ingress Controller (nginx, traefik, etc.).
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get ingress                            # list ingress resources
kubectl get ing -A
kubectl get ing my-ingress -o yaml
kubectl describe ing my-ingress

# ── CREATE & DELETE ───────────────────────────────────────────
kubectl create ingress my-ingress \
  --rule="myapp.example.com/=my-svc:80" \
  --annotation nginx.ingress.kubernetes.io/rewrite-target=/
kubectl create ingress my-ingress \
  --rule="myapp.example.com/*=my-svc:80,tls=my-tls-secret"
kubectl delete ing my-ingress
💿 PersistentVolumes & Claims STORAGE
PVs are cluster-level storage resources. PVCs are user requests for storage that bind to PVs. StorageClasses enable dynamic provisioning.
# ── PERSISTENT VOLUMES ────────────────────────────────────────
kubectl get pv                                 # list persistent volumes
kubectl get pv my-pv -o yaml
kubectl describe pv my-pv

# ── PERSISTENT VOLUME CLAIMS ──────────────────────────────────
kubectl get pvc                                # list claims
kubectl get pvc -A
kubectl get pvc my-claim -o yaml
kubectl describe pvc my-claim
kubectl delete pvc my-claim

# ── STORAGE CLASSES ───────────────────────────────────────────
kubectl get storageclass                       # list storage classes
kubectl get sc
kubectl describe sc standard
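PVCs have no imperative `kubectl create` shortcut; they are applied from YAML. A minimal dynamically-provisioned claim — the `standard` StorageClass name matches the example above but depends on what your cluster actually offers:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # must exist in your cluster
  resources:
    requests:
      storage: 10Gi
```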
🛡️ RBAC SECURITY
Role-Based Access Control — Roles, ClusterRoles, RoleBindings, ClusterRoleBindings, and ServiceAccounts.
# ── ROLES & CLUSTERROLES ──────────────────────────────────────
kubectl get roles -A                           # list roles
kubectl get clusterroles                       # list cluster roles
kubectl describe role my-role -n my-ns
kubectl describe clusterrole admin
kubectl create role pod-reader --verb=get,list,watch --resource=pods
kubectl create clusterrole node-reader --verb=get,list --resource=nodes

# ── BINDINGS ──────────────────────────────────────────────────
kubectl get rolebindings -A
kubectl get clusterrolebindings
kubectl create rolebinding my-rb --role=pod-reader --user=jane -n my-ns
kubectl create clusterrolebinding my-crb --clusterrole=node-reader --user=jane

# ── SERVICE ACCOUNTS ──────────────────────────────────────────
kubectl get serviceaccounts                    # list service accounts
kubectl get sa -A
kubectl create sa my-sa
kubectl describe sa my-sa
kubectl create token my-sa                     # generate token (v1.24+)

# ── AUTH CHECK ────────────────────────────────────────────────
kubectl auth can-i create pods                 # check own permissions
kubectl auth can-i get pods --as=jane          # impersonate user
kubectl auth can-i '*' '*' --as=system:serviceaccount:default:my-sa
kubectl auth whoami                            # current identity (v1.27+)
🔒 NetworkPolicies NETWORKING
Control traffic flow between Pods at the IP/port level. Requires a CNI plugin that supports NetworkPolicy (Calico, Cilium, etc.).
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get networkpolicies                    # list network policies
kubectl get netpol -A
kubectl get netpol my-policy -o yaml
kubectl describe netpol my-policy

# ── DELETE ────────────────────────────────────────────────────
kubectl delete netpol my-policy
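There is no imperative create command for NetworkPolicies; they are applied from YAML. A common starting point is a default-deny-ingress policy for a namespace — a sketch, with an illustrative namespace name:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-ns               # illustrative namespace
spec:
  podSelector: {}                # empty selector = every pod in the namespace
  policyTypes: ["Ingress"]       # no ingress rules listed, so all ingress is denied
```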
📊 ResourceQuotas & LimitRanges ADMIN
ResourceQuotas limit total resource consumption per namespace. LimitRanges set default/min/max constraints per Pod or Container.
# ── RESOURCE QUOTAS ───────────────────────────────────────────
kubectl get resourcequotas                     # list quotas
kubectl get quota -A
kubectl describe quota my-quota
kubectl create quota my-quota --hard=pods=10,requests.cpu=4,requests.memory=8Gi

# ── LIMIT RANGES ──────────────────────────────────────────────
kubectl get limitranges
kubectl get limits -A
kubectl describe limits my-limits
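LimitRanges also lack an imperative create command. A sketch setting per-container defaults and caps — the values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-limits
spec:
  limits:
  - type: Container
    default:             # applied when a container sets no limits
      cpu: 500m
      memory: 256Mi
    defaultRequest:      # applied when a container sets no requests
      cpu: 100m
      memory: 128Mi
    max:                 # hard ceiling per container
      cpu: "2"
      memory: 1Gi
```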
📈 HPA & Autoscaling SCALING
HorizontalPodAutoscaler automatically scales workloads based on CPU, memory, or custom metrics.
# ── LIST & INSPECT ────────────────────────────────────────────
kubectl get hpa                                # list autoscalers
kubectl get hpa -A
kubectl describe hpa my-hpa

# ── CREATE & MANAGE ───────────────────────────────────────────
kubectl autoscale deploy my-app --min=2 --max=10 --cpu-percent=80
kubectl patch hpa my-hpa -p '{"spec":{"maxReplicas":20}}'
kubectl delete hpa my-hpa
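`kubectl autoscale` only covers CPU; for memory or multiple metrics, apply an `autoscaling/v2` manifest. A sketch with illustrative target values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```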