freeleaps-ops/docs/PVC_Deep_Dive_Guide.md
2025-09-04 00:58:59 -07:00

PVC Deep Dive Guide: Understanding Persistent Storage in Kubernetes

🎯 Overview

This guide explains Persistent Volume Claims (PVCs) in detail, why they're essential, and how your current Kubernetes setup uses them. PVCs are crucial for applications that need to store data that survives pod restarts, crashes, or migrations.


📊 How PVCs Work: Visual Explanation

🔄 PVC Lifecycle Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                              PVC LIFECYCLE                                   │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   DEVELOPER │  │   PVC       │  │   PV        │  │   STORAGE   │      │
│  │   Creates   │  │   Requests  │  │   Provides  │  │   Backend   │      │
│  │   PVC       │  │   Storage    │  │   Storage   │  │   (Azure)   │      │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘      │
│         │                │                │                │                │
│         │ 1. Create PVC  │                │                │                │
│         │───────────────▶│                │                │                │
│         │                │ 2. Find PV     │                │                │
│         │                │───────────────▶│                │                │
│         │                │                │ 3. Provision   │                │
│         │                │                │───────────────▶│                │
│         │                │                │                │ 4. Create Disk │
│         │                │                │                │◀───────────────│
│         │                │                │ 5. Bind PV    │                │
│         │                │                │◀───────────────│                │
│         │                │ 6. Bind PVC   │                │                │
│         │                │◀───────────────│                │                │
│         │ 7. Ready       │                │                │                │
│         │◀───────────────│                │                │                │
└─────────────────────────────────────────────────────────────────────────────┘
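
The flow above corresponds to dynamic provisioning: a PVC names a StorageClass, and that class's provisioner creates the disk and the PV on demand. The following is a minimal sketch of such a pair, assuming the Azure Disk CSI driver (`disk.csi.azure.com`) is installed. The class name `azure-disk-std-lrs` appears in this guide, but the parameters shown here are assumptions rather than the cluster's actual definition, and `example-data` is a hypothetical claim name.

```yaml
# Assumed definition of the storage class; the real one in the cluster may differ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-std-lrs
provisioner: disk.csi.azure.com          # Azure Disk CSI driver
parameters:
  skuName: Standard_LRS                  # Standard HDD, locally redundant
reclaimPolicy: Delete                    # Disk is removed once the claim is deleted
volumeBindingMode: WaitForFirstConsumer  # Provision only when a pod is scheduled
allowVolumeExpansion: true               # Allows growing the PVC later
---
# Step 1 in the diagram: the developer creates the claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data                     # Hypothetical name
spec:
  storageClassName: azure-disk-std-lrs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```

With `WaitForFirstConsumer`, the claim stays `Pending` until a pod that uses it is scheduled, so a pending PVC with no consumer is expected rather than an error.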

🏗️ Storage Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              STORAGE ARCHITECTURE                           │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                        KUBERNETES CLUSTER                           │   │
│  │                                                                     │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │   │
│  │  │   POD 1     │  │   POD 2     │  │   POD 3     │  │   POD 4     │   │   │
│  │  │             │  │             │  │             │  │             │   │   │
│  │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │   │   │
│  │  │ │ Volume  │ │  │ │ Volume  │ │  │ │ Volume  │ │  │ │ Volume  │ │   │   │
│  │  │ │ Mount   │ │  │ │ Mount   │ │  │ │ Mount   │ │  │ │ Mount   │ │   │   │
│  │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │   │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │   │
│  │         │                │                │                │            │   │
│  │         └────────────────┼────────────────┼────────────────┘            │   │
│  │                          │                │                            │   │
│  │  ┌─────────────────────────────────────────────────────────────────────┐   │   │
│  │  │                        PVCs                                          │   │   │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │   │   │
│  │  │  │ PVC: gitea  │  │ PVC: mongo  │  │ PVC: logs   │  │ PVC: jenkins│   │   │   │
│  │  │  │ 15Gi        │  │ 8Gi         │  │ 1Gi         │  │ 50Gi        │   │   │   │
│  │  │  │ RWO         │  │ RWO         │  │ RWO         │  │ RWO         │   │   │   │
│  │  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │   │   │
│  │  └─────────────────────────────────────────────────────────────────────┘   │   │
│  │                                    │                                      │   │
│  │  ┌─────────────────────────────────────────────────────────────────────┐   │   │
│  │  │                        PVs                                          │   │   │
│  │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │   │   │
│  │  │  │ PV: gitea   │  │ PV: mongo   │  │ PV: logs    │  │ PV: jenkins │   │   │   │
│  │  │  │ 15Gi        │  │ 8Gi         │  │ 1Gi         │  │ 50Gi        │   │   │   │
│  │  │  │ azure-disk  │  │ azure-disk  │  │ azure-disk  │  │ azure-blob  │   │   │   │
│  │  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │   │   │
│  │  └─────────────────────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                        │
│                                    ▼                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    AZURE STORAGE BACKEND                            │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │   │
│  │  │ Managed Disk│  │ Managed Disk│  │ Managed Disk│  │ Blob Storage│   │   │
│  │  │ 15Gi HDD    │  │ 8Gi SSD     │  │ 1Gi SSD     │  │ 50Gi        │   │   │
│  │  │ Standard    │  │ Premium     │  │ Standard    │  │ Standard    │   │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

🤔 Why Stateful Pods Need a PVC: The Data Persistence Problem

Without PVC: Data Loss Scenario

┌─────────────────────────────────────────────────────────────────────────────┐
│                              WITHOUT PVC (BAD)                              │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   POD 1     │  │   POD 2     │  │   POD 3     │  │   POD 4     │      │
│  │ nginx:latest│  │ nginx:latest│  │ nginx:latest│  │ nginx:latest│      │
│  │             │  │             │  │             │  │             │      │
│  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │      │
│  │ │ /tmp    │ │  │ │ /tmp    │ │  │ │ /tmp    │ │  │ │ /tmp    │ │      │
│  │ │ (temp)  │ │  │ │ (temp)  │ │  │ │ (temp)  │ │  │ │ (temp)  │ │      │
│  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │      │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘      │
│                                                                             │
│  🔄 Pod Restart/Delete → ❌ ALL DATA LOST                                   │
│                                                                             │
│  ❌ User uploads gone                                                       │
│  ❌ Database files gone                                                     │
│  ❌ Configuration gone                                                      │
│  ❌ Logs gone                                                               │
└─────────────────────────────────────────────────────────────────────────────┘

With PVC: Data Persistence

┌─────────────────────────────────────────────────────────────────────────────┐
│                              WITH PVC (GOOD)                               │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   POD 1     │  │   POD 2     │  │   POD 3     │  │   POD 4     │      │
│  │ nginx:latest│  │ nginx:latest│  │ nginx:latest│  │ nginx:latest│      │
│  │             │  │             │  │             │  │             │      │
│  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │  │ ┌─────────┐ │      │
│  │ │ /data   │ │  │ │ /data   │ │  │ │ /data   │ │  │ │ /data   │ │      │
│  │ │ (PVC)   │ │  │ │ (PVC)   │ │  │ │ (PVC)   │ │  │ │ (PVC)   │ │      │
│  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │  │ └─────────┘ │      │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘      │
│         │                │                │                │                │
│         └────────────────┼────────────────┼────────────────┘                │
│                          │                │                                │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                        SHARED STORAGE                               │   │
│  │  ┌─────────────────────────────────────────────────────────────┐   │   │
│  │  │  📁 /data                                                   │   │   │
│  │  │  ├── 📄 user-uploads/                                       │   │   │
│  │  │  ├── 📄 database/                                            │   │   │
│  │  │  ├── 📄 config/                                             │   │   │
│  │  │  └── 📄 logs/                                               │   │   │
│  │  └─────────────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  🔄 Pod Restart/Delete → ✅ DATA PERSISTS                                   │
│                                                                             │
│  ✅ User uploads preserved                                                   │
│  ✅ Database files preserved                                                 │
│  ✅ Configuration preserved                                                  │
│  ✅ Logs preserved                                                          │
└─────────────────────────────────────────────────────────────────────────────┘

🏭 Your Current Kubernetes Setup: PVC Analysis

📊 Your Actual PVC Usage

Based on an analysis of your codebase, here's how PVCs are currently used:

1. Gitea (Git Repository)

# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/3rd/gitea/values.prod.yaml
persistence:
  enabled: true
  create: true
  mount: true
  claimName: gitea-shared-storage
  size: 15Gi
  accessModes:
    - ReadWriteOnce
  storageClass: azure-disk-std-lrs
  annotations:
    helm.sh/resource-policy: keep

What this means:

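  • Gitea uses a PVC to store repositories, user data, and configuration
  • 15Gi of storage is requested for Git repositories and user data
  • Azure Standard Disk (cost-effective for this use case)
  • ReadWriteOnce: the volume can be mounted read-write by a single node at a time (pods on that same node can share it)
  • The helm.sh/resource-policy: keep annotation tells Helm to leave the PVC in place on uninstall
  • Data persists when the Gitea pod restarts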
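
For orientation, the values above describe a claim roughly like the one below. This is a hand-written sketch, not the output of the Gitea chart's templates; the chart may add labels or render some fields differently.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gitea-shared-storage            # persistence.claimName
  annotations:
    helm.sh/resource-policy: keep       # persistence.annotations
spec:
  storageClassName: azure-disk-std-lrs  # persistence.storageClass
  accessModes:
    - ReadWriteOnce                     # persistence.accessModes
  resources:
    requests:
      storage: 15Gi                     # persistence.size
```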

2. MongoDB (Database)

# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/3rd/mongo/values.yaml
persistence:
  enabled: true
  size: 8Gi
  accessModes:
    - ReadWriteOnce
  storageClass: ""  # Uses default Azure storage class

What this means:

  • MongoDB uses a PVC for its database files
  • 8Gi of storage is requested for database data
  • Data persists when the MongoDB pod restarts
  • Critical for data integrity

3. Jenkins (CI/CD)

# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/cluster/manifests/freeleaps-devops-system/jenkins/values.yaml
persistence:
  enabled: true
  storageClass: azure-blob-fuse-2-std-lrs
  accessMode: "ReadWriteOnce"
  size: "50Gi"

What this means:

  • Jenkins uses a PVC for build artifacts and workspace data
  • 50Gi of storage is requested for build history and artifacts
  • Azure Blob Storage (cost-effective for large files)
  • Build history is preserved across pod restarts

4. Central Storage (Logs)

# 🏭 ACTUAL CONFIGURATION FROM YOUR CODEBASE
# freeleaps-ops/freeleaps/helm-pkg/centralStorage/templates/central-storage/pvc.yaml
persistence:
  enabled: true
  size: 1Gi
  accessModes:
    - ReadWriteOnce

What this means:

  • Central storage uses a PVC for log ingestion
  • 1Gi of storage is requested for log processing
  • Logs are preserved during processing

📋 PVC Usage Summary

| Application | PVC Name | Size | Storage Class | Purpose | Critical? |
|---|---|---|---|---|---|
| Gitea | gitea-shared-storage | 15Gi | azure-disk-std-lrs | Git repositories, user data | 🔴 Critical |
| MongoDB | mongodb-datadir | 8Gi | Default | Database files | 🔴 Critical |
| Jenkins | jenkins-pvc | 50Gi | azure-blob-fuse-2-std-lrs | Build artifacts, workspace | 🟡 Important |
| Central Storage | central-storage-logs-pvc | 1Gi | Default | Log processing | 🟢 Nice to have |

🤷‍♂️ Does Each Pod Need PVC? NO!

Common Misconception

"Every pod needs a PVC" - This is WRONG!

Reality: PVCs Are Optional

┌─────────────────────────────────────────────────────────────────────────────┐
│                              PVC DECISION TREE                              │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    DOES YOUR APP NEED PERSISTENT DATA?             │   │
│  │                                                                     │   │
│  │  ┌─────────────┐                    ┌─────────────┐                │   │
│  │  │     YES     │                    │     NO      │                │   │
│  │  │             │                    │             │                │   │
│  │  │ ┌─────────┐ │                    │ ┌─────────┐ │                │   │
│  │  │ │  USE    │ │                    │ │  DON'T   │ │                │   │
│  │  │ │  PVC    │ │                    │ │   USE    │ │                │   │
│  │  │ │         │ │                    │ │   PVC    │ │                │   │
│  │  │ └─────────┘ │                    │ └─────────┘ │                │   │
│  │  └─────────────┘                    └─────────────┘                │   │
│  │                                                                     │   │
│  │  YES examples (use a PVC):                                          │   │
│  │  • Databases (PostgreSQL, MongoDB)                                  │   │
│  │  • File storage (Gitea, Jenkins)                                    │   │
│  │  • Application data (user uploads)                                  │   │
│  │  • Logs (if you want to keep them)                                  │   │
│  │                                                                     │   │
│  │  NO examples (skip the PVC):                                        │   │
│  │  • Web servers (nginx, static content)                              │   │
│  │  • API servers (stateless applications)                             │   │
│  │  • Cache servers (Redis, Memcached)                                 │   │
│  │  • Load balancers                                                    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

📊 Your Current Setup Analysis

Looking at your applications:

Applications WITH PVCs (Need Persistent Data)

  • Gitea: Git repositories, user data, configuration
  • MongoDB: Database files
  • Jenkins: Build artifacts, workspace data
  • Central Storage: Log processing

Applications WITHOUT PVCs (Stateless)

  • Nginx Ingress Controller: Stateless routing
  • ArgoCD: GitOps configuration (stored in Git)
  • Cert-manager: Certificate management (stateless)
  • Prometheus/Grafana: Metrics (not truly stateless; without a PVC, metric history is lost when the pod is replaced, so add one if you need retention)

🎯 PVC Considerations: When to Use Them

Use PVCs When:

1. Database Applications

# Database needs persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
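spec:
  replicas: 1  # A ReadWriteOnce volume attaches to one node at a time
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres  # Must match spec.selector
    spec: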
      containers:
      - name: postgres
        image: postgres:13
        volumeMounts:
        - name: db-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: db-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
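
The Deployment above references a claim named `postgres-pvc`, which must already exist in the same namespace. A minimal sketch of that claim follows; the 10Gi size and the `azure-disk-premium-lrs` class are illustrative choices drawn from the options this guide lists, not a recommendation for a specific workload.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc                        # Matches claimName in the Deployment
spec:
  storageClassName: azure-disk-premium-lrs  # Illustrative; omit to use the default class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                         # Illustrative size
```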

2. File Storage Applications

# File server needs persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-server
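spec:
  selector:
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server  # Must match spec.selector
    spec: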
      containers:
      - name: file-server
        image: nginx:latest
        volumeMounts:
        - name: file-storage
          mountPath: /usr/share/nginx/html  # Default document root of the nginx image
      volumes:
      - name: file-storage
        persistentVolumeClaim:
          claimName: file-storage-pvc

3. Application Data

# Application needs to store user data
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app  # Must match spec.selector
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        volumeMounts:
        - name: app-data
          mountPath: /app/data
      volumes:
      - name: app-data
        persistentVolumeClaim:
          claimName: app-data-pvc

Don't Use PVCs When:

1. Stateless Web Servers

# Web server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
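spec:
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server  # Must match spec.selector
    spec: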
      containers:
      - name: web-server
        image: nginx:latest
        # No volumeMounts needed - stateless

2. API Servers

# API server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
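spec:
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server  # Must match spec.selector
    spec: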
      containers:
      - name: api-server
        image: my-api:latest
        # No volumeMounts needed - stateless

3. Cache Servers

# Cache server doesn't need persistent storage
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
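spec:
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache  # Must match spec.selector
    spec: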
      containers:
      - name: redis
        image: redis:latest
        # No volumeMounts needed - cache is temporary

🔧 PVC Configuration Options

1. Access Modes

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
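  accessModes:         # List only the modes the claim actually needs
    - ReadWriteOnce    # Read-write from a single node (most common)
    # - ReadOnlyMany   # Read-only from multiple nodes
    # - ReadWriteMany  # Read-write from multiple nodes (rare; needs a shared backend such as a file share)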
  resources:
    requests:
      storage: 10Gi
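
ReadWriteOnce limits a volume to one node, not one pod, so several pods scheduled on the same node can still mount it. Where exactly one pod must hold the volume, Kubernetes also offers the `ReadWriteOncePod` access mode for CSI-backed volumes. A sketch with a hypothetical claim name, assuming the cluster's Kubernetes version and CSI driver support this mode:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc   # Hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod      # Only one pod in the whole cluster may use the volume
  resources:
    requests:
      storage: 10Gi
```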

2. Storage Classes

# Azure Storage Classes Available (set exactly one per PVC)
storageClass: azure-disk-std-lrs            # Standard HDD (cheapest)
# storageClass: azure-disk-premium-lrs      # Premium SSD (fastest)
# storageClass: azure-blob-fuse-2-std-lrs   # Blob storage (for large files)

3. Size Considerations

# Size your PVCs appropriately (one storage value per PVC)
resources:
  requests:
    storage: 10Gi     # Medium: databases
    # storage: 1Gi    # Small: logs, config
    # storage: 100Gi  # Large: file storage, backups

🚨 Common PVC Mistakes

Mistake 1: Using PVC for Everything

# ❌ DON'T DO THIS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx  # Must match spec.selector
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        volumeMounts:
        - name: temp-storage  # ❌ Unnecessary PVC
          mountPath: /tmp
      volumes:
      - name: temp-storage
        persistentVolumeClaim:
          claimName: temp-pvc  # ❌ Waste of resources
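
If a container only needs scratch space, an `emptyDir` volume is usually enough: it is created with the pod, survives container restarts, and is deleted when the pod goes away. A sketch of the same nginx Deployment using `emptyDir` instead of a PVC (the `app: nginx` labels and the 256Mi cap are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        volumeMounts:
        - name: temp-storage
          mountPath: /tmp
      volumes:
      - name: temp-storage
        emptyDir:
          sizeLimit: 256Mi   # Optional cap on scratch usage
```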

Mistake 2: Omitting the Storage Request

# ❌ DON'T DO THIS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: unsized-pvc
spec:
  accessModes:
    - ReadWriteOnce
  # ❌ No resources.requests.storage - the API server rejects this PVC

Correct Approach

# ✅ DO THIS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: limited-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # ✅ Set appropriate size

📚 Best Practices

1. Size Appropriately

  • Start small and scale up
  • Monitor actual usage
  • Use storage quotas
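
Storage quotas are set per namespace with a `ResourceQuota`. The sketch below caps the total requested storage and the number of claims. The namespace `my-namespace` and all numbers are placeholders, and the per-class line assumes the `azure-disk-premium-lrs` class named in this guide exists in the cluster.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: my-namespace           # Placeholder namespace
spec:
  hard:
    requests.storage: 200Gi         # Sum of all PVC requests in the namespace
    persistentvolumeclaims: "10"    # Maximum number of PVCs
    # Optional cap on storage requested from one specific class
    azure-disk-premium-lrs.storageclass.storage.k8s.io/requests.storage: 50Gi
```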

2. Choose Right Storage Class

  • Standard HDD: Cost-effective for backups, logs
  • Premium SSD: Performance-critical databases
  • Blob Storage: Large files, archives

3. Use Labels and Annotations

metadata:
  name: my-pvc
  labels:
    app: my-app
    environment: production
    storage-type: database
  annotations:
    helm.sh/resource-policy: keep  # Don't delete on helm uninstall

4. Monitor Usage

# Check PVC usage
kubectl get pvc
kubectl describe pvc <pvc-name>

# Check storage classes
kubectl get storageclass

# Monitor disk usage in pods
kubectl exec <pod-name> -- df -h
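
The `df -h` check can also be automated. The kubelet exposes `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes` for mounted PVCs, so a Prometheus alerting rule can flag volumes that are filling up. A sketch in Prometheus rule-file format, assuming Prometheus scrapes the kubelet metrics; the group name, alert name, and 80% threshold are arbitrary choices:

```yaml
groups:
  - name: pvc-usage
    rules:
      - alert: PVCAlmostFull
        # Fraction of each volume's capacity that is in use
        expr: |
          kubelet_volume_stats_used_bytes
            / kubelet_volume_stats_capacity_bytes > 0.80
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} is over 80% full"
```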

🔍 Your Setup Recommendations

Current State: Good!

Your current setup uses PVCs appropriately:

  • Gitea: 15Gi for repositories (appropriate)
  • MongoDB: 8Gi for database (appropriate)
  • Jenkins: 50Gi for builds (appropriate)
  • Central Storage: 1Gi for logs (appropriate)

Potential Improvements

  1. Monitor usage: Check actual disk usage in these PVCs
  2. Consider backups: Implement PVC backup strategy
  3. Storage quotas: Set namespace storage limits
  4. Performance tuning: Use Premium SSD for databases if needed
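
One building block for a backup strategy is a CSI volume snapshot. The sketch below snapshots the MongoDB claim from the summary table. It assumes the snapshot CRDs and controller are installed and that a `VolumeSnapshotClass` exists; the class name `csi-azuredisk-vsc` and the namespace are placeholders. A snapshot on its own is not an application-consistent backup of a running database.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-datadir-snap
  namespace: my-namespace                     # Placeholder: namespace of the PVC
spec:
  volumeSnapshotClassName: csi-azuredisk-vsc  # Placeholder class name
  source:
    persistentVolumeClaimName: mongodb-datadir
```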

📖 Next Steps

  1. Monitor your current PVCs:

    kubectl get pvc --all-namespaces
    kubectl describe pvc <pvc-name>
    
  2. Check storage usage:

    kubectl exec -it <pod-name> -- df -h
    
  3. Learn about backup strategies:

    • Azure Backup for PVCs
    • Velero for Kubernetes backups

  4. Consider storage optimization:

    • Right-size PVCs based on actual usage
    • Use appropriate storage classes for cost optimization

Last Updated: September 3, 2025
Version: 1.0
Maintainer: Infrastructure Team