# Freeleaps PVC Backup Job

This job creates daily snapshots of critical PVCs in the Freeleaps production environment using the Azure Disk CSI snapshot feature.

## Overview

The backup job runs daily at 00:00 PST (Pacific Standard Time, UTC-8) and creates snapshots of the following PVCs:

- `gitea-shared-storage` in namespace `freeleaps-prod`
- `data-freeleaps-prod-gitea-postgresql-ha-postgresql-0` in namespace `freeleaps-prod`

## Components

- **backup_script.py**: Python script that creates snapshots and monitors their status
- **Dockerfile**: Container image definition
- **build.sh**: Script to build the Docker image
- **deploy-argocd.sh**: Script to deploy via ArgoCD
- **helm-pkg/**: Helm Chart for Kubernetes deployment
- **argo-app/**: ArgoCD Application configuration

## Features

- ✅ Creates snapshots with timestamp-based naming (YYYYMMDD format)
- ✅ Uses PST timezone for snapshot naming
- ✅ Monitors snapshot status until ready
- ✅ Comprehensive logging to console
- ✅ Error handling and retry logic
- ✅ RBAC permissions for secure operation
- ✅ Resource limits and security context
- ✅ Concurrency control (prevents overlapping jobs)
- ✅ Helm Chart for flexible configuration
- ✅ ArgoCD integration for GitOps deployment
- ✅ Incremental snapshots for cost efficiency

## Building and Deployment

### Option 1: ArgoCD Deployment (Recommended)

#### 1. Build and Push Docker Image

```bash
# Make build script executable
chmod +x build.sh

# Build the image
./build.sh

# Push to registry
docker push freeleaps-registry.azurecr.io/freeleaps-pvc-backup:latest
```

#### 2. Deploy via ArgoCD

```bash
# Deploy ArgoCD Application
./deploy-argocd.sh
```

#### 3. Monitor in ArgoCD

```bash
# Check ArgoCD application status
kubectl get applications -n freeleaps-devops-system

# Access ArgoCD UI
kubectl port-forward svc/argocd-server -n freeleaps-devops-system 8080:443
```

Then visit `https://localhost:8080` in your browser.

### Option 2: Direct Helm Deployment

#### 1. Build and Push Docker Image

```bash
# Build the image
./build.sh

# Push to registry
docker push freeleaps-registry.azurecr.io/freeleaps-pvc-backup:latest
```

#### 2. Deploy with Helm

```bash
# Deploy using Helm Chart
helm install freeleaps-data-backup ./helm-pkg/freeleaps-data-backup \
  --values helm-pkg/freeleaps-data-backup/values.prod.yaml \
  --namespace freeleaps-prod \
  --create-namespace
```

## Monitoring

### Check CronJob Status

```bash
kubectl get cronjobs -n freeleaps-prod
```

### Check Job History

```bash
kubectl get jobs -n freeleaps-prod
```

### View Job Logs

```bash
# Get the latest job name
kubectl get jobs -n freeleaps-prod --sort-by=.metadata.creationTimestamp

# View logs
kubectl logs -n freeleaps-prod job/freeleaps-data-backup-<timestamp>
```

### Check Snapshots

```bash
kubectl get volumesnapshots -n freeleaps-prod
```

## Configuration

### Schedule

The job runs daily at 00:00 PST. To modify the schedule, edit the `cronjob.schedule` field in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:

```yaml
cronjob:
  schedule: "0 8 * * *"  # UTC 08:00 = PST 00:00
```

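The offset arithmetic behind that cron expression can be checked with a short sketch. It assumes a fixed UTC-8 offset for PST, as the comment in the values file does; `utc_hour_for_local` and `daily_cron` are hypothetical helpers and are not part of this repository.

```python
def utc_hour_for_local(local_hour: int, utc_offset_hours: int) -> int:
    """Return the UTC hour at which a given local hour occurs.

    utc_offset_hours is the local zone's offset from UTC (PST is -8).
    """
    return (local_hour - utc_offset_hours) % 24


def daily_cron(local_hour: int, utc_offset_hours: int) -> str:
    """Build a daily cron expression (minute 0) expressed in UTC."""
    return f"0 {utc_hour_for_local(local_hour, utc_offset_hours)} * * *"


print(daily_cron(0, -8))  # midnight at UTC-8 (PST)
print(daily_cron(0, -7))  # midnight at UTC-7 (PDT), for comparison
```

A schedule fixed in UTC does not follow daylight saving time, so while the Pacific zone observes PDT (UTC-7) the `0 8 * * *` schedule fires at 01:00 local time.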
### PVCs to Backup

To add or remove PVCs, modify the `backup.pvcs` list in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:

```yaml
backup:
  pvcs:
    - "gitea-shared-storage"
    - "data-freeleaps-prod-gitea-postgresql-ha-postgresql-0"
    # Add more PVCs here
```

### Snapshot Class

The job uses the `csi-azuredisk-vsc` snapshot class with incremental snapshots enabled. This can be modified in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:

```yaml
backup:
  snapshotClass: "csi-azuredisk-vsc"
```

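For orientation, a VolumeSnapshot that references this class might look like the manifest built below. This is a sketch under assumptions, not the contents of `backup_script.py`: `build_snapshot_manifest` is a hypothetical helper, and it assumes the cluster serves the `snapshot.storage.k8s.io/v1` API.

```python
import json


def build_snapshot_manifest(pvc_name, snapshot_name,
                            namespace="freeleaps-prod",
                            snapshot_class="csi-azuredisk-vsc"):
    """Return a VolumeSnapshot manifest (as a dict) for one PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",  # assumed API version
        "kind": "VolumeSnapshot",
        "metadata": {"name": snapshot_name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


manifest = build_snapshot_manifest(
    "gitea-shared-storage", "gitea-shared-storage-snapshot-20250805"
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON manifests, the printed output could be piped to `kubectl apply -f -` for a manual test.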
### Resource Limits

Resource limits can be configured in `helm-pkg/freeleaps-data-backup/values.prod.yaml`:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "200m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

## How It Works

### Snapshot Naming

Snapshots are named using the format: `{PVC_NAME}-snapshot-{YYYYMMDD}`

Examples:

- `gitea-shared-storage-snapshot-20250805`
- `data-freeleaps-prod-gitea-postgresql-ha-postgresql-0-snapshot-20250805`

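The naming rule can be sketched as follows. This is an illustration rather than the actual code in `backup_script.py`: `snapshot_name` is a hypothetical helper, and PST is modeled as a fixed UTC-8 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is modeled as a fixed UTC-8 offset (no daylight saving).
PST = timezone(timedelta(hours=-8), "PST")


def snapshot_name(pvc_name, now=None):
    """Return '{PVC_NAME}-snapshot-{YYYYMMDD}' using the PST calendar date."""
    now = now or datetime.now(timezone.utc)
    return f"{pvc_name}-snapshot-{now.astimezone(PST):%Y%m%d}"


# 08:00 UTC on 5 Aug 2025 is 00:00 at UTC-8 on the same date.
run_time = datetime(2025, 8, 5, 8, 0, tzinfo=timezone.utc)
print(snapshot_name("gitea-shared-storage", run_time))
```

Since the name carries only the date, two runs on the same PST day would produce the same snapshot name for a given PVC.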
### Processing Flow

1. **PVC Verification**: Each PVC is verified to exist before processing
2. **Snapshot Creation**: Individual snapshots are created for each PVC
3. **Status Monitoring**: Each snapshot is monitored until ready
4. **Independent Processing**: PVCs are processed independently (one failure doesn't affect others)

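The four steps above can be sketched as a loop that isolates each PVC in its own `try` block. The function and the three callables passed to it (`pvc_exists`, `create_snapshot`, `wait_ready`) are hypothetical stand-ins for the Kubernetes API calls the real script makes; the demo uses in-memory fakes so it runs without a cluster.

```python
def backup_all(pvcs, pvc_exists, create_snapshot, wait_ready):
    """Process each PVC independently and return a {pvc: outcome} dict.

    The three callables are hypothetical stand-ins for Kubernetes API calls.
    """
    results = {}
    for pvc in pvcs:
        try:
            if not pvc_exists(pvc):          # step 1: verify the PVC
                results[pvc] = "skipped: PVC not found"
                continue
            snapshot = create_snapshot(pvc)  # step 2: create the snapshot
            wait_ready(snapshot)             # step 3: monitor until ready
            results[pvc] = "ok"
        except Exception as exc:             # step 4: a failure stays local
            results[pvc] = f"failed: {exc}"
    return results


# Demo with in-memory fakes instead of a real cluster.
def fake_create(pvc):
    if pvc == "broken-pvc":
        raise RuntimeError("snapshot creation failed")
    return f"{pvc}-snapshot"


outcome = backup_all(
    ["gitea-shared-storage", "broken-pvc", "missing-pvc"],
    pvc_exists=lambda pvc: pvc != "missing-pvc",
    create_snapshot=fake_create,
    wait_ready=lambda snapshot: None,
)
for pvc, status in outcome.items():
    print(pvc, "->", status)
```

Collecting outcomes in a dict lets the caller decide afterwards whether any failure should make the whole job exit non-zero.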
### Incremental Snapshots

The job uses Azure Disk CSI incremental snapshots, which:

- Save storage costs by only storing changed data blocks
- Complete faster than full snapshots
- Maintain full recovery capability

## Troubleshooting

### Common Issues

1. **Permission Denied**: Ensure RBAC is properly configured
2. **PVC Not Found**: Verify PVC names and namespace
3. **Snapshot Creation Failed**: Check Azure Disk CSI driver status
4. **Job Timeout**: Increase timeout in the values file if needed

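To make the timeout and creation-failure cases concrete, the sketch below polls a snapshot's status until it is ready, an error is reported, or a deadline passes. It is an assumption about how such monitoring could work, not the script's actual logic: `wait_until_ready` and its parameters are hypothetical, and only the `status.readyToUse` and `status.error.message` fields come from the VolumeSnapshot API.

```python
import time


def wait_until_ready(get_snapshot, timeout_s=600.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_snapshot() until status.readyToUse is true or time runs out.

    Returns True when ready and False on timeout; raises RuntimeError if the
    snapshot reports an error.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_snapshot().get("status") or {}
        if status.get("error"):
            raise RuntimeError(status["error"].get("message", "snapshot error"))
        if status.get("readyToUse"):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demo: the fake reports "not ready" twice, then ready.
responses = iter([
    {},
    {"status": {"readyToUse": False}},
    {"status": {"readyToUse": True}},
])
ready = wait_until_ready(lambda: next(responses), sleep=lambda seconds: None)
print(ready)
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting.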
### Debug Mode

To run the script locally for testing:

```bash
# Install dependencies
pip install -r requirements.txt

# Run with local kubeconfig
python3 backup_script.py
```

## Security

- The job runs with minimal required permissions
- Non-root user execution
- Dropped capabilities
- Resource limits enforced
- No privileged access

## Maintenance

### Cleanup Old Snapshots

Old snapshots can be cleaned up manually:

```bash
# List all snapshots
kubectl get volumesnapshots -n freeleaps-prod

# Delete specific snapshot
kubectl delete volumesnapshot <snapshot-name> -n freeleaps-prod

# Delete snapshots created before a cutoff timestamp
# (example cutoff shown; set it to 30 days before today)
kubectl get volumesnapshots -n freeleaps-prod -o jsonpath='{.items[?(@.metadata.creationTimestamp<"2024-07-05T00:00:00Z")].metadata.name}' | xargs kubectl delete volumesnapshot -n freeleaps-prod
```

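Rather than hard-coding a cutoff, the selection of expired snapshots could be computed. The sketch below is an illustration, not part of this repository: `expired_snapshots` is a hypothetical helper, the sample items are made up, and their shape mirrors the `.items` list returned by `kubectl get volumesnapshots -o json`.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(items, now, max_age_days=30):
    """Return names of snapshots created more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for item in items:
        meta = item["metadata"]
        # Kubernetes timestamps are RFC 3339 in UTC, e.g. 2025-08-05T08:00:04Z
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if created < cutoff:
            expired.append(meta["name"])
    return expired


# Illustrative sample data, not real cluster output.
items = [
    {"metadata": {"name": "gitea-shared-storage-snapshot-20250701",
                  "creationTimestamp": "2025-07-01T08:00:05Z"}},
    {"metadata": {"name": "gitea-shared-storage-snapshot-20250805",
                  "creationTimestamp": "2025-08-05T08:00:04Z"}},
]
now = datetime(2025, 8, 5, 12, 0, tzinfo=timezone.utc)
print(expired_snapshots(items, now))
```

Each returned name can then be passed to `kubectl delete volumesnapshot <snapshot-name> -n freeleaps-prod`.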
### Updating Configuration

To update the backup configuration:

1. Modify the appropriate values file in `helm-pkg/freeleaps-data-backup/`
2. Commit and push changes to the repository
3. ArgoCD will automatically sync the changes
4. Alternatively, if the release was installed directly with Helm (Option 2), upgrade it manually: `helm upgrade freeleaps-data-backup ./helm-pkg/freeleaps-data-backup --values helm-pkg/freeleaps-data-backup/values.prod.yaml --namespace freeleaps-prod`

## Backup Data

### What Gets Backed Up

- **gitea-shared-storage**: Gitea repository data, attachments, and configuration
- **data-freeleaps-prod-gitea-postgresql-ha-postgresql-0**: PostgreSQL database data

### Recovery

To restore from a snapshot:

```bash
# Create a PVC from snapshot
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: freeleaps-prod
spec:
  dataSource:
    name: <snapshot-name>
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # must be at least the size of the snapshotted volume
EOF
```

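When several volumes need restoring, the same manifest can be generated per snapshot. The sketch below is a hypothetical helper (`build_restore_pvc` is not part of this repository); its defaults simply mirror the example manifest above, and the size must be adjusted to the source volume.

```python
import json


def build_restore_pvc(snapshot_name, pvc_name="restored-pvc",
                      namespace="freeleaps-prod", size="10Gi"):
    """Return a PVC manifest (as a dict) that restores from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
            "accessModes": ["ReadWriteOnce"],
            # Assumption: 10Gi is only the example size from the manifest above.
            "resources": {"requests": {"storage": size}},
        },
    }


restore = build_restore_pvc("gitea-shared-storage-snapshot-20250805",
                            pvc_name="gitea-shared-storage-restored")
print(json.dumps(restore, indent=2))
```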