freeleaps-ops/docs/RabbitMQ_Management_Analysis.md
2025-09-04 00:58:59 -07:00

# 🐰 RabbitMQ Management Analysis & Production Guide
> **Complete Guide to Managing RabbitMQ in Your FreeLeaps Production Environment**
> *From configuration to monitoring to troubleshooting*
---
## 📋 **Table of Contents**
1. [🎯 **Quick Start**](#-quick-start)
2. [🏗️ **Your Production Setup**](#-your-production-setup)
3. [🔧 **Current Configuration Analysis**](#-current-configuration-analysis)
4. [📊 **Management UI Guide**](#-management-ui-guide)
5. [🔍 **Production Monitoring**](#-production-monitoring)
6. [🚨 **Troubleshooting Guide**](#-troubleshooting-guide)
7. [⚡ **Performance Optimization**](#-performance-optimization)
8. [🔒 **Security Best Practices**](#-security-best-practices)
9. [📈 **Scaling & High Availability**](#-scaling--high-availability)
10. [🛠️ **Maintenance Procedures**](#-maintenance-procedures)
---
## 🎯 **Quick Start**
### **🚀 First Day Checklist**
- [ ] **Access RabbitMQ Management UI**: Port-forward the service, then open `http://localhost:15672`
- [ ] **Check your queues**: Verify `freeleaps.devops.reconciler.*` queues exist
- [ ] **Monitor connections**: Check if reconciler is connected
- [ ] **Review metrics**: Check message rates and queue depths
- [ ] **Test connectivity**: Verify RabbitMQ is accessible from your apps
### **🔑 Essential Commands**
```bash
# Access your RabbitMQ cluster
kubectl get pods -n freeleaps-alpha | grep rabbitmq
# Port forward to management UI
kubectl port-forward svc/rabbitmq-headless -n freeleaps-alpha 15672:15672
# Check RabbitMQ logs (the chart runs RabbitMQ as a StatefulSet, so address a pod such as rabbitmq-0)
kubectl logs -f rabbitmq-0 -n freeleaps-alpha
# Access RabbitMQ CLI
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues
```
---
## 🏗️ **Your Production Setup**
### **🌐 Production Architecture**
```
┌─────────────────────────────────────────────────────────────┐
│                  RABBITMQ PRODUCTION SETUP                  │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐   │
│  │ freeleaps-      │ │ freeleaps-      │ │ freeleaps-   │   │
│  │ devops-         │ │ apps            │ │ monitoring   │   │
│  │ reconciler      │ │ (Your Apps)     │ │ (Metrics)    │   │
│  └─────────────────┘ └─────────────────┘ └──────────────┘   │
│           │                   │                  │          │
│           │ AMQP 5672         │ AMQP 5672        │          │
│           │ HTTP 15672        │ HTTP 15672       │          │
│           └───────────────────┼──────────────────┘          │
│                               │                             │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                    RABBITMQ CLUSTER                     │ │
│ │   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │ │
│ │   │ Node 1      │   │ Node 2      │   │ Node 3      │   │ │
│ │   │ (Primary)   │   │ (Replica)   │   │ (Replica)   │   │ │
│ │   │ Port: 5672  │   │ Port: 5672  │   │ Port: 5672  │   │ │
│ │   │ UI: 15672   │   │ UI: 15672   │   │ UI: 15672   │   │ │
│ │   └─────────────┘   └─────────────┘   └─────────────┘   │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### **📊 Production Namespaces**
| **Environment** | **Namespace** | **Purpose** | **Status** |
|-----------------|---------------|-------------|------------|
| **Alpha** | `freeleaps-alpha` | Development & Testing | ✅ Active |
| **Production** | `freeleaps-prod` | Live Production | ✅ Active |
### **🔧 Production Services**
```bash
# Your actual RabbitMQ services
kubectl get svc -n freeleaps-alpha | grep rabbitmq
kubectl get svc -n freeleaps-prod | grep rabbitmq
# Service details:
# - rabbitmq-headless: Internal cluster communication
# - rabbitmq: External access (if needed)
# - rabbitmq-management: Management UI access
```
---
## 🔧 **Current Configuration Analysis**
### **📋 Configuration Sources**
#### **1. Helm Chart Configuration**
```yaml
# Location: freeleaps-ops/freeleaps/helm-pkg/3rd/rabbitmq/
# Primary configuration files:
# - values.yaml (base configuration)
# - values.alpha.yaml (alpha environment overrides)
# - values.prod.yaml (production environment overrides)
```
#### **2. Reconciler Configuration**
```yaml
# Location: freeleaps-devops-reconciler/helm/freeleaps-devops-reconciler/values.yaml
rabbitmq:
  host: "rabbitmq-headless.freeleaps-alpha.svc.cluster.local"
  port: 5672
  username: "user"
  password: "NjlhHFvnDuC7K0ir"
  vhost: "/"
```
#### **3. Python Configuration**
```python
# Location: freeleaps-devops-reconciler/reconciler/config/config.py
RABBITMQ_HOST = os.getenv('RABBITMQ_HOST', 'localhost')
RABBITMQ_PORT = int(os.getenv('RABBITMQ_PORT', '5672'))
RABBITMQ_USERNAME = os.getenv('RABBITMQ_USERNAME', 'guest')
RABBITMQ_PASSWORD = os.getenv('RABBITMQ_PASSWORD', 'guest')
```
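The defaults above fall back to `guest`/`guest`, and RabbitMQ by default only accepts the `guest` user on loopback connections, so a missing variable surfaces later as a confusing authentication failure. As a minimal sketch (not the reconciler's actual code), the settings could be validated up front and assembled into an AMQP URL. `build_amqp_url` and the `RABBITMQ_VHOST` variable are hypothetical names introduced here for illustration.

```python
import os
from urllib.parse import quote


def build_amqp_url(env=os.environ):
    """Build an AMQP URL from RABBITMQ_* settings, failing fast if credentials are missing."""
    missing = [key for key in ("RABBITMQ_USERNAME", "RABBITMQ_PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    host = env.get("RABBITMQ_HOST", "localhost")
    port = int(env.get("RABBITMQ_PORT", "5672"))
    # RABBITMQ_VHOST is an assumed variable; the Helm values above use vhost "/".
    vhost = env.get("RABBITMQ_VHOST", "/")
    user = quote(env["RABBITMQ_USERNAME"], safe="")
    password = quote(env["RABBITMQ_PASSWORD"], safe="")
    # In an AMQP URI the default vhost "/" has to be percent-encoded as %2F.
    return f"amqp://{user}:{password}@{host}:{port}/{quote(vhost, safe='')}"
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests.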
### **🔍 Configuration Analysis**
#### **✅ What's Working Well**
1. **Helm-based deployment** - Consistent and repeatable
2. **Environment separation** - Alpha vs Production
3. **Clustering enabled** - High availability
4. **Management plugin** - Web UI available
5. **Resource limits** - Proper resource management
#### **⚠️ Issues Identified**
##### **1. Configuration Mismatch**
```yaml
# ❌ PROBLEM: Different image versions
# Helm chart: bitnami/rabbitmq:4.0.6-debian-12-r0
# Reconciler: rabbitmq:3.12-management-alpine
# ❌ PROBLEM: Different credentials
# Alpha: username: "user", password: "NjlhHFvnDuC7K0ir"
# Production: Different credentials (not shown in config)
```
##### **2. Security Concerns**
```yaml
# ❌ PROBLEM: Hardcoded passwords in values files
auth:
  username: user
  password: "NjlhHFvnDuC7K0ir" # Should be in Kubernetes secrets
```
##### **3. Network Configuration**
```yaml
# ❌ PROBLEM: Hostname hardcodes the alpha namespace
# Reconciler uses: rabbitmq-headless.freeleaps-alpha.svc.cluster.local
# The namespace should come from the environment so the same values work in freeleaps-prod
```
### **🎯 Recommended Improvements**
#### **1. Centralized Configuration**
```yaml
# Create a centralized RabbitMQ configuration
# Location: freeleaps-ops/config/rabbitmq/
rabbitmq-config:
  image:
    repository: bitnami/rabbitmq
    tag: "4.0.6-debian-12-r0"
  auth:
    username: ${RABBITMQ_USERNAME}
    password: ${RABBITMQ_PASSWORD}
  clustering:
    enabled: true
    name: "freeleaps-${ENVIRONMENT}"
```
#### **2. Secret Management**
```yaml
# Use Kubernetes secrets instead of hardcoded values
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-credentials
  namespace: freeleaps-alpha
type: Opaque
data:
  username: dXNlcg== # base64 encoded
  password: <base64-encoded-password> # base64 is an encoding, not encryption; keep the real value out of Git
```
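Values under `data:` in a Secret must be base64-encoded, and a hand-encoded value is easy to get wrong. The following sketch shows one way to produce and check them with the Python standard library; `encode_secret_data` and `decode_secret_data` are hypothetical helper names.

```python
import base64


def encode_secret_data(values):
    """Base64-encode each string value for the `data:` section of a Secret."""
    return {key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()}


def decode_secret_data(data):
    """Reverse of encode_secret_data, useful to verify what a manifest contains."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


if __name__ == "__main__":
    print(encode_secret_data({"username": "user"}))
```

Decoding a manifest's values before applying it is a cheap way to confirm that the encoded password matches the one RabbitMQ was configured with.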
#### **3. Service Discovery**
```yaml
# Use consistent service discovery
# Instead of hardcoded hostnames, use:
RABBITMQ_HOST: "rabbitmq-headless.${NAMESPACE}.svc.cluster.local"
```
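One way to avoid a hardcoded namespace is to derive the hostname at startup. This is an illustrative sketch only: `rabbitmq_host` is a hypothetical helper, and `POD_NAMESPACE` is an assumed environment variable that would have to be injected into the pod (for example through the Downward API).

```python
import os


def rabbitmq_host(namespace, service="rabbitmq-headless", cluster_domain="cluster.local"):
    """Return the in-cluster DNS name of the RabbitMQ service for a namespace."""
    if not namespace:
        raise ValueError("namespace must not be empty")
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    # POD_NAMESPACE is an assumed variable name; fall back to alpha for local runs.
    print(rabbitmq_host(os.getenv("POD_NAMESPACE", "freeleaps-alpha")))
```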
---
## 📊 **Management UI Guide**
### **🌐 Accessing the Management UI**
#### **Method 1: Port Forward (Recommended)**
```bash
# Port forward to RabbitMQ management UI
kubectl port-forward svc/rabbitmq-headless -n freeleaps-alpha 15672:15672
# Access: http://localhost:15672
# Username: user
# Password: NjlhHFvnDuC7K0ir
```
#### **Method 2: Ingress (If configured)**
```bash
# If you have ingress configured for RabbitMQ
# Access: https://rabbitmq.freeleaps.mathmast.com
```
### **📋 Management UI Features**
#### **1. Overview Dashboard**
- **Cluster status** and health indicators
- **Node information** and resource usage
- **Connection counts** and message rates
- **Queue depths** and performance metrics
#### **2. Queues Management**
```bash
# Your actual queues to monitor:
# - freeleaps.devops.reconciler.queue (heartbeat)
# - freeleaps.devops.reconciler.input (input messages)
# - freeleaps.devops.reconciler.output (output messages)
# Queue operations:
# - View queue details and metrics
# - Purge queues (remove all messages)
# - Delete queues (with safety confirmations)
# - Monitor message rates and consumer counts
```
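The figures shown on the Queues tab are also available from the management plugin's HTTP API, which is convenient for scripts. A minimal sketch using only the standard library follows; it assumes the port-forward to `localhost:15672` is active, and the function names are hypothetical.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def queues_request(base_url, username, password, vhost="/"):
    """Build a GET request for /api/queues/<vhost> with HTTP basic auth."""
    # The vhost "/" must be percent-encoded as %2F in the URL path.
    url = f"{base_url.rstrip('/')}/api/queues/{quote(vhost, safe='')}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_queues(request, timeout=5):
    """Send the request and return the decoded JSON list of queues."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(queues):
    """Reduce the API response to (name, messages, consumers) tuples."""
    return [(q["name"], q.get("messages", 0), q.get("consumers", 0)) for q in queues]
```

With the port-forward running, `summarize(fetch_queues(queues_request("http://localhost:15672", username, password)))` would list the reconciler queues with their depth and consumer count.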
#### **3. Exchanges Management**
```bash
# Your actual exchanges:
# - amq.default (default direct exchange)
# - amq.topic (topic exchange)
# - amq.fanout (fanout exchange)
# Exchange operations:
# - View exchange properties and bindings
# - Create new exchanges with custom types
# - Monitor message routing and performance
```
#### **4. Connections & Channels**
```bash
# Monitor your reconciler connections:
# - Check if reconciler is connected
# - Monitor connection health and performance
# - View channel details and limits
# - Force disconnect if needed
```
#### **5. Users & Permissions**
```bash
# Current user setup:
# - Username: user
# - Permissions: Full access to vhost "/"
# - Tags: management
# User management:
# - Create new users for different applications
# - Set up proper permissions and access control
# - Monitor user activity and connections
```
### **🔧 Practical UI Operations**
#### **Monitoring Your Reconciler**
```bash
# 1. Check if reconciler is connected
# Go to: Connections tab
# Look for: freeleaps-devops-reconciler connections
# 2. Monitor message flow
# Go to: Queues tab
# Check: freeleaps.devops.reconciler.* queues
# Monitor: Message rates and queue depths
# 3. Check cluster health
# Go to: Overview tab
# Monitor: Node status and resource usage
```
#### **Troubleshooting via UI**
```bash
# 1. Check for stuck messages
# Go to: Queues > freeleaps.devops.reconciler.input
# Look for: High message count or no consumers
# 2. Check connection issues
# Go to: Connections tab
# Look for: Disconnected or error states
# 3. Monitor resource usage
# Go to: Overview tab
# Check: Memory usage and disk space
```
---
## 🔍 **Production Monitoring**
### **📊 Key Metrics to Monitor**
#### **1. Cluster Health**
```bash
# Check cluster status
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
# Check that the node is running
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmq-diagnostics check_running
```
#### **2. Queue Metrics**
```bash
# Check queue depths
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues name messages consumers
# Monitor message rates
# Use Management UI: Queues tab > Queue details > Message rates
```
#### **3. Connection Metrics**
```bash
# Check active connections
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_connections
# Monitor channel activity
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_channels
```
#### **4. Resource Usage**
```bash
# Check memory usage
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl status
# Monitor disk usage
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- df -h
```
### **🚨 Alerting Setup**
#### **1. Queue Depth Alerts**
```yaml
# Alert when queue depth exceeds threshold
# Queue: freeleaps.devops.reconciler.input
# Threshold: > 100 messages
# Action: Send Slack notification
```
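Until a real alert rule exists, the threshold above can be checked from the output of `rabbitmqctl list_queues name messages consumers`. The sketch below is an assumption-laden illustration, not an existing script: `parse_list_queues` and `find_alerts` are hypothetical names, and the limit of 100 simply mirrors the threshold stated above.

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages consumers` output into tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Banner and header lines do not have the shape "<name> <int> <int>"; skip them.
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        rows.append((parts[0], int(parts[1]), int(parts[2])))
    return rows


def find_alerts(rows, max_depth=100):
    """Return human-readable alerts for deep queues and queues without consumers."""
    alerts = []
    for name, messages, consumers in rows:
        if messages > max_depth:
            alerts.append(f"{name}: depth {messages} exceeds {max_depth}")
        if messages > 0 and consumers == 0:
            alerts.append(f"{name}: {messages} messages but no consumers")
    return alerts
```

The second condition catches the case where a queue is still shallow but nothing is consuming from it, which usually precedes a depth alert.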
#### **2. Connection Loss Alerts**
```yaml
# Alert when reconciler disconnects
# Monitor: freeleaps-devops-reconciler connections
# Threshold: Connection count = 0
# Action: Page on-call engineer
```
#### **3. Resource Usage Alerts**
```yaml
# Alert when memory usage is high
# Threshold: Memory usage > 80%
# Action: Scale up or investigate
```
### **📈 Monitoring Dashboard**
#### **Grafana Dashboard**
```yaml
# Your existing RabbitMQ dashboard
# Location: freeleaps-ops/cluster/manifests/freeleaps-monitoring-system/kube-prometheus-stack/dashboards/rabbitmq.yaml
# Access: https://grafana.mathmast.com
# Dashboard: RabbitMQ Management Overview
```
#### **Key Dashboard Panels**
1. **Queue Depth** - Monitor message accumulation
2. **Message Rates** - Track throughput
3. **Connection Count** - Monitor client connections
4. **Memory Usage** - Track resource consumption
5. **Error Rates** - Monitor failures
---
## 🚨 **Troubleshooting Guide**
### **🔍 Common Issues & Solutions**
#### **1. Reconciler Connection Issues**
##### **Problem**: Reconciler can't connect to RabbitMQ
```bash
# Symptoms:
# - Reconciler logs show connection errors
# - No connections in RabbitMQ UI
# - Pods restarting due to connection failures
# Diagnosis:
kubectl logs -f deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_connections
# Solutions:
# 1. Check DNS resolution and basic reachability (a successful ping does not prove that port 5672 is open)
kubectl exec -it deployment/freeleaps-devops-reconciler -n freeleaps-devops-system -- ping -c 3 rabbitmq-headless.freeleaps-alpha.svc.cluster.local
# 2. Verify credentials
kubectl get secret rabbitmq-credentials -n freeleaps-alpha -o yaml
# 3. Check RabbitMQ status
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl status
```
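Because the reconciler is a Python service, a TCP-level check can be run from inside its pod even when tools such as `ping` or `nc` are not installed in the image. This is a minimal sketch; `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    target = "rabbitmq-headless.freeleaps-alpha.svc.cluster.local"
    print(f"AMQP port reachable: {can_connect(target, 5672)}")
```

A `True` result only shows that the port accepts connections; credential or vhost problems still have to be diagnosed from the reconciler logs.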
#### **2. Queue Message Accumulation**
##### **Problem**: Messages stuck in queues
```bash
# Symptoms:
# - High message count in queues
# - No consumers processing messages
# - Increasing queue depth
# Diagnosis:
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues name messages consumers
# Solutions:
# 1. Check consumer health
kubectl logs -f deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
# 2. Restart consumers
kubectl rollout restart deployment/freeleaps-devops-reconciler -n freeleaps-devops-system
# 3. Purge stuck messages (if safe)
# Via Management UI: Queues > Queue > Purge
```
#### **3. Memory Pressure**
##### **Problem**: RabbitMQ running out of memory
```bash
# Symptoms:
# - High memory usage
# - Slow performance
# - Connection drops
# Diagnosis:
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl status
kubectl top pods -n freeleaps-alpha | grep rabbitmq
# Solutions:
# 1. Increase memory limits (temporary: the next Helm upgrade resets it, so also update the values file)
kubectl patch statefulset rabbitmq -n freeleaps-alpha -p '{"spec":{"template":{"spec":{"containers":[{"name":"rabbitmq","resources":{"limits":{"memory":"2Gi"}}}]}}}}'
# 2. Restart RabbitMQ
kubectl rollout restart statefulset/rabbitmq -n freeleaps-alpha
# 3. Find the queues that use the most memory
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues name memory
```
#### **4. Cluster Issues**
##### **Problem**: RabbitMQ cluster not healthy
```bash
# Symptoms:
# - Nodes not in sync
# - Replication lag
# - Split-brain scenarios
# Diagnosis:
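kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmq-diagnostics check_running
# Solutions:
# 1. Check that all pods are running
kubectl get pods -n freeleaps-alpha | grep rabbitmq
# 2. Restart problematic nodes (the StatefulSet recreates the pod)
kubectl delete pod rabbitmq-0 -n freeleaps-alpha
# 3. Rejoin cluster if needed: run on the node that left (here rabbitmq-1), with the app stopped,
#    using the target node name shown by cluster_status. A node that still holds old data
#    may also need `rabbitmqctl reset` first, which wipes that node.
kubectl exec -it rabbitmq-1 -n freeleaps-alpha -- rabbitmqctl stop_app
kubectl exec -it rabbitmq-1 -n freeleaps-alpha -- rabbitmqctl join_cluster rabbit@rabbitmq-0
kubectl exec -it rabbitmq-1 -n freeleaps-alpha -- rabbitmqctl start_app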
```
### **🛠️ Debugging Commands**
#### **Essential Debugging Commands**
```bash
# Check RabbitMQ status
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl status
# List all queues
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues
# List all exchanges
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_exchanges
# List all bindings
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_bindings
# List all connections
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_connections
# List all channels
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_channels
# Check user permissions
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_users
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_user_permissions user
```
#### **Advanced Debugging**
```bash
# Check RabbitMQ logs
kubectl logs -f rabbitmq-0 -n freeleaps-alpha
# Check logs from the previous container instance (the container has no systemd journal)
kubectl logs rabbitmq-0 -n freeleaps-alpha --previous
# Check listening ports
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmq-diagnostics listeners
# Check disk usage
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- df -h
# Check memory usage by category
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmq-diagnostics memory_breakdown
```
---
## ⚡ **Performance Optimization**
### **🎯 Performance Tuning**
#### **1. Memory Optimization**
```yaml
# Optimize memory settings
# Location: values.alpha.yaml
configuration: |-
  # Memory management
  vm_memory_high_watermark.relative = 0.6
  vm_memory_high_watermark_paging_ratio = 0.5
  # Message store
  msg_store_file_size_limit = 16777216
  msg_store_credit_disc_bound = 4000
```
#### **2. Disk Optimization**
```yaml
# Optimize disk settings
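configuration: |-
  # Disk free space
  disk_free_limit.relative = 2.0
  # Queue leader location (replaces the deprecated queue_master_locator = min-masters)
  queue_leader_locator = balanced
  # Default consumer prefetch (verify this key against the rabbitmq.conf reference
  # for your version; an unrecognized key prevents the node from starting)
  queue.default_consumer_prefetch = 50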
```
#### **3. Network Optimization**
```yaml
# Optimize network settings
configuration: |-
  # TCP settings
  tcp_listen_options.backlog = 128
  tcp_listen_options.nodelay = true
  # Heartbeat
  heartbeat = 60
  # Connection limits
  connection_max = 1000
  # Per-user limits are set at runtime rather than in rabbitmq.conf:
  #   rabbitmqctl set_user_limits user '{"max-connections": 100}'
### **📊 Performance Monitoring**
#### **Key Performance Indicators**
1. **Message Throughput** - Messages per second
2. **Latency** - Message processing time
3. **Queue Depth** - Messages waiting to be processed
4. **Memory Usage** - Heap and process memory
5. **Disk I/O** - Write and read operations
#### **Performance Benchmarks**
```bash
# Your expected performance:
# - Message rate: 1000+ messages/second
# - Latency: < 10ms for local messages
# - Queue depth: < 100 messages (normal operation)
# - Memory usage: < 80% of allocated memory
# - Disk usage: < 70% of allocated storage
```
---
## 🔒 **Security Best Practices**
### **🛡️ Current Security Analysis**
#### **✅ Security Strengths**
1. **Network isolation** - RabbitMQ runs in Kubernetes namespace
2. **Resource limits** - Memory and CPU limits set
3. **Non-root user** - Runs as non-root in container
4. **TLS support** - SSL/TLS configuration available
#### **⚠️ Security Weaknesses**
1. **Hardcoded passwords** - Passwords in YAML files
2. **Default permissions** - Overly permissive user access
3. **No audit logging** - Limited security event tracking
4. **No network policies** - No ingress/egress restrictions
### **🔧 Security Improvements**
#### **1. Secret Management**
```yaml
# Use Kubernetes secrets
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-credentials
  namespace: freeleaps-alpha
type: Opaque
data:
  username: dXNlcg== # base64 encoded
  password: <base64-encoded-password>
---
# Reference in Helm values (key names follow the Bitnami chart; confirm them in your chart version's values.yaml)
auth:
  username: user
  existingPasswordSecret: rabbitmq-credentials
  existingSecretPasswordKey: password
```
#### **2. User Access Control**
```yaml
# Create application-specific users
# Instead of one user with full access:
# - freeleaps-reconciler (reconciler access only)
# - freeleaps-monitoring (read-only access)
# - freeleaps-admin (full access, limited to admins)
```
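The split above maps onto three `rabbitmqctl` commands per user: `add_user`, `set_user_tags`, and `set_permissions` (configure, write, and read patterns). The sketch below only prints the commands so they can be reviewed before being run in a pod. The permission patterns and the password placeholder are illustrative assumptions; in particular, a reconciler that publishes through the default exchange would need a pattern that also covers it.

```python
import shlex

# Hypothetical per-application users; the regular expressions are illustrative.
USERS = {
    "freeleaps-reconciler": {
        "tags": [],
        "configure": r"^freeleaps\.devops\.reconciler\..*",
        "write": r"^freeleaps\.devops\.reconciler\..*",
        "read": r"^freeleaps\.devops\.reconciler\..*",
    },
    "freeleaps-monitoring": {"tags": ["monitoring"], "configure": "^$", "write": "^$", "read": ".*"},
    "freeleaps-admin": {"tags": ["administrator"], "configure": ".*", "write": ".*", "read": ".*"},
}


def user_commands(name, spec, vhost="/", password="<password-from-secret>"):
    """Return the rabbitmqctl invocations that create one user with scoped permissions."""
    commands = [["rabbitmqctl", "add_user", name, password]]
    if spec["tags"]:
        commands.append(["rabbitmqctl", "set_user_tags", name, *spec["tags"]])
    commands.append(
        ["rabbitmqctl", "set_permissions", "-p", vhost, name,
         spec["configure"], spec["write"], spec["read"]]
    )
    return commands


if __name__ == "__main__":
    for user, spec in USERS.items():
        for command in user_commands(user, spec):
            print(shlex.join(command))
```

Printing with `shlex.join` keeps the regular expressions correctly quoted when the lines are pasted into a shell.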
#### **3. Network Policies**
```yaml
# Restrict network access
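apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: rabbitmq-network-policy
  namespace: freeleaps-alpha
spec:
  podSelector:
    matchLabels:
      app: rabbitmq
  # Add Egress only together with rules for DNS and cluster peers;
  # listing it without rules blocks all outbound traffic.
  policyTypes:
    - Ingress
  ingress:
    # Clients in the reconciler namespace: AMQP and management UI
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: freeleaps-devops-system
      ports:
        - protocol: TCP
          port: 5672
        - protocol: TCP
          port: 15672
    # Cluster peers: node discovery (epmd) and inter-node traffic
    - from:
        - podSelector:
            matchLabels:
              app: rabbitmq
      ports:
        - protocol: TCP
          port: 4369
        - protocol: TCP
          port: 25672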
```
#### **4. Audit Logging**
```yaml
# Enable audit logging
configuration: |-
  # Audit logging
  log.file.level = info
  log.file.rotation.date = $D0
  log.file.rotation.size = 10485760
  # Security events: log connection lifecycle and authentication failures
  log.connection.level = info
```
---
## 📈 **Scaling & High Availability**
### **🏗️ Current HA Setup**
#### **Cluster Configuration**
```yaml
# Your current clustering setup
clustering:
  enabled: true
  name: "freeleaps-alpha"
  addressType: hostname
  rebalance: false
  forceBoot: false
  partitionHandling: autoheal
```
#### **Replication Strategy**
```yaml
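# Queue replication
# - Only quorum queues and streams are replicated across cluster nodes; RabbitMQ 4.x no longer supports classic mirrored queues
# - A replicated queue fails over automatically when its leader node fails
# - A classic queue lives on a single node and is unavailable while that node is down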
```
### **🚀 Scaling Strategies**
#### **1. Horizontal Scaling**
```bash
# Scale RabbitMQ cluster
kubectl scale statefulset rabbitmq -n freeleaps-alpha --replicas=5
# Verify scaling
kubectl get pods -n freeleaps-alpha | grep rabbitmq
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
```
#### **2. Vertical Scaling**
```yaml
# Increase resource limits
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 4Gi
```
#### **3. Queue Partitioning**
```yaml
# Partition large queues across nodes
# Strategy: Hash-based partitioning
# Benefits: Better performance and fault tolerance
```
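Hash-based partitioning means that each message key is mapped deterministically to one of N queues, so load spreads out while messages for the same key stay in order. The sketch below shows the idea on the publisher side; the `<base_name>.<index>` naming scheme and `shard_queue` are hypothetical, and RabbitMQ's consistent hash exchange plugin is an alternative that performs the routing inside the broker.

```python
import zlib


def shard_queue(routing_key, base_name, shard_count):
    """Pick the shard queue for a routing key using a stable CRC32 hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # zlib.crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(routing_key.encode("utf-8")) % shard_count
    return f"{base_name}.{index}"


if __name__ == "__main__":
    for key in ("project-a", "project-b", "project-c"):
        print(key, "->", shard_queue(key, "freeleaps.devops.reconciler.input", 3))
```

Changing `shard_count` remaps most keys, so the number of shards is best fixed up front.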
### **🔧 High Availability Best Practices**
#### **1. Node Distribution**
```yaml
# Spread RabbitMQ pods across Kubernetes nodes so one node failure cannot take down the cluster
# (topologyKey kubernetes.io/hostname spreads across nodes; use topology.kubernetes.io/zone to spread across availability zones)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - rabbitmq
        topologyKey: kubernetes.io/hostname
```
#### **2. Data Replication**
```yaml
# Configure proper replication
# - Use quorum queues for critical data
# - Give each quorum queue an odd number of replicas (3 on a three-node cluster) so a majority survives one node failure
# - Monitor replication lag
```
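A quorum queue stays available only while a majority of its replicas is online, which is why the replica count matters more than simply having a second copy. The arithmetic can be sketched as follows; the helper names are hypothetical, while `x-queue-type` is the queue argument a client passes when declaring a quorum queue.

```python
# Argument passed at queue declaration time to request a quorum queue.
QUORUM_QUEUE_ARGS = {"x-queue-type": "quorum"}


def quorum_size(replicas):
    """Smallest number of replicas that forms a majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """How many replicas can be lost while a majority remains."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} replicas: majority {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Two replicas tolerate no failures at all, which is the reason three is the usual minimum.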
#### **3. Backup Strategy**
```bash
# Back up RabbitMQ definitions (queues, exchanges, bindings, users, policies; messages are not included)
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl export_definitions /tmp/rabbitmq-definitions.json
# Restore definitions from backup
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl import_definitions /tmp/rabbitmq-definitions.json
```
---
## 🛠️ **Maintenance Procedures**
### **📅 Regular Maintenance Tasks**
#### **Daily Tasks**
```bash
# 1. Check cluster health
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
# 2. Monitor queue depths
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_queues name messages
# 3. Check connection count (suppress banner and header lines so only connections are counted)
kubectl exec rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_connections --quiet --no-table-headers | wc -l
# 4. Review error logs (RabbitMQ writes the level in lowercase, e.g. [error])
kubectl logs --tail=100 rabbitmq-0 -n freeleaps-alpha | grep -i error
```
#### **Weekly Tasks**
```bash
# 1. Review performance metrics
# Access Grafana dashboard: RabbitMQ Management Overview
# 2. Check disk usage
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- df -h
# 3. Review user permissions
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_users
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl list_user_permissions user
# 4. Backup configurations (the file is written inside the pod; copy it out to keep it)
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl export_definitions /tmp/weekly-backup-$(date +%Y%m%d).json
```
#### **Monthly Tasks**
```bash
# 1. Security audit
# Review user access and permissions
# Check for unused queues and exchanges
# Verify network policies
# 2. Performance review
# Analyze message rates and latency
# Review resource usage trends
# Optimize configurations
# 3. Capacity planning
# Project growth based on usage trends
# Plan for scaling if needed
# Review backup and disaster recovery procedures
```
### **🔧 Maintenance Scripts**
#### **Health Check Script**
```bash
#!/bin/bash
# scripts/rabbitmq-health-check.sh
NAMESPACE="freeleaps-alpha"
POD_NAME=$(kubectl get pods -n $NAMESPACE -l app=rabbitmq -o jsonpath='{.items[0].metadata.name}')
echo "🐰 RabbitMQ Health Check - $(date)"
echo "=================================="
# Check cluster status (no -it: the script may run without a TTY)
echo "📊 Cluster Status:"
kubectl exec "$POD_NAME" -n "$NAMESPACE" -- rabbitmqctl cluster_status
# Check queue depths
echo "📋 Queue Depths:"
kubectl exec "$POD_NAME" -n "$NAMESPACE" -- rabbitmqctl list_queues name messages consumers
# Check connections (banner and header lines suppressed so only connections are counted)
echo "🔗 Active Connections:"
kubectl exec "$POD_NAME" -n "$NAMESPACE" -- rabbitmqctl list_connections --quiet --no-table-headers | wc -l
# Check resource usage
echo "💾 Resource Usage:"
kubectl top pods -n $NAMESPACE | grep rabbitmq
```
#### **Backup Script**
```bash
#!/bin/bash
# scripts/rabbitmq-backup.sh
NAMESPACE="freeleaps-alpha"
BACKUP_DIR="/tmp/rabbitmq-backups"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p $BACKUP_DIR
echo "📦 Creating RabbitMQ backup..."
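# Export definitions (topology, users, policies; messages are not included)
POD_NAME=$(kubectl get pods -n "$NAMESPACE" -l app=rabbitmq -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD_NAME" -n "$NAMESPACE" -- rabbitmqctl export_definitions "/tmp/rabbitmq-definitions-$DATE.json"
# Copy backup file (kubectl cp addresses a pod by name)
kubectl cp "$NAMESPACE/$POD_NAME:/tmp/rabbitmq-definitions-$DATE.json" "$BACKUP_DIR/rabbitmq-definitions-$DATE.json"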
echo "✅ Backup created: $BACKUP_DIR/rabbitmq-definitions-$DATE.json"
```
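An exported definitions file is plain JSON, so a backup can be sanity-checked before it is archived. The sketch below counts the entries in the main sections; `summarize_definitions` is a hypothetical helper, and the section list covers only the common top-level keys.

```python
import json

# Common top-level sections of a definitions export.
SECTIONS = ("users", "vhosts", "permissions", "queues", "exchanges", "bindings", "policies")


def summarize_definitions(text):
    """Return the number of entries per section in a definitions export."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("definitions file must contain a JSON object")
    return {section: len(data.get(section, [])) for section in SECTIONS}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_definitions.py rabbitmq-definitions-<date>.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(summarize_definitions(handle.read()))
```

A backup that reports zero queues or zero users is a sign that the export ran against the wrong node or failed silently.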
### **🚨 Emergency Procedures**
#### **1. RabbitMQ Node Failure**
```bash
# If a RabbitMQ node fails:
# 1. Check node status
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
# 2. Restart failed node (the StatefulSet recreates the pod)
kubectl delete pod rabbitmq-1 -n freeleaps-alpha
# 3. Verify cluster health
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl cluster_status
```
#### **2. Data Loss Recovery**
```bash
# If definitions (queues, exchanges, users, policies) are lost:
# Note: a definitions backup does not contain messages, so this restores topology only
# 1. Stop all consumers
kubectl scale deployment freeleaps-devops-reconciler -n freeleaps-devops-system --replicas=0
# 2. Restore from backup
kubectl cp backup-file.json freeleaps-alpha/rabbitmq-0:/tmp/backup-file.json
kubectl exec -it rabbitmq-0 -n freeleaps-alpha -- rabbitmqctl import_definitions /tmp/backup-file.json
# 3. Restart consumers
kubectl scale deployment freeleaps-devops-reconciler -n freeleaps-devops-system --replicas=1
#### **3. Performance Emergency**
```bash
# If performance is severely degraded:
# 1. Check resource usage
kubectl top pods -n freeleaps-alpha | grep rabbitmq
# 2. Scale up resources
kubectl patch statefulset rabbitmq -n freeleaps-alpha -p '{"spec":{"template":{"spec":{"containers":[{"name":"rabbitmq","resources":{"limits":{"memory":"4Gi","cpu":"2000m"}}}]}}}}'
# 3. Restart RabbitMQ
kubectl rollout restart statefulset/rabbitmq -n freeleaps-alpha
```
---
## 🎯 **Summary & Next Steps**
### **📊 Current State Assessment**
#### **✅ Strengths**
1. **Production-ready setup** - Clustering, monitoring, management UI
2. **Helm-based deployment** - Consistent and repeatable
3. **Environment separation** - Alpha vs Production
4. **Integration working** - Reconciler successfully using RabbitMQ
5. **Monitoring available** - Grafana dashboards and metrics
#### **⚠️ Areas for Improvement**
1. **Security hardening** - Remove hardcoded passwords, implement secrets
2. **Configuration standardization** - Centralize configuration management
3. **Performance optimization** - Tune settings for your workload
4. **Documentation** - Create runbooks for common operations
5. **Automation** - Implement automated health checks and alerts
### **🚀 Recommended Actions**
#### **Immediate (This Week)**
1. **Implement secret management** - Move passwords to Kubernetes secrets
2. **Standardize configuration** - Create centralized RabbitMQ config
3. **Set up monitoring alerts** - Configure alerts for critical metrics
4. **Document procedures** - Create runbooks for common operations
#### **Short Term (Next Month)**
1. **Security audit** - Review and improve security posture
2. **Performance tuning** - Optimize settings based on usage patterns
3. **Automation** - Implement automated health checks and backups
4. **Training** - Train team on RabbitMQ management and troubleshooting
#### **Long Term (Next Quarter)**
1. **High availability** - Implement multi-zone deployment
2. **Disaster recovery** - Set up automated backup and recovery procedures
3. **Advanced monitoring** - Implement predictive analytics and alerting
4. **Capacity planning** - Plan for growth and scaling
### **📚 Additional Resources**
#### **Official Documentation**
- **[RabbitMQ Documentation](https://www.rabbitmq.com/documentation.html)** - Official guides
- **[RabbitMQ Management UI](https://www.rabbitmq.com/management.html)** - UI documentation
- **[RabbitMQ Clustering](https://www.rabbitmq.com/clustering.html)** - Cluster setup
#### **Community Resources**
- **[RabbitMQ Slack](https://rabbitmq-slack.herokuapp.com/)** - Community support
- **[RabbitMQ GitHub](https://github.com/rabbitmq/rabbitmq-server)** - Source code
- **[RabbitMQ Blog](https://blog.rabbitmq.com/)** - Latest updates and tips
#### **Books & Courses**
- **"RabbitMQ in Depth"** by Gavin M. Roy
- **"RabbitMQ Essentials"** by Lovisa Johansson
- **RabbitMQ Tutorials** - Official tutorial series
---
**🎉 You now have a comprehensive understanding of your RabbitMQ production environment! Use this guide to maintain, monitor, and optimize your message broker infrastructure.**

---
*Last updated: 2025-09-04*
*Maintained by: FreeLeaps DevOps Team*