freeleaps-ops/docs/Kubernetes_Bootstrap_Guide.md

434 lines
9.1 KiB
Markdown
Raw Permalink Normal View History

2025-09-03 23:59:04 +00:00
# Kubernetes Bootstrap Guide
## 🎯 **Overview**
This guide explains how to bootstrap a complete Kubernetes cluster from scratch using Azure VMs and the `freeleaps-ops` repository. **Kubernetes does NOT create automatically** - you need to manually bootstrap the entire infrastructure.
## 📋 **Prerequisites**
### **1. Azure Infrastructure**
- ✅ Azure VMs (already provisioned)
- ✅ Network connectivity between VMs
- ✅ Azure AD tenant configured
- ✅ Resource group: `k8s`
### **2. Local Environment**
-`freeleaps-ops` repository cloned
- ✅ Ansible installed (`pip install ansible`)
- ✅ Azure CLI installed and configured
- ✅ SSH access to VMs
### **3. VM Requirements**
- **Master Nodes**: 2+ VMs for control plane
- **Worker Nodes**: 2+ VMs for workloads
- **Network**: All VMs in same subnet
- **OS**: Ubuntu 20.04+ recommended
---
## 🚀 **Step-by-Step Bootstrap Process**
### **Step 1: Verify Azure VMs**
```bash
# Check VM status
az vm list --resource-group k8s --query "[].{name:name,powerState:powerState,privateIP:privateIps}" -o table
# Ensure all VMs are running
az vm start --resource-group k8s --name <vm-name>
```
### **Step 2: Configure Inventory**
Edit the Ansible inventory file:
```bash
cd freeleaps-ops
vim cluster/ansible/manifests/inventory.ini
```
**Example inventory structure:**
```ini
[all:vars]
ansible_user=wwwadmin@mathmast.com
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
[kube_control_plane]
prod-usw2-k8s-freeleaps-master-01 ansible_host=10.10.0.4 etcd_member_name=freeleaps-etcd-01 host_name=prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02 ansible_host=10.10.0.5 etcd_member_name=freeleaps-etcd-02 host_name=prod-usw2-k8s-freeleaps-master-02
[kube_node]
prod-usw2-k8s-freeleaps-worker-nodes-01 ansible_host=10.10.0.6 host_name=prod-usw2-k8s-freeleaps-worker-nodes-01
prod-usw2-k8s-freeleaps-worker-nodes-02 ansible_host=10.10.0.7 host_name=prod-usw2-k8s-freeleaps-worker-nodes-02
[etcd]
prod-usw2-k8s-freeleaps-master-01
prod-usw2-k8s-freeleaps-master-02
[k8s_cluster:children]
kube_control_plane
kube_node
```
### **Step 3: Test Connectivity**
```bash
cd cluster/ansible/manifests
ansible -i inventory.ini all -m ping -kK
```
### **Step 4: Bootstrap Kubernetes Cluster**
```bash
cd ../../3rd/kubespray
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
```
**What this does:**
- Installs Docker/containerd on all nodes
- Downloads Kubernetes binaries (v1.31.4)
- Generates certificates and keys
- Bootstraps etcd cluster
- Starts Kubernetes control plane
- Joins worker nodes
- Configures Calico networking
- Sets up OIDC authentication
### **Step 5: Get Kubeconfig**
```bash
# Get kubeconfig from master node
ssh wwwadmin@mathmast.com@10.10.0.4 "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
# Test cluster access
kubectl get nodes
kubectl get pods -n kube-system
```
### **Step 6: Deploy Infrastructure**
```bash
cd ../../cluster/manifests
# Deploy in order
kubectl apply -f freeleaps-controls-system/
kubectl apply -f freeleaps-devops-system/
kubectl apply -f freeleaps-monitoring-system/
kubectl apply -f freeleaps-logging-system/
kubectl apply -f freeleaps-data-platform/
```
### **Step 7: Setup Authentication**
```bash
cd ../../cluster/bin
./freeleaps-cluster-authenticator auth
```
---
## 🤖 **Automated Bootstrap Script**
Use the provided bootstrap script for automated deployment:
```bash
cd freeleaps-ops/docs
./bootstrap-k8s-cluster.sh
```
**Script Features:**
- ✅ Prerequisites verification
- ✅ Azure VM status check
- ✅ Connectivity testing
- ✅ Automated cluster bootstrap
- ✅ Infrastructure deployment
- ✅ Authentication setup
- ✅ Status verification
**Usage Options:**
```bash
# Full bootstrap
./bootstrap-k8s-cluster.sh
# Only verify prerequisites
./bootstrap-k8s-cluster.sh --verify
# Only bootstrap cluster (skip infrastructure)
./bootstrap-k8s-cluster.sh --bootstrap
```
---
## 🔧 **Manual Bootstrap Commands**
If you prefer manual control, here are the detailed commands:
### **1. Install Prerequisites**
```bash
# Install Ansible
pip install ansible
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```
### **2. Configure Azure**
```bash
# Login to Azure
az login
# Set subscription
az account set --subscription <subscription-id>
```
### **3. Bootstrap Cluster**
```bash
# Navigate to kubespray
cd freeleaps-ops/3rd/kubespray
# Run cluster installation
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
```
### **4. Verify Installation**
```bash
# Get kubeconfig
ssh wwwadmin@mathmast.com@<master-ip> "sudo cat /etc/kubernetes/admin.conf" > ~/.kube/config
# Test cluster
kubectl get nodes
kubectl get pods -n kube-system
```
---
## 🔍 **Verification Steps**
### **1. Cluster Health**
```bash
# Check nodes
kubectl get nodes -o wide
# Check system pods
kubectl get pods -n kube-system
# Check cluster info
kubectl cluster-info
```
### **2. Network Verification**
```bash
# Check Calico pods
kubectl get pods -n kube-system | grep calico
# Check network policies
kubectl get networkpolicies --all-namespaces
```
### **3. Authentication Test**
```bash
# Test OIDC authentication
kubectl auth whoami
# Check permissions
kubectl auth can-i --list
```
---
## 🚨 **Troubleshooting**
### **Common Issues**
#### **1. Ansible Connection Failed**
```bash
# Check VM status
az vm show --resource-group k8s --name <vm-name> --query "powerState"
# Test SSH manually
ssh wwwadmin@mathmast.com@<vm-ip>
# Check network security groups
az network nsg rule list --resource-group k8s --nsg-name <nsg-name>
```
#### **2. Cluster Bootstrap Failed**
```bash
# Check Ansible logs
ansible-playbook -i inventory.ini cluster.yml -kK -b -vvv
# Check VM resources
kubectl describe node <node-name>
# Check system pods
kubectl get pods -n kube-system
kubectl describe pod <pod-name> -n kube-system
```
#### **3. Infrastructure Deployment Failed**
```bash
# Check CRDs
kubectl get crd
# Check operator pods
kubectl get pods --all-namespaces | grep operator
# Check events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```
### **Recovery Procedures**
#### **If Bootstrap Fails**
1. **Clean up failed installation**
```bash
# Reset VMs to clean state
az vm restart --resource-group k8s --name <vm-name>
```
2. **Retry bootstrap**
```bash
cd freeleaps-ops/3rd/kubespray
ansible-playbook -i ../../cluster/ansible/manifests/inventory.ini ./cluster.yml -kK -b
```
#### **If Infrastructure Deployment Fails**
1. **Check prerequisites**
```bash
kubectl get nodes
kubectl get pods -n kube-system
```
2. **Redeploy components**
```bash
kubectl delete -f <component-directory>/
kubectl apply -f <component-directory>/
```
---
## 📊 **Post-Bootstrap Verification**
### **1. Core Components**
```bash
# ArgoCD
kubectl get pods -n freeleaps-devops-system | grep argocd
# Cert-manager
kubectl get pods -n freeleaps-controls-system | grep cert-manager
# Prometheus/Grafana
kubectl get pods -n freeleaps-monitoring-system | grep prometheus
kubectl get pods -n freeleaps-monitoring-system | grep grafana
# Logging
kubectl get pods -n freeleaps-logging-system | grep loki
```
### **2. Access Points**
```bash
# ArgoCD UI
kubectl port-forward svc/argocd-server -n freeleaps-devops-system 8080:80
# Grafana UI
kubectl port-forward svc/kube-prometheus-stack-grafana -n freeleaps-monitoring-system 3000:80
# Kubernetes Dashboard
kubectl port-forward svc/kubernetes-dashboard-kong-proxy -n freeleaps-infra-system 8443:443
```
### **3. Authentication Setup**
```bash
# Setup user authentication
cd freeleaps-ops/cluster/bin
./freeleaps-cluster-authenticator auth
# Test authentication
kubectl auth whoami
kubectl get nodes
```
---
## 🔒 **Security Considerations**
### **1. Network Security**
- Ensure VMs are in private subnets
- Configure network security groups properly
- Use VPN or bastion host for access
### **2. Access Control**
- Use Azure AD OIDC for authentication
- Implement RBAC for authorization
- Regular access reviews
### **3. Monitoring**
- Enable audit logging
- Monitor cluster health
- Set up alerts
---
## 📚 **Next Steps**
### **1. Application Deployment**
- Deploy applications via ArgoCD
- Configure CI/CD pipelines
- Set up monitoring and alerting
### **2. Maintenance**
- Regular security updates
- Backup etcd data
- Monitor resource usage
### **3. Scaling**
- Add more worker nodes
- Configure auto-scaling
- Optimize resource allocation
---
## 🆘 **Support**
### **Emergency Contacts**
- **Infrastructure Team**: [Contact Information]
- **Azure Support**: [Contact Information]
- **Kubernetes Community**: [Contact Information]
### **Useful Commands**
```bash
# Cluster status
kubectl get nodes
kubectl get pods --all-namespaces
# Logs
kubectl logs -n kube-system <pod-name>
# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
# Resource usage
kubectl top nodes
kubectl top pods --all-namespaces
```
---
**Last Updated**: September 3, 2025
**Version**: 1.0
**Maintainer**: Infrastructure Team