Azure Kubernetes Node Addition Runbook
Overview
This runbook provides step-by-step instructions for adding new Azure Virtual Machines to an existing Kubernetes cluster installed via Kubespray.
Prerequisites
- Access to Azure CLI with appropriate permissions
- SSH access to the new VM
- Access to the existing Kubernetes cluster
- Kubespray installation directory
Pre-Installation Checklist
1. Verify New VM Details
# Get VM details from Azure (-d/--show-details is required for the IP fields)
az vm show -d --resource-group <RESOURCE_GROUP> --name <VM_NAME> --query "{name:name,ip:publicIps,privateIp:privateIps}" -o table
2. Verify SSH Access
# Test SSH connection to the new VM
ssh wwwadmin@mathmast.com@<VM_PRIVATE_IP>
# You will be prompted for password
3. Verify Network Connectivity
# From the new VM, test connectivity to existing cluster
ping <EXISTING_MASTER_IP>
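The three pre-flight checks above can be wrapped in a single Bash sketch. `valid_ipv4` and `preflight` are hypothetical helpers introduced here for illustration, not part of the existing tooling; fill in the placeholder arguments before use.

```shell
# Sketch of the pre-installation checks, assuming Bash.
# `valid_ipv4` and `preflight` are hypothetical helpers.
valid_ipv4() {
  local ip="$1" octet octets
  [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
  IFS=. read -r -a octets <<< "$ip"
  for octet in "${octets[@]}"; do
    (( 10#$octet <= 255 )) || return 1  # reject octets above 255
  done
}

preflight() {
  local rg="$1" vm="$2" vm_ip="$3" master_ip="$4"
  valid_ipv4 "$vm_ip" || { echo "bad VM IP: $vm_ip" >&2; return 1; }
  valid_ipv4 "$master_ip" || { echo "bad master IP: $master_ip" >&2; return 1; }
  # -d/--show-details is required for the publicIps/privateIps fields
  az vm show -d --resource-group "$rg" --name "$vm" \
    --query "{name:name,ip:publicIps,privateIp:privateIps}" -o table
  # Connectivity check from the new VM to an existing master
  ssh "wwwadmin@mathmast.com@${vm_ip}" "ping -c 3 ${master_ip}"
}
```

Usage: `preflight <RESOURCE_GROUP> <VM_NAME> <VM_PRIVATE_IP> <EXISTING_MASTER_IP>`.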
Step-by-Step Process
Step 1: Update Ansible Inventory
- Navigate to Kubespray directory
cd freeleaps-ops/3rd/kubespray
- Edit the inventory file
vim ../cluster/ansible/manifests/inventory.ini
- Add the new node to the appropriate group
For a worker node:
[kube_node]
# Existing nodes...
prod-usw2-k8s-freeleaps-worker-nodes-06 ansible_host=<NEW_VM_PRIVATE_IP> ansible_user=wwwadmin@mathmast.com host_name=prod-usw2-k8s-freeleaps-worker-nodes-06
For a master node:
[kube_control_plane]
# Existing nodes...
prod-usw2-k8s-freeleaps-master-03 ansible_host=<NEW_VM_PRIVATE_IP> ansible_user=wwwadmin@mathmast.com etcd_member_name=freeleaps-etcd-03 host_name=prod-usw2-k8s-freeleaps-master-03
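Both entry formats above follow the same `key=value` host-vars pattern; a small helper (hypothetical, shown only to keep the fields consistent) can generate the line, appending `etcd_member_name` only for control-plane nodes:

```shell
# Hypothetical helper: emit an inventory.ini host line for a new node.
# Pass an etcd member name as the 4th argument for control-plane nodes.
inventory_line() {
  local name="$1" ip="$2" user="$3" etcd="${4:-}"
  local line="${name} ansible_host=${ip} ansible_user=${user}"
  if [ -n "$etcd" ]; then
    line+=" etcd_member_name=${etcd}"
  fi
  line+=" host_name=${name}"
  printf '%s\n' "$line"
}

# Worker example:
# inventory_line prod-usw2-k8s-freeleaps-worker-nodes-06 <NEW_VM_PRIVATE_IP> wwwadmin@mathmast.com
```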
Step 2: Verify Inventory Configuration
- Check inventory syntax
ansible-inventory -i ../cluster/ansible/manifests/inventory.ini --list
- Test connectivity to new node
ansible -i ../cluster/ansible/manifests/inventory.ini kube_node -m ping -kK
Step 3: Run Kubespray Scale Playbook
- Execute the scale playbook
cd ../cluster/ansible/manifests
ansible-playbook -i inventory.ini ../../3rd/kubespray/scale.yml -kK -b
Note:
- -k prompts for the SSH password
- -K prompts for the privilege escalation (sudo) password
- -b enables privilege escalation
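If the run should touch only the new node, Ansible's standard `--limit` flag can be passed to scale.yml. A sketch with a hypothetical `run_scale` wrapper (the `DRY_RUN` switch just prints the command instead of executing it):

```shell
# Hypothetical wrapper: run scale.yml restricted to one node via --limit.
# DRY_RUN=1 prints the command instead of invoking ansible-playbook.
run_scale() {
  local node="$1"
  local cmd=(ansible-playbook -i inventory.ini ../../3rd/kubespray/scale.yml -kK -b --limit="$node")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example: DRY_RUN=1 run_scale prod-usw2-k8s-freeleaps-worker-nodes-06
```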
Step 4: Verify Node Addition
- Check node status
kubectl get nodes
- Verify node is ready
kubectl describe node <NEW_NODE_NAME>
- Check node labels
kubectl get nodes --show-labels
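Rather than re-running `kubectl get nodes` by hand, the check can block until the node reports Ready. `wait_ready` and `ready_status` below are hypothetical helpers; `kubectl wait` itself is a standard subcommand (kubectl v1.11+):

```shell
# Hypothetical helper: block until the node reports Ready (or time out).
wait_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout=300s
}

# Hypothetical helper: interpret the Ready condition value returned by
#   kubectl get node <NEW_NODE_NAME> \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
ready_status() {
  case "$1" in
    True)  echo "node is Ready" ;;
    False) echo "node is NotReady" ;;
    *)     echo "node Ready condition unknown" ;;
  esac
}
```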
Step 5: Post-Installation Verification
- Test pod scheduling
# Create a test pod to verify scheduling
kubectl run test-pod --image=nginx --restart=Never
kubectl get pod test-pod -o wide
# Clean up the test pod once verified
kubectl delete pod test-pod
- Check node resources
kubectl top nodes
- Verify node components
kubectl get pods -n kube-system -o wide | grep <NEW_NODE_NAME>
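The test pod above may land on any schedulable node. To verify the new node specifically, pin a pod to it with `nodeName` (a hard assignment that bypasses the scheduler's node choice). A minimal sketch; the `nodeName` value below is the worker example from Step 1 and should be replaced with the actual <NEW_NODE_NAME>:

```yaml
# test-pod-new-node.yaml -- pod pinned to the new node via nodeName
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-new-node
spec:
  nodeName: prod-usw2-k8s-freeleaps-worker-nodes-06  # replace with <NEW_NODE_NAME>
  restartPolicy: Never
  containers:
    - name: nginx
      image: nginx
```

Apply with `kubectl apply -f test-pod-new-node.yaml`, confirm placement with `kubectl get pod test-pod-new-node -o wide`, then delete the pod.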
Troubleshooting
Common Issues
1. SSH Connection Failed
# Verify VM is running (-d/--show-details is required for the powerState field)
az vm show -d --resource-group <RESOURCE_GROUP> --name <VM_NAME> --query "powerState"
# Check network security groups
az network nsg rule list --resource-group <RESOURCE_GROUP> --nsg-name <NSG_NAME>
2. Ansible Connection Failed
# Test with verbose output
ansible -i ../cluster/ansible/manifests/inventory.ini kube_node -m ping -kK -vvv
3. Node Not Ready
# Check node conditions
kubectl describe node <NEW_NODE_NAME>
# Check kubelet logs (the kubelet runs as a systemd service on the node, not as a pod)
ssh wwwadmin@mathmast.com@<VM_PRIVATE_IP>
sudo journalctl -u kubelet --no-pager -n 100
4. Pod Scheduling Issues
# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Check node capacity
kubectl describe node <NEW_NODE_NAME> | grep -A 10 "Capacity"
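When reading the TAINTS column, only some effects actually keep pods off the node. A hypothetical helper sketching the interpretation:

```shell
# Hypothetical helper: decide whether a taint entry blocks normal pods.
# Input is one value from the TAINTS column, e.g.
#   node-role.kubernetes.io/control-plane:NoSchedule
blocks_scheduling() {
  case "$1" in
    # NoSchedule keeps untolerating pods off; NoExecute also evicts them.
    *:NoSchedule|*:NoExecute) return 0 ;;
    # PreferNoSchedule is only a soft preference.
    *) return 1 ;;
  esac
}
```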
Recovery Procedures
If Scale Playbook Fails
- Clean up the failed node
kubectl delete node <NEW_NODE_NAME>
- Restart the VM
# Restart the VM (a reboot only; this does not reimage the OS disk)
az vm restart --resource-group <RESOURCE_GROUP> --name <VM_NAME>
- Retry the scale playbook
ansible-playbook -i inventory.ini ../../3rd/kubespray/scale.yml -kK -b
If Node is Stuck in NotReady State
- Check kubelet service
ssh wwwadmin@mathmast.com@<VM_PRIVATE_IP>
sudo systemctl status kubelet
- Restart kubelet
ssh wwwadmin@mathmast.com@<VM_PRIVATE_IP>
sudo systemctl restart kubelet
Security Considerations
1. Network Security
- Ensure the new VM is in the correct subnet
- Verify network security group rules allow cluster communication
- Check firewall rules if applicable
2. Access Control
- Use SSH key-based authentication when possible
- Limit sudo access to necessary commands
- Monitor node access logs
3. Compliance
- Ensure the new node meets security requirements
- Verify all required security patches are applied
- Check compliance with organizational policies
Monitoring and Maintenance
1. Node Health Monitoring
# Set up monitoring for the new node
kubectl get nodes -o wide
kubectl top nodes
2. Resource Monitoring
# Monitor resource usage
kubectl describe node <NEW_NODE_NAME> | grep -A 5 "Allocated resources"
3. Log Monitoring
# Monitor kubelet logs (the kubelet is a systemd service on the node, not a pod)
ssh wwwadmin@mathmast.com@<VM_PRIVATE_IP>
sudo journalctl -u kubelet -n 100 -f
Rollback Procedures
If Node Addition Causes Issues
- Cordon the node
kubectl cordon <NEW_NODE_NAME>
- Drain the node
kubectl drain <NEW_NODE_NAME> --ignore-daemonsets --delete-emptydir-data
- Remove the node
kubectl delete node <NEW_NODE_NAME>
- Update inventory
# Remove the node from inventory.ini
vim ../cluster/ansible/manifests/inventory.ini
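The cordon/drain/delete sequence above can be sketched as one rollback helper. `rollback_node` and `valid_node_name` are hypothetical functions; the name check simply rejects strings that cannot be Kubernetes node names:

```shell
# Hypothetical helper: reject names that are not lowercase DNS-style labels.
valid_node_name() {
  [[ "$1" =~ ^[a-z0-9]([a-z0-9.-]{0,251}[a-z0-9])?$ ]]
}

# Hypothetical rollback helper: cordon, drain, then delete the node.
rollback_node() {
  local node="$1"
  valid_node_name "$node" || { echo "invalid node name: $node" >&2; return 1; }
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  kubectl delete node "$node"
  echo "Remember to also remove ${node} from inventory.ini."
}
```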
Documentation
Required Information
- VM name and IP address
- Resource group and subscription
- Node role (worker/master)
- Date and time of addition
- Person performing the addition
Post-Addition Checklist
- Node appears in kubectl get nodes output
- Node status is Ready
- Pods can be scheduled on the node
- All node components are running
- Monitoring is configured
- Documentation is updated
Emergency Contacts
- Infrastructure Team: [Contact Information]
- Kubernetes Administrators: [Contact Information]
- Azure Support: [Contact Information]
Last Updated: [Date]
Version: 1.0
Author: [Name]