AWS EC2 Instance Recovery Guide: When SSH and Serial Console Fail
This guide demonstrates how to recover access to an EC2 instance when both SSH and Serial Console access are unavailable. We'll use a Proxmox instance as an example, but this method works for any Linux-based EC2 instance.
Prerequisites
- AWS Console access
- Basic understanding of Linux commands
- A working EC2 instance to use as a rescue system (Amazon Linux 2 recommended)
Recovery Steps
1. Create a Snapshot of the Affected Volume
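The snapshot can be taken in the console or from the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials and a default region; the function name snapshot_root_volume and the volume ID in the usage line are hypothetical.

```shell
# Hypothetical helper: snapshot a volume and wait until the snapshot completes.
# Usage: snapshot_root_volume vol-0123456789abcdef0
snapshot_root_volume() {
  volume_id="$1"
  # create-snapshot returns immediately; --query extracts the new snapshot ID
  snapshot_id=$(aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "Pre-recovery snapshot of $volume_id" \
    --query 'SnapshotId' --output text) || return 1
  # Block until the snapshot reaches the "completed" state
  aws ec2 wait snapshot-completed --snapshot-ids "$snapshot_id" || return 1
  echo "$snapshot_id"
}
```

Keeping the printed snapshot ID gives a known rollback point if an edit made during recovery makes things worse.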
2. Stop the Affected Instance
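Stopping can likewise be scripted. This is a sketch under the same assumption (a configured AWS CLI); stop_and_wait is a hypothetical name. It stops the instance rather than terminating it, then waits so the volume can be detached safely.

```shell
# Hypothetical helper: stop an instance and wait until it is fully stopped.
# Usage: stop_and_wait i-0123456789abcdef0
stop_and_wait() {
  instance_id="$1"
  aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null || return 1
  # The waiter polls until the instance state is "stopped"
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  echo "$instance_id is stopped"
}
```

Note that an auto-assigned public IPv4 address is released on stop, so the instance may come back with a different address unless it uses an Elastic IP.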
3. Detach the Root Volume
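- Select the stopped instance
- Open the 'Storage' tab
- Note the volume ID of the root volume and the root device name (e.g., /dev/sda1 or /dev/xvda); the device name is needed when reattaching the volume
- Select the volume, then Actions → Detach Volume
- Confirm the detach and wait until the volume state shows "available"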
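The same lookup and detach can be done from the AWS CLI. A sketch, assuming a configured AWS CLI; root_volume_info and detach_and_wait are hypothetical helper names.

```shell
# Hypothetical helper: print "<root device name> <root volume ID>" for an instance.
root_volume_info() {
  instance_id="$1"
  root_device=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].RootDeviceName' --output text) || return 1
  # Pick the block device mapping whose device name matches the root device
  volume_id=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[0].Instances[0].BlockDeviceMappings[?DeviceName=='$root_device'].Ebs.VolumeId | [0]" \
    --output text) || return 1
  echo "$root_device $volume_id"
}

# Hypothetical helper: detach a volume and wait until it is "available".
detach_and_wait() {
  aws ec2 detach-volume --volume-id "$1" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$1"
}
```

Record both values printed by root_volume_info before detaching; the device name is what the volume must be reattached as later.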
4. Launch a Rescue Instance
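- Launch a new EC2 instance
- Use the Amazon Linux 2 AMI
- Place it in the same availability zone as the affected volume (an EBS volume can only be attached to an instance in its own availability zone)
- Configure the security group to allow SSH access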
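Because the availability zone must match, it may be worth checking before attempting the attach. A sketch, assuming a configured AWS CLI; volume_az and same_az are hypothetical names.

```shell
# Hypothetical helper: print the availability zone of a volume.
volume_az() {
  aws ec2 describe-volumes --volume-ids "$1" \
    --query 'Volumes[0].AvailabilityZone' --output text
}

# Hypothetical helper: succeed only if the volume and instance share an AZ.
# Usage: same_az <volume-id> <rescue-instance-id>
same_az() {
  vol_az=$(volume_az "$1") || return 1
  inst_az=$(aws ec2 describe-instances --instance-ids "$2" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text) || return 1
  [ "$vol_az" = "$inst_az" ]
}
```

For example, `same_az vol-... i-... && echo "OK to attach"` prints the message only when the zones match.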
5. Attach Problem Volume to Rescue Instance
- Select the detached volume
- Actions → Attach Volume
- Select rescue instance
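- Note the device name (e.g., /dev/sdb or /dev/xvdb); on Nitro-based instance types the volume appears inside the OS as an NVMe device such as /dev/nvme1n1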
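The attach step has a CLI equivalent. A sketch, assuming a configured AWS CLI; attach_to_rescue is a hypothetical name, and the default device /dev/sdf is only an illustrative choice from the range AWS recommends for additional EBS volumes on Linux.

```shell
# Hypothetical helper: attach a volume to the rescue instance as a secondary disk.
# Usage: attach_to_rescue <volume-id> <rescue-instance-id> [device]
attach_to_rescue() {
  device="${3:-/dev/sdf}"   # assumed default; any free /dev/sd[f-p] name works
  aws ec2 attach-volume --volume-id "$1" --instance-id "$2" \
    --device "$device" >/dev/null || return 1
  # Wait until the volume reports "in-use"
  aws ec2 wait volume-in-use --volume-ids "$1" || return 1
  echo "$device"
}
```

The name requested here is not necessarily the name seen inside the rescue instance, so confirm it with lsblk after connecting.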
6. Access and Mount the Volume
# Connect to rescue instance
ssh -i your-key.pem ec2-user@rescue-instance-ip
# List available disks to find attached volume
sudo fdisk -l
# or
lsblk
# Create mount point
sudo mkdir -p /mnt/rescue
# Mount the root partition
sudo mount /dev/xvdb1 /mnt/rescue # Adjust device name as needed
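If the mount fails, the filesystem type is worth checking first. One known case is XFS, which refuses to mount a filesystem whose UUID matches one that is already mounted, as can happen when the rescue instance and the affected volume were built from the same image. The sketch below assumes the /dev/xvdb1 device from the step above; rescue_mount_opts is a hypothetical helper.

```shell
# Hypothetical helper: suggest mount options for a given filesystem type.
rescue_mount_opts() {
  case "$1" in
    xfs) echo "nouuid" ;;    # skip XFS's duplicate-UUID check
    *)   echo "defaults" ;;  # no special handling assumed for other types
  esac
}

# Example, run on the rescue instance (device name is an assumption):
#   fstype=$(sudo blkid -o value -s TYPE /dev/xvdb1)
#   sudo mount -o "$(rescue_mount_opts "$fstype")" /dev/xvdb1 /mnt/rescue
```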
7. Troubleshoot and Fix Issues
Common File Locations
# Network Configuration
sudo nano /mnt/rescue/etc/network/interfaces # Debian/Ubuntu/Proxmox
sudo nano /mnt/rescue/etc/sysconfig/network-scripts/ifcfg-eth0 # RHEL/CentOS
# SSH Configuration
sudo nano /mnt/rescue/etc/ssh/sshd_config
# System Logs
sudo less /mnt/rescue/var/log/syslog # Debian/Ubuntu
sudo less /mnt/rescue/var/log/messages # RHEL/CentOS
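When reviewing sshd_config, it can help to print only the directives that affect who can log in and where the daemon listens. A minimal sketch; show_sshd_access_settings is a hypothetical helper and the directive list is a reasonable starting set, not an exhaustive one.

```shell
# Hypothetical helper: print active (uncommented) access-related sshd directives.
# Usage: show_sshd_access_settings /mnt/rescue/etc/ssh/sshd_config
show_sshd_access_settings() {
  grep -Ei '^[[:space:]]*(Port|ListenAddress|PermitRootLogin|PasswordAuthentication|PubkeyAuthentication|AllowUsers|AuthorizedKeysFile)[[:space:]]' "$1"
}
```

Commented-out lines are skipped, so an empty result means the file relies on sshd's built-in defaults for these settings.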
Example: Fixing Proxmox Network Configuration
# View current network config
sudo cat /mnt/rescue/etc/network/interfaces
# Edit if needed
sudo nano /mnt/rescue/etc/network/interfaces
# Example of working basic config:
auto lo
iface lo inet loopback

iface eth0 inet manual

auto vmbr0
iface vmbr0 inet dhcp
    bridge-ports eth0
    bridge-stp off
    bridge-fd 0
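A frequent cause of lost connectivity is a config that refers to an interface name the instance does not actually have (for example eth0 where the NIC is named ens5). As a quick review aid, the following sketch lists every interface name the file mentions; list_configured_ifaces is a hypothetical helper.

```shell
# Hypothetical helper: list interface names referenced in an interfaces file.
# Usage: list_configured_ifaces /mnt/rescue/etc/network/interfaces
list_configured_ifaces() {
  awk '
    $1 == "iface" { print $2 }                 # interface being defined
    $1 == "bridge-ports" {                     # members of a bridge
      for (i = 2; i <= NF; i++) if ($i != "none") print $i
    }
  ' "$1" | sort -u
}
```

Compare the printed names with the NIC name found in the system logs on the mounted volume.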
8. Cleanup and Restore
# Unmount volume
cd ~ # Ensure you're not inside the mounted directory
sudo umount /mnt/rescue
After unmounting:
- Detach the volume from the rescue instance in the AWS Console
- Reattach it to the original instance as the root volume, using the instance's root device name (e.g., /dev/sda1 or /dev/xvda)
- Start the original instance
- Test connectivity
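These four steps can also be run from the AWS CLI after the unmount. A sketch, assuming a configured AWS CLI; restore_root_volume is a hypothetical name and the IDs and device name in the usage line are placeholders.

```shell
# Hypothetical helper: move the repaired volume back and start the instance.
# Usage: restore_root_volume <volume-id> <rescue-id> <original-id> <root-device>
#   e.g. restore_root_volume vol-0abc i-0rescue i-0orig /dev/sda1
restore_root_volume() {
  aws ec2 detach-volume --volume-id "$1" --instance-id "$2" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$1" || return 1
  # The device name must match the original instance's root device name
  aws ec2 attach-volume --volume-id "$1" --instance-id "$3" \
    --device "$4" >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$1" || return 1
  aws ec2 start-instances --instance-ids "$3" >/dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$3"
}
```

The instance-running waiter only confirms the instance state, so an SSH test is still needed to confirm the fix worked.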
Common Issues and Solutions
Network Configuration Issues
- Check for correct interface names (eth0, ens5, etc.)
- Verify gateway configuration
- Ensure no conflicting network bridges
- Check for valid IP addressing
Boot Issues
- Check that the /boot partition isn't full
- Verify fstab entries are correct
- Check for kernel issues in grub configuration
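For the fstab check, one pattern worth looking for is an entry for a non-root filesystem without the nofail option: if that device is missing at boot, the system can stop in emergency mode before SSH starts. The sketch below flags such entries for review; fstab_missing_nofail is a hypothetical helper, and a flagged entry is a candidate to inspect, not necessarily an error.

```shell
# Hypothetical helper: print mount points of non-root, non-swap fstab entries
# that do not carry the "nofail" option.
# Usage: fstab_missing_nofail /mnt/rescue/etc/fstab
fstab_missing_nofail() {
  awk '
    $1 !~ /^#/ && NF >= 4 && $2 != "/" && $3 != "swap" \
      && $4 !~ /(^|,)nofail(,|$)/ { print $2 }
  ' "$1"
}
```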
Permission Issues
- Verify SSH key permissions
- Check SELinux/AppArmor settings
- Validate root access configuration
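SSH key permissions can be checked directly on the mounted volume. This sketch compares against the conventional modes (700 for .ssh, 600 for authorized_keys); check_ssh_perms is a hypothetical helper, it relies on GNU stat, and the home directory path in the usage line is an assumption.

```shell
# Hypothetical helper: warn if .ssh or authorized_keys differ from the
# conventional modes. Returns non-zero if anything was flagged.
# Usage: check_ssh_perms /mnt/rescue/root
check_ssh_perms() {
  ssh_dir="$1/.ssh"
  status=0
  if [ "$(stat -c '%a' "$ssh_dir" 2>/dev/null)" != "700" ]; then
    echo "WARN: $ssh_dir is not mode 700"
    status=1
  fi
  if [ "$(stat -c '%a' "$ssh_dir/authorized_keys" 2>/dev/null)" != "600" ]; then
    echo "WARN: $ssh_dir/authorized_keys is not mode 600"
    status=1
  fi
  return "$status"
}
```

A missing directory or file is reported the same way as a wrong mode, which is usually what you want to know during recovery.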
Prevention Tips
- Always maintain a snapshot of a working system
- Document the working network configuration
- Use AWS Systems Manager Session Manager as a backup access method
- Keep serial console access enabled
- Document network changes before implementing them
- Test changes in a staging environment first
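One lightweight way to document a change is to keep a timestamped copy of a config file before editing it. A minimal sketch; backup_config is a hypothetical helper.

```shell
# Hypothetical helper: copy a file to <file>.bak.<timestamp> and print the copy's path.
# Usage: sudo sh -c '. ./helpers.sh; backup_config /etc/network/interfaces'
backup_config() {
  src="$1"
  dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
  # -p preserves mode and timestamps so the copy can be restored as-is
  cp -p "$src" "$dest" || return 1
  echo "$dest"
}
```

The helpers.sh file name in the usage line is only a placeholder for wherever the function is saved.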
Remember: Always maintain current backups and document your system configuration to make recovery easier when needed.