Day 24 - Highly Available and Scalable Django Application on AWS using Terraform
Today I worked on deploying a highly available and scalable Django application on AWS using Terraform. The goal of this project was to understand how production-style AWS infrastructure is designed across multiple Availability Zones while keeping the application secure, scalable, and resilient.
Instead of deploying a single EC2 instance in a public subnet, this setup used private EC2 instances behind an Application Load Balancer. The infrastructure also included Auto Scaling Groups, NAT Gateways, route tables, security groups, and multi-AZ networking.
Architecture Overview
The infrastructure was deployed inside a custom VPC across two Availability Zones.
Main components used:
- VPC with public and private subnets
- Internet Gateway
- NAT Gateways for outbound internet access
- Application Load Balancer
- Private EC2 instances
- Auto Scaling Group
- Dockerized Django application
- Terraform Infrastructure as Code
VPC and Networking Design
The VPC CIDR block used was:
10.0.0.0/16
Two public subnets were created for the ALB and NAT Gateways:
10.0.1.0/24
10.0.2.0/24
Two private subnets were created for EC2 instances:
10.0.11.0/24
10.0.12.0/24
This design allowed the ALB to receive public traffic while keeping application servers private.
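A minimal Terraform sketch of this VPC layout might look like the following (resource names, tags, and the AZ list are illustrative assumptions, not the project's actual code):

```hcl
# Hypothetical sketch of the VPC and subnet layout described above.
resource "aws_vpc" "day24" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# Public subnets for the ALB and NAT Gateways: 10.0.1.0/24 and 10.0.2.0/24
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.day24.id
  cidr_block              = cidrsubnet(aws_vpc.day24.cidr_block, 8, count.index + 1)
  availability_zone       = ["us-east-1a", "us-east-1b"][count.index]
  map_public_ip_on_launch = true
}

# Private subnets for the application instances: 10.0.11.0/24 and 10.0.12.0/24
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.day24.id
  cidr_block        = cidrsubnet(aws_vpc.day24.cidr_block, 8, count.index + 11)
  availability_zone = ["us-east-1a", "us-east-1b"][count.index]
}
```

Using cidrsubnet with netnum offsets 1-2 and 11-12 derives exactly the four /24 blocks listed above from the /16.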
Security Design
The Application Load Balancer security group allowed inbound HTTP traffic from the internet on port 80.
The EC2 security group allowed traffic only from the ALB security group, on port 8000.
This meant the application instances could not be accessed directly from the internet.
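The two-tier security group chain could be expressed roughly like this (the resource names, including the aws_vpc.day24 reference, are assumptions standing in for whatever the real configuration calls them):

```hcl
# Hypothetical sketch: ALB security group accepts public HTTP.
resource "aws_security_group" "alb" {
  vpc_id = aws_vpc.day24.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # public HTTP in
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Application security group: port 8000 reachable only from the ALB.
resource "aws_security_group" "app" {
  vpc_id = aws_vpc.day24.id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # only the ALB may connect
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"] # outbound via NAT for packages and images
  }
}
```

Referencing the ALB security group ID in the ingress rule, rather than a CIDR range, is what keeps the instances unreachable from the public internet.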
Application Load Balancer
The ALB distributed traffic across EC2 instances running in different Availability Zones.
Health checks were configured so unhealthy instances would automatically stop receiving traffic.
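A sketch of the ALB, target group, and listener wiring (the health check thresholds and path shown are common defaults, not necessarily the values the project used; subnet and security group references assume resources defined elsewhere in the configuration):

```hcl
# Hypothetical sketch of the internet-facing ALB and its target group.
resource "aws_lb" "day24" {
  name               = "day24-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id
}

resource "aws_lb_target_group" "day24" {
  name     = "day24-tg"
  port     = 8000
  protocol = "HTTP"
  vpc_id   = aws_vpc.day24.id

  health_check {
    path                = "/"
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 2
    matcher             = "200"
  }
}

# HTTP:80 listener forwarding to the target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.day24.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.day24.arn
  }
}
```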
Auto Scaling Group
The Auto Scaling Group maintained:
- Minimum instances: 1
- Desired instances: 2
- Maximum instances: 5
CPU-based scaling policies allowed the environment to automatically scale during higher load.
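The ASG and a CPU-based scaling policy could be sketched as follows (the 50% CPU target, launch template reference, and resource names are assumptions for illustration):

```hcl
# Hypothetical sketch of the ASG spanning both private subnets.
resource "aws_autoscaling_group" "day24" {
  name                = "day24-asg"
  min_size            = 1
  desired_capacity    = 2
  max_size            = 5
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.day24.arn]
  health_check_type   = "ELB" # replace instances the ALB marks unhealthy

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}

# Target-tracking policy: keep average CPU near the target value.
resource "aws_autoscaling_policy" "cpu" {
  name                   = "day24-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.day24.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50 # assumed threshold; tune per workload
  }
}
```

Setting health_check_type to ELB (rather than the default EC2) means the ASG replaces instances that fail the ALB health check, not just ones whose VM has crashed.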
Dockerized Django Application
The EC2 instances used a user data script to install Docker and run the Django container automatically during startup.
Docker image used:
itsbaivab/django-app
This kept deployments consistent across all instances, since every new instance bootstrapped itself the same way.
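Inside a launch template, the user data pattern might look like this (the AMI ID, instance type, and resource names are placeholders; the script assumes an Amazon Linux 2023 image with dnf):

```hcl
# Hypothetical launch template: install Docker and start the app on boot.
resource "aws_launch_template" "app" {
  name_prefix            = "day24-app-"
  image_id               = "ami-xxxxxxxx" # placeholder: Amazon Linux 2023 AMI
  instance_type          = "t3.micro"     # assumed instance size
  vpc_security_group_ids = [aws_security_group.app.id]

  user_data = base64encode(<<-EOF
    #!/bin/bash
    dnf install -y docker
    systemctl enable --now docker
    # Run the Django container, listening on the port the ALB targets
    docker run -d --restart unless-stopped -p 8000:8000 itsbaivab/django-app
  EOF
  )
}
```

The --restart unless-stopped flag keeps the container running across Docker daemon restarts without requiring SSH access to the private instances.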
NAT Gateway Usage
Because the EC2 instances were deployed in private subnets, they needed outbound internet access to download packages and Docker images.
NAT Gateways solved this problem while still keeping the instances private.
One NAT Gateway was deployed per Availability Zone for high availability.
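One possible Terraform shape for the per-AZ NAT setup (resource names and the count-based indexing are illustrative; the subnet references assume the public/private subnet resources defined earlier in the configuration):

```hcl
# Hypothetical sketch: one EIP + NAT Gateway per AZ, each in a public subnet.
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
}

resource "aws_nat_gateway" "day24" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id # NAT lives in the public subnet
}

# Each private subnet routes outbound traffic through the NAT in its own AZ,
# so losing one AZ does not break outbound access in the other.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.day24.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.day24[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```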
Validation and Testing Steps
1. Validate Terraform Deployment
terraform output
Expected:
load_balancer_dns = "day24-alb-xxxxx.us-east-1.elb.amazonaws.com"
nat_gateway_1_ip = "x.x.x.x"
nat_gateway_2_ip = "x.x.x.x"
2. Test Application Access
Open the ALB using HTTP, not HTTPS:
http://day24-alb-1985674148.us-east-1.elb.amazonaws.com/
Expected:
Django application loads successfully
3. Validate Load Balancer
Go to:
AWS Console → EC2 → Load Balancers → day24-alb
Check:
State = Active
Scheme = internet-facing
Listener = HTTP : 80
Availability Zones = us-east-1a and us-east-1b
4. Validate Target Group Health
Go to:
AWS Console → EC2 → Target Groups → day24-tg → Targets
Expected:
2 targets registered
Health status = Healthy
5. Validate Private EC2 Instances
Go to:
AWS Console → EC2 → Instances
Check:
Instances are running
Instances are in private subnets
Public IPv4 address is blank
6. Validate Auto Scaling Group
Go to:
AWS Console → EC2 → Auto Scaling Groups → day24-asg
Check:
Min capacity = 1
Desired capacity = 2
Max capacity = 5
Health check type = ELB
7. Test High Availability
Before failure test:
Target Group → Targets
Confirm:
2 Healthy targets
Then terminate one EC2 instance manually:
EC2 → Instances → Select one day24 instance → Instance state → Terminate
Expected:
Application should still load through ALB
ALB routes traffic to remaining healthy instance
ASG launches a replacement instance
Refresh:
http://day24-alb-1985674148.us-east-1.elb.amazonaws.com/
Observed:
One instance terminated.
Application still accessible.
ASG launching replacement instance.
Target group back to 2 healthy targets.
8. Validate Self-Healing
After a few minutes, check:
EC2 → Auto Scaling Groups → Activity
Expected:
ASG detected terminated instance
ASG launched replacement instance
Desired capacity restored to 2
Cost Considerations
This project is closer to a production architecture, so NAT Gateway pricing becomes noticeable.
Estimated monthly cost:
- EC2 instances: ~$17
- ALB: ~$16
- NAT Gateways: ~$65
- Data transfer: ~$5 to $10
Total estimated cost:
~$103 to $108 per month
Resources should be destroyed after testing.
Cleanup
terraform destroy -auto-approve
What I Learned
This project helped me better understand how AWS networking, load balancing, scaling, and security work together in a production-style environment.
The biggest learning was seeing how private EC2 instances can still function correctly through NAT Gateways while remaining protected from direct internet access.
I also learned how Auto Scaling Groups and ALBs work together to improve both scalability and high availability.
Key Takeaways
- Multi-AZ deployment improves availability
- Private subnets improve security
- NAT Gateways provide outbound internet access
- ALB distributes traffic only to healthy targets
- Auto Scaling Groups improve resilience
- Terraform simplifies repeatable deployments
- Docker standardizes application deployment