Day 24 - Highly Available and Scalable Django Application on AWS using Terraform

Today I worked on deploying a highly available and scalable Django application on AWS using Terraform. The goal of this project was to understand how production-style AWS infrastructure is designed across multiple Availability Zones while keeping the application secure, scalable, and resilient.

Instead of deploying a single EC2 instance in a public subnet, this setup used private EC2 instances behind an Application Load Balancer. The infrastructure also included Auto Scaling Groups, NAT Gateways, route tables, security groups, and multi-AZ networking.

Architecture Overview

The infrastructure was deployed inside a custom VPC across two Availability Zones.

Main components used:

  • VPC with public and private subnets
  • Internet Gateway
  • NAT Gateways for outbound internet access
  • Application Load Balancer
  • Private EC2 instances
  • Auto Scaling Group
  • Dockerized Django application
  • Terraform Infrastructure as Code

VPC and Networking Design

The VPC CIDR block used was:

10.0.0.0/16

Two public subnets were created for the ALB and NAT Gateways:

10.0.1.0/24
10.0.2.0/24

Two private subnets were created for EC2 instances:

10.0.11.0/24
10.0.12.0/24

This design allowed the ALB to receive public traffic while keeping application servers private.
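
In Terraform, this layout looks roughly like the sketch below (resource names are illustrative, not the exact ones from the project code):

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = { Name = "day24-vpc" }
}

# Public subnets host the ALB and the NAT Gateways, one per AZ
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = ["10.0.1.0/24", "10.0.2.0/24"][count.index]
  availability_zone       = ["us-east-1a", "us-east-1b"][count.index]
  map_public_ip_on_launch = true
}

# Private subnets host the application instances
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = ["10.0.11.0/24", "10.0.12.0/24"][count.index]
  availability_zone = ["us-east-1a", "us-east-1b"][count.index]
}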


Security Design

The Application Load Balancer security group allowed inbound HTTP traffic from the internet.

The EC2 security group only allowed traffic from the ALB security group on port 8000.

This meant the application instances could not be accessed directly from the internet.
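
A minimal sketch of the two security groups, assuming illustrative resource names:

# ALB security group: HTTP from anywhere
resource "aws_security_group" "alb" {
  name   = "day24-alb-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# EC2 security group: port 8000 allowed only from the ALB security group
resource "aws_security_group" "app" {
  name   = "day24-app-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 8000
    to_port         = 8000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}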

Application Load Balancer

The ALB distributed traffic across EC2 instances running in different Availability Zones.

Health checks were configured so unhealthy instances would automatically stop receiving traffic.
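
Roughly how the ALB, HTTP listener, and target group could be wired together in Terraform; the health check path and thresholds here are assumptions:

resource "aws_lb" "app" {
  name               = "day24-alb"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id   # one public subnet per AZ
}

resource "aws_lb_target_group" "app" {
  name     = "day24-tg"
  port     = 8000
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  # Unhealthy targets are taken out of rotation automatically
  health_check {
    path                = "/"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
  }
}

# Listener on port 80 forwards traffic to the target group
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}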

Auto Scaling Group

The Auto Scaling Group maintained:

  • Minimum instances: 1
  • Desired instances: 2
  • Maximum instances: 5

CPU-based scaling policies allowed the environment to scale out automatically during periods of higher load.
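
A sketch of the Auto Scaling Group and a CPU target-tracking policy; the 50% CPU target is an assumption, while the capacities match the values above:

resource "aws_autoscaling_group" "app" {
  name                = "day24-asg"
  min_size            = 1
  desired_capacity    = 2
  max_size            = 5
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.app.arn]
  health_check_type   = "ELB"

  launch_template {
    id      = aws_launch_template.app.id   # shown in the next section
    version = "$Latest"
  }
}

# Scale out and back in to keep average CPU around 50%
resource "aws_autoscaling_policy" "cpu" {
  name                   = "day24-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.app.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50
  }
}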


Dockerized Django Application

The EC2 instances used a user data script to install Docker and run the Django container automatically during startup.

Docker image used:

itsbaivab/django-app

This kept deployments consistent across instances.
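
A possible launch template for this setup; the AMI lookup, instance type, and exact user data commands are assumptions, but the idea is the same: install Docker and start the container on boot:

data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

resource "aws_launch_template" "app" {
  name_prefix            = "day24-"
  image_id               = data.aws_ami.al2023.id
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.app.id]

  # Install Docker and run the Django container on startup
  user_data = base64encode(<<-EOF
    #!/bin/bash
    dnf install -y docker
    systemctl enable --now docker
    docker run -d --restart unless-stopped -p 8000:8000 itsbaivab/django-app
  EOF
  )
}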

NAT Gateway Usage

Because the EC2 instances were deployed in private subnets, they needed outbound internet access to download packages and Docker images.

NAT Gateways solved this problem while still keeping the instances private.

One NAT Gateway was deployed per Availability Zone for high availability.
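
The per-AZ NAT Gateways, Elastic IPs, and private route tables look roughly like this in Terraform:

# One Elastic IP and NAT Gateway per AZ, placed in the public subnets
resource "aws_eip" "nat" {
  count  = 2
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  count         = 2
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

# Each private subnet routes outbound traffic through the NAT Gateway in its own AZ
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat[count.index].id
  }
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}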

Validation and Testing Steps

1. Validate Terraform Deployment

terraform output

Expected:

load_balancer_dns = "day24-alb-xxxxx.us-east-1.elb.amazonaws.com"
nat_gateway_1_ip = "x.x.x.x"
nat_gateway_2_ip = "x.x.x.x"
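
These values map to Terraform output blocks along these lines (attribute references assume the resource names used in the sketches above):

output "load_balancer_dns" {
  value = aws_lb.app.dns_name
}

output "nat_gateway_1_ip" {
  value = aws_eip.nat[0].public_ip
}

output "nat_gateway_2_ip" {
  value = aws_eip.nat[1].public_ip
}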

2. Test Application Access

Open the ALB using HTTP, not HTTPS:

http://day24-alb-1985674148.us-east-1.elb.amazonaws.com/

Expected:

Django application loads successfully

3. Validate Load Balancer

Go to:

AWS Console → EC2 → Load Balancers → day24-alb

Check:

State = Active
Scheme = internet-facing
Listener = HTTP : 80
Availability Zones = us-east-1a and us-east-1b

4. Validate Target Group Health

Go to:

AWS Console → EC2 → Target Groups → day24-tg → Targets

Expected:

2 targets registered
Health status = Healthy

5. Validate Private EC2 Instances

Go to:

AWS Console → EC2 → Instances

Check:

Instances are running
Instances are in private subnets
Public IPv4 address is blank

6. Validate Auto Scaling Group

Go to:

AWS Console → EC2 → Auto Scaling Groups → day24-asg

Check:

Min capacity = 1
Desired capacity = 2
Max capacity = 5
Health check type = ELB

7. Test High Availability

Before failure test:

Target Group → Targets

Confirm:

2 Healthy targets

Then terminate one EC2 instance manually:

EC2 → Instances → Select one day24 instance → Instance state → Terminate

Expected:

Application should still load through ALB
ALB routes traffic to remaining healthy instance
ASG launches a replacement instance

Refresh:

http://day24-alb-1985674148.us-east-1.elb.amazonaws.com/

Observed during the test:

  • One instance terminated
  • Application still accessible through the ALB
  • ASG launching a replacement instance
  • Target group back to 2 healthy targets

8. Validate Self Healing

After a few minutes, check:

EC2 → Auto Scaling Groups → Activity

Expected:

ASG detected terminated instance
ASG launched replacement instance
Desired capacity restored to 2

Cost Considerations

This project is closer to a production architecture, so NAT Gateway pricing becomes noticeable.

Estimated monthly cost:

  • EC2 instances: ~$17
  • ALB: ~$16
  • NAT Gateways: ~$65
  • Data transfer: ~$5 to $10

Total estimated cost:

~$103 to $108 per month

Resources should be destroyed after testing.


Cleanup

terraform destroy -auto-approve

What I Learned

This project helped me better understand how AWS networking, load balancing, scaling, and security work together in a production-style environment.

The biggest learning was seeing how private EC2 instances can still function correctly through NAT Gateways while remaining protected from direct internet access.

I also learned how Auto Scaling Groups and ALBs work together to improve both scalability and high availability.


Key Takeaways

  • Multi-AZ deployment improves availability
  • Private subnets improve security
  • NAT Gateways provide outbound internet access
  • ALB distributes traffic only to healthy targets
  • Auto Scaling Groups improve resilience
  • Terraform simplifies repeatable deployments
  • Docker standardizes application deployment
