AWS is the world's largest cloud platform — offering over 200 services across compute, storage, networking, databases, security, and AI. The flexibility that makes AWS powerful also makes it easy to build insecure, over-engineered, or unnecessarily expensive infrastructure if you do not follow a deliberate design process.
This guide walks through building a production-ready AWS environment using the Well-Architected Framework principles — security, reliability, performance efficiency, cost optimization, and operational excellence built in from day one, not bolted on afterwards.
1 Account Setup & Root Security
The AWS root account has unrestricted access to every resource and service in your account — including the ability to close the account, change billing details, and bypass all IAM policies. It must be treated like the most sensitive credential in your organization.
Root Account Hardening — Non-Negotiable Steps
- Enable MFA on root account immediately — use a hardware MFA key (YubiKey) or TOTP authenticator app, not SMS
- Create a strong root password (32+ characters) — store in a dedicated password vault, not a personal manager
- Delete root access keys if they exist — sign in as root, open Security credentials from the account menu, and delete any access keys listed there
- Enable AWS Organizations — even for single accounts. Creates a management boundary and enables Service Control Policies (SCPs)
- Create a separate admin identity for daily operations (AWS now recommends IAM Identity Center users for people; an IAM user also works) — never use root for routine tasks
- Set billing alerts: go to Billing → Budgets → Create Budget — alert at 50%, 80%, and 100% of monthly budget
- Enable AWS CloudTrail with a multi-region trail — records management API calls made in your account for audit and incident response (data events, such as S3 object reads, must be enabled separately)
- Enable AWS Config — tracks resource configuration changes and compliance over time
⚠️ Warning: Leaked AWS access keys are one of the most common causes of unexpected AWS bills, sometimes reaching tens of thousands of dollars within hours from cryptocurrency mining attacks. Never commit AWS access keys to Git repositories, hardcode them in application code, or share them in chat messages. Use IAM roles for EC2 instances, Lambda functions, and all AWS services instead of long-lived access keys wherever possible.
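As a last line of defence against committing keys, a check like the one below can run before each commit. It is a minimal sketch that relies on one documented fact: access key IDs are 20 characters and start with AKIA (long-term IAM user keys) or ASIA (temporary STS credentials). Secret access keys have no fixed prefix, so this only catches key IDs; dedicated scanners such as git-secrets go much further.

```python
import re
import sys
from pathlib import Path

# 4-character prefix (AKIA = long-term key, ASIA = temporary credentials)
# followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for every access-key-ID-like string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.findall(line):
            hits.append((lineno, match))
    return hits


def main(paths: list[str]) -> int:
    """Scan the given files; a non-zero exit code lets a Git hook block the commit."""
    found = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, key_id in find_access_key_ids(text):
            # Print only a prefix so the scan itself does not leak the full ID.
            print(f"{path}:{lineno}: possible AWS access key ID {key_id[:8]}...")
            found = True
    return 1 if found else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```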
Multi-Account Strategy
| Account | Purpose | Who Has Access |
|---|---|---|
| Management | AWS Organizations root, billing, SCPs only | CFO / IT Director only |
| Security | CloudTrail logs, GuardDuty, Security Hub, Config | Security team only |
| Production | Live customer-facing workloads | Senior engineers, read-only for others |
| Staging | Pre-production testing — mirrors production | Engineering team |
| Development | Developer sandboxes — time-limited resources | All developers |
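The account split above is enforced with Service Control Policies attached in AWS Organizations. The sketch below builds one plausible SCP, assuming you want member accounts (Production, Staging, Development) to be unable to switch off CloudTrail or leave the organization. SCPs never restrict the management account itself, and the exact action list is a starting point to review, not a complete guardrail set.

```python
import json


def protect_audit_trail_scp() -> dict:
    """SCP denying CloudTrail tampering and leaving the organization.
    Attach it to the OU that holds workload accounts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCloudTrailTampering",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Emit the JSON document to paste into a new service control policy.
    print(json.dumps(protect_audit_trail_scp(), indent=2))
```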
2 VPC Design & Subnets
The VPC (Virtual Private Cloud) is your private network within AWS — defining IP address ranges, subnets, routing, and network access controls. A well-designed VPC is the foundation of everything that runs inside it.
Recommended VPC CIDR Design
# Production VPC
VPC CIDR: 10.0.0.0/16 (65,536 addresses — room to grow)
# Availability Zone A (Primary)
Public Subnet A: 10.0.1.0/24 (ALB, NAT Gateway, Bastion)
Private Subnet A: 10.0.10.0/24 (EC2 App Servers)
DB Subnet A: 10.0.20.0/24 (RDS, ElastiCache)
# Availability Zone B (Secondary — redundancy)
Public Subnet B: 10.0.2.0/24
Private Subnet B: 10.0.11.0/24
DB Subnet B: 10.0.21.0/24
# Availability Zone C (Tertiary — optional)
Public Subnet C: 10.0.3.0/24
Private Subnet C: 10.0.12.0/24
DB Subnet C: 10.0.22.0/24
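Before creating anything, the CIDR plan above can be checked with Python's standard ipaddress module: every subnet must sit inside the VPC range and no two may overlap. The sketch also reports usable addresses, using the documented fact that AWS reserves 5 addresses in every subnet, so each /24 yields 251 usable IPs rather than 256. The subnet names are illustrative labels.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")

# Subnet plan from the layout above (AZ C included); names are illustrative.
SUBNETS = {
    "public-a": "10.0.1.0/24", "private-a": "10.0.10.0/24", "db-a": "10.0.20.0/24",
    "public-b": "10.0.2.0/24", "private-b": "10.0.11.0/24", "db-b": "10.0.21.0/24",
    "public-c": "10.0.3.0/24", "private-c": "10.0.12.0/24", "db-c": "10.0.22.0/24",
}

# Network address, VPC router, DNS, one reserved for future use, broadcast.
AWS_RESERVED_PER_SUBNET = 5


def validate_plan(vpc, subnets):
    """Raise ValueError on any out-of-range or overlapping subnet;
    otherwise return usable IP counts per subnet."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} {net} is outside {vpc}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"{a} {net_a} overlaps {b} {net_b}")
    return {name: net.num_addresses - AWS_RESERVED_PER_SUBNET for name, net in nets.items()}


if __name__ == "__main__":
    for name, usable in validate_plan(VPC, SUBNETS).items():
        print(f"{name:10} {SUBNETS[name]:15} {usable} usable IPs")
```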
✅ Pro Tip: Always create subnets in at least 2 Availability Zones, even if you only use one AZ today. ALBs and RDS DB subnet groups require subnets in at least two AZs anyway, and retrofitting a second AZ later means re-planning CIDRs and reconfiguring load balancers, DB subnet groups, and Auto Scaling Groups. Empty subnets cost nothing; they only reserve IP address space.
Internet Gateway, NAT Gateway & Routing
# Public Subnet Route Table
Destination Target
0.0.0.0/0 → igw-xxxxxxxx (Internet Gateway — direct internet access)
10.0.0.0/16 → local (VPC-internal routing)
# Private Subnet Route Table
Destination Target
0.0.0.0/0 → nat-xxxxxxxx (NAT Gateway — outbound only, no inbound)
10.0.0.0/16 → local
# DB Subnet Route Table
Destination Target
10.0.0.0/16 → local (No internet access — DB subnets are fully isolated)
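VPC route tables pick the most specific matching route (longest prefix match), which is why traffic to 10.0.x.x stays local even though 0.0.0.0/0 also matches it. A small simulation of the three tables above, with the gateway IDs kept as the same placeholders:

```python
import ipaddress
from typing import Optional

# Route tables from the layout above; gateway IDs are placeholders.
ROUTE_TABLES = {
    "public": {"0.0.0.0/0": "igw-xxxxxxxx", "10.0.0.0/16": "local"},
    "private": {"0.0.0.0/0": "nat-xxxxxxxx", "10.0.0.0/16": "local"},
    "db": {"10.0.0.0/16": "local"},
}


def next_hop(table: dict, destination: str) -> Optional[str]:
    """Return the target of the longest-prefix route containing the destination,
    or None when nothing matches (the packet is dropped)."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

For example, `next_hop(ROUTE_TABLES["db"], "8.8.8.8")` finds no route, which is exactly the isolation the DB subnets rely on.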
⚠️ Warning: In us-east-1, NAT Gateways cost approximately $0.045/hour ($32.40/month) plus $0.045 per GB of processed data (other regions, including India, are priced differently) — they are one of the most overlooked AWS cost items. For development environments, consider using a NAT Instance (EC2 t3.nano, ~$3.80/month in us-east-1) instead. For production, NAT Gateway is recommended for reliability; run one per AZ so an AZ outage does not cut off outbound access for the others, and monitor data transfer costs carefully.
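The NAT Gateway figures above are simple arithmetic, and it helps to see how quickly they grow with one gateway per AZ. A minimal estimator, assuming the us-east-1 rates quoted above as defaults and a 30-day (720-hour) month; substitute your own region's rates.

```python
HOURS_PER_MONTH = 720  # 30-day month, which is how $0.045/hr becomes $32.40


def nat_gateway_monthly_cost(gb_processed: float, gateways: int = 1,
                             hourly_rate: float = 0.045,
                             per_gb_rate: float = 0.045) -> float:
    """Hourly charge for each gateway plus the per-GB processing charge.
    Standard data transfer charges (e.g. out to the internet) are billed on top."""
    return gateways * hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


if __name__ == "__main__":
    # One gateway per AZ across 2 AZs, 500 GB processed in total.
    print(f"${nat_gateway_monthly_cost(500, gateways=2):.2f}/month")
```

With two gateways and 500 GB processed, this prints $87.30/month, before any data transfer charges.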
Security Groups vs NACLs
- Security Groups: Stateful firewall at the network interface (instance) level — preferred method. Allow rules only, deny is implicit. Changes apply instantly without disrupting existing connections
- Network ACLs: Stateless firewall at the subnet level — both allow and deny rules, evaluated in ascending rule-number order until the first match. Because they are stateless, you need explicit inbound and outbound rules, including the ephemeral port range (1024–65535) for return traffic
- Recommendation: Use Security Groups as your primary control — they are more intuitive and cover most use cases. Use NACLs only for broad subnet-level blocks (e.g., denying a known-malicious CIDR range). For country-level blocking, AWS WAF geo match rules are a better fit, since a NACL holds only 20 rules per direction by default
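The NACL behaviour described above (lowest rule number first, first match wins, implicit deny at the end) can be simulated in a few lines. This is a simplified sketch: it ignores protocol, and the example rule set, including the 203.0.113.0/24 documentation range standing in for a malicious network, is hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int     # evaluated from lowest to highest
    cidr: str
    port_from: int
    port_to: int
    action: str     # "allow" or "deny"


def evaluate(rules, source_ip: str, port: int) -> str:
    """Return the action of the first matching rule, or the implicit
    final deny (the '*' rule) when nothing matches."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if src in ipaddress.ip_network(rule.cidr) and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"


# Hypothetical inbound NACL for a public subnet: block one bad range first,
# then allow HTTPS and the ephemeral ports that return traffic needs.
INBOUND = [
    NaclRule(90, "203.0.113.0/24", 0, 65535, "deny"),
    NaclRule(100, "0.0.0.0/0", 443, 443, "allow"),
    NaclRule(110, "0.0.0.0/0", 1024, 65535, "allow"),
]
```

Because rule 90 is evaluated before rule 100, HTTPS from the blocked range is denied even though a later rule allows port 443 from anywhere.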
3 IAM Roles & Least Privilege
AWS Identity and Access Management (IAM) controls who can do what to which resources. Every access to AWS — human or machine — should follow the principle of least privilege: grant only the minimum permissions required to perform the task.
IAM Best Practices
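- No long-lived access keys: Use IAM roles for EC2, Lambda, ECS tasks — EC2 instances receive temporary credentials through the Instance Metadata Service (enforce IMDSv2), while Lambda functions and ECS tasks receive them through their execution environments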
- Groups, not individual users: Attach policies to IAM Groups (Developers, DevOps, ReadOnly) and add users to groups — never attach policies directly to individual users
- MFA enforcement: Attach a policy to human-user groups that denies all actions except MFA self-management unless the request was authenticated with MFA (condition key aws:MultiFactorAuthPresent)
- Permission boundaries: Set maximum permission boundaries on developer IAM roles to prevent privilege escalation
- Access Analyzer: Enable IAM Access Analyzer to identify resources shared externally and unused permissions
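The MFA enforcement item above is usually implemented as a single Deny statement with NotAction, modelled on the pattern in the AWS IAM documentation. The sketch below builds that statement, assuming it is attached to IAM groups for human users alongside their normal Allow policies. The action list should be checked against current AWS docs, and applying it as an SCP needs care because it would also evaluate role sessions used by services.

```python
import json

# Actions a user still needs in order to enrol MFA before the deny lifts.
MFA_SELF_SERVICE_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ListVirtualMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def deny_without_mfa_statement() -> dict:
    return {
        "Sid": "DenyAllExceptListedIfNoMFA",
        "Effect": "Deny",
        "NotAction": MFA_SELF_SERVICE_ACTIONS,
        "Resource": "*",
        # BoolIfExists also matches when the key is absent, e.g. requests signed
        # with long-term access keys, so those are denied until MFA is used
        # (typically via sts:GetSessionToken with an MFA code).
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }


def policy_document() -> dict:
    return {"Version": "2012-10-17", "Statement": [deny_without_mfa_statement()]}


if __name__ == "__main__":
    print(json.dumps(policy_document(), indent=2))
```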
EC2 Instance Role Example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3AppBucketOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-app-bucket-prod/*"
    },
    {
      "Sid": "AllowSSMParameterStore",
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters"
      ],
      "Resource": "arn:aws:ssm:ap-south-1:123456789012:parameter/myapp/prod/*"
    }
  ]
}
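A quick way to keep role policies like the one above honest is to flag wildcards before they are deployed. This is a rough, hypothetical smoke test, not a substitute for IAM Access Analyzer policy validation: it only inspects Allow statements, ignores NotAction/NotResource, and will also flag the few actions (such as some Describe calls) that legitimately require Resource "*".

```python
import json
import sys


def find_wildcards(policy: dict) -> list[str]:
    """Flag Allow statements whose Action is '*' or service-wide (e.g. 's3:*')
    or whose Resource is a bare '*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # broad Deny statements are normal
        sid = stmt.get("Sid", f"statement[{i}]")
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: broad action {action}")
        for resource in resources:
            if resource == "*":
                findings.append(f"{sid}: unrestricted resource")
    return findings


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        for finding in find_wildcards(json.load(f)):
            print("WARN:", finding)
```

Run against the EC2 role policy above, it reports nothing, because every action and resource is scoped.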
✅ Pro Tip: Use AWS Systems Manager Parameter Store or AWS Secrets Manager to store database credentials, API keys, and connection strings — never hardcode them in application code or environment variables on EC2. The IAM role on your EC2 instance gives it permission to fetch secrets at runtime. This means a leaked AMI or Git commit never exposes production credentials.
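Fetching a secret at runtime looks roughly like this with boto3, whose SSM client exposes get_parameter and returns the value under response["Parameter"]["Value"]. The parameter name /myapp/prod/db-password is hypothetical but falls under the /myapp/prod/* path the role policy above allows. If the parameter uses a customer managed KMS key, the role also needs kms:Decrypt on that key.

```python
def get_secret_parameter(ssm_client, name: str) -> str:
    """Fetch a SecureString from Parameter Store. WithDecryption=True asks
    SSM to decrypt it with KMS before returning the value."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


if __name__ == "__main__":
    import boto3  # uses the EC2 instance role's temporary credentials automatically

    ssm = boto3.client("ssm", region_name="ap-south-1")
    # Hypothetical parameter name; hand the value to your DB driver, never log it.
    db_password = get_secret_parameter(ssm, "/myapp/prod/db-password")
```

Passing the client in as an argument keeps the function testable with a stub and avoids any credentials in code.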
4 EC2 Instances & AMI Hardening
EC2 instances are the compute backbone of most AWS architectures. Selecting the right instance type, hardening the OS, and managing access securely are critical for both security and cost efficiency.
Instance Type Selection
| Family | Use Case | Example Types | Approx. On-Demand Price (India region) |
|---|---|---|---|
| t3/t4g | Dev, low-traffic web, burst workloads | t3.micro, t4g.small | $0.009–$0.042/hr |
| m6i/m7g | General-purpose production apps | m6i.large, m7g.xlarge | $0.077–$0.308/hr |
| c6i/c7g | CPU-intensive — web serving, encoding | c6i.large, c7g.xlarge | $0.068–$0.272/hr |
| r6i/r7g | Memory-intensive — caching, in-memory DB | r6i.large, r7g.xlarge | $0.101–$0.404/hr |
| Graviton (g suffix) | Any workload with ARM64 builds, often with better price performance | t4g, m7g, c7g | About 20% less than comparable x86 |
EC2 Security Hardening Checklist
- Disable root SSH login: set PermitRootLogin no in /etc/ssh/sshd_config
- Disable password SSH authentication (key pairs only): set PasswordAuthentication no, then restart sshd to apply both changes
- Change SSH port from 22 to a non-standard port in both the Security Group and sshd_config (this cuts automated scan noise but is not a security control on its own)
- Enable automatic security updates (Ubuntu/Debian; on Amazon Linux 2023 use dnf-automatic instead): sudo apt install unattended-upgrades && sudo dpkg-reconfigure unattended-upgrades
- Install and enable UFW or firewalld as host-based firewall in addition to Security Groups
- Enable AWS Systems Manager Session Manager — replace SSH with SSM for all admin access, no inbound ports needed
- Enable CloudWatch Agent — collect system metrics (memory, disk) and application logs
- Enable Amazon Inspector — automated vulnerability scanning of EC2 instances
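The two sshd settings in the checklist are easy to get subtly wrong, because sshd uses the first value it reads for a keyword. A later "PasswordAuthentication no" does not override an earlier "yes". The sketch below audits the global section with that rule in mind. It does not follow Include directives, which is a simplifying assumption. On a live host, sshd -T (which prints the effective configuration) remains the authoritative check.

```python
import re
from pathlib import Path

# Settings the checklist requires, keywords lower-cased for comparison.
REQUIRED = {"permitrootlogin": "no", "passwordauthentication": "no"}


def effective_settings(config_text: str) -> dict[str, str]:
    """Approximate sshd's rules for the global section: keywords are
    case-insensitive, the FIRST value seen wins, and anything after a
    'Match' line applies only conditionally, so parsing stops there."""
    settings: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # sshd accepts both "Keyword value" and "Keyword=value".
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        keyword = parts[0].lower()
        if keyword == "match":
            break
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)
    return settings


def audit(config_text: str) -> list[str]:
    """List every required setting whose effective value is wrong or unset."""
    settings = effective_settings(config_text)
    problems = []
    for keyword, wanted in REQUIRED.items():
        actual = settings.get(keyword)
        if actual != wanted:
            problems.append(f"{keyword} is {actual or 'unset (default)'}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for problem in audit(Path("/etc/ssh/sshd_config").read_text()):
        print("WARN:", problem)
```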
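✅ Pro Tip: Use AWS Systems Manager Session Manager instead of SSH for all EC2 administrative access. SSM requires zero inbound security group rules, so port 22 need not be open at all. The instance only needs the SSM Agent, an instance role with the AmazonSSMManagedInstanceCore managed policy, and outbound HTTPS to the Systems Manager endpoints (through the NAT Gateway or VPC interface endpoints). Session starts are recorded in CloudTrail, session activity can be logged to S3 or CloudWatch Logs for audit, and access is controlled entirely by IAM. This eliminates one of the largest attack surfaces on most EC2 deployments.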
5 RDS Database Setup
Amazon RDS provides managed relational databases — handling backups, patching, replication, and failover automatically. Proper configuration ensures high availability, security, and performance.
RDS Production Configuration
- Multi-AZ deployment: Always enable for production — automatic failover to standby in 60–120 seconds with no data loss
- Private subnets only: RDS instances must be in DB subnets with no route to internet gateway — never assign a public IP
- Encryption at rest: Enable KMS encryption at creation — it cannot be turned on for an existing unencrypted instance; the only path is to copy a snapshot with encryption enabled and restore a new instance from it
- Encryption in transit: Force TLS connections — set rds.force_ssl=1 for PostgreSQL, or require_secure_transport=1 (ON) for MySQL, in the instance's custom parameter group
- Automated backups: Set retention to 7 days minimum, 35 days for compliance-sensitive data
- Parameter groups: Create custom parameter groups and attach them at launch — default parameter groups cannot be modified, and switching an instance to a different group later requires a reboot
- Read replicas: Create read replicas for heavy reporting/analytics queries — offload from primary
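The TLS setting above differs by engine, and both are applied through the custom parameter group. The sketch below uses boto3's RDS modify_db_parameter_group. It assumes the engine family names "postgres" and "mysql" and hypothetical parameter group names. ApplyMethod "pending-reboot" is valid for static and dynamic parameters alike, so it is the safe default here.

```python
# Parameter that forces TLS, by engine family (value "1" means ON).
TLS_PARAMETER = {
    "postgres": ("rds.force_ssl", "1"),
    "mysql": ("require_secure_transport", "1"),
}


def force_tls(rds_client, parameter_group: str, engine: str) -> dict:
    """Set the engine's TLS-enforcement parameter on a custom parameter group.
    The change takes effect after the next reboot of attached instances."""
    name, value = TLS_PARAMETER[engine]
    return rds_client.modify_db_parameter_group(
        DBParameterGroupName=parameter_group,
        Parameters=[
            {"ParameterName": name, "ParameterValue": value, "ApplyMethod": "pending-reboot"}
        ],
    )


if __name__ == "__main__":
    import boto3

    rds = boto3.client("rds")
    # Hypothetical custom parameter group name.
    force_tls(rds, "myapp-prod-postgres", "postgres")
```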
RDS Security Group Configuration
# RDS Security Group — Allow ONLY from App Server Security Group
Inbound Rules:
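# Open only the port for the engine you actually run
Type: MySQL/Aurora (3306)
Source: sg-app-servers ← Security Group of EC2 app instances
Description: App-to-DB only
Type: PostgreSQL (5432)
Source: sg-app-servers
Description: App-to-DB only
Outbound Rules:
None (delete the default allow-all rule; security groups have no deny rules, and replies to allowed inbound traffic still flow because security groups are stateful)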
# NEVER add:
Source: 0.0.0.0/0 ← Public access
Source: 10.0.0.0/8 ← Broad internal access
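The "never add" rules above can be checked automatically. This sketch works on a simplified, hypothetical rule format ({"port": ..., "source": ...}); real describe_security_groups output nests sources under IpRanges and UserIdGroupPairs and would need flattening first. The /16 cut-off for "broad" is an assumption and is IPv4-oriented.

```python
import ipaddress

DB_PORTS = {3306, 5432}  # MySQL/Aurora and PostgreSQL
BROAD_PREFIX = 16        # assumption: any IPv4 range wider than /16 counts as broad


def audit_db_ingress(rules: list[dict]) -> list[str]:
    """Flag database-port ingress rules whose source is the whole internet
    or a very broad CIDR. Security-group sources ('sg-...') pass."""
    findings = []
    for rule in rules:
        port, source = rule["port"], rule["source"]
        if port not in DB_PORTS or source.startswith("sg-"):
            continue
        net = ipaddress.ip_network(source, strict=False)
        if net.prefixlen == 0:
            findings.append(f"port {port} open to the internet ({source})")
        elif net.prefixlen < BROAD_PREFIX:
            findings.append(f"port {port} open to broad range {source}")
    return findings
```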
⚠️ Warning: Never enable "Publicly Accessible" on an RDS instance — even temporarily for debugging. Once enabled, the instance's DNS endpoint resolves to a public IP address, and if the subnet routes to an internet gateway and the security group allows it, the database port is reachable from the internet. Even with a strong password, this exposes your database to brute-force, credential stuffing, and zero-day exploit attempts 24/7. Use an SSH tunnel through a bastion or SSM port forwarding for remote database access instead.
6 S3 Buckets & Storage
Amazon S3 is the most versatile AWS storage service — used for static assets, application data, backups, log archives, and static website hosting. S3 misconfigurations have caused some of the largest data breaches in cloud history.
S3 Security Non-Negotiables
- Block All Public Access: Enable at the account level — go to S3 → Block Public Access settings for this account → Block all. This prevents any bucket from becoming public even if misconfigured
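- Bucket versioning: Enable on all buckets containing important data — protects against accidental overwrites and deletion. For ransomware protection, pair it with Object Lock or MFA Delete, since anyone with delete permissions can still remove old versions
- Server-Side Encryption: S3 has applied SSE-S3 to all new objects by default since January 2023; set SSE-KMS as the bucket default where you need key-level access control and an audit trail of key use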
- S3 Access Logs: Enable access logging on sensitive buckets — logs who accessed what and when
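- Object Lock: Enable for compliance/backup buckets — in compliance mode, no user (including the root user) can delete or overwrite a locked object version until its retention period ends; governance mode can be bypassed by users with special permissions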
- Lifecycle policies: Automatically transition infrequently accessed data to S3 Glacier to reduce costs
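Three of the non-negotiables above (account-level Block Public Access, versioning, and default SSE-KMS encryption) map directly onto boto3 calls: s3control.put_public_access_block, s3.put_bucket_versioning, and s3.put_bucket_encryption. A minimal sketch, assuming a hypothetical account ID and bucket name. Omitting KMSMasterKeyID makes S3 use the AWS managed aws/s3 key.

```python
def harden_account_and_bucket(s3control, s3, account_id: str, bucket: str) -> None:
    # Account-level Block Public Access covers every bucket, current and future.
    s3control.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    # Versioning keeps prior object versions after overwrites and deletes.
    s3.put_bucket_versioning(Bucket=bucket, VersioningConfiguration={"Status": "Enabled"})
    # Default SSE-KMS; S3 Bucket Keys reduce the number of KMS requests (and cost).
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"},
                    "BucketKeyEnabled": True,
                }
            ]
        },
    )


if __name__ == "__main__":
    import boto3

    # Hypothetical account ID and bucket name.
    harden_account_and_bucket(
        boto3.client("s3control"), boto3.client("s3"), "123456789012", "my-app-bucket-prod"
    )
```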
S3 Lifecycle Policy Example
{
  "Rules": [
    {
      "ID": "LogArchiveLifecycle",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {"Days": 2555}
    }
  ]
}
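To see what the LogArchiveLifecycle rule above does to an object over its life, the sketch below maps object age (days since creation) to the storage class the rule targets. It assumes objects are uploaded as STANDARD. S3 runs lifecycle actions asynchronously, so real transitions can lag these thresholds slightly.

```python
from typing import Optional

# Values copied from the LogArchiveLifecycle rule above.
PREFIX = "logs/"
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # about 7 years


def storage_class_at(key: str, age_days: int) -> Optional[str]:
    """Storage class targeted by the rule for an object of this age,
    or None once the object is due to expire."""
    if not key.startswith(PREFIX):
        return "STANDARD"  # the prefix filter does not match; rule never applies
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for days, storage_class in sorted(TRANSITIONS):
        if age_days >= days:
            current = storage_class
    return current
```

A log written today stays in STANDARD for a month, sits in GLACIER from day 90 to day 364, and is deleted on day 2555. Objects outside logs/ are never touched.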
✅ Pro Tip: Use S3 Intelligent-Tiering for data with unpredictable access patterns — it automatically moves objects between frequent, infrequent, and archive instant access tiers (plus optional archive tiers you can opt into) based on actual access, with no retrieval fees but a small monthly monitoring charge per object. For log archives and backups with predictable access patterns (rarely accessed after 30 days), use explicit lifecycle rules with STANDARD_IA → GLACIER transitions for maximum cost savings.
7 Load Balancer & Auto Scaling
Application Load Balancers (ALB) and Auto Scaling Groups (ASG) together provide the horizontal scalability and high availability that makes cloud infrastructure fundamentally different from on-premises servers.
Application Load Balancer Setup
- Create ALB in public subnets across all AZs — set scheme to "internet-facing"
- Create HTTPS listener on port 443 — attach SSL certificate from AWS Certificate Manager (free)
- Create HTTP listener on port 80 — redirect to HTTPS with 301 permanent redirect
- Create Target Group pointing to EC2 instances in private subnets
- Configure health check path: /health — application must return HTTP 200
- Enable access logs to S3 — essential for debugging and security analysis
- Enable AWS WAF on ALB — AWS managed rule groups (such as the core rule set and the SQL database rule set) help block common OWASP Top 10 attack patterns
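The health check in the steps above only needs an endpoint that returns HTTP 200 quickly. A standard-library sketch, assuming port 8080 (hypothetical; it must match the target group port) and a /health path that reports only process liveness. Keep the check cheap: if it depends on a shared dependency such as the database, one blip can mark every target unhealthy at once. The probe helper mimics what the ALB health checker sees and is handy for smoke tests.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers /health with 200 and every other path with 404."""

    def do_GET(self):
        if self.path == "/health":
            body, status = b"ok", 200
        else:
            body, status = b"not found", 404
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health-check noise out of the logs


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), HealthHandler)


def start_background(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Run the server in a daemon thread; port 0 picks a free port."""
    server = make_server(host, port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    make_server().serve_forever()
```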
Auto Scaling Group Configuration
# Launch Template
Instance Type: m6i.large (or Graviton: m7g.large)
AMI: Custom hardened AMI (not latest Amazon Linux)
Key Pair: None (use SSM Session Manager)
Security Group: sg-app-servers
IAM Role: ec2-app-role
User Data: Bootstrap script to configure application
# Auto Scaling Group
Min Instances: 2 (always maintain 2 for HA)
Desired Instances: 2
Max Instances: 10 (scale up to 10 under load)
Health Check: ELB (use ALB health check, not EC2 status)
Cooldown: 300s (wait 5 min between scaling events)
# Scaling Policies
Scale Out: Add 1 instance when CPU > 70% for 2 consecutive minutes (1-minute datapoints require EC2 detailed monitoring)
Scale In: Remove 1 instance when CPU < 30% for 10 consecutive minutes
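The scaling rules above combine a threshold, a number of consecutive datapoints, and the group's min/max bounds. The simulation below makes that interaction concrete. It is a simplified model, not how the Auto Scaling service is implemented: it assumes one CPU datapoint per minute and ignores cooldowns and instance warm-up.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int = 2
    max_size: int = 10
    scale_out_cpu: float = 70.0
    scale_out_periods: int = 2    # consecutive 1-minute datapoints
    scale_in_cpu: float = 30.0
    scale_in_periods: int = 10


def desired_capacity(current: int, cpu_per_minute: list[float], p: ScalingPolicy) -> int:
    """Apply the scale-out / scale-in rules to the most recent CPU datapoints
    and clamp the result to the group's min and max size."""
    recent_out = cpu_per_minute[-p.scale_out_periods:]
    recent_in = cpu_per_minute[-p.scale_in_periods:]
    if len(recent_out) == p.scale_out_periods and all(c > p.scale_out_cpu for c in recent_out):
        target = current + 1
    elif len(recent_in) == p.scale_in_periods and all(c < p.scale_in_cpu for c in recent_in):
        target = current - 1
    else:
        target = current
    return max(p.min_size, min(p.max_size, target))
```

The clamp is what makes the maximum a cost control: at 10 instances, sustained 95% CPU still yields a desired capacity of 10.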
⚠️ Warning: Set your Auto Scaling maximum carefully — an aggressive scaling policy combined with a traffic spike or DDoS attack can generate enormous AWS bills in hours. Set a conservative maximum and configure AWS Budgets alerts, optionally with budget actions that apply a restrictive IAM policy or SCP, or stop selected EC2 and RDS instances, when costs exceed thresholds. Always test your scaling behavior in a staging environment before applying to production.
8 Monitoring & Cost Optimization
AWS provides exceptional visibility into infrastructure health and costs — but only if you configure it. Default monitoring covers the basics; production environments need custom metrics, composite alarms, and cost allocation tags from day one.
CloudWatch Essential Alarms
- EC2 CPU Utilization: Alert at >85% sustained for 5 minutes — indicates under-provisioning
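- RDS Free Storage Space: Alert when below 20% of allocated storage (the FreeStorageSpace metric is reported in bytes, so convert the percentage) — an instance that runs out of storage enters the critical storage-full state and typically stops working until storage is added; consider enabling RDS storage autoscaling as well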
- RDS CPU: Alert at >80% — indicates query optimization or instance upgrade needed
- ALB 5xx Error Rate: Alert when >1% of responses are 5xx — indicates application errors
- ALB Target Response Time: Alert when P99 latency >2 seconds — user experience degradation
- Billing: Alert at 50%, 80%, 100% of monthly budget — catch runaway costs early
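Two of the alarms above need a little translation before CloudWatch can use them. The 5xx alarm is a rate, so it uses metric math over HTTPCode_Target_5XX_Count and RequestCount in the AWS/ApplicationELB namespace. The storage alarm needs a byte threshold. The sketch builds keyword arguments for boto3's cloudwatch.put_metric_alarm. The load balancer dimension value and SNS topic ARN are hypothetical, and this tracks only target-generated 5xx errors (load-balancer-generated errors are in HTTPCode_ELB_5XX_Count).

```python
def alb_5xx_rate_alarm(load_balancer: str, sns_topic_arn: str, threshold_pct: float = 1.0) -> dict:
    """put_metric_alarm kwargs: alarm when target 5xx responses exceed
    threshold_pct percent of requests in a 5-minute period."""
    def alb_metric(metric_id: str, name: str) -> dict:
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": name,
                    "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the expression below is alarmed on
        }

    return {
        "AlarmName": "alb-5xx-rate",
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "Threshold": threshold_pct,
        "TreatMissingData": "notBreaching",  # periods with no data do not alarm
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            alb_metric("errors", "HTTPCode_Target_5XX_Count"),
            alb_metric("requests", "RequestCount"),
            {"Id": "error_rate", "Expression": "100 * errors / requests",
             "Label": "Target 5xx %", "ReturnData": True},
        ],
    }


def rds_free_storage_threshold_bytes(allocated_gib: int, fraction: float = 0.20) -> float:
    """FreeStorageSpace is reported in bytes; convert '20% free' into bytes."""
    return allocated_gib * 1024**3 * fraction


if __name__ == "__main__":
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**alb_5xx_rate_alarm(
        "app/my-alb/50dc6c495c0c9188",                      # hypothetical ALB
        "arn:aws:sns:ap-south-1:123456789012:ops-alerts",    # hypothetical topic
    ))
```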
Cost Optimization Quick Wins
- Reserved Instances / Savings Plans: Commit to 1-year for production workloads — saves 30–40% vs On-Demand pricing
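- Graviton instances: Switch to ARM-based Graviton3 instances (m7g, c7g, r7g) — typically about 20% cheaper per hour than comparable x86 instances, often with better performance; confirm ARM64 support and benchmark your workload first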
- S3 Lifecycle policies: Move old logs and backups to Glacier — typically saves 70–80% vs STANDARD storage class
- Right-sizing: Use AWS Compute Optimizer recommendations — many EC2 instances are significantly over-provisioned
- Delete unattached EBS volumes: Snapshots and unattached volumes silently accumulate costs — audit monthly
- Spot Instances: Use for stateless, fault-tolerant workloads (batch jobs, dev environments) — 70–90% cheaper than On-Demand
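The monthly EBS audit above can start from a report like this one. Unattached volumes have the status "available", which describe_volumes can filter on. The sketch uses boto3's paginator so accounts with many volumes are fully covered. It only reports; snapshot anything you might need before deleting.

```python
def unattached_volumes(ec2_client) -> list[dict]:
    """List EBS volumes in the 'available' state (not attached to any instance)."""
    paginator = ec2_client.get_paginator("describe_volumes")
    found = []
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        for vol in page["Volumes"]:
            found.append({
                "VolumeId": vol["VolumeId"],
                "SizeGiB": vol["Size"],
                "Type": vol["VolumeType"],
            })
    return found


if __name__ == "__main__":
    import boto3

    for vol in unattached_volumes(boto3.client("ec2")):
        print(f"{vol['VolumeId']}: {vol['SizeGiB']} GiB {vol['Type']} (unattached)")
```

Run it per region, since EC2 clients are regional.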
✅ Pro Tip: Enable AWS Cost Anomaly Detection — it uses machine learning to identify unexpected spending patterns and alerts you via email or SNS within hours of a cost anomaly occurring. It can save thousands of dollars by catching accidental resource creation, runaway Auto Scaling, or data transfer spikes before they compound into large bills at month-end.
Need Help Building Your AWS Infrastructure?
EnterWeb IT Firm architects, deploys, and manages production AWS environments — from initial account setup and VPC design to multi-region failover, security hardening, and ongoing cost optimization. AWS and Azure certified engineers on your project.