AWS Infrastructure Setup Guide

📋 In This Guide

1. Account Setup & Root Security
2. VPC Design & Subnets
3. IAM Roles & Least Privilege
4. EC2 Instances & AMI Hardening
5. RDS Database Setup
6. S3 Buckets & Storage
7. Load Balancer & Auto Scaling
8. Monitoring & Cost Optimization

AWS is the world's largest cloud platform — offering over 200 services across compute, storage, networking, databases, security, and AI. The flexibility that makes AWS powerful also makes it easy to build insecure, over-engineered, or unnecessarily expensive infrastructure if you do not follow a deliberate design process.

This guide walks through building a production-ready AWS environment using the Well-Architected Framework principles — security, reliability, performance efficiency, cost optimization, and operational excellence built in from day one, not bolted on afterwards.

1 Account Setup & Root Security

The AWS root account has unrestricted access to every resource and service in your account — including the ability to close the account, change billing details, and bypass all IAM policies. It must be treated like the most sensitive credential in your organization.

Root Account Hardening — Non-Negotiable Steps

Enable MFA on root account immediately — use a hardware MFA key (YubiKey) or TOTP authenticator app, not SMS
Create a strong root password (32+ characters) — store in a dedicated password vault, not a personal manager
Delete root access keys if they exist — go to IAM → Security Credentials → Delete Access Keys under root
Enable AWS Organizations, even for a single account: it creates a management boundary and enables Service Control Policies (SCPs)
Create a separate admin IAM user for daily operations — never use root for routine tasks
Set billing alerts: go to Billing → Budgets → Create Budget — alert at 50%, 80%, and 100% of monthly budget
Enable AWS CloudTrail in all regions — logs every API call made in your account for audit and incident response
Enable AWS Config — tracks resource configuration changes and compliance over time
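
The 50% / 80% / 100% billing alert rule above can be sketched in a few lines. This is only an illustration of the threshold logic, not how AWS Budgets is implemented; the function name `crossed_thresholds` and the sample figures are assumptions made for the example.

```python
from decimal import Decimal

# Alert thresholds from the checklist above, as fractions of the monthly budget.
THRESHOLDS = (Decimal("0.50"), Decimal("0.80"), Decimal("1.00"))


def crossed_thresholds(spend, budget):
    """Return the alert thresholds (as whole percentages) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = Decimal(str(spend)) / Decimal(str(budget))
    return [int(t * 100) for t in THRESHOLDS if ratio >= t]


# Hypothetical month: $410 spent against a $500 budget.
print(crossed_thresholds(410, 500))
```

In a real account the same thresholds are entered in Billing → Budgets; the sketch is only useful for reasoning about which alerts should already have fired.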

⚠️ Warning: Leaked AWS access keys are one of the most common causes of unexpected AWS bills, sometimes reaching tens of thousands of dollars within hours when attackers use them for cryptocurrency mining. Never commit AWS access keys to Git repositories, hardcode them in application code, or share them in chat messages. Wherever possible, use IAM roles for EC2 instances, Lambda functions, and other AWS services instead of long-lived access keys.

Multi-Account Strategy

Account	Purpose	Who Has Access
Management	AWS Organizations root, billing, SCPs only	CFO / IT Director only
Security	CloudTrail logs, GuardDuty, Security Hub, Config	Security team only
Production	Live customer-facing workloads	Senior engineers, read-only for others
Staging	Pre-production testing — mirrors production	Engineering team
Development	Developer sandboxes — time-limited resources	All developers

2 VPC Design & Subnets

The VPC (Virtual Private Cloud) is your private network within AWS — defining IP address ranges, subnets, routing, and network access controls. A well-designed VPC is the foundation of everything that runs inside it.

Recommended VPC CIDR Design

# Production VPC
VPC CIDR:     10.0.0.0/16   (65,536 addresses — room to grow)

# Availability Zone A (Primary)
Public  Subnet A:   10.0.1.0/24    (ALB, NAT Gateway, Bastion)
Private Subnet A:   10.0.10.0/24   (EC2 App Servers)
DB      Subnet A:   10.0.20.0/24   (RDS, ElastiCache)

# Availability Zone B (Secondary — redundancy)
Public  Subnet B:   10.0.2.0/24
Private Subnet B:   10.0.11.0/24
DB      Subnet B:   10.0.21.0/24

# Availability Zone C (Tertiary — optional)
Public  Subnet C:   10.0.3.0/24
Private Subnet C:   10.0.12.0/24
DB      Subnet C:   10.0.22.0/24
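
Before creating anything, it can help to check a CIDR plan like the one above for mistakes. The sketch below uses Python's standard `ipaddress` module to confirm that every subnet sits inside the VPC range and that no two subnets overlap. The dictionary keys (`public-a`, `db-b`, and so on) are labels invented for this example, and the usable-address count assumes the five addresses AWS reserves in every subnet.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")

# Subnet plan copied from the design above; the names are illustrative labels.
SUBNETS = {
    "public-a": "10.0.1.0/24", "private-a": "10.0.10.0/24", "db-a": "10.0.20.0/24",
    "public-b": "10.0.2.0/24", "private-b": "10.0.11.0/24", "db-b": "10.0.21.0/24",
    "public-c": "10.0.3.0/24", "private-c": "10.0.12.0/24", "db-c": "10.0.22.0/24",
}


def validate_plan(vpc, subnets):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = [f"{name} is outside {vpc}"
                for name, net in nets.items() if not net.subnet_of(vpc)]
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


def usable_addresses(cidr):
    """Addresses left for your resources after the 5 that AWS reserves per subnet."""
    return ipaddress.ip_network(cidr).num_addresses - 5


print(validate_plan(VPC, SUBNETS))
print(usable_addresses("10.0.1.0/24"))
```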

✅ Pro Tip: Always create subnets in at least 2 Availability Zones — even if you only use one AZ today. Expanding to a second AZ later requires replacing or reconfiguring load balancers and RDS instances, which causes downtime. The cost of running empty subnets in a second AZ is zero — it is just IP address space reservation.

Internet Gateway, NAT Gateway & Routing

# Public Subnet Route Table
Destination     Target
0.0.0.0/0  →   igw-xxxxxxxx    (Internet Gateway — direct internet access)
10.0.0.0/16 →  local           (VPC-internal routing)

# Private Subnet Route Table
Destination     Target
0.0.0.0/0  →   nat-xxxxxxxx    (NAT Gateway — outbound only, no inbound)
10.0.0.0/16 →  local

# DB Subnet Route Table
Destination     Target
10.0.0.0/16 →  local           (No internet access — DB subnets are fully isolated)
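
To see why the DB subnets are isolated, it helps to model how a route table picks a target: the most specific matching route wins, and traffic with no matching route goes nowhere. The following is a simplified sketch of that lookup, not AWS's implementation; the `resolve` function and the dictionary layout are assumptions for illustration, and the gateway IDs are the placeholders from the tables above.

```python
import ipaddress

# Route tables from above, as destination CIDR -> target.
PUBLIC_ROUTES = {"0.0.0.0/0": "igw-xxxxxxxx", "10.0.0.0/16": "local"}
PRIVATE_ROUTES = {"0.0.0.0/0": "nat-xxxxxxxx", "10.0.0.0/16": "local"}
DB_ROUTES = {"10.0.0.0/16": "local"}


def resolve(routes, destination):
    """Return the target of the most specific route matching destination, or None."""
    ip = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches)[1]  # longest prefix wins


print(resolve(PRIVATE_ROUTES, "10.0.20.5"))  # VPC-internal address
print(resolve(PRIVATE_ROUTES, "8.8.8.8"))    # internet address via NAT
print(resolve(DB_ROUTES, "8.8.8.8"))         # DB subnet has no internet route
```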

⚠️ Warning: NAT Gateways cost approximately $0.045/hour ($32.40/month) plus $0.045 per GB of processed data — they are one of the most overlooked AWS cost items. For development environments, consider using a NAT Instance (EC2 t3.nano ~$3.80/month) instead. For production, NAT Gateway is recommended for reliability, but monitor data transfer costs carefully.
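
The NAT Gateway arithmetic above is worth running against your own traffic estimate. This sketch uses the rates quoted in the warning ($0.045/hour and $0.045 per GB) and a 720-hour month, which is what the $32.40 figure implies; actual rates vary by region, and the 500 GB figure is a hypothetical workload.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.045")   # USD per hour, as quoted above
PER_GB_RATE = Decimal("0.045")   # USD per GB processed, as quoted above
HOURS_PER_MONTH = 720            # 30-day month, matching the $32.40 figure


def nat_gateway_monthly_cost(gb_processed):
    """Estimated monthly cost of one NAT Gateway for a given data volume."""
    fixed = HOURLY_RATE * HOURS_PER_MONTH
    variable = PER_GB_RATE * Decimal(str(gb_processed))
    return fixed + variable


print(nat_gateway_monthly_cost(0))    # idle gateway
print(nat_gateway_monthly_cost(500))  # hypothetical 500 GB/month
```

With one gateway per Availability Zone, multiply the fixed part by the number of AZs.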

Security Groups vs NACLs

Security Groups: Stateful firewall at the instance level — preferred method. Allow rules only, deny is implicit. Changes apply instantly without disrupting existing connections
Network ACLs: Stateless firewall at the subnet level — both allow and deny rules, evaluated in order. Requires explicit rules for both inbound and outbound (stateless)
Recommendation: Use Security Groups as your primary control — they are more intuitive and cover 95% of use cases. Use NACLs only for broad subnet-level blocks (e.g., blocking an entire country IP range)

3 IAM Roles & Least Privilege

AWS Identity and Access Management (IAM) controls who can do what to which resources. Every access to AWS — human or machine — should follow the principle of least privilege: grant only the minimum permissions required to perform the task.

IAM Best Practices

No long-lived access keys: Use IAM roles for EC2, Lambda, ECS tasks — the instance gets temporary credentials automatically via Instance Metadata Service
Groups, not individual users: Attach policies to IAM Groups (Developers, DevOps, ReadOnly) and add users to groups — never attach policies directly to individual users
MFA enforcement: Create an SCP or IAM policy that denies all actions except MFA management if MFA is not enrolled
Permission boundaries: Set maximum permission boundaries on developer IAM roles to prevent privilege escalation
Access Analyzer: Enable IAM Access Analyzer to identify resources shared externally and unused permissions
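
The MFA enforcement item above is usually expressed as a single Deny statement. The sketch below builds one such policy as a Python dictionary and prints it as JSON. It follows a commonly used pattern (Deny with `NotAction` and the `aws:MultiFactorAuthPresent` condition key), but the exact list of exempted actions is an assumption that you should review against how your users enroll MFA devices.

```python
import json

# Actions a user still needs before MFA is enrolled (assumed list; adjust as needed).
MFA_SELF_SERVICE_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:GetUser",
    "iam:ListMFADevices",
    "iam:ListVirtualMFADevices",
    "iam:ResyncMFADevice",
    "sts:GetSessionToken",
]


def build_require_mfa_policy():
    """Deny everything except MFA self-service when the request was made without MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptMfaSetupWithoutMfa",
                "Effect": "Deny",
                "NotAction": MFA_SELF_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


print(json.dumps(build_require_mfa_policy(), indent=2))
```

Attach a policy like this to a group rather than to individual users, in line with the groups recommendation above, and test it with a non-admin user before rolling it out.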

EC2 Instance Role Example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3AppBucketOnly",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-app-bucket-prod/*"
    },
    {
      "Sid": "AllowSSMParameterStore",
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters"
      ],
      "Resource": "arn:aws:ssm:ap-south-1:123456789012:parameter/myapp/prod/*"
    }
  ]
}
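
A quick way to keep role policies like this one least-privilege is to scan them for wildcards before they are deployed. The checker below is a rough sketch, not a substitute for IAM Access Analyzer: it only flags Allow statements whose action is `*` or `service:*`, or whose resource is `*`. The function names and the sample policies are invented for the example.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def find_wildcards(policy):
    """Return human-readable findings for overly broad Allow statements."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        sid = stmt.get("Sid", "<no Sid>")
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"{sid}: wildcard resource")
    return findings


scoped = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowS3AppBucketOnly",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::my-app-bucket-prod/*",
    }],
}
too_broad = {
    "Version": "2012-10-17",
    "Statement": {"Sid": "Everything", "Effect": "Allow",
                  "Action": "s3:*", "Resource": "*"},
}

print(find_wildcards(scoped))
print(find_wildcards(too_broad))
```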

✅ Pro Tip: Use AWS Systems Manager Parameter Store or AWS Secrets Manager to store database credentials, API keys, and connection strings — never hardcode them in application code or environment variables on EC2. The IAM role on your EC2 instance gives it permission to fetch secrets at runtime. This means a leaked AMI or Git commit never exposes production credentials.

4 EC2 Instances & AMI Hardening

EC2 instances are the compute backbone of most AWS architectures. Selecting the right instance type, hardening the OS, and managing access securely are critical for both security and cost efficiency.

Instance Type Selection

Family	Use Case	Example Types	India Region Price
t3/t4g	Dev, low-traffic web, burst workloads	t3.micro, t4g.small	$0.009–$0.042/hr
m6i/m7g	General-purpose production apps	m6i.large, m7g.xlarge	$0.077–$0.308/hr
c6i/c7g	CPU-intensive — web serving, encoding	c6i.large, c7g.xlarge	$0.068–$0.272/hr
r6i/r7g	Memory-intensive — caching, in-memory DB	r6i.large, r7g.xlarge	$0.101–$0.404/hr
Graviton (g suffix)	Most workloads that can run on ARM64; typically cheaper with comparable or better performance	t4g, m7g, c7g	Roughly 20% less than x86

EC2 Security Hardening Checklist

Disable root SSH login: PermitRootLogin no in /etc/ssh/sshd_config
Disable password SSH authentication — key pairs only: PasswordAuthentication no
Change SSH port from 22 to a non-standard port in Security Group and sshd_config
Enable automatic security updates (Debian/Ubuntu): sudo apt install unattended-upgrades && sudo dpkg-reconfigure unattended-upgrades
Install and enable UFW or firewalld as host-based firewall in addition to Security Groups
Enable AWS Systems Manager Session Manager — replace SSH with SSM for all admin access, no inbound ports needed
Enable CloudWatch Agent — collect system metrics (memory, disk) and application logs
Enable Amazon Inspector — automated vulnerability scanning of EC2 instances

✅ Pro Tip: Use AWS Systems Manager Session Manager instead of SSH for all EC2 administrative access. SSM requires zero inbound security group rules — no port 22 open at all. Access is logged to CloudTrail, sessions can be audited, and access is controlled entirely by IAM. This eliminates the single largest attack surface on most EC2 deployments.

5 RDS Database Setup

Amazon RDS provides managed relational databases — handling backups, patching, replication, and failover automatically. Proper configuration ensures high availability, security, and performance.

RDS Production Configuration

Multi-AZ deployment: Always enable for production — automatic failover to standby in 60–120 seconds with no data loss
Private subnets only: RDS instances must be in DB subnets with no route to internet gateway — never assign a public IP
Encryption at rest: Enable KMS encryption at creation — cannot be enabled after the fact without snapshot restore
Encryption in transit: Force SSL/TLS connections by setting rds.force_ssl=1 in the parameter group for PostgreSQL, or require_secure_transport=ON for MySQL
Automated backups: Set retention to 7 days minimum, 35 days for compliance-sensitive data
Parameter groups: Create custom parameter groups — never use default parameter groups in production
Read replicas: Create read replicas for heavy reporting/analytics queries — offload from primary

RDS Security Group Configuration

# RDS Security Group — Allow ONLY from App Server Security Group
Inbound Rules:
  Type: MySQL/Aurora (3306)
  Source: sg-app-servers   ← Security Group of EC2 app instances
  Description: App-to-DB only

  Type: PostgreSQL (5432)
  Source: sg-app-servers
  Description: App-to-DB only

Outbound Rules:
  None (remove the default allow-all egress rule; security groups have no deny rules, and RDS does not need outbound)

# NEVER add:
  Source: 0.0.0.0/0    ← Public access
  Source: 10.0.0.0/8   ← Broad internal access
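
The "never add" rules above can be turned into a simple review check. The sketch below flags any inbound source that is a CIDR range larger than a single /24, while accepting security-group references. The 256-address cutoff and the function name `is_too_broad` are assumptions chosen for illustration; pick a limit that matches your own subnet sizes.

```python
import ipaddress


def is_too_broad(source, max_addresses=256):
    """Flag CIDR sources covering more than max_addresses; sg- references pass."""
    if source.startswith("sg-"):
        return False  # a security-group reference is the preferred source
    return ipaddress.ip_network(source).num_addresses > max_addresses


for candidate in ["sg-app-servers", "10.0.10.0/24", "10.0.0.0/8", "0.0.0.0/0"]:
    verdict = "REJECT" if is_too_broad(candidate) else "ok"
    print(f"{candidate:16} {verdict}")
```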

⚠️ Warning: Never enable "Publicly Accessible" on an RDS instance — even temporarily for debugging. Once enabled, the RDS instance gets a public DNS record and is reachable from the internet on the database port. Even with a strong password, this exposes your database to brute-force, credential stuffing, and zero-day exploit attempts 24/7. Use an SSH tunnel or SSM port forwarding for remote database access instead.

6 S3 Buckets & Storage

Amazon S3 is the most versatile AWS storage service — used for static assets, application data, backups, log archives, and static website hosting. S3 misconfigurations have caused some of the largest data breaches in cloud history.

S3 Security Non-Negotiables

Block All Public Access: Enable at the account level — go to S3 → Block Public Access settings for this account → Block all. This prevents any bucket from becoming public even if misconfigured
Bucket versioning: Enable on all buckets containing important data — protects against accidental deletion and ransomware
Server-Side Encryption: S3 encrypts new objects with SSE-S3 by default; use SSE-KMS on buckets where you need key-level access control and an audit trail of key usage
S3 Access Logs: Enable access logging on sensitive buckets — logs who accessed what and when
Object Lock: Enable for compliance/backup buckets — prevents deletion for defined retention period even by admins
Lifecycle policies: Automatically transition infrequently accessed data to S3 Glacier to reduce costs

S3 Lifecycle Policy Example

{
  "Rules": [
    {
      "ID": "LogArchiveLifecycle",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {"Days": 2555}
    }
  ]
}
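
To sanity-check a lifecycle policy like this one, it can help to trace where an object of a given age should sit. The sketch below mirrors the rule above (30 / 90 / 365-day transitions, expiry at 2555 days, which is seven years). It is a simplified model: S3 applies lifecycle actions asynchronously, so real transitions can lag behind these day counts, and the function name is invented for the example.

```python
# Transitions and expiration copied from the lifecycle rule above.
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # 7 * 365


def storage_class_at(age_days):
    """Return the expected storage class for an object of this age, or None if expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


for age in (0, 45, 100, 400, 2555):
    print(age, storage_class_at(age))
```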

✅ Pro Tip: Use S3 Intelligent-Tiering for data with unpredictable access patterns. It moves objects between access tiers automatically based on actual usage, with no retrieval fees, in exchange for a small per-object monitoring charge. For log archives and backups with predictable access patterns (rarely accessed after 30 days), explicit lifecycle rules with STANDARD_IA → GLACIER transitions usually give the largest cost savings.

7 Load Balancer & Auto Scaling

Application Load Balancers (ALB) and Auto Scaling Groups (ASG) together provide the horizontal scalability and high availability that makes cloud infrastructure fundamentally different from on-premises servers.

Application Load Balancer Setup

Create ALB in public subnets across all AZs — set scheme to "internet-facing"
Create HTTPS listener on port 443 — attach SSL certificate from AWS Certificate Manager (free)
Create HTTP listener on port 80 — redirect to HTTPS with 301 permanent redirect
Create Target Group pointing to EC2 instances in private subnets
Configure health check path: /health — application must return HTTP 200
Enable access logs to S3 — essential for debugging and security analysis
Enable AWS WAF on ALB — protects against OWASP Top 10 attacks

Auto Scaling Group Configuration

# Launch Template
Instance Type:  m6i.large (or Graviton: m7g.large)
AMI:            Custom hardened AMI (not latest Amazon Linux)
Key Pair:       None (use SSM Session Manager)
Security Group: sg-app-servers
IAM Role:       ec2-app-role
User Data:      Bootstrap script to configure application

# Auto Scaling Group
Min Instances:     2    (always maintain 2 for HA)
Desired Instances: 2
Max Instances:     10   (scale up to 10 under load)
Health Check:      ELB  (use ALB health check, not EC2 status)
Cooldown:          300s (wait 5 min between scaling events)

# Scaling Policies
Scale Out:  Add 1 instance when CPU > 70% for 2 consecutive minutes
Scale In:   Remove 1 instance when CPU < 30% for 10 consecutive minutes
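
The scaling policies above can be dry-run against sample CPU data before they are applied. The sketch below models only the decision rule (CPU above 70% for 2 consecutive minutes adds an instance, below 30% for 10 consecutive minutes removes one, bounded by the min and max). It deliberately ignores the cooldown period and instance warm-up, and the function name and sample values are assumptions for illustration.

```python
def scaling_decision(cpu_samples, current, minimum=2, maximum=10):
    """Return the new desired capacity.

    cpu_samples: per-minute average CPU percentages, oldest first.
    current: number of instances currently running.
    """
    # Scale out: the last 2 one-minute samples are all above 70%.
    if len(cpu_samples) >= 2 and all(c > 70 for c in cpu_samples[-2:]):
        return min(current + 1, maximum)
    # Scale in: the last 10 one-minute samples are all below 30%.
    if len(cpu_samples) >= 10 and all(c < 30 for c in cpu_samples[-10:]):
        return max(current - 1, minimum)
    return current


print(scaling_decision([50, 75, 80], current=2))   # sustained high CPU
print(scaling_decision([20] * 10, current=3))      # sustained low CPU
print(scaling_decision([20] * 10, current=2))      # already at the minimum
```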

⚠️ Warning: Set your Auto Scaling maximum carefully. An aggressive scaling policy combined with a traffic spike or DDoS attack can generate enormous AWS bills in hours. Set a conservative maximum, and configure AWS Budgets alerts that notify you when costs exceed thresholds; budget actions can go further and apply a restrictive IAM policy or SCP, or stop specific EC2 or RDS instances. Always test your scaling behavior in a staging environment before applying it to production.

8 Monitoring & Cost Optimization

AWS provides exceptional visibility into infrastructure health and costs — but only if you configure it. Default monitoring covers the basics; production environments need custom metrics, composite alarms, and cost allocation tags from day one.

CloudWatch Essential Alarms

EC2 CPU Utilization: Alert at >85% sustained for 5 minutes — indicates under-provisioning
RDS Free Storage Space: Alert when below 20%. An instance that runs out of storage stops accepting writes and can become unavailable
RDS CPU: Alert at >80% — indicates query optimization or instance upgrade needed
ALB 5xx Error Rate: Alert when >1% of responses are 5xx — indicates application errors
ALB Target Response Time: Alert when P99 latency >2 seconds — user experience degradation
Billing: Alert at 50%, 80%, 100% of monthly budget — catch runaway costs early
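
Two of the alarms above are ratios rather than raw metrics, so it is easy to get the comparison wrong. The sketch below spells out the arithmetic for the ALB 5xx rate (more than 1% of responses) and RDS free storage (below 20% of allocated). It only illustrates the thresholds; in CloudWatch these would be metric math expressions or alarms on the corresponding metrics, and the function names here are invented.

```python
def alb_5xx_alarm(total_requests, five_xx_count, threshold=0.01):
    """True when more than 1% of responses in the period were 5xx."""
    if total_requests == 0:
        return False  # no traffic, nothing to alarm on
    return five_xx_count / total_requests > threshold


def rds_storage_alarm(free_gb, allocated_gb, threshold=0.20):
    """True when free storage has dropped below 20% of allocated storage."""
    if allocated_gb <= 0:
        raise ValueError("allocated_gb must be positive")
    return free_gb / allocated_gb < threshold


print(alb_5xx_alarm(10_000, 150))   # 1.5% errors
print(rds_storage_alarm(15, 100))   # 15% free
```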

Cost Optimization Quick Wins

Reserved Instances / Savings Plans: Commit to 1-year for production workloads — saves 30–40% vs On-Demand pricing
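Graviton instances: Switch to ARM-based Graviton3 instances (m7g, c7g, r7g) where your software runs on ARM64; they are typically around 20% cheaper than comparable x86 instances and often faster, so benchmark your own workload first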
S3 Lifecycle policies: Move old logs and backups to Glacier — typically saves 70–80% vs STANDARD storage class
Right-sizing: Use AWS Compute Optimizer recommendations — many EC2 instances are significantly over-provisioned
Delete unattached EBS volumes: Snapshots and unattached volumes silently accumulate costs — audit monthly
Spot Instances: Use for stateless, fault-tolerant workloads (batch jobs, dev environments) — 70–90% cheaper than On-Demand
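
The percentage ranges above are easier to compare once they are applied to a concrete monthly figure. The sketch below takes a hypothetical On-Demand rate of $0.10/hour over a 730-hour month and applies the 30–40% Savings Plan and 70–90% Spot ranges quoted above. The hourly rate and function name are assumptions for illustration; substitute your own pricing.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # average hours in a month


def discounted_range(on_demand_monthly, min_saving, max_saving):
    """Return (lowest, highest) monthly cost after applying a savings range."""
    base = Decimal(str(on_demand_monthly))
    lowest = base * (1 - Decimal(str(max_saving)))
    highest = base * (1 - Decimal(str(min_saving)))
    return lowest, highest


# Hypothetical instance at $0.10/hour On-Demand.
on_demand = Decimal("0.10") * HOURS_PER_MONTH
print("On-Demand:", on_demand)
print("1-year commitment:", discounted_range(on_demand, "0.30", "0.40"))
print("Spot:", discounted_range(on_demand, "0.70", "0.90"))
```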

✅ Pro Tip: Enable AWS Cost Anomaly Detection — it uses machine learning to identify unexpected spending patterns and alerts you via email or SNS within hours of a cost anomaly occurring. This has saved organizations thousands of dollars by catching accidental resource creation, runaway Auto Scaling, or data transfer spikes before they compound into large bills at month-end.

Need Help Building Your AWS Infrastructure?

EnterWeb IT Firm architects, deploys, and manages production AWS environments — from initial account setup and VPC design to multi-region failover, security hardening, and ongoing cost optimization. AWS and Azure certified engineers on your project.