tessl/npm-pulumi--aws

A Pulumi package for creating and managing Amazon Web Services (AWS) cloud resources with infrastructure-as-code.


AWS Database Services Overview

Guide to selecting and using AWS database services with Pulumi.

Service Categories

Relational Databases (SQL)

Structured data with ACID transactions

  • RDS - Managed relational databases (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server)
  • Aurora - MySQL/PostgreSQL-compatible, up to 5x MySQL / 3x PostgreSQL throughput
  • Redshift - Petabyte-scale data warehouse

NoSQL Databases

Flexible schemas, horizontal scaling

  • DynamoDB - Key-value and document database, millisecond latency
  • DocumentDB - MongoDB-compatible document database
  • Keyspaces - Apache Cassandra-compatible

In-Memory Caching

Microsecond latency, session stores

  • ElastiCache - Redis and Memcached
  • MemoryDB for Redis - Redis-compatible, durable in-memory database
  • DAX - DynamoDB Accelerator, microsecond latency

Specialized Databases

Purpose-built for specific workloads

  • Neptune - Graph database (Gremlin, SPARQL)
  • QLDB - Ledger database, immutable transaction log
  • Timestream - Time-series database
  • OpenSearch - Search and analytics

Decision Tree

Choose Your Database Service

Start: What's your data model?

├─ Relational (SQL)
│  ├─ Need AWS-optimized? → Yes: Aurora | No: RDS
│  ├─ Analytics/warehousing? → Redshift
│  └─ Specific engine?
│     ├─ PostgreSQL → Aurora PostgreSQL or RDS PostgreSQL
│     ├─ MySQL → Aurora MySQL or RDS MySQL
│     ├─ Oracle → RDS Oracle
│     └─ SQL Server → RDS SQL Server
│
├─ NoSQL
│  ├─ Key-value? → DynamoDB (serverless, millisecond latency)
│  ├─ Document?
│  │  ├─ MongoDB compatible? → DocumentDB
│  │  └─ Simple key-value → DynamoDB
│  ├─ Wide-column? → Keyspaces (Cassandra)
│  └─ Graph? → Neptune
│
├─ Caching
│  ├─ DynamoDB acceleration? → DAX
│  ├─ Redis features? → ElastiCache Redis or MemoryDB
│  └─ Simple caching? → ElastiCache Memcached
│
└─ Specialized
   ├─ Time-series data → Timestream
   ├─ Search/analytics → OpenSearch
   ├─ Immutable ledger → QLDB
   └─ Graph relationships → Neptune

Service Selection Guide

Use RDS When

  • Need traditional relational database
  • ACID transactions required
  • Complex queries and joins
  • Existing application using MySQL, PostgreSQL, Oracle, or SQL Server
  • Want managed service with automated backups, patches

Engine selection:

  • PostgreSQL - Advanced features, JSON, full-text search
  • MySQL - Most popular, wide compatibility
  • MariaDB - MySQL fork, additional features
  • Oracle - Enterprise features, Oracle compatibility
  • SQL Server - Windows applications, .NET integration

Instance classes:

  • db.t3/t4g - Burstable, development/test
  • db.m5/m6g - General purpose, balanced
  • db.r5/r6g - Memory-optimized, large datasets
  • db.x2 - Extreme memory, in-memory databases

When to use Aurora instead:

  • Need 5x MySQL or 3x PostgreSQL performance
  • High availability requirements
  • Read-heavy workloads (15 read replicas)
  • Global database requirements
  • Serverless workloads (Aurora Serverless)

Use Aurora When

  • Need high performance (5x MySQL, 3x PostgreSQL)
  • High availability critical (6 copies across 3 AZs)
  • Read-heavy workloads (up to 15 read replicas)
  • Global applications (Aurora Global Database)
  • Variable workloads (Aurora Serverless v2)
  • Want automatic scaling storage (up to 128 TB)

Aurora vs RDS:

  • Aurora: Higher performance, better HA, auto-scaling storage, more expensive
  • RDS: Standard performance, manual scaling, lower cost

Aurora Serverless v2:

  • Automatically scales compute capacity
  • Scales in fine-grained increments
  • Good for variable workloads
  • Instant scaling without warmup

Use DynamoDB When

  • Need single-digit millisecond latency
  • Serverless, fully managed database
  • Massive scale (10 trillion requests/day)
  • Simple access patterns (key-value, simple queries)
  • Variable workload patterns
  • Want automatic scaling

Access patterns:

  • Point lookups - Get item by primary key
  • Range queries - Query items by partition key + sort key
  • Secondary indexes - Alternative access patterns (GSI, LSI)
  • Avoid complex joins and aggregations

Capacity modes:

  • On-demand - Pay per request, automatic scaling
  • Provisioned - Reserve capacity, predictable cost

Best practices:

  • Design partition keys for even data distribution
  • Use composite sort keys for flexibility
  • Implement access patterns with GSIs
  • Use DynamoDB Streams for change data capture
  • Consider DynamoDB Accelerator (DAX) for caching
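
The composite-sort-key practice above can be sketched in plain TypeScript (a hypothetical single-table layout; the `USER#`/`ORDER#` prefixes and helper names are illustrative, not an AWS API):

```typescript
// Hypothetical single-table key design: one partition per user,
// sort keys prefixed by entity type so range queries stay cheap.
interface Keys {
    pk: string; // partition key: one per user, evenly distributed
    sk: string; // composite sort key: entity type + natural ordering
}

function userProfileKey(userId: string): Keys {
    return { pk: `USER#${userId}`, sk: "PROFILE" };
}

function orderKey(userId: string, isoDate: string, orderId: string): Keys {
    // ISO-8601 dates sort lexicographically, so a range query on the
    // sort key returns a user's orders in chronological order.
    return { pk: `USER#${userId}`, sk: `ORDER#${isoDate}#${orderId}` };
}
```

A query for one user's 2024 orders then becomes `pk = "USER#42"` with `begins_with(sk, "ORDER#2024")` — a single Query operation, no scan or extra GSI required.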

Use ElastiCache When

  • Need sub-millisecond latency
  • Session store for web applications
  • Real-time analytics dashboard
  • Caching database query results
  • Leaderboards and gaming

Redis vs Memcached:

Redis:

  • Rich data structures (strings, hashes, lists, sets, sorted sets)
  • Persistence (snapshots, AOF)
  • Replication and automatic failover
  • Pub/sub messaging
  • Lua scripting
  • Transactions

Memcached:

  • Simple key-value caching
  • Multi-threaded performance
  • Less memory overhead
  • No persistence
  • Good for simple caching needs

When to use MemoryDB instead:

  • Need Redis durability (writes persisted)
  • Primary database with Redis compatibility
  • Microsecond read, single-digit millisecond writes
  • Multi-AZ durability
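
A minimal MemoryDB cluster in Pulumi might look like the sketch below (a sketch only: `privateSubnetIds` and `cacheSg` are assumed to exist elsewhere, as in the patterns later in this guide, and the `open-access` ACL is AWS's default — define your own ACL for real authentication):

```typescript
import * as aws from "@pulumi/aws";

// Sketch only: privateSubnetIds and cacheSg assumed to exist elsewhere.
const memSubnets = new aws.memorydb.SubnetGroup("mem-subnet", {
    subnetIds: privateSubnetIds,
});

const memdb = new aws.memorydb.Cluster("primary-store", {
    aclName: "open-access",   // default ACL; create your own for auth
    nodeType: "db.t4g.small",
    numShards: 1,
    numReplicasPerShard: 1,   // Multi-AZ durability
    subnetGroupName: memSubnets.name,
    securityGroupIds: [cacheSg.id],
    tlsEnabled: true,
    snapshotRetentionLimit: 7,
});

export const memdbArn = memdb.arn;
```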

Use Redshift When

  • Data warehousing and analytics
  • OLAP (Online Analytical Processing)
  • Business intelligence queries
  • Petabyte-scale datasets
  • Complex aggregations and joins
  • Historical data analysis

Redshift vs RDS:

  • Redshift: Columnar storage, analytics, complex queries, petabyte scale
  • RDS: Row-based storage, OLTP, simple queries, smaller datasets

Deployment options:

  • Provisioned clusters - Reserved capacity, predictable cost
  • Serverless - Automatic scaling, pay per use

Best practices:

  • Use columnar compression
  • Define sort and distribution keys
  • Leverage Redshift Spectrum for S3 queries
  • Use materialized views for repeated queries
  • Implement workload management

Use DocumentDB When

  • Need MongoDB compatibility
  • Migrating from MongoDB
  • Document-based data model
  • JSON/BSON documents
  • Want managed MongoDB-compatible service

DocumentDB vs DynamoDB:

  • DocumentDB: MongoDB API, complex queries, transactions
  • DynamoDB: AWS-native, simpler API, higher scale

Use Neptune When

  • Graph data model (nodes and relationships)
  • Social networks
  • Recommendation engines
  • Knowledge graphs
  • Fraud detection patterns
  • Network analysis

Query languages:

  • Apache TinkerPop Gremlin - Graph traversal
  • SPARQL - RDF graphs
  • openCypher - Property graph queries
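
A minimal Neptune cluster in Pulumi might be sketched as follows (subnet and security group wiring omitted for brevity; all three query languages above connect to the same cluster endpoint, by default on port 8182):

```typescript
import * as aws from "@pulumi/aws";

const graphCluster = new aws.neptune.Cluster("graph", {
    engine: "neptune",
    backupRetentionPeriod: 7,
    storageEncrypted: true,
    iamDatabaseAuthenticationEnabled: true,
    skipFinalSnapshot: true, // dev/test convenience; keep snapshots in prod
});

const graphInstance = new aws.neptune.ClusterInstance("graph-node", {
    clusterIdentifier: graphCluster.id,
    instanceClass: "db.r5.large",
    engine: "neptune",
});

// Gremlin/SPARQL/openCypher clients connect to this endpoint.
export const neptuneEndpoint = graphCluster.endpoint;
```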

Use Timestream When

  • Time-series data (IoT, DevOps, analytics)
  • High write throughput
  • Time-based queries and aggregations
  • Automatic data lifecycle management
  • Built-in time-series analytics

Use cases:

  • IoT sensor data
  • Application monitoring
  • DevOps metrics
  • Financial tick data
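
The lifecycle management mentioned above is configured per table; a minimal Timestream database and table sketch (retention values are illustrative):

```typescript
import * as aws from "@pulumi/aws";

const tsDb = new aws.timestreamwrite.Database("metrics", {
    databaseName: "iot_metrics",
});

const tsTable = new aws.timestreamwrite.Table("sensor-data", {
    databaseName: tsDb.databaseName,
    tableName: "sensor_data",
    retentionProperties: {
        memoryStoreRetentionPeriodInHours: 24,   // hot tier for recent queries
        magneticStoreRetentionPeriodInDays: 365, // cold tier, automatic move
    },
});
```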

Common Patterns

Pattern 1: Web Application Database

RDS Multi-AZ with Read Replicas

import * as aws from "@pulumi/aws";

// Subnet group for RDS
const subnetGroup = new aws.rds.SubnetGroup("db-subnet", {
    subnetIds: privateSubnetIds,
    tags: { Name: "Main DB subnet group" },
});

// Security group
const dbSg = new aws.ec2.SecurityGroup("db-sg", {
    vpcId: vpc.id,
    ingress: [{
        protocol: "tcp",
        fromPort: 5432,
        toPort: 5432,
        securityGroups: [appSg.id],
    }],
});

// Primary database instance
const db = new aws.rds.Instance("primary-db", {
    engine: "postgres",
    engineVersion: "15.4",
    instanceClass: "db.t3.medium",
    allocatedStorage: 100,
    storageType: "gp3",
    dbName: "myapp",
    username: "dbadmin", // "admin" is a reserved word for the postgres engine
    password: dbPassword.result, // e.g. a random.RandomPassword or Secrets Manager value
    multiAz: true, // High availability
    dbSubnetGroupName: subnetGroup.name,
    vpcSecurityGroupIds: [dbSg.id],
    backupRetentionPeriod: 7,
    backupWindow: "03:00-04:00",
    maintenanceWindow: "sun:04:00-sun:05:00",
    enabledCloudwatchLogsExports: ["postgresql", "upgrade"],
    storageEncrypted: true,
    skipFinalSnapshot: false,
    finalSnapshotIdentifier: "final-snapshot",
});

// Read replica for scaling reads
const readReplica = new aws.rds.Instance("read-replica", {
    replicateSourceDb: db.identifier, // same-region replicas reference the identifier, not the ARN
    instanceClass: "db.t3.medium",
    publiclyAccessible: false,
});

export const dbEndpoint = db.endpoint;
export const replicaEndpoint = readReplica.endpoint;

Use when: Traditional web applications, ACID requirements

Pattern 2: Serverless API Backend

DynamoDB + Lambda

import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// DynamoDB table
const table = new aws.dynamodb.Table("users", {
    // Declare only attributes used in a key schema or index; extra
    // definitions (e.g. a createdAt column) fail table validation.
    attributes: [
        { name: "userId", type: "S" },
        { name: "email", type: "S" },
    ],
    hashKey: "userId",
    billingMode: "PAY_PER_REQUEST", // On-demand scaling
    globalSecondaryIndexes: [{
        name: "EmailIndex",
        hashKey: "email",
        projectionType: "ALL",
    }],
    streamEnabled: true,
    streamViewType: "NEW_AND_OLD_IMAGES",
    pointInTimeRecovery: { enabled: true },
    serverSideEncryption: { enabled: true },
    tags: { Name: "Users table" },
});

// Lambda function
const handler = new aws.lambda.Function("api", {
    runtime: "nodejs20.x",
    handler: "index.handler",
    role: lambdaRole.arn,
    code: new pulumi.asset.FileArchive("./app"),
    environment: {
        variables: {
            TABLE_NAME: table.name,
        },
    },
});

// DynamoDB stream processor
const streamProcessor = new aws.lambda.Function("stream-processor", {
    runtime: "nodejs20.x",
    handler: "stream.handler",
    role: streamRole.arn,
    code: new pulumi.asset.FileArchive("./stream"),
});

const eventSourceMapping = new aws.lambda.EventSourceMapping("dynamodb-stream", {
    eventSourceArn: table.streamArn,
    functionName: streamProcessor.arn,
    startingPosition: "LATEST",
});

export const tableName = table.name;

Use when: Serverless applications, variable workloads, high scale

Pattern 3: High-Performance API

Aurora Serverless v2 + Connection Pooling

import * as aws from "@pulumi/aws";

// Aurora Serverless v2 cluster
const cluster = new aws.rds.Cluster("aurora-cluster", {
    engine: "aurora-postgresql",
    engineMode: "provisioned",
    engineVersion: "15.4",
    databaseName: "myapp",
    masterUsername: "dbadmin", // "admin" is a reserved word for the PostgreSQL engine
    masterPassword: dbPassword.result,
    dbSubnetGroupName: subnetGroup.name,
    vpcSecurityGroupIds: [dbSg.id],
    serverlessv2ScalingConfiguration: {
        minCapacity: 0.5, // 0.5 ACU minimum
        maxCapacity: 16,  // 16 ACU maximum
    },
    enabledCloudwatchLogsExports: ["postgresql"],
    backupRetentionPeriod: 7,
    storageEncrypted: true,
});

// Serverless v2 instances
const writer = new aws.rds.ClusterInstance("writer", {
    clusterIdentifier: cluster.id,
    instanceClass: "db.serverless",
    engine: cluster.engine,
    engineVersion: cluster.engineVersion,
});

const reader = new aws.rds.ClusterInstance("reader", {
    clusterIdentifier: cluster.id,
    instanceClass: "db.serverless",
    engine: cluster.engine,
    engineVersion: cluster.engineVersion,
});

// RDS Proxy for connection pooling
const proxy = new aws.rds.Proxy("db-proxy", {
    name: "aurora-proxy",
    engineFamily: "POSTGRESQL",
    auths: [{
        authScheme: "SECRETS",
        secretArn: dbSecret.arn,
    }],
    roleArn: proxyRole.arn,
    vpcSubnetIds: privateSubnetIds,
    vpcSecurityGroupIds: [proxySg.id],
    requireTls: true,
});

const proxyTarget = new aws.rds.ProxyDefaultTargetGroup("target", {
    dbProxyName: proxy.name,
    connectionPoolConfig: {
        maxConnectionsPercent: 100,
        maxIdleConnectionsPercent: 50,
        connectionBorrowTimeout: 120,
    },
});

const proxyTargetGroupAttachment = new aws.rds.ProxyTarget("attachment", {
    dbProxyName: proxy.name,
    targetGroupName: proxyTarget.name,
    dbClusterIdentifier: cluster.id,
});

export const proxyEndpoint = proxy.endpoint;

Use when: Variable workloads, Lambda functions, high connection counts

Pattern 4: Caching Layer

ElastiCache Redis + RDS

import * as aws from "@pulumi/aws";

// ElastiCache subnet group
const cacheSubnetGroup = new aws.elasticache.SubnetGroup("cache-subnet", {
    subnetIds: privateSubnetIds,
});

// Redis cluster
const redis = new aws.elasticache.ReplicationGroup("redis-cluster", {
    replicationGroupId: "app-cache",
    description: "Application cache layer",
    engine: "redis",
    engineVersion: "7.0",
    nodeType: "cache.t3.micro",
    numCacheClusters: 2, // Primary + 1 replica
    automaticFailoverEnabled: true,
    multiAzEnabled: true,
    subnetGroupName: cacheSubnetGroup.name,
    securityGroupIds: [cacheSg.id],
    atRestEncryptionEnabled: true,
    transitEncryptionEnabled: true,
    authToken: cachePassword.result,
    snapshotRetentionLimit: 5,
    snapshotWindow: "03:00-05:00",
    maintenanceWindow: "sun:05:00-sun:07:00",
    autoMinorVersionUpgrade: true,
});

// Application uses cache-aside pattern
export const redisEndpoint = redis.primaryEndpointAddress;
export const redisPort = redis.port;

// Example application code pattern:
// 1. Check Redis for cached data
// 2. If miss, query RDS
// 3. Store result in Redis with TTL
// 4. Return data

Use when: Read-heavy workloads, reduce database load, session storage
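
The cache-aside steps outlined in the comments above can be sketched as generic application code (the `Cache` and `Db` interfaces are illustrative stand-ins for a Redis client such as ioredis and a SQL query against RDS):

```typescript
// Minimal cache-aside sketch. In production Cache wraps a Redis client
// and Db a database query; here they are plain interfaces so the
// control flow stands alone.
interface Cache {
    get(key: string): Promise<string | undefined>;
    set(key: string, value: string, ttlSeconds: number): Promise<void>;
}
interface Db {
    query(key: string): Promise<string>;
}

async function getWithCache(cache: Cache, db: Db, key: string): Promise<string> {
    const hit = await cache.get(key);  // 1. check the cache
    if (hit !== undefined) return hit;
    const value = await db.query(key); // 2. on miss, query the database
    await cache.set(key, value, 300);  // 3. store the result with a TTL
    return value;                      // 4. return data
}

// In-memory stand-ins to demonstrate the flow (TTL ignored here):
const store = new Map<string, string>();
const cache: Cache = {
    get: async k => store.get(k),
    set: async (k, v) => { store.set(k, v); },
};
let dbCalls = 0;
const db: Db = { query: async k => { dbCalls++; return `row:${k}`; } };
```

With these stand-ins, a second read of the same key is served from the cache and the database is queried only once.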

Pattern 5: Analytics Data Warehouse

Redshift + S3 Data Lake

import * as aws from "@pulumi/aws";

// Redshift subnet group
const redshiftSubnetGroup = new aws.redshift.SubnetGroup("redshift-subnet", {
    subnetIds: privateSubnetIds,
    tags: { Name: "Redshift subnet group" },
});

// Redshift cluster
const dataWarehouse = new aws.redshift.Cluster("analytics", {
    clusterIdentifier: "data-warehouse",
    databaseName: "analytics",
    masterUsername: "admin",
    masterPassword: redshiftPassword.result,
    nodeType: "dc2.large",
    numberOfNodes: 2,
    clusterSubnetGroupName: redshiftSubnetGroup.name,
    vpcSecurityGroupIds: [redshiftSg.id],
    encrypted: true,
    enhancedVpcRouting: true,
    automatedSnapshotRetentionPeriod: 7,
    skipFinalSnapshot: false,
    finalSnapshotIdentifier: "final-snapshot",
});

// S3 bucket for data lake (bucket names are globally unique)
const dataLake = new aws.s3.BucketV2("data-lake", {
    bucketPrefix: "analytics-data-lake-",
});

// IAM role for Redshift to access S3
const redshiftRole = new aws.iam.Role("redshift-role", {
    assumeRolePolicy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Principal: { Service: "redshift.amazonaws.com" },
            Action: "sts:AssumeRole",
        }],
    }),
});

const s3Policy = new aws.iam.RolePolicyAttachment("s3-access", {
    role: redshiftRole.name,
    policyArn: "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
});

const clusterIamRole = new aws.redshift.ClusterIamRoles("cluster-roles", {
    clusterIdentifier: dataWarehouse.id,
    iamRoleArns: [redshiftRole.arn],
});

// Query S3 data with Redshift Spectrum
// CREATE EXTERNAL SCHEMA spectrum_schema
// FROM DATA CATALOG DATABASE 'spectrum_db'
// IAM_ROLE 'arn:aws:iam::account:role/role'
// CREATE EXTERNAL DATABASE IF NOT EXISTS;

export const redshiftEndpoint = dataWarehouse.endpoint;

Use when: Business intelligence, analytics, data warehousing

Performance Guidelines

RDS Performance

  • Use provisioned IOPS (io1/io2) for high-performance workloads
  • Enable Performance Insights for monitoring
  • Use read replicas to scale read traffic
  • Optimize queries with proper indexing
  • Consider Aurora for higher performance

DynamoDB Performance

  • Design partition keys for even distribution
  • Use eventually consistent reads when possible
  • Implement caching with DAX for read-heavy workloads
  • Use batch operations for multiple items
  • Monitor consumed capacity with CloudWatch
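
Batch operations are capped per request — BatchWriteItem accepts at most 25 items and BatchGetItem at most 100 — so callers need to chunk. A generic helper (the limit constants mirror those documented caps):

```typescript
// Split items into DynamoDB-sized batches. BatchWriteItem accepts at
// most 25 items per request; BatchGetItem at most 100.
function chunk<T>(items: T[], size: number): T[][] {
    const out: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        out.push(items.slice(i, i + size));
    }
    return out;
}

const writes = Array.from({ length: 60 }, (_, i) => ({ id: i }));
const batches = chunk(writes, 25); // 3 batches of 25, 25, and 10 items
```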

ElastiCache Performance

  • Use cluster mode for Redis (horizontal scaling)
  • Choose appropriate node types for workload
  • Use connection pooling in applications
  • Monitor cache hit ratio
  • Set appropriate TTLs

Aurora Performance

  • Use reader endpoints for read traffic
  • Enable query cache
  • Use parallel query for analytics
  • Implement connection pooling with RDS Proxy
  • Monitor with Performance Insights

Cost Optimization

RDS Cost Reduction

  1. Right-size instances - Use Performance Insights
  2. Use Reserved Instances - 1 or 3-year commitments (up to 69% savings)
  3. Delete unused snapshots - Set retention policies
  4. Stop dev/test instances - Save ~70% when not in use
  5. Use Aurora Serverless - For variable workloads
  6. Leverage read replicas - Scale reads instead of upgrading primary

DynamoDB Cost Reduction

  1. Use on-demand for variable workloads - Pay per request
  2. Use provisioned for predictable traffic - Lower cost per request
  3. Enable auto-scaling - Match capacity to demand
  4. Archive old data - Export to S3 + DynamoDB archival
  5. Use DynamoDB Standard-IA - For infrequently accessed data
  6. Delete unused tables and GSIs - Reduce storage costs
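
Item 3 above (auto-scaling provisioned capacity) is configured through Application Auto Scaling; a sketch for read capacity, assuming `table` is an existing aws.dynamodb.Table with `billingMode: "PROVISIONED"` (the capacity bounds and target are illustrative):

```typescript
import * as aws from "@pulumi/aws";
import * as pulumi from "@pulumi/pulumi";

// Scale read capacity between 5 and 100 RCU, targeting 70% utilization.
const readTarget = new aws.appautoscaling.Target("read-target", {
    minCapacity: 5,
    maxCapacity: 100,
    resourceId: pulumi.interpolate`table/${table.name}`,
    scalableDimension: "dynamodb:table:ReadCapacityUnits",
    serviceNamespace: "dynamodb",
});

const readPolicy = new aws.appautoscaling.Policy("read-policy", {
    policyType: "TargetTrackingScaling",
    resourceId: readTarget.resourceId,
    scalableDimension: readTarget.scalableDimension,
    serviceNamespace: readTarget.serviceNamespace,
    targetTrackingScalingPolicyConfiguration: {
        predefinedMetricSpecification: {
            predefinedMetricType: "DynamoDBReadCapacityUtilization",
        },
        targetValue: 70, // keep consumed/provisioned capacity near 70%
    },
});
```

A matching pair with `WriteCapacityUnits` covers writes; GSIs scale separately with `index/...` resource IDs.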

ElastiCache Cost Reduction

  1. Right-size nodes - Monitor memory and CPU
  2. Use Reserved Nodes - Up to 55% savings
  3. Use t3/t4g for dev/test - Burstable instances
  4. Delete unused clusters - Monitor idle resources
  5. Use Graviton nodes - Better price/performance

Redshift Cost Reduction

  1. Use Reserved Nodes - Up to 75% savings
  2. Use Redshift Serverless - Pay only for usage
  3. Compress data - Reduce storage costs
  4. Delete old snapshots - Set retention policies
  5. Use S3 for cold data - Redshift Spectrum queries

Quick Links

Core Services

  • RDS - Relational Databases
  • DynamoDB - NoSQL Database
  • ElastiCache - In-Memory Cache
  • Redshift - Data Warehouse
  • Neptune - Graph Database
  • DocumentDB - MongoDB Compatible


Install with Tessl CLI

npx tessl i tessl/npm-pulumi--aws
