Expert guidance for distributed NoSQL databases (Cassandra, DynamoDB). Focuses on mental models, query-first modeling, single-table design, and avoiding hot partitions in high-scale systems.
73
73%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Passed
No known issues
This skill provides professional mental models and design patterns for distributed wide-column and key-value stores (specifically Apache Cassandra and Amazon DynamoDB).
Unlike SQL (where you model data entities), or document stores (like MongoDB), these distributed systems require you to model your queries first.
| Feature | SQL (Relational) | Distributed NoSQL (Cassandra/DynamoDB) |
|---|---|---|
| Data modeling | Model Entities + Relationships | Model Queries (Access Patterns) |
| Joins | CPU-intensive, at read time | Pre-computed (Denormalized) at write time |
| Storage cost | Expensive (minimize duplication) | Cheap (duplicate data for read speed) |
| Consistency | ACID (Strong) | BASE (Eventual) / Tunable |
| Scalability | Vertical (Bigger machine) | Horizontal (More nodes/shards) |
The Golden Rule: In SQL, you design the data model to answer any query. In NoSQL, you design the data model to answer specific queries efficiently.
You typically cannot "add a query later" without migration or creating a new table/index.
Process:
Data is distributed across physical nodes based on the Partition Key (PK).
status="active" or gender="m") creates Hot Partitions, limiting throughput to a single node's capacity.Within a partition, data is sorted on disk by the Clustering Key (Cassandra) or Sort Key (DynamoDB).
WHERE user_id=X AND date > Y).Primary use: DynamoDB (but concepts apply elsewhere)
Storing multiple entity types in one table to enable pre-joined reads.
| PK (Partition) | SK (Sort) | Data Fields... |
|---|---|---|
USER#123 | PROFILE | { name: "Ian", email: "..." } |
USER#123 | ORDER#998 | { total: 50.00, status: "shipped" } |
USER#123 | ORDER#999 | { total: 12.00, status: "pending" } |
PK="USER#123"Don't be afraid to store the same data in multiple tables to serve different query patterns.
users_by_id (PK: uuid)users_by_email (PK: email)Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).
((Partition Key), Clustering Columns)JOIN or GROUP BY. Pre-calculate aggregates in a separate counter table.ALLOW FILTERING: If you see this in production, your data model is wrong. It implies a full cluster scan.Before finalizing your NoSQL schema:
USER#123#2024-01).❌ Scatter-Gather: Querying all partitions to find one item (Scan).
❌ Hot Keys: Putting all "Monday" data into one partition.
❌ Relational Modeling: Creating Author and Book tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books).