Advanced Cassandra

Cassandra Data Modeling Best Practices

Master query-driven data modeling in Cassandra, including denormalization strategies, partition sizing, and schema design patterns.

8 min read Tutorial

Query-Driven Design

Cassandra data modeling is fundamentally different from relational design. Instead of normalizing data and joining tables, you start with your queries and design tables to serve each query efficiently. This means the same data is often stored in multiple tables with different primary keys -- a strategy called denormalization. The guiding principle is: one table per query pattern.

The Modeling Workflow

Follow this process when designing your Cassandra schema:

  • Define your application queries (what data you need and how you filter/sort it).
  • Create one table per query pattern with the appropriate primary key.
  • Denormalize: duplicate data across tables to avoid multi-partition reads.
  • Validate partition sizes to ensure they stay within recommended limits.
-- Query 1: Get user by ID
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT,
  department TEXT
);

-- Query 2: Get users by department
CREATE TABLE users_by_department (
  department TEXT,
  user_id UUID,
  name TEXT,
  email TEXT,
  PRIMARY KEY (department, user_id)
);

-- Query 3: Look up user by email
CREATE TABLE users_by_email (
  email TEXT PRIMARY KEY,
  user_id UUID,
  name TEXT,
  department TEXT
);

Partition Sizing

A partition that grows too large causes performance problems. Keep partitions under 100 MB and ideally under 100,000 rows. For time-series or event data, use time bucketing to bound partition growth.

-- BAD: unbounded partition growth
-- All events for a device go into one partition forever
CREATE TABLE device_events_bad (
  device_id TEXT,
  event_time TIMESTAMP,
  data TEXT,
  PRIMARY KEY (device_id, event_time)
);

-- GOOD: time-bucketed partition key
-- Partition size is bounded by the bucket (one month)
CREATE TABLE device_events (
  device_id TEXT,
  month TEXT,       -- e.g., '2025-06'
  event_time TIMESTAMP,
  data TEXT,
  PRIMARY KEY ((device_id, month), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Query: get recent events for a device in June 2025
SELECT * FROM device_events
WHERE device_id = 'sensor-42'
  AND month = '2025-06'
LIMIT 100;

Denormalization Strategies

When maintaining multiple tables with the same data, you need a strategy to keep them in sync. Use logged batches for tables sharing the same partition key, or application-level writes for tables with different partition keys.

-- Logged batch: atomic write to multiple tables
-- Use ONLY when tables share the same partition key
BEGIN BATCH
  INSERT INTO users (user_id, name, email, department)
  VALUES (uuid(), 'Alice', 'alice@co.com', 'Engineering');

  INSERT INTO users_by_email (email, user_id, name, department)
  VALUES ('alice@co.com', 550e8400-e29b-41d4-a716-446655440000, 'Alice', 'Engineering');
APPLY BATCH;

For cross-partition batches, the coordinator must contact multiple nodes, which adds latency. Prefer individual writes from the application layer for better performance.

Anti-Patterns to Avoid

  • Relational thinking: Do not normalize data and expect to join. Cassandra has no JOIN.
  • Unbounded partitions: Always bucket time-series data by a time window.
  • Too many secondary indexes: Use dedicated tables instead of indexes for primary query patterns.
  • Large batches across partitions: Cross-partition batches hurt performance. Keep batches to a single partition.
  • Using ALLOW FILTERING: If you need it, your data model likely needs redesigning.

Key Takeaways

  • Start with queries, then design tables -- not the other way around.
  • Denormalize by creating one table per query pattern with duplicated data.
  • Keep partitions bounded: use time bucketing or composite keys to limit growth.
  • Avoid relational patterns like joins, normalization, and heavy use of secondary indexes.

Try this query in UnifySQL

Write, optimize, and collaborate on Cassandra queries with AI assistance.

Start Free