Why Primary Keys Matter in Cassandra

In Cassandra, the primary key is the most important design decision you will make. In a relational database the primary key mainly enforces uniqueness; in Cassandra it also determines how data is distributed across the cluster, how rows are ordered on disk, and which queries can run efficiently. A poorly chosen primary key leads to hot spots, unbalanced clusters, and slow queries.

Anatomy of a Primary Key

A Cassandra primary key has two parts: the partition key, which may be one column or several and determines which nodes store the data, and optional clustering columns, which determine the sort order of rows within a partition.

-- Single partition key, no clustering columns
-- PRIMARY KEY (user_id)
CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT
);

-- Single partition key + clustering columns
-- Partition key: customer_id
-- Clustering columns: order_date, order_id
CREATE TABLE orders (
  customer_id TEXT,
  order_date TIMESTAMP,
  order_id UUID,
  total DECIMAL,
  PRIMARY KEY (customer_id, order_date, order_id)
);

-- Composite partition key + clustering column
-- Partition key: (country, city)
-- Clustering column: created_at
CREATE TABLE users_by_location (
  country TEXT,
  city TEXT,
  created_at TIMESTAMP,
  user_id UUID,
  name TEXT,
  PRIMARY KEY ((country, city), created_at)
);
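
Clustering columns pay off at read time. As a minimal sketch against the orders table defined above, the queries below fix the partition key and then use the first clustering column for a range and for ordering. The customer id and dates are illustrative values only.

```cql
-- Orders for one customer within a date range.
-- The partition key is restricted with '=', and the range applies to
-- the first clustering column (order_date).
SELECT order_date, order_id, total
FROM orders
WHERE customer_id = 'cust-123'
  AND order_date >= '2024-01-01'
  AND order_date < '2024-02-01';

-- The ten most recent orders: reads the partition in reverse
-- clustering order.
SELECT order_date, order_id, total
FROM orders
WHERE customer_id = 'cust-123'
ORDER BY order_date DESC
LIMIT 10;
```

If newest-first is the common access pattern, the table can instead be created with WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC) so that rows are stored in that order.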

The Partition Key

The partition key is hashed by Cassandra's partitioner into a token, and that token determines which node in the cluster owns the data (along with any additional replicas required by the keyspace's replication settings). All rows sharing the same partition key are stored together on those nodes, so reads within a single partition are very fast.

-- All orders for 'cust-123' live in one partition on the same replica nodes
-- This query hits only ONE partition = fast
SELECT * FROM orders WHERE customer_id = 'cust-123';

-- Composite partition key requires ALL parts
SELECT * FROM users_by_location
WHERE country = 'US' AND city = 'Austin';
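
To inspect the hashing described above, CQL provides the token() function, which returns the partitioner's token for a row's partition key. The sketch below reuses the two tables from this article; the actual token values depend on the partitioner configured for the cluster, so none are shown.

```cql
-- Every row returned here carries the same token, because all rows
-- share the partition key 'cust-123'.
SELECT token(customer_id), customer_id, order_date
FROM orders
WHERE customer_id = 'cust-123';

-- For a composite partition key, pass every partition key column
-- to token(), in the order they are declared.
SELECT token(country, city), country, city
FROM users_by_location
WHERE country = 'US' AND city = 'Austin';
```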

A good partition key distributes data evenly. Avoid keys with low cardinality (such as boolean or status fields): they produce a handful of very large partitions and concentrate load on a few nodes.
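
The contrast can be sketched with two hypothetical tables. The table names, the order_day column, and the choice of a daily split are assumptions made for illustration, not a prescribed design.

```cql
-- Anti-pattern: 'status' has only a few distinct values, so every
-- order in the system lands in one of a few enormous partitions.
CREATE TABLE orders_by_status_unbalanced (
  status TEXT,
  order_id UUID,
  total DECIMAL,
  PRIMARY KEY (status, order_id)
);

-- One possible alternative: add a second column to the partition key
-- so each (status, order_day) pair becomes its own, bounded partition.
CREATE TABLE orders_by_status_and_day (
  status TEXT,
  order_day DATE,
  order_id UUID,
  total DECIMAL,
  PRIMARY KEY ((status, order_day), order_id)
);
```

The trade-off is that queries against the second table must supply both status and order_day.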

Composite vs. Compound Keys

These terms are often confused. A compound primary key has a single partition key plus clustering columns. A composite partition key uses multiple columns as the partition key (wrapped in double parentheses).

-- Compound key: partition=user_id, clustering=post_date
PRIMARY KEY (user_id, post_date)

-- Composite partition key: partition=(user_id, bucket)
PRIMARY KEY ((user_id, bucket), post_date)
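
The two PRIMARY KEY clauses above are fragments of a CREATE TABLE statement. A complete version might look like the following; the table names, the body column, and the column types are assumed for the sake of a runnable example.

```cql
-- Compound primary key: one partition per user_id,
-- rows sorted by post_date inside it.
CREATE TABLE posts_by_user (
  user_id UUID,
  post_date TIMESTAMP,
  body TEXT,
  PRIMARY KEY (user_id, post_date)
);

-- Composite partition key: one partition per (user_id, bucket) pair,
-- rows still sorted by post_date inside each partition.
CREATE TABLE posts_by_user_bucket (
  user_id UUID,
  bucket INT,
  post_date TIMESTAMP,
  body TEXT,
  PRIMARY KEY ((user_id, bucket), post_date)
);
```

In both tables the full primary key identifies a row, so two posts by the same user with an identical post_date would overwrite each other. Adding a unique clustering column, such as a post id, avoids that.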

Choosing the Right Partition Key

Pick columns that distribute data evenly across nodes (high cardinality).
Every query should restrict the full partition key in its WHERE clause; otherwise Cassandra has to scan partitions across many nodes.
As a rule of thumb, keep each partition under about 100 MB.
Use composite partition keys or time bucketing to split partitions that would otherwise grow too large.
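
Time bucketing can be sketched as follows. The sensor_readings_by_day table, its columns, and the choice of one bucket per day are hypothetical; the right bucket width depends on how fast data arrives for each key.

```cql
-- Each sensor gets a new partition every day, so no single partition
-- grows without bound.
CREATE TABLE sensor_readings_by_day (
  sensor_id TEXT,
  bucket_day DATE,
  reading_time TIMESTAMP,
  temperature DOUBLE,
  PRIMARY KEY ((sensor_id, bucket_day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- The application computes bucket_day from the reading's timestamp
-- when it writes.
INSERT INTO sensor_readings_by_day (sensor_id, bucket_day, reading_time, temperature)
VALUES ('sensor-42', '2024-03-15', '2024-03-15 08:30:00+0000', 21.5);

-- Reads name the sensor and the day, so they touch exactly one partition.
SELECT reading_time, temperature
FROM sensor_readings_by_day
WHERE sensor_id = 'sensor-42' AND bucket_day = '2024-03-15'
LIMIT 100;
```

A query that spans several days issues one such SELECT per bucket and merges the results in the application.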

Key Takeaways

The partition key determines data distribution, which makes it the most critical schema decision.
Clustering columns define sort order within a partition and enable range queries.
Composite partition keys (double parentheses) group multiple columns into a single partition key.
Design your primary key around your query patterns, not your data relationships.
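
As an illustration of query-driven design, suppose an application needs to look up users both by id and by email. A common approach is one table per query, each keyed by the column that query filters on. The table names below are hypothetical, and the application is assumed to write to both tables.

```cql
-- Query 1: look up a user by id.
CREATE TABLE users_by_id (
  user_id UUID PRIMARY KEY,
  name TEXT,
  email TEXT
);

-- Query 2: look up a user by email. A second table keyed by email
-- serves this directly instead of filtering users_by_id.
CREATE TABLE users_by_email (
  email TEXT PRIMARY KEY,
  user_id UUID,
  name TEXT
);

SELECT user_id, name FROM users_by_email WHERE email = 'ada@example.com';
```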
