Intermediate Cassandra

Clustering Columns and Sort Order in Cassandra

Deep dive into clustering columns, how they determine row ordering within partitions, and how to leverage them for efficient queries.

7 min read Tutorial

What Are Clustering Columns?

Clustering columns are the columns in a primary key that come after the partition key. They determine the physical sort order of rows within a partition on disk. Because data is stored sorted by clustering columns, range queries on these columns are extremely efficient -- Cassandra reads a contiguous slice of data rather than scanning.

Defining Clustering Order

By default, clustering columns sort in ascending order. You can change this with the CLUSTERING ORDER BY clause at table creation time. The sort order cannot be changed after the table is created.

-- Default ascending order
CREATE TABLE user_posts (
  user_id UUID,
  post_date TIMESTAMP,
  post_id UUID,
  content TEXT,
  PRIMARY KEY (user_id, post_date, post_id)
);

-- Descending order on post_date (newest first)
CREATE TABLE user_posts_desc (
  user_id UUID,
  post_date TIMESTAMP,
  post_id UUID,
  content TEXT,
  PRIMARY KEY (user_id, post_date, post_id)
) WITH CLUSTERING ORDER BY (post_date DESC, post_id ASC);

Range Queries on Clustering Columns

Clustering columns enable efficient range queries because data is stored contiguously. You must provide the partition key and respect the clustering column order -- you cannot skip a clustering column.

-- Range query on first clustering column (efficient)
SELECT * FROM user_posts
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
  AND post_date >= '2025-01-01'
  AND post_date < '2025-07-01';

-- Equality on first, range on second clustering column (valid)
SELECT * FROM sensor_readings
WHERE sensor_id = 'temp-001'
  AND reading_date = '2025-06-15'
  AND reading_time >= '08:00:00'
  AND reading_time <= '17:00:00';

-- INVALID: skipping the first clustering column
-- This will fail:
-- SELECT * FROM user_posts
-- WHERE user_id = ... AND post_id = ...;

Multiple Clustering Columns

When a table has multiple clustering columns, data is sorted hierarchically. Think of it like a nested sort: first by the first clustering column, then within each value by the second, and so on.

CREATE TABLE chat_messages (
  channel_id TEXT,
  message_date DATE,
  message_time TIMESTAMP,
  message_id UUID,
  sender TEXT,
  body TEXT,
  PRIMARY KEY (channel_id, message_date, message_time, message_id)
) WITH CLUSTERING ORDER BY (message_date DESC, message_time DESC, message_id ASC);

-- Get today's messages for a channel (newest first)
SELECT * FROM chat_messages
WHERE channel_id = 'general'
  AND message_date = '2025-06-15'
LIMIT 50;

-- Get messages for a date range
SELECT * FROM chat_messages
WHERE channel_id = 'general'
  AND message_date >= '2025-06-01'
  AND message_date <= '2025-06-15';

ORDER BY in Queries

CQL supports ORDER BY, but only on clustering columns and only in the same or exact reverse direction as the defined clustering order. You cannot sort on arbitrary columns.

-- Valid: matches defined clustering order (DESC)
SELECT * FROM user_posts_desc
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
ORDER BY post_date DESC;

-- Valid: exact reverse of defined order
SELECT * FROM user_posts_desc
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
ORDER BY post_date ASC;

-- INVALID: cannot order by non-clustering column
-- SELECT * FROM user_posts ORDER BY content;

Key Takeaways

  • Clustering columns define physical sort order on disk, making range queries fast.
  • Set CLUSTERING ORDER BY at table creation -- it cannot be changed later.
  • You must query clustering columns in order; you cannot skip one in a WHERE clause.
  • ORDER BY in CQL only works on clustering columns in the defined or exact reverse direction.

Try this query in UnifySQL

Write, optimize, and collaborate on Cassandra queries with AI assistance.

Start Free