Why Secondary Indexes?

By default, Cassandra only allows efficient queries on partition key and clustering columns. Secondary indexes let you query on non-key columns without using ALLOW FILTERING. However, they come with significant trade-offs and should be used carefully.

Creating a Secondary Index

The CREATE INDEX statement adds an index on a non-key column. The index is stored locally on each node, meaning queries on indexed columns must fan out to every node in the cluster.

-- Create a secondary index on email
CREATE INDEX idx_users_email ON users (email);

-- Now you can query by email
SELECT * FROM users WHERE email = 'alice@example.com';

-- Index on a collection column (values)
CREATE INDEX idx_article_tags ON articles (VALUES(tags));

-- Index on map keys
CREATE INDEX idx_prefs_keys ON user_profiles (KEYS(preferences));

-- Index on map entries
CREATE INDEX idx_prefs_entries ON user_profiles (ENTRIES(preferences));

When Secondary Indexes Work Well

Secondary indexes perform best under specific conditions. They are acceptable when combined with the partition key, and when the indexed column has moderate cardinality.

-- Good: index query combined with partition key
-- Only hits one node's local index
SELECT * FROM orders
WHERE customer_id = 'cust-123'
  AND status = 'shipped';

-- Acceptable: moderate cardinality column
-- e.g., status with ~5 distinct values across millions of rows
CREATE INDEX idx_orders_status ON orders (status);

When to Avoid Secondary Indexes

Avoid secondary indexes on high-cardinality columns (like email or UUID) when querying without the partition key, because the query must contact every node. Also avoid them on very low cardinality columns (like boolean) as a large percentage of rows will match.

High-cardinality columns without partition key: fan-out to all nodes is expensive.
Very low cardinality (boolean): returns too many rows, not selective enough.
Frequently updated columns: index maintenance adds write overhead.
Tables with heavy delete patterns: tombstones in the index degrade performance.

Materialized Views as an Alternative

Instead of secondary indexes, you can create a materialized view that duplicates data with a different primary key. This is the query-driven approach to data modeling that Cassandra favors.

-- Create a materialized view for querying users by email
CREATE MATERIALIZED VIEW users_by_email AS
  SELECT * FROM users
  WHERE email IS NOT NULL AND user_id IS NOT NULL
  PRIMARY KEY (email, user_id);

-- Now query efficiently by email
SELECT * FROM users_by_email
WHERE email = 'alice@example.com';

Materialized views are maintained automatically by Cassandra but add write amplification. The most common and recommended alternative is to manually maintain a denormalized lookup table.

Managing Indexes

-- Drop an index
DROP INDEX IF EXISTS idx_users_email;

-- View existing indexes
DESC INDEX idx_users_email;

Key Takeaways

Secondary indexes are local to each node -- queries without a partition key fan out to all nodes.
They work best on moderate-cardinality columns combined with the partition key.
Avoid indexing high-cardinality, low-cardinality, or frequently updated columns.
Prefer denormalized tables or materialized views for production query patterns.

Try this query in UnifySQL

Write, optimize, and collaborate on Cassandra queries with AI assistance.

Start Free