Tuesday, August 15, 2017

Couchbase Primary vs Secondary Indexes

Couchbase supports both a key-value and a JSON-based data model. In the key-value model, you don't care about the type of the value. In the JSON model, you can query individual attributes using N1QL.


Key-Value Model 

Without Index
A key-value store is schemaless: each object is mapped to a given key, just like in a HashMap or Dictionary. You can think of Couchbase as a distributed HashMap. The value can be any supported data type (JSON, CSV, or BLOB). Every operation is performed using the key, also called the document ID: Couchbase looks up the value corresponding to the given document ID. In simple terms, it's just like a key lookup in a HashMap. An index doesn't play any role here.


Query: bucket.get(docId);
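
To make the HashMap analogy concrete, here is a minimal sketch in Python. It is a toy stand-in, not the Couchbase SDK: the `bucket` dict, the `get` helper, and the sample document IDs are all made up for illustration.

```python
# Toy stand-in for a bucket: a plain dict keyed by document ID.
# This is NOT the Couchbase SDK; every name here is hypothetical.
bucket = {
    "user::1": {"type": "user", "name": "Asha"},   # JSON-like value
    "order::9": {"type": "order", "total": 42},    # JSON-like value
    "blob::1": b"\x00\x01",                        # opaque binary value
}

def get(doc_id):
    """Return the value stored under doc_id, or None if the key is absent."""
    return bucket.get(doc_id)

doc = get("user::1")        # found by key, no index involved
blob = get("blob::1")       # the store does not care about the value type
missing = get("user::404")  # unknown key
```

The lookup only ever needs the key, which is why no index is involved.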


With Index
Now, what if you want the number of documents in your bucket?

Query: SELECT COUNT(*) FROM `bucket-name`

The above query does a full bucket scan (similar to a full table scan in the SQL world). In the SQL world, an index on the primary key is created by default, so you can easily perform this operation. In Couchbase, that's not the case: you have to create an index explicitly before you can run this query or any other ad-hoc query. To index the key (the document ID), create a primary index.

Query: CREATE PRIMARY INDEX index_name ON `bucket-name`
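
As a rough mental model (not how Couchbase implements it internally), a primary index can be pictured as a sorted list of every document key. The Python sketch below assumes a toy in-memory `bucket` dict; `primary_index`, `count_all`, and `keys_with_prefix` are hypothetical names used only for illustration.

```python
import bisect

# Toy bucket; the document IDs and contents are made up.
bucket = {
    "user::1": {"type": "user", "name": "Asha"},
    "user::2": {"type": "user", "name": "Ben"},
    "order::9": {"type": "order", "total": 42},
}

# Hypothetical primary index: every document key, kept in sorted order.
primary_index = sorted(bucket)

def count_all():
    """Answer a COUNT(*)-style question from the keys alone."""
    return len(primary_index)

def keys_with_prefix(prefix):
    """Range-scan the sorted keys for document IDs starting with prefix."""
    start = bisect.bisect_left(primary_index, prefix)
    matches = []
    for key in primary_index[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches

total = count_all()
user_keys = keys_with_prefix("user::")
```

Note that this structure knows only about keys, so it cannot narrow a search by anything stored inside the documents.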


JSON Model (Secondary Indexes)

If you want complete control over your data and queries, the JSON model is the one to choose. With the approaches above, you cannot efficiently ask for all the objects that have a certain attribute value: a key lookup needs the document ID, and a primary index only knows about keys.

In the JSON-based model, we can query through an expressive SQL-like language named N1QL (pronounced "nickel"). This is a much more flexible model: we can look up documents by the attributes contained inside the JSON. To optimise such lookups, we can create indexes on those attributes. These indexes are called secondary indexes or, more precisely, Global Secondary Indexes (GSI).

Query: CREATE INDEX type_index ON `bucket-name`(type) USING GSI
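
Conceptually, a secondary index on `type` maps each attribute value to the IDs of the documents that carry it. The Python sketch below is a toy model under that assumption, not the GSI implementation; `build_index`, `find_by_type`, and the sample documents are hypothetical.

```python
from collections import defaultdict

# Toy bucket; one document has no "type" field and one is not JSON at all.
bucket = {
    "user::1": {"type": "user", "name": "Asha"},
    "user::2": {"type": "user", "name": "Ben"},
    "order::9": {"type": "order", "total": 42},
    "misc::1": {"note": "no type field"},
    "blob::1": b"\x00\x01",
}

def build_index(docs, field):
    """Map each value of `field` to the IDs of documents that contain it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        # Only JSON-like documents that actually have the field are indexed.
        if isinstance(doc, dict) and field in doc:
            index[doc[field]].append(doc_id)
    return index

type_index = build_index(bucket, "type")

def find_by_type(value):
    """Fetch matching documents via the index instead of scanning the bucket."""
    return [bucket[doc_id] for doc_id in type_index.get(value, [])]

users = find_by_type("user")
```

Documents without the indexed attribute never enter the index, so it covers only a subset of the bucket.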



Primary vs Secondary Indexes


  • Primary indexes index all the keys in a given bucket and are used when a secondary index cannot be used to satisfy a query and a full bucket scan is required. 
  • Secondary indexes can index a subset of the items in a given bucket and are used to make queries that target a specific subset of fields more efficient.
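
The choice described in the two points above can be sketched as a tiny decision function. This is only an illustration of the idea, not the actual N1QL query planner; `choose_index` and its parameters are hypothetical, and the index names are the ones used in the statements earlier in this post.

```python
def choose_index(predicate_field, secondary_indexes, primary_index_name=None):
    """Pick an index for a query that filters on predicate_field.

    secondary_indexes maps an indexed field name to its index name.
    primary_index_name is None when no primary index has been created.
    """
    # Prefer a secondary index that covers the field being filtered on.
    if predicate_field in secondary_indexes:
        return secondary_indexes[predicate_field]
    # Otherwise fall back to the primary index, i.e. a full bucket scan.
    if primary_index_name is not None:
        return primary_index_name
    # With no usable index at all, the toy planner refuses the query.
    raise LookupError("no index available for this query")

secondary = {"type": "type_index"}
by_type = choose_index("type", secondary, "index_name")
by_name = choose_index("name", secondary, "index_name")
```

Here a filter on `type` is served by `type_index`, while a filter on `name` falls back to the primary index.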


--- happy learning !
