Wednesday, November 16, 2016

Why Multi-Dimensional Scaling in Couchbase

Since its inception, Couchbase has supported horizontal scaling in a monolithic fashion: you keep adding nodes to the cluster to scale and improve performance, and all nodes are exactly the same. This single-dimension scaling works to a great extent, with all services (Query, Index, and Data) scaling at the same rate. But each service is unique and has its own specific resource requirements.

Let's profile each of these services and its hardware requirements in detail to drive home the point: why multi-dimensional scaling (MDS) is required!

This feature was added in Couchbase 4.0.

Query Service primarily executes queries in Couchbase's native query language, N1QL (pronounced "nickel"), which is similar to SQL and combines the flexibility of JSON with the power of SQL. The query engine parses the query, generates an execution plan, and then executes the query in collaboration with the Index Service and the Data Service. The faster queries execute, the better the overall performance.

Faster query processing requires more CPU cores or a faster processor (and comparatively less memory and disk). More cores help process queries in parallel.

Reference: Query Data with N1QL
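
As a rough, idealized illustration of why cores matter, assume each query needs a fixed amount of CPU time and queries are independent, so they can run fully in parallel. Under that assumption, peak throughput grows linearly with core count. The function name and the numbers below are hypothetical, not Couchbase measurements.

```python
def max_queries_per_second(cores: int, cpu_ms_per_query: float) -> float:
    """Idealized upper bound on query throughput (hypothetical model).

    Assumes every query needs a fixed amount of CPU time, queries are
    independent, and there is no contention, so each core completes
    1000 / cpu_ms_per_query queries per second.
    """
    if cores <= 0 or cpu_ms_per_query <= 0:
        raise ValueError("cores and cpu_ms_per_query must be positive")
    return cores * (1000.0 / cpu_ms_per_query)


# Hypothetical workload: 2 ms of CPU per query.
for cores in (4, 8, 16):
    print(cores, "cores ->", max_queries_per_second(cores, 2.0), "queries/s")
# 4 cores -> 2000.0 queries/s
# 8 cores -> 4000.0 queries/s
# 16 cores -> 8000.0 queries/s
```

Real throughput will be lower, since queries also wait on the Index and Data services, but the model shows why the Query Service benefits more from cores than from RAM or disk.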

Index Service performs indexing with Global Secondary Indexes (GSI), which are similar to the B+trees commonly used in relational databases. An index is a data structure that provides a quick and efficient means of accessing data. The Index Service creates and maintains secondary indexes and also performs index scans for N1QL queries. GSIs are global across the cluster and are defined using the CREATE INDEX statement in N1QL.
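
To make "index scan" concrete, here is a minimal in-memory sketch of a secondary index on one JSON field: a sorted list of (field value, document key) pairs that answers range scans with binary search, much like a query filtering on age BETWEEN 30 AND 45 would use an index on age. This only illustrates the idea; it is not how GSI is implemented, and the class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Dict, List


class SortedSecondaryIndex:
    """Hypothetical secondary index on a single JSON field (illustration only)."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.values: List[Any] = []  # sorted field values
        self.keys: List[str] = []    # document keys, parallel to self.values

    def build(self, docs: Dict[str, dict]) -> None:
        """Index every document that has the field (values must be comparable)."""
        pairs = sorted(
            (doc[self.field], key) for key, doc in docs.items() if self.field in doc
        )
        self.values = [value for value, _ in pairs]
        self.keys = [key for _, key in pairs]

    def range_scan(self, low: Any, high: Any) -> List[str]:
        """Return keys of documents whose field value lies in [low, high]."""
        start = bisect_left(self.values, low)   # first value >= low
        end = bisect_right(self.values, high)   # one past last value <= high
        return self.keys[start:end]


docs = {
    "user::1": {"name": "Ann", "age": 34},
    "user::2": {"name": "Bob", "age": 27},
    "user::3": {"name": "Cid", "age": 41},
    "user::4": {"name": "Dee"},  # no "age" field, so it is not indexed
}
idx = SortedSecondaryIndex("age")
idx.build(docs)
print(idx.range_scan(30, 45))  # ['user::1', 'user::3']
```

The scan touches only the matching slice of the index instead of every document, which is the reason an index speeds up selective queries.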

The Index Service is disk intensive, so optimized storage such as SSDs will help boost performance. With standard storage, it needs only a basic processor and relatively little RAM. As an administrator, you can configure GSI either with standard GSI storage, which uses ForestDB underneath (since version 4.0), for indexes that cannot fit in memory, or with memory-optimized GSI for faster in-memory indexing and queries.

Data Service is central to Couchbase, since data is the reason any database exists. It stores all data and handles all fetch and update requests. The Data Service is also responsible for creating and managing MapReduce views. Active documents that exist only on disk take much longer to access, which creates a bottleneck for both reads and writes, so Couchbase tries to keep as much data as possible in memory.
Here, data refers to document keys, metadata, and the working set (the actual documents). Couchbase relies on extensive caching to achieve high throughput and low read/write latency. In a perfect world, all data would sit in memory.

Data Service: Managed Cache (based on Memcached) + Storage Engine + View Query Engine
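
The benefit of keeping the working set in memory can be sketched with a generic read-through cache in front of a slower store. This sketch uses simple LRU eviction from Python's standard library and counts memory hits versus disk reads; it illustrates the caching idea only and is not Couchbase's managed cache, which has its own ejection policy. The disk dictionary stands in for the storage engine, and all names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class ReadThroughCache:
    """Hypothetical LRU cache in front of a slow backing store (illustration only)."""

    def __init__(self, disk: Dict[str, dict], capacity: int) -> None:
        self.disk = disk          # stand-in for the on-disk storage engine
        self.capacity = capacity  # how many documents fit in "RAM"
        self.memory = OrderedDict()  # key -> document, least recently used first
        self.hits = 0
        self.disk_reads = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.memory:
            self.hits += 1
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key]
        self.disk_reads += 1              # slow path: fetch from "disk"
        doc = self.disk.get(key)
        if doc is not None:
            self.memory[key] = doc
            if len(self.memory) > self.capacity:
                self.memory.popitem(last=False)  # evict least recently used
        return doc


disk = {f"doc::{i}": {"id": i} for i in range(100)}
cache = ReadThroughCache(disk, capacity=10)
working_set = [f"doc::{i}" for i in range(10)]
for _ in range(5):
    for key in working_set:
        cache.get(key)
print(cache.hits, cache.disk_reads)  # 40 10
```

When the working set fits in memory, every read after the first is served from RAM; shrink capacity below the working set size and disk reads climb.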

Memory and the speed of the storage device both affect performance: the server queues IO operations, so faster storage drains the queue faster.
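
A back-of-the-envelope calculation shows why faster storage drains the IO queue faster. Assuming a simple fluid model in which the device serves requests at its sustained IOPS while new requests keep arriving, the backlog shrinks at (IOPS minus arrival rate) per second. The function name and device figures below are hypothetical round numbers, not benchmarks.

```python
def drain_seconds(queued_ops: int, device_iops: float, arrival_per_s: float = 0.0) -> float:
    """Time to empty an IO queue under a simple fluid model (hypothetical).

    The backlog shrinks at (device_iops - arrival_per_s) operations per
    second; if requests arrive as fast as the device serves them, the
    queue never drains.
    """
    net_rate = device_iops - arrival_per_s
    if net_rate <= 0:
        return float("inf")
    return queued_ops / net_rate


# Hypothetical: 50,000 queued writes while 5,000 new writes/s keep arriving.
print(drain_seconds(50_000, 15_000, 5_000))   # slower device: 5.0
print(drain_seconds(50_000, 105_000, 5_000))  # faster device: 0.5
```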


So, each type of service has its own resource constraints. Couchbase introduced multi-dimensional scaling in version 4.0 so that these services can be optimized independently and assigned the kind of hardware that helps each of them excel. One size fits all is not going to work, especially when you are looking for high throughput and sub-millisecond response times. For example, storing data and executing queries on the same node can cause CPU contention. Similarly, storing data and indexes on the same node can cause disk IO contention.

With MDS, we can separate, isolate, and scale these three services independently of each other, which improves both resource utilization and performance.
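
The contention argument can be expressed as a small check over a cluster layout. This sketch models each node as a set of services and flags the two co-location pairs described in this post: Data with Query (CPU contention) and Data with Index (disk IO contention). The node names and layouts are hypothetical, and this is a planning aid only, not a Couchbase API.

```python
from typing import Dict, List, Set

# Co-located service pairs called out in this post, and the resource they contend for.
CONTENTION = {
    frozenset({"data", "query"}): "CPU",
    frozenset({"data", "index"}): "disk IO",
}


def find_contention(cluster: Dict[str, Set[str]]) -> List[str]:
    """Return one warning per node per contending service pair it hosts."""
    warnings = []
    for node, services in sorted(cluster.items()):
        for pair, resource in CONTENTION.items():
            if pair <= services:  # both services of the pair run on this node
                a, b = sorted(pair)
                warnings.append(f"{node}: {a} + {b} -> {resource} contention")
    return warnings


# Single-dimension scaling: every node runs every service.
uniform = {f"node{i}": {"data", "index", "query"} for i in range(1, 4)}
# MDS-style layout: each service isolated on hardware suited to it.
mds = {
    "data1": {"data"},
    "data2": {"data"},
    "index1": {"index"},
    "query1": {"query"},
}
print(len(find_contention(uniform)))  # 6
print(find_contention(uniform)[0])    # node1: data + query -> CPU contention
print(find_contention(mds))           # []
```

With every service on every node, each node hits both contention pairs; the MDS layout has none, and each group of nodes can be scaled on its own.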


Happy learning!!!
