Tuesday 23 January 2018

Multi-DB design for high performance web applications


Applications with a small user base employ a design with a single centralised database. The rate of data growth and load do not create contention for such applications. So a single db design will work well. However, each database has a maximum connection limit. PostgreSQL is 'good for a few hundred connections, but for 1000s it would be better to look at a connection pooling solution'. Caching and db connection pooling can be utilised to postpone hitting the limit. This single db approach is easy to maintain, test and orchestrate with developer operations. 

When there are specific performance requirements for #users to support, #api requests per second, read/write patterns and data isolation the single database design would be broken by design. Hitting its upper limit on performance/load will impact users. A competition that has a better design to address this will win. A multi-database design addresses the shortcomings by separating data across multiple databases.

A) Data can be logically partitioned. For example, user authentication and profile information can reside on one database while other application data like say catalog, reviews and feedbacks can reside in a different database. That way authentication connections are routed to that specific server while the other servers can utilise connections fully to provision data.



B) Another approach is to partition based on read and write operations. All writes will go to one database. This database will be periodically synced to another read-only database. This helps when there are regular writes to the db but, the read density is too high. Multiple read-only slaves which mirror the primary can be used as needed.



C) A third approach is to use dedicated databases for different regions/locations. This is useful when usage varies across locations. Each database can be configured differently to handle its own access/load patterns. For example, utilise separate databases to store data from each state, New South Wales, Victoria, WA and so on.



D) To partition all data with redundancy, it would be better to use a database that has sharding built into its design. MongoDB is a good choice. This is described in a previous post.

Multiple databases mean more effort in development, operations, support and maintenance. So tangible performance benefits must justify their use.