26 May 2017

Data security in multi-tenant SaaS applications

Data is the core of SaaS. Having shipped two SaaS products to production in the last four years, I feel that developing a SaaS application requires extra safety measures for data security compared to developing a general-purpose application such as a chat app or a client-based solution. Because resources in a SaaS application are shared among clients, preventing data leakage between clients is a real challenge. In this post I will explain which measures should be taken to ensure data security in a SaaS application.

Choosing a database structure is an important decision, and it should be made after carefully going through your requirements: what type of clients you will have (enterprise or small business), how large your application is, and so on. The database structure of a SaaS application directly affects both security and performance. Normally you have three choices for structuring your database for SaaS:

  1. Single Tenant (each client has a physically separate database)
  2. Multi-Tenant (a single database and a single schema for all clients, where every client's data lives in the same shared tables)
  3. Multi-Tenant with multi-schema (a single database for all clients, where each client has its own schema and all schemas share the same structure)

Single Tenant gives you the strongest data security, because each client's data is physically separated. It is not cost efficient, though: each client has to bear the cost of a fully managed database, the application has to maintain a separate connection to each client's database, and managing backups across all tenants is a real chore. It can be a good choice when you have banks or enterprise-level clients who demand that their data be kept on a physically separate server.

I was hugely in favour of Multi-Tenant with multiple schemas when I was developing SaaS, because it offers the same kind of data separation among tenants as Single Tenant while using a single database, which makes it cost efficient. In the end, though, there were four reasons I didn't choose it:

  • Our application has more than 60 tables, so having N clients means the database will have at least N × 60 tables, and sooner or later that count will run into the maximum number of tables the database can handle
  • Managing backups would have been a real pain in production
  • It becomes difficult to keep the schema structure homogeneous as your application grows, and a broken migration causes huge pain (I have experienced these bugs in production first-hand)
  • In practice, you have to switch schema on every request you make to the database

So Multi-Tenant with multiple schemas is not a good choice for an application with a large number of tables and a large number of clients. I would still recommend it if your application has a small number of tables and a limited number of clients, and data security is your priority.

At first look, Multi-Tenant isn’t a good fit if data security is your main concern, since all clients share the same data store. You can, however, take safety measures at the application level to make use of this cost-effective and easy-to-manage solution. Because all clients use the same database, the same schema and the same tables, making sure that each client can access, update and delete only its own data becomes a high-priority challenge. Much of the time you are working directly with incoming request parameters, and if these are not checked properly they can easily be manipulated to read or modify the records of other clients/tenants. There are two widely known approaches to correctly identifying the tenant for each HTTP request:

  1. Sub-domain: many SaaS applications give each client its own sub-domain. If your application is hosted at www.application.com, then a client named client1 will have client1.application.com, and you can use the sub-domain to know which client an HTTP request belongs to and query the database for that client only
  2. Session: store the tenant in the user's session and read it from there on each request
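To make the sub-domain approach concrete, here is a minimal sketch in Python. The function name `tenant_from_host`, the `RESERVED` set and the choice to reject nested sub-domains are my assumptions for illustration; only the application.com example comes from above. It takes the value of the HTTP Host header and returns the tenant name, or None when the host does not belong to a tenant.

```python
from typing import Optional

BASE_DOMAIN = "application.com"  # base domain from the example above
RESERVED = {"www"}  # assumed: sub-domains that never map to a tenant


def tenant_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the tenant sub-domain for a Host header value, or None."""
    # Drop an optional ":port", normalise case, ignore a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None  # bare domain or a foreign host
    subdomain = hostname[: -len(suffix)]
    # Reject empty, nested ("a.b") and reserved sub-domains.
    if not subdomain or "." in subdomain or subdomain in RESERVED:
        return None
    return subdomain
```

The returned name should still be looked up in your own list of tenants before it is trusted, because anyone can send a request with an arbitrary Host header.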

After successfully identifying the tenant, the next challenge is to shield your database logically so that every query is restricted to that tenant's data. At the ground level, you have to add a tenant condition to the WHERE clause every time you query the database. At first look this seems pretty straightforward, but any negligence here would expose one tenant's data to other tenants. I would never recommend appending the condition to the WHERE clause manually. It should be done the right way, either by using a library such as acts_as_tenant or by writing a function in your database class that automatically appends the tenant's WHERE clause every time you query for data.
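Here is a rough sketch of the second option, a small database class that adds the tenant condition for you. It is written in Python with the standard sqlite3 module purely for illustration; the class `TenantScopedTable`, the `tenant_id` column name and the `invoices` demo table are all hypothetical, and this is not how acts_as_tenant itself is implemented.

```python
import sqlite3


class TenantScopedTable:
    """Hypothetical helper: every query it builds is filtered by tenant_id."""

    def __init__(self, conn: sqlite3.Connection, table: str, tenant_id: int):
        # The table name must come from application code, never from a request.
        if not table.isidentifier():
            raise ValueError(f"invalid table name: {table!r}")
        self.conn = conn
        self.table = table
        self.tenant_id = tenant_id

    def all(self):
        sql = f"SELECT * FROM {self.table} WHERE tenant_id = ? ORDER BY id"
        return self.conn.execute(sql, (self.tenant_id,)).fetchall()

    def find(self, record_id: int):
        sql = f"SELECT * FROM {self.table} WHERE tenant_id = ? AND id = ?"
        return self.conn.execute(sql, (self.tenant_id, record_id)).fetchone()

    def delete(self, record_id: int) -> int:
        # Returns the number of deleted rows (0 if the row is another tenant's).
        sql = f"DELETE FROM {self.table} WHERE tenant_id = ? AND id = ?"
        return self.conn.execute(sql, (self.tenant_id, record_id)).rowcount


def make_demo_connection() -> sqlite3.Connection:
    """In-memory database with one invoice for tenant 1 and one for tenant 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO invoices (id, tenant_id, title) VALUES (?, ?, ?)",
        [(1, 1, "Invoice A"), (2, 2, "Invoice B")],
    )
    return conn
```

With the demo data, `TenantScopedTable(conn, "invoices", 1).find(2)` returns None even though row 2 exists, because that row belongs to tenant 2. The record id can come straight from a request parameter, while the tenant id comes only from the sub-domain or session.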

External Storage

Apart from your primary database, it is quite common to use external storage such as Redis, and at times you depend on external services like Firebase or Amazon S3. The same principle applies here too: each tenant can access, update and delete only its own data. There should be no way to reverse engineer or manipulate the data held in these external stores and services such that it affects other tenants.
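One common way to apply this principle to a key-value store or an object store is to put every key under a per-tenant prefix that the application builds itself. The sketch below is only an illustration of that idea: the `tenant:<id>:` key layout, the allowed tenant-id pattern and the function names are my assumptions, and it uses plain strings rather than any real Redis or S3 client.

```python
import re

# Assumed tenant-id format: lowercase letters, digits and hyphens only,
# so an id can never contain the ":" delimiter used in the prefix.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def tenant_prefix(tenant_id: str) -> str:
    """Build the fixed key prefix for a tenant, validating the id first."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant:{tenant_id}:"


def tenant_key(tenant_id: str, key: str) -> str:
    """Namespace a key so it can only ever refer to this tenant's data."""
    if not key:
        raise ValueError("key must not be empty")
    return tenant_prefix(tenant_id) + key


def belongs_to(tenant_id: str, full_key: str) -> bool:
    """Check that a stored key sits under the given tenant's prefix."""
    return full_key.startswith(tenant_prefix(tenant_id))
```

For example, `tenant_key("client1", "session:42")` gives `tenant:client1:session:42`. Because the delimiter follows the id, keys of client10 never fall under the prefix of client1.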

Feel free to contact me at @alihaider907 about this article or building SaaS products.

Thanks