26 May

Data security in multi-tenant SaaS applications

Data is the core of SaaS. Having shipped two SaaS products to production in the last four years, I feel that developing a SaaS application requires extra safety measures for data security compared to developing a general-purpose application such as a chat app or a client-based solution. Since resources in a SaaS application are shared among clients, preventing data leakage between clients is a real challenge. In this post I will explain what measures should be taken to ensure data security in a SaaS application.

Choosing your database structure is an important question, and it should be decided after carefully going through your requirements: what type of clients you will have (enterprise or small business), how large your application is, and so on. The database structure of a SaaS application directly affects both security and performance. Normally you have three choices for structuring your database for SaaS:

  1. Single Tenant (Each client has a physically separate database)
  2. Multi-Tenant (A single database and a single schema for all clients, where every client's data lives in the same shared tables)
  3. Multi-Tenant with multi-schema (A single database for all clients, but each client has its own separate schema with an identical structure)

Single Tenant gives you the strongest data security, since the data of each client is physically separated, but it is not a cost-efficient solution: each client has to pay the cost of a fully managed database, at the application level you have to maintain connections to each client's database separately, and managing backups across all tenants is a real challenge. It can be a good choice when you have banks or enterprise-level clients who demand that their data be kept on a physically different server.

I was hugely in favour of Multi-Tenant with multiple schemas when I was developing SaaS, because it offers the same kind of data separation among tenants as Single Tenant while using a single database, and it is a cost-efficient solution. But there were four reasons I didn't choose Multi-Tenant with multiple schemas:

  • Our application has more than 60 tables, so having N clients means the database will have N*60 tables, and sooner or later this count will reach the maximum number of tables the database can handle
  • Managing backups would have been a real challenge in production
  • It becomes difficult to maintain an identical schema structure as your application grows, and broken migrations cause huge pain (I have experienced these bugs in production)
  • In practice you have to switch schema on each request you make to the database

So Multi-Tenant with multiple schemas is not a good choice for an application with a large number of tables and a large number of clients. I would still recommend it if your application has a small number of tables and a limited number of clients, and data security is your priority.

At first look, Multi-Tenant isn't a good fit if data security is your main concern, since the data of all clients is shared, but you can take safety measures at the application level to make use of this cost-effective and easy-to-manage solution. Since all clients use the same database, the same schema and the same tables, making sure that each client can access, update and delete only its own data becomes a high-priority challenge. Many times you are working directly with incoming parameters, which can easily be manipulated to tamper with the records of other clients/tenants if they are not validated properly. There are two widely known approaches to correctly identifying the tenant for each HTTP request:

  1. Sub-domain: many SaaS applications give each client its own sub-domain. If your application is hosted at www.application.com, then a client named client1 will have client1.application.com, and you can use the sub-domain to know which client an HTTP request belongs to and query the database for that particular client only
  2. Session: maintain the tenant in the user's session
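
To make the sub-domain approach concrete, here is a minimal Ruby sketch of pulling the tenant name out of the request host. The base domain is taken from the example above; the helper name tenant_from_host and the reserved sub-domain list are assumptions made for illustration and are not part of any library.

```ruby
# Assumed values for this sketch only.
BASE_DOMAIN = "application.com"
RESERVED_SUBDOMAINS = %w[www].freeze

# Returns the tenant sub-domain for a host such as "client1.application.com",
# or nil when the host does not identify exactly one tenant.
def tenant_from_host(host)
  host = host.to_s.downcase.split(":").first.to_s # drop an optional port
  suffix = ".#{BASE_DOMAIN}"
  return nil unless host.end_with?(suffix)

  subdomain = host[0...-suffix.length]
  # Reject empty, nested ("a.b") and reserved sub-domains.
  return nil if subdomain.empty? || subdomain.include?(".")
  return nil if RESERVED_SUBDOMAINS.include?(subdomain)

  subdomain
end
```

Returning nil for anything unexpected lets the caller reject the request instead of silently falling back to some default tenant.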

After successfully identifying the tenant, the next challenge is to shield your database logically so that every query is restricted to that tenant. At the ground level you have to add a tenant condition to the WHERE clause every time you query the database. At first look it seems pretty straightforward, but any negligence here would expose the data of one tenant to the other tenants. I would never recommend appending the condition to the WHERE clause manually; it should be done the right way, either by using a library such as acts_as_tenant or by writing a function in your database class which automatically appends the tenant WHERE clause every time you query for data.
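
As a rough illustration of the "function in your database class" idea, the sketch below builds every SELECT through one helper that always puts the tenant condition first. The class name TenantScope and the tenant_id column are assumptions for this example; it is not how acts_as_tenant is implemented, and it only builds the SQL string and bind values rather than talking to a real database.

```ruby
# Hypothetical helper: every query goes through it, so the tenant
# condition cannot be forgotten.
class TenantScope
  def initialize(tenant_id)
    raise ArgumentError, "tenant_id is required" if tenant_id.nil?

    @tenant_id = tenant_id
  end

  # Returns [sql, binds]. Values are passed as bind parameters and are
  # never interpolated; table and column names must come from your own
  # code, not from user input.
  def select(table, conditions = {})
    clauses = ["tenant_id = ?"]
    binds = [@tenant_id]

    conditions.each do |column, value|
      # Callers may narrow the query but never change the tenant.
      raise ArgumentError, "tenant_id cannot be overridden" if column.to_s == "tenant_id"

      clauses << "#{column} = ?"
      binds << value
    end

    ["SELECT * FROM #{table} WHERE #{clauses.join(' AND ')}", binds]
  end
end
```

With this shape, TenantScope.new(7).select("invoices", status: "paid") yields a query restricted to tenant 7, and an attempt to pass tenant_id in the conditions raises instead of widening the query.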

External Storage

Apart from your primary database, it is quite common to use external storage such as Redis, or at times to depend on external services like Firebase or Amazon S3. The same principle applies here too: each tenant can access, update and delete only its own data, and there should be no way to reverse engineer or tamper with the data in these external stores and services such that it affects other tenants.
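
One common way to apply this principle to a key-value store is to prefix every key with the tenant. The sketch below uses a plain Ruby Hash as a stand-in for the store so it runs without any gem; the class name TenantKeyspace and the "tenant:<id>:" key layout are assumptions chosen for illustration.

```ruby
# Hypothetical wrapper that namespaces every key by tenant.
class TenantKeyspace
  # Restricting the id format keeps a crafted tenant id from
  # producing a prefix that overlaps another tenant's keys.
  VALID_TENANT_ID = /\A[A-Za-z0-9_-]+\z/

  def initialize(store, tenant_id)
    unless tenant_id.to_s.match?(VALID_TENANT_ID)
      raise ArgumentError, "invalid tenant id"
    end

    @store = store
    @prefix = "tenant:#{tenant_id}:"
  end

  def set(key, value)
    @store[namespaced(key)] = value
  end

  def get(key)
    @store[namespaced(key)]
  end

  private

  def namespaced(key)
    "#{@prefix}#{key}"
  end
end
```

Two tenants can then use the same logical key, for example "settings", without ever reading or overwriting each other's value.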

Feel free to contact me at @alihaider907 about this article or building SaaS products.

Thanks

20 Jun

Rails faker gem customised for Pakistani locale

Ever since I started working with open source I have wished to contribute to the open source community, and thank GOD, today I have submitted my first ever pull request to the Faker gem. I have added a customised locale for Pakistani names, provinces, telephone numbers, postal codes, etc.
It has been successfully merged into the original repository. I am the happiest person at the moment YAHOOOOOOO

12 Nov

Everyday terminal commands for Backend Developers

Today I am going to share the most commonly used terminal commands for Backend Developers. I am a Backend developer and have been using these commands on a daily basis during development. I hope you will find some of them handy.

grep

grep is one of the most commonly used terminal commands. It can be used in multiple ways: to filter live logs, to filter text within a file or files, or to filter processes. The following are my daily uses of the grep command.

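filter words from a live log (the quoted pattern matches the whole phrase as written; use grep -E 'word1|word2|word3' to match any one of the words)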

tail -f /path/to/log/file | grep 'word1 word2 word3'

filter process

ps aux | grep process_name

filter previous commands

history | grep command

grep can also be used to find a string within the files in a folder. Today you can search for a string within a folder with your favorite text editor as well, but you should also know this power of grep.

grep -rnw 'path_to_folder' -e "string_to_search"

For a detailed understanding of grep string search within a folder please read this answer

Kill process

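kill a process forcefully (kill -9 sends SIGKILL, which the process cannot catch, so it gets no chance to clean up; try plain kill PID first, which sends SIGTERM and lets the process shut down gracefully)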

kill -9 PID

Ctrl + R

Ctrl + R is not a command, but a very handy reverse search keyboard shortcut to filter commands you have previously typed in the terminal. Keep this shortcut in your bag; it might save you precious time while you work on production.

chmod

Permissions are a very broad topic in Linux and Unix operating systems, but you must have adequate know-how of chmod to set the permissions of files and folders.
The basic syntax of chmod is

chmod permissions file_or_folder_path

The permissions argument of the chmod command requires special attention. It is a three-digit octal number: the first digit represents the user (owner), the second digit represents the group and the third digit represents others. Each of the three digits is the sum of the read, write and execute values, where

4 stands for read
2 stands for write
1 stands for execute
0 stands for no permission

So permission 655 means the user can read and write but cannot execute (4+2+0), the group can read and execute but cannot write (4+0+1), and others can read and execute but cannot write (4+0+1).
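
The arithmetic above can be checked with a few lines of Ruby. This is only a sketch; the function name describe_mode is made up for illustration. It turns a three-digit mode into the rwx form that ls -l prints.

```ruby
# Converts a three-digit octal mode such as "655" into "rw-r-xr-x".
def describe_mode(mode)
  mode = mode.to_s
  raise ArgumentError, "expected three octal digits" unless mode.match?(/\A[0-7]{3}\z/)

  mode.chars.map do |digit|
    bits = digit.to_i
    # 4 = read, 2 = write, 1 = execute
    read    = (bits & 4).zero? ? "-" : "r"
    write   = (bits & 2).zero? ? "-" : "w"
    execute = (bits & 1).zero? ? "-" : "x"
    read + write + execute
  end.join
end

puts describe_mode("655") # prints rw-r-xr-x
```

The three groups of letters correspond to user, group and others, in that order.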

ssh-copy-id

If you constantly need to switch between your local machine and a server, run ssh-copy-id once so that you can ssh to the remote server again and again without a password (it installs your public key in the remote user's authorized_keys file; you can do the same by adding your ssh public key to the remote server manually)

ssh-copy-id username@remote_ip

apt-cache search

You may have used apt-get install a lot, but apt-cache search lets you search for apt packages

apt-cache search package_to_search

20 Sep

Let’s debug nginx, unicorn errors

This tutorial is particularly intended for an nginx, unicorn and rails environment. But you can replace unicorn with any Rack web server (e.g. puma, thin, passenger) which runs behind nginx, since they commonly communicate with nginx through sock files, and these sock files are most of the time the root cause of errors.

Hold a mug of coffee/tea and let’s debug your configurations.

Before digging into configurations, make sure your nginx and unicorn are running properly. For nginx, run the following command and check whether the nginx process is running

ps aux | grep nginx

For unicorn

ps aux | grep unicorn

If either nginx or unicorn is not running, start it and check whether this was all you needed.

Let's now go through the errors

502 Bad Gateway

One of the most common problems in unicorn nginx configurations is 502 bad gateway. The following are possible reasons for a 502 bad gateway

Sock file path

The root cause of a 502 bad gateway is that nginx and unicorn are not communicating through a shared socket, which means nginx cannot find the sock file on which unicorn is listening. Check your nginx configuration

...
upstream unicorn_server { 
server unix:/path/to/your/unicorn.sock; 
}
...

The sock file path in the upstream block should exactly match the listen sock file path in your unicorn conf file

....
listen '/path/to/your/unicorn.sock', :backlog => 64
....

If the two paths differ in your configuration, make them the same, restart your nginx and unicorn, then check whether the error is gone.
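
If you would rather check this with a script than by eye, the following Ruby sketch compares the two paths. The function names are made up for illustration, and the regular expressions only cover the simple forms shown above (a server unix:... line in nginx and a quoted absolute path after listen in unicorn), so treat it as a starting point rather than a full config parser.

```ruby
# Socket paths from nginx "server unix:/path;" lines.
def nginx_socket_paths(conf_text)
  conf_text.scan(/^\s*server\s+unix:([^;\s]+)/).flatten
end

# Socket paths from unicorn "listen '/path'" lines (TCP ports are ignored).
def unicorn_socket_paths(conf_text)
  conf_text.scan(/^\s*listen\s+['"]([^'"]+)['"]/)
           .flatten
           .select { |path| path.start_with?("/") }
end

# True when at least one socket path appears in both configs.
def sockets_match?(nginx_conf, unicorn_conf)
  shared = nginx_socket_paths(nginx_conf) & unicorn_socket_paths(unicorn_conf)
  !shared.empty?
end
```

You would call it with the contents of both files, for example sockets_match?(File.read(nginx_conf_path), File.read(unicorn_conf_path)), where both path variables are placeholders for your own setup.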

Buffer Size

The nginx buffer size could be another reason for a bad gateway. Open your nginx log with tail and check whether it's a buffer size issue

tail -f /var/log/nginx/error.log

Reload your home page and see if you get

upstream sent too big header while reading response header from upstream

in your nginx log. If yes, then open your nginx conf /etc/nginx/nginx.conf (default path) and add the following to the http block

proxy_buffer_size   128k;
proxy_buffers   4 256k;
proxy_busy_buffers_size   256k;

Restart nginx, reload the page and check whether the error is gone

Permission Denied

If the response you are getting reports a permission denied error, then it is probably a sock file permission issue. The root cause of this error is that nginx cannot access unicorn’s sock file (e.g. when your unicorn sock file is owned by root or by a user with higher permissions than the nginx user)

This can be solved either by changing the permissions of the sock file so that nginx can read it, or by increasing the permissions of nginx so that it can read it (but this is a bad way). The best way is to create your sock file inside the /tmp directory and point nginx to the sock file there (if you are on Fedora then the sock file should be in /var/run/)

Restart nginx and unicorn and check

If you are on CentOS then you can be a victim of nginx running under the SELinux context httpd_t or unconfined_t; follow Nginx + Rails + Unicorn Permission Error: ‘sudo nginx’ vs ‘sudo service nginx start’ for details.

If you are still facing the same error, please comment below with your nginx and unicorn logs. I will surely reply at the earliest.

my mug is finished … Happy Deployment 😉