20 Oct 2018

AWS S3 file upload from client side

Last week I pushed a new feature to production which involved file upload to AWS S3. When you are uploading files to S3 from the frontend there is always a risk of exposing your AWS secrets to the user, so you have the following options to avoid this risk:

  1. Involve the server as middleware and upload via a server API; you will have more control over it
  2. Allow users to upload directly to S3 anonymously

I don’t like either of the above methods. With the middleware approach you upload the file twice (frontend → server, server → S3 bucket), and I don’t really want the server to do the heavy lifting of uploading files to S3 on the user’s behalf. With anonymous uploads you are giving a blank check to your S3 bucket, i.e. anyone in the world can upload to it, which was also not acceptable to me as it could have very severe consequences. I wanted a bit more control over uploads without compromising on security, so I used AWS Security Token Service (STS) to generate secrets (access_key, access_secret) which live temporarily and have only the limited access you define.

Configure an IAM user on the AWS console to generate temporary secrets

In order to generate temporary secrets you need the following configuration on the AWS console:

  1. create a new IAM user
  2. create a policy to allow uploads to your S3 bucket
  3. create a role for your IAM user to assume via the Security Token Service (both policy documents are sketched below)

Watch this video for configurations
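
For reference, here is a rough sketch of the two policy documents behind steps 2 and 3 above; the bucket name, account id and user name are placeholders, and in the console you paste them as JSON (the Ruby hashes below are only there to show their shape).

require 'json'

# Step 2: policy attached to the role, allowing uploads (and setting an ACL)
# on a single bucket. 'my-upload-bucket' is a placeholder.
upload_policy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Action: ['s3:PutObject', 's3:PutObjectAcl'],
    Resource: 'arn:aws:s3:::my-upload-bucket/*'
  }]
}

# Step 3: trust policy on the role, allowing only the IAM user from step 1
# to assume it via STS. Account id and user name are placeholders.
trust_policy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: { AWS: 'arn:aws:iam::123456789012:user/s3-upload-user' },
    Action: 'sts:AssumeRole'
  }]
}

puts upload_policy.to_json
puts trust_policy.to_json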

Now you can generate temporary AWS STS credentials with this Ruby code:

require 'aws-sdk-sts' # AWS SDK for Ruby v3

aws_sts = Aws::STS::Client.new(
  region: ENV['AWS_REGION'], # standard AWS region environment variable
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
).assume_role(
  role_arn: ENV['AWS_ROLE_ARN'],
  role_session_name: 'session_name',
  duration_seconds: 12 * 60 * 60 # 12 hours; the role's maximum session duration must allow this
)

# credentials holds the temporary access_key_id, secret_access_key and session_token
aws_sts.credentials

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the credentials you download after creating the user, and AWS_ROLE_ARN is the value I copied at the end of the video.
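
Presumably the frontend needs to fetch these temporary credentials from the server before it can upload. A hypothetical sketch of such an endpoint is below; the controller, action and session names are illustrative, not part of the original app.

# Hypothetical Rails endpoint that returns the temporary credentials to the browser.
class S3CredentialsController < ApplicationController
  def show
    creds = sts_credentials
    render json: {
      access_key_id: creds.access_key_id,
      secret_access_key: creds.secret_access_key,
      session_token: creds.session_token,
      expiration: creds.expiration
    }
  end

  private

  # Same assume_role call as in the snippet above.
  def sts_credentials
    Aws::STS::Client.new(
      region: ENV['AWS_REGION'],
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    ).assume_role(
      role_arn: ENV['AWS_ROLE_ARN'],
      role_session_name: 'browser-upload',
      duration_seconds: 12 * 60 * 60
    ).credentials
  end
end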

Enable ACL when uploading to S3

You will be able to successfully upload files with the secrets generated by the above code. Since we used a separate user to generate the STS credentials, only this user will be able to access the file. To make the file accessible to other users as well you must send ACL: 'public-read' in the request body or headers while uploading the file to S3. You can read more about ACLs in the S3 documentation.
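
As a minimal sketch, this is the same upload done with the Ruby SDK and the temporary credentials from the STS snippet above; in my case the browser sends the equivalent request, and the bucket name and object key below are placeholders.

require 'aws-sdk-s3'

# Temporary credentials returned by the assume_role call above
creds = aws_sts.credentials

s3 = Aws::S3::Client.new(
  region: ENV['AWS_REGION'],
  access_key_id: creds.access_key_id,
  secret_access_key: creds.secret_access_key,
  session_token: creds.session_token
)

s3.put_object(
  bucket: 'my-upload-bucket',    # placeholder bucket name
  key: 'uploads/avatar.png',     # placeholder object key
  body: File.open('avatar.png'),
  acl: 'public-read'             # without this, only the STS user can read the object
)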

I hope this helps you improve your file uploads as well. If you need any help please do ping me on Twitter @alihaider907.

26 May 2017

Data security in multi-tenant SaaS applications

Data is the core of SaaS, and having shipped two SaaS products to production in the last four years I feel that developing a SaaS application requires extra safety measures around data security compared to developing a general purpose application such as a chat app or a client-based solution. Since in a SaaS application resources are shared among clients, data security in terms of data leakage between clients is a real challenge. In this post I explain what measures should be taken to ensure data security in a SaaS application.

Choosing which database structure you will go with is an important question, and it should be decided after carefully going through your requirements (what type of clients you will have, enterprise or small business, how large your application is, etc.), because the database structure in a SaaS application directly impacts security as well as performance. Normally you have three choices to structure your database for SaaS:

  1. Single Tenant (each client has a physically separate database)
  2. Multi-Tenant (a single database for all clients, where all clients’ data lives in the same tables)
  3. Multi-Tenant with multi-schema (a single database for all clients, but each client has a separate, homogeneous schema within that database)

A Single Tenant database for each client gives you the perfect solution to data security, as each client’s data is physically separated, but it is not a cost-efficient solution: each client has to pay the cost of a fully managed database, at the application level you have to maintain connections to each client’s database separately, and managing backups across all tenants is a real undertaking. It can be a good choice when you have banks or enterprise-level clients demanding that their data live on a physically separate server.

I was hugely in favour of Multi-Tenant with multiple schemas when I was developing SaaS, because it offers the same kind of data separation between tenants as Single Tenant while staying on a single database, and it is a cost-efficient solution. But there were four reasons I didn’t choose Multi-Tenant with multiple schemas:

  • Our application has more than 60 tables, so having N clients means the database will have N*60 tables, and sooner or later this will hit the maximum number of tables a database can handle
  • Managing backups would have been a real challenge in production
  • It becomes difficult to maintain a homogeneous schema structure as your application grows; broken migrations cause huge pain (I have practically experienced these bugs on production)
  • In practice you have to switch the schema with each request you make to the database (see the sketch right after this list)
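
To make that last point concrete, here is a rough sketch of per-request schema switching on PostgreSQL; the around_action, the Tenant model and the column names are illustrative, not a complete implementation.

class ApplicationController < ActionController::Base
  around_action :switch_tenant_schema

  private

  def switch_tenant_schema
    # Look up the tenant's schema from the sub-domain (placeholder lookup).
    schema = Tenant.find_by!(subdomain: request.subdomain).schema_name
    ActiveRecord::Base.connection.schema_search_path = schema
    yield
  ensure
    # Always reset, so a pooled connection never leaks one tenant's schema
    # into another tenant's request.
    ActiveRecord::Base.connection.schema_search_path = 'public'
  end
end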

So Multi-Tenant with multiple schemas is not a good choice for an application with a large number of tables and a large number of clients; still, I would recommend it if your application has a small number of tables, a measured number of clients, and data security is your priority.

At first look Multi-Tenant isn’t a good fit if data security is your main concern, since the data of all clients is shared, but you can take safety measures at the application level to make use of this cost-effective and easy-to-manage solution. Since all clients use the same database, the same schema and the same tables, making sure that each client can access, update and delete only its own data becomes a high-priority challenge: you are often working directly with incoming parameters, which can easily be manipulated to reach records of other clients/tenants if not validated properly. There are two widely known approaches to correctly identifying the tenant for each HTTP request:

  1. Sub-domain: many SaaS applications use sub-domains; if your application is hosted at www.application.com then a client named client1 will have client1.application.com, and from the sub-domain you can easily tell which client an HTTP request belongs to and query the database for that client only
  2. Maintain the tenant in the session (both approaches are sketched below)
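
As a rough sketch (assuming a Rails app; the Tenant model and column names are placeholders), both approaches boil down to something like this:

class ApplicationController < ActionController::Base
  before_action :set_current_tenant

  private

  def set_current_tenant
    # 1. Try the sub-domain (client1.application.com -> "client1"),
    # 2. fall back to the tenant stored in the session.
    @current_tenant =
      Tenant.find_by(subdomain: request.subdomain) ||
      Tenant.find_by(id: session[:tenant_id])

    # Refuse to serve the request if the tenant cannot be identified.
    raise ActionController::RoutingError, 'Unknown tenant' if @current_tenant.nil?
  end
end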

After successfully identifying the tenant, the next challenge is to make sure you shield your database logically so that every query is reduced to that tenant. At the ground level you have to add a condition to the WHERE clause every time you query the database. At first look it seems pretty straightforward, but any negligence here will expose the data of one tenant to the other tenants. I would never recommend appending the WHERE condition manually; this should be done the right way, either by using a library such as acts_as_tenant or by writing a function in your database class which automatically appends this tenant WHERE clause every time you query for data.
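
As a sketch of the library approach mentioned above, this is roughly how the acts_as_tenant gem is wired up (the Tenant and Invoice model names are placeholders):

class ApplicationController < ActionController::Base
  # From the acts_as_tenant gem: resolve the tenant from the sub-domain
  # and set it for the current request.
  set_current_tenant_by_subdomain(:tenant, :subdomain)
end

class Invoice < ApplicationRecord
  # Requires a tenant_id column; every Invoice query is now automatically
  # scoped to the current tenant, so the WHERE condition can never be forgotten.
  acts_as_tenant(:tenant)
end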

External Storage. Apart from your primary database it is quite common to use external storage such as Redis, or even to depend on external services like Firebase or Amazon S3. The same principle, that each tenant can access, update and delete only its own data, applies here too, and there should be no way to reverse engineer and manipulate the data held in these external storage systems and services in a way that affects other tenants.
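
For example, a minimal sketch of namespacing Redis keys per tenant (the key name and tenant id are placeholders):

require 'redis'
require 'json'

redis = Redis.new

# Prefix every key with the tenant id so one tenant can never read or
# overwrite another tenant's data.
def tenant_key(tenant_id, key)
  "tenant:#{tenant_id}:#{key}"
end

redis.set(tenant_key(42, 'dashboard_cache'), { widgets: 3 }.to_json)
cached = JSON.parse(redis.get(tenant_key(42, 'dashboard_cache')))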

Feel free to contact me at @alihaider907 about this article or building SaaS products.

Thanks