Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM
Architecture design is a must while developing a SaaS application to ensure its scalability and optimising infrastructure costs. In this blog, Lets discuss the implementation of one such architecture with Quarkus java framework and Hibernate ORM
By Vishal Bihani , CoffeeBeans Consulting
Suppose you have a great product idea, you started working on it and developed a cool SaaS product. You get new tenants to use your product and they liked it. In a nutshell, your product is a hit.
Everything seems to be going well until you realize you are bleeding money with every increasing tenant. You want to scale up but you are unable to do so because you have already exhausted all your cash.
You try to figure out what is going wrong and eventually look into your application infrastructure costs.
You understood that every time a new tenant onboards your application you deploy a new server, database, etc. for that tenant. Separate servers and/or databases are only useful for computing or data-intensive applications but yours is neither of that.
In such scenarios, multi-tenant architecture can be very useful in designing applications with shared resources. Multiple combinations are possible some of which are:
1. Separate servers and shared database
2. Shared server and separate databases
3. Shared server and shared database
How should I decide which resource to keep separate and which to be shared?
The decision should be made on each resource after looking at its constraints for scaling up. It’s good if each of your resources scales individually based on the utilization rather than complete application scaling up and remaining underutilized.
There are a few things one should keep in mind while deciding.
For servers:
If your business logic is very customized for each tenant then shared servers are not an optimal solution.
OR
If your application is compute-intensive or each tenant is limited by their respective compute then separate servers are a good solution.
For databases:
Generally, it’s a trade-off between data isolation and resource utilization. Development complexity and scaling constraints are also key decision factors.
A separate database provides tenant-specific customization freedom and ease in monitoring per tenant usage and billing them for backup and disaster recovery.
Multi-tenant database patterns
There are three types of patterns:
1. Database per tenant
2. Schema per tenant
3. Row per tenant
Database per tenant
Separated databases will be provisioned for each tenant. It achieves maximum data isolation and will give you freedom for high customization and tenant-specific operations to the database.
The development complexity and cost of this approach are high.
Schema per tenant
Separate schemas are created for each tenant. It achieves data isolation at schema level.
High customization and tenant-specific operations like backup, monitoring, and encryption are not supported in many databases.
The development complexity is high but the cost is low.
Row per tenant
In this approach, all the tenants’ data are stored in the same table. Each row is identified by the tenant identifier. Some databases like PostgreSQL support RLS (Row-Level Security) where the rows are implicitly filtered based on the tenant and there is no need to explicitly write the WHERE clause with the tenant identifier.
There is no data isolation and more care has to be put into writing database queries at the application level.
The development complexity and cost are low.
In this blog, we will discuss the implementation of the shared server and schema-based shared database architecture.
What we will build?
We will build an app that will store the information of the users of the tenants.
Each tenant will have its separate schema which will store its users’ details. A public schema that will be accessed by every tenant will contain the pricing detail of the product.
Database pattern
Tables
Implementation
One important factor while approaching this solution is whether or not frameworks or tools are available that support this architecture and pattern.
For the implementation part, we will be using the Quarkus framework to deploy the REST APIs and perform the business logic.
The Quarkus is an open-source java framework developed by Red Hat which empowers the developers to build cloud-native and modern java applications. You can learn more about Quarkus here.
PostgreSQL will be used as the database. It is an open-source SQL database.
Hibernate-ORM supports schema-based multi-tenancy.
Let’s start
Go to https://code.quarkus.io/ and fill in the application configuration.
Add the following extensions
Open in an IDE and run the following command from the terminal:
mvn compile quarkus:dev
It will download all the dependencies and start the server with the dev profile activated. In dev, profile Quarkus provides the support for live reloading.
After the server has started, perform a GET request to the following URL:
http://localhost:8080/hello
It should return ‘Hello RESTEasy’
Create entities
Package information is important, all the entities of the public (shared) schema should reside in the same package. Similarly for the tenant schema. We will discuss this in detail later while discussing multiple persistence units.
Hibernate does not support database generation (create or update) in the multi-tenant mode so we need to create the databases ourselves.
Create pricing_details table on public schema
CREATE TABLE public.pricing_details
(
id integer,
product_name character varying NOT NULL,
price integer NOT NULL,
created_at date,
updated_at date,
PRIMARY KEY (id)
);
Create schema for Tenant A (GOOGL) and Tenant B (AAPL)
CREATE SCHEMA googl
AUTHORIZATION postgres;
CREATE SCHEMA aapl
AUTHORIZATION postgres;
Create user_details table on both schema
CREATE TABLE googl.user_details
(
id integer,
first_name character varying NOT NULL,
last_name character varying NOT NULL,
created_at date,
updated_at date,
PRIMARY KEY (id)
);CREATE TABLE aapl.user_details
(
id integer,
first_name character varying NOT NULL,
last_name character varying NOT NULL,
created_at date,
updated_at date,
PRIMARY KEY (id)
);
Insert values in the tables.
Create repositories for entities
Configuring application properties
write the following configuration properties in the application.properties
Only a single data source is configured because we have only one database to connect to.
A persistence unit defines a set of all entity classes that are managed by EntityManager instances in an application
We have separate persistence units for public schema (<default>) and tenant-specific schema (tenant)
The data source is set to <default> so that they connect to the default data source. Multi-tenant mode is set to SCHEMA
But we need to configure which persistence unit to manage which entities. For that, we need to create package-info.java files in the packages containing the entity files of public and tenant-specific files.
Okay, we configured which persistence should handle which entities but what about schema information?
Each tenant will have its own schema name then how will the persistence unit know which schema to go for. For this, we will create a Tenant Resolver which will resolve the schema name based on the incoming request.
We need to create a Tenant Resolver for all the persistence units.
We will implement the hibernate’s TenantResolver interface which provides a method to resolve schema names.
The @PersistenceUnitExtension takes the name of the persistence unit for which this TenantResolver implementation is.
@PersistenceUnitExtension is for the default persistence unit.
The @RequestScoped is for defining the lifecycle of this bean same to that of the request the application receives (Bean is created when the application receives the request and is destroyed when the request is completed). Once the request is completed it will be destroyed and will be recreated with every incoming request.
what about the below code?
public String resolveTenantId() {
String tenantName = context.request().getHeader("tenant");
return tenantName;
}
To keep the implementation simpler we will pass the tenant identifier as the header in the HTTP request with the key as “tenant” and the value as the identifier. The above code will retrieve that value and return it as the schema name for that tenant.
Now all the configuration is done but where are the APIs for fetching data?
‘/pricing/all’ endpoint will fetch the list of pricing details from the public schema and return the list.
‘/user/all’ endpoint requires a tenant identifier as a header and will fetch the data from that schema and will return the list of users.
Let’s insert some data into the tables
Let’s hit the APIs and see the results
We can use postman to check the APIs
Great! we finally achieved schema-based multi-tenancy for our application.
Important
For the simplicity of the implementation, we passed the tenant identifier in the header. It should not be passed in the header. A better approach will be to include it in the JWT tokens which can’t be modified and retrieve that by intercepting the HTTP request and storing the identifier in a TenantContext.
Now I hope you will apply this architecture to your SaaS product and save on your infrastructure costs.
Source code
You can find the source code here.
Pros
- Efficient use of resources.
- A better architecture for multi-tenancy compared to the row per tenant approach in terms of data isolation.
- Low infrastructure costs.
- Can easily scale.
Cons
- Development complexity is high.
- High customization per tenant will be difficult.
- Data backup for each tenant is not possible.
- Monitoring for each tenant may not be supported.
- Frameworks supporting this pattern may not be available in all languages.
Thanks for reading.