True cloud services all use some mode of multitenancy — the ability for multiple customers (tenants) to share
the same applications and/or compute resources. It is through multitenant architectures that cloud services achieve high cost
efficiencies and can deliver low costs.
Multitenant architectures must balance these cost benefits with the need for individual tenants to secure their data
and applications. Cloud finds that few application development and delivery (AD&D) pros understand how multitenant architectures
balance sharing with security, and many have other concerns as well.
Our definition: Multitenancy defines IT architectures that let multiple customers
(tenants) share the same applications and/or compute resources with security, reliability, and consistent performance.
Our research yielded three major findings
about multitenant architectures. These are:
Multitenant architectures must strike a balance between sharing and security. To deliver cost savings and scalability,
a multitenant architecture must be able to manage dynamic resource consumption by its tenants without violating their security.
These two goals ultimately conflict with one another, since shared resources and individual security rarely go hand in hand.
Two
common multitenant architecture models have arisen. Dedicated resource models stake boundaries within shared infrastructure,
defining the resources a tenant can access, allowing for tangible and secure walls but lower flexibility. Metadata map models
chart protected pathways to shared resources, allowing for increased flexibility, but they ultimately may feel less secure.
Despite
resource sharing, multitenancy will often improve security. Most current enterprise security models are perimeter-based, making
you vulnerable to inside attacks. Multitenant services secure all assets at all times, since those within the main perimeter
are all different clients. Leveraging a mix of dedicated resources and metadata map architectures, these services can deliver
stronger security.
Our recommendation: Know how cloud multitenancy works before using it (or dismissing it).
Multitenancy
is here to stay. Our research and analysis indicates that multitenancy is not a less secure model — quite the opposite!
But you should still understand the type of multitenancy used by any service you consume, the security responsibilities you
must take on, and whether any supplemental responsibilities are yours to ensure strong security, service reliability, and
good performance.
Trust,
or the lack thereof, is the number one factor blocking the adoption of software as a service (SaaS). A case could be made
that data is the most important asset of any business—data about products, customers, employees, suppliers, and more.
And data, of course, is at the heart of SaaS.
SaaS applications provide customers with centralized, network-based
access to data with less overhead than is possible when using a locally-installed application. But in order to take advantage
of the benefits of SaaS, an organization must surrender a level of control over its own data, trusting the SaaS vendor to
keep it safe and away from prying eyes.
To earn this trust, one of the highest priorities for a prospective SaaS
architect is creating a SaaS data architecture that is both robust and secure enough to satisfy tenants or clients who are
concerned about surrendering control of vital business data to a third party, while also being efficient and cost-effective
to administer and maintain.
This is the second article in our series about designing multi-tenant applications. The first article, Architecture Strategies for Catching the
Long Tail, introduced the SaaS
model at a high level and discussed its challenges and benefits. It is available on MSDN. Other articles in the series will
focus on topics such as workflow and user interface design, overall security, and others.
In this article, we'll look at the continuum
between isolated data and shared data, and identify three distinct approaches for creating data architectures that fall at
different places along the continuum. Next, we'll explore some of the technical and business factors to consider when deciding
which approach to use. Finally, we'll present design patterns for ensuring security, creating an extensible data model, and
scaling the data infrastructure.
Three Approaches to Managing Multi-Tenant Data
The distinction between shared data and isolated data isn't binary.
Instead, it's more of a continuum, with many variations that are possible between the two extremes.
Data architecture
is an area in which the optimal degree of isolation for a SaaS application can vary significantly depending on technical and
business considerations. Experienced data architects are used to considering a broad spectrum of choices when designing an
architecture to meet a specific set of challenges, and SaaS is certainly no exception. We shall examine three broad approaches,
each of which lies at a different location in the continuum between isolation and sharing.
Separate
Databases
Storing
tenant data in separate databases is the simplest approach to data isolation.
Figure 1. This approach uses a different database for each tenant
Below Video....See
super rich white guy, Mr M.B. that started Salesforce.com...he worked at Apple way back in the day...then moved onto Oracle.....now...yeah...he's
got money, dude.....how 'bout you? You "brave" or "engaged in the cloud?
Computing resources
and application code are generally shared between all the tenants on a server, but each tenant has its own set of data that
remains logically isolated from data that belongs to all other tenants. Metadata associates each database with the correct
tenant, and database security prevents any tenant from accidentally or maliciously accessing other tenants' data.
Giving each tenant its
own database makes it easy to extend the application's data model (discussed later) to meet tenants' individual needs, and
restoring a tenant's data from backups in the event of a failure is a relatively simple procedure. Unfortunately, this approach
tends to lead to higher costs for maintaining equipment and backing up tenant data.
Hardware costs are also higher than they are under alternative
approaches, as the number of tenants that can be housed on a given database server is limited by the number of databases that
the server can support. (Using autoclose to unload databases from memory when there are no active connections can make an
application more scalable by increasing the number of databases each server can support.)
Separating tenant data into individual databases
is the "premium" approach, and the relatively high hardware and maintenance requirements and costs make it appropriate
for customers that are willing to pay extra for added security and customizability. For example, customers in fields such
as banking or medical records management often have very strong data isolation requirements, and may not even consider an
application that does not supply each tenant with its own individual database.
Shared Database, Separate Schemas
Another approach involves
housing multiple tenants in the same database, with each tenant having its own set of tables that are grouped into a schema
created specifically for the tenant.
Figure 2. In this approach each tenant has its own separate set of tables in a common database
When a customer first
subscribes to the service, the provisioning subsystem creates a discrete set of tables for the tenant and associates it with
the tenant's own schema. You can use the SQL CREATE command to create a schema and authorize a user account to access it.
For example, in Microsoft SQL Server 2005:
CREATE SCHEMA ContosoSchema
AUTHORIZATION Contoso
The application can then create and access tables within the tenant's schema using the SchemaName.TableName
convention:
CREATE
TABLE ContosoSchema.Resumes (EmployeeID int identity primary key, Resume nvarchar(MAX))
After the schema is created, it is set as
the default schema for the tenant account:
ALTER USER Contoso WITH DEFAULT_SCHEMA = ContosoSchema
A tenant account can access tables within
its default schema by specifying just the table name, instead of using the SchemaName.TableName convention.
This way, a single set of SQL statements can be created for all tenants, which each tenant can use to access its own data:
SELECT
* FROM Resumes
Like the isolated approach, the separate-schema approach is relatively easy to implement, and tenants can extend
the data model as easily as with the separate-database approach. (Tables are created from a standard default set, but once
they are created they no longer need to conform to the default set, and tenants may add or modify columns and even tables
as desired.) This approach offers a moderate degree of logical data isolation for security-conscious tenants, though not as
much as a completely isolated system would, and can support a larger number of tenants per database server.
A significant drawback
of the separate-schema approach is that tenant data is harder to restore in the event of a failure. If each tenant has its
own database, restoring a single tenant's data means simply restoring the database from the most recent backup.
With a separate-schema
application, restoring the entire database would mean overwriting the data of every tenant on the same database with backup
data, regardless of whether each one has experienced any loss or not. Therefore, to restore a single customer's data, the
database administrator may have to restore the database to a temporary server, and then import the customer's tables into
the production server—a complicated and potentially time-consuming task.
The separate schema approach is appropriate for applications that
use a relatively small number of database tables, on the order of about 100 tables per tenant or fewer. This approach can
typically accommodate more tenants per server than the separate-database approach can, so you can offer the application at
a lower cost, as long as your customers will accept having their data co-located with that of other tenants.
Below..Mr MB...his intelligence level
is WAY higher than mine...yours too-2-too?
Shared
Database, Shared Schema
A third approach involves using the same database and the same set of tables to host multiple tenants' data.
A given table can include records from multiple tenants stored in any order; a Tenant ID column associates every record with
the appropriate tenant.
Figure 3. In this approach, all tenants share the same set of tables, and a Tenant ID associates each tenant with
the rows that it owns
Of the three approaches explained here, the shared schema approach has the lowest hardware and backup costs, because
it allows you to serve the largest number of tenants per database server. However, because multiple tenants share the same
database tables, this approach may incur additional development effort in the area of security, to ensure that tenants can
never access other tenants' data, even in the event of unexpected bugs or attacks.
The procedure for restoring data for a tenant is similar to that
for the shared-schema approach, with the additional complication that individual rows in the production database must be deleted
and then reinserted from the temporary database. If there are a very large number of rows in the affected tables, this can
cause performance to suffer noticeably for all the tenants that the database serves.
The shared-schema approach is appropriate when it is important
that the application be capable of serving a large number of tenants with a small number of servers, and prospective customers
are willing to surrender data isolation in exchange for the lower costs that this approach makes possible.
Choosing an Approach
Each of the three approaches
described above offers its own set of benefits and tradeoffs that make it an appropriate model to follow in some cases and
not in others, as determined by a number of business and technical considerations. Some of these considerations are listed
below.
Economic Considerations
Applications optimized for a shared approach tend to require a larger development effort than applications designed
using a more isolated approach (because of the relative complexity of developing a shared architecture), resulting in higher
initial costs. Because they can support more tenants per server, however, their ongoing operational costs tend to be lower.
Figure 4. Cost over time for a hypothetical pair of SaaS applications; one uses a more isolated approach, while the
other uses a more shared approach
Your development effort can be constrained by business and economic factors, which can influence
your choice of approach. The shared schema approach can end up saving you money over the long run, but it does require a larger
initial development effort before it can start producing revenue. If you are unable to fund a development effort of the size
necessary to build a shared schema application, or if you need to bring your application to market more quickly than a large-scale
development effort would allow, you may have to consider a more isolated approach.
Security Considerations
As your application will store sensitive
tenant data, prospective customers will have high expectations about security, and your service level agreements (SLAs) will
need to provide strong data safety guarantees. A common misconception holds that only physical isolation can provide an appropriate
level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use
of more sophisticated design patterns.
Tenant Considerations
The number, nature, and needs of the tenants you expect to serve all affect
your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach,
while others may bias you toward a more shared approach.
How many prospective tenants do you expect to target? You may be
nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building
an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be,
the more likely you will want to consider a more shared approach.
How much storage space do you expect the
average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database
approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If
so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach
later on.)
How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate
a more isolated approach will be to meet end-user requirements.
Do you expect to offer any per-tenant value-added
services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
Figure 5. Tenant-related factors and how they affect "isolated versus shared" data architecture decisions
Companies, organizations, and governments are often subject to regulatory law that can affect their security and
record storage needs. Investigate the regulatory environments that your prospective customers occupy in the markets in which
you expect to operate, and determine whether they present any considerations that will affect your decision.
Skill Set
Considerations
Designing single-instance, multi-tenant architecture is still a very new skill, so subject matter expertise can be
hard to come by. If your architects and support staff do not have a great deal of experience building SaaS applications, they
will need to acquire the necessary knowledge, or you will have to hire people that already have it. In some cases, a more
isolated approach may allow your staff to leverage more of its existing knowledge of traditional software development than
a more shared approach would.
Realizing Multi-Tenant Data Architecture
The remainder of this article details a number of patterns that
can help you plan and build your SaaS application. As we discussed in our introductory article, a well-designed SaaS application is distinguished by three qualities: scalability, configurability,
and multi-tenant efficiency. The table below lists the patterns appropriate for each of the three approaches, divided
into sections representing these three qualities.
Optimizing for multi-tenant efficiency in a shared environment must not compromise the level
of security safeguarding data access. The security patterns listed below demonstrate how you can design an application with
"virtual isolation" through mechanisms such as permissions, SQL views, and encryption.
Configurability allows SaaS tenants to alter
the way the application appears and behaves without requiring a separate application instance for each individual tenant.
The extensibility patterns describe possible ways you can implement a data model that tenants can extend and configure individually
to meet their needs.
The approach you choose for your SaaS application's data architecture will affect the options available to you for
scaling it to accommodate more tenants or heavier usage. The scalability patterns address the different challenges posed by
scaling shared databases and dedicated databases.
Table 1. Appropriate Patterns for SaaS Application
Approach
Security Patterns
Extensibility
Patterns
Scalability Patterns
Separate Databases
Trusted Database Connections
Secure Database Tables
Tenant Data Encryption
Custom
Columns
Single Tenant Scaleout
Shared Database, Separate Schemas
Trusted Database Connections
Secure Database Tables
Tenant Data Encryption
Custom
Columns
Tenant-Based Horizontal Partitioning
Shared Database, Shared Schema
Trusted Database Connections
Tenant View Filter
Tenant Data Encryption
Preallocated
Fields
Name-Value Pairs
Tenant-Based Horizontal Partitioning
Security Patterns
Building adequate security
into every aspect of the application is a paramount task for any SaaS architect.
Promoting software as a service basically means asking potential
customers to relinquish some control of their business data.
Depending on the application, this can include extremely sensitive information
about finances, trade secrets, employee data, and more. A secure SaaS application is one that provides defense in depth,
using multiple defense levels that complement one another to provide data protection in different ways, under different circumstances,
against both internal and external threats.
Building security into a SaaS application means looking at the application on different
levels and thinking about where the risks lie and how to address them. The security patterns discussed in this section rely
on three underlying patterns to provide the right kinds of security in the right places:
Filtering:
Using an intermediary layer between a tenant and a data source that acts like a sieve, making it appear to the tenant as though
its data is the only data in the database.
Permissions: Using access control
lists (ACLs) to determine who can access data in the application and what they can do with it.
Encryption:
Obscuring every tenant's critical data so that it will remain inaccessible to unauthorized parties even if they come into
possession of it.
Keep these patterns in mind as you read the rest of this section.
Trusted Database Connections
In a multi-tier application
environment application architects traditionally use two methods to secure access to data stored in databases: impersonation,
and a trusted subsystem account.
With the impersonation access method, the database is set up to allow individual users to
access different tables, views, queries, stored procedures, and other database objects. When an end-user performs an action
that directly or indirectly requires a call to a database, the application presents itself to the database as that user,
literally impersonating the user for the purposes of accessing the database. (In technical terms, the application employs
the user's security context). A mechanism such as Kerberos delegation can be used to allow the application process
to connect to the database on behalf of the user.
Figure 6. An application connects to a database using impersonation
With the trusted
subsystem access method, the application always connects to the database using its own application process identity, independent
of the identity of the user; the server then grants the application access to the database objects that the application can
read or manipulate.
Any additional security must be implemented within the application itself to prevent individual end users from accessing
any database objects that should not be exposed to them. This approach makes security management easier, eliminating the need
to configure access to database objects on a per-user basis, but it means giving up the ability to secure database objects
for individual users.
Figure 7. An application connects to a database as a trusted subsystem
In a SaaS application, the concept of "users"
is a bit more complicated than in traditional applications, because of the distinction between a tenant and an end user. The
tenant is an organization that uses the application to access its own data store, which is logically isolated from data stores
belonging to any other tenants.
Each tenant grants access to the application to one or more end users, allowing them to access some portion of the
tenant's data using end user accounts controlled by the tenant.
In this scenario, you can use a hybrid approach to data access that combines
aspects of both the impersonation and trusted subsystem access methods. This allows you to take advantage of the database
server's native security mechanisms to enforce the maximum logical isolation of tenant data without creating an unworkably
complex security model.
Figure 8. A SaaS application connects to a database using a combination of the impersonation and trusted subsystem
approaches
This approach involves creating a database access account for each tenant, and using ACLs to grant each of these
tenant accounts access to the database objects the tenant is allowed to use.
When an end user performs an action that directly or indirectly
requires a call to a database, the application uses credentials associated with the tenant account, rather than credentials
associated with the end user. (One way for the application to obtain the proper credentials is through impersonation, in conjunction
with a credentialing system like Kerberos.
A second approach is to use a security token service that returns an actual set of encrypted
login credentials established for the tenant, that the application process can then submit to the database.) The database
server does not distinguish between requests originating from different end users associated with the same tenant, and grants
all such requests access to the tenant's data.
Within the application itself, security code prevents end users from receiving and modifying
any data that they are not entitled to access....and, more power to the future 2013 cloud!