Mention cloud storage to most IT professionals and they think of Internet services like Amazon S3 and Nirvanix that store
your data in their data centers.
But a storage cloud doesn't have to be public. A wide range of private cloud storage products have been introduced
by vendors, including name-brand companies such as EMC, with its Atmos line, and smaller players like ParaScale and Bycast.
Other vendors are slapping the "cloud" label on existing product lines. Given the amorphous definitions surrounding all
things cloud, that label may or may not be accurate. What's more important than semantics, however, is finding the right architecture
to suit your storage needs.
A prototypical cloud storage system is made
up of a number of x86 servers, each with its own storage, most commonly using four to 16 SATA drives. Users and their applications
access the system through standard file access protocols like CIFS and NFS, or through object storage and retrieval interfaces
built on SOAP and REST.
The storage nodes in a private cloud are linked together with a layer of smart software, which performs several functions. First,
it maintains a global name space that allows all the storage in the cluster to be accessed as a single entity, so that administrators
can add storage capacity on the back end without having to tell applications at the front end how to reach it. The software
also handles drive failures and keeps data available to applications and end users.
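The two jobs described above, presenting one name space and surviving a node failure, can be sketched in a few lines of Python. This is a toy model and not any vendor's implementation: the GlobalNamespace class, the node names, the two-copy default, and the hash-based placement rule are all assumptions made for illustration.

```python
import hashlib


class GlobalNamespace:
    """Toy global name space: one logical view over many storage nodes."""

    def __init__(self, replicas=2):
        self.replicas = replicas  # copies kept per object (assumed policy)
        self.nodes = {}           # node name -> {object path: data}
        self.locations = {}       # object path -> names of nodes holding a copy

    def add_node(self, name):
        # New capacity joins the back end; applications keep using the same paths.
        self.nodes[name] = {}

    def put(self, path, data):
        # Rank nodes by a hash of (path, node) and keep the first `replicas`.
        ranked = sorted(
            self.nodes,
            key=lambda n: hashlib.sha256(f"{path}|{n}".encode()).hexdigest(),
        )
        targets = ranked[: self.replicas]
        for node in targets:
            self.nodes[node][path] = data
        self.locations[path] = targets

    def get(self, path, failed=()):
        # Serve the object from any surviving node that holds a copy.
        for node in self.locations[path]:
            if node not in failed:
                return self.nodes[node][path]
        raise IOError(f"no surviving copy of {path}")


ns = GlobalNamespace(replicas=2)
for name in ("node-a", "node-b", "node-c"):
    ns.add_node(name)

ns.put("/scans/mri-001.dcm", b"image bytes")
first_copy = ns.locations["/scans/mri-001.dcm"][0]

# Simulate losing the node that holds the first copy; the read still succeeds.
print(ns.get("/scans/mri-001.dcm", failed={first_copy}))
```

The caller only ever names the path. Which nodes hold the copies, and how many nodes exist, stays hidden behind the name space.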
A private cloud storage infrastructure should also be able to scale
from hundreds of terabytes to multiple petabytes. That level of scalability is achieved not with a forklift upgrade, but simply
by adding more servers as they're needed.
This architecture provides two major benefits. First, storage administrators can configure
and provision new storage nodes quickly and inexpensively. Second, administrators can add capacity only as demand requires,
instead of purchasing additional disk space to meet anticipated future growth and then having that capacity sit idle in the
present.
However, there are also trade-offs. For one, cloud storage is best suited to unstructured data, such as medical images,
engineering drawings, and Office documents. For another, because each x86 server isn't as reliable as a high-end enterprise
disk array, a private cloud must store copies of the data on multiple nodes.
This requires more raw disk space than an enterprise disk array using RAID-5 or RAID-6.
For example, if you set a policy for your private cloud to keep three copies of a 60-GB file for data protection, it would
require 180 GB of disk, whereas a 6+2 RAID-6 system would need just 80 GB.
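The arithmetic behind that comparison is easy to check. The short Python sketch below reproduces the two figures; the function names are made up for this example, and it ignores real-world extras such as hot spares and file system metadata.

```python
def raw_for_replication(logical_gb, copies):
    """Raw disk needed when every object is stored `copies` times."""
    return logical_gb * copies


def raw_for_raid(logical_gb, data_drives, parity_drives):
    """Raw disk needed when data is striped over data drives plus parity drives."""
    return logical_gb * (data_drives + parity_drives) / data_drives


print(raw_for_replication(60, 3))  # 180
print(raw_for_raid(60, 6, 2))      # 80.0
```

Put another way, three-way replication carries 200% overhead, while 6+2 RAID-6 carries about 33%.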
Beyond Low Cost
Our Take: PRIVATE CLOUD STORAGE
Can reduce up-front hardware and administrative costs
Takes advantage of low-cost servers and storage
Makes it easier to add storage capacity
Most appropriate for large volumes of unstructured data
Private cloud software vendors are focused
on finding ways to differentiate themselves from their competitors. For instance, Cleversafe says it gets around the RAID
issue through unique dispersal algorithms that ensure data availability with less than 40% overhead.
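To see how a dispersal scheme can stay under that figure, consider the general idea of splitting an object into slices, any subset of which is enough to rebuild it. The Python sketch below only does the capacity math. The slice counts (16 slices, any 12 sufficient) are hypothetical numbers picked for illustration, not Cleversafe's published configuration, and the function names are invented here.

```python
def dispersal_overhead(total_slices, threshold):
    """Extra raw capacity as a fraction of the logical size.

    total_slices: slices written per object
    threshold: slices needed to rebuild the object
    """
    if not 0 < threshold <= total_slices:
        raise ValueError("threshold must be between 1 and total_slices")
    return (total_slices - threshold) / threshold


def raw_capacity_gb(logical_gb, total_slices, threshold):
    """Raw disk consumed by an object of `logical_gb` after dispersal."""
    return logical_gb * total_slices / threshold


n, k = 16, 12  # hypothetical: 16 slices written, any 12 rebuild the object
print(f"{dispersal_overhead(n, k):.0%} overhead")  # 33% overhead
print(raw_capacity_gb(60, n, k))                   # 80.0
print(n - k, "slices can be lost")                 # 4 slices can be lost
```

Under those assumed parameters, a 60-GB file uses the same 80 GB as the 6+2 RAID-6 example, yet the slices can sit on different nodes or sites and four of them can be lost.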
Several other vendors include location-aware policy engines that
copy data to nodes in specific geographical locations. Data Direct Networks' Web Object Scaler, Bycast's StorageGrid, and EMC's
Atmos systems can specify, for example, that two copies of each object in a folder should be stored in New York and Los Angeles,
and that copies also should be stored in two other locations.
This not only protects data from data center failures but can also put objects on storage
clusters close to the users who need them. Bycast's policy engine takes this notion one step further by including elements,
such as storage tiering, that can migrate objects from more-expensive to less-expensive disk, and even to and from tape.
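A location-aware policy of this kind boils down to a rule that maps a folder to a number of copies per site. The Python sketch below shows one way such a rule could be evaluated. It is an illustration only: the PlacementPolicy class, the place function, the site and node names, and the copy counts are all assumptions, and none of the products named above is known to work this way internally.

```python
from dataclasses import dataclass


@dataclass
class PlacementPolicy:
    folder: str            # folder prefix the rule applies to
    copies_per_site: dict  # site name -> number of copies to keep there


def place(path, policies, nodes_by_site):
    """Return (site, node) pairs that should hold a copy of `path`."""
    matching = [p for p in policies if path.startswith(p.folder)]
    if not matching:
        raise LookupError(f"no placement policy covers {path}")
    policy = max(matching, key=lambda p: len(p.folder))  # most specific rule wins
    targets = []
    for site, copies in policy.copies_per_site.items():
        nodes = nodes_by_site.get(site, [])
        if len(nodes) < copies:
            raise ValueError(f"{site} has too few nodes for {copies} copies")
        targets.extend((site, node) for node in nodes[:copies])
    return targets


policies = [
    PlacementPolicy("/", {"new-york": 1}),
    PlacementPolicy("/finance/", {"new-york": 2, "los-angeles": 2}),
]
nodes_by_site = {
    "new-york": ["ny-1", "ny-2", "ny-3"],
    "los-angeles": ["la-1", "la-2"],
}

print(place("/finance/q3-report.xls", policies, nodes_by_site))
```

A tiering rule would follow the same pattern, with a storage class (fast disk, cheap disk, tape) taking the place of the site name.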
Organizations planning
to offer private cloud storage services to internal departments may want to consider multitenant features that allow storage
to be partitioned among different groups.
For example, IT could carve out one section of the private cloud for HR and another for
marketing, and then charge those departments based on usage. Doing so requires delegated administration models and/or virtual
servers that restrict each group's access and visibility to its own data and the resources assigned to it.
A multitenant storage
system should also include accounting features that collect usage data, such as peak utilization, that will help IT determine
chargebacks.
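As a rough illustration of what such accounting involves, the Python sketch below turns periodic usage readings into a per-department report. The sample data, the rate, and the decision to bill on average rather than peak usage are all assumptions made for this example. A real system would pull the readings from the storage software's own reporting interface.

```python
from collections import defaultdict


def chargeback(samples, rate_per_gb):
    """Summarize usage per tenant.

    samples: iterable of (tenant, gb_used) readings taken at regular intervals
    rate_per_gb: price per GB for the billing period (hypothetical)
    """
    readings = defaultdict(list)
    for tenant, gb_used in samples:
        readings[tenant].append(gb_used)

    report = {}
    for tenant, values in readings.items():
        average = sum(values) / len(values)
        report[tenant] = {
            "peak_gb": max(values),
            "average_gb": average,
            "charge": round(average * rate_per_gb, 2),  # billed on average usage
        }
    return report


samples = [
    ("hr", 100), ("hr", 140),
    ("marketing", 400), ("marketing", 600),
]
for tenant, line in chargeback(samples, rate_per_gb=0.25).items():
    print(tenant, line)
```

Billing on peak usage instead would be a one-line change, which is why it helps to collect both numbers.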
Given the attention
that cloud computing garners these days, some vendors are rebranding existing offerings as private cloud options. This can
be frustrating for potential buyers, but religious arguments over what constitutes a cloud are less important than features,
capabilities, and cost.
Caringo and HDS have
repositioned their content addressable storage (CAS) and redundant array of independent nodes (RAIN) systems as private cloud
storage. There are some similarities.
For instance, CAS/RAIN architectures tend to be built with less-expensive disks than you'd find in an enterprise
SAN.
However, vendors have traditionally positioned CAS/RAIN architectures for archiving and compliance.
Those use cases require more-advanced features
than most private cloud providers offer, such as deduplication, or the ability to set retention and disposition policies or
use hash algorithms to demonstrate that objects haven't been changed after they're saved. These advanced features let vendors
charge a premium, which starts to push these products outside the low-cost boundary of a private cloud.
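The hash-based integrity check is the easiest of these features to picture. In a content-addressed store, an object's address is a hash of its contents, so re-hashing the stored bytes shows whether they have changed, and identical content naturally lands at a single address. The Python sketch below is a minimal in-memory illustration using SHA-256. The ContentStore class is invented for this example and does not represent any vendor's product.

```python
import hashlib


class ContentStore:
    """Minimal content-addressed store: the address is the SHA-256 of the data."""

    def __init__(self):
        self._objects = {}  # address -> bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data  # identical content is stored only once
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def verify(self, address: str) -> bool:
        # Recompute the hash; a mismatch means the stored bytes were altered.
        return hashlib.sha256(self._objects[address]).hexdigest() == address

    def __len__(self):
        return len(self._objects)


store = ContentStore()
first = store.put(b"signed contract, final version")
second = store.put(b"signed contract, final version")

print(first == second, len(store))  # True 1
print(store.verify(first))          # True

store._objects[first] = b"signed contract, edited later"  # simulate tampering
print(store.verify(first))          # False
```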
In addition, CAS/RAIN storage systems are usually intended to hold smaller amounts of data, and to meet lower performance
requirements, than a private cloud architecture.
PRIVATE CLOUD STORAGE OPTIONS
Bycast StorageGrid: Software; location aware; supports multitenancy and multiple data tiers
Caringo CAStor: CAS/RAIN software; replicates among clusters; optional CIFS/NFS gateway software
Cleversafe: Software; disperses data slices across multiple locations; iSCSI interface
Data Direct Networks Web Object Scaler: Appliance-based object store; location-aware policy engine; up to 60 TB per node
EMC Atmos: Integrated hardware/software; distributed objects; policy engine; supports multitenancy; minimum config is 120 TB
HDS Hitachi Content Platform: CAS/RAIN appliance; internal or external storage; replicates between clusters; supports multitenancy
IBM Smart Business Storage Cloud: Cluster file system with integrated product and services; can use IBM XIV grid back end
ParaScale Cloud Storage Software: Software; distributed object copies; replicates among clusters
Symantec FileStore: Cluster file system software; uses shared storage; replicates file systems; supports multiple tiers
The CAS/RAIN vendors aren't the only ones using cloud labels to fog up product categories. Vendors like IBM and Symantec have
repackaged their clustered file systems into private clouds. Symantec FileStore software wraps Storage Foundation, and its
integrated VxFS clustered file system, in a package that's easier to install and manage.
IBM's Smart Business Storage Cloud leverages
its GPFS clustered file system along with XIV clustered block storage (and of course, IBM services).
While cluster file systems can deliver impressive performance,
their reliance on expensive back-end storage makes them relatively pricey compared with RAIN architectures. Cluster file systems
are more appropriate to applications, like render farms, that require high performance for individual clients.
Pick A Package
Organizations that want to get private
cloud storage off the ground quickly, or prefer the comfort of one throat to choke, should consider integrated systems like
Hitachi's Content Platform, EMC's Atmos, or Data Direct Networks' Web Object Scaler. These products come complete with storage
hardware, software, processors--and in the case of Atmos, even the rack.
Those looking for cloud economics may prefer software like Bycast's StorageGrid,
ParaScale's Cloud Storage, or Caringo's CAStor.
Because these vendors charge for their software on a per-gigabyte basis, users can easily match cost to capacity.
Meanwhile, Cleversafe sells pre-configured access, storage, and management nodes, and the adventurous can use the open source
community version from Cleversafe.org.
Private cloud storage systems can bring cloud economics to the data center, allowing corporate IT to retain control
over data, security, and reliability. These new architectures promise to not only reduce the up-front cost of storing many
terabytes of unstructured data but also reduce the amount of manpower required to manage it.