Australia and New Zealand's Premier Virtualisation Community




Site Recovery Manager - What is it? An Overview
Written by Damian Murdoch   
Monday, 30 June 2008

Site Recovery Manager - An Overview

So it is out in the wild, and there are some nice sales bundles around.
Why should I buy it? What does it actually do?

Read on for the detail; it's worth the read.

Site Recovery Manager simplifies and automates your disaster recovery workflows, including setup, testing, failover and failback.

It will help to turn all of those manual runbook processes that you took months to develop into automated recovery plans.

Centralised management of all the recovery plans is provided from your existing VirtualCenter infrastructure.

With SRM, you can create, test, update and execute all of your DR recovery plans. You go in, build the recovery process, test it, and automate it in case of a disaster. It works via integration with your storage vendor, and automates the manual scripts that VMware Professional Services has been implementing for some time. Granted, a lot more is involved than just automating existing workflows, but to control the storage infrastructure VMware relies on hardware vendor buy-in for the detailed control of the storage.

At this time, a lot of vendors are on board, including storage virtualisation vendors, though not every model and feature is supported by every vendor. For instance, the EMC arrays currently support SRM with synchronous replication at the storage level, but not with asynchronous replication. Asynchronous support is slated for the end of the year.

So the scenario is this: we have storage replication in place with a supported vendor configuration, our VMware Infrastructure is up and running, and we have implemented SRM and automated our failover process. When our primary site fails, SRM detects that the heartbeat has been lost and alerts the administrators to a potential site failure.

The next step is to initiate the failover. You will need to initiate it manually, because a disaster has to be acknowledged before a site failover should be considered. Once the administrator confirms the outage, SRM automatically executes the recovery process that was created for the infrastructure. At this point SRM manages the failover of the storage, and this is where the vendors and the integration scripts mentioned above come into play.
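The split between detection and action can be pictured as a small state machine. What follows is only an illustrative sketch and not SRM's code or API: the class names, the states and the missed-heartbeat threshold are all assumptions made for the example. The point it shows is that lost heartbeats can only raise an alert, and only an administrator's confirmation starts the failover.

```python
from dataclasses import dataclass
from enum import Enum


class SiteState(Enum):
    HEALTHY = "healthy"
    ALERT_RAISED = "alert_raised"    # heartbeat lost, waiting on a human
    FAILING_OVER = "failing_over"    # administrator confirmed the outage


@dataclass
class SiteMonitor:
    """Hypothetical model: heartbeat loss alerts, but never fails over by itself."""
    missed_limit: int = 3            # assumed threshold, not an SRM setting
    missed: int = 0
    state: SiteState = SiteState.HEALTHY

    def heartbeat(self, received: bool) -> SiteState:
        """Record one heartbeat interval from the primary site."""
        if self.state is SiteState.FAILING_OVER:
            return self.state        # a confirmed failover is not undone here
        if received:
            self.missed = 0
            self.state = SiteState.HEALTHY
        else:
            self.missed += 1
            if self.missed >= self.missed_limit:
                self.state = SiteState.ALERT_RAISED
        return self.state

    def confirm_outage(self) -> SiteState:
        """Only an administrator's confirmation moves an alert to failover."""
        if self.state is not SiteState.ALERT_RAISED:
            raise RuntimeError("no outstanding alert to confirm")
        self.state = SiteState.FAILING_OVER
        return self.state
```

In this model, calling confirm_outage() without an outstanding alert is an error, which mirrors the idea that a disaster has to be acknowledged by a person rather than assumed by the software.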

During the failover process, SRM kicks off a number of things. The process begins with the shutdown or suspension of any low-priority virtual machines. You may have a bronze tier of services that are not critical in a disaster and do not need to be running. SRM then attaches to the storage and enables the replicated copy at the DR site. Once the storage is enabled, it connects the replicated volumes to ESX and powers on the virtual machines, making any IP address changes as needed.
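To make the ordering concrete, here is a rough sketch of the sequence described above, written as a function that builds a list of steps. It is not how SRM represents a recovery plan: the VM class, the function name and the step strings are hypothetical, and powering machines on in priority order is an assumption of the sketch rather than something stated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    """Hypothetical description of a protected virtual machine."""
    name: str
    priority: int                        # 1 = most critical (assumed convention)
    recovery_ip: Optional[str] = None    # new address at the DR site, if any


def build_recovery_steps(low_priority_vms: List[str],
                         replicated_volumes: List[str],
                         protected_vms: List[VM]) -> List[str]:
    """Return the failover steps in the order the article describes."""
    steps: List[str] = []

    # 1. Free up capacity by suspending the low-priority (bronze tier) machines.
    for name in low_priority_vms:
        steps.append(f"suspend {name}")

    # 2. Enable the replicated copy of the storage at the DR site.
    for volume in replicated_volumes:
        steps.append(f"enable replica {volume}")

    # 3. Connect the replicated volumes to ESX.
    for volume in replicated_volumes:
        steps.append(f"attach {volume} to ESX")

    # 4. Power on the protected machines, most critical first (assumption),
    #    noting an IP address change where one is needed.
    for vm in sorted(protected_vms, key=lambda v: v.priority):
        if vm.recovery_ip:
            steps.append(f"power on {vm.name} (re-IP to {vm.recovery_ip})")
        else:
            steps.append(f"power on {vm.name}")

    return steps
```

For example, build_recovery_steps(["dev-box"], ["lun0"], [VM("web", 2), VM("db", 1, "10.0.0.5")]) lists the suspend step first, then the storage steps, then db ahead of web.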

Once this process has been completed, reports are generated capturing all the detail of the recovery process.

SRM should be looked at like VMware HA for physically separate geographic locations. Consider it active/passive, not active/active like clustering. The process does take a while, so if you have severe uptime restrictions or failover SLAs on specific services, you should consider a geocluster-type configuration for those services and exclude them from the SRM failover.

You can configure SRM to do test failovers to meet your business's annual DR certification process. For a test, SRM can be configured to fail over to a snapshot disk, so the syncing disk at the DR site is not affected and does not need a total rebuild afterwards. It can also connect the virtual machines on that snapshot volume to an isolated network for testing, which allows you to confidently fail over critical services without worrying about them landing on the production network.
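The difference between a test and a real failover comes down to which storage and which network the recovered machines are pointed at. The sketch below is only a way of picturing that choice; the FailoverTarget class, the choose_target function and the volume and network names are all made up for illustration and do not correspond to SRM settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTarget:
    """Hypothetical summary of where recovered VMs will run."""
    storage: str            # volume the VMs boot from
    network: str            # network the VMs are connected to
    touches_replica: bool   # True if the replicating disk itself is used


def choose_target(test_run: bool,
                  replica: str = "replica-lun",
                  production_net: str = "production") -> FailoverTarget:
    """Pick storage and network for a test or a real failover."""
    if test_run:
        # Test: work on a snapshot and an isolated network, so the
        # replicating disk and the production network are left alone.
        return FailoverTarget(storage=f"snapshot-of-{replica}",
                              network="isolated-test",
                              touches_replica=False)
    # Real failover: use the replicated copy and the production network.
    return FailoverTarget(storage=replica,
                          network=production_net,
                          touches_replica=True)
```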

So which vendors are supporting it?

3PAR - Should be available within 90 days of general release, which means soon!
Dell - Available at general release, i.e. now
EMC - SRDF, MirrorView, RecoverPoint and Celerra Replicator all available at general availability (i.e. now)
FalconStor - Now
Hitachi - Q3 this year
HP - EVA CA adapter within 90 days of release, soon!
IBM - DS4000 and N-Series available at general release (N-Series is rebadged NetApp); SVC within 90 days of general release, which means soon!
LeftHand Networks - Now
NetApp - SnapMirror support available now!

 