Monday, July 19, 2010

What is the PRD for the Cloud? Part 1

Cloud software is a fascinating space because of the complexity of the solution, the breadth of problems to solve, and the broad set of applications it supports, not to mention security, scalability, reliability, availability, and capacity. Couple this with a platform layer on top that manages the SDLC, and maybe some geographic load balancing (clouds in different parts of the world that localize and balance load), and you have a vast ecosystem of scripts, applications, network connections, and virtualized servers performing automation, monitoring, provisioning, and configuration, with a nice UI abstracting the technology beneath that makes it all possible. I think there are some key pieces to building these environments in the realm of Cloud Operations Platform Support (COPS) and Infrastructure:

Key Infrastructure Features
a) A quality datacenter environment - Redundant power, HVAC, Network, with raised floors, secure cabinets (if necessary) and a focus on security.
b) Core redundant network with redundant ingress/egress links (peering/transit) (availability).
c) SAN - Hierarchical storage on a redundant switch fabric, with iSCSI mainly for cost benefits and potentially Fibre Channel for high performance; the trend today is iSCSI for everything except the highest-performance apps/databases. SANs enable snapshots (recovery), though most volume managers do this too. SANs also allow multiple virtual servers to access the same data volumes (redundancy).
d) Reliable Servers with decent RAM/CPU density per rack unit.
e) A cache farm - If necessary, for speeding up and offloading content delivery.
f) Clean wireplant with strategic naming conventions for hardware.

"Must have" (Back-End) Software Features
1) Monitoring - To provide feedback for provisioning, to automate differentiation, to give operational feedback, to give feedback for performance tuning.
2) Automation of Hypervisor Creation - Bare metal to production hypervisor in minutes. The customer never sees this, but capacity demands rapid availability, so it is definitely the ideal situation.
3) Virtual Server Base Image Creation/Capture - Whether it's P2V or just the ability to snap and clone, this feature is a necessity for automating provisioning.
4) Elastic Provisioning - Easy scalability based on monitoring feedback. This is tricky subject matter: do we depend on transactional monitoring, latency, regular monitoring of virtual resources, or a combination of these? And how does the customer cap scalability when automated scaling could cost money without a return on investment? It's a double-edged sword.
5) Automated division of labor - The ability to load balance the schedule of work and divide it among the classes of servers that comprise a SaaS solution, coupled with the ability of servers to re-differentiate into different classes. For example, a mail server could turn on ad graphics services, pop into the ad graphics load balancer (basically a light httpd), and serve up static images if capacity were needed for graphics more than for mail (capacity being measured from load balancing statistics).
6) Simplified operations support tools that are geared towards scaling (like Stanford's NetDB suite, Puppet/CFEngine, Kickstart, Func)
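
To make the elastic provisioning question in item 4 a little more concrete, here is a minimal Python sketch of one possible answer: combine a resource signal (average CPU) with a transactional one (95th percentile latency), then clamp the result to a customer-set server limit and hourly budget. Every name, threshold, and price below is a hypothetical assumption for illustration, not a description of any particular cloud product.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical customer-tunable policy; all defaults are made up."""
    target_cpu_pct: int = 60          # average CPU we would like the pool to sit at
    max_latency_ms: int = 250         # p95 latency we are willing to tolerate
    min_servers: int = 2              # never shrink below this (redundancy)
    max_servers: int = 10             # hard cap chosen by the customer
    cost_cents_per_server_hour: int = 10
    max_cents_per_hour: int = 80      # spending cap chosen by the customer


def desired_server_count(current: int, avg_cpu_pct: float,
                         p95_latency_ms: float, policy: ScalingPolicy) -> int:
    """Return how many virtual servers the pool should have next."""
    # Resource signal: resize proportionally so average CPU lands near target.
    wanted = math.ceil(current * avg_cpu_pct / policy.target_cpu_pct)

    # Transactional signal: if latency is breached, grow by at least one
    # server even when CPU alone looks healthy.
    if p95_latency_ms > policy.max_latency_ms:
        wanted = max(wanted, current + 1)

    # The customer's cap: whichever of the server limit or the budget is lower.
    affordable = policy.max_cents_per_hour // policy.cost_cents_per_server_hour
    ceiling = min(policy.max_servers, affordable)

    return max(policy.min_servers, min(wanted, ceiling))


if __name__ == "__main__":
    policy = ScalingPolicy()
    # Hot pool, slow responses: demand asks for more than the budget allows.
    print(desired_server_count(6, 95, 400, policy))
```

The cap is applied last on purpose: monitoring decides what the pool wants, and the customer decides what the pool is allowed to have.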

"Must have" (Front-end) features
- A Content Management Engine - Extensible, with SVN/Git code vaulting.
- A Software Release Engine - Synchronizes testing, validation, release, rollback, and such across multiple data centers and multiple classes of virtual servers. Of course, this could be obsolete if a master virtual image is created and then duplicated across the cloud; it's a matter of style and the old argument of block copying versus file copying. Regardless of the method, the "flip" to a new production code level is always a coordinated effort and requires forethought.
- Potentially a development PaaS for customers who would like to have further abstraction of technology with stencils/stylesheets for typical webpages or application layouts. The ability to have a pure custom development platform would be available too of course.
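
As a rough illustration of the coordinated "flip" a release engine has to manage, the Python sketch below moves each data center to a new code level one at a time and restores every site already flipped if any one of them fails validation. The site names, version strings, and the `validate` callback are hypothetical; a real engine would also have to deal with in-flight traffic, data migrations, and partial failures.

```python
from typing import Callable, Dict, List


def flip_release(live: Dict[str, str], new_version: str,
                 validate: Callable[[str, str], bool]) -> bool:
    """Flip every site in `live` to `new_version`, or roll all of them back.

    `live` maps a site name to the version it currently serves and is
    updated in place. `validate(site, version)` is a caller-supplied check
    (smoke test, health probe, etc.) that returns True when the site is
    healthy on that version.
    """
    previous = dict(live)        # remember what each site was running
    flipped: List[str] = []

    for site in live:
        live[site] = new_version
        flipped.append(site)
        if not validate(site, new_version):
            # Roll back everything touched so far, including the failing site.
            for done in flipped:
                live[done] = previous[done]
            return False
    return True


if __name__ == "__main__":
    sites = {"us-east": "1.4", "eu-west": "1.4", "ap-south": "1.4"}
    ok = flip_release(sites, "1.5", lambda site, version: site != "eu-west")
    print(ok, sites)
```

Whether the new code level arrives as a cloned master image or as copied files, the bookkeeping is the same: record the previous state before the flip so that rollback is a lookup rather than a reconstruction.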

Interesting features to think about...
- What about graphics creation?
- Can you roll up Product Management into the PaaS?
- Video, VoIP, chat, add-on services...

All these moving parts can break. The trick is to rely on a highly available model, with the planning and forethought to achieve the service level objectives that keep customers delighted. And I haven't even delved into the importance of UI design, but that is for another day.