Continuing our “how do we compare” series, today we’ll look at Microsoft Azure. Azure is very similar to Amazon EC2: you are provided “instances” that include a preconfigured set of resources such as cores, RAM and storage. Much like Amazon, it offers other services such as databases, storage, BizTalk and numerous others. This is very handy if you want to provide a service or product within your company but don’t want the hassle of managing the software (i.e. updates, backups, etc.).
Microsoft launched HDInsight, which provides a Hortonworks base install along with other components. You purchase “instances” and, instead of traditional local storage, you are instructed to use “Blob” storage. Here is an article [ http://bit.ly/1JhTZHV ] that explains how the storage works and attempts to justify using it over traditional local storage.
Here again we have a company that started its hosting business with elasticity in mind: smaller servers that you can quickly scale up and down when needed. This works great for development, QA and production scenarios where you get a spike in traffic, but when it comes to Hadoop, not so much. The costs and the performance shortfall quickly add up.
We decided to price out a 20 node cluster using 3 master nodes and 20 data nodes, along with an edge node and a firewall node. The results aren’t pretty, much like our Amazon EC2 comparison.
We started by choosing an inferior “A7” instance, which has only 8 cores and 56GB of RAM compared to our base offering of dual hex cores and 64GB of RAM. From there, we needed to match our 10TB of storage per data node, so we chose “Page Blob & Disks” per an Azure sales representative.
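To put rough numbers on the hardware gap, here is a minimal Python sketch that totals up the cluster described above. It uses only the figures quoted in this post. The assumption that every node (masters, edge and firewall included) runs on the same instance type is ours for illustration, and the function and variable names are hypothetical. No prices are included.

```python
# Cluster layout from the post: 3 masters, 20 data nodes, 1 edge, 1 firewall.
NODE_COUNTS = {"master": 3, "data": 20, "edge": 1, "firewall": 1}

STORAGE_PER_DATA_NODE_TB = 10  # storage we needed to match per data node

# Per-node specs quoted in the post.
A7_CORES, A7_RAM_GB = 8, 56        # Azure "A7" instance
BASE_CORES, BASE_RAM_GB = 12, 64   # dual hex cores = 2 * 6 cores


def cluster_totals(cores_per_node, ram_per_node_gb):
    """Sum cores, RAM and data storage across the whole cluster.

    Assumption (not stated in the post): every node uses the same spec.
    """
    nodes = sum(NODE_COUNTS.values())
    return {
        "nodes": nodes,
        "cores": nodes * cores_per_node,
        "ram_gb": nodes * ram_per_node_gb,
        "data_storage_tb": NODE_COUNTS["data"] * STORAGE_PER_DATA_NODE_TB,
    }


azure = cluster_totals(A7_CORES, A7_RAM_GB)
base = cluster_totals(BASE_CORES, BASE_RAM_GB)

for name, totals in (("Azure A7", azure), ("Base offering", base)):
    print(name, totals)
```

Under that assumption, the A7 cluster comes up a third short on cores and an eighth short on RAM before storage pricing even enters the picture.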
As we said, the results are not pretty. We’re not sure why anyone would build a cluster that goes against the exact nature of why Hadoop was conceived:
Massive processing using commodity hardware
but then again, we’re still scratching our heads on the terms “Hadoop on Windows”, “Windows 8” and of course “Windows Vista”…