Stefan Piesche, Constant Contact CTO, spoke at the Data @Scale conference in Boston, hosted by Facebook. He discussed how Constant Contact moved from scaling data vertically in large DB2 databases attached to even larger SANs, to using Cassandra as a horizontally scalable data tier for key/value type data.
CTCT used to scale data vertically in large DB2 databases attached to even larger SANs. Because this approach is not only cost-prohibitive but also poses significant scalability and availability issues, we now have two other primary data strategies.
Cassandra. We use Cassandra as a horizontally scalable data tier for key/value type data. We have around 350 Cassandra nodes spanning 2 data centers. That system provides 10x the performance of the old RDBMS at 1/10th of the cost. This system is our consumer event tracking system, which scales to 100TB of data and 150BN records that arrive at a velocity of 10k/sec.
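To make "horizontally scalable" concrete, here is a minimal sketch of the general idea behind key/value partitioning on a token ring: each key is hashed to a token, and the node owning the next token on the ring stores it. This is a toy illustration, not Cassandra's implementation. The `Ring` class, the node names, the `vnodes` count, and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a stable 64-bit token (SHA-256 is an assumption here)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy consistent-hash ring; every name here is hypothetical."""

    def __init__(self, nodes, vnodes=16):
        self._tokens = []   # sorted list of tokens on the ring
        self._owners = {}   # token -> node that owns it
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=16):
        # Each node claims several points on the ring to even out the load.
        for i in range(vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owners[t] = node

    def node_for(self, key: str) -> str:
        # The first token at or after the key's token owns the key;
        # the modulo wraps around past the end of the ring.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.node_for("event:12345"))
```

The property that matters for scaling out is that adding a node only takes over keys for the ring segments it claims; keys owned by the other nodes stay where they are.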
Sharded MySQL. Our largest deployment is a 36TB system spanning 2 data centers. Instead of sharding only the DB tier, we also shard the application tier that uses it, so that the sharding mechanism is completely transparent. Our SOA allows RESTful access to that data without any knowledge of the underlying sharding mechanism. However, we have learned that this leads to substantial underutilization of the app tier (a 96-node cluster running a Ruby on Rails application), so we are looking into proprietary DB-level sharding mechanisms as well.
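As a rough illustration of what "transparent" sharding means for a caller, the sketch below hides shard selection behind a small service facade. The talk does not describe the actual shard key, hash function, or service layout, so everything here is an assumption: `ShardRouter`, `ContactService`, the `account_id` key, the CRC32 hash, and the in-memory dictionaries standing in for MySQL shards are hypothetical.

```python
import zlib


class ShardRouter:
    """Picks a shard for a key by hashing it (hypothetical scheme)."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shards = list(shard_names)

    def shard_for(self, account_id: str) -> str:
        # CRC32 is stable across processes, unlike Python's built-in hash().
        h = zlib.crc32(account_id.encode("utf-8"))
        return self.shards[h % len(self.shards)]


class ContactService:
    """Facade: callers pass an account id and never see which shard is used."""

    def __init__(self, router: ShardRouter):
        self._router = router
        # One dict per shard stands in for a real database connection.
        self._stores = {name: {} for name in router.shards}

    def put(self, account_id: str, record: dict) -> None:
        self._stores[self._router.shard_for(account_id)][account_id] = record

    def get(self, account_id: str):
        return self._stores[self._router.shard_for(account_id)].get(account_id)


if __name__ == "__main__":
    service = ContactService(ShardRouter(["shard-0", "shard-1", "shard-2"]))
    service.put("acct-42", {"email": "someone@example.com"})
    print(service.get("acct-42"))
```

Simple modulo routing like this keeps the example short, but changing the number of shards remaps most keys, which is one reason real deployments often use a lookup table or a ring-based scheme instead.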
The mixture of RDBMS and NoSQL data tiers has caused issues in our analytics platform, a 150TB Hadoop cluster. We use a mechanism similar to Netflix's to read data from Cassandra nodes: reading directly from the SSTables to extract the data.