Tag: Data Management

CSV Column Discretization with Java

When mining a large amount of data, you often end up with many columns of continuous values. While this is the most “pure” version of the data, sometimes you want to group these values into bins, for example to create histograms or to make the distribution of the data easier to analyze. […]

Applying Data Mining Techniques to MapReduce

Here at the Labs, we have been playing around with the MapReduce programming model (namely the open-source Hadoop implementation) for a while, but have been relatively conservative until now. Most of the jobs that we have done thus far have been relatively simple, being more or less basic aggregation functions, with the most difficult […]

How to Use Processing from the Command Line to Generate Images

The Processing project provides a great Java-based visual programming environment with a number of compelling features, including cross-platform support and OpenGL-accelerated graphics. We’ve used it at Constant Contact Labs for a number of internal data visualization projects, and it’s worked very well for us. Lately we’ve had reason to work out a way to have […]

Engaging Big Data

It’s a familiar story, at least in these Software-as-a-Service circles. Inevitably, growing datacenter operations and business activity start to throw off a lot of data. Not the critical content and customer data that customers pay us to manage, which is very explicitly modeled and optimized, but a huge variety of incidental stuff, relating to server […]