Posts by Huan Lai

CSV Column Discretization with Java

When mining a large amount of data, you often end up with many columns of continuous values. While this is the most “pure” version of the data, sometimes you want to group these values into bins, for example to build histograms or to analyze the distribution of the data more easily. […]

Applying Data Mining Techniques to MapReduce

Here at the Labs, we have been playing around with the MapReduce programming model (namely the open-source Hadoop implementation) for a while, but have been relatively conservative until now. Most of the jobs that we have run so far have been relatively simple, amounting to more or less basic aggregation functions, with the most difficult […]

Rails vs Django: A Developer’s Comparison

As you might know from reading some of my previous blog posts, I’ve been working with Python and Django pretty extensively over the last year, mainly for rapid prototyping and developing relatively simple web applications (including a Facebook App). During Cool Stuff Week I decided to try using Ruby on Rails (RoR) as the base […]