Blog

Faster Medoids Data Clustering with CLARA
Previously, I introduced K-Medoids clustering, a data clustering method that can be used to cluster datasets containing outliers. The Partition Around Medoids (PAM) algorithm I described in that post is computationally expensive, so it is typically used only on small datasets. What if we have a very large dataset that we’d like …
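CLARA (Clustering LARge Applications) gets around PAM’s cost by running the medoid search on several small random samples, then scoring each candidate set of medoids against the full dataset and keeping the best one. Below is a minimal sketch under some loud assumptions: the data is 1-D, distance is absolute difference, and a simplified swap search (`pam_swap`) stands in for PAM’s full BUILD and SWAP phases. The function names and the default `n_samples` and `sample_size` values are hypothetical, chosen only for illustration.

```python
import random

def total_cost(points, medoids):
    # Sum of distances from each point to its nearest medoid (1-D, absolute distance).
    return sum(min(abs(p - m) for m in medoids) for p in points)

def pam_swap(points, k):
    # Simplified stand-in for PAM: start from the first k distinct points, then
    # keep swapping a medoid for a non-medoid whenever that lowers the total cost.
    medoids = list(dict.fromkeys(points))[:k]
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best = trial, cost
                    improved = True
    return medoids

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    # CLARA: cluster small random samples, but judge each result on the FULL
    # dataset and return the medoids with the lowest full-data cost.
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam_swap(sample, k)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return sorted(best_medoids), best_cost

data = list(range(0, 50)) + list(range(1000, 1050))  # two well-separated groups
print(clara(data, k=2))
```

The expensive swap search only ever sees `sample_size` points, while the cheaper cost evaluation runs on every point; that split is what lets the approach scale to datasets PAM would struggle with.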

Outlier Clustering with K-Medoids
In my last post, I introduced you to the K-Medians algorithm, a data clustering method that can be used to cluster datasets with outliers. In this post, I will introduce another k-means variation that is also effective on datasets with outliers: K-Medoids. As a matter of fact, there are three …
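The reason K-Medoids copes with outliers is that each cluster center is a medoid, an actual data point with the smallest total distance to the other members, rather than a mean. A small sketch, assuming 1-D data and absolute distance, shows how a single extreme value drags the mean but barely moves the medoid:

```python
def mean(points):
    # Arithmetic mean: every point, including an outlier, pulls on the result.
    return sum(points) / len(points)

def medoid(points):
    # Medoid: the member with the smallest total absolute distance to all
    # members. Unlike the mean, it is always one of the original data points.
    return min(points, key=lambda c: sum(abs(c - p) for p in points))

cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 plays the role of an outlier
print(mean(cluster))    # 22.0
print(medoid(cluster))  # 3.0
```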

Using K-Medians for Outlier Clustering
Hey, readers! I’m back again with another blog post. If you’ve been following the series of posts I’ve been releasing thus far, you will already be familiar with the k-means clustering algorithm. In my last blog post, I covered k-modes, a variation of k-means that can be used for clustering categorical data. In this …
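K-Medians keeps the k-means loop but changes both steps: points are assigned using Manhattan (L1) distance, and each center moves to the coordinate-wise median of its cluster instead of the mean. A minimal sketch, assuming 2-D tuples and caller-supplied starting centers (`k_medians` and its parameters are hypothetical names, not from the original post):

```python
from statistics import median

def manhattan(a, b):
    # L1 distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medians(points, centers, n_iter=10):
    # Assign each point to its nearest center under L1 distance, then move each
    # center to the coordinate-wise median of its members; stop when stable.
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for c, members in zip(centers, clusters):
            if not members:
                new_centers.append(c)  # keep an empty cluster's center in place
            else:
                new_centers.append(tuple(median(coord) for coord in zip(*members)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers

points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1),
          (10, 10), (10, 12), (12, 10), (12, 12), (11, 11),
          (100, 100)]  # (100, 100) is an outlier
print(k_medians(points, [(0, 0), (10, 10)]))  # [(1, 1), (11.5, 11.5)]
```

The outlier joins the second cluster, yet that center only shifts to 11.5 per coordinate; a mean over the same six points would land at about 25.8.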

Using K-Modes to Cluster Categorical Data
If you’ve been following the series of posts I’ve written thus far, you will already be familiar with the k-means clustering algorithm and how to determine the optimal number of clusters for it. Unfortunately, this clustering method only works on numerical data. What happens if we want to cluster non-numerical …
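K-Modes handles categorical records by swapping the two numeric pieces of k-means: distance becomes the number of attributes on which two records disagree, and each cluster’s center becomes its per-attribute mode. A minimal sketch of those two pieces plus the assignment step, assuming records are tuples of category labels (the function names are illustrative, not from any library):

```python
from collections import Counter

def matching_dissimilarity(a, b):
    # k-modes distance: number of attributes on which two records differ.
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    # Cluster "center": the most frequent category in each attribute
    # (Counter.most_common breaks ties in favor of the category seen first).
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def assign(records, modes):
    # Index of the closest mode for each record.
    return [min(range(len(modes)), key=lambda i: matching_dissimilarity(r, modes[i]))
            for r in records]

records = [("red", "small", "round"), ("red", "large", "round"), ("blue", "small", "round")]
print(mode_of(records))  # ('red', 'small', 'round')
```

A full k-modes run would alternate `assign` and `mode_of` until the modes stop changing, exactly as k-means alternates assignment and averaging.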

K-Means Parameter Selection
In my last post, I provided a 101 on the K-means clustering algorithm. In case you haven’t had a chance to read it, this link will take you to it. One question you might have after reading it is how to determine the optimal number of clusters for k-means. That’s exactly …
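One common way to pick k is the elbow method: compute the within-cluster sum of squares (WCSS) for several values of k and look for the point where adding more clusters stops reducing it much. This may not be the exact method the post walks through. The sketch below also assumes 1-D data, where an optimal k-means clustering always splits the sorted values into contiguous runs, so WCSS for each k can be computed exactly by dynamic programming instead of by running k-means repeatedly. `wcss_by_k` is a hypothetical helper name.

```python
def segment_cost(prefix, prefix_sq, i, j):
    # Sum of squared deviations from the mean for sorted values[i:j].
    n = j - i
    s = prefix[j] - prefix[i]
    return (prefix_sq[j] - prefix_sq[i]) - s * s / n

def wcss_by_k(values, max_k):
    # Exact WCSS for k = 1..max_k on 1-D data (requires max_k <= len(values)).
    xs = sorted(values)
    n = len(xs)
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    inf = float("inf")
    # best[k][j]: lowest WCSS for splitting the first j sorted values into k runs.
    best = [[inf] * (n + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0
    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            best[k][j] = min(best[k - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                             for i in range(k - 1, j))
    return [best[k][n] for k in range(1, max_k + 1)]

print(wcss_by_k([1, 2, 3, 10, 11, 12, 20, 21, 22], max_k=4))  # [548.0, 127.5, 6.0, 4.5]
```

The drop from k = 2 to k = 3 is large (127.5 to 6.0) while the next step barely helps (6.0 to 4.5), so the "elbow" sits at k = 3, matching the three visible groups.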

K-Means: The Simple Clustering Algorithm
If you do a quick Google search for clustering algorithms, k-means will almost certainly be among the first results, and for good reason: it’s the first algorithm many practitioners turn to for segmenting datasets. Compared to other clustering methods out there, k-means is both very simple to …
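Much of that simplicity comes from the standard Lloyd iteration: pick k starting centroids, assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. A minimal sketch, assuming 2-D tuples, squared Euclidean distance, and random data points as the starting centroids (the function and parameter names are illustrative):

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, n_iter=100, seed=0):
    # Lloyd's algorithm: alternate nearest-centroid assignment and
    # mean updates until the centroids stop moving (or n_iter is reached).
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]  # leave an empty cluster's centroid in place
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = k_means(points, k=2)
print(sorted(centroids))  # [(0.5, 0.5), (10.5, 10.5)]
```

Because the starting centroids are random, results can differ between seeds on less well-separated data; production libraries typically add smarter initialization and multiple restarts.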

4 skills to focus on when learning Data Science
Python, R, Spark, TensorFlow, pandas, data visualization, etc., etc., etc. With so many technologies and concepts, the task of learning data science, especially for people who are learning on their own, can appear daunting. My goal in this article is to make your data science journey less overwhelming. I’m going to show you four skills I …
Takeaways from 1 year of teaching coding to kids
Since the summer of last year, I’ve been an online coding instructor for children and young adults for theCoderSchool. I’ve taught students 1-on-1, 2-on-1, and sometimes in groups of five or more. In this blog post, I will share a few takeaways I’ve gained from my teaching experiences over the past year. Be aware of the …
Binary Data Series Part 3: Floating Point Numbers
This article is part of a multi-part series on binary data. If your knowledge of positive and negative integers and binary numbers is a bit flaky, I’d recommend reading parts 1 and 2 before you continue. In this article, we will discuss how fractions are represented in binary and some …
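One standard way to write a fraction in binary is repeated doubling: multiply by 2, and the integer part that appears (0 or 1) is the next binary digit. The sketch below is an illustration of that idea rather than the article’s own method; it assumes inputs in the range [0, 1) and uses Python’s exact `fractions.Fraction` so the arithmetic itself introduces no rounding. `fraction_to_binary` and `max_bits` are hypothetical names.

```python
from fractions import Fraction

def fraction_to_binary(x, max_bits=20):
    # Convert a value in [0, 1) to binary digits by repeated doubling.
    # Each doubling shifts the next binary digit into the integer part.
    x = Fraction(x)
    bits = []
    while x and len(bits) < max_bits:
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    # A trailing "..." means the expansion did not terminate within max_bits.
    return "0." + "".join(bits) + ("..." if x else "")

print(fraction_to_binary("0.625"))            # 0.101
print(fraction_to_binary("0.1", max_bits=8))  # 0.00011001...
print((0.1).hex())                            # 0x1.999999999999ap-4
```

0.625 is 1/2 + 1/8, so it terminates after three bits, while 0.1 repeats forever in binary. That is why a floating-point number can only store the nearest representable approximation of 0.1, visible in the rounded final hex digit above.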