Blog

Faster Medoids Data Clustering with CLARA

Previously I introduced K-Medoids clustering, a data clustering method that can be used to cluster datasets containing outliers. The Partitioning Around Medoids (PAM) algorithm I described in that post is computationally expensive, so it is typically used only on small datasets. What if we have a very large dataset that we’d like …

Outlier Clustering with K-Medoids

In my last post, I introduced you to the K-Medians algorithm, a data clustering method that can be used to cluster datasets with outliers. In this post I will be introducing you to another k-means variation that is also effective on datasets with outliers. It’s called K-Medoids. As a matter of fact, there are three …

Using K-Medians for Outlier Clustering

Hey, readers! I’m back again with another blog post. If you’ve been following the series of posts I’ve been releasing thus far, you will already be familiar with the k-means clustering algorithm. In my last blog post I covered k-modes, a variation of k-means that can be used for clustering categorical data. In this …

Using K-Modes to Cluster Categorical Data

If you’ve been following the series of posts I’ve written thus far, you will have already become familiar with the k-means clustering algorithm and how to determine the optimal number of clusters to produce using the algorithm. This clustering method, unfortunately, is only useful for numerical datasets. What happens if we want to cluster non-numerical …

K-Means Parameter Selection

In my last post, I provided a 101 on the K-means clustering algorithm. In case you haven’t had a chance to read the post, this link will take you to it. One question you might have after reading the post is how to determine the optimal number of clusters for k-means. That’s exactly …

K-Means: The Simple Clustering Algorithm

If you do a quick Google search for clustering algorithms, k-means will undoubtedly be the first algorithm mentioned in the search results, and for good reason. It’s the first algorithm that most professionals turn to for segmenting datasets. Compared to other clustering methods out there, k-means is both very simple to …

4 skills to focus on when learning Data Science

Python, R, Spark, TensorFlow, pandas, data visualization, etc., etc., etc. With so many technologies and concepts, the task of learning data science, especially for people who are learning on their own, can appear daunting. My goal in this article is to make your data science journey less overwhelming. I’m going to show you four skills I …

Takeaways from 1 year of teaching coding to kids

Since the summer of last year I’ve been an online coding instructor for children and young adults for theCoderSchool. I’ve taught students 1-on-1, 2-on-1, and sometimes in groups of five or more. In this blog post, I will share a few takeaways I’ve gained from my teaching experiences over the past year. Be aware of the …

Binary Data Series Part 3: Floating Point Numbers

This article is part of the multi-part series on binary data. If your knowledge of positive and negative integers and binary numbers is a bit flaky, I’d recommend that you read part 1 and part 2 before you continue. In this article we will discuss how fractions are represented in binary and some …