Growing Growth: Perform Your Own Cohort Analysis with This Open Source Code

Today’s guest post from Toptal, a widely touted placement agency for talented freelance developers, was written by Alejandro Rigatuso and can be found over on

Cohort analysis, retention, and churn are some of the key metrics in company building.

But this isn’t just another article about cohort analysis. If you’re a seasoned data scientist who already knows the importance of the topic and wants to skip the introduction, you can jump to the simulator, where you can learn how to do cohort analysis and simulate startup growth based on retention, churn, and a number of other factors, or analyze your own PayPal logs with the software I’ve open-sourced.

If, however, you don’t yet realize that these are some of the most important metrics around, continue reading.

Introduction to Cohort Analysis

First, let’s define what we’re talking about. Briefly, a cohort is a group of subjects who share a common defining characteristic: their age, their nationality, their city of birth, and so on.

Age is a particularly good example. We often refer to people born between the mid-1960s and around 1980 as members of “Generation X”, and to those born in the 1980s and 1990s as members of “Generation Y”. Each cohort, each generation, has its own defining characteristics.

Similarly, any company can group and analyze its customers by cohort. A common and very useful approach is to group customers by the date on which they started using your service.
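Grouping customers by start date is straightforward to sketch in code. The snippet below is a minimal Python illustration, not part of the open-source tool mentioned above; it assumes a hypothetical mapping from customer ID to signup date and keys each cohort by signup year and month.

```python
from collections import defaultdict
from datetime import date


def group_by_signup_month(signups):
    """Group customer IDs into cohorts keyed by (year, month) of signup.

    `signups` maps a customer ID to the datetime.date on which that
    customer started using the service (an assumed input format).
    """
    cohorts = defaultdict(list)
    for customer_id, signup_date in signups.items():
        cohorts[(signup_date.year, signup_date.month)].append(customer_id)
    return dict(cohorts)


# Hypothetical sample data
signups = {
    "alice": date(2012, 1, 5),
    "bob": date(2012, 1, 28),
    "carol": date(2012, 2, 3),
}
cohorts = group_by_signup_month(signups)
print(cohorts)  # {(2012, 1): ['alice', 'bob'], (2012, 2): ['carol']}
```

Keying by `(year, month)` tuples keeps cohorts sortable in chronological order; weekly or quarterly cohorts would only need a different key.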

What if I were to ask you: “How much of your revenue last month came from customers who started working with you a year ago?” Do you know the answer? Was it any at all? New sign-ups may look good, but sign-ups alone don’t equate to revenue. If you can’t answer the question, cohort analysis will help.

Cohorts, retention, and churn analysis

If you analyze your revenue by cohort, you can determine, month by month, how much of it comes from new users and how much from existing ones. You can then take the next step and forecast future revenue, accounting for both retention and churn, with a significantly higher degree of precision.
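Here is a minimal Python sketch of that split. The data formats are assumptions made for illustration: payments as `(customer_id, payment_date, amount)` tuples and a signup-date lookup per customer. The sample numbers are hypothetical, and real payment logs would need to be parsed into this shape first.

```python
from collections import defaultdict
from datetime import date


def revenue_by_cohort(payments, signups, year, month):
    """Sum one calendar month's revenue, grouped by signup cohort.

    `payments` is an iterable of (customer_id, payment_date, amount)
    tuples and `signups` maps customer_id -> signup date. Both formats
    are assumptions made for this sketch.
    """
    totals = defaultdict(float)
    for customer_id, paid_on, amount in payments:
        if (paid_on.year, paid_on.month) != (year, month):
            continue  # skip payments outside the month being analyzed
        signed_up = signups[customer_id]
        totals[(signed_up.year, signed_up.month)] += amount
    return dict(totals)


# Hypothetical data: revenue for January 2013
signups = {
    "alice": date(2012, 1, 5),
    "bob": date(2012, 6, 10),
    "carol": date(2013, 1, 3),
}
payments = [
    ("alice", date(2012, 12, 7), 30.0),  # December payment: excluded
    ("alice", date(2013, 1, 7), 30.0),
    ("bob", date(2013, 1, 12), 25.0),
    ("carol", date(2013, 1, 15), 20.0),
]
by_cohort = revenue_by_cohort(payments, signups, 2013, 1)
new_revenue = by_cohort.get((2013, 1), 0.0)  # customers who joined this month
old_revenue = sum(v for k, v in by_cohort.items() if k != (2013, 1))
print(by_cohort)                 # {(2012, 1): 30.0, (2012, 6): 25.0, (2013, 1): 20.0}
print(new_revenue, old_revenue)  # 20.0 55.0
```

For this toy data, the `(2012, 1)` entry answers the question posed earlier: 30.0 of January 2013’s revenue came from customers who joined a year before.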

OK, so we’ve established that a cohort is a group of people with a common defining characteristic. From here, we’ll proceed by example, examining the metrics of our hip new cloud computing startup. Let’s start by analyzing a single cohort: the customers who started working with us in January 2012.

The first important metric to calculate is retention: how many of our new January users were still with us in February? Say we had 100 subscribers in January and 20 of them canceled their subscriptions, leaving 80 subscribers in February. Basic retention analysis tells us that’s an 80% retention rate. Now, say another 8 customers canceled in February. In March, then, we have 80 − 8 = 72 users. Since 72/80 = 90%, our January 2012 cohort’s retention from February to March was 90%.
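The month-over-month calculation is a one-liner over a list of monthly active counts. This is a minimal sketch; the function name and the list-of-counts input are illustrative assumptions, using the January 2012 cohort’s numbers from the example.

```python
def month_over_month_retention(active_counts):
    """Retention for each month relative to the cohort's size the month before.

    `active_counts[i]` is the number of cohort members still active in
    month i (month 0 being the month the cohort joined).
    """
    return [
        current / previous
        for previous, current in zip(active_counts, active_counts[1:])
    ]


# January 2012 cohort: active in January, February, March
jan_2012 = [100, 80, 72]
print(month_over_month_retention(jan_2012))  # [0.8, 0.9]
```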

Some people calculate retention relative to the initial size of the cohort, but I prefer to calculate it relative to the cohort’s size in the previous month.
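The two conventions are easy to convert between: retention measured against the initial cohort size is the running product of the month-over-month rates (for the example cohort, 0.8 × 0.9 = 0.72). A minimal sketch, with illustrative function names that are not taken from the author’s tool:

```python
import math
import operator
from itertools import accumulate


def retention_vs_initial(active_counts):
    """Retention in each later month, measured against the cohort's original size."""
    initial = active_counts[0]
    return [count / initial for count in active_counts[1:]]


def cumulative_from_monthly(monthly_rates):
    """Turn month-over-month retention rates into retention vs. the initial
    size by taking their running product."""
    return list(accumulate(monthly_rates, operator.mul))


jan_2012 = [100, 80, 72]
print(retention_vs_initial(jan_2012))  # [0.8, 0.72]

# The same cohort's month-over-month rates were 80% and then 90%;
# their running product recovers the initial-size view (up to float rounding).
cumulative = cumulative_from_monthly([0.8, 0.9])
assert all(math.isclose(a, b) for a, b in zip(cumulative, retention_vs_initial(jan_2012)))
```

The previous-month view isolates how each month went, while the initial-size view shows how much of the original cohort is left.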

Churn rate is another essential metric. It can be defined in terms of retention: churn = 1 − retention, so 80% retention implies 20% churn. Put simply, churn is the rate at which customers leave your service.
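Churn can also be computed directly from the same monthly counts, as the share of last month’s members who left. This sketch assumes the list-of-counts input used for the January 2012 cohort; the function name is illustrative.

```python
import math


def month_over_month_churn(active_counts):
    """Share of the previous month's cohort members who left in each month."""
    return [
        (previous - current) / previous
        for previous, current in zip(active_counts, active_counts[1:])
    ]


jan_2012 = [100, 80, 72]
churn = month_over_month_churn(jan_2012)
print(churn)  # [0.2, 0.1]

# churn = 1 - retention; compare with a tolerance because of float rounding
retention = [0.8, 0.9]
assert all(math.isclose(c, 1 - r) for c, r in zip(churn, retention))
```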

Returning to our cloud computing startup, let’s analyze an ideal (read: unrealistic) case: a 100% retention rate. That means none of our customers ever leave the service; no one cancels whatsoever. Let’s say our company gets 1,000 new customers per month. After 24 months, it has 24,000 active customers. Not too bad. Unfortunately, this scenario is basically impossible: 100% retention only exists in startup paradise.
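A cohort table makes scenarios like this easy to simulate: one row per cohort, one column per month, and the total active customers for a month is the column sum. This is a minimal sketch under simplifying assumptions (constant intake, constant month-over-month retention, and churn starting the month after a cohort joins); it is not the author’s simulator.

```python
def cohort_table(new_per_month, retention, months):
    """Build a cohort table: row j is the cohort that joined in month j,
    column m is how many of its members are still active in month m.

    Assumes a constant intake and a constant month-over-month retention
    rate, with churn starting the month after a cohort joins.
    """
    return [
        [new_per_month * retention ** (m - j) if m >= j else 0.0
         for m in range(months)]
        for j in range(months)
    ]


def active_per_month(table):
    """Total active customers in each month (the column sums of the table)."""
    return [sum(column) for column in zip(*table)]


# 100% retention: every cohort keeps all 1,000 members forever
ideal = active_per_month(cohort_table(1000, 1.0, 24))
print(ideal[-1])  # 24000.0
```

Passing a retention of `0.9` instead of `1.0` shows the slowdown described next.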

Now, let’s be slightly more realistic and say that our company has a 90% retention rate. In other words, each cohort loses 10% of its customers every month. Again, we’ll assume 1,000 new customers every month.

In this case, after receiving 1,000 new users in January 2012, we lost 100 customers in February, 90 in March, 81 in April, and so on. Let’s see what this graph looks like.
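The decay of a single cohort follows directly from the retention rate: the cohort’s size k months after joining is 1,000 × 0.9^k. A minimal sketch (the function name is illustrative) that reproduces the losses above:

```python
def cohort_decay(initial_size, retention, months):
    """Active members of a single cohort in each month, starting with the
    month it joined, under a constant month-over-month retention rate.

    Values are rounded to 6 decimals to hide float noise; the model itself
    produces fractional customers later on (e.g. 656.1 in month 5).
    """
    return [round(initial_size * retention ** k, 6) for k in range(months)]


sizes = cohort_decay(1000, 0.9, 4)  # January..April 2012
lost = [prev - cur for prev, cur in zip(sizes, sizes[1:])]
print(sizes)  # [1000.0, 900.0, 810.0, 729.0]
print(lost)   # [100.0, 90.0, 81.0]
```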

If you look at the previous cohort graph, you’ll notice that the total number of active users approaches a saturation point around 9,000. It can be shown mathematically that this company will never grow beyond 9,000 users, even though it keeps receiving 1,000 new users per month.
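A short derivation makes the ceiling concrete. It assumes, as the example does, a constant intake of a = 1,000 new customers per month and a constant month-over-month retention rate r = 0.9, with a cohort’s churn starting the month after it joins. Let B_n be the number of customers carried into month n + 1, that is, everyone still around after month n’s churn. Each month the base gains a newcomers and then keeps a fraction r of the total:

$$
B_n = r\,(B_{n-1} + a), \qquad B_0 = 0
$$

Unrolling the recurrence gives a geometric series:

$$
B_n = a \sum_{k=1}^{n} r^k = \frac{a\,r\,(1 - r^n)}{1 - r} < \frac{a\,r}{1 - r} = \frac{1000 \times 0.9}{0.1} = 9{,}000
$$

Since r^n shrinks toward zero as n grows, B_n creeps up toward 9,000 without ever passing it. If you also count the current month’s 1,000 newcomers before any of them have churned, the limit becomes a/(1 − r) = 10,000 instead; either way, a steady inflow of new users cannot push a business with 90% retention past a fixed size.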

With 1,000 new users per month and a 90% customer retention rate, we end up with around 9,000 monthly active users after 24 months. Compared with 100% retention (24,000 customers), that is just 37.5% of the ideal case.

Put simply: a drop of 10 percentage points in the retention rate (from 100% to 90%) caused a 62.5% decrease in the total number of active users after 24 months.
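The comparison is simple arithmetic once the 9,000 ceiling is taken as the 90%-retention figure, as the text does. The sketch below uses Python’s standard `fractions.Fraction` to keep every number exact; the ceiling formula a·r/(1 − r) assumes constant intake and constant retention, and the function name is illustrative.

```python
from fractions import Fraction


def retention_ceiling(new_per_month, retention):
    """Long-run limit a*r/(1 - r) of the retained customer base, assuming a
    constant intake `new_per_month` and constant monthly `retention` < 1."""
    return new_per_month * retention / (1 - retention)


r = Fraction(9, 10)                  # 90% retention, kept exact
ideal = 1000 * 24                    # 100% retention: nobody ever leaves
capped = retention_ceiling(1000, r)  # the 9,000 ceiling
share = capped / ideal               # fraction of the ideal case
print(capped, share, 1 - share)      # 9000 3/8 5/8
print(float(share), float(1 - share))  # 0.375 0.625
```

A share of 3/8 is the 37.5% figure, and the remaining 5/8 (62.5%) is the loss relative to the ideal case.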

The key takeaways here: low retention rates limit growth, and using software for cohort analytics is useful for understanding your retention rates.

September 22nd, 2016 | Big Data, Guest Blog

About the Author:

Andrew is a technical writer for Deep Core Data. He has been writing creatively for 10 years, and has a strong background in graphic design. He enjoys reading blogs about the quirks and foibles of technology, gadgetry, and writing tips.
