The Problem with Netflix's Recommendation Challenge

Date: Category: Data Science

The Netflix Prize was an open data science competition that attempted to find the best filtering algorithm to predict ratings for films. However, despite some eye-opening results, this contest had one major flaw.

LightGBM on Zero-Inflated Data

Date: Category: Statistics

LightGBM is one of the most innovative gradient-boosted decision tree implementations available right now. Let's explore how it behaves on zero-inflated data and why.

How do Computers Calculate pnorm Values?

Date: Category: Statistical Computing

If you're familiar with the normal distribution and R programming, then it's likely you've learned to use pnorm to calculate p-values for analysis. However, how a computer actually performs this calculation is less well known. Let's change that!
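As a rough illustration of the kind of calculation involved, here is a minimal Python sketch that evaluates the normal CDF through the complementary error function, using the identity Phi(z) = 0.5 * erfc(-z / sqrt(2)). The function name `pnorm` and its `mean`/`sd` parameters simply mirror R's interface for readability; this is not claimed to be the algorithm R uses internally or the one the post walks through.

```python
import math


def pnorm(x, mean=0.0, sd=1.0):
    """Normal CDF P(X <= x), computed via the complementary error function."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (x - mean) / sd
    # erfc avoids the cancellation that 1 + erf(...) suffers in the lower tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


if __name__ == "__main__":
    for x in (-1.96, 0.0, 1.96):
        print(x, pnorm(x))
```

Using `erfc` rather than `erf` is a deliberate choice here: it keeps small lower-tail probabilities accurate instead of rounding them to zero.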

Martingaling vs. Max Bets: A Visual Aid

Date: Category: Statistics

If you've ever played a casino game with a roughly 50% chance of winning, then you may have heard about the martingale betting system. However, if you're aware of the strategy, so is the house. Here we visualize how casinos have disrupted the implementation of this strategy.
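To make the interaction between doubling and a table limit concrete, here is a small simulation sketch. Everything in it is an illustrative assumption rather than the post's own code: the function name `play_martingale`, the even-money payout, the rule that the player stops when the next bet cannot be covered, and the specific bankroll and limit values.

```python
import random


def play_martingale(bankroll, base_bet, max_bet, rounds, p_win=0.5, seed=0):
    """Simulate martingale betting on an even-money game with a table limit.

    The bet doubles after each loss (capped at max_bet) and resets to
    base_bet after a win. Play stops early if the next bet is unaffordable.
    """
    rng = random.Random(seed)
    bet = base_bet
    for _ in range(rounds):
        if bet > bankroll:
            break  # cannot cover the next bet
        if rng.random() < p_win:
            bankroll += bet
            bet = base_bet
        else:
            bankroll -= bet
            bet = min(2 * bet, max_bet)  # the cap is where the house steps in
    return bankroll


if __name__ == "__main__":
    # Compare average outcomes with a tight cap and with an effectively absent one.
    for cap in (8, 10**9):
        results = [play_martingale(100, 1, cap, 200, seed=s) for s in range(200)]
        print("max_bet =", cap, "mean final bankroll =", sum(results) / len(results))
```

Once the cap binds, a single win no longer recovers the whole losing streak, which is the effect the post sets out to visualize.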

An Algebraic Proof that the AR(2) ACF is sinusoidal.

Date: Category: Time Series

After learning that the AR(2) ACF is sinusoidal, I wanted to actually prove it. We showcase some examples of this being the case and then detail the algebraic mess that proves it.
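As a numerical sanity check of that claim, the sketch below compares the ACF obtained from the Yule-Walker recursion rho_k = phi1 * rho_{k-1} + phi2 * rho_{k-2} with a damped sinusoid r^k * (cos(k*theta) + b*sin(k*theta)). It assumes a stationary AR(2) whose characteristic roots are complex (phi1^2 + 4*phi2 < 0), which is the case where the oscillating shape appears; the function names and the example coefficients are hypothetical and not taken from the post.

```python
import math


def acf_recursion(phi1, phi2, n_lags):
    """ACF of a stationary AR(2) via the Yule-Walker recursion."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, n_lags + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: n_lags + 1]


def acf_damped_sinusoid(phi1, phi2, n_lags):
    """Closed-form ACF when the characteristic roots are complex."""
    if phi1 ** 2 + 4.0 * phi2 >= 0:
        raise ValueError("roots are real; the ACF is not oscillatory in this form")
    r = math.sqrt(-phi2)                  # modulus of the complex roots
    theta = math.acos(phi1 / (2.0 * r))   # argument of the complex roots
    rho1 = phi1 / (1.0 - phi2)
    # b is chosen so that the closed form matches rho_1 (rho_0 = 1 holds already).
    b = (rho1 / r - math.cos(theta)) / math.sin(theta)
    return [r ** k * (math.cos(k * theta) + b * math.sin(k * theta))
            for k in range(n_lags + 1)]


if __name__ == "__main__":
    for a, b in zip(acf_recursion(1.0, -0.5, 8), acf_damped_sinusoid(1.0, -0.5, 8)):
        print(round(a, 6), round(b, 6))
```

The two columns should agree up to floating-point error, with the amplitude shrinking by a factor of r at each lag.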

Chi-Square Degrees of Freedom Explained

Date: Category: Statistics

The chi-square distribution (and tests of independence/goodness of fit) is characterized by its degrees of freedom. However, we often just memorize the formulas for the degrees of freedom without understanding where they come from. Here, we provide some perspective on the origin of these formulas.

When to Use Scipy Sparse

Date: Category: Statistics

Sparse data is common in statistical analysis; there will always be applications with zero-inflated observations, and finding the right tools to deal with them is important. Scipy sparse is a package that helps in this setting, but a more concrete exploration of when it's better than more traditional approaches is warranted.

CSR Multiplication

Date: Category: Matrix Computation

CSR (compressed sparse row) is a storage format used to perform matrix operations in sparse settings. However, even if you understand the big picture, it can be overwhelming to understand how it works. In this post, we explain how CSR works and provide a myriad of cases to flesh out the CSR mechanism.
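For readers who want a feel for the mechanism before clicking through, here is a minimal pure-Python sketch of CSR storage and a matrix-vector product. The helper names `dense_to_csr` and `csr_matvec` and the example matrix are made up for illustration; the three arrays follow the usual CSR convention of values, column indices, and row pointers.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix into CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)   # nonzero value
                indices.append(j)    # its column index
        indptr.append(len(data))     # where the next row starts in data
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(indptr) - 1):
        total = 0
        for k in range(indptr[i], indptr[i + 1]):
            total += data[k] * x[indices[k]]
        y.append(total)
    return y


A = [
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [4, 0, 5, 0],
]
data, indices, indptr = dense_to_csr(A)
y = csr_matvec(data, indices, indptr, [1, 2, 3, 4])
print(data, indices, indptr)  # [1, 2, 3, 4, 5] [0, 3, 2, 0, 2] [0, 2, 3, 3, 5]
print(y)                      # [9, 9, 0, 19]
```

Note how the empty third row shows up only as a repeated entry in `indptr`, so it costs no work in the product.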

Welcome to my blog! Click the arrows to see what I write about!

(Yes, this is my setup that I built myself 😊)

Theory, Math, and Applications

Concepts, Tutorials, and Projects

Opinions, Mental Health, and Miscellaneous

College Students are Incentivised to Cheat: Here's the Solution

Weak Ties: Oxymoronically the Strongest Asset

NMI vs. ARI: What's the Difference?

On my Experiences with Impostor Syndrome

Bad Beat Simulation in Python

Regression Discontinuity: It's not Causal Inference

Your First Variational Inference: Explained

Einstein Summation: The Hidden Gem of CP-ALS Implementation

The Problem with Netflix's Recommendation Challenge

LightGBM on Zero-Inflated Data

How do Computers Calculate pnorm Values?

Martingaling vs. Max Bets: A Visual Aid

An Algebraic Proof that the AR(2) ACF is sinusoidal.

Chi-Square Degrees of Freedom Explained

When to Use Scipy Sparse

CSR Multiplication