The backpropagation algorithm for Word2Vec

Since I have struggled to find an explanation of the backpropagation algorithm that I genuinely liked, I have decided to write this blog post on backpropagation for word2vec. My objective is to explain the essence of the algorithm using a simple, yet nontrivial, neural network. Moreover, word2vec has become so popular in the NLP community that it makes a particularly useful example to focus on.
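
To give a taste of what the post covers, here is a minimal numpy sketch of one backpropagation step for a single skip-gram training pair. The variable names are my own, and it uses a plain softmax output rather than any of word2vec's speed-ups:

```python
import numpy as np

# Minimal skip-gram step with a full softmax output (illustrative sketch).
# V: vocabulary size, D: embedding dimension; values are arbitrary.
V, D, lr = 10, 5, 0.1
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (center word) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output (context word) weights

center, context = 3, 7                      # indices of one training pair

# Forward pass: the hidden layer is just the center word's embedding row.
h = W_in[center]                            # shape (D,)
scores = h @ W_out                          # shape (V,)
p = np.exp(scores - scores.max())
p /= p.sum()                                # softmax probabilities

# Backward pass: the softmax/cross-entropy gradient is (p - one_hot).
e = p.copy()
e[context] -= 1.0                           # dL/dscores
grad_W_out = np.outer(h, e)                 # dL/dW_out
grad_h = W_out @ e                          # dL/dh

# Gradient descent update: only the center word's embedding row moves.
W_out -= lr * grad_W_out
W_in[center] -= lr * grad_h
```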

Read More

Bayesian A/B Testing: a step-by-step guide

This article is aimed at anyone who is interested in understanding the details of A/B testing from a Bayesian perspective. It is accompanied by a Python project on GitHub, which I have named aByes (I know, I could have chosen something other than an anagram of Bayes…) and which gives you a complete set of tools for Bayesian A/B testing on conversion rate experiments.
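
The core computation behind such an analysis can be sketched in a few lines with a conjugate Beta-Binomial model. The snippet below is a generic illustration with made-up data, not the actual aByes API:

```python
import numpy as np

# Conjugate Beta-Binomial sketch of a Bayesian conversion-rate A/B test.
# Hypothetical counts; this is a generic illustration, not the aByes API.
rng = np.random.default_rng(42)
conversions_A, trials_A = 120, 1000
conversions_B, trials_B = 145, 1000

# Beta(1, 1) uniform prior updated with observed successes and failures.
post_A = rng.beta(1 + conversions_A, 1 + trials_A - conversions_A, size=100_000)
post_B = rng.beta(1 + conversions_B, 1 + trials_B - conversions_B, size=100_000)

# Posterior probability that variant B converts better than variant A.
print("P(B > A) =", (post_B > post_A).mean())
```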

Read More

The confusion over information retrieval metrics in Recommender Systems

Recently I have been reading a lot about information retrieval evaluation metrics for Recommender Systems, and I have discovered (with great surprise) that there is no general consensus on the definition of some of these metrics. I am obviously not the first to notice this, as demonstrated by a full tutorial at ACM RecSys 2015 dedicated to the issue.
In the past few weeks I have spent some time working through this maze of definitions. My hope is that you won’t have to do the same after reading this post.
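
To give a flavour of the problem, here is a toy example of my own in which two plausible definitions of recall@k give different numbers for the same recommendation list:

```python
# Toy example of how two plausible definitions of recall@k can disagree.
# Hypothetical data: a ranked top-5 list and the user's relevant items.
recommended = ["a", "b", "c", "d", "e"]            # ranked list, k = 5
relevant = {"b", "d", "e", "f", "g", "h", "i"}     # 7 relevant items
k = 5

hits = sum(1 for item in recommended[:k] if item in relevant)

precision_at_k = hits / k                        # 3/5 = 0.6 (fairly uncontroversial)
recall_all = hits / len(relevant)                # 3/7 ~ 0.43 (divide by all relevant items)
recall_capped = hits / min(k, len(relevant))     # 3/5 = 0.6 (cap the denominator at k)

print(precision_at_k, recall_all, recall_capped)
```

Both denominators appear in the literature, and this is exactly the kind of ambiguity the post untangles.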

Read More

Changepoint Detection. Part II - A Bayesian Approach

I have recently discussed the problem of changepoint detection from a frequentist point of view. In that framework, changepoints were inferred using a maximum likelihood estimation (MLE) approach. This gave us point estimates for the positions of the changepoints.

In this post I will present the solution to the same problem from a Bayesian perspective, using a mix of theory and practice (via the $\small{\texttt{pymc3}}$ package). The frequentist and Bayesian approaches actually give very similar results, as the maximum a posteriori (MAP) value, which maximises the posterior distribution, coincides with the MLE for uniform priors. In general, despite the added complexity of the algorithm, the Bayesian results are rather intuitive to interpret.
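
The reason for this agreement is a one-line consequence of Bayes’ theorem:

$$
\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\, p(\theta \mid x) = \arg\max_{\theta}\, p(x \mid \theta)\, p(\theta) = \arg\max_{\theta}\, p(x \mid \theta) = \hat{\theta}_{\text{MLE}},
$$

where the step that drops the prior is valid for a uniform $p(\theta)$, since multiplying by a constant does not change where the maximum is attained.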

Read More

The multiple hypothesis testing problem

I must admit that I only learnt about the “multiple testing” problem in statistical inference when I started reading about A/B testing. In many ways I knew about it already, since its essence can be captured by a basic example in probability theory: suppose a particular event has a 1% chance of happening. If we make N attempts, what is the probability that this event happens at least once among the N attempts?
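
The answer follows from the complement rule: the probability that the event never happens in $N$ independent attempts is $(1 - 0.01)^N$, so

$$
P(\text{at least one occurrence}) = 1 - (1 - 0.01)^N,
$$

which for $N = 100$ already gives $1 - 0.99^{100} \approx 0.63$: an unlikely event becomes likely if you give it enough chances.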

Read More