### #22 Fundamentals--Poisson distribution

In this episode, we introduce the Poisson distribution through a theoretical derivation and a demonstration with real-world data: the imported malaria cases in Taiwan.

## Theoretical Derivation

The Poisson distribution is essentially a limiting case of the binomial distribution when the probability of an event is very small. Let's recall the basic definitions of a Bernoulli trial and a binomial distribution. A Bernoulli trial is a random experiment with a binary outcome: head or tail, alive or dead, etc. If the value of a random variable X is determined by a Bernoulli trial with success probability p, X is called a Bernoulli variable, and it satisfies
$\textup{Pr}[X=1]=p,\quad \textup{Pr}[X=0]=1-p$

However, sometimes we are interested in something more complicated. For example, given that 500 patients are treated with some therapeutic strategy A, what is the probability that more than half of them survive? Assume that the outcomes for the patients are independent of one another. A Bernoulli process is a finite or infinite sequence of independent Bernoulli variables with the same success probability p. The number of successes in n such trials, X, has a binomial distribution. In other words,
$\textup{Pr}[X=r \vert n,p] = C^n_r\, p^r (1-p)^{n-r}$
where $C^n_r=\frac{n!}{r!(n-r)!}$. The expectation value of X, $\langle X \rangle$, equals np.
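The 500-patient question above can be answered numerically by summing the binomial probabilities for every r greater than 250. The following is a minimal sketch rather than a definitive implementation: the per-patient survival probability `p = 0.5` is an assumed, purely illustrative value (the text does not give one), and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Pr[X = r | n, p] = C(n, r) * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def prob_more_than_half(n: int, p: float) -> float:
    """Probability that strictly more than n / 2 of the n trials succeed."""
    return sum(binomial_pmf(r, n, p) for r in range(n // 2 + 1, n + 1))


n = 500   # number of treated patients (from the example in the text)
p = 0.5   # ASSUMED survival probability per patient, for illustration only

print("Pr[more than half survive] =", prob_more_than_half(n, p))

# Sanity check of the expectation value: sum of r * Pr[X = r] should equal n * p.
mean = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
print("mean =", mean, "n * p =", n * p)
```

Changing `p` shows how sharply the answer depends on the per-patient survival probability when n is this large.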

However, many real-world probability problems have both an extremely large n and an extremely small p. For example, how many lottery tickets win the first prize in Taiwan every week? The number of tickets is large, but the probability of winning is small. Under such circumstances, we can simplify the binomial distribution by holding λ=np fixed and letting n go to infinity:
$\lim_{n\to\infty} C^n_r \left(\frac{\lambda}{n}\right)^r \left(1-\frac{\lambda}{n}\right)^{n-r} = \lambda^r \frac{e^{-\lambda}}{r!}$
This is the Poisson distribution. For any random variable X following the Poisson distribution, the following holds
$\textup{Pr}[X=r \vert \lambda] = \lambda^r \frac{e^{-\lambda}}{r!}$
If we sum the above probability from r=0 to r=∞, we get $e^{-\lambda}\sum_{r=0}^{\infty}\frac{\lambda^r}{r!} = e^{-\lambda}e^{\lambda} = 1$, as required of a probability distribution.
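To see the approximation at work, one can compare the binomial probabilities with the Poisson probabilities while n grows and λ=np stays fixed. The sketch below uses an assumed value λ=2 and hypothetical helper names; it only checks the first few values of r and truncates the normalization sum at r=99.

```python
from math import comb, exp, factorial


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Pr[X = r | n, p] for a binomial distribution."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r: int, lam: float) -> float:
    """Pr[X = r | lambda] = lambda**r * exp(-lambda) / r!."""
    return lam**r * exp(-lam) / factorial(r)


def max_gap(n: int, lam: float, r_max: int = 10) -> float:
    """Largest absolute difference between the two pmfs for r = 0..r_max."""
    p = lam / n  # keep lambda = n * p fixed as n changes
    return max(
        abs(binomial_pmf(r, n, p) - poisson_pmf(r, lam))
        for r in range(r_max + 1)
    )


lam = 2.0  # ASSUMED rate, for illustration only

# The gap should shrink as n grows (and p = lam / n shrinks).
for n in (10, 100, 10_000):
    print(f"n = {n:>6}, max |binomial - Poisson| = {max_gap(n, lam):.2e}")

# Truncated normalization check: the probabilities should sum to about 1.
print("sum of Poisson pmf, r = 0..99:", sum(poisson_pmf(r, lam) for r in range(100)))
```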

One interesting property of the Poisson distribution is that its variance equals its expectation value. Let's check it. From the definition of the expectation value,
$\langle X \rangle = \sum_{k=0}^{\infty} k\lambda^{k}\frac{e^{-\lambda}}{k!}$

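This sum does not look easy. However, there is a common trick to tackle it. Given the fact that
$\sum_{k=0}^{\infty} \lambda^{k}\frac{e^{-\lambda}}{k!} = 1$

If we partially differentiate this formula with respect to λ and multiply both sides by λ, we will get
$\sum_{k=0}^{\infty} k\lambda^{k}\frac{e^{-\lambda}}{k!} - \lambda\sum_{k=0}^{\infty} \lambda^{k}\frac{e^{-\lambda}}{k!} = 0$
Since the second sum equals 1, the expectation value is $\langle X \rangle = \sum_{k=0}^{\infty} k\lambda^{k}\frac{e^{-\lambda}}{k!} = \lambda$.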

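This is a really common trick. As for the variance, we can write down
$\textup{Var}(X) = \langle X^2 \rangle - \langle X \rangle^2 = \sum_{k=0}^{\infty} k^2\lambda^{k}\frac{e^{-\lambda}}{k!} - \lambda^2$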

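We again partially differentiate $\sum_{k=0}^{\infty} k\lambda^{k}\frac{e^{-\lambda}}{k!} = \lambda$ with respect to λ and multiply both sides by λ, and we will get
$\sum_{k=0}^{\infty} k^2\lambda^{k}\frac{e^{-\lambda}}{k!} - \lambda\sum_{k=0}^{\infty} k\lambda^{k}\frac{e^{-\lambda}}{k!} = \lambda$
Since the second sum equals λ, this gives $\sum_{k=0}^{\infty} k^2\lambda^{k}\frac{e^{-\lambda}}{k!} = \lambda + \lambda^2$.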

Thus $\textup{Var}(X) = \lambda + \lambda^2 - \lambda^2 = \lambda$.
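The result that the mean and the variance both equal λ can also be checked numerically. The sketch below approximates the infinite sums by truncating them at an assumed cutoff `k_max = 150`, which is more than enough for the illustrative λ values used here; the function names and the λ values are hypothetical choices, not part of the derivation.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Pr[X = k | lambda] = lambda**k * exp(-lambda) / k!."""
    return lam**k * exp(-lam) / factorial(k)


def poisson_moments(lam: float, k_max: int = 150) -> tuple[float, float]:
    """Return (mean, variance) from sums truncated at k_max (an assumed cutoff)."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)         # <X>
    second = sum(k * k * poisson_pmf(k, lam) for k in ks)   # <X^2>
    return mean, second - mean**2                           # Var(X) = <X^2> - <X>^2


# ASSUMED example rates; both printed moments should be close to lambda.
for lam in (0.5, 3.0, 10.0):
    mean, var = poisson_moments(lam)
    print(f"lambda = {lam:>4}: mean = {mean:.6f}, variance = {var:.6f}")
```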

The Poisson distribution can be used to describe various phenomena, such as the number of traffic accidents in Taipei City each month, the annual number of newly diagnosed breast cancer cases, and so on. The Poisson distribution is also important in survival analysis. In the final section of this episode, we will take imported malaria in Taiwan as an example.

## Imported Malaria in Taiwan as an Example

Taiwan is a malaria-free country. However, there are still several imported cases annually. Here is the file containing the weekly counts of imported malaria cases in Taiwan: