Featured Post

#69 Fundamentals--Maximum Likelihood Theory 常識集_最大似然估計理論

Today we are going to talk about maximum likelihood estimation, often abbreviated MLE. It is a basic statistical concept that is a bit like using probability in the reverse direction.

An Example   一個簡單的例子

You have probably encountered questions like this:

"Find the probability that, in three tosses of a fair coin, exactly one head and two tails appear."

The answer is 3/8, of course. In this usual kind of problem, the probability of a certain event (here, the probability of a head or a tail) is known, and we calculate the probability of a certain observed result (here, one head and two tails). In the real world, however, the probability of the event is often exactly what we need to find out, and the observed results are the only certain thing. That is to say, in the world of MLE, our problems often look like:

"We got one head and two tails in three tosses of a coin. What is the most likely value of this coin's probability of landing heads?"
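Before formalizing this, here is a minimal numerical sketch of that exact question (our own illustration, not from the original post): scan candidate head probabilities over a grid and pick the one that makes the observed outcome most probable.

```python
import numpy as np

# Likelihood of observing 1 head and 2 tails in 3 tosses,
# as a function of the head probability p (binomial likelihood).
p = np.linspace(0.0, 1.0, 1001)
likelihood = 3 * p * (1 - p) ** 2   # C(3,1) * p^1 * (1-p)^2

# The maximum-likelihood estimate is the grid point with the largest likelihood.
p_hat = p[np.argmax(likelihood)]
print(p_hat)   # ~0.333: the most likely head probability is 1/3
```

Note that the answer is 1/3 (the observed fraction of heads), not 1/2: MLE reports what the data most strongly support, not what we assumed about the coin beforehand.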

In Mathematical Language   用數學表示

Let's put the above discussion in mathematical form. Assume the probability of tossing a head is θ, and denote the results by {x}, where x = 1 for a head and x = 0 for a tail. The usual probability problem is then asking for:
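The expression referred to here was an image in the original post; a plausible reconstruction, with the concrete coin numbers as our own illustration, is:

```latex
P(\{x\} \mid \theta),
\qquad \text{e.g.}\quad
P(\text{1 head, 2 tails} \mid \theta = \tfrac{1}{2})
  = \binom{3}{1}\,\tfrac{1}{2}\,\bigl(\tfrac{1}{2}\bigr)^{2}
  = \tfrac{3}{8}.
```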
The "|" means "given," just as in conditional probability. In the world of MLE, however, we are interested in the opposite question:
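The missing expression here is presumably the likelihood of the parameter given the observed data, which by definition takes the same value as the probability of the data given the parameter:

```latex
\mathcal{L}(\theta \mid \{x\}) = P(\{x\} \mid \theta)
```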
The curvy L denotes the likelihood function. Previously, θ was taken as a known parameter and x as the random variable; in MLE, θ is the unknown quantity to be estimated, while x is fixed at the observed data. The value of the likelihood function equals the joint density of all the observations. If the observations are independent of each other, the likelihood function can be written as:
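Under the independence assumption just stated, the product form (our reconstruction; for the coin example, each factor is θ^{x_i}(1−θ)^{1−x_i}) reads:

```latex
\mathcal{L}(\theta \mid \{x\})
  = \prod_{i=1}^{n} P(x_i \mid \theta)
```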
The most plausible θ is then the one that maximizes the likelihood function, which is why the method is called "maximum likelihood estimation."
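In symbols (our reconstruction), the estimate is the maximizer of the likelihood; for one head in three tosses, ℒ(θ) ∝ θ(1−θ)², whose maximum sits at θ = 1/3:

```latex
\hat{\theta}
  = \arg\max_{\theta}\; \mathcal{L}(\theta \mid \{x\})
```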

A Real-World Example   一個真實世界的例子

Normally one would demonstrate how MLE recovers familiar results we tend to take for granted, such as "estimating a population mean from samples" or "finding the best-fit curve." However, it would be boring to just repeat them, so let's look at the problem we encounter in our series "Gradient Sensing."

In our previous post (see: http://biophys3min.blogspot.tw/2016/08/67-gradient-sensing-ii.html) we said that the slime mold has to estimate the background gradient by detecting where the inward particles flow in, and in the post after it (see: http://biophys3min.blogspot.tw/2016/08/68-gradient-sensing-iii.html) we derived what that inward flow should look like. The question is therefore:

"Given an observed inward particle flow density, what are the most probable background particle concentration and concentration gradient?"

This is a standard maximum likelihood problem. We know what our inward flow density should look like (the sign does not matter much here):
However, because there is always some noise in the measurement, we will assume our flow density to be:
The Y's are the spherical harmonics we introduced in our previous post, and ε follows a normal distribution:
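The expressions referred to above were images in the original post and are missing here. A plausible reconstruction, writing the flow density as a spherical-harmonic expansion with coefficients G_{lm} (our guess at the post's notation, chosen to match the G0 mentioned below) plus Gaussian noise, is:

```latex
j(\theta, \varphi)
  = \sum_{l,m} G_{lm}\, Y_{lm}(\theta, \varphi) + \varepsilon,
\qquad
\varepsilon \sim \mathcal{N}(0, \sigma^{2})
```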
To build the likelihood function, we need an expression for the conditional probability of the observations; since we know how the random error is distributed, we build the likelihood function from the normal distribution of the noise:
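Assuming independent Gaussian noise at each of the N observation points (θ_i, φ_i), and keeping the notation assumed above, the likelihood would take the standard form:

```latex
\mathcal{L}
  = \prod_{i=1}^{N}
    \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left[
      -\frac{\bigl(j_i - \sum_{l,m} G_{lm}\, Y_{lm}(\theta_i, \varphi_i)\bigr)^{2}}
            {2\sigma^{2}}
    \right]
```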
To maximize the likelihood function, we would have to take the partial derivative with respect to each parameter and solve the resulting equations. Doing that directly on this likelihood function would be crazy, so instead we maximize the log-likelihood function. That is:
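Taking the logarithm of the Gaussian likelihood (in the notation assumed above) gives:

```latex
\ln \mathcal{L}
  = -\frac{N}{2}\ln\!\bigl(2\pi\sigma^{2}\bigr)
    - \frac{1}{2\sigma^{2}}
      \sum_{i=1}^{N}
      \Bigl(j_i - \sum_{l,m} G_{lm}\, Y_{lm}(\theta_i, \varphi_i)\Bigr)^{2}
```

Maximizing this over the coefficients G_{lm} means minimizing the sum of squared residuals, which is exactly a least-squares problem.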
In fact, the problem has reduced to a least-squares problem. Most of the results can be found in the suggested reading for this series (Endres, R. G. & Wingreen, N. S. (2008). Accuracy of direct gradient sensing by single cells. PNAS 105(41): 15749-15754), so we will not repeat them here. We will mention just one: obtaining it is not hard. Partially differentiate the log-likelihood with respect to G0, set the derivative to zero, look up the explicit form of the spherical harmonics, and compare with the functional form we assumed.
One thing to note is that maximum likelihood does not always give the same results as the least-squares method; MLE is much more general and can solve many problems that least squares cannot.
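As a minimal executable sketch of that equivalence, here is a toy linear model (our own illustration, not the gradient-sensing model itself; the names a_true and b_true are hypothetical): for Gaussian noise, maximizing the log-likelihood is the same as minimizing the squared error, so an ordinary least-squares fit recovers the MLE of the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "background" and "gradient" coefficients for the toy model.
a_true, b_true = 1.0, 2.0
x = np.linspace(-1.0, 1.0, 200)
# Deterministic model plus Gaussian noise, as in the likelihood above.
y = a_true + b_true * x + rng.normal(0.0, 0.1, x.size)

# Least squares = MLE under Gaussian noise; polyfit returns [slope, intercept].
b_hat, a_hat = np.polyfit(x, y, 1)
print(a_hat, b_hat)   # close to (1.0, 2.0)
```

If the noise were not Gaussian (say, heavy-tailed), the MLE would no longer be a least-squares fit, which is exactly the sense in which MLE is more general.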

We will use this result in our computer simulation in the next episode. Stay tuned!