#69 Fundamentals: Maximum Likelihood Theory

Today we take a short detour from gradient sensing to introduce maximum likelihood estimation, usually abbreviated MLE. It is a basic statistical concept that works a little like using probability in the reverse direction. We touched on the idea in Gradient Sensing II (http://biophys3min.blogspot.tw/2016/08/67-gradient-sensing-ii.html), but today we will write it out in mathematical language.

A Simple Example

You must have encountered a question like this:

"Find the probability that, in three tosses of a fair coin, there are one head and two tails."

The answer is 3/8, of course. In this familiar kind of problem, the probability of a single event (here, the probability of tossing a head) is known, and we calculate the probability of an observed result (here, one head and two tails). In the real world it is usually the other way around: the probability of the single event is what we need to find out, and the observed results are the only thing we know for certain. That is to say, in the world of MLE, our problems often look like this:

"We got one head and two tails in three tosses of a coin. What is the most likely value of the probability that this coin lands heads?"

In Mathematical Language

Let's put the discussion above in mathematical form. Let the probability of tossing a head be θ, and denote the outcomes by {x}, with x = 1 for a head and x = 0 for a tail. The usual probability problem then asks:

$p(x|\theta)=?$

The "|" means "given," just as in conditional probability. In the world of MLE, however, we are interested in the opposite question:

$\mathcal{L}(\theta;x)=?$

The curly L denotes the likelihood function. In the usual problem, θ is a known parameter and x is the random variable. In MLE the roles are swapped: the observed x is held fixed, and the likelihood is read as a function of the unknown parameter θ. (Strictly speaking, θ is still a fixed unknown rather than a random variable, and the likelihood is not a probability distribution over θ.) The value of the likelihood function equals the joint probability, or joint probability density, of all the observations. If the observations are independent of each other, the likelihood function can be written as:

$\mathcal{L}(\theta;x_1, x_2, ...)=p(x_1, x_2,...|\theta)=\prod_i p(x_i|\theta)$

The best estimate of θ is the one that maximizes the likelihood function, that is, the θ under which the observed data are most probable. That is why the method is called "maximum likelihood estimation."

A Real-World Example

A typical introduction to MLE would show how it recovers results we tend to take for granted, such as how to estimate a population mean from samples or how to find the best-fit line. Repeating those would be boring, so let's go straight to the problem we ran into in our "Gradient Sensing" series.

In part II of the series (see: http://biophys3min.blogspot.tw/2016/08/67-gradient-sensing-ii.html) we said that the slime mold has to estimate the background gradient by detecting where the incoming particles arrive. In part III (see: http://biophys3min.blogspot.tw/2016/08/68-gradient-sensing-iii.html) we also showed that the particle flux density can be written as:

$j(\theta,\phi)=-(\frac{Dc_0}{a}+3D\triangledown c\cdot \overrightarrow{e}(\theta,\phi))\\ \textup{in which}\quad \overrightarrow{e}=(\cos\phi\sin\theta,\sin\phi\sin\theta,\cos\theta)$

Assume that, within a time T, the inward particle flux density we actually observe is:

$\sigma_T^{\textup{obs}}=\sum_{i=1}^{N}\delta(\vec{r}-\vec{r_i})$

What are the most likely background particle concentration and concentration gradient?

This is a standard maximum likelihood problem. We know that the theoretical inward flux density should look like this (the sign does not matter much here; it only depends on whether you count inflow or outflow as positive):

$j(\theta,\phi)=\frac{Dc_0}{a}+3D\triangledown c\cdot \overrightarrow{e}(\theta,\phi)$

However, there is always some randomness in the real world, so, following the form above, we write the actual flux density as

$j(\theta,\phi)=C + \sum_{m=-1,0,1}G_mY_{l=1}^m(\theta, \phi)+\varepsilon(\theta,\phi)$

The strange-looking Y is the spherical harmonic we introduced in our previous post, and the noise ε follows a normal distribution:

$\int\varepsilon(\theta,\phi)^2\textup{d}\Omega\sim \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}\int(\varepsilon(\theta,\phi))^2\textup{d}\Omega}$

To write down a likelihood function we need an explicit expression for the probability. Since we know the distribution of the random error, we build the likelihood function from it:

$\mathcal{L}=\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2\sigma^2}(\int(\delta(\vec{r}-\vec{r_i})-C-\sum_{m=-1,0,1}G_mY_{l=1}^m(\theta, \phi))^2\textup{d}\Omega)}$

It is as ugly as it looks. To maximize the likelihood function, we take the partial derivative with respect to each parameter, set it to zero, and solve the resulting equations. Doing that directly on this product would be painful. Because the logarithm is monotonically increasing, maximizing the log-likelihood gives the same answer, so we maximize that instead:

$\log\mathcal{L}=\sum_{i=1}^{N}\left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\int\left(\delta(\vec{r}-\vec{r_i})-C-\sum_{m=-1,0,1}G_mY_{l=1}^m(\theta, \phi)\right)^2\textup{d}\Omega\right]$
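Taking the logarithm also helps on a computer, a side benefit that goes slightly beyond the derivation here. The sketch below uses the coin model from earlier with 2000 made-up tosses: the raw product of probabilities underflows to zero in double precision, while the sum of logarithms stays perfectly usable.

```python
import math

theta = 0.5
tosses = [1, 0] * 1000   # 2000 made-up tosses, for illustration only

# Raw likelihood: a product of 2000 factors of 0.5, far below the smallest double
likelihood = math.prod(theta if x == 1 else 1 - theta for x in tosses)

# Log-likelihood: the same information as a sum of moderate-sized numbers
log_likelihood = sum(math.log(theta if x == 1 else 1 - theta) for x in tosses)

print(likelihood)       # 0.0 (underflow)
print(log_likelihood)   # about -1386.29, i.e. 2000 * log(0.5)
```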

Products have turned into sums. In fact, if you do this kind of thing often, you will notice that the problem has reduced to a least-squares problem. Most of the results can be found in the suggested reading for this series (Endres, R. G. & Wingreen, N. S. (2008). Accuracy of direct gradient sensing by single cells. PNAS 105(41): 15749-15754.), so we will not repeat them here and leave them for the reader to look up. One result is important enough to mention. Taking the partial derivative of the equation above with respect to G0 and setting it to zero, we get

$G_0=\frac{\int\sigma_T^{\textup{obs}}Y_1^0(\theta,\phi)\textup{d}A}{\int|Y_1^0(\theta,\phi)|^2\textup{d}A}=\frac{\frac{1}{2}\sqrt{\frac{3}{\pi}}\int\sigma_T^{\textup{obs}}\cos\theta\textup{d}A}{a^2}$

and

$c_z=\frac{1}{6DT}\sqrt{\frac{3}{\pi}}G_0=\frac{\sum_{i=1}^N \cos\theta_i}{4\pi Da^2T}$
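To get a feel for this estimator, here is a small simulation sketch. It rests on several assumptions that are not in the post: the gradient points along z (so the flux density is Dc0/a + 3D c_z cos θ), particle arrivals form a Poisson process, and all numbers are in made-up units. The function names `simulate_cos_theta` and `estimate_cz` are hypothetical.

```python
import math
import random

def simulate_cos_theta(c0, cz, D, a, T, rng):
    """Draw cos(theta_i) for particles absorbed on the sphere during time T."""
    rate = 4.0 * math.pi * a * D * c0   # total inward flux integrated over the sphere
    beta = 3.0 * cz * a / c0            # relative anisotropy of the flux density
    if abs(beta) > 1.0:
        raise ValueError("flux density would be negative somewhere on the sphere")
    samples = []
    t = rng.expovariate(rate)           # Poisson arrivals: exponential waiting times
    while t < T:
        # Rejection sampling of u = cos(theta) with density proportional to 1 + beta*u
        while True:
            u = rng.uniform(-1.0, 1.0)
            if rng.random() * (1.0 + abs(beta)) <= 1.0 + beta * u:
                break
        samples.append(u)
        t += rng.expovariate(rate)
    return samples

def estimate_cz(cos_thetas, D, a, T):
    """The estimator from the text: sum of cos(theta_i) / (4 pi D a^2 T)."""
    return sum(cos_thetas) / (4.0 * math.pi * D * a**2 * T)

rng = random.Random(0)                  # fixed seed so the run is repeatable
D, a, T, c0, cz_true = 1.0, 1.0, 10.0, 200.0, 40.0
cos_thetas = simulate_cos_theta(c0, cz_true, D, a, T, rng)
cz_hat = estimate_cz(cos_thetas, D, a, T)
print(len(cos_thetas), cz_hat)          # particle count and an estimate near cz_true
```

With these numbers roughly 25,000 particles arrive and the estimate scatters around the true value of 40 by about one unit; a longer observation time T narrows the scatter.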

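Deriving this is not hard: take the partial derivative with respect to G0, set it to zero, look up the explicit form of the spherical harmonics, and compare with the functional form we assumed at the start. Note that maximum likelihood does not always give the same result as the least-squares method. The two coincide here because the noise is assumed to be Gaussian; MLE is more general and can handle many problems that least squares cannot.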

We will use this result in the computer simulation in the next episode. Stay tuned!

Three Minute Biophysics
