#69 Fundamentals--Maximum Likelihood Theory

Today we are going to talk about maximum likelihood estimation, often abbreviated MLE. It is a basic statistical concept, and it is a little like using probability in the reverse direction.

An Example

You have probably encountered questions like this:

"Find the probability that, in three tosses of a fair coin, there are one head and two tails."

The answer is 3/8, of course. In this usual kind of problem, the probability of a certain event (in this case, the probability of a head or a tail) is known, and we calculate the probability of a certain observed result (in this case, one head and two tails). In the real world, however, the probability of the event is often the thing we need to find out, and the observed results are the only certain thing. That is to say, in the world of MLE, our problems often look like:

"We got one head and two tails in three tosses of a coin. What is the most likely value of this coin's probability of landing heads?"

In Mathematical Language

Let's put the above discussion in mathematical form. Let θ be the probability of tossing a head, and denote the outcomes by {x}, with x = 1 for a head and x = 0 for a tail. The usual probability problem is then just asking:
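In symbols (the notation here is mine), the question asks for the probability of the data given the parameter:

```latex
P\big(\{x\} \mid \theta\big),
\qquad\text{e.g.}\qquad
P\left(\text{one head, two tails} \,\middle|\, \theta = \tfrac{1}{2}\right)
= \binom{3}{1}\,\theta\,(1-\theta)^{2}\,\Big|_{\theta = 1/2} = \frac{3}{8}.
```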

The "|" means "given," just as in conditional probability. However, in the world of MLE, we are interested in the opposite question:
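That is, we evaluate the same expression but regard it as a function of θ for fixed data:

```latex
\mathcal{L}\big(\theta \mid \{x\}\big) = P\big(\{x\} \mid \theta\big).
```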
The script L denotes the likelihood function. Originally, θ was taken as a known parameter while x was the random variable; in MLE, θ is instead the unknown quantity to be estimated from the observed x. The value of the likelihood function equals the joint density function of all the observations. If the observations are independent of each other, the likelihood function can be written as:
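For N independent observations x₁, …, x_N with density f, the product form reads:

```latex
\mathcal{L}\big(\theta \mid x_1, \dots, x_N\big) = \prod_{i=1}^{N} f\big(x_i \mid \theta\big).
```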

The most likely θ is then the one that maximizes the likelihood function, which is why the method is called "maximum likelihood estimation."
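As a minimal sketch of the coin example (the variable names here are mine): with one head in three tosses, ℒ(θ) ∝ θ(1 − θ)², and a simple grid search recovers the analytic maximizer θ̂ = 1/3.

```python
# Maximum likelihood estimate for the coin example:
# one head and two tails observed in three tosses.

def likelihood(theta, heads=1, tails=2):
    """L(theta | data) = theta^heads * (1 - theta)^tails.
    The binomial coefficient is omitted: it does not depend on theta,
    so it does not change the location of the maximum."""
    return theta**heads * (1 - theta)**tails

# Grid search over theta in [0, 1].
grid = [i / 10000 for i in range(10001)]
theta_hat = max(grid, key=likelihood)

print(theta_hat)  # close to 1/3, the analytic MLE heads/(heads + tails)
```

The closed form θ̂ = heads/(heads + tails) follows from setting dℒ/dθ = (1 − θ)(1 − 3θ) = 0.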

A Real-World Example

Normally one would demonstrate how MLE recovers familiar results such as "how to estimate a population mean from samples" or "how to find the best-fit curve," which we might otherwise take for granted. However, it would be boring to just repeat them, so let's look at a problem from our series "Gradient Sensing."

In a previous post (see: http://biophys3min.blogspot.tw/2016/08/67-gradient-sensing-ii.html) we said that the slime mold has to estimate the background gradient by detecting where the inward particles flow in. In another post (see: http://biophys3min.blogspot.tw/2016/08/68-gradient-sensing-iii.html) we also showed that

Assume that we have observed the inward particle flow density:

What is the most probable background particle concentration and concentration gradient?

That is a standard maximum likelihood problem. We know our inward flow density should look like (the sign does not matter here):
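One concrete form, assuming a perfectly absorbing sphere of radius a in a linear concentration profile c₀ + Gz with diffusion constant D (this specific expression is my reconstruction, not taken from the original post), is:

```latex
j(\theta) = D\left(\frac{c_0}{a} + 3G\cos\theta\right).
```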

However, because there is always some noise, we will assume our flow density to be
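Schematically, expanding the angular dependence in spherical harmonics Y_{lm} with coefficients G_{lm} (my notation), the noisy flow density at each detection point Ω_i has the form:

```latex
j(\Omega_i) = \sum_{l,m} G_{lm}\, Y_{lm}(\Omega_i) + \varepsilon_i.
```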

The symbol Y denotes the spherical harmonics we introduced in our previous post.

where ε follows a normal distribution:
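that is, independent draws with zero mean and variance σ²:

```latex
\varepsilon_i \sim \mathcal{N}\big(0, \sigma^{2}\big).
```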

To build the likelihood function we need the conditional probability of the observations. Since all the randomness comes from the error term, we simply build the likelihood function from the normal distribution of the random error:
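Assuming independent Gaussian errors and writing the model prediction as a sum of spherical-harmonic terms G_{lm} Y_{lm} (my notation), the likelihood is:

```latex
\mathcal{L}\big(\{G_{lm}\} \mid \{j_i\}\big)
= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^{2}}}
\exp\left[-\frac{\big(j_i - \sum_{l,m} G_{lm} Y_{lm}(\Omega_i)\big)^{2}}{2\sigma^{2}}\right].
```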

To maximize the likelihood function, we take the partial derivative with respect to each parameter and solve the resulting equations. Doing this directly on the likelihood function would be painful, however; instead, we maximize the log-likelihood function. That is:
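Taking the logarithm turns the product into a sum; with Gaussian errors and a spherical-harmonic model (my notation), the log-likelihood is:

```latex
\ln \mathcal{L}
= -\frac{N}{2}\ln\!\big(2\pi\sigma^{2}\big)
- \frac{1}{2\sigma^{2}} \sum_{i=1}^{N}
\Big(j_i - \sum_{l,m} G_{lm} Y_{lm}(\Omega_i)\Big)^{2},
```

so maximizing over the G_{lm} is exactly minimizing the sum of squared residuals.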

In fact, the problem has reduced to a least-squares problem. Most of the results can be found in the suggested reading for this series (Endres, R. G. & Wingreen, N. S. (2008). Accuracy of direct gradient sensing by single cells. PNAS 105(41): 15749-15754), so we will not repeat them here. We will mention just one: partial differentiation of the above equation with respect to G0 gives

and
Note that the results of maximum likelihood estimation are not always the same as those given by the least-squares method; MLE is much more general.
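As an illustrative sketch of the equivalence (a deliberately simplified one-dimensional model j(θ) = G₀ + G₁ cos θ; the parameter names and numbers are mine, not the paper's full spherical-harmonic fit), maximizing the Gaussian log-likelihood amounts to solving the least-squares normal equations:

```python
import math
import random

random.seed(0)

# Simplified model: j(theta) = G0 + G1 * cos(theta) + Gaussian noise.
# True parameters, assumed for this demo only:
G0_true, G1_true, sigma = 1.0, 0.3, 0.05

# Simulated observations at N angles on [0, pi].
N = 500
thetas = [math.pi * i / (N - 1) for i in range(N)]
j_obs = [G0_true + G1_true * math.cos(t) + random.gauss(0.0, sigma)
         for t in thetas]

# Maximizing the Gaussian log-likelihood == minimizing the squared error,
# so the MLE solves the 2x2 normal equations for (G0, G1).
c = [math.cos(t) for t in thetas]
Sc, Scc = sum(c), sum(ci * ci for ci in c)
Sj, Scj = sum(j_obs), sum(ci * ji for ci, ji in zip(c, j_obs))

det = N * Scc - Sc * Sc
G0_hat = (Sj * Scc - Sc * Scj) / det
G1_hat = (N * Scj - Sc * Sj) / det

print(G0_hat, G1_hat)  # should land near the true values 1.0 and 0.3
```

Using the closed-form normal equations keeps the example dependency-free; in practice one would fit all the G_{lm} coefficients at once with a general linear least-squares solver.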

We will use this result in our computer simulation in the next episode. Stay tuned!