In this post, we see how to use maximum likelihood to estimate the best parameters for a few of the most common distributions.

## Estimation of best parameter for iid Exponential Distributions

Let $$X_1, X_2, \dotsc, X_m$$ be a random sample from the exponential distribution with probability density functions of the form $$f_\theta(x) = \tfrac{1}{\theta}e^{-x/\theta}$$ for $$x >0$$ and any parameter $$\theta >0.$$ The likelihood function is then given as the product

$$\mathcal{L}(\theta; x_1,\dotsc, x_m) = f_\theta(x_1)\dotsb f_\theta(x_m) = \dfrac{1}{\theta^m} \exp \bigg( -\dfrac{1}{\theta} \sum_{k=1}^m x_k \bigg)$$

We look for the parameter value $$\theta>0$$ that offers an absolute maximum of $$\mathcal{L}.$$ Notice that, since the logarithm is a one-to-one increasing function, the maximum of $$\mathcal{L}$$ coincides with the maximum of $$\log\mathcal{L}.$$ The latter expression is easier to handle than the former, so we use this one to look for the extrema in the usual way:

Set $$g(\theta) = \log \mathcal{L}(\theta; x_1, \dotsc, x_m) = -m \log(\theta) - \tfrac{1}{\theta} \sum_{k=1}^m x_k;$$ its derivative is $$g'(\theta) = -\tfrac{m}{\theta} + \tfrac{1}{\theta^2}\sum_{k=1}^m x_k.$$ Note that $$g'(\theta) = 0$$ if and only if $$\theta = \tfrac{1}{m} \sum_{k=1}^m x_k,$$ which is positive. Since $$g'(\theta) = \tfrac{m}{\theta^2}\big(\tfrac{1}{m}\sum_{k=1}^m x_k - \theta\big)$$ is positive to the left of this value and negative to its right, the critical point is the absolute maximum of $$\mathcal{L}(\theta;x_1,\dotsc,x_m).$$

Note that the parameter $$\theta$$ we found is nothing but the arithmetic mean $$\bar{x}$$ of $$\{x_1, \dotsc, x_m\}.$$
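The result can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample values are made up for illustration. It evaluates the log-likelihood $$g$$ at the sample mean and at a few nearby values of $$\theta.$$

```python
import math

def exp_log_likelihood(theta, xs):
    """g(theta) = -m*log(theta) - (1/theta) * sum(xs), for theta > 0."""
    m = len(xs)
    return -m * math.log(theta) - sum(xs) / theta

def exp_mle(xs):
    """Closed-form maximizer derived above: the arithmetic mean."""
    return sum(xs) / len(xs)

# Hypothetical sample, chosen only to illustrate the computation.
sample = [0.8, 2.1, 0.3, 1.7, 1.1]
theta_hat = exp_mle(sample)

# The log-likelihood at theta_hat should beat every other candidate.
candidates = [0.5 * theta_hat, 0.9 * theta_hat, 1.1 * theta_hat, 2.0 * theta_hat]
beats_all = all(
    exp_log_likelihood(theta, sample) < exp_log_likelihood(theta_hat, sample)
    for theta in candidates
)
print(theta_hat, beats_all)
```

Comparing against a handful of candidates is of course not a proof; it is only a quick sanity check on the sign of the derivative.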

## Estimation of best parameter for iid Geometric Distributions

In this case, $$X_1, \dotsc, X_m$$ is a random sample from the geometric distribution, with probability mass function $$f_p(n) = p (1-p)^{n-1}$$ for any $$n \in \{1, 2, \dotsc\}$$ and parameter $$p \in [0,1].$$ We operate as in the previous example, by looking for extrema of the log-likelihood function:

• Set $$\mathcal{L}(p;n_1,\dotsc,n_m) = p^m (1-p)^{-m+n_1+\dotsb+n_m}$$ for $$0 \leq p \leq 1.$$
• Consider $$g(p) = \log \mathcal{L}(p;n_1, \dotsc, n_m) = m\log(p) + \bigg( -m + \displaystyle{\sum_{k=1}^m} n_k \bigg) \log(1-p),$$ but only for $$0<p<1.$$
• It is then $$g'(p) = \dfrac{m}{p} - \dfrac{1}{1-p}\bigg( -m + \displaystyle{\sum_{k=1}^m} n_k\bigg)$$
• $$g'(p) = 0$$ if and only if $$p =\dfrac{m}{\sum_{k=1}^m n_k}.$$

This time, the solution $$p$$ coincides with the inverse of the arithmetic mean $$\bar{n}$$ of the sample $$\{n_1, \dotsc, n_m\}$$ (which is trivially positive and at most one, since every $$n_k \geq 1$$). It is not hard to prove that this critical point is a maximum, and therefore is the parameter that we are looking for.
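As a rough illustration, the sketch below compares the closed-form estimate with a brute-force search of the log-likelihood over a grid in $$(0,1).$$ The function names, the grid resolution and the sample are assumptions made for this example only.

```python
import math

def geom_log_likelihood(p, ns):
    """g(p) = m*log(p) + (sum(ns) - m) * log(1 - p), for 0 < p < 1."""
    m = len(ns)
    return m * math.log(p) + (sum(ns) - m) * math.log(1 - p)

def geom_mle(ns):
    """Closed-form maximizer derived above: the inverse of the mean."""
    return len(ns) / sum(ns)

# Hypothetical sample: number of trials up to and including the first success.
counts = [3, 1, 4, 2, 5]
p_hat = geom_mle(counts)

# Brute-force search on a grid with step 0.001, endpoints excluded.
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: geom_log_likelihood(p, counts))
print(p_hat, best_on_grid)
```

The grid maximizer can only agree with $$m / \sum n_k$$ up to the grid step, so the two printed values should be close rather than identical.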

## Estimation of best parameter for iid Poisson Distributions

The random variables in this case have probability mass functions given by $$f_\lambda(n) = \dfrac{\lambda^n e^{-\lambda}}{n!}$$ for any $$n \in \{0, 1, 2, \dotsc\},$$ and parameter $$\lambda>0.$$

• Set $$\mathcal{L}(\lambda;n_1,\dotsc,n_m) = e^{-m\lambda} \dfrac{\lambda^{n_1+\dotsb+n_m}}{n_1! \dotsb n_m!}.$$
• Set $$g(\lambda) = \log \mathcal{L}(\lambda; n_1, \dotsc, n_m) = -m\lambda -\log (n_1! \dotsb n_m!)+(\log \lambda) \displaystyle{\sum_{k=1}^m} n_k .$$
• Its derivative is given by $$g'(\lambda) = -m + \dfrac{1}{\lambda} \displaystyle{\sum_{k=1}^m} n_k.$$
• Note that $$g'(\lambda) = 0$$ only for $$\lambda = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} n_k,$$ which is a maximum for $$\mathcal{L},$$ since $$g''(\lambda) = -\dfrac{1}{\lambda^2} \displaystyle{\sum_{k=1}^m} n_k$$ is negative whenever at least one $$n_k$$ is nonzero.

As in the case of exponential distributions, the computed parameter $$\lambda$$ is the arithmetic mean $$\bar{n}$$ of $$\{n_1, \dotsc, n_m\}.$$
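One way to double-check the derivative is to compare it with a central finite difference of the log-likelihood. The sketch below does that; the helper names, the step size `h` and the sample are illustrative assumptions. It uses `math.lgamma(n + 1)` for $$\log n!.$$

```python
import math

def poisson_log_likelihood(lam, ns):
    """g(lam) = -m*lam - log(n_1! ... n_m!) + log(lam) * sum(ns)."""
    m = len(ns)
    log_factorials = sum(math.lgamma(n + 1) for n in ns)  # log(n!) = lgamma(n + 1)
    return -m * lam - log_factorials + math.log(lam) * sum(ns)

def poisson_score(lam, ns):
    """Analytic derivative g'(lam) = -m + sum(ns) / lam."""
    return -len(ns) + sum(ns) / lam

def poisson_score_numeric(lam, ns, h=1e-6):
    """Central finite-difference approximation of g'(lam)."""
    upper = poisson_log_likelihood(lam + h, ns)
    lower = poisson_log_likelihood(lam - h, ns)
    return (upper - lower) / (2 * h)

# Hypothetical sample of counts.
counts = [2, 0, 3, 1, 4]
lam_hat = sum(counts) / len(counts)  # the arithmetic mean

print(lam_hat, poisson_score(lam_hat, counts), poisson_score_numeric(lam_hat, counts))
```

The analytic derivative vanishes exactly at the sample mean, while the finite-difference value is only approximately zero because of rounding.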

## Estimation of best parameter for iid Normal Distributions

This case is a bit different, since we are dealing with two parameters instead of one: assume $$X_1, X_2, \dotsc, X_m$$ is a random sample from the normal distribution with probability density functions of the form $$f_{\mu,\sigma}(t) = (2\pi\sigma^2)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\sigma^2} \big)$$ for any $$t \in \mathbb{R},$$ and parameters $$\mu \in \mathbb{R},$$ $$\sigma > 0.$$ For ease of computation below, and since the parameter $$\sigma$$ always appears squared in the expression of $$f$$, we prefer to work instead with $$f_{\mu,\theta}(t) = (2\pi\theta)^{-1/2} \exp \big( - \tfrac{(t-\mu)^2}{2\theta} \big),$$ and require the parameter $$\theta = \sigma^2$$ to be positive. Note the abuse of notation, and how this does not really affect the final result. We proceed to compute the likelihood function and its logarithm as before:

• $$\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = \bigg( \dfrac{1}{\sqrt{2\pi\theta}} \bigg)^m \exp \bigg( -\dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2 \bigg)$$
• $$g(\mu,\theta) = \log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m) = -\dfrac{m}{2}\log(2\pi\theta) - \dfrac{1}{2\theta} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2.$$
• The partial derivatives of $$g$$ are given by
$$\dfrac{\partial g}{\partial \mu}(\mu,\theta) = \dfrac{1}{\theta} \displaystyle{\sum_{k=1}^m} (t_k - \mu),\qquad \dfrac{\partial g}{\partial \theta}(\mu,\theta) = -\dfrac{m}{2\theta} + \dfrac{1}{2\theta^2} \displaystyle{\sum_{k=1}^m} (t_k-\mu)^2$$
• Note that $$\dfrac{\partial g}{\partial \mu}(\mu,\theta) = 0$$ if and only if $$\mu = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} t_k.$$ Let us denote it by $$\bar{t},$$ since it represents the mean of the values $$\{ t_1, \dotsc, t_m \}.$$
• Also, by virtue of the previous statement, a solution for $$\dfrac{\partial g}{\partial \theta}(\mu,\theta) = 0$$ is given uniquely by $$\theta = \dfrac{1}{m} \displaystyle{\sum_{k=1}^m} \big(t_k - \bar{t}\big)^2$$. Note that this value (which is positive unless all the $$t_k$$ coincide, and hence satisfies the constraints) is the variance $$s^2$$ of the set $$\{ t_1, \dotsc, t_m\},$$ computed with divisor $$m.$$ It is a priori a valid parameter for $$\theta.$$
• It is not hard to see that the computed critical point $$(\mu,\theta) = (\bar{t}, s^2)$$ gives an absolute maximum for $$\log\mathcal{L}(\mu,\theta;t_1,\dotsc,t_m).$$ Indeed, the Hessian of $$g$$ at that point is given by:
\begin{align} H(g)(\mu,\theta) &= \begin{pmatrix} \tfrac{\partial^2 g}{\partial \mu^2} & \tfrac{\partial^2 g}{\partial \mu \partial \theta} \\ \tfrac{\partial^2 g}{\partial \theta \partial \mu} & \tfrac{\partial^2 g}{\partial \theta^2}\end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)} \\ &= \begin{pmatrix} -m/\theta & -\sum_{k=1}^m (t_k-\mu)/\theta^2 \\ -\sum_{k=1}^m (t_k-\mu)/\theta^2 & m/(2\theta^2) - \sum_{k=1}^m (t_k-\mu)^2/\theta^3 \end{pmatrix} \bigg\rvert_{(\mu,\theta)=(\bar{t},s^2)} \\ &= \begin{pmatrix} -m/s^2 & 0 \\ 0 & -m/(2s^4) \end{pmatrix}. \end{align}

Its determinant at $$(\mu,\theta) = (\bar{t},s^2)$$ is always positive: $$\det H(g)(\bar{t},s^2) = \dfrac{m^2}{2s^6},$$ and since $$\dfrac{\partial^2 g}{\partial \mu^2}(\bar{t},s^2) = -\dfrac{m}{s^2}$$ is always negative, a maximum is attained.
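The two-parameter computation can be sketched in a few lines of standard-library Python. The function names and the data are illustrative assumptions; `statistics.pvariance` is used only as an independent check, since it computes the variance with divisor $$m.$$

```python
import statistics

def normal_mle(ts):
    """Closed-form estimates derived above: sample mean and divisor-m variance."""
    m = len(ts)
    mu_hat = sum(ts) / m
    theta_hat = sum((t - mu_hat) ** 2 for t in ts) / m
    return mu_hat, theta_hat

def hessian_at(mu, theta, ts):
    """Hessian of g(mu, theta), using the second derivatives given above."""
    m = len(ts)
    s1 = sum(t - mu for t in ts)         # vanishes at mu = mean
    s2 = sum((t - mu) ** 2 for t in ts)  # equals m * theta at the estimates
    return [
        [-m / theta, -s1 / theta ** 2],
        [-s1 / theta ** 2, m / (2 * theta ** 2) - s2 / theta ** 3],
    ]

# Hypothetical measurements.
data = [4.9, 5.3, 4.6, 5.8, 5.1, 4.7]
mu_hat, theta_hat = normal_mle(data)

H = hessian_at(mu_hat, theta_hat, data)
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Independent check of the variance estimate (divisor m, not m - 1).
same_variance = abs(theta_hat - statistics.pvariance(data)) < 1e-12
print(mu_hat, theta_hat, det, same_variance)
```

A negative top-left entry together with a positive determinant is the same second-derivative test used in the text; the off-diagonal entries are zero only up to rounding.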