## Resources

Vandenberghe Lectures

## Notation

variable dimension name
$$\mathbf{y}_i$$ $$\mathbb{R}^{(M \times 1)}$$ ith predictor
$$\mathbf{x}_i$$ $$\mathbb{R}^{(P \times 1)}$$ ith state
$$\alpha_i$$ $$\mathbb{R}^{(1 \times 1)}$$ ith sample weight
$$\mathbf{w}$$ $$\mathbb{R}^{(P \times M)}$$ weights
$$\mathbf{Y}$$ $$\mathbb{R}^{(N \times M)}$$ all predictors
$$\mathbf{X}$$ $$\mathbb{R}^{(N \times P)}$$ all states
$$\boldsymbol{\alpha}$$ $$\mathbb{R}^{(N \times N)}$$ identity matrix with columns entries being data point weights

## Weighted Gaussian Linear regression

The log-likelihood of dataset with $$N$$ weighted samples $$\mathcal{D} = \{\mathbf{x}_i, \mathbf{y}_i, \alpha_i \}_{i=1:N}$$ which is modeled by a linear gaussian function is given by:

$$L(\mathcal{D};\boldsymbol{\theta}) \triangleq \sum_{i=1}^N \log p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta}) \, \alpha_i$$

where $$p(\cdot)$$ is a Gaussian probability density function:

$p(\mathbf{y}_i|\mathbf{x}_i;\boldsymbol{\theta}) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\Bigg(-\frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \Bigg)$

with parameters $$\boldsymbol{\theta} \triangleq \{\mathbf{w}, \boldsymbol{\Sigma}\}$$.

### Expansion of the log-likelihood

First without considering the weights $$\alpha$$ we simplify $$L(\mathcal{D};\boldsymbol{\theta})$$

\begin{align} &=\sum_{i=1}^N \log 1 - \log( (2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}) - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &=\sum_{i=1}^N - \frac{1}{2}\log((2\pi)^{D}|\boldsymbol{\Sigma}|) - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &=\sum_{i=1}^N - \frac{1}{2}\log((2\pi)^{D}) - \frac{1}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)\\ &= \sum_{i=1}^N - \frac{D}{2}\log(2\pi) - \frac{1}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \\ &= -\frac{N\,D}{2}\log(2\pi) - \frac{N}{2}\log |\boldsymbol{\Sigma}| - \frac{1}{2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \end{align}

Simplifying with the weights:

\begin{align} &= \sum_{i=1}^N - \frac{D}{2}\log(2\pi)\alpha_i - \frac{1}{2}\log |\boldsymbol{\Sigma}|\alpha_i - \frac{1}{2} (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \alpha_i\\ &= - \frac{D}{2}\log(2\pi)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2}\log |\boldsymbol{\Sigma}|\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \alpha_i \end{align}

## 1D Maximum likelihood

Given that we are in the 1D case $$\boldsymbol{\Sigma} = \sigma$$, $$D=1$$

\begin{align} &L(\mathcal{D};\boldsymbol{\theta}) = \\ &-\frac{1}{2}\log(2\pi)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2}\log \sigma^2 \left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2\, \sigma^2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i) \alpha_i\\ &= - \frac{1}{2}\log(2\pi \sigma^2)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2\, \sigma^2} \sum_{i=1}^N (y - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i \end{align}

Set the derivaties with respect to the parameters to zero, $$\frac{\partial L}{\partial \mathbf{w}} = 0$$ and $$\frac{\partial L}{\partial \sigma^2} = 0$$, and solve for $$\mathbf{w}$$ and $$\sigma^2$$:

### maximise weights

$\frac{\partial L}{\partial \mathbf{w}} = \frac{1}{2\, \sigma^2} \sum_{i=1}^N \frac{\partial}{\partial \mathbf{w}} (y_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i$ \begin{align} 0 &= \frac{1}{2\, \sigma^2} \sum_{i=1}^N - 2\, (\mathbf{Y}_i - \mathbf{w}^{T}\mathbf{x}_i)\, \mathbf{x}_i\, \alpha_i\\ 0 &= \sum_{i=1}^N - \, (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)\, \mathbf{x}_i\, \alpha_i\\ 0 &= -\sum_{i=1}^N \mathbf{y}_i\, \mathbf{x}_i\, \alpha_i + \sum_{i=1}^N \mathbf{w}^{T}\mathbf{x}_i\, \mathbf{x}_i\, \alpha_i \\ 0 &= -\mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{Y} + \mathbf{w} \mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{X}\\ \mathbf{w} \mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{X} &= \mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{Y}\\ \mathbf{w} &= (\mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{X})^{-1}\mathbf{X}^{T}\boldsymbol{\alpha}\mathbf{Y} \end{align}

### maximise variance

\begin{align} \frac{\partial L}{\partial \sigma^2} &= \frac{\partial \left[-\frac{1}{2}\log(2\pi)\left(\sum_{i=1}^N \alpha_i\right) -\frac{1}{2}\log(\sigma^2)\left(\sum_{i=1}^N \alpha_i\right) - \frac{1}{2\, \sigma^2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i\right]}{\partial \sigma^2}\\ &= -\frac{1}{2 \sigma^2}\left(\sum_{i=1}^N \alpha_i\right) + \frac{1}{2\, \sigma^4} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i \\ 0 &= -\left(\sum_{i=1}^N \alpha_i\right) + \frac{1}{\sigma^2} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i\\ \sigma^2 \left(\sum_{i=1}^N \alpha_i\right) &= \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i \\ \sigma^2 &= \frac{1}{\left(\sum_{i=1}^N \alpha_i\right)} \sum_{i=1}^N (\mathbf{y}_i - \mathbf{w}^{T}\mathbf{x}_i)^2\alpha_i \end{align}