Suppose we have some three-dimensional point data and we think that the data can be well described by a plane. We find this plane by minimising the sum of squared distances between the plane and all the points: a least-squares fit.

There are many tutorials online showing how to do this. One of the more popular minimises the squares of the residuals measured along a coordinate axis (typically $z$), not the residuals perpendicular to the plane. We want the residuals perpendicular to the plane, since these represent the minimum distance between each point and the plane.

# Mathematical formulation and the method of Lagrange multipliers

First, we try to formulate the problem mathematically. We describe a plane using a normal vector $\vec{n} = ( a, b, c )$ and a distance $d$ so that a point $\vec{p} = ( x, y, z )$ on the plane $\Pi$ satisfies $\vec{n} \cdot \vec{p} + d = 0$. This can also be written as

$$ a x + b y + c z + d = 0. $$

The distance of any other point $\vec{p}_0 = ( x_0, y_0, z_0 )$ to the plane $\Pi$ is given by

$$ \mathrm{dist}( \Pi, \vec{p}_0 ) = \frac{| a x_0 + b y_0 + c z_0 + d |}{\sqrt{a^2 + b^2 + c^2}}. $$
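As a quick numerical check (a minimal sketch; the helper below simply evaluates $| \vec{n} \cdot \vec{p}_0 + d | / \| \vec{n} \|$):

```python
import numpy as np

# Plane z = 1: normal n = (0, 0, 1), offset d = -1.
n = np.array([0.0, 0.0, 1.0])
d = -1.0

def point_plane_distance(p, n, d):
    """Distance from the point p to the plane n . x + d = 0."""
    return abs(np.dot(n, p) + d) / np.linalg.norm(n)

# The point (0, 0, 3) lies 2 above the plane z = 1.
print(point_plane_distance(np.array([0.0, 0.0, 3.0]), n, d))  # 2.0
```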

We can now state our problem precisely:

Problem 1. Given a set $P$ of $M$ points in $\mathbb{R}^3$, find a plane $\Pi$ which minimises the objective: $$ f( \Pi ) = \frac{1}{2} \sum_{\vec{p} \in P} \mathrm{dist}( \Pi, \vec{p} )^2. $$

We see that this problem is not well posed at the moment. The representation of the plane we have chosen is not unique! One way to fix this is to set one of $a$, $b$ or $c$ equal to $1$, but this does not allow all possible solutions (it excludes planes where that coefficient vanishes). Instead we will require that the normal have unit length:

$$ a^2 + b^2 + c^2 = 1. $$

So we update our problem formulation:

Problem 2. Given a set $P$ of $M$ points in $\mathbb{R}^3$, find a plane $\Pi$ which minimises the objective: $$ f( \Pi ) = \frac{1}{2} \sum_{\vec{p} \in P} \mathrm{dist}( \Pi, \vec{p} )^2, $$ subject to the constraint $$ g( \Pi ) = a^2 + b^2 + c^2 - 1 = 0. $$

In order to solve this **constrained problem** we use the method of Lagrange multipliers. We introduce a new variable $\lambda$ and the *augmented Lagrangian*

$$ \mathcal{L}( \Pi, \lambda ) = f( \Pi ) + \lambda \, g( \Pi ). $$

If $\Pi^\ast$ is a minimiser of $f$, then there exists $\lambda^*$ such that $( \Pi^\ast, \lambda^\ast )$ is a stationary point of the augmented Lagrangian (where the partial derivatives with respect to $a,b,c,d,\lambda$ are all equal to $0$). Note, however, that not all stationary points of $\mathcal{L}$ are minimisers of the original problem.
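Written out, the stationarity conditions are (using $\partial g / \partial a = 2a$ and so on, and noting that $g$ does not depend on $d$):

$$ \frac{\partial f}{\partial a} + 2 \lambda a = 0, \quad \frac{\partial f}{\partial b} + 2 \lambda b = 0, \quad \frac{\partial f}{\partial c} + 2 \lambda c = 0, \quad \frac{\partial f}{\partial d} = 0, \quad a^2 + b^2 + c^2 - 1 = 0. $$

This is the system of five equations in five unknowns that we will solve numerically below.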

So we update our problem formulation once more:

Problem 3. Given a set $P$ of $M$ points in $\mathbb{R}^3$, find a plane $\Pi$ and a Lagrange multiplier $\lambda$ which form a stationary point of the augmented Lagrangian: $$ \mathcal{L}( \Pi, \lambda ) = \frac{1}{2} \sum_{\vec{p} \in P} \mathrm{dist}( \Pi, \vec{p} )^2 + \lambda ( a^2 + b^2 + c^2 - 1 ). $$

# Implementation using `python`, `numpy` and `scipy`

We implement the method following this simple tutorial. First, we import `numpy` and define our test points:

```
import numpy as np

pts = [ [ 1, 0, 1 ],
        [ 0, 1, 1 ],
        [ 1, 1, 1 ] ]
```

We have chosen a really simple test case to start with: three points on the plane $z = 1$.

We define a function to compute the squared distance. We don't need the absolute distance, so we implement only half the distance squared (the factor $\frac{1}{2}$ from the objective):

```
def dist2( pt, params ):
    a, b, c, d = params
    err = a*pt[0] + b*pt[1] + c*pt[2] + d
    return 0.5 * err*err / ( a*a + b*b + c*c )
```

We define the objective function that we wish to minimise:

```
def func(X):
    a, b, c, d, L = X
    D = sum( dist2( pt, (a,b,c,d) ) for pt in pts )
    return D + L * ( a*a + b*b + c*c - 1 )
```
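As a sanity check (restated here in self-contained form so the snippet runs on its own): at the exact solution $(a, b, c, d, \lambda) = (0, 0, 1, -1, 0)$ every residual vanishes and the constraint is satisfied, so the objective should be exactly zero.

```python
pts = [ [ 1, 0, 1 ],
        [ 0, 1, 1 ],
        [ 1, 1, 1 ] ]

def dist2( pt, params ):
    # Half the squared perpendicular distance from pt to the plane.
    a, b, c, d = params
    err = a*pt[0] + b*pt[1] + c*pt[2] + d
    return 0.5 * err*err / ( a*a + b*b + c*c )

def func(X):
    a, b, c, d, L = X
    D = sum( dist2( pt, (a,b,c,d) ) for pt in pts )
    return D + L * ( a*a + b*b + c*c - 1 )

# The plane z = 1 with a unit normal: all residuals are zero.
print(func([0.0, 0.0, 1.0, -1.0, 0.0]))  # 0.0
```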

We have rewritten the optimisation problem with respect to the vector $(a,b,c,d,\lambda) \in \mathbb{R}^5$.

We use a numerical finite difference to compute the derivatives. Alternatively, you could compute the derivatives by hand or with `sympy`, but the finite difference works for us here:

```
def dfunc(X):
    dLambda = np.zeros(len(X))
    h = 1e-8 # step size used in the finite difference
    for i in range(len(X)):
        dX = np.zeros(len(X))
        dX[i] = h
        dLambda[i] = ( func(X+dX) - func(X-dX) ) / (2*h)
    return dLambda
```
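The central difference used here is second-order accurate in $h$. A minimal standalone illustration of the same scheme (the test function here is hypothetical, not part of the fit):

```python
import numpy as np

def central_diff(f, X, h=1e-8):
    # Gradient of f at X via central differences, as in dfunc above.
    grad = np.zeros(len(X))
    for i in range(len(X)):
        dX = np.zeros(len(X))
        dX[i] = h
        grad[i] = (f(X + dX) - f(X - dX)) / (2 * h)
    return grad

# For f(x, y) = x**2 + 3*y the gradient is (2x, 3).
g = central_diff(lambda X: X[0] ** 2 + 3 * X[1], np.array([2.0, 5.0]))
print(np.round(g, 6))  # close to [4., 3.]
```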

To solve the problem we use the `scipy.optimize` toolbox, in particular the function `fsolve`, which finds a root of a system of equations. We pass `fsolve` the derivative `dfunc` and an initial guess, and it returns a vector `X1` with `dfunc(X1) = 0`, i.e. a stationary point of the augmented Lagrangian:

```
from scipy.optimize import fsolve
X1 = fsolve(dfunc, [1.0/np.sqrt(3), 1.0/np.sqrt(3), 1.0/np.sqrt(3), 0, 0])
```

To conclude we print out the answer:

```
print( 'plane: ', X1[0], ' x + ', X1[1], ' y + ', X1[2], ' z + ', X1[3], ' = 0' )
print( 'Lagrange multiplier: ', X1[4] )
print( 'half sum of residuals squared: ', func(X1) )
```

Fantastic: we've got the result we wanted (up to some small numerical errors)!

```
plane:  6.97320721967e-11  x +  8.6697674477e-11  y +  1.00000000004  z +  -1.00000000013  = 0
Lagrange multiplier:  2.02738966847e-12
half sum of residuals squared:  2.92992847468e-21
```

# Full code

```
import numpy as np
from scipy.optimize import fsolve

pts = [ [ 1, 0, 1 ],
        [ 0, 1, 1 ],
        [ 1, 1, 1 ] ]

def dist2( pt, params ):
    a, b, c, d = params
    err = a*pt[0] + b*pt[1] + c*pt[2] + d
    return 0.5 * err*err / ( a*a + b*b + c*c )

def func(X):
    a, b, c, d, L = X
    D = sum( dist2( pt, (a,b,c,d) ) for pt in pts )
    return D + L * ( a*a + b*b + c*c - 1 )

def dfunc(X):
    dLambda = np.zeros(len(X))
    h = 1e-8 # step size used in the finite difference
    for i in range(len(X)):
        dX = np.zeros(len(X))
        dX[i] = h
        dLambda[i] = ( func(X+dX) - func(X-dX) ) / (2*h)
    return dLambda

X1 = fsolve(dfunc, [1.0/np.sqrt(3), 1.0/np.sqrt(3), 1.0/np.sqrt(3), 0, 0])

print( 'plane: ', X1[0], ' x + ', X1[1], ' y + ', X1[2], ' z + ', X1[3], ' = 0' )
print( 'Lagrange multiplier: ', X1[4] )
print( 'half sum of residuals squared: ', func(X1) )
```
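As an independent check of the result, here is a sketch of the standard closed-form approach to the same problem: the best-fit plane passes through the centroid of the points, and its normal is the right singular vector of the centred points corresponding to the smallest singular value (the direction of least variance).

```python
import numpy as np

pts = np.array([ [ 1, 0, 1 ],
                 [ 0, 1, 1 ],
                 [ 1, 1, 1 ] ], dtype=float)

centroid = pts.mean(axis=0)          # the best-fit plane passes through the centroid
U, S, Vt = np.linalg.svd(pts - centroid)
normal = Vt[-1]                      # right singular vector with smallest singular value
d = -np.dot(normal, centroid)

print(normal, d)  # a unit normal parallel to (0, 0, 1), with d = -(normal . centroid)
```

Up to the sign of the normal (both $\vec{n}$ and $-\vec{n}$ describe the same plane), this agrees with the Lagrange-multiplier solution above.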

# See also:

John D Cook has a nice blog post giving a geometric description of Lagrange multipliers.