Edge detection: The Scale Space Theory

Consider an image as a bounded function \( f: \square_2 \to \mathbb{R} \) with no smoothness or structure assumptions a priori. Most relevant information of a given image is contained in the contours of the mapped objects: Think for example of a bright object against a dark background—the area where these two meet presents a curve where the intensity \( f(\boldsymbol{x}) \) varies strongly. This is what we refer to as an “edge.”

Initially, we may consider the process of detection of an edge by the simple computation of the gradient \( \nabla f(\boldsymbol{x}) = \big( \tfrac{\partial f}{\partial x_1}, \tfrac{\partial f}{\partial x_2} \big): \) This gradient should have a large intensity \( \lVert \nabla f(\boldsymbol{x}) \rVert \) and a direction \( \tfrac{\nabla f(\boldsymbol{x})}{\lVert \nabla f(\boldsymbol{x}) \rVert} \) which indicates the perpendicular to the curve. It therefore looks sound to simply compute the gradient of \( f \) and choose the points where these values are large. This conclusion is a bit unrealistic for two reasons:

The points where the gradient is larger than a given threshold are open sets, and thus don’t have the structure of curves.
Large gradient may arise in certain locations of the image due to tiny oscillations or noise, but completely unrelated to the objects being mapped. As a matter of fact, there is no reason to assume the existence or computability of any gradient at all in a given digital image.

The second objection is solved by defining a smoothing process: We associate with the image a smoothed version \( f_t(\boldsymbol{x}) \) depending upon a scale parameter \( t \) measuring the amount of smoothing. This smoothing can be made by convolution of the image with a gaussian kernel of increasing width:

\begin{equation} f_t(\boldsymbol{x}) = \big( f \ast G_t \big)(\boldsymbol{x}), \text{ where } G_t(\boldsymbol{x}) = \dfrac{1}{4\pi t} \exp \bigg( -\dfrac{x^2+y^2}{4t} \bigg). \end{equation}

The first objection is solved by defining edge points not as points where the gradient is large alone, but as points where some maximality property of the gradient is observed. Let us take an example in dimension one:

Let \( f: \mathbb{R} \to \mathbb{R} \) be a \( C^2 \) function and consider points where \( \lvert f’(x) \rvert \) is maximal. At these points, the second derivative might change sign. Thus, we can look for the “edge points” of the smooth signal among the points where \( f’‘(x) \) crosses zero.

Generalizing this idea in two dimensions leads to the Hildreth-Marr edge detection theory, or alternatively to the Canny edge detection theory. The only difference is that Hildreth and Marr replace, in dimension two, \( f’‘(x) \) by the Laplacian of \( f: \)

\begin{equation} \mathop{\Delta}f(\boldsymbol{x}) = \dfrac{\partial^2 f}{\partial x_1^2}(\boldsymbol{x})+\dfrac{\partial^2 f}{\partial x_2^2}(\boldsymbol{x}). \end{equation}

Canny instead, gives up the linearity of the Laplacian and defines edge points as points where the gradient is maximal in the direction of the gradient. This means that an edge point satisfies \( g’(0)=0, \) where

\begin{equation} g(t) = \bigg\lVert \nabla f \big( \boldsymbol{x} + t \tfrac{\nabla f(\boldsymbol{x})}{\lVert \nabla f(\boldsymbol{x}) \rVert} \big) \bigg\rVert. \end{equation}

Note that this condition is equivalent to the simpler:

\begin{equation} \begin{pmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{pmatrix} \cdot \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2}& \dfrac{\partial^2 f}{\partial x_1 \partial x_2}\\ \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_2}& \dfrac{\partial^2 f}{\partial x_2^2} \end{pmatrix} \cdot \begin{pmatrix} \dfrac{\partial f}{\partial x_1}\\ \\ \dfrac{\partial f}{\partial x_2}\end{pmatrix} = 0. \end{equation}

Let us proceed to code these two edge detection algorithms: In both cases, we start by convolving the image with Gaussians \( G_t \) of increasing widths (usually for a sequence of dyadic scales \( t=2, 4, 8, 16, \dotsc, 2^m \)) [see Convolution with Gaussian kernels]

Hildreth-Marr

At each scale \( t, \) compute the zero-crossings of the Laplacian: those points \( \boldsymbol{x} \) where \( \nabla f(\boldsymbol{x}) \neq \boldsymbol{0}, \) and \( \mathop{\Delta}f \) changes sign.
Eliminate zero-crossings with small gradient.

Both gradient and Laplacian are easy to compute in sage:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
from numpy import *
import scipy.ndimage
import scipy.signal

image = scipy.misc.lena().astype(float32)

# Convolve with a Gaussian Gt for t=2
[X,Y]=numpy.mgrid[-6:7,-6:7]
A=X+i*Y
R=abs(A)
t=2.0
Gt=exp(-R*R/4.0/t)/4.0/pi/t
convImage = scipy.signal.convolve(image,Gt)

# Gradient
Grdnt = numpy.gradient(convImage)
# Magnitude of the gradient
magn  = sqrt( Grdnt[0]*Grdnt[0] + Grdnt[1]*Grdnt[1] )
# Laplacian
Lplz  = scipy.ndimage.laplace(float32(convImage))


Gradient (in blue) and perpendicular to the gradient (in red); detail from the center of the image

Magnitude of the gradient (`magn`)	Laplacian of the image (`Lplz`)

For the purposes of this post, we will consider that a pixel \( (x_1,x_2) \) is a zero-crossing, provided at least one of the following conditions holds:

Horizontal crossing:

\begin{equation} \mathop{\Delta} f(x_1-1,x_2) \cdot \mathop{\Delta} f(x_1+1,x_2) > 0. \end{equation}

Vertical crossing:

\begin{equation} \mathop{\Delta} f(x_1,x_2-1) \cdot \mathop{\Delta} f(x_1,x_2+1) >0. \end{equation}

NW-SE crossing:

\begin{equation} \mathop{\Delta} f(x_1-1,x_2-1) \cdot \mathop{\Delta} f(x_1+1,x_2+1) > 0. \end{equation}

NE-SW crossing:

\begin{equation} \mathop{\Delta} f(x_1+1,x_2-1) \cdot \mathop{\Delta} f(x_1-1,x_2+1) >0. \end{equation}

A quick way to code a matrix that collects all the pixels where any of these crossings occur can be done as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# Four matrices holding the crossings...
WWEE = zeros(Lplz.shape)
NNSS = zeros(Lplz.shape)
NWSE = zeros(Lplz.shape)
NESW = zeros(Lplz.shape) 

WWEE[1:Lplz.shape[0]-1,:] = sign(multiply(Lplz[0:Lplz.shape[0]-2,:],
                                          Lplz[2:Lplz.shape[0],:]))>0
NNSS[:,1:Lplz.shape[1]-1] = sign(multiply(Lplz[:,0:Lplz.shape[1]-2],
                                          Lplz[:,2:Lplz.shape[1]]))>0
NWSE[1:Lplz.shape[0]-1,1:Lplz.shape[1]-1] =
         sign(multiply(Lplz[0:Lplz.shape[0]-2,0:Lplz.shape[1]-2],
                       Lplz[2:Lplz.shape[0],2:Lplz.shape[1]]))>0
NESW[1:Lplz.shape[0]-1,1:Lplz.shape[1]-1] =
         sign(multiply(Lplz[2:Lplz.shape[0],0:Lplz.shape[1]-2],
                       Lplz[0:Lplz.shape[0]-2,2:Lplz.shape[1]]))>0

# ...combined into a single matrix holding all the crossings
ZeroCrossings = maximum( maximum( maximum(NNSS, WWEE), NWSE), NESW)

It only remains to throw away those zero-crossings (below, left) where the magnitude of the gradient is small. I decided to set as threshold \( \theta = 3. \) In that case, the edge map (below, right) is computed as follows:

1
2
theta=3.0
Edges = multiply(ZeroCrossings, magn>theta)


`ZeroCrossings`	`Edges`

We need to repeat the same procedure on the smoothing \( f_t \) for different scales, say the dyadic \( t=2^k \) for \( k=2, \dotsc, 5, \) and then collect all the corresponding results. Note that the same threshold \( \theta=3 \) must be applied to the magnitude of the different gradients.

Canny

At each scale \( t, \) compute points \( \boldsymbol{x} \) satisfying \( \nabla f(\boldsymbol{x})\neq \boldsymbol{0}, \) and where the expression below changes sign:

At each scale \( t \) fix a threshold \( \theta(t) \) and discard the points \( \boldsymbol{x} \) computed in the previous step that satisfy \( \lVert \nabla f(\boldsymbol{x}) \rVert \leq \theta(t). \)

We start by computing a matrix N where we collect, for each index \( (x_1,x_2), \) the corresponding value of the expression above. We perform on this matrix a similar search for zero-crossings as we computed for the algorithm of Hildreth-Marr:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
D1,D2=numpy.gradient(convImage)
D11,D12=numpy.gradient(D1)
D21,D22=numpy.gradient(D2)
N = D1*D11*D1 + D1*D12*D2 + D2*D21*D1 + D2*D22*D2

WWEE[1:N.shape[0]-1,:] = sign(N[0:N.shape[0]-2,:]*N[2:N.shape[0],:])>0
NNSS[:,1:N.shape[1]-1] = sign(N[:,0:N.shape[1]-2]*N[:,2:N.shape[1]])>0
NWSE[1:N.shape[0]-1,1:N.shape[1]-1] =
          sign(N[0:N.shape[0]-2,0:N.shape[1]-2]*
               N[2:N.shape[0],2:N.shape[1]])>0
NESW[1:N.shape[0]-1,1:N.shape[1]-1] =
          sign(N[2:N.shape[0],0:N.shape[1]-2]*
               N[0:N.shape[0]-2,2:N.shape[1]])>0

ZeroCrossings = maximum( maximum( maximum(NNSS, WWEE), NWSE), NESW)

theta=3.0
Edges=multiply(ZeroCrossings, magn>theta)


`ZeroCrossings (Canny)`	`Edges (Canny)`

As in the algorithm of Hildreth-Marr, we need to repeat the same procedure on the smoothing \( f_t \) for different dyadic scales and collect all the corresponding results. The difference is that here we allow a scale-dependent threshold \( \theta(t). \) A selection that works well is, for example, to choose the mean value of the magnitudes of the gradient of \( f_t. \) I encourage you to play around with different images, and investigate what characteristics will dictate a good choice in this case.

Miscellaneous

The Scale Space Theory

What do we do with all different edge maps at different scales? For most images it is enough to “choose” the edge map at a certain scale—that one that visually satisfies whatever constraints we might require for a particular problem. Sometimes it pays off to collect them all, or start with the edges at the lowest level scale, and add a selection of the new edges from the larger scales… There is no single best method, as far as I am concerned.

Codes for graphics

The code that produced most of the images above is straightforward, except maybe the addition of the vector fields of the gradient and its normal on top of the windowed image. This was accomplished with the following commands:

1
2
3
4
5
6
7
8
9
10
11
12
13
matplotlib.pyplot.figure()
# Plot the Gradient (note the switching of derivatives and sign)
matplotlib.pyplot.quiver(-D2[300:450:4,300:450:4],D1[300:450:4,300:450:4],
                                  color='b', pivot='mid', units='x')
# Plot the perpendicular to the Gradient
matplotlib.pyplot.quiver(D1[300:450:4,300:450:4],D2[300:450:4,300:450:4],
                                  color='r', pivot='mid', units='x')
matplotlib.pyplot.quiver(-D1[300:450:4,300:450:4],-D2[300:450:4,300:450:4],
                                  color='r', pivot='mid', units='x')
# Place the corresponding window of the image as background
matplotlib.pyplot.gray()
matplotlib.pyplot.imshow(float32(convImage[300:450:4,300:450:4]))
matplotlib.pyplot.savefig('/Users/blanco/Desktop/quiver.png')

Note that the first and second derivatives seem to have switched places. One needs to be very careful when dealing with matrices of numbers, since the logical position of the pixels depends very heavily the software/language/libraries that handles them. With sage this is a constant issue, especially since we might be using different (and unrelated) libraries to deal with the same objects.