Image and Video Compression, Part 12

It is pointed out by Bertero et al. (1988) that all the low-level processing tasks (also known as early vision) in computational vision are inverse problems and are often ill posed. Examples in low-level processing include motion recovery, computation of optical flow, edge detection, structure from stereo, structure from motion, structure from texture, shape from shading, and so on. Fortunately, the problem with early vision is mildly ill posed in general. By mildly, we mean that a reduction of errors in the data can significantly improve the solution.
Since the early 1960s, the demand for accurate approximations and stable solutions in areas such as optics, radioastronomy, microscopy, and medical imaging has stimulated great research efforts in inverse problems, resulting in a unified theory: the regularization theory of ill-posed problems (Tikhonov and Arsenin, 1977). In the discussion of optical flow methods, we shall see that some regularization techniques have been proposed and have improved accuracy in flow determination. More advanced algorithms continue to emerge.

13.1.4 CLASSIFICATION OF OPTICAL FLOW TECHNIQUES

Optical flow in image sequences provides important information regarding both motion and structure, and it is useful in such diverse fields as robot vision, autonomous navigation, and video coding.
Although this subject has been studied for more than a decade, reducing the error in the flow
estimation remains a difficult problem. A comprehensive review and a comparison of the accuracy
of various optical flow techniques have recently been made (Barron et al., 1994). So far, most of
the techniques in the optical flow computations use one of the following basic approaches:
• Gradient-based (Horn and Schunck, 1981; Lucas and Kanade, 1981; Nagel and Enkelmann, 1986; Uras et al., 1988; Szeliski et al., 1995; Black and Anandan, 1996),
• Correlation-based (Anandan, 1989; Singh, 1992; Pan et al., 1998),
• Spatiotemporal energy-based (Adelson and Bergen, 1985; Heeger, 1988; Bigun et al.,
1991),
• Phase-based (Waxman et al., 1988; Fleet and Jepson, 1990).
Besides these deterministic approaches, there is the stochastic approach to optical flow computation (Konrad and Dubois, 1992). In this chapter we focus our discussion of optical flow on the
gradient-based and correlation-based techniques because of their frequent applications in practice
and fundamental importance in theory. We also discuss multiple attribute techniques in optical flow
determination. The other two approaches will be briefly touched upon when we discuss new
techniques in motion estimation in the next chapter.

13.2 GRADIENT-BASED APPROACH

It is noted that before the methods of optical flow determination were actually developed, optical
flow had been discussed and exploited for motion and structure recovery from image sequences in
computer vision for years. That is, the optical flow field was assumed to be available in the study
of motion recovery. The first type of methods in optical flow determination is referred to as gradient-
based techniques. This is because the spatial and temporal partial derivatives of the intensity function are utilized in these techniques. In this section, we present the Horn and Schunck algorithm, which is regarded as the most prominent representative of this category. After the basic concepts are presented, some other methods in this category are briefly discussed.

13.2.1 THE HORN AND SCHUNCK METHOD

We shall begin with a very general framework (Shi et al., 1994) to derive a brightness time-
invariance equation. We then introduce the Horn and Schunck method.


13.2.1.1 Brightness Invariance Equation

As stated in Chapter 10, the imaging space can be represented by

f(x, y, t, \vec{s}),    (13.1)

where \vec{s} indicates the sensor's position in 3-D world space, i.e., the coordinates of the sensor center and the orientation of the optical axis of the sensor. The \vec{s} is a 5-D vector. That is, \vec{s} = (\tilde{x}, \tilde{y}, \tilde{z}, \beta, \gamma), where \tilde{x}, \tilde{y}, and \tilde{z} represent the coordinates of the optical center of the sensor in 3-D world space; and \beta and \gamma represent the orientation of the optical axis of the sensor in 3-D world space, the Euler angles, pan and tilt, respectively.
With this very general notion, each picture, which is taken by a sensor located at a particular position at a specific moment, is merely a special cross section of this imaging space. Both temporal and spatial image sequences become proper subsets of the imaging space.
Assume now a world point P in 3-D space that is perspectively projected onto the image plane as a pixel with the coordinates x_P and y_P. Then, x_P and y_P are also dependent on t and \vec{s}. That is,

f = f\left(x_P(t, \vec{s}),\, y_P(t, \vec{s}),\, t,\, \vec{s}\right).    (13.2)
If the optical radiation of the world point P is invariant with respect to the time interval from t_1 to t_2, we then have

f\left(x_P(t_1, \vec{s}_1),\, y_P(t_1, \vec{s}_1),\, t_1,\, \vec{s}_1\right) = f\left(x_P(t_2, \vec{s}_1),\, y_P(t_2, \vec{s}_1),\, t_2,\, \vec{s}_1\right).    (13.3)
This is the brightness time-invariance equation.
At a specific moment t_1, if the optical radiation of P is isotropic, we then get

f\left(x_P(t_1, \vec{s}_1),\, y_P(t_1, \vec{s}_1),\, t_1,\, \vec{s}_1\right) = f\left(x_P(t_1, \vec{s}_2),\, y_P(t_1, \vec{s}_2),\, t_1,\, \vec{s}_2\right).    (13.4)

This is the brightness space-invariance equation.
If both conditions are satisfied, we get the brightness time-and-space-invariance equation, i.e.,

f\left(x_P(t_1, \vec{s}_1),\, y_P(t_1, \vec{s}_1),\, t_1,\, \vec{s}_1\right) = f\left(x_P(t_2, \vec{s}_2),\, y_P(t_2, \vec{s}_2),\, t_2,\, \vec{s}_2\right).    (13.5)
Consider two brightness functions f\left(x(t, \vec{s}), y(t, \vec{s}), t, \vec{s}\right) and f\left(x(t + \Delta t, \vec{s} + \Delta\vec{s}), y(t + \Delta t, \vec{s} + \Delta\vec{s}), t + \Delta t, \vec{s} + \Delta\vec{s}\right), in which the variation in time, \Delta t, and the variation in the spatial position of the sensor, \Delta\vec{s}, are very small. Due to the time-and-space-invariance of brightness, we can get

f\left(x(t, \vec{s}),\, y(t, \vec{s}),\, t,\, \vec{s}\right) = f\left(x(t + \Delta t, \vec{s} + \Delta\vec{s}),\, y(t + \Delta t, \vec{s} + \Delta\vec{s}),\, t + \Delta t,\, \vec{s} + \Delta\vec{s}\right).    (13.6)
The expansion of the right-hand side of the above equation in the Taylor series at (t, \vec{s}) and the use of Equation 13.5 lead to

\left(\frac{\partial f}{\partial x} u + \frac{\partial f}{\partial y} v + \frac{\partial f}{\partial t}\right) \Delta t + \left(\frac{\partial f}{\partial x} u_{\vec{s}} + \frac{\partial f}{\partial y} v_{\vec{s}} + \frac{\partial f}{\partial \vec{s}}\right) \Delta\vec{s} + \varepsilon = 0,    (13.7)
where

u = \frac{\partial x}{\partial t}, \quad v = \frac{\partial y}{\partial t}, \quad u_{\vec{s}} = \frac{\partial x}{\partial \vec{s}}, \quad v_{\vec{s}} = \frac{\partial y}{\partial \vec{s}}.

If \Delta\vec{s} = 0, i.e., the sensor is static in a fixed spatial position (in other words, both the coordinates of the optical center of the sensor and its optical axis direction remain unchanged), dividing both sides of the equation by \Delta t and evaluating the limit as \Delta t \to 0 degenerates Equation 13.7 into

\frac{\partial f}{\partial x} u + \frac{\partial f}{\partial y} v + \frac{\partial f}{\partial t} = 0.    (13.8)

If \Delta t = 0, both sides are divided by \Delta\vec{s}, and the limit \Delta\vec{s} \to 0 is examined. Equation 13.7 then reduces to

\frac{\partial f}{\partial x} u_{\vec{s}} + \frac{\partial f}{\partial y} v_{\vec{s}} + \frac{\partial f}{\partial \vec{s}} = 0.    (13.9)
When \Delta t = 0, i.e., at a specific moment, the images generated by sensors at different spatial positions can be viewed as a spatial sequence of images. Equation 13.9 is, then, the equation for the spatial sequence of images.
For the sake of brevity, we will focus on the gradient-based approach to optical flow determination with respect to temporal image sequences. That is, in the rest of this section we will address
only Equation 13.8. It is noted that the derivation can be extended to spatial image sequences. The
optical flow technique for spatial image sequences is useful in stereo image data compression. It
plays an important role in motion and structure recovery. Interested readers are referred to Shi et al.
(1994) and Shu and Shi (1993).
13.2.1.2 Smoothness Constraint
A careful examination of Equation 13.8 reveals that we have two unknowns: u and v, i.e., the
horizontal and vertical components of an optical flow vector at a three-tuple (x, y, t), but only one
equation to relate them. This once again demonstrates the ill-posed nature of optical flow determination. This also indicates that there is no way to compute optical flow by considering a single
point of the brightness pattern moving independently. As stated in Section 13.1.3, some regularization measure, here an extra constraint, must be taken to overcome the difficulty.
The most popular constraint was proposed by Horn and Schunck and is referred to as the smoothness constraint. As the name implies, it constrains flow vectors to vary smoothly from one point to the next. Clearly, this is true for points in the brightness pattern most of the time, particularly for
points belonging to the same object. It may be violated, however, along moving boundaries.
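To see the underdetermination of Equation 13.8 concretely, the following sketch (Python; the gradient values are hypothetical, chosen only for illustration) shows that a whole line of flow vectors (u, v) satisfies the constraint at a single pixel, and that only the component along the spatial gradient, the so-called normal flow, is determined:

```python
import numpy as np

# Hypothetical gradient measurements at a single pixel (illustrative only).
fx, fy, ft = 1.0, 2.0, -3.0

# Equation 13.8, fx*u + fy*v + ft = 0, is one equation in two unknowns:
# every (u, v) on this line satisfies it.
for u in (0.0, 1.0, 2.0):
    v = -(ft + fx * u) / fy
    assert abs(fx * u + fy * v + ft) < 1e-12
    print(f"(u, v) = ({u:.1f}, {v:.2f}) satisfies Equation 13.8")

# Only the flow component along the spatial gradient is recoverable.
normal_flow = -ft / np.hypot(fx, fy)
print(f"normal flow magnitude: {normal_flow:.3f}")
```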
Mathematically, the smoothness constraint is imposed in optical flow determination by minimizing
the square of the magnitude of the gradient of the optical flow vectors:
\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2.    (13.10)
It can be easily verified that the smoother the flow vector field, the smaller these quantities. Actually,
the square of the magnitude of the gradient of intensity function with respect to the spatial
coordinates, summed over a whole image or an image region, has been used as a smoothness
measure of the image or the image region in the digital image processing literature (Gonzalez and
Woods, 1992).
13.2.1.3 Minimization
Optical flow determination can then be converted into a minimization problem.
The square of the left-hand side of Equation 13.8, which can be derived from the brightness
time-invariance equation, represents one type of error. It may be caused by quantization noise or
other noises and can be written as
\varepsilon_b^2 = \left(\frac{\partial f}{\partial x} u + \frac{\partial f}{\partial y} v + \frac{\partial f}{\partial t}\right)^2.    (13.11)
The smoothness measure expressed in Equation 13.10 denotes another type of error, which is
\varepsilon_s^2 = \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2.    (13.12)
The total error to be minimized is
\varepsilon^2 = \sum_y \sum_x \left(\varepsilon_b^2 + \alpha^2 \varepsilon_s^2\right) = \sum_y \sum_x \left[\left(\frac{\partial f}{\partial x} u + \frac{\partial f}{\partial y} v + \frac{\partial f}{\partial t}\right)^2 + \alpha^2 \left(\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2\right)\right],    (13.13)
where \alpha is a weight between these two types of errors. The optical flow quantities u and v can be found by minimizing the total error. Using the calculus of variations, Horn and Schunck derived the following pair of equations for the two unknowns u and v at each pixel in the image:
f_x^2 u + f_x f_y v = \alpha^2 \nabla^2 u - f_x f_t,
f_x f_y u + f_y^2 v = \alpha^2 \nabla^2 v - f_y f_t,    (13.14)
where f_x = \frac{\partial f}{\partial x}, f_y = \frac{\partial f}{\partial y}, f_t = \frac{\partial f}{\partial t}, and \nabla^2 denotes the Laplacian operator. The Laplacians of u and v are defined below:

\nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}, \qquad \nabla^2 v = \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}.    (13.15)
13.2.1.4 Iterative Algorithm
Instead of using a classical algebraic method to solve the pair of equations for u and v, Horn and Schunck adopted the Gauss-Seidel method (Ralston and Rabinowitz, 1978) to obtain the following iterative procedure:
u^{k+1} = \bar{u}^k - \frac{f_x \left[f_x \bar{u}^k + f_y \bar{v}^k + f_t\right]}{\alpha^2 + f_x^2 + f_y^2}, \qquad v^{k+1} = \bar{v}^k - \frac{f_y \left[f_x \bar{u}^k + f_y \bar{v}^k + f_t\right]}{\alpha^2 + f_x^2 + f_y^2},    (13.16)
where the superscripts k and k + 1 are indexes of iteration, and \bar{u}, \bar{v} are the local averages of u and v, respectively. Horn and Schunck define \bar{u} and \bar{v} as follows:

\bar{u} = \frac{1}{6}\left\{u(x, y+1) + u(x, y-1) + u(x+1, y) + u(x-1, y)\right\} + \frac{1}{12}\left\{u(x-1, y-1) + u(x-1, y+1) + u(x+1, y-1) + u(x+1, y+1)\right\},

\bar{v} = \frac{1}{6}\left\{v(x, y+1) + v(x, y-1) + v(x+1, y) + v(x-1, y)\right\} + \frac{1}{12}\left\{v(x-1, y-1) + v(x-1, y+1) + v(x+1, y-1) + v(x+1, y+1)\right\}.    (13.17)
The estimation of the partial derivatives of the intensity function and of the Laplacians of the flow vectors needs to be addressed. Horn and Schunck considered a 2 × 2 × 2 spatiotemporal neighborhood, shown in Figure 13.3, for the estimation of the partial derivatives f_x, f_y, and f_t:

f_x \approx \frac{1}{4}\left\{\left[f(x+1, y, t) - f(x, y, t)\right] + \left[f(x+1, y+1, t) - f(x, y+1, t)\right] + \left[f(x+1, y, t+1) - f(x, y, t+1)\right] + \left[f(x+1, y+1, t+1) - f(x, y+1, t+1)\right]\right\},

f_y \approx \frac{1}{4}\left\{\left[f(x, y+1, t) - f(x, y, t)\right] + \left[f(x+1, y+1, t) - f(x+1, y, t)\right] + \left[f(x, y+1, t+1) - f(x, y, t+1)\right] + \left[f(x+1, y+1, t+1) - f(x+1, y, t+1)\right]\right\},

f_t \approx \frac{1}{4}\left\{\left[f(x, y, t+1) - f(x, y, t)\right] + \left[f(x+1, y, t+1) - f(x+1, y, t)\right] + \left[f(x, y+1, t+1) - f(x, y+1, t)\right] + \left[f(x+1, y+1, t+1) - f(x+1, y+1, t)\right]\right\}.

FIGURE 13.3 Estimation of f_x, f_y, and f_t.

Note that replacing the first-order differentiation by the first-order difference is a common practice in processing digital images. The arithmetic average can remove the noise effect, thus making the obtained first-order differences less sensitive to various noises.
The Laplacians of u and v are approximated by

\nabla^2 u \approx \bar{u}(x, y) - u(x, y), \qquad \nabla^2 v \approx \bar{v}(x, y) - v(x, y).    (13.18)

Equivalently, the Laplacians of u and v, \nabla^2 u and \nabla^2 v, can be obtained by applying a 3 × 3 window operator, shown in Figure 13.4, to each point in the u and v planes, respectively.

FIGURE 13.4 A 3 × 3 window operator for estimation of the Laplacian of the flow vector.
Similar to the pel recursive technique discussed in the previous chapter, there are two different
ways to iterate. One way is to iterate at a pixel until a solution is steady. Another way is to iterate
only once for each pixel. In the latter case, a good initial flow vector is required and is usually
derived from the previous pixel.
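As an illustration of how Equations 13.16 through 13.18 fit together, here is a minimal sketch in Python with NumPy. The function name, the choice of α = 10, and the fixed iteration count are assumptions for illustration, not values from the text; the derivative estimates follow the 2 × 2 × 2 cube of Figure 13.3 and the averaging weights follow Equation 13.17. Sign and boundary conventions vary between implementations.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(f1, f2, alpha=10.0, n_iter=100):
    """Sketch of Horn-Schunck flow between two gray-level frames f1 and f2."""
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)

    # First differences averaged over the 2x2x2 cube (Figure 13.3 estimates).
    kx = 0.25 * np.array([[-1.0, 1.0], [-1.0, 1.0]])
    ky = 0.25 * np.array([[-1.0, -1.0], [1.0, 1.0]])
    kt = 0.25 * np.ones((2, 2))
    fx = convolve(f1, kx) + convolve(f2, kx)
    fy = convolve(f1, ky) + convolve(f2, ky)
    ft = convolve(f2, kt) - convolve(f1, kt)

    # Weighted-average kernel of Equation 13.17.
    avg = np.array([[1/12, 1/6, 1/12],
                    [1/6,  0.0, 1/6 ],
                    [1/12, 1/6, 1/12]])

    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_bar = convolve(u, avg)               # local averages (Equation 13.17)
        v_bar = convolve(v, avg)
        common = (fx * u_bar + fy * v_bar + ft) / (alpha**2 + fx**2 + fy**2)
        u = u_bar - fx * common                # update pair (Equation 13.16)
        v = v_bar - fy * common
    return u, v
```

Running the loop to convergence corresponds to the first iteration style above; a single pass with u and v warm-started from previous estimates corresponds to the second.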
13.2.2 MODIFIED HORN AND SCHUNCK METHOD
Observing that the first-order difference is used to approximate the first-order differentiation in
Horn and Schunck’s original algorithm, and regarding this as a relatively crude form and a source
of error, Barron, Fleet, and Beauchemin developed a modified version of the Horn and Schunck
method (Barron et al., 1994).
It features spatiotemporal presmoothing and a more advanced approximation of differentiation. Specifically, it uses a Gaussian filter as a spatiotemporal prefilter. By the term Gaussian filter,
we mean a low-pass filter with a mask shaped similarly to that of the Gaussian probability density
function. This is similar to what was utilized in the formulation of the Gaussian pyramid, which
was discussed in Chapter 11. The term spatiotemporal means that the Gaussian filter is used for
low-pass filtering in both spatial and temporal domains.
With respect to the more advanced approximation of differentiation, a four-point central difference operator is used, which has the mask shown in Figure 13.5.

FIGURE 13.5 Four-point central difference operator mask.
As we will see later in this chapter, this modified Horn and Schunck algorithm has achieved
better performance than the original one as a result of the two above-mentioned measures. This
success indicates that a reduction of noise in image (data) leads to a significant reduction of noise
in optical flow (solution). This example supports the statement we mentioned earlier that the ill-
posed problem in low-level computational vision is mildly ill posed.
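A brief sketch of these two measures is given below (Python). The 1/12 (−1, 8, 0, −8, 1) kernel is the usual four-point central difference; the standard deviation σ = 1.5 is an illustrative choice, not a value quoted from the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def presmooth_and_differentiate(seq, sigma=1.5):
    """seq: 3-D array of frames indexed (t, y, x).

    Sketch of the modified Horn and Schunck measures: spatiotemporal
    Gaussian presmoothing followed by four-point central differences.
    """
    # Gaussian low-pass filtering in both the temporal and spatial domains.
    g = gaussian_filter(seq.astype(np.float64), sigma=sigma)

    # Four-point central difference mask (five taps, four of them nonzero).
    d = np.array([-1.0, 8.0, 0.0, -8.0, 1.0]) / 12.0

    ft = convolve1d(g, d, axis=0)  # temporal derivative estimate
    fy = convolve1d(g, d, axis=1)  # vertical derivative estimate
    fx = convolve1d(g, d, axis=2)  # horizontal derivative estimate
    return fx, fy, ft
```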
13.2.3 THE LUCAS AND KANADE METHOD
Lucas and Kanade assume that the flow vector is constant within a small neighborhood of a pixel, denoted by \Omega. They then form a weighted objective function as follows:

\sum_{(x, y) \in \Omega} w^2(x, y) \left[\frac{\partial f(x, y, t)}{\partial x} u + \frac{\partial f(x, y, t)}{\partial y} v + \frac{\partial f(x, y, t)}{\partial t}\right]^2,    (13.19)

where w(x, y) is a window function that gives more weight to the central portion than to the surrounding portion of the neighborhood \Omega.
The flow determination thus becomes a problem of least-squares fitting of the brightness invariance constraint. We observe that a smoothness constraint is implied in Equation 13.19, where the flow vector is assumed to be constant within \Omega.
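A minimal sketch of the resulting least-squares fit at one pixel is given below (Python). The derivative arrays fx, fy, ft are assumed to have been estimated as in Section 13.2; the window size and the Gaussian-like weights are illustrative assumptions.

```python
import numpy as np

def lucas_kanade_at(fx, fy, ft, x, y, half=2):
    """Sketch: least-squares flow at pixel (x, y) per Equation 13.19."""
    # Gather derivatives over the (2*half+1)^2 neighborhood Omega.
    sl = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    gx, gy, gt = fx[sl].ravel(), fy[sl].ravel(), ft[sl].ravel()

    # Window function w(x, y): more weight at the center of Omega.
    n = 2 * half + 1
    g1 = np.exp(-0.5 * (np.arange(n) - half) ** 2)
    w2 = np.outer(g1, g1).ravel() ** 2

    # Minimize sum of w^2 (gx*u + gy*v + gt)^2 via the normal equations.
    A = np.stack([gx, gy], axis=1)
    ATA = A.T @ (w2[:, None] * A)       # 2x2 matrix
    ATb = A.T @ (w2 * gt)
    # Note: ATA is singular where the aperture problem leaves the flow
    # underdetermined (e.g., uniform regions, straight edges).
    u, v = np.linalg.solve(ATA, -ATb)
    return u, v
```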
13.2.4 THE NAGEL METHOD
Nagel first used second-order derivatives in optical flow determination in the very early days (Nagel, 1983). Since the brightness function f(x, y, t, \vec{s}) is a real-valued function of multiple variables (or a vector of variables), the Hessian matrix, discussed in Chapter 12, is used for the second-order derivatives.
An oriented-smoothness constraint was developed by Nagel that prohibits imposition of the
smoothness constraint across edges, as illustrated in Figure 13.6. In the figure, an edge AB separates
two different moving regions: region 1 and region 2. The smoothness constraint is imposed in these
regions separately. That is, no smoothness constraint is imposed across the edge. Obviously, it
would be a disaster if we smoothed the flow vectors across the edge. As a result, this reasonable
treatment effectively improves the accuracy of optical flow estimation (Nagel, 1989).

FIGURE 13.6 Oriented-smoothness constraint.
13.2.5 THE URAS, GIROSI, VERRI, AND TORRE METHOD
The Uras, Girosi, Verri, and Torre method is another method that uses second-order derivatives.
Based on a local procedure, it performs quite well (Uras et al., 1988).
13.3 CORRELATION-BASED APPROACH
The correlation-based approach to optical flow determination is similar to block matching, covered
in Chapter 11. As may be recalled, the conventional block-matching technique partitions an image
into nonoverlapped, fixed-size, rectangular blocks. Then, for each block, the best match in the
previous image frame is found. In doing so, a search window is opened in the previous frame
according to some a priori knowledge: the time interval between the two frames and the maximum
possible moving velocity of objects in frames. Centered on each of the candidate pixels in the search window, a rectangular correlation window of the same size as the original block is opened.
The best-matched block in the search window is chosen such that either the similarity measure is
maximized or the dissimilarity measure is minimized. The relative spatial position between these
two blocks (the original block in the current frame and the best-matched one in the previous frame)
gives a translational motion vector to the original block. In the correlation-based approach to optical
flow computation, the mechanism is very similar to that in conventional block matching. The only
difference is that for each pixel in an image, we open a rectangle correlation window centered on
this pixel for which an optical flow vector needs to be determined. It is for this correlation window
that we find the best match in the search window in its temporal neighboring image frame. This
is shown in Figure 13.7. A comparison between Figures 13.7 and 11.1 can convince us about the
above observation. In this section, we first briefly discuss Anandan's method, which is pioneering work in this category. Then Singh's method is described, and his unified view of optical flow computation is introduced. We then present a correlation-feedback method by Pan, Shi, and Shu, which uses a feedback technique in flow calculation.
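To make this per-pixel mechanism concrete, here is a sketch of matching a correlation window under an SSD dissimilarity measure (Python; the window and search radii are arbitrary illustrative choices, and the pixel is assumed to lie far enough from the image border):

```python
import numpy as np

def flow_at_pixel(cur, prev, x, y, half=2, search=4):
    """Sketch of correlation-based flow for one pixel of frame `cur`.

    A (2*half+1)^2 correlation window centered on (x, y) is compared,
    under the SSD measure, with every candidate position within a
    +/- `search` displacement in the previous frame `prev`.
    """
    win = cur[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    best_ssd, best = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cy, cx = y + dy, x + dx   # candidate center in the previous frame
            cand = prev[cy - half:cy + half + 1, cx - half:cx + half + 1]
            ssd = np.sum((win - cand) ** 2)
            if ssd < best_ssd:
                best_ssd, best = ssd, (dx, dy)
    # `best` is the displacement of the best match in the previous frame;
    # its negative points from the previous frame to the current one.
    return best
```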
13.3.1 THE ANANDAN METHOD
As mentioned in Chapter 11, the sum of squared differences (SSD) is used as a dissimilarity measure in Anandan (1987). It is essentially a simplified version of the well-known mean square error
(MSE). Due to its simplicity, it is used in the methods developed by Singh (1992), and Pan, Shi,
and Shu (1998).
In the Anandan method (Anandan, 1989), a pyramid structure is formed, and it can be used
for an efficient coarse-fine search. This is very similar to the multiresolution block-matching
techniques discussed in Chapter 11. In the higher levels (with lower resolution) of the pyramid, a
full search can be performed without a substantial increase in computation. The estimated velocity
(or displacement) vector can be propagated to the lower levels (with higher resolution) for further
refinement. As a result, a relatively large motion vector can be estimated with a certain degree of
accuracy.
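A sketch of this coarse-to-fine strategy is given below (Python). The `estimate_flow` callable is a hypothetical per-level estimator, for instance an SSD search seeded with the propagated vectors; the doubling of the propagated vectors is the standard scale correction between adjacent levels.

```python
import numpy as np
from scipy.ndimage import zoom

def coarse_to_fine_flow(pyr_cur, pyr_prev, estimate_flow):
    """Sketch of coarse-to-fine flow estimation over image pyramids.

    pyr_cur, pyr_prev: lists of images, index 0 = finest level.
    estimate_flow(cur, prev, u0, v0): hypothetical per-level estimator
    that refines the initial field (u0, v0) and returns (u, v).
    """
    u = v = None
    for k in range(len(pyr_cur) - 1, -1, -1):        # coarsest to finest
        shape = pyr_cur[k].shape
        if u is None:
            u, v = np.zeros(shape), np.zeros(shape)
        else:
            # Upsample the coarser estimate to this level's size and double
            # the vectors: one pixel at level k+1 spans two at level k.
            u = 2.0 * zoom(u, (shape[0] / u.shape[0], shape[1] / u.shape[1]), order=1)
            v = 2.0 * zoom(v, (shape[0] / v.shape[0], shape[1] / v.shape[1]), order=1)
        u, v = estimate_flow(pyr_cur[k], pyr_prev[k], u, v)
    return u, v
```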
Instead of the Gaussian pyramid discussed in Chapter 11, however, a Laplacian pyramid is used
here. To understand the Laplacian pyramid, let us take a look at Figure 13.8(a). There, two consecutive levels in a Gaussian pyramid structure are shown: level k, denoted by f_k(x, y), and level k + 1, denoted by f_{k+1}(x, y). Figure 13.8(b) shows how level k + 1 can be derived from level k in the Gaussian pyramid. That is, as stated in Chapter 11, level k + 1 in the Gaussian pyramid can be obtained through low-pass filtering applied to level k, followed by subsampling. In Figure 13.8(c), level k + 1 is first interpolated, thus producing an estimate of level k, \hat{f}_k(x, y). The difference between the original level k and the interpolated estimate of level k generates an error at level k, denoted by e_k(x, y). If there are no quantization errors involved, then level k, f_k(x, y), can be recovered completely from the interpolated estimate of level k, \hat{f}_k(x, y), and the error at level k, e_k(x, y). That is,
f_k(x, y) = \hat{f}_k(x, y) + e_k(x, y).    (13.20)
With quantization errors, however, the recovery of level k, f_k(x, y), is not error free. It can be shown that coding \hat{f}_k(x, y) and e_k(x, y) is more efficient than directly coding f_k(x, y).
FIGURE 13.7 Correlation-based approach to optical flow determination: the current frame f(x, y, t) and the previous frame f(x, y, t − 1).
A set of images e_k(x, y), k = 0, 1, …, K − 1, together with f_K(x, y), forms a Laplacian pyramid. Figure 13.8(d) displays a Laplacian pyramid with K = 5. It can be shown that Laplacian pyramids provide an efficient way for image coding (Burt and Adelson, 1983). A more detailed description of Gaussian and Laplacian pyramids can be found in Burt (1984) and Lim (1990).
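The construction and the exact recovery of Equation 13.20 can be sketched as follows (Python; the σ = 1.0 low-pass filter and the bilinear interpolation are simple stand-ins for the filters of Chapter 11, not the book's exact choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def laplacian_pyramid(f0, K=5):
    """Return [e_0, ..., e_{K-1}, f_K], the Laplacian pyramid of image f0."""
    levels, fk = [], f0.astype(np.float64)
    for _ in range(K):
        f_next = gaussian_filter(fk, sigma=1.0)[::2, ::2]  # level k+1
        f_hat = zoom(f_next, 2.0, order=1)[: fk.shape[0], : fk.shape[1]]
        levels.append(fk - f_hat)          # e_k = f_k - f_hat_k
        fk = f_next
    levels.append(fk)                      # coarsest level f_K
    return levels

def reconstruct(levels):
    """Invert Equation 13.20, f_k = f_hat_k + e_k, from coarse to fine."""
    fk = levels[-1]
    for e in reversed(levels[:-1]):
        f_hat = zoom(fk, 2.0, order=1)[: e.shape[0], : e.shape[1]]
        fk = f_hat + e
    return fk                              # equals f0 exactly (no quantization)
```

Without quantization the reconstruction is exact, which is precisely the property stated in Equation 13.20.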
13.3.2 THE SINGH METHOD
Singh (1991, 1992) presented a unified point of view on optical flow computation. He classified
the information available in image sequences for optical flow determination into two categories:
conservation information and neighborhood information. Conservation information is the information assumed to be conserved from one image frame to the next in flow estimation. Intensity is an
example of conservation information, which is used most frequently in flow computation. Clearly,
the brightness invariance constraint in the Horn and Schunck method is another way to state this
type of conservation. Some functions of intensity may be used as conservation information as well.
FIGURE 13.8 Laplacian pyramid. (a) Two consecutive levels (level k and level k + 1) in a Gaussian pyramid structure. (b) Derivation of level k + 1 from level k. (c) Derivation of the error at level k in a Laplacian pyramid. (d) Structure of a Laplacian pyramid.
