published in journal of Visual Communication and Image Representation
A Method for Sparse Disparity Densification
using Voting Mask Propagation
Jarno Ralli
1
, Javier D
´
ıaz
1
, and Eduardo Ros
1
jarno@ralli.fi, jdiaz@atc.ugr.es, eros@atc.ugr.es
1
Departamento de Arquitectura y Tecnolog
´
ıa de Computadores
Abstract
We describe a novel method for propagating disparity values using directional
masks and a voting scheme. The driving force of the propagation direction is image
gradient, making the process anisotropic, whilst ambiguities between propagated
values are resolved using a voting scheme. This kind of anisotropic densification
process achieves significant density enhancement at a very low error cost: in some
cases erroneous disparities are voted out, resulting not only in a denser but also a
more accurate final disparity map. Due to the simplicity of the method it is suitable
for embedded implementation and can also be included as part of a system-on-chip
(SOC). Therefore it can be of great interest to the sector of the machine vision
community that deals with embedded and/or real-time applications.
1 Introduction
Disparity is originally defined as being the horizontal difference of a 3D point being
projected on two adjacent imaging devices (e.g. stereo-rig) and if both intrinsic- and
extrinsic parameters of the imaging system are known, complete 3D-reconstruction of
the scene is possible. However, even if we do not know all the necessary parameters to
do a complete 3D-reconstruction, disparity still conveys relative information of the 3D
structures of the scene which can also be useful.
Disparity extraction models are based on local or global optimization methods that
minimize (or maximize) matching cost of image features between two or more images.
Practically all the sparse models have some kind of threshold or other parameter, ei-
ther implicit or explicit, which affects density and at the same time error in the derived
disparity map [18][3]. On the other hand the global methods that minimize energy
functions within the whole scene, through local operations, usually derive a dispar-
ity map that is typically 100% dense (sometimes detecting occlusions as well). Such
global minimization can be done using different approaches such as variational meth-
ods [7][1][2][4][6] or graph cuts [9][10]. Typically, depending upon the sparse method
Escuela T
´
ecnica Superior de Ingenier
´
ıa Informatica y de Telecomunicac
´
ıon, Universidad de Granada,
Calle Periodista Daniel Saucedo Aranda s/n, E-18071 Granada, Spain
www.jarnoralli.fi 1 Author: Jarno Ralli
used, as density increases, after a certain limit error also starts to increase concomi-
tantly. Therefore, it is worth calculating a less dense, high-confidence disparity map
and afterwards increasing the density by propagating the correct disparity values. In
this way we achieve better accuracy vs density trade-off than by directly reducing the
reliability threshold and thus increasing density at the expense of higher error. Many
global, dense, disparity calculation methods have built-in mechanisms for propagat-
ing/diffusing disparity [1][17] but sparse methods usually lack this capacity. To the
best of our knowledge there are very few independent propagation methods, apart from
interpolation [14] and diffusion [22], that can be applied as a post-processing step. By
independent we mean that the propagation method does not depend upon the algorithm
used to derive the initial disparity map. This work proposes a new densification method
that is able to arrive at denser and more accurate results than the standard one-stage dis-
parity algorithms such as dynamic programming, block- matching and so on that only
slightly affects the error rate. Since the scheme is based on very simple operations,
it can be considered suitable for efficient implementation. Our method resembles im-
age driven anisotropic diffusion, used for instance by Alvarez et al. [1] in variational
disparity calculation, in the sense that the propagation direction is based on the image
gradient. Instead of using a set of equations for defining the diffusion model as it is
done in [1] our approach (VMP) uses a bank of predefined masks and a voting process
to define the local interactions driving the diffusion process.
2 Method
The first step is to calculate a sparse, high-confidence, stereo map. Many feature-based
disparity calculation methods match edges present in stereo-images, since these can be
considered relatively robust features [8][11]. For the reasons set out in the Introduc-
tion we use here a simplified dynamic programming technique based on image edges
[14]. The rational for using dynamic programming is that it has been shown to be both
computationally efficient [12] and capable of producing highly accurate results [3].
Nevertheless, as mentioned earlier, our densification approach does not depend upon
the method used for producing the initial sparse disparity map. The second step is to
apply voting mask propagation (VMP) for propagating disparity in the direction where
estimations are expected to be similar and to use voting for resolving ambiguities. Lo-
cal support of the voting process is based on directional masks: for each image position
for which disparity is known a mask from a pre-determined bank is chosen, depend-
ing upon the underlying image structure (gradient). The properties of the chosen filter
define how many votes each of the neighborhood positions will receive.
2.1 Propagation direction
Since without further image analysis we cannot be sure of which object an image pixel
for which the disparity is known belongs to, and since we assume that inside objects
disparity changes gradually, image gradient is used as a driving force of the propagation
direction. We assume that two different objects will almost certainly have two different
disparity levels. By propagating in a direction perpendicular to the image gradient we
reduce errors since different objects have varying disparities divided by an edge. This
assumption of local maximum gradient separating different objects is also the basis of
anisotropic diffusion, where diffusion direction is driven by the gradient [22][13]. In
this work we concentrate on the case where the disparities for the edges are known and
2
the disparities are propagated in an edge-wise direction. There is, however, no reason
why the disparities not residing at the edges could not be propagated as well. The
tangent-to-edge direction is approximated by calculating image gradient.
2.2 Bank of masks
A bank of masks is designed using a 2D multivariate Gaussian distribution which is
rotated in order to generate masks corresponding to different propagation directions.
The basic mask, corresponding to orientation 0
(i.e. the horizontal axis), is calculated
as per (1).
z = G(i, j, µ
i
= µ
j
= 0, Σ), i = N .. . N, j = P . . . P (1)
where G(·) denotes multivariate Gaussian, (i, j) are the coordinates of the mask, (µ
i
=
µ
j
= 0) are the mean, Σ is the covariance, z is the number of votes that each position
receives and N and P define the mask size. Fig. 1 shows a voting mask corresponding
to several different orientations (rotations).
Figure 1: A bank of 7x7 voting masks corresponding to different orientations. Intensity
codifies the number of votes each position receives.
The use of Gaussian distribution is motivated by the fact that it reflects the proba-
bilistic nature of our approach: the underlying image structure drives the propagation
direction and thus reflects our belief on how the disparity is distributed. Other au-
thors have used similar approaches for image denoising [5]. Furthermore Gaussian
multivariate distribution allows a smooth transition from isotropic to anisotropic cases,
depending on the certainty of the image structure, which can be used in more elabo-
rated schemes by futher analyzing the image. Besides a Gaussian distribution can be
implemented as a separable convolution thus making it efficient computationally.
2.3 Choosing the mask
Once the orientations of the edge tangents have been approximated, propagation is
carried out for each disparity value using the mask whose orientation best matches the
tangent of the edge. The most closely corresponding mask’s centre is placed on top of
the disparity value of interest and each pixel within mask size receives as many votes
for the disparity value as defined by the mask. This is shown in the (2).
V
x+i, y+ j
(D
x, y
) = G
x, y,
(x + i, y + j), i = N .. . N, j = P . . . P (2)
3
where V
x, y
indicates votes received by position (x, y) for disparity, D, G
x, y,
(x + i, y +
j) denotes how the Gaussian voting mask, chosen as per gradient , with a size of
(2N + 1, 2P + 1) , placed at (x, y) votes for each mask position. As a final step, after
the disparities have been propagated for each of the original disparity values, each pixel
position assumes the disparity that receives most votes, as defined in (3).
D
x, y
= MAX
V
(V
x, y
,D
x, y
)
(3)
where D
x, y
indicates the final chosen disparity value for a position (x, y) and MAX
V
(V, D)
returns the disparity value that has received the most votes for a set of vote-disparity
tuples (V, D). Due to the spatial support of the voting mechanism erroneous values are
in certain cases effectively voted out: if within a certain neighborhood there are more
correct values than erroneous values, the erroneous values receive fewer votes and are
discarded. Interestingly enough this effect allows the densification process to arrive at
a denser but at the same time more accurate disparity map than the original.
2.4 Pseudocode
For the sake of clarity, below we have included a pseudocode of the propagation pro-
cess.
Algorithm 1 Pseudocode Describing the Voting Process.
INPUT: I (image driving the voting process)
INPUT: D (disparity map)
INPUT: M (bank of masks with mask size (N, P))
STORAGE: V (a matrix for storing the votes)
//OBTAIN COORDINATES FOR DISPARITIES
(x, y) = coords( notEmpty(D) )
for i = 1 numel(x) do
//CHOOSE THE CLOSEST MASK TO GRADIENT NORMAL
I = calculateImageGradient( I( x(i), y(i) ) )
mask = chooseMask( M, I )
//VOTE USING THE CHOSEN MASK
V = vote( x(i), y(i), mask, D( x(i), y(i)), N, P )
end for
(x, y) = coords( notEmpty(V) )
//CHOOSE WINNERS
for i = 1 numel(x) do
Out( x(i), y(i) ) = MAX( V( x(i), y(i) ) )
end for
The actual voting process can be seen as superimposition of the chosen mask on
top of the disparity value to be propagated where each neighboring pixel (defined by
the mask) receives as many votes for the disparity as defined by the weight of the mask
as each position. For each pixel position the number of votes for each disparity needs
to be stored so that a winner can be chosen accordingly.
4
3 Experiments and Results
We benchmarked the method using well known stereo-images from the Middlebury
database
1
. In order to study how the VMP densification process behaves when dealing
with different initial densities and/or errors, we have used two different methods for
generating the initial disparity maps provided to the VMP model. The different meth-
ods used were dynamic programming (DP) [14] and a phase-based method [18][16].
Furthermore we have used different thresholds and interleave factors for DP in order
to generate initial disparity maps with different densities and errors. Computational
complexity of the our method was approximated by comparing it with execution times
of the DP method. We also introduce a sample application that clearly benefits from a
more dense disparity map as input. In the experiments, size of the propagation masks
was 7x7 pixels. Density is given in terms of a ratio between the number of pixels for
which disparity has been defined and the total number of pixels in the image. Overall
accuracy is measured as the percentage of correct pixels (±1 disparity level) calculated
against the ground-truth values.
3.1 Results
Fig. 2 shows the original stereo-pair images and results calculated directly using DP
and densified by VMP.
1
http://vision.middlebury.edu/stereo/
5
tsukuba C:74.6% D:15.9% C:74.1% D:55.9%
venus C:90.1% D:12.6% C:90.4% D:47.2%
cones C:83.8% D:17.8% C:82.7% D:67.9%
teddy C:77.2% D:12% C:77.5% D:50.3%
Figure 2: Results for the test images: the left-hand column contains left-hand images
of the original stereo-pairs, the middle column shows disparity maps calculated by
dynamic programming and the right-hand column disparity maps densified using VMP.
C denotes the percentage of correct disparities (±1 disparity level) and D, density.
Fig. 3 demonstrates the results for four different initial maps with different densi-
ties. The initial maps are calculated using different thresholds for occlusion detection
with the effect of increasing density at the expense of accuracy. Thus it can be observed
that after certain reasonable limit, in order to obtain even more dense map, the error
starts to increase. In such a case it is better to calculate more reliable initial map and
then densify.
6
(a) Tsukuba&Venus results
(b) Cones&Teddy results
Figure 3: Densification results for several initial densities obtained using different
thresholds for dynamic programming. DP refers to results calculated by directly us-
ing dynamic programming and VMP refers to results densified from corresponding DP
results using voting mask propagation. (a) TS refers to Tsukuba and V refers to Venus
images (b) C refers to Cones and TE refers to Teddy images
Fig. 4 shows the results for the Venus case only. It can be clearly seen that as the
cost for occlusions gets higher (threshold from one to four) density of the resulting
disparity map increases slightly at the expense of accuracy. On the other hand as the
error increases the voting mechanism starts to vote out some of the erroneous values
thus increasing the accuracy.
7