Depth as a latent variable
University of Frankfurt, Germany
Estimation of binocular disparity in the brain is widely assumed to be based on the comparison of local phase information from binocular receptive fields. The classic binocular energy model shows that this requires receptive fields that form local quadrature pairs within each eye and that are phase- or position-shifted across the eyes. While numerous computational accounts of stereopsis build on these observations, there has been little work on how energy models can emerge through learning from the statistics of image pairs.
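As an illustration of the classic energy model referred to above, the following is a minimal 1D sketch of a position-shift energy unit. It is not the learned model of this abstract: the filter parameters (`sigma`, `freq`), the receptive-field centre, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def gabor_pair(x, center, sigma=4.0, freq=0.1):
    """Even/odd (quadrature) pair of 1D Gabor filters centred at `center`."""
    envelope = np.exp(-0.5 * ((x - center) / sigma) ** 2)
    carrier = 2.0 * np.pi * freq * (x - center)
    return envelope * np.cos(carrier), envelope * np.sin(carrier)

def binocular_energy(left, right, preferred_disparity, center=32):
    """Energy of one position-shift unit tuned to `preferred_disparity` (pixels)."""
    x = np.arange(left.size)
    l_even, l_odd = gabor_pair(x, center)
    # Position shift: the right-eye filters are displaced by the preferred disparity.
    r_even, r_odd = gabor_pair(x, center + preferred_disparity)
    even = l_even @ left + r_even @ right   # binocular sum, even phase
    odd = l_odd @ left + r_odd @ right      # binocular sum, odd phase
    return even ** 2 + odd ** 2             # squaring and summing the quadrature pair

rng = np.random.default_rng(0)
true_disparity = 3                          # right view = left view shifted by 3 px
tuned = np.arange(-6, 7)                    # preferred disparities of the unit bank
energies = np.zeros(tuned.size)
for _ in range(200):                        # average over random 1D stimuli
    left = rng.standard_normal(64)
    right = np.roll(left, true_disparity)
    energies += np.array([binocular_energy(left, right, d) for d in tuned])
best = int(tuned[np.argmax(energies)])
```

Averaged over stimuli, the unit whose position shift matches the stimulus disparity should respond most strongly, which is what `best` picks out.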
We describe a way to cast binocular disparity estimation as a probabilistic model, and we show how learning on databases of multi-camera views of a scene leads to position-shifted Gabor filters. The model agrees with the classical binocular energy and cross-correlation models in that it learns shifted quadrature pairs, while also allowing for more flexible connectivity that yields richer dependencies between the learned filters. Because the binocular training data comes with known ground-truth disparities, it is furthermore possible to train a network to perform depth estimation purely from training data.
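The link between the energy and cross-correlation views can be made explicit by expanding the energy of a single unit. The notation below is an assumption introduced for illustration and does not appear in the abstract: $L_e, L_o$ and $R_e, R_o$ denote the even and odd quadrature responses in the left and right eye.

```latex
\begin{align}
E &= (L_e + R_e)^2 + (L_o + R_o)^2 \\
  &= \underbrace{L_e^2 + L_o^2}_{\text{left monocular energy}}
   + \underbrace{R_e^2 + R_o^2}_{\text{right monocular energy}}
   + 2\,\underbrace{(L_e R_e + L_o R_o)}_{\text{interocular cross term}}
\end{align}
```

Only the cross term depends on how well the two eyes' responses match, which is why an energy unit behaves like a local cross-correlator plus a disparity-independent offset.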
Compared with standard (monocular) feature learning, acquiring training data for binocular feature learning is more challenging, because it requires generating image pairs that are consistent with a given camera setup. Our methodology for generating image pairs and corresponding ground-truth disparities builds on recent evidence that, due to fixations and vergence, biological receptive fields are confronted mainly with locally smooth surfaces. To generate the training data, we first generate depth maps as slanted planes in 3D. We then generate texture maps from patches that are cropped from natural images and have the same size as the depth maps. Finally, we project the textured 3D scene onto a set of cameras defined by their camera matrices.
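The three generation steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses two rectified pinhole cameras separated along the x-axis, hypothetical values for focal length, baseline and plane parameters, and a random array as a stand-in for a natural-image patch. Resampling the projected points onto a pixel grid (rendering) is omitted.

```python
import numpy as np

def camera_matrix(focal, cx, cy, x_position):
    """3x4 pinhole camera P = K [I | t] for a camera located at (x_position, 0, 0)."""
    K = np.array([[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]])
    Rt = np.hstack([np.eye(3), np.array([[-x_position], [0.0], [0.0]])])
    return K @ Rt

def slanted_plane(size, z0, slant_x, slant_y, extent=1.0):
    """Homogeneous 3D points on the plane Z = z0 + slant_x*X + slant_y*Y."""
    xs = np.linspace(-extent, extent, size)
    X, Y = np.meshgrid(xs, xs)
    Z = z0 + slant_x * X + slant_y * Y
    return np.stack([X, Y, Z, np.ones_like(Z)], axis=-1)

def project(P, points):
    """Project homogeneous 3D points with camera matrix P to pixel coordinates."""
    uvw = points @ P.T
    return uvw[..., :2] / uvw[..., 2:3]

focal, baseline = 500.0, 0.1                      # assumed camera setup
P_left = camera_matrix(focal, 64.0, 64.0, 0.0)
P_right = camera_matrix(focal, 64.0, 64.0, baseline)

# Step 1: depth map as a slanted plane in 3D.
points = slanted_plane(size=16, z0=5.0, slant_x=0.5, slant_y=-0.3)
# Step 2: texture of the same size (random stand-in for a natural-image patch).
texture = np.random.default_rng(0).random(points.shape[:2])
# Step 3: project the scene onto both cameras.
uv_left, uv_right = project(P_left, points), project(P_right, points)
# Ground-truth horizontal disparity of every surface point.
disparity = uv_left[..., 0] - uv_right[..., 0]
```

For this rectified setup the disparity of each point reduces to `focal * baseline / Z`, so the slant of the plane translates directly into a smooth disparity gradient.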
We demonstrate how ground-truth disparities make it possible to learn depth estimation: we add a pooling layer that is trained to predict the ground-truth disparities from the inferred binocular encoding. We demonstrate this purely learning-based depth estimation scheme on random-dot stereograms.
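A toy version of this idea can be sketched on 1D random-dot stereograms. Everything here is an assumption made for illustration: the binocular encoding is a fixed bank of hand-built energy units rather than learned filters, and a ridge-regression readout onto one-hot disparity labels stands in for the trained pooling layer, whose actual form the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
WIDTH, SIGMA, FREQ = 128, 4.0, 0.1
DISPARITIES = np.arange(-4, 5)        # candidate disparities in pixels
CENTERS = np.arange(16, 113, 8)       # receptive-field centres

def gabor_bank(centers):
    """Complex Gabor filters (real = even, imag = odd), one row per centre."""
    x = np.arange(WIDTH)[None, :] - np.asarray(centers)[:, None]
    return np.exp(-0.5 * (x / SIGMA) ** 2) * np.exp(2j * np.pi * FREQ * x)

def random_dot_stereograms(n):
    """1D random-dot pairs: the right view is the left view shifted by d pixels."""
    labels = rng.integers(0, DISPARITIES.size, size=n)
    left = rng.choice([-1.0, 1.0], size=(n, WIDTH))
    right = np.stack([np.roll(row, DISPARITIES[k]) for row, k in zip(left, labels)])
    return left, right, labels

def energy_features(left, right):
    """Binocular energies for every (disparity, centre) unit, plus a bias column."""
    resp_left = left @ gabor_bank(CENTERS).T
    feats = []
    for d in DISPARITIES:
        resp_right = right @ gabor_bank(CENTERS + d).T   # position-shifted filters
        feats.append(np.abs(resp_left + resp_right) ** 2)
    feats = np.concatenate(feats, axis=1)
    return np.hstack([feats, np.ones((feats.shape[0], 1))])

def fit_readout(features, labels, ridge=1e-3):
    """Ridge regression onto one-hot disparity labels (pools over units)."""
    targets = np.eye(DISPARITIES.size)[labels]
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

train_left, train_right, train_labels = random_dot_stereograms(2000)
test_left, test_right, test_labels = random_dot_stereograms(500)
weights = fit_readout(energy_features(train_left, train_right), train_labels)
scores = energy_features(test_left, test_right) @ weights
predicted = np.argmax(scores, axis=1)
accuracy = float(np.mean(predicted == test_labels))
```

The readout learns which combinations of energy units signal each disparity purely from labelled examples; `accuracy` on held-out stereograms should lie well above the chance level of one in nine.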
Keywords:
neural coding and decoding, stereopsis
Conference:
BC11 : Computational Neuroscience & Neurotechnology Bernstein Conference & Neurex Annual Meeting 2011, Freiburg, Germany, 4 Oct - 6 Oct, 2011.
Presentation Type:
Poster
Topic:
neural encoding and decoding
Citation:
Conrad C and Memisevic R (2011). Depth as a latent variable. Front. Comput. Neurosci. Conference Abstract: BC11 : Computational Neuroscience & Neurotechnology Bernstein Conference & Neurex Annual Meeting 2011. doi: 10.3389/conf.fncom.2011.53.00179
Copyright:
The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers.
They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.
The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.
Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.
For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.
Received:
22 Aug 2011;
Published Online:
04 Oct 2011.
Correspondence:
Prof. Roland Memisevic, University of Frankfurt, Frankfurt, 60325, Germany, ro@cs.uni-frankfurt.de