Leading approaches to computer vision (SLAM: simultaneous localization and mapping), animal navigation (cognitive maps), and human vision (optimal cue integration) start from the assumption that the aim of 3D vision is to produce a metric reconstruction of the environment. Recent advances in machine learning, single-cell recording in animals, virtual reality, and visuomotor control all challenge this assumption. The purpose of this meeting is to bring these different disciplines together to formulate an alternative approach to 3D vision.

Free to attend online; to register, or for more information, follow the link above.

Among a stellar line-up of speakers is our own Andrew Glennerster, whose talk will be on Monday 1 November at 5:10pm:

Understanding 3D vision as a policy network

Professor Andrew Glennerster, University of Reading, UK

Abstract

A ‘policy network’ is a term used in reinforcement learning to describe the set of actions that are generated in different states, where the ‘state’ reflects both the current sensory stimulus and the goal. This is not the typical foundation for describing 3D vision, which in computer vision is based on reconstruction (e.g. Simultaneous Localisation And Mapping), while in neuroscience the predominant hypothesis has assumed that there are 3D transformations between retino-centric, ego-centric and world-based reference frames. Theoretical and experimental evidence in support of this neural hypothesis is lacking. Professor Glennerster will instead describe an approach that avoids 3D coordinate frames. A policy network for saccades (pure rotations of the camera/eye around the optic array) is a starting point for understanding (i) an ego-centric representation of visual direction, distance, slant and depth relief (what Marr hoped to achieve with his 2½-D sketch) and (ii) a hierarchical, compositional representation for navigation. We have known for a long time how the brain can implement a policy network, so if we could describe 3D vision in terms of a policy network (where the actions are either saccades or head translations), we would have moved closer to a neurally plausible model of 3D vision.
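To make the core idea concrete, here is a minimal sketch of a policy network as a mapping from a state (a sensory feature vector combined with a goal encoding) to a distribution over discrete actions such as saccades or a head translation. This is an illustration of the general reinforcement-learning notion only, not the model described in the talk: the action set, feature sizes and the linear-softmax form are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch: a policy maps a state (sensory stimulus + goal)
# to a probability distribution over discrete actions. The action set,
# feature sizes and linear-softmax form are assumptions for this example.

ACTIONS = ["saccade_left", "saccade_right", "saccade_up", "saccade_down",
           "translate_head_forward"]

rng = np.random.default_rng(0)

N_SENSORY = 16   # assumed size of the sensory feature vector
N_GOAL = 4       # assumed size of the goal encoding
W = rng.normal(scale=0.1, size=(len(ACTIONS), N_SENSORY + N_GOAL))

def policy(sensory: np.ndarray, goal: np.ndarray) -> np.ndarray:
    """Return action probabilities for the state (sensory stimulus, goal)."""
    state = np.concatenate([sensory, goal])
    logits = W @ state
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

# Example: sample an action for a random state.
probs = policy(rng.normal(size=N_SENSORY), rng.normal(size=N_GOAL))
action = rng.choice(ACTIONS, p=probs)
print(dict(zip(ACTIONS, np.round(probs, 3))), "->", action)
```

In this toy version the same sensory input can yield different actions under different goals, which is the sense in which the ‘state’ in the abstract combines stimulus and goal.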