Event Abstract

From field of view to field of reach - could pointing emerge from the development of grasping?

  • 1 Humboldt-Universität zu Berlin, Institut für Informatik, Germany

Pointing is an important skill that develops during early infancy. It starts with imperative pointing around the age of 9 months [1]. The child exhibits this form of pointing regardless of whether an adult is present in the room or is actually looking at the child. Around the age of 12 months it turns into declarative pointing, which is used to draw somebody else’s attention towards an object of interest. Producing and recognising pointing gestures are among the main means of detecting and manipulating attention in humans, and an important prerequisite for joint attention. Other prerequisites for joint attention are social interaction and intentional understanding [2], [3]. Since imperative pointing is not directly used to draw attention, it may have arisen from grasping objects within the child’s reach and then turned into pointing at objects outside the field of grasp. Interestingly, the child does not understand a pointing gesture made by someone else until about the age of 18 months [4]. Some studies found no relation between producing pointing gestures and understanding them [5]. This suggests that the two skills develop independently of each other and strengthens the hypothesis that pointing may arise from grasping rather than being learned by imitation.
To test this hypothesis, we set up a robotic experiment in which a humanoid robot learns to reach for objects by building a body map during random body babbling [6]. Meltzoff and Moore [7] classified the body babbling observed in infants as a mechanism that provides experience for mapping movements to the resulting body configurations. In [6], we implemented learning through self-exploration on a humanoid platform whose dimensions resemble those of a child, so that its visual input approximates that perceived by a young human subject. We used this technique to learn the mapping between different sensory modalities and to equip the robot with the ability to predict the sensory consequences (the position of the robot’s hand) of control commands applied to its neck and arm [8].
In a second phase, we equipped the robot with the ability to predict arm movement commands, which resulted in pointing towards an object presented outside the robot’s reach.

The reaching experiment described above is implemented on the humanoid robot Nao from Aldebaran. We use 4 DOF in the arm and 2 DOF in the neck, together with the camera input. For simplicity, the object is tagged with AR markers. During the learning phase, the marker is attached to the robot’s hand, and the robot learns a mapping between the point in joint space and the marker position in the image (2D case) or in the world (3D, egocentric). The robot performs random arm movements and estimates the position of its end-effector (the hand) by analysing the frames grabbed from its head camera. We equipped the robot with an elementary attentive system for perceiving parts of its own body and for moving its head to focus on them. During babbling, each time a marker was detected, we stored a vector containing the marker position together with the joint configuration of the neck and the arm in a knowledge base. The knowledge base can then be queried to infer missing data. In [8], a mapping between the proprioceptive data, represented by the 6D vector [neckJoints; armJoints], and the external data, represented by the (x; y) image coordinates of the marker placed on the robot’s hand, was used to perform forward predictions: given a configuration of the neck and arm joints, infer where the hand will be (here: the image coordinates of the marker, if detected).
Given a query (neck and arm joints), we used a k-Nearest Neighbours algorithm to find the k closest vectors in the knowledge base. From each of these vectors, the elements related to the marker position were extracted, and the predicted outcome was computed as their mean. A control command was then applied to each joint of the neck and the arm, computed as the mean of the corresponding elements of the k vectors.
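The k-NN forward prediction described above can be sketched as follows. This is a minimal illustration, not the implementation used on the robot: the function name `knn_forward_predict`, the plain-list knowledge base, the Euclidean metric in joint space, and the toy data are all assumptions made for the example.

```python
import math

def knn_forward_predict(knowledge_base, query_joints, k=5):
    """Predict the marker position for a joint configuration.

    knowledge_base: list of (joints, marker) pairs, where joints is a
    6-tuple (2 neck + 4 arm angles) and marker is an (x, y) image position.
    Returns (predicted_marker, mean_joints), both averaged over the k
    stored samples whose joint vectors are closest to the query.
    """
    if not knowledge_base:
        raise ValueError("knowledge base is empty")
    # Rank stored samples by Euclidean distance in joint space (assumed metric).
    nearest = sorted(knowledge_base,
                     key=lambda s: math.dist(s[0], query_joints))[:k]
    n = len(nearest)
    # Predicted outcome: mean of the marker positions of the neighbours.
    predicted_marker = tuple(sum(m[i] for _, m in nearest) / n for i in range(2))
    # Control command: mean of the neighbours' joint values, joint by joint.
    mean_joints = tuple(sum(j[i] for j, _ in nearest) / n
                        for i in range(len(query_joints)))
    return predicted_marker, mean_joints

# Toy knowledge base (made-up values, not recorded robot data).
toy_kb = [((i * 0.1,) * 6, (10.0 * i, 20.0 * i)) for i in range(10)]
marker, joints = knn_forward_predict(toy_kb, (0.2,) * 6, k=3)
print(marker, joints)
```

A linear scan is adequate for knowledge bases of a few thousand samples; a spatial index would only matter for much larger ones.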
During early exploration, babies bring objects close to the face and use different sensory modalities to explore a novel object (touching, biting, looking at it). Proprioception seems to play a bigger role in hand-eye coordination for infants than for adults [9]. In this developmental phase, distance seems to be irrelevant, since infant vision first needs to develop. We tried to reproduce this inability in our first experiment, where the robot learned the mapping using only the image position of the marker. In this phase, the robot is not able to generate any pointing gestures: an infinite number of arm and neck configurations can result in the same perceived marker position in the image (see Fig. 1). Depth perception seems to start around the age of 5 months in human infants. We simulated this new ability by having the robot learn from the estimated 3D marker positions with respect to the torso coordinate frame.
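One way to see why the image position alone is ambiguous is a pinhole camera model: every point along the same viewing ray projects to the same pixel, so distance is lost. The sketch below only illustrates this geometric fact. The focal length and principal point are made-up values, not the calibration of the robot’s camera, and `project_pinhole` is a hypothetical helper.

```python
def project_pinhole(point, focal_px=500.0, cx=160.0, cy=120.0):
    """Project a 3D point (camera frame, metres) to pixel coordinates.

    focal_px, cx and cy are hypothetical intrinsics chosen for illustration.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Two hand positions on the same viewing ray: one close, one five times farther.
near = (0.05, 0.02, 0.20)
far = tuple(5 * c for c in near)

# Both project to (approximately) the same pixel, although the arm
# configurations needed to reach them differ.
print(project_pinhole(near))
print(project_pinhole(far))
```

Storing the 3D marker position instead of the pixel position removes this ambiguity, which is what the second experiment relies on.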
We used the Random Walk babbling strategy [8] for learning and, as in the previous experiment, based the prediction on the k-NN algorithm with k=5. We performed learning both on the real robot and in simulation (using Webots, where the robot was equipped with a marker as in the real experiment). In simulation, 30 minutes of learning yielded 14068 collected points. On the real robot, 20 minutes of learning yielded 7531 points. Both knowledge bases were used for generating pointing gestures with the algorithm described in the next section.
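The details of the Random Walk strategy are given in [8] and are not reproduced here. As a rough sketch of what a random walk in joint space can look like, the example below perturbs every joint by a small uniform step and clamps it to its limits. The step size, the joint limits, and the function name are assumptions for illustration and are not taken from the experiment.

```python
import random

def random_walk_babbling(limits, n_steps, step=0.1, seed=0):
    """Generate a random walk in joint space.

    limits: list of (low, high) bounds in radians, one pair per joint.
    Each step adds uniform noise in [-step, step] to every joint and
    clamps the result to the joint's limits.
    """
    rng = random.Random(seed)
    # Start in the middle of each joint range (an arbitrary choice).
    joints = [(low + high) / 2 for low, high in limits]
    trajectory = []
    for _ in range(n_steps):
        joints = [min(high, max(low, q + rng.uniform(-step, step)))
                  for q, (low, high) in zip(joints, limits)]
        trajectory.append(tuple(joints))
    return trajectory

# Hypothetical limits for 2 neck joints and 4 arm joints (not the Nao's).
HYPOTHETICAL_LIMITS = [(-1.0, 1.0), (-0.5, 0.5),
                       (-2.0, 2.0), (-0.3, 1.3), (-2.0, 2.0), (-1.5, 0.0)]
trajectory = random_walk_babbling(HYPOTHETICAL_LIMITS, n_steps=1000)
print(len(trajectory), trajectory[0])
```

On a robot, each visited configuration would be stored in the knowledge base only when the marker is detected in the camera image.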

In the second phase, objects tagged with AR markers were presented outside the robot’s reach but within its field of view, at a distance of up to 1 metre from the robot’s head. The experiment consisted of predicting and generating the arm joint configuration that minimises the distance between the hand and the detected marker. No markers were placed on the robot’s hand. The arm configuration was generated from the 5 nearest neighbours to the estimated object position, taken from the knowledge base collected during the previous learning phase. When the object is out of reach, the resulting hand position should lie on the hull of a sphere representing the robot’s field of reach (Fig. 2). As expected, in this case the forward model tends to place the hand as close as possible to the object, which in effect produces a pointing gesture (see Fig. 3).
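The effect can be reproduced in a toy setting. The sketch below replaces the robot with a planar two-link arm, uses uniform random sampling instead of the Random Walk strategy, and looks up the 5 stored hand positions nearest to a target. The link lengths, joint ranges, and function names are assumptions made for the illustration; it is not the code used in the experiment.

```python
import math
import random

LINK_1, LINK_2 = 0.10, 0.10  # hypothetical link lengths in metres

def hand_position(q):
    """Forward kinematics of a planar two-link arm (stand-in for the robot arm)."""
    shoulder, elbow = q
    x = LINK_1 * math.cos(shoulder) + LINK_2 * math.cos(shoulder + elbow)
    y = LINK_1 * math.sin(shoulder) + LINK_2 * math.sin(shoulder + elbow)
    return (x, y)

def babble(n_samples, seed=0):
    """Collect (joints, hand position) pairs from random arm configurations."""
    rng = random.Random(seed)
    kb = []
    for _ in range(n_samples):
        q = (rng.uniform(-math.pi / 2, math.pi / 2),  # shoulder angle
             rng.uniform(0.0, math.pi / 2))           # elbow angle
        kb.append((q, hand_position(q)))
    return kb

def reach_or_point(kb, target, k=5):
    """Average the joints of the k samples whose hand position is nearest the target."""
    nearest = sorted(kb, key=lambda s: math.dist(s[1], target))[:k]
    return tuple(sum(q[i] for q, _ in nearest) / len(nearest) for i in range(2))

kb = babble(5000)
target = (0.6, 0.3)  # far outside the maximum reach of 0.2 m
q = reach_or_point(kb, target)
hand = hand_position(q)
print("reach radius:", math.hypot(*hand))
print("hand direction:", math.atan2(hand[1], hand[0]),
      "target direction:", math.atan2(target[1], target[0]))
```

Because the nearest stored hand positions all lie near the boundary of the reachable region in the direction of the target, the averaged configuration is an almost fully extended arm aimed at the object.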
“How can pointing emerge from grasping behavior (T2.3)?” is one of the open issues in developmental psychology and developmental robotics identified by Kaplan and Hafner [3]. Here we presented a robotics approach for identifying the sensory and informational prerequisites for realising this effect, using body babbling together with a simple prediction model. In the experiments, behaviour based on learning to reach for an object automatically resulted in pointing when the object was outside the robot’s reach.

Figure captions:
1) Different configurations can result in the same perceived marker position.
2) An illustration of how pointing gestures are generated. The k nearest neighbours are expected to lie on the hull of the field of reach (here represented as an ellipsoid).
3) A sequence of pointing gestures. An AR marker is placed on the non-visible surface of the object held by the user. An interesting side effect of the body babbling is that the robot automatically follows with its gaze when trying to reach an object.

Figure 1
Figure 2
Figure 3


This work has been financed by the EU funded Initial Training Network (ITN) in the Marie-Curie People Programme (FP7) INTRO (INTeractive RObotics research network).


[1] S. Baron-Cohen, Mindblindness: an essay on autism and theory of mind. Cambridge, MA, USA: MIT Press, 1997.
[2] M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll, “Understanding and sharing intentions: the origins of cultural cognition,” Behavioral and Brain Sciences, vol. 28, pp. 675–691, 2005.
[3] F. Kaplan and V. Hafner, “The challenges of joint attention,” Interaction Studies, vol. 7, no. 2, 2006.
[4] G. Butterworth, “Origins of mind in perception and action,” in Joint attention: its origins and role in development, C. Moore and P. Dunham, Eds. Lawrence Erlbaum Associates, 1995, pp. 29–40.
[5] S. Desrochers, P. Morissette, and M. Ricard, “Two perspectives on pointing in infancy,” in Joint attention: its origins and role in development, C. Moore and P. Dunham, Eds. Lawrence Erlbaum Associates, 1995, pp. 85–101.
[6] G. Schillaci and V. Hafner, “Random movement strategies in self-exploration for a humanoid robot,” in Proceedings of the 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2011), 2011, pp. 245–246.
[7] A. N. Meltzoff and M. K. Moore, “Explaining facial imitation: a theoretical model,” Early Development and Parenting, vol. 6, pp. 179–192, 1997.
[8] G. Schillaci and V. Hafner, “Prerequisites for intuitive interaction - on the example of humanoid motor babbling,” in Proceedings of the Workshop on The role of expectations in intuitive human-robot interaction (HRI 2011), 2011, pp. 23–27.
[9] M. E. McCarty, R. K. Clifton, D. H. Ashmead, P. Lee, and N. Goubet, “Biobehavioral development, perception, and action: How infants use vision for grasping objects,” Child Development, vol. 72, pp. 973–987, 2001.

Keywords: body babbling, development and emergence of pointing skills, humanoid robot, joint attention, motor learning, reaching and grasping, sensorimotor coordination

Conference: IEEE ICDL-EPIROB 2011, Frankfurt, Germany, 24 Aug - 27 Aug, 2011.

Presentation Type: Poster Presentation

Topic: Development and emergence

Citation: Hafner VV and Schillaci G (2011). From field of view to field of reach - could pointing emerge from the development of grasping? Front. Comput. Neurosci. Conference Abstract: IEEE ICDL-EPIROB 2011. doi: 10.3389/conf.fncom.2011.52.00017

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 09 Apr 2011; Published Online: 12 Jul 2011.

* Correspondence: Prof. Verena V Hafner, Humboldt-Universität zu Berlin, Institut für Informatik, Berlin, 10099, Germany, hafner@informatik.hu-berlin.de