

SECTION I. FOUNDATIONS

CHAPTER 4

Selective Attention as an Optimal Computational Strategy
ABSTRACT
We explore selective attention as a key conceptual inspiration from neurobiology that can motivate the design of information processing systems. In our framework, an attentional window, the "spotlight of attention," contains a reduced set of data from the environment, which is then made available to higher-order processes for planning, real-time responses, and learning. This architecture is invaluable for systems with limited computational resources. Our test bed for these ideas is the control of an articulated arm. We implemented a system that learns while behaving, guided by the attention-based content of what the higher-order logic is currently engaged in. In the early stages of learning, the higher-order computational centers are involved in every aspect of the arm's motion. The attentionally assisted learning gradually assumes responsibility for the arm's behavior at various levels (motor control, gestures, spatial, logical), freeing the resource-limited higher-order centers to spend more time on problem solving.

Neurobiology of Attention. Copyright 2005, Elsevier, Inc. All rights reserved.

I. THE ATTENTION–AWARENESS MODEL: AN INTRODUCTION
Computers and software have recently joined the long line of human tools inspired by biology. In this case, it is the phenomenal capabilities of biological nervous systems that intrigue and challenge us. Our desire to mimic the brain stems from the abilities it possesses, which in many cases are superior to those we can implement today. Here we explore the extent to which attentional selection can confer functional advantages on digital machines. By attentional selection we refer to the remarkable fact, documented throughout this book, that only a very small fraction of the incoming sensory information is accessible, in a conscious or unconscious manner, to influence behavior.

Many people have speculated about consciousness and its function. According to Crick and Koch (1998; Koch, 2004), the function of conscious visual awareness in biological systems is to "[p]roduce the best current interpretation of the visual scene in the light of past experience, either of ourselves or of our ancestors (embodied in our genes), and to make this interpretation directly available, for a sufficient time, to the parts of the brain that contemplate and plan voluntary motor output, of one sort or another, including speech." This representation is a reductive transformation of the massive, real-time sensory input. That is, the content of awareness corresponds to the state of a cache memory that holds a compact version of relevant sensory data as well as recalled items. This strategy can deal with more complex scenarios and generate a plan for action (Newman et al., 1997). This flexible but slow aspect of the system is complemented by a set of very rapid and highly specialized sensorimotor modules (D. Psaltis, personal communication, 1995), "zombie agents" (Koch, 2004), that perform highly stereotyped actions (e.g., driving a car, moving the eyes, walking, running, grasping objects).

Figure 4.1 illustrates one way in which these cognitive strategies may be mapped onto a machine architecture (Billock, 2001). The sections of the diagram toward the bottom (the motor/processing modules, early processing, and error generation) reside below the level of awareness, with fast reflexes and extensive procedural memories. Selective attention and awareness are the gateways that provide preprocessed sensory data to the higher, more resource-constrained parts of the brain (the logic and planning cortices and memory). Of course, in reality many additional interconnections exist between these components. To maintain a coherent course of action, the system must be capable of alternating between volitional, top-down control and reflex-level, bottom-up control.

[Figure 4.1 diagram: Logic, Declarative Memory, Awareness, Online Systems, Attention, the "zombie"/online system, Early Processing, Processing Modules, Error Signal Generation, and the Environment.]
FIGURE 4.1 In this functional model of the role of attention and awareness, the pathway incorporating the attentional bottleneck operates in parallel with the faster sensorimotor agents (zombie systems), taking its cues from the "error signals" generated by the zombie systems.

What we would like to abstract from the biological functions of attention and awareness is a machine that can aid in performing similar tasks. We explore how, for sufficiently complex environments, using a reduced representation of the environment allows an algorithm to perform better under time pressure than an approach in which the entire input is represented. We would also like to better understand what advantages implementing such a bottleneck has for memory and machine learning.
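The division of labor in Fig. 4.1 can be sketched as a controller in which a fast "zombie" memory answers first and the slow logic is consulted only when an error signal fires. This is a toy under stated assumptions: tasks are plain numbers, the zombie is a nearest-neighbor lookup over past logic solutions (a stand-in for the chapter's trained networks), and `logic`, `error_signal`, and `tolerance` are hypothetical names. In a real system the error signal would be measured from the environment rather than recomputed, as it is here for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AttentionGatedController:
    """Toy model of Fig. 4.1: a fast 'zombie' memory backed by a slow logic unit."""

    logic: Callable[[float], float]                # slow, exact planner (hypothetical)
    error_signal: Callable[[float, float], float]  # feedback on how far an action missed
    tolerance: float = 0.05                        # acceptable error bound (assumed)
    examples: List[Tuple[float, float]] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def _zombie(self, task: float) -> Optional[float]:
        """Fast path: reuse the action of the nearest previously attended task."""
        if not self.examples:
            return None
        _, action = min(self.examples, key=lambda ex: abs(ex[0] - task))
        return action

    def act(self, task: float) -> float:
        action = self._zombie(task)
        if action is not None and self.error_signal(task, action) <= self.tolerance:
            self.history.append("zombie")
            return action
        # No applicable memory, or an error signal: attention passes the task to
        # the logic, and the attended task becomes a new training example.
        action = self.logic(task)
        self.examples.append((task, action))
        self.history.append("logic")
        return action


if __name__ == "__main__":
    target = lambda x: x * x  # stand-in for an expensive computation
    ctrl = AttentionGatedController(
        logic=target,
        error_signal=lambda task, action: abs(action - target(task)),
    )
    for t in [1.0, 1.0, 1.01, 2.0, 2.0]:
        ctrl.act(t)
    print(ctrl.history)  # ['logic', 'zombie', 'zombie', 'logic', 'zombie']
```

The logic is invoked only twice in five tasks; every other request is served by the fast path, mirroring how the bottleneck frees the higher-order centers.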



II. LEARNING MOTION WITH AN ARTICULATED ARM
Machine learning is one area where we expect algorithms inspired by attentional selection to outperform conventional ones. There are several ways in which attention might facilitate learning. One is during learning: if shown a single image of a car embedded in a dense background filled with other objects, the learning algorithm does not know which features belong to the object of relevance (here the car) and which are incidental. If attention segments the car from the rest of the scene, however, superior performance can be obtained. This is particularly relevant to one-shot learning algorithms. The same is true during the recognition phase. Detecting the same car, say, from a different viewpoint in a novel scene is much facilitated if an attentional selection strategy can segment the car from the background and forward only its associated features to the recognition module (see Rutishauser et al., 2004, for an illustration of this strategy). Segmentation also reduces the amount of data that must be memorized, thus improving learning speed. Picking the right information to be learned and ignoring the rest is probably one of the key functions of attentional selection. Indeed, the resulting bottleneck appears to be necessary for the utilization of some kinds of memory (Naveh-Benjamin and Guez, 2000).

The test bed we use for exploring attentional learning is the control of a segmented arm moving around in a boxlike environment. It can pick up, move, and drop disks. At the most abstract level, the arm is used to solve various kinds of puzzles. The problem we explored was one of ordering various objects into target locations, which is equivalent to the Tower of Hanoi problem (Claus, 1884). In our version of this problem (see Fig. 4.2), we begin with an allotment of disks of various diameters. We assume that they have holes in their middle, that the disks are stacked in order of decreasing size (i.e., a larger disk must always be below a smaller one), and that the segmented arm can transfer the disks from one target stack to another. The arm moves around the board, physically takes the top disk from a target, and moves it to another stack, with the end goal of stacking all the disks in order of size (largest at the bottom) on a specific goal target. Various obstacles through which the arm cannot pass are placed on the board. The arm's segments can overlap as it moves. We assume that the end effector, when placed over a target, takes or releases a single disk automatically.
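At the abstract level the logic only has to decide which disk to move next. For the classic three-peg version of the puzzle, that decision can be sketched with the textbook recursion below, which needs 2ⁿ − 1 moves for n disks. This is an illustration, not the chapter's planner: the peg names "A", "B", "C" are hypothetical, the board's obstacles and the arm's motion are ignored, and `replay` (which assumes the disks start on "A") only checks the stacking rule stated above.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]  # (from peg, to peg)


def hanoi(n: int, source: str, goal: str, spare: str) -> List[Move]:
    """Return the classic recursive move sequence for n disks."""
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, goal)   # clear the n-1 smaller disks away
        + [(source, goal)]                  # move the largest remaining disk
        + hanoi(n - 1, spare, goal, source) # restack the smaller disks on top
    )


def replay(n: int, moves: List[Move]) -> Dict[str, List[int]]:
    """Apply the moves from peg 'A', checking no disk lands on a smaller one."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # last item = top disk
    for src, dst in moves:
        disk = pegs[src].pop()
        assert not pegs[dst] or pegs[dst][-1] > disk, "illegal move"
        pegs[dst].append(disk)
    return pegs


if __name__ == "__main__":
    moves = hanoi(3, "A", "C", "B")
    print(len(moves), replay(3, moves))  # 7 {'A': [], 'B': [], 'C': [3, 2, 1]}
```

In the chapter's architecture each of these abstract moves must still be turned into directions, gestures, and joint torques, which is where the learned memories take over.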
Our problem, then, is to manipulate the joints of the arm to move its end effector between the appropriate targets in the correct order so as to solve the puzzle. The details of the articulated arm, the playing board, and the targets are shown in Fig. 4.2. For our purposes, we give the arm segments minimal dynamics involving a maximum torque and a momentum/friction decay characteristic. These force relationships are solved by the logic subsystem using a set of torque-change equations similar to those described by Uno et al. (1989) for modeling human limb control. Initially, the system has not yet learned to drive its joints, and so must use its logic/planning functions to solve the control problem via explicit equations. The three-segment arm has complicated inverse kinematics, which require an expensive optimization process to find the best trajectory from the present position to a target position. The minimum torque-change model selects, out of the possible solutions, the configuration that requires the minimum angular change of the arm segments. This step is very costly in computational power, so at first the attention of the logic unit is taken up by this low-level function. As it does so, the reduced representation of the environment that the logic uses in solving the problem is presented to the "zombie" system for learning.

[Figure 4.2 plot: the articulated arm on the board (axes from -2 to 2), annotated with joint angles q2 and q3, forces F1–F3, and segment lengths l; legend: Baskets, Obstacles, Reachable Area.]
FIGURE 4.2 The Tower of Hanoi problem arranged for solution by an articulated arm. The arm must move between the marked targets without colliding with the solid obstacles. Outlined squares are the positions of the targets. Solid circles indicate obstacles around which the arm must navigate.

Arm kinematics are learned by a neural network doing a straightforward function fit to the force curve necessary to move the arm from one angle to another. We use a three-layer, fully interconnected neural network with four units in the hidden layer; the units use a hyperbolic tangent activation function. The inputs are the current distance from the computed goal angle and the angular velocity of the arm segment; the output is the torque to be applied at the segment joint. The motion-learning network starts training only once a sufficient number of samples (about 40 trajectories) has been collected, so that it does not fall into a shallow local minimum and fail to learn the motor curves. We then use the Levenberg–Marquardt algorithm (Marquardt, 1963), which learns rapidly over the next two dozen or so trajectories, after which the network has trained enough to substantially take over arm kinematics from the logic unit.

During training, control of the arm is shared by the network and the logic. This sharing is adjusted based on the network's current training error multiplied by a quality parameter that increases with the amount of training data. When the network is controlling arm motion and fails to steer the arm to the required target, an error signal triggers the attentional mechanism, and the logic takes over and computes the kinematics. This produces more training data, which improves the network's performance as it takes over more and more of the control of the arm. This behaviorally guided learning approach produces excellent generalization: a training error of 3.46% (relative to the force given by the inverse kinematics solver) produced a test set error over the whole input field of 0.17% (both taken in the least-squares sense). For actual trajectories the test set error is still very low but higher (around that of the training set error). This is because points on actual trajectories tend to cluster in the areas of the input field where performance is critical, such as when the arm segment is close to its target location.

As basic kinematics are learned, the logic/planning system spends more time planning movements of the arm from one joint configuration to another. These motion plans are called gestures, such as "going around an obstacle clockwise." As a memory system for the gestural level, we use an ART-like neural network (Carpenter and Grossberg, 1987). This is an unsupervised learning model that autoclusters the trajectory data the logic presents to it during actual problem solving and learns models for those clusters, which can then be introduced into the control loop and largely replace the logic in computing gesture trajectories.
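The kinematic network and the shared control described above can be sketched as follows. The 2-4-1 layout (two inputs, four tanh hidden units, one torque output) follows the text; the linear output unit, the random initialization, and the `network_share` formula are assumptions, since the chapter names only the ingredients of the sharing rule (the network's training error and a quality factor that grows with the amount of training data). Training is omitted: the chapter fits the weights with Levenberg–Marquardt, and `scipy.optimize.least_squares(..., method="lm")` is one library routine that could play that role.

```python
import math
import random


class TorqueNet:
    """2-4-1 network: (distance to goal angle, angular velocity) -> joint torque."""

    def __init__(self, hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Untrained weights; in the chapter these would be fitted by Levenberg-Marquardt.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def torque(self, distance: float, velocity: float) -> float:
        h = [math.tanh(w[0] * distance + w[1] * velocity + b)
             for w, b in zip(self.w1, self.b1)]
        # Linear output unit (an assumption; the chapter does not specify it).
        return sum(v * hi for v, hi in zip(self.w2, h)) + self.b2


def network_share(train_error: float, n_samples: int, n_half: int = 40) -> float:
    """Hypothetical weight given to the network (0 = logic only, 1 = network only)."""
    quality = n_samples / (n_samples + n_half)  # grows with the amount of training data
    return max(0.0, min(1.0, quality * (1.0 - train_error)))


def shared_torque(net_tau: float, logic_tau: float, share: float) -> float:
    """Blend the network's torque with the logic's exact torque."""
    return share * net_tau + (1.0 - share) * logic_tau
```

With the arbitrary choice `n_half=40` and zero training error, the network would carry half of the control after 40 samples; a high error keeps control with the logic regardless of how much data has been seen.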
Each ART unit is associated with a linear neural network, which it uses to model the gesture parameters of the data with which it is associated. These linear networks have six inputs (a present and a goal angle for each of the three arm segments) and six hidden units. Once a unit has more than three data points, it begins to train its associated neural network to model the data, again using the Levenberg–Marquardt algorithm. If a new data point ruins the ability of the existing network to model the data well, an error signal is generated; the data point is then rejected and forms a new ART unit of its own (where it will compete with the existing units). If the new data point can be learned well, which usually happens, it is incorporated into a new estimate of the mean and covariance of the unit's resonance region. The outputs of the neural network are relative coordinates for the segments of the arm to steer toward in completing the gesture. These coordinates then directly drive the kinematic level, which controls the arm joints to move the arm to that configuration.

Once an ART unit has more than six data points, it is allowed to begin responding to the environment itself: if the current arm parameters are within one standard deviation of a unit's center point in its input space, that unit is chosen to control the choice of the next trajectory path. Each unit, then, corresponds to a "gesture" that the system has learned. As the system solves puzzles, control shifts to the ART network. When no ART unit is found for the present environment, an error signal alerts the logic to calculate a new gesture trajectory. Training and operation overlap: if resonance occurs with one of the existing units, and that unit has sufficiently good performance, it is used to construct the next trajectory. If resonance occurs but that unit fails to drive the arm successfully, an error signal causes the logic unit to compute the gesture, thus training the network. If no resonance occurs (meaning that no unit is responsible for dealing with the current state), a new unit is created.

The consistency of the gestures permits the networks associated with the ART units to usually achieve least-squares training set errors below 10⁻³, and frequently to converge to the training threshold of 5 × 10⁻⁵ without significant overtraining. These errors are given in radians (the ART unit networks learn target angles relative to current positions) and correspond to less than a tenth of a degree. On the other hand, similar gestures are not necessarily repeated in every movement from one target to another, so it may take a while (in puzzle-solution time) for the actual arm behavior to accumulate enough training examples for a particular unit. During solution of the first few puzzles, that is, sets of distributed targets, the logic spends a large share of the solution time (around 60%) planning gestures.
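The response rules for the gesture units can be sketched as below, using only the thresholds stated in the text: a unit trains after more than three points, responds after more than six, and resonates when the input lies within one standard deviation of its center. Two simplifications are assumptions: the spread is a per-dimension standard deviation rather than the full covariance, and the per-unit linear networks and the reject-on-bad-fit rule are left out. `GestureUnit` and `select_unit` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

TRAIN_AFTER = 3    # a unit starts fitting its gesture model with more than 3 points
RESPOND_AFTER = 6  # a unit may drive the arm with more than 6 points


@dataclass
class GestureUnit:
    # Each point: (present, goal) angles for three segments -> 6 numbers.
    points: List[List[float]] = field(default_factory=list)

    def add(self, x: Sequence[float]) -> None:
        self.points.append(list(x))

    def mean(self) -> List[float]:
        n = len(self.points)
        return [sum(col) / n for col in zip(*self.points)]

    def std(self) -> List[float]:
        n = len(self.points)
        return [math.sqrt(sum((v - mu) ** 2 for v in col) / n)
                for col, mu in zip(zip(*self.points), self.mean())]

    def resonates(self, x: Sequence[float]) -> bool:
        # Within one standard deviation of the center along every dimension
        # (a diagonal simplification of the covariance-based resonance region).
        return all(abs(xi - mu) <= sd
                   for xi, mu, sd in zip(x, self.mean(), self.std()))

    @property
    def can_train(self) -> bool:
        return len(self.points) > TRAIN_AFTER

    @property
    def can_respond(self) -> bool:
        return len(self.points) > RESPOND_AFTER


def select_unit(units: List[GestureUnit], x: Sequence[float]) -> Optional[GestureUnit]:
    """Return a unit allowed to control the next gesture, or None (error signal)."""
    for unit in units:
        if unit.can_respond and unit.resonates(x):
            return unit
    return None
```

A `None` result corresponds to the case in which no unit resonates and the logic must compute the gesture itself.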
After that, however, the zombie system begins to learn commonly repeated gestures and takes over gesture planning. This reduces the total time spent in problem solution and also dramatically reduces the time the logic spends "attending to" gestures, to less than 10%.

The spatial sequence to follow when moving from one target to another is memorized using "declarative memory." These memories are essentially stored series of gestures: "to go from target 1 to target 3, first go clockwise around obstacle 2, then counterclockwise around obstacle 6, then straight on to target 3." These directions are learned as the logic solves the puzzle and continue to evolve as play proceeds. If this memory control fails, an error signal causes the logic unit to send the arm back to the originating target and try again by generating a new sequence of gestures itself; this sequence then replaces the original one in memory. As it takes only one example for this memory to be useful, the declarative memory comes into play quite quickly. On the other hand, this scheme does not generalize well, except to different arrangements of the items to be sorted on the same playing board (meaning that the targets and obstacles are in the same positions, and only the initial arrangement of the disks to be sorted has changed).
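The one-shot declarative memory described above can be sketched as a table keyed by (start target, goal target). This is a minimal illustration under assumptions: gestures are plain strings, `plan` and `execute` are hypothetical callables standing in for the logic and the lower levels, and the return of the arm to the originating target is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

Route = List[str]  # a sequence of named gestures


class DirectionMemory:
    """One-shot store of gesture sequences between pairs of targets."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[int, int], Route] = {}

    def recall(self, start: int, goal: int) -> Optional[Route]:
        return self._routes.get((start, goal))

    def store(self, start: int, goal: int, route: Route) -> None:
        # A single example is enough; a newly planned route overwrites a failed one.
        self._routes[(start, goal)] = list(route)


def travel(
    memory: DirectionMemory,
    start: int,
    goal: int,
    plan: Callable[[int, int], Route],
    execute: Callable[[Route], bool],
) -> str:
    """Follow a remembered route if it works; otherwise let the logic replan."""
    route = memory.recall(start, goal)
    if route is not None and execute(route):
        return "memory"
    # Error signal (or no route yet): the logic plans a fresh sequence,
    # which replaces whatever was stored for this pair of targets.
    route = plan(start, goal)
    memory.store(start, goal, route)
    execute(route)
    return "logic"


if __name__ == "__main__":
    plan = lambda s, g: ["clockwise around obstacle 2",
                         "counterclockwise around obstacle 6",
                         f"straight on to target {g}"]
    mem = DirectionMemory()
    print(travel(mem, 1, 3, plan, lambda r: True))  # logic
    print(travel(mem, 1, 3, plan, lambda r: True))  # memory
```

Because the key includes only target identities, a stored route is useless on a board with different obstacles, which matches the limited generalization noted in the text.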

III. ROLE OF THE ATTENTION–AWARENESS MODEL IN LEARNING
We now return to a discussion of the operation of the system in terms of the blocks of Fig. 4.1. The angular positions and velocities of the arm segments and the positions of obstacles, targets, and disks form the environment. In a real-life situation, a vision system might be used to extract these variables from the raw sensory data. In this environment, the job of the controller is to move the arm in the best way to solve the puzzle. The solution of the puzzle at the abstract level (i.e., which move to perform next with a view to solving the problem) always remains the province of the logic.

The different kinds of memory in the "zombie" system correspond to some of the varieties of memory humans employ to solve different problems. The procedural memory learns from examples to reproduce the forces on the arm segments that cause desired motions. The unsupervised ART memory learns from examples of common gestures to take over motion planning. The declarative memory stores sequences of these gestures as spatial "directions" for moving from one target to another.

The selective attention mechanism facilitates the learning process. As the system becomes more trained, the "zombie" gradually takes over control of the arm from the logic. This happens independently at the various levels as they become "reflexive" from the point of view of the logic, which then "attends" to a level only following an error signal. The logic spends more of its time on other parts of the problem, which in turn train other parts of memory.

The problem of controlling an articulated arm to solve a puzzle is one at which neither pure logic nor traditional machine learning is very good. When the logic has to plan out all the motions of the arm in detail, it can take a very long time to solve the problem. The conventional learning problem is intractable: even for quite small problems, there are dozens of dimensions in the learning problem, generating error signals is hard, and learning is very slow, if it works at all. This illustrates the role of the reduced representation of the "awareness window" in interfacing to different kinds of memory. In the arm example, the awareness window contains the single task that the zombie system failed to execute within acceptable error bounds and that must currently be completed by the logic. The data in the awareness window continuously become the source of training examples for the zombie part of the system. In this way, the awareness mechanism splits up the problem into manageable "chunks." For declarative memory, the bottleneck reduces the amount of information necessary for it to learn useful patterns. For procedural memory, whether supervised or unsupervised, it assists by pruning out the less relevant information in the environment, reducing the dimensionality of the resulting patterns and speeding up learning.

The zombie systems greatly improve the speed of overall puzzle solution: figuring out arm ballistics and computing near-optimal gestures are hard problems, and once these responses have been learned, solution times drop by an order of magnitude and more. Figure 4.3 shows the fraction of time the system spends in the logic/planning subsystem and in the training and recall of the various memory subsystems. During solution of the first puzzle, the system spends almost 100% of its resources training the gesture and direction memories and only a few percent training the movement memory. However, as this training is not very resource intensive, it proceeds rapidly once sufficient data have been collected. As the system operates, the fraction of time it spends training the gesture and direction-sequence memories decreases, and more time is spent executing logic/planning overhead tasks (considering where to play next and so on). By the time the system is solving the fourth puzzle, resource utilization is taken up mostly by the logic/planning subsystem. Total solution time has dropped to below 20% of what was required for the first problem, and so responding to interrupts accounts for almost all of the time spent in this phase.

FIGURE 4.3 Fractional time spent by the system dealing with the various subsystems (movement, gesture learning and recall, directions learning and recall, and the overhead of operating the logic). The time shown is a running average over 15 moves.
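The quantity plotted in Fig. 4.3 can be sketched as a windowed share of time per subsystem. The window of 15 moves comes from the caption; pooling the seconds over the window (rather than averaging per-move fractions) and the subsystem names used are assumptions.

```python
from collections import Counter, deque
from typing import Deque, Dict


class SubsystemTimer:
    """Share of time per subsystem, pooled over the last `window` moves."""

    def __init__(self, window: int = 15) -> None:
        self.moves: Deque[Counter] = deque(maxlen=window)  # oldest move drops out

    def record_move(self, seconds_by_subsystem: Dict[str, float]) -> None:
        self.moves.append(Counter(seconds_by_subsystem))

    def fractions(self) -> Dict[str, float]:
        total: Counter = Counter()
        for move in self.moves:
            total.update(move)  # add seconds per subsystem across the window
        grand = sum(total.values())
        return {name: t / grand for name, t in total.items()} if grand else {}


if __name__ == "__main__":
    timer = SubsystemTimer()
    timer.record_move({"logic": 0.2, "gesture learning": 3.0, "movement": 0.1})
    timer.record_move({"logic": 0.3, "directions recall": 0.1})
    print(timer.fractions())
```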

IV. CONCLUSIONS
Learning in which attentional selection assists a logic/planning unit in training a hierarchical memory has several benefits. First, it dramatically reduces the dimensionality of the input space. The sorting problem as described has some 30 dimensions, with dependencies that are ill-suited to learning by a traditional neural network. By segmenting the process and learning the reduced representations used by the logic/planning subsystem to solve the problem at different levels, it becomes possible to present tractable problems to the learning algorithm. Second, the hierarchical organization of skills allows the skills learned fastest to be relied upon during the learning of higher-level skills. Third, the attentional mechanism as employed allows cooperation between the memory subsystems and the logic/planning subsystem. When the faster network-based subsystems can respond, they do so; only when they make errors is the logic subsystem aroused to spend time correcting the error.

The problems for which this approach is well suited must satisfy at least two properties. First, they must be amenable to partial solutions; that is, an approximate solution must be initially acceptable, so they cannot be of the sort where only a perfect solution is permissible. Second, as additional information is gathered about the problem, that information must become less and less important to the solution. The common (although not universal) occurrence of these properties supports the argument that there is a wide class of problems, including many found in nature, whose solution is assisted by the kind of attentional selection architecture described above.

References
Billock, J. G. (2001). "Attentional Control of Complex Systems." Ph.D. thesis, California Institute of Technology.
Carpenter, G. A., and Grossberg, S. A. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision Graphics Image Process. 37, 54–115.
Claus, N. (1884). La Tour d'Hanoi: Jeu de Calcul. Sci. Nat. 1, 127–128.
Crick, F., and Koch, C. (1998). Consciousness and neuroscience. Cereb. Cortex 8, 97–107.
Garey, M. R., and Johnson, D. (1979). "Computers and Intractability: A Guide to the Theory of NP-Completeness." Freeman, San Francisco.
Koch, C. (2004). "The Quest for Consciousness: A Neurobiological Approach." Roberts, Denver, CO.
Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441.
Naveh-Benjamin, M., and Guez, J. (2000). Effects of divided attention on encoding and retrieval processes: assessment of attentional costs and a componential analysis. J. Exp. Psychol. Learn. Memory Cogn. 26, 1461–1482.
Newman, J., Baars, B. J., and Cho, S.-B. (1997). A neural global workspace model for conscious attention. Neural Netw. 10, 1195–1206.
Rutishauser, U., Walther, D., Koch, C., and Perona, P. (2004). Is attention useful for object recognition? IEEE Int. Conf. Computer Vision Pattern Recog., in press.
Uno, Y., Kawato, M., and Suzuki, R. (1989). Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol. Cybernet. 61, 89–101.
