It is important to distinguish between the orienting reaction described in sections 4.4 and 9.2 and the concept of spatial orientation. The orienting reaction directs the sensory apparatus of an animal toward a stimulus in the immediate environment, while spatial orientation refers to an ability to orient from one place to another. A possible way to avoid confusion would be to call the present chapter spatial navigation. However, since spatial navigation usually refers to more advanced mechanisms than the ones described here, we have chosen to keep the term spatial orientation.
The first kind is a set of primitive responses, that is, behavior modules which do not use feedback and, thus, cannot be goal-directed. The primitive response repertoire of the creature will generate behaviors such as move-ahead, turn-left, turn-right and so on. At least in theory, all behaviors can be composed of these primitive responses, but as we will see below, this is not a very efficient way to learn complex behaviors. We will also include systems for activities such as grooming within this set of behavior modules. The common denominator for these modules is that they do not use sensory input to control the generated behavior once it has started. In figure 8.2.1, an optional stimulus is shown with the response generation behavior module. This is intended to illustrate that a sign-stimulus can activate the module, but does not control the execution of the behavior.
Figure 8.2.1 Four types of innate behavior modules.
The behavior modules of the second type generate stimulus-approach behaviors. These modules have the internal architecture shown in figure 4.2.10. Each module is, thus, constructed for one specific stimulus. When no learning is used, it is necessary that the creature has one behavior module for each potential stimulus type in the environment.
The behavior modules of the third kind control place-approach behavior (see sections 2.7 and 7.3). These behavior modules must naturally include some learning ability since the stimulus situations at interesting places cannot be innately known. However, given the learning abilities we have presented above, the behavior module, as such, must be innate. The approach module presented in chapter 7 recognized one stimulus with a specific meaning. This stimulus played the role of an unconditioned stimulus, and could be approached on its own. The other stimuli were used to predict the location of this special stimulus. An approach system of this type includes both a place-approach and a stimulus-approach module.
Finally, the creature needs wandering and obstacle avoidance behaviors which help it move around in the world as described in section 4.4. They are used to follow walls and corridors, to avoid obstacles and sometimes for random walk.
Figure 8.3.1 The general layout of a sequential learning system. The sensory cues CS1 and CS2 are associated with two behavior modules, BM1 and BM2, by the reinforcement module. After learning, the system will activate the behavior modules according to the discounted reward it expects to receive if the corresponding behavior is performed. In the simulations described in the text, four behavior modules were used rather than two. In the larger simulations, 512 stimuli were used, which required 4096 facilitated connections (only 8 are drawn here) and 5120 plastic weights (12 are drawn here). The activity of the nodes b_i is assumed to reflect the selected behavior.
In the simulations to be presented below, we will investigate two types of procedural learning. The first will be based solely on responses, and will, thus, be an example of stimulus-response learning. The second will use only stimulus-approach modules. Figure 8.3.1 shows the general layout of these learning systems.
Depending on the type of learning required, the behavior modules can be any of the four kinds described above. In a more advanced creature, all types of behavior modules interact with each other. In either case, it is necessary to select among the behaviors generated by the behavior modules, since the learning system does not guarantee that only one behavior module is active at a time. In our first simulations below, arbitration by probabilistic behavior selection will be used as described in section 4.3. Using this arbitration scheme, the probability of a certain behavior is proportional to the activation received by its behavior module. See appendix E for a formal presentation of the network. In the next few sections, we will see how a creature behaves using different types of procedural learning.
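To make the arbitration rule concrete, here is a minimal Python sketch of selection in which each behavior's probability is proportional to the activation of its module (roulette-wheel selection). The function name `select_proportional` and the uniform fallback when no module is active are illustrative assumptions, not the network of appendix E.

```python
import random

def select_proportional(activations, rng=random):
    """Return index i with probability activations[i] / sum(activations).

    Activations are assumed to be non-negative.
    """
    total = sum(activations)
    if total <= 0:
        # No module receives any activation: fall back to a uniform choice.
        return rng.randrange(len(activations))
    r = rng.random() * total
    cumulative = 0.0
    for i, a in enumerate(activations):
        cumulative += a
        if r < cumulative:
            return i
    # Guard against floating-point rounding: pick the last active module.
    return max(i for i, a in enumerate(activations) if a > 0)

# Usage: behavior 1 receives three times the activation of behavior 0,
# and behavior 2 receives none, so it can never be selected.
rng = random.Random(42)
choice = select_proportional([1.0, 3.0, 0.0], rng)
```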
Figure 8.3.2 A simple environment where the creature can occupy only 64 locations. The problem for the creature is to learn the shortest path from the start (S) to the goal (G). (a) The initial environment. (b) When the creature has learned the behavior sequence that leads to the goal, the goal is moved and the creature has to modify its behavior.
The sensory input consists of a number of situation categories that are generated by a fixed grid (figure 8.3.2). In this first simulation, the outcome of performing an action in a certain state is entirely deterministic: each action moves the creature from one location on the grid to another.
The simulation is divided into a number of trials. At the beginning of each trial, the creature is placed at the start location (S) of the environment shown in figure 8.3.2a. The creature is then allowed to try out the different responses at random until it reaches the goal location (G). When the goal is reached, the creature performs its consummatory behavior and is again placed at the starting location. This procedure is repeated until the creature moves from start to goal in an efficient manner. The final level of performance depends on how actions are selected. As described in section 4.3, a temperature parameter controls the level of randomness in the action selection. Since this choice is not deterministic, the creature will not always take the shortest path from start to goal.
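The exact form of the temperature-controlled selection of section 4.3 is not given here. A common choice, assumed in the sketch below, is Boltzmann (softmax) selection, where the probability of behavior i is proportional to exp(v_i / T): a low temperature makes the best-valued behavior almost certain, while a high temperature flattens the distribution toward random choice. The function names and the example values are hypothetical.

```python
import math
import random

def boltzmann_probabilities(values, temperature):
    """P(i) proportional to exp(values[i] / temperature)."""
    m = max(values)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def select_boltzmann(values, temperature, rng=random):
    """Draw one behavior index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

# Hypothetical expected rewards for four responses in one situation.
values = [0.50, 0.45, 0.10, 0.10]
for T in (0.005, 0.10, 0.50):
    print(T, [round(p, 3) for p in boltzmann_probabilities(values, T)])
```

Lowering T moves the creature from exploration toward exploitation without changing the learned values themselves.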
Figure 8.3.3 The performance in the 64 state environment shown in figure 8.3.2a. When the performance has stabilized after 200 trials, the goal is moved and the creature is allowed to relearn its response sequence. The simulation was run with a temperature (T) of 0.10 and 0.50. With random behavior, it takes on the average 51 steps from start to goal. Note that the creature performs much worse than random when the goal has been moved.
After 200 trials, when the creature has learned the path from the starting point to the goal, the goal is moved as shown in figure 8.3.2b and the creature is allowed to continue its business in the environment. Since the creature must now relearn the action sequences leading to the goal, its performance drops considerably. After 200 new trials, however, performance has almost returned to the level established for the first goal location (figure 8.3.3).
During the second phase of the simulation, the creature had a tendency to visit the previous goal-location at more or less regular intervals. The trials which required most steps to reach the goal in this phase nearly always started with the creature first moving to the last goal location, and then on to the new one. This behavior is a consequence of previously established associations, which have not yet been completely extinguished. From a biological perspective, behavior of this type is very sensible in many cases.
When responses were selected at random without any learning, the average number of steps from start to goal was 51. As can be seen in figure 8.3.3, the creature performs much worse than chance when the goal has been moved and the environment has to be relearned. With a low temperature, relearning took much longer than the initial learning. With the higher temperature, initial learning and relearning were more similar. Since the creature behaves more randomly at a higher temperature, it is more likely to deviate from the best path and consequently more likely to find the new location of the goal.
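The learning-and-relearning experiment can be imitated with a small tabular sketch. As a stand-in for the reinforcement network of figure 8.3.1, it uses one-step Q-learning (a different, standard technique) on a hypothetical empty 8×8 grid with four moves, a start in one corner, and a goal that is moved after 200 trials while the learned values are kept. The temperatures are not comparable to those in figure 8.3.3, since the value scale differs.

```python
import math
import random

SIZE = 8                                      # 8 x 8 = 64 locations
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # hypothetical moves: N, S, E, W

def step(state, a):
    """Deterministic move; moving into the border leaves the state unchanged."""
    x, y = state
    dx, dy = ACTIONS[a]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < SIZE and 0 <= ny < SIZE else state

def softmax_choice(qs, temperature, rng):
    """Boltzmann selection over the action values of one state."""
    m = max(qs)
    weights = [math.exp((q - m) / temperature) for q in qs]
    return rng.choices(range(len(qs)), weights=weights, k=1)[0]

def run_trials(Q, start, goal, trials, rng, T=0.05, alpha=0.5, gamma=0.9,
               max_steps=5000):
    """Run trials from start to goal, updating Q in place; return steps per trial."""
    steps_per_trial = []
    for _ in range(trials):
        s, steps = start, 0
        while s != goal and steps < max_steps:
            a = softmax_choice(Q[s], T, rng)
            s2 = step(s, a)
            if s2 == goal:
                target = 1.0                      # reward for reaching the goal
            else:
                target = gamma * max(Q[s2])       # discounted expected reward
            Q[s][a] += alpha * (target - Q[s][a])
            s, steps = s2, steps + 1
        steps_per_trial.append(steps)
    return steps_per_trial

Q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
rng = random.Random(1)
first = run_trials(Q, (0, 0), (7, 7), 200, rng)   # initial goal
second = run_trials(Q, (0, 0), (7, 0), 200, rng)  # goal moved, values kept
print("goal 1: first 20 avg", sum(first[:20]) / 20, "last 20 avg", sum(first[-20:]) / 20)
print("goal 2: first 20 avg", sum(second[:20]) / 20, "last 20 avg", sum(second[-20:]) / 20)
```

Comparing the first and last trial averages in each phase gives a rough analogue of the learning curves in figure 8.3.3; the exact numbers depend on the seed, the grid and the parameters, all of which are assumptions.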
In general, the choice of temperature determines the balance between exploration and exploitation of the environment. It seems reasonable to use a high temperature when drive levels are not too high, and to decrease the temperature as much as possible when the drive level becomes higher. One role of the exploratory drive described in chapter 6 is to select a higher temperature for an engagement when it wins the motivational competition.
Figure 8.3.4 The primitive responses in the second simulation.
These responses were chosen so that the creature could potentially end up in any location within the bounds of its environment. When the whiskers sensed a wall to the left or to the right, the creature was forced to select the action that turns away from the wall. A wall to the left would select the response turn-right, and a wall to the right would select the response turn-left.
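The wall reflex overrides whatever response the learning system would otherwise select. A minimal sketch, assuming boolean whisker signals; the response names come from the caption of figure 8.3.6, and since the text does not say what happens when both whiskers touch a wall, this sketch simply keeps the selected response in that case.

```python
RESPONSES = ("move-ahead", "move-slowly", "turn-left", "turn-right")

def apply_whisker_reflex(selected, wall_left, wall_right):
    """Force a turn away from a wall sensed by one whisker.

    If both whiskers (or neither) touch a wall, the selected response is kept;
    this case is an assumption, since the text does not cover it.
    """
    if wall_left and not wall_right:
        return "turn-right"
    if wall_right and not wall_left:
        return "turn-left"
    return selected
```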
Like in the previous simulation, the stimulus sensed by the creature was generated by a grid. Note that this grid was used to generate sensory information only, and not to control the movements of the creature. To let the creature know which way it was moving, the sensory state was also a function of the direction in which the creature was heading. This direction was represented with a resolution of eight directions (figure 8.3.5). All in all, this lets the creature recognize 8×8×8 = 512 unique states, or stimuli. Since each state is coded by its own stimulus, the creature is not able to generalize its behavior across different locations.
When the creature is located in square (i, j) and facing direction d, the stimulus $CS_k = 1$ for $k = 8(i + 8j) + d$ and $CS_k = 0$ for all other stimuli, where i, j and d each range from 0 to 7. This should be compared with the infinite number of possible locations the creature can potentially occupy within its environment. The creature, thus, has only a very rough idea of where it finds itself at any time.
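The indexing scheme above can be written out directly. In this sketch, the arena size and the convention that a heading of 0 radians maps to direction 0 are hypothetical; only the formula k = 8(i + 8j) + d is taken from the text.

```python
import math

GRID = 8        # 8 x 8 sensory grid
DIRECTIONS = 8  # heading quantized into eight 45-degree sectors

def stimulus_index(i, j, d):
    """k = 8(i + 8j) + d, with i, j and d each in 0..7."""
    return DIRECTIONS * (i + GRID * j) + d

def sense(x, y, heading, width=1.0, height=1.0):
    """Map a continuous position (inside a hypothetical width x height arena)
    and a heading in radians to the one-hot stimulus vector CS."""
    i = min(int(x / width * GRID), GRID - 1)
    j = min(int(y / height * GRID), GRID - 1)
    # Round the heading to the nearest of the eight sectors; % wraps negatives.
    d = round(heading / (2 * math.pi / DIRECTIONS)) % DIRECTIONS
    cs = [0] * (GRID * GRID * DIRECTIONS)
    cs[stimulus_index(i, j, d)] = 1
    return cs
```

Every continuous position inside a grid square, with a heading in the same sector, produces the same stimulus, which is why the creature has only a rough idea of where it is.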
Figure 8.3.5 The sensory grid. The creature can recognize 64 different locations in the environment, and 8 different directions. Taken together, this results in 512 different stimuli.
Figure 8.3.6 shows how the performance of the creature improves over the trials at different temperatures. At a temperature of 0.005, the creature nearly always chooses the best response learned so far for the current stimulus situation. As can be seen in the uppermost diagram in figure 8.3.6, this results in a decrease in the number of steps needed from start to goal on each trial. When the number of steps has dropped to a low level, however, a number of spikes appear in the performance curve. These are a result of a rather narrow representation of the environment. Since the creature has nearly always chosen the response with the highest expected reward, it has only learned about the locations on the path from start to goal, and not about the other locations in the environment. If it, by chance, selects a response which leads away from the learned path, it will not be able to find its way back again. In these cases, its behavior will be essentially random, which gives rise to the large spikes in its performance.
At a higher temperature of 0.010, the frequency of these spikes decreases and disappears completely after 450 trials (figure 8.3.6 middle). At the even higher temperature of 0.020, the spikes essentially disappear, but the performance becomes worse since the choice of response becomes less accurate (figure 8.3.6 bottom).
Figure 8.3.6 The performance in an 8×8×8 grid with the four responses move-ahead, move-slowly, turn-left and turn-right, using stochastic response selection with a temperature (T) of 0.005, 0.010 and 0.020.
In a final simulation, the behavior of the stimulus-response creature was tested in a simple maze. Figure 8.3.7 shows how the behavior improves over the trials. In the first trial, behavior is entirely random and, as can be seen, the creature spends a lot of its time moving back and forth aimlessly. After 50 trials, the creature begins to perform more sensibly in the second half of the maze, but behavior is still random in the first half. With more trials, however, the behavior improves. When the simulation was stopped after 200 trials, the behavior was nearly perfect. At times, however, the creature would still deviate from the optimal path, as can be seen in the final illustration.
Figure 8.3.7 The behavior of the stimulus-response creature in a simple maze after different numbers of trials.
In the first simulation, the creature had to learn the 8×8 environment described above. As can be seen at the top of figure 8.3.8, performance improves with training, although it is not clear whether it eventually converges to the optimal behavior. A number of simulations have been run in this environment, but none of them has converged on the optimal behavior. Note, however, that the behavior becomes much better than random search.
Figure 8.3.8 The performance when the creature learns stimulus-approach chains. Learning is very slow, although the performance becomes better with increased practice. It is not clear whether learning will eventually converge to the optimal level or not.
To test whether the learned behavior could in fact converge, a number of smaller simulations were run in an environment that used only 16 grid locations. In this case, the learning process nearly always reached a level where performance was nearly optimal on most trials, provided the temperature was sufficiently low. There appear to be two reasons why learning was much better in the smaller environment.
The first reason is that fewer states had to be visited to learn the environment. This obviously leads to faster learning. This does not, however, explain why the learning did not appear to converge in the larger environment. We suspect that the failure of the learning in this case was a consequence of the continuous change of the selected approach behavior. This made it very likely that the creature would reach one stimulus while trying to approach another. As a result, the wrong behavior would very often be rewarded. In the smaller environment, the creature is more likely to reach the stimulus it tries to approach since, on the average, it is at a shorter distance. The incorrect behavior will consequently be less commonly rewarded.
If this suspicion is correct, it implies that a better stimulus-approach chaining could be obtained by forcing the creature to approach the selected stimulus for more than one time step before a new behavior is selected. This has not been tested in any simulation, however.
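The untested proposal could be implemented by wrapping the behavior selector so that a chosen approach behavior is kept for a fixed number of time steps before a new one is selected. This only illustrates the idea under stated assumptions: the class `CommittedSelector` and its `hold` parameter are hypothetical.

```python
class CommittedSelector:
    """Keep the selected approach behavior for `hold` time steps before reselecting."""

    def __init__(self, choose, hold=3):
        self.choose = choose      # function: stimulus situation -> behavior
        self.hold = hold          # number of time steps to stay committed
        self.current = None
        self.remaining = 0

    def __call__(self, stimulus):
        if self.remaining == 0:
            # Commitment has run out: ask the learning system for a new behavior.
            self.current = self.choose(stimulus)
            self.remaining = self.hold
        self.remaining -= 1
        return self.current
```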
A final simulation was set up to test whether the creature could generate its own categories instead of relying on a fixed set as above. To accomplish this, the categorization mechanism described in chapter 7 was used to process the sensory input before it was handed over to the chaining mechanism. Since the left and the right sensory signals were both included in the sensory representation, the creature could potentially recognize both its location and its heading as before. The matching threshold was set to make the creature generate approximately the same number of categories as it had used in the previous simulations.
Figure 8.3.9 The environment used for response chaining with dynamically created place categories. Seven stimuli and the start (S) give off different smells that are recognized by the creature.
Figure 8.3.9 shows the environment with eight different stimuli (the start, S, is one of these). These will generate a sensory input which is unique for each location in the environment and changes smoothly with the location of the creature. The input to the categorization network consists of the array of sensory signals from both the left and the right sensors, and was normalized as described in section 7.2.
Figure 8.3.10 presents an overview of the current system (compare figure 7.2.4). The sensory input is first categorized by the categorization network presented in chapter 7. New categories are generated when the best category does not fit the input sufficiently well. The categorization network will, thus, generate a Voronoi tessellation of the sensory space, which is subsequently used instead of predefined place categories. In the procedural learning system, the place categories are associated with the appropriate behavior using the network described in section 8.3. The output from the learning system is finally used to activate the correct behavior module.
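A rough sketch of threshold-based category creation, assuming cosine similarity between normalized input vectors and fixed (non-adapting) prototypes; the categorization network of chapter 7 may differ, for instance by adapting its prototypes. Because inputs and prototypes are unit vectors, picking the largest cosine similarity is the same as picking the nearest prototype, so the inputs are partitioned into Voronoi cells. The `new_category` method mimics the back-connection through which the procedural system can request a new category; all names are hypothetical.

```python
import math

class Categorizer:
    """Nearest-prototype categorizer that adds a category when the best match is too poor."""

    def __init__(self, threshold):
        self.threshold = threshold   # minimum cosine similarity to accept a match
        self.prototypes = []         # unit-length prototype vectors

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)

    def categorize(self, signal):
        """Return the index of the best-matching category, creating one if needed."""
        v = self._normalize(signal)
        best, best_sim = None, -math.inf
        for k, p in enumerate(self.prototypes):
            sim = sum(a * b for a, b in zip(v, p))
            if sim > best_sim:
                best, best_sim = k, sim
        if best is None or best_sim < self.threshold:
            self.prototypes.append(v)
            return len(self.prototypes) - 1
        return best

    def new_category(self, signal):
        """Force a new category, e.g. when the reward fell short of the expected reward."""
        self.prototypes.append(self._normalize(signal))
        return len(self.prototypes) - 1
```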
The connections in the overall system are mainly feed-forward, but there are also connections in the opposite direction. The procedural learning system can request the generation of a new category when the actual reward is less than expected, as described in section 7.2. The matching module shown in figure 7.2.4 is here considered a part of the procedural learning system.
There are also back-connections from the behavior modules to the procedural learning system which represent the performed behavior. It is, thus, assumed that behavioral competition takes place among the behavior modules and not in the learning system itself.
Figure 8.3.10 The architecture of a procedural learning system with categorization. The categorization modules generate categories using the network shown in figure 7.2.2. The procedural learning system consists of the network in figure 8.3.1. Learning in this subsystem is controlled by reward and punishment. The procedural learning system can request new categories from the categorization network when the actual reward is lower than expected, as shown in figure 7.2.5. Behavior selection is assumed to take place within the behavior modules, and the selected behavior is sent to the learning system to control learning.
The behaviors shown in figure 8.3.4 were also used in this simulation. Since new categories were generated as soon as they were needed, the categorization process was effectively invisible to the procedural learning system. Since approximately the same number of categories were generated as before, learning proceeded almost identically to that shown in figure 8.3.6. Within this environment, the back-connections from the procedural learning system to the categorization network did not appear to play any role. Although some categories would be created using these back-connections, the speed of learning was not affected if they were removed. We suspect, however, that they will be needed in a more complex environment (compare section 7.2), especially if we allow the creature to generalize (see chapter 9).
To summarize, we have shown how a categorization network in combination with a chaining mechanism can form its own categories which it subsequently uses to learn behavior sequences. This view of learning is consistent with the division into separate systems for categorization and behavioral learning described in section 2.5. Also note that reward is used only within the procedural learning system and not in the categorization network. If we assume that learning within the categorization network is slower than within the procedural system, we would obtain a faster learning process if the creature is first exposed to the environment without being rewarded. This would, thus, be an example of latent learning.
This problem was solved by placing the stimuli sufficiently close to each other. The drawback of this solution, apart from requiring a specially prepared environment, is that only a fairly small environment could be used. To solve the problem more generally, a better type of place category is needed. However, the model could easily be extended with real place categories, and any of a number of models could be used here, for example those proposed by Zipser (1985), Prescott and Mayhew (1993), Schmajuk and Blair (1993), and Touretzky and Redish (1995). This problem is, thus, sensory rather than architectural.
The second problem has already been discussed at the end of chapter 5. The chaining mechanism models interstimulus interval (ISI) effects rather crudely. This requires the stimulus category to change between each time step, which in turn requires many more categories than would otherwise be necessary. We believe that learning will be much faster if a mechanism for ISI effects is included (see appendix C). In the simulations described above, the learning rate on the negative side of the reinforcement module was half that on the positive side, to compensate for the times when the category did not change as the creature moved. While this solution appears to work fairly well, it is still not very satisfactory.
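The asymmetric learning rate can be sketched as a simple error-correcting update. Here the "positive side" is interpreted, as an assumption, as updates that increase the prediction and the "negative side" as updates that decrease it; the function name and the default rate are hypothetical, and the actual reinforcement module of chapter 5 is not reproduced.

```python
def update_prediction(prediction, target, rate=0.2, negative_ratio=0.5):
    """Error-correcting update with a smaller learning rate for decreases.

    negative_ratio=0.5 gives the negative side half the positive learning rate.
    """
    error = target - prediction
    lr = rate if error > 0 else rate * negative_ratio
    return prediction + lr * error

up = update_prediction(0.5, 1.0)    # moves 0.2 of the way up
down = update_prediction(0.5, 0.0)  # moves only 0.1 of the way down
```

The smaller negative rate means that a step on which the category fails to change, and the expected reward is therefore not delivered, erodes the learned prediction more slowly.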
The final limitation of the procedural learning system is that it cannot generalize at all, since it uses a localist representation of location and direction: one node in the network codes for each specific location. Better performance could probably be obtained by letting the creature use a distributed representation instead. We will return to this idea in chapter 9.
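The difference between the two kinds of representation can be illustrated by measuring how much the codes for two positions overlap. This sketch uses a one-dimensional track of eight cells and, as an assumed example of a distributed representation, Gaussian tuning curves (coarse coding). For a linear readout, the overlap measures how much of what is learned at one position carries over to another: a localist code gives zero overlap between neighbors, while the distributed code gives an overlap that falls off with distance.

```python
import math

def localist(cell, n_cells=8):
    """One node per cell: a one-hot vector."""
    v = [0.0] * n_cells
    v[cell] = 1.0
    return v

def distributed(x, centers, width=1.0):
    """Coarse code: overlapping Gaussian tuning curves centered on `centers`."""
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def overlap(u, v):
    """Dot product of two codes."""
    return sum(a * b for a, b in zip(u, v))

centers = [float(c) for c in range(8)]
print(overlap(localist(3), localist(4)))  # neighboring cells share nothing
print(overlap(distributed(3.0, centers), distributed(4.0, centers)))
print(overlap(distributed(3.0, centers), distributed(7.0, centers)))
```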
The model of spatial orientation based on expectations we will present is similar in spirit to that of Schmajuk and Thieme (1992). The network used here is different in some respects, however. One important difference is that the present model is based on an expectancy network instead of an ordinary recurrent network. The expectations used are, thus, considered to be established through classical conditioning during exploration of the environment as described in section 7.4.
We will assume that the appetence behavior of the creature is divided into two alternating phases. In the first phase, the creature samples the various stimuli in the environment and calculates how well each of them predicts the desired goal. In the second phase, the creature approaches the stimulus which is the best goal predictor. (Compare the combined orientation and approach system described in section 4.2.)
Figure 8.4.1 presents a small network which uses an expectancy network to calculate how well each stimulus in the environment predicts the desired goal. The purpose of this network is to select the best goal-predictor and make the creature approach it (see also appendix E).
Let us assume the creature finds itself in a situation where two stimuli, CS1 and CS2, are present. The creature is assumed to evaluate each stimulus in turn in the following way. First, a brief pulse, ΔCS1, is sent to the network. This signal will activate the node v1, which will stay active as long as CS1 is evaluated. The pulse will also enter the expectancy network, where it will be propagated through the network and generate expectations from CS1. These new expectations will then enter the expectancy network again and generate further expectations from the new nodes, and so on. The expectations which return to the network will be called recurrent expectations. This process will continue until the goal representation, p, is activated. When this happens, the node p activates the two nodes a1 and a2. The plastic connection u1 will consequently sample the activity level of the goal representation at a1. The connection u1 is used as a short-term memory store of the goal prediction for CS1. The output of p will also reset the node v1, which will stop further sampling at u1. The processing of stimulus CS2 will then proceed in the same way.
Because of the discount factor used in the learning process, the activation of the goal representation will be weaker with an increasing number of associative steps. As a consequence, the activation of the goal representation will be strongest for the stimulus with the smallest psychological distance to the goal. The weights on the connections u1 and u2 can, thus, be considered as representations of how well CS1 and CS2 predict the goal.
Figure 8.4.1 Using the expectancy network for calculation of goal prediction. In the simulations described in the text, eight stimuli and approach behaviors were used. In this case, the total network uses 52 nodes. See the text for further explanation.
By activating node a, the activity of the nodes a1 and a2 will reflect the goal predictions of the two stimuli. Since the two nodes a1 and a2 compete with each other as described in section 3.3, only the node corresponding to the best goal predictor will emit a signal. This signal will be used in the second phase to approach the best goal predictor. Since the creature uses its expectancies to look ahead along the evaluated path, this type of subgoal selection will be called look-ahead choice.
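The two-phase look-ahead choice can be approximated in a few lines by treating the learned expectations as a directed graph. This is a simplified sketch, not the network of figure 8.4.1: each associative step multiplies the activation by a discount factor d, propagation stops the first time the goal is activated, and a winner-take-all step stands in for the competition between a1 and a2. The usage example mirrors the two routes discussed in the following paragraphs, with a hypothetical intermediate stimulus `Y1` on route Y.

```python
def goal_prediction(expectations, start, goal, d=0.9, max_steps=50):
    """Propagate discounted recurrent expectations from `start` until `goal` is activated.

    expectations: dict mapping a stimulus to the stimuli it predicts.
    Returns d**n for the smallest number n of associative steps, or 0.0.
    """
    active = {start: 1.0}
    seen = {start}
    for _ in range(max_steps):
        if goal in active:
            return active[goal]
        nxt = {}
        for s, level in active.items():
            for t in expectations.get(s, ()):
                if t not in seen:
                    nxt[t] = max(nxt.get(t, 0.0), level * d)
        if not nxt:
            return 0.0          # the goal is never expected from this stimulus
        seen.update(nxt)
        active = nxt
    return 0.0

def look_ahead_choice(expectations, candidates, goal, d=0.9):
    """Phase 1: evaluate each candidate in turn. Phase 2 approaches the winner."""
    predictions = {c: goal_prediction(expectations, c, goal, d) for c in candidates}
    # Winner-take-all: only the strongest goal predictor emits a signal.
    best = max(candidates, key=lambda c: predictions[c])
    return best, predictions

expectations = {
    "S2": ["S3"], "S3": ["S4"], "S4": ["S5"], "S5": ["S6"], "S6": ["G"],  # route X
    "SII": ["Y1"], "Y1": ["G"],                                          # route Y
}
best, preds = look_ahead_choice(expectations, ["S2", "SII"], "G", d=0.9)
print(best, preds)
```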
Let us again consider the situation described at the beginning of chapter 4 (see figure 4.2.3 and 4.2.4). The creature wants to move from its current location SS to a goal location, which we will call G, and samples the two possible routes in turn to see which path is shortest.
First it evaluates path X, where stimulus S2 is used as the first subgoal. The stimulus S2 will generate an expectation of the stimulus S3, which in turn will generate an expectation of S4, and so on, until the goal representation is activated. Since there are five associative steps from S2 to the goal, and each expectation is discounted by a factor d, the goal prediction given by S2 will be $d^5$.
Now the creature evaluates the other possible route, called Y, in the same way. The recurrent expectations which activate the goal representation from stimulus SII will be discounted by $d^2$, since two associative steps are needed. Since $d^2$ is larger than $d^5$ for any discount factor $0 < d < 1$, the creature will subsequently choose path Y over path X. It seems appropriate to consider the expectations formed as an internal representation of the life-space of the creature. In chapter 9, we will discuss how this mechanism relates to planning and problem solving. Below, we investigate how our artificial creature can use recurrent expectations to behave in a simple environment.
Figure 8.4.2 Behavior based on expectancies. (a) The first trial when the environment is explored. (b) The creature performs very well even on the second trial. The main problem for the creature is not to find the goal, but to avoid walls. A very simple obstacle avoidance module was used in this simulation which generates the inefficient turn around the final corner before the goal.
With learning as fast as this, we can investigate much more complex situations. A second simulation was run in which the creature was placed in the maze presented in figure 2.15.1. A similar maze was used by Tolman and Honzik (1930) to show that the behavior of rats is guided by a cognitive map rather than by habit, and that the animals can show "insight" into the maze when needed. In the simulation, we tried to mimic the experiment by Tolman and Honzik as closely as possible. Since the sensory system of our creature is much more primitive than that of a rat, however, some differences are necessary. To let the creature solve the problem, at least one stimulus must be placed in each corner of the environment, as shown in figure 8.4.3a.
In the first phase of the experiment, the creature was allowed to explore the environment as shown in figure 8.4.3b. During its exploratory behavior, the creature sets up expectations about which stimuli are close to each other by classical conditioning (see appendix I). After this phase, the creature is forced to select stimulus 1 as a goal and is tested under one of three conditions.
Under the first condition, the creature is simply placed in the start box at the bottom of the maze. As expected, the creature approaches stimulus 1 by the direct path (figure 8.4.3c). Since the creature was initially directed away from the goal, it first had to turn, which explains the small deviation from the shortest path shown in the simulation.
In the second condition, the shortest path is blocked between stimuli 7 and 8 as shown in figure 8.4.3d. As a result, the creature will choose the second best path to the goal instead, that is 7-6-8-2-1. Initially, the creature chooses stimulus 7 as the best goal predictor, but while it attempts to approach it, the expectation from stimulus 7 to stimulus 1 will extinguish. At this time, stimulus 6 will become the best goal predictor and the creature will turn toward it instead and choose the path to the right in the maze. As described in chapter 2, this is also the behavior one would expect from a reinforcement learning mechanism.
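The switch of goal predictor during a blocked approach can be illustrated with a small weighted version of the look-ahead computation. In this sketch, the goal prediction of a stimulus is the best discounted product of expectation weights along any chain to the goal, and each time step in which the approached stimulus fails to lead to the expected one weakens that expectation by a fixed fraction. The graph (a short route via A and a longer route via B and C) and the extinction rate are hypothetical, chosen only to show the switch.

```python
def goal_predictions(weights, goal, d=0.9, sweeps=20):
    """pred[s] = max over t of d * w(s, t) * pred[t], with pred[goal] = 1."""
    nodes = set(weights) | {t for targets in weights.values() for t in targets} | {goal}
    pred = {n: 0.0 for n in nodes}
    pred[goal] = 1.0
    for _ in range(sweeps):
        for s, targets in weights.items():
            if s != goal:
                pred[s] = max([d * w * pred[t] for t, w in targets.items()], default=0.0)
    return pred

def best_predictor(weights, candidates, goal, d=0.9):
    pred = goal_predictions(weights, goal, d)
    return max(candidates, key=lambda c: pred[c])

def extinguish(weights, s, t, rate=0.3):
    """The expected stimulus t did not appear: weaken the expectation s -> t."""
    weights[s][t] *= 1.0 - rate

weights = {"A": {"G": 1.0}, "B": {"C": 1.0}, "C": {"G": 1.0}}
target = best_predictor(weights, ["A", "B"], "G")   # "A": the shorter route
steps = 0
while target == "A" and steps < 100:
    # The creature approaches A, but the blocked passage never delivers G.
    extinguish(weights, "A", "G")
    steps += 1
    target = best_predictor(weights, ["A", "B"], "G")
print(steps, target)
```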
Under the final condition, the path from start to goal was blocked as shown in figure 8.4.3e and f. As before, the creature starts to approach the goal by the shortest path until stimulus 8 loses its role as best goal predictor. At this time, the creature senses two other stimuli, 6 and 7. Depending on which of these stimuli is closer, the creature will choose either the behavior in figure 8.4.3e or that in figure 8.4.3f.
According to Tolman and Honzik (1930), this is an example of insight on the part of the creature. Had it used a stimulus-response chaining mechanism like the one described earlier in this chapter, the creature would not have known what to do when the path was blocked. When it reached stimulus 7 again after the behavior from stimulus 8 had been extinguished, it would have chosen the second best path, as under the second condition. It would, thus, have tried the path 7-6-8-2-1 even though it had already experienced the blocking between stimuli 8 and 2.
The creature could only choose the longest path immediately if it had knowledge about the layout of the environment, and not only about what behavior to perform where. With such knowledge, it could infer that the blocking on the shortest path would also be a blocking on the second best path.
This learning mechanism is, thus, a large step beyond the stimulus-response chaining described above. Also note that for expectancy learning, it is much more reasonable to use stimulus-approach behavior than fixed responses. In the network presented here, this made it possible to form expectations between stimuli without any regard for how the approach behaviors would be executed.
Figure 8.4.3 A simulation of the experiment run by Tolman and Honzik in 1930. (a) The location of the different stimuli. (b) The exploratory phase. (c) Behavior when all paths are free. The creature chooses the shortest path from start to goal (1). (d) The creature selects the second best path when the shortest is blocked. (e) One of two behaviors generated when the shortest and the second shortest path are blocked at the same place. When the creature finds that the shortest path is blocked, it will select the second best goal predictor instead. In this simulation, stimuli 6 and 7 are equally good goal predictors. As a consequence, the creature will sometimes choose the behavior in figure (e) and sometimes the one in figure (f).
In the spatial domain, expectancy learning is obviously more efficient than stimulus-response learning, but this type of learning also has its share of problems. These are mainly a consequence of limitations of the expectancy network. In the simulation described above, the creature would alternate between two paths from start to goal when the direct route had been blocked (figure 8.4.3e and f). In this simulation, the path 8-7-5… is always shorter than the path 8-6-7-5…, but the creature would very often choose the longer path instead. The reason for this is that the expectancy network does not convert physical distance into psychological distance in an optimal way. Since stimuli 6 and 7 have both been conditioned to stimulus 5, they are equally good goal predictors, although the distance from 6 to 5 is longer than the distance from 7 to 5 (see appendix E). To handle this problem, physical distance must somehow be included in the expectations.
Including such information is not very hard if the absolute smell intensity of each stimulus is known, but it is hard to see how such information could be generated by other modalities. What is needed here is a representation of the distances between the different stimuli which is independent of the sensory modality used. Such distance information could possibly be generated by dead reckoning as the creature moves from one location to the next during the exploration phase or by including systems for distance calculation within each modality. In either case, the association formed in the expectancy learning should be a function of the perceived distance from one stimulus to the next (compare section 5.10).
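One way to let perceived distance shape the expectations is to make each association weaker the longer the link it represents. The sketch below assumes an exponential fall-off with a hypothetical length scale and hypothetical distances; with the distance-blind rule, stimuli 6 and 7 tie as goal predictors, while the distance-weighted rule prefers the nearer one.

```python
import math

def chain_prediction(link_distances, d=0.9, length_scale=None):
    """Goal prediction along a chain of expectations.

    With length_scale=None every link counts as one associative step, as in the
    simulations; otherwise each link is weakened by exp(-distance / length_scale).
    """
    p = 1.0
    for dist in link_distances:
        strength = 1.0 if length_scale is None else math.exp(-dist / length_scale)
        p *= d * strength
    return p

# Hypothetical perceived distances of the final links to stimulus 5.
from_7 = [1.0]
from_6 = [2.5]
print(chain_prediction(from_7), chain_prediction(from_6))  # tie
print(chain_prediction(from_7, length_scale=2.0), chain_prediction(from_6, length_scale=2.0))
```

Because the weighting is applied to the association itself, the rest of the look-ahead mechanism is unchanged; only the strengths it propagates now carry distance information.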
In view of these properties of the two systems, one may ask why we have bothered to present the procedural learning system at all. There are several reasons for this. The first is that a system of this kind clearly exists in real animals (Shimamura 1990). When a task has been tried on a large number of occasions, animals will typically start to behave as if a stimulus-response mechanism is involved. This is usually called overtraining (Gallistel 1990). One possible role of the procedural learning system in this case may be to relieve the expectancy network of controlling repetitious behaviors, and to free it for more important tasks (see chapter 9). We may also view the expectancy system as a tutor for the procedural learning system. When the internal incentives generated by the procedural learning system become higher than those of the expectancy system, control of behavior automatically shifts to the procedural system. The relation between these two systems will be further discussed in chapter 10.
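The hand-over of control can be sketched as a simple arbitration rule. In the toy version below, which is an assumption rather than the model of this chapter, the expectancy system supplies a fixed incentive, the procedural system learns its incentive slowly with a delta rule, and whichever incentive is higher controls behavior. The names `controlling_system` and `train` and all parameter values are hypothetical.

```python
def controlling_system(expectancy_incentive, procedural_incentive):
    """Whichever system generates the higher incentive controls behavior.
    Ties stay with the expectancy system (an assumption)."""
    if procedural_incentive > expectancy_incentive:
        return "procedural"
    return "expectancy"

def train(trials, expectancy_incentive=0.8, learning_rate=0.1, reward=1.0):
    """Record which system is in control on each trial.

    The expectancy incentive is held fixed, while the procedural incentive
    is learned slowly with a delta rule (both hypothetical choices).
    """
    procedural = 0.0
    history = []
    for _ in range(trials):
        history.append(controlling_system(expectancy_incentive, procedural))
        procedural += learning_rate * (reward - procedural)
    return history

history = train(30)
print(history.index("procedural"))  # 16
```

With these parameters the procedural incentive after n trials is 1 - 0.9^n, so it first exceeds 0.8 on the seventeenth trial (index 16) and keeps control from then on, a rough analogue of overtraining.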
Another possible use of the procedural learning system is to modulate the behavior generated by the more cognitive system. The expectancy network generates fairly good behavior very quickly, but that behavior is not optimal. By combining both types of system, learning can be fast and still eventually reach an optimal level. In this case, the expectancy network can be seen as a search heuristic for the procedural learning system. Since the expectancy system only executes approximately correct behaviors, the procedural learning system will not have to train on the large set of useless behavior sequences it would face if used on its own.
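The role of the expectancy network as a search heuristic can be illustrated with a small, hypothetical example. The maze, the step costs and the `HINTS` table standing in for the expectancy network are invented for illustration, and the procedural learner is ordinary tabular Q-learning with a learning rate of 1 and optimistic zero initial values, a swapped-in technique rather than the reinforcement module used in this book. Both learners settle on the cheapest route S-A-B-G, but the hinted learner never executes the useless move from S to C.

```python
# Hypothetical maze: node -> {neighbour: step cost}. Not the maze in the book.
GRAPH = {
    "S": {"A": 1, "B": 3, "C": 5},
    "A": {"B": 1, "G": 4},
    "B": {"G": 2},
    "C": {"G": 1},
    "G": {},
}

# Stand-in for the expectancy network: the moves it rates as approximately
# correct from each node. Assumed for illustration only.
HINTS = {"S": ["A", "B"], "A": ["B", "G"], "B": ["G"], "C": ["G"], "G": []}

def all_moves(s):
    return list(GRAPH[s])

def hinted_moves(s):
    return HINTS[s]

def run_trials(moves, trials=20, start="S", goal="G"):
    """Tabular Q-learning (learning rate 1, deterministic maze).
    q[(s, a)] estimates minus the cost from s to the goal via a."""
    q = {}         # unseen pairs count as 0, which is optimistic
    tried = set()  # (state, move) pairs the learner actually executed
    for _ in range(trials):
        s = start
        while s != goal:
            # Greedy choice; optimism makes untried moves look attractive.
            a = max(moves(s), key=lambda m: q.get((s, m), 0.0))
            future = max((q.get((a, b), 0.0) for b in moves(a)), default=0.0)
            q[(s, a)] = -GRAPH[s][a] + future
            tried.add((s, a))
            s = a
    return q, tried

def greedy_path(q, moves, start="S", goal="G"):
    path = [start]
    while path[-1] != goal:
        s = path[-1]
        path.append(max(moves(s), key=lambda m: q.get((s, m), 0.0)))
    return path

q_all, tried_all = run_trials(all_moves)
q_hint, tried_hint = run_trials(hinted_moves)
print(greedy_path(q_all, all_moves), len(tried_all))      # ['S', 'A', 'B', 'G'] 7
print(greedy_path(q_hint, hinted_moves), len(tried_hint)) # ['S', 'A', 'B', 'G'] 5
```

The unrestricted learner executes seven distinct moves before settling; the hinted learner needs only five, because the heuristic has already pruned the detour through C.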
It is interesting to note that the network for expectancy learning is not much more complex than the one for procedural learning. The chaining of stimulus-response associations is in no way simpler than the learning of expectations. In the models presented here, the same type of reinforcement module was used as the main building block for both systems.
The present chapter can thus be seen as a contribution to the classical controversy between two views of learning. In the tradition started by Thorndike, learning is seen as the acquisition of habits, much like the behavior chaining described above. In the other tradition, which is usually associated with Tolman (1932), animals are assumed to acquire knowledge about their environment rather than habits. If learning was, in fact, the acquisition of knowledge, the behavior of animals in various experiments appeared much easier to explain than with a habit theory. There was one large problem with the cognitive theory, however. It seemed impossible to give a mechanistic account of how the knowledge was converted into behavior.
It took many years before a mechanistic model was presented that could explain the purposive behavior of animals (see Gallistel 1980). The first such model was probably presented by Deutsch (1960), and it used a mechanism which is in many respects similar to the one described above. It differed from the model presented here, however, in that associations would not flow from potential subgoals to the goal, but instead from the goal to the current location of the creature. As already mentioned, the model presented by Schmajuk and Thieme (1992) is more similar to the expectancy net used here, since it too selects a number of potential goal predictors and evaluates them sequentially.
We can compare the two systems presented here with the distinction between procedural and declarative memory (Shimamura 1990, Squire 1992), although these systems are naturally much more complex in real animals. The view promoted by the present chapter is, thus, that both the habit theory and the cognitive theory are valid, but for different types of learning. We also believe that the use of expectations rather than simple associations will make it easier to understand how planning and problem-solving abilities can be seen as more advanced forms of recurrent expectation. This is something we will return to in the next chapter.
Natural Intelligence in Artificial Creatures © 1995 by Christian Balkenius. Lund University Cognitive Studies 37. ISBN 91-628-1599-7, ISSN 1101-8453, ISRN LUHFDA/HFKO--1004--SE
Lund University Cognitive Science, Kungshuset, Lundagård, S-222 22 LUND, Sweden
sekreteraren@lucs.lu.se