Published in Parallel Problem Solving from Nature 2, R. Männer and B. Manderic (eds.), North-Holland: Amsterdam, Sept. 1992.
by Kurt Thearling
The ability to genetically breed computer
programs is a relatively new approach to problem solving. Recent
advances in parallel processing have provided platforms to
efficiently carry out this type of computation. In this paper we
describe a genetically optimized artificial lifeform. The goal of
this work is to breed swarms of artificial organisms that can
move over corrupted bitmapped images and correct any errors that
are found. We will show that the genetically optimized lifeforms
can correct most instances of some forms of errors. In addition,
we will show that ability of the organisms to communicate can
improve the fitness of an organism. To efficiently carry out this
task, an organism exploits two different forms of parallelism.
The first form is an implementation specific form of parallelism
that makes use of the massively parallel processing paradigm to
evaluate an organism's performance in parallel for each
generation. The second form is derived from the independent and
parallel operation of each organism with the others.
Much of the research in the field of artificial life asks the fundamental question: ``Can something inside a computer be considered alive?'' Although this is an important question, it does not really address whether or not artificial life has something to offer its creators other than a philosophical curiosity. This has motivated us to to consider some of the more practical questions regarding the abilities of artificial life. In this paper we investigate the application of artificial lifeforms to problem solving tasks. The specifics of the problem at hand involve image reconstruction but it is hoped that the capabilities of the lifeforms transcend the specifics of this problem.
The past twenty years have produced numerous publications related to the use of genetic algorithms for optimization. Applications ranging from the traveling salesman problem to job-shop scheduling have been reported. The following are just a small sample of related work published in recent years. In , Jefferson and his colleagues used genetic algorithms to breed ``ants'' which attempt to traverse an incomplete trail in a lattice environment. The ``ants'' are controlled by either a finite state machine (FSM) or neural network. The genome specifies a state transition table (in the case of a FSM controller) or connections/weights (in the case of a neural network controller). During each generation the fitness of an individual organism (which is defined as the the distance along the trail that an organism travels) is evaluated. The test trail does not change between generations. Jefferson and his colleagues were able to breed organisms which could successfully navigate complex and corrupted trails to completion. As will be shown later, the key difference between Jefferson's work and the work presented in this paper is that we are dealing with multiple organisms operating in the same environment simultaneously. This allows for more complex problems to be solved and introduces the possibility of cooperation between organisms.
The work of Ray  is also an example of an evolutionary approach to the optimization of programmed organisms. Each organism consists of a sequence of instructions from a language which has been specially designed to allow for evolutionary operations. The fitness of an organism, like the fitness of real biological organisms, is the ability to efficiently reproduce and generate offspring. A simulation system called Tierra, which executes an organism's program, provides an environment for the competition between an inhomogeneous collection of organisms for limited resources. Ray has shown that a simple ``ancestor'' program can evolve into a complex soup of complicated organisms.
Finally, the recent work of Clearwater, Huberman, and Hogg  describes a series of experiments to evaluate the effectiveness of cooperation in a group of agents. They show that a collection of agents that are allowed to communicate (via a central ``blackboard'') are able to solve a simple constraint satisfaction much faster than agents that are not cooperating. Although the cooperation results presented in this paper do not use a blackboard for communication, they do suggest the communication can improve an organism's overall fitness.
... However, some dermestids are valuable as scavengers; some of the carrion-feeding species (e.g. Dermestes caninus) are used by zoologists to clean skeletons of animals. 
The goal of the Dermestes artificial lifeform is to wander around in its environment trying to correct mistakes that are discovered. The allusion to the Dermestes beetle refers to our attempt to create an artificial organism which cleans a data skeleton by ``eating'' errors that it comes upon. In the context of this paper the intended environment is a bitmapped image containing pixels that are either on or off. Each organism starts out at a different location (the ``home'' pixel for the organism) and the number of organisms equals the number of pixels. It is the job of each organism to decide to either correct (i.e. flip the value) or leave unchanged the value for its home pixel.
An individual Dermestes organism is controlled by a finite state machine which takes as input details about the environment and the current state. This information is mapped to a range of possible movements as well as decisions about correcting its home pixel. The FSM can also enter the ``done'' state which means the organism should stop moving and use the last correction decision as the final decision. Once all organisms are ``done,'' the process is complete. It should be noted that there is a hardcoded ``lifetime limit'' on the number of steps an organism can take. This prevents an infinite loop from causing an organism to never terminate.
The population operated on by the genetic operators are a collection of genomes for the different instances of the Dermestes organism. Typical population sizes in the work presented here range from 64 to 1024 and are inhomogeneous. Once a population of genomes has been generated, each genome in the population is assigned a different corrupted image to work with. At this point the individual genomes are replicated so that each organism in a homogeneous sub-population is assigned a unique pixel in the test image. Each member of the population operates on a slightly different image with each organism in the sub-population responsible for an individual pixel within an image. To alleviate ambiguities, we will refer to a the collection of homogeneous (i.e., the same genome) organisms assigned to an individual image as a sub-population. A population will refer to the collection of all sub-populations. Sub-populations do not interact each other.
The test environment which will be discussed in this paper is a simple grid-like pattern of size 32 by 32 pixels. Each pixel corresponds to a single organism which means that there are a total of 1024 organisms in each sub-population. The individual pixels are either ``on'' or ``off''. Two classes of errors have been experimented with: unidirectional and bi-directional. Unidirectional means that there are only ``on'' changed to ``off'' errors. (Note that this implies that after the errors have been introduced, any ``on'' value cannot be an error). Bi-directional allows for both ``on'' to ``off'' and ``off'' to ``on'' errors. In each generation of the simulation, the errors that are introduced change. The goal of this work is to breed organisms which can correct all errors, not just a specific error pattern (i.e., generalize). An example of the a unidirectional error pattern is shown in Figure 1. The image on the left side of the figure shows the original, corrupted image. The image in the middle shows the pattern after being operated on by an organism with a high fitness (the image on the right shows where the uncorrected errors remain). The correct pattern is a grid and as can be seen, most of the errors have been corrected.
Figure 1: Example of bitmapped environment before and after organisms are applied.
Using a parallel computer (the Connection Machine CM-5), we are able to perform the evaluation step of the genetic optimization in parallel. Each parallel processor is assigned several genomes to evaluate and each evaluation is handled in parallel. Typical runs of 1000 generations take approximately one hour on a 128 processor CM-5 for a population of 256.
The genome of an organism represents the finite state machine controller for an organism. It corresponds to an enumerated list of all possible states and transitions from those states. Encoded in each word of the genome are transitions from one state to another. There are four pieces of information coded into each state transition:
A transition is chosen based on the following information:
There are eight different transitions out of one state to other states. These eight transitions correspond to the enumeration of the possible input values for the last three items in the above list (each value is binary). In addition to these three input values, there are others that can be added. As will be discussed later, the ability to communicate can be incorporated into the organism's FSM controller.
The length of the genome is related to the number of states in the FSM and the number of transitions out of each of the states. For an N state FSM, the total number of transitions (which is the number of entries in the genome) equals 8 N. Each transition is specifies the four pieces of information in the above list (next state, movement, and correction) and this information is encoded in the genome as a single 32-bit word to simplify the representation. Unused leading order digits are ignored.
The movement outputs are used to allow an organism to move within the image. Both forward/backward and sideways actions are allowed in the same move (which means that diagonal moves are permitted). It is also possible for an organism to decide to stay where it is (i.e., both movements return a null value). Each organism starts out on their home pixel but subsequent iterations of the FSM could (and in most cases do) move the organism to other pixels. Multiple organisms are allowed to move to the same pixel. The original (uncorrected) value of the pixel that an organism occupies is provided to the organism. This information is used in the mapping to the next state.
The genetic optimization of the organism genomes uses the traditional (two point) crossover and point mutation operators. The fitness of a sub-population is simply the total number of pixels that are correct when all of the organisms within that sub-population are either done or out of time. Organisms that do not finish do not contribute to the total fitness (i.e., they are equivalent to organisms that are incorrect) but there is no explicit penalty assigned for not finishing.
After all sub-populations in the current generation are finished, the fitness scores are sorted and the best X% are allowed to live (X is called the reaper percentage - it is a user specified parameter). If X is set to one hundred percent, all organisms are allowed to survive. Once the surviving organism are chosen, they are replicated for the next generation based on their fitness. The number of offspring a particular genome receives is determined by taking the sum of all fitness scores and calculating the percent of the total which is contributed by that genome.
Once the offspring for the next generation are allocated, the next step is mating. Each genome is assigned a random partner from within the population. Incest is not prevented. The pairings are such that each sub-population is guaranteed one and only one mate. After the matings are complete, two-point crossover is carried out between the pairs. Point mutation (at a user specified rate) completes the reproduction process. A more detailed description the implementation of this algorithm is available in .
The first experiments carried out with the Dermestes organisms involved unidirectional errors. Various error levels were created and the organisms were then challenged to correct these errors. Subsequent generations (after the application of the genetic operators) were given different error patterns (where each pattern contained the same number of errors). The results of three such experiments are illustrated in Figure 2.
Figure 2: Maximum fitness vs. Generation for various numbers of error.
Each curve corresponds to the best score among the population in a given generation. Since the specific errors change each generation, the curve is not strictly monotonic (although the general trend of each curve is a monotonically increasing). The three different curves represent 124, 224, and 324 initial errors (out of 1024 pixels). The parameters for the genetic evolution for these experiments were:
Since there are eight transitions out of every state, a genome length of 2048 corresponds to a FSM with 256 states.
As can be seen in Figure 2, approximately 110 out of 124 errors were corrected. When the error rate was increased to 224 errors, approximately 150 could be corrected. Finally, when the error rate increased to 324 errors, approximately 210 could be consistently corrected.
One key point to remember is that in the case of unidirectional errors, it is impossible for ``on'' values to be in error. Thus any existing ``on'' values can provide a reference point when an organism tries to orient itself. The organisms whose home pixel has an ``on'' value take advantage of this situation by quickly deciding that the existing value does not need to be corrected. Organisms whose home pixel is ``off'' search for ``on'' pixels and use the location information to decide if their home pixel should be ``on'' or ``off.'' Increasing the error rate decreases the number of reference pixels and hampers ability of the organisms to orient themselves. This is the reason that the maximum fitness decreases as the error rate increases. As will be seen later, this reference frame is important in effectively correcting the errors. Bi-directional errors, which do not provide a reference frame, cannot be corrected to the same degree of accuracy as the unidirectional errors.
The genome length parameter was investigated to determine how
it was related to fitness. Various lengths, from 32 to 4096
(increasing by factors of two) were tried. Figure 3 shows the
results for the maximum and average fitness scores for
generations 500 through 1000. As can be seen, the optimal length
for this problem appears to be 128 or 256. Above this length the
fitness scores decrease slightly. Below a length of 128 the
scores are substantially reduced. We propose that there are two
likely explanations for this behavior. When the length is too
small, it it impossible to produce a FSM with enough complexity
to solve the problem at hand. The organism cannot generalize its
behavior to cover all possible error configurations. When the
length is longer than necessary, there are more opportunities to
make a mistake so more craftsmanship is required to produce a fit
Figure 3: Maximum Fitness and Average Fitness vs. Genome size.
We now consider the situation in which both ``on'' to ``off'' and ``off'' to ``on'' errors are allowed. As discussed earlier, this class of error would appear to be much more difficult to correct than the unidirectional errors. Our first experiments on bi-directional errors used the same type of FSM that was used with the unidirectional errors. In these experiments, approximately 30 out of 124 errors (in images with 1024 total pixels) could be corrected. Obviously the performance is not nearly as good as with the unidirectional errors (where 110 out of 124 errors were corrected).
It occurred to us that some of the organisms might be able to decide on correction more easily than others under the right circumstances. Such a situation might occur if there were clusters of correct pixels. The organisms responsible for those pixels might be able to quickly decide that no correction was necessary. This information might then be useful to nearby organism which could use it to orient their position relative to the grid pattern. This corresponds to a very simple form of communication between organisms.
To do this it is necessary for an organism to communicate the fact that a decision about the correction of its home pixel has been made to other organisms. This can be encoded in the FSM by adding an additional input which states whether or not the organism whose home pixel is the currently occupied pixel is done yet. If the organism is done, the original value at the current pixel is replaced by the corrected value. The inputs to the FSM are now:
Experiments on the same bi-directional error patterns (124 errors out of 1024 pixels) were run using this new FSM configuration. Although the results are not dramatically better than the non-cooperative fitness results, they are consistently better by a small amount. From generation 200 to 1000, the cooperative organisms average 4.97% higher fitness than the non-cooperative organisms (Figure 4).
Figure 4: Cooperative (black) vs. Non-cooperative (red) fitness results.
This same cooperative strategy was also tried with unidirectional errors. Since there are already a large number of reference points (the guaranteed to be correct ``on'' pixels) to start with, the addition of communication did not appreciably improve the fitness.
This paper described an investigation into the genetic optimization of an artificial organism. The purpose of this organism is to reconstruct corrupted bitmapped images by moving over the image correcting errors. This optimization is carried out by a collection of organisms operating in parallel. We have shown that one class of errors (unidirectional) can be effectively corrected by this type of organism. Although a more difficult class of errors (bi-directional) cannot be corrected as easily, the addition of a cooperative problem solving strategy provided a measurable improvement in performance.
Thanks to Stephen Smith, David Waltz, Kris Carlson, and Karl Sims for conversations, ideas, and support.