What is the future of genetic algorithms?

Reinforcement learning: genetic algorithm

Algorithms in the reinforcement learning category learn independently by trying to maximize rewards or minimize penalties. The underlying principle is trial and error, combined with an evaluation that rewards good (goal-oriented) behavior and punishes bad behavior patterns. A reward means that the rewarded behavior patterns will be tried more frequently in the future; a punishment means they will be tried less frequently.
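This reward-and-punishment principle can be sketched in a few lines of Python. The function names (`update_preference`, `choose`) and the weighting scheme are illustrative assumptions, not a reference implementation:

```python
import random

def update_preference(weights, action, reward, step=0.1):
    """Trial and error: a rewarded action (reward > 0) will be tried
    more often in the future, a punished one (reward < 0) less often."""
    weights = dict(weights)  # do not modify the caller's dictionary
    weights[action] = max(0.01, weights[action] + step * reward)
    return weights

def choose(weights):
    """Pick a behavior with probability proportional to its weight."""
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=w, k=1)[0]
```

Each evaluation nudges the weights, so over many trials the rewarded behaviors dominate the random choice.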


The algorithm runs through a large number of iterations in which it combines proven behavior patterns and randomly tries out new ones. In this way it comes closer to the optimum step by step. The best-known representatives of this category are genetic algorithms, which are modeled on Charles Darwin's theory of evolution.

Reinforcement learning is used for minimization and maximization tasks. It is also used in learning processes that must react to changing environmental influences. For example, reinforcement learning could be used to teach a colony of robot ants how to move around optimally. Each robot ant would initially try to move forward with a random movement technique.
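The starting point of the robot-ant example can be sketched as follows. Encoding a movement technique as a list of numeric parameters in [0, 1] is an assumption made here for illustration:

```python
import random

def random_individual(num_genes=4):
    """One robot ant's movement technique, encoded as a list of
    random parameters (e.g. step length, leg frequency)."""
    return [random.random() for _ in range(num_genes)]

def initial_population(size=20, num_genes=4):
    """Generation 0: every ant starts with a random technique."""
    return [random_individual(num_genes) for _ in range(size)]
```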

Success can be measured with a fitness function: the distance covered. In the next generation, exceptionally successful locomotion techniques are combined with one another more often than average (recombination), and their characteristics are inherited, meaning they will be used more frequently in the future. A generation is the totality of all individuals that are available to one another for reproduction in one step of the temporal chain of reproduction.
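Fitness evaluation and recombination can be sketched like this. Since no real ant simulation is available here, the sum of the gene values stands in for the distance covered, and uniform crossover is one common recombination scheme among several:

```python
import random

def fitness(individual):
    """Stand-in fitness: in the ant example this would be the
    distance covered; here simply the sum of the gene values."""
    return sum(individual)

def recombine(parent_a, parent_b):
    """Uniform crossover: each gene is taken from either parent
    with equal probability -- a mix of both parents' properties."""
    return [random.choice(genes) for genes in zip(parent_a, parent_b)]
```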

In addition, a new, random movement feature is always tried out with a certain probability (the mutation rate). This corresponds to mutation in evolution. At the end of each generation, the fitness function is evaluated again. Over many generations, the robot ants thus become more and more successful at moving.
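A mutation step matching this description might look as follows; replacing a gene with a fresh random value is one simple mutation operator among many:

```python
import random

def mutate(individual, mutation_rate=0.1):
    """With probability `mutation_rate`, replace a gene with a new
    random value -- the analogue of a random mutation in evolution."""
    return [random.random() if random.random() < mutation_rate else g
            for g in individual]
```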

Reinforcement learning has the advantage that learning also takes changing environmental factors into account. If the terrain changes every now and then, for example because it rains and the ground becomes muddy, the evolution of movement techniques will take this into account. That is why life on earth was able to continue despite ice ages and dry periods: it adapted to the new environmental influences.

The three most important principles in genetic algorithms are recombination, mutation, and selection. In biology, recombination stands for the random mixing of 50 percent of each parent's genetic make-up during sexual reproduction and its transmission to the child. In genetic algorithms, recombination is the mixing of properties of the parent generation when they are passed on to the child generation. Mutations, in genetic algorithms, are random changes in the properties of single individuals.

Selection means that individuals with better genes have a higher chance of living long and producing many offspring. Selection is driven by external pressure: predators in nature, food shortages, epidemics, climatic challenges, and so on. In genetic algorithms, selection usually takes place through a mathematical evaluation function: the so-called fitness function. This function awards points (a score) that measure how well the goals are achieved. Alternatively, the function can calculate costs, with the goal of minimizing them. Costs can be very different things here: distances, monetary costs, fuel consumption, failure probability of components, etc.

How the algorithm works is shown in Figure 4.

How the genetic algorithm works (Image: Miroslav Stimac)

The execution of the algorithm ends either when the target is reached or a target score, calculated with the mathematical evaluation function, is exceeded, or after a number of generations specified by the user.
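Putting the pieces together, a minimal loop with both termination conditions could look like this. The gene encoding, the sum-of-genes fitness, tournament selection, and uniform crossover are all simplifying assumptions for the sketch:

```python
import random

def evolve(pop_size=30, num_genes=5, target=4.5,
           max_generations=200, mutation_rate=0.05):
    """Minimal genetic algorithm loop. Stops either when the best
    score reaches the target or after max_generations generations."""
    fitness = sum  # stand-in evaluation function
    population = [[random.random() for _ in range(num_genes)]
                  for _ in range(pop_size)]
    for generation in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) >= target:          # target score reached
            return best, generation
        next_gen = []
        for _ in range(pop_size):
            # selection: two tournaments pick the parents
            a = max(random.sample(population, 2), key=fitness)
            b = max(random.sample(population, 2), key=fitness)
            # recombination: uniform crossover
            child = [random.choice(g) for g in zip(a, b)]
            # mutation: occasionally inject a new random gene
            child = [random.random() if random.random() < mutation_rate
                     else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness), max_generations
```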

Due to the way it works, a genetic algorithm has two important properties. First, it does not guarantee an optimal result, but usually only an improvement from generation to generation. In exceptional cases a subsequent generation can even be worse than its predecessor; this often happens when the mutation rate is too high.
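A common safeguard against such regressions, not specific to this article's example, is elitism: the best individual of the current generation is carried over unchanged, so the top fitness value can never drop. A sketch, assuming a list-of-lists population and a given fitness function:

```python
def next_generation_with_elitism(population, offspring, fitness):
    """Replace the weakest offspring with the best parent, so the
    best fitness never decreases between generations (elitism)."""
    elite = max(population, key=fitness)
    worst = min(range(len(offspring)), key=lambda i: fitness(offspring[i]))
    survivors = list(offspring)
    survivors[worst] = elite
    return survivors
```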

In addition, the end result is neither the only possible one nor always repeatable. If you run the algorithm a second time with the same initial population, the result is often different from the first time, because recombination and mutation are influenced by chance. A thought experiment: suppose one could clone the earth of 5 billion years ago and let evolution take place on both earths. If Charles Darwin's theory of evolution really describes the development of life on earth correctly and completely, it would be very likely that on the second earth not humans, but another, perhaps more or less intelligent form of life would arise. It is therefore often advisable to repeat the reinforcement learning with the genetic algorithm a few times and compare the results.
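Repeating the evolution and comparing the results can be wrapped in a small helper; `run_once` stands for any complete run of the genetic algorithm that returns its best individual:

```python
import random

def best_of_runs(run_once, num_runs=5, fitness=sum):
    """Because recombination and mutation are random, repeat the
    whole evolution several times and keep the best overall result."""
    results = []
    for seed in range(num_runs):
        random.seed(seed)  # different randomness for each run
        results.append(run_once())
    return max(results, key=fitness)
```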
