Using a genetic algorithm to express a number as a multiple and sum of others
14 min read

Using a genetic algorithm to express a number as a multiple and sum of others

Preface

Genetic algorithms are algorithms used for optimizing parameters. I won't dwell more on explaining what they are (perhaps in another article), I'll focus on implementing one (while still explaining what's happening). However, if you have no idea what genetic algorithms are, I invite you to check out this video and then keep reading!

Intro

So, what do I mean by expressing a number as a multiple and a sum of others? Well, for instance, 14 can be written as :

14 = 1*7+7*1
14 = 1*5+6+3
14 = 2*5+2*2

Or even :

14 = 5+1*5*1+3+6-5*1

You get the idea. Two things are noteworthy: First of all, the order of operation follows the standard order (2*5+2*2 is the same as 10+4). Secondly, even though I only talked about multiple and sums, a subtraction made its way to this article. However, this is not a problem because subtractions are just additions with one (or more) negative numbers.
Now the real question: What is the practical use of writing a number this way? I have no idea, but it's fun, so let's get into it!

The idea behind the algorithm

Each individual will be stored as a str type representing one or multiple operations. For example, 1*5+6+3 could be an individual and stored as
individual = "1*5+6+3". Then the fitness evaluation function will simply evaluate each individual with python's eval function and calculate the distance to the target number (14 in this case). For example, an individual represented by 2+5*8-9 (=33)would get a fitness score of 19 (because the distance from 14 to 33 is 19). We will try to minimize the fitness score to optimize our individuals.

Code

If you're simply interested in the code then you can just download it down below or scroll to the bottom of the page and copy-paste it. However, if you follow through, I'll dissect the code and split it into multiple parts to make it easier to understand. So let's get right into it:

Setup and creating our population

import random

TARGET = 1234 #CHANGE THIS TO WHATEVER

class FindNumber:

    def __init__(self, target):
        self.target = target
        self.mutation_probability = 0.05
        self.individual_length = 7 #This represents the numbers in an individual, not total characters
        self.population_size = 1000
        self.keep_population = 0.2
        
        
    def __initialize_population(self):
        self.population = list()
        components = ['123456789', '+-*']
        
        for _ in range(self.population_size):
            individual = str()
            for _ in range(self.individual_length-1):
                individual = individual + random.choice(components[0])
                individual = individual + random.choice(components[1])
            individual = individual + random.choice(components[0])
            
            self.population.append(individual)

This may seem a bit daunting but hang in there!
We'll use the random library to generate random individuals, and randomly select those individuals to merge and mutate. Next up, the variable TARGET will be the number we'll try to express, this can be pretty much any number (as long as it's small enough to be expressed with a certain number of characters, for instance with only two digits we can't go higher than 81 = 9*9, with three digits we can't go higher than 729 = 9*9*9, etc...)
Our whole algorithm will revolve around a class named FindNumber, which will use different methods on a population to try and optimize it. This class only takes the target number as an argument, there are other class variables but those have to be changed manually. I'll quickly go over each variable :

  • mutation_probability represents the probability of an individual mutating each iteration. 0.05 is the same as 5%
  • individual_length represents the number of digits in an individual, for example, an individual with two digits could be something like 8-3 and an individual with four digits could be 9*5+9*2
  • population_size means how many individuals will there be in the population
  • keep_population represents what proportion of the population will be kept for the next generation. 0.2 is the same as 20%

The private method __initialize_population() creates a class variable called population containing each individual in a list. The length of the population and each individual depends on the values chosen for the variable above.

Evaluating the fitness and sorting the population

    def __fitness(self):
        self.population_fitness = list()
        
        for individual in self.population:
            value = eval(individual)
            distance = abs(self.target-value)
            
            self.population_fitness.append(distance)
            
            
    def __sort_by_fitness(self):
        self.__fitness()
        ordered_population = zip(self.population_fitness, self.population)
        ordered_population = sorted(ordered_population)
        
        self.population = list()
        for individual in ordered_population:
            self.population.append(individual[1])

Still inside the FindNumber class, we create the __fitness() method that calculates the fitness of each individual and stores it in a list in the same order as the population. I'll give an example to explain it better:
say your target number is 23 and you have a population of three individuals stored like this: population = ["7+6*9", "1*1-5", "2+1-2"], then if we calculate the fitness of each individual we'd have respectively 38, 27, and 22. The __fitness() method would therefore create this list: population_fitness = [38, 27, 22].


The __sort_by_fitness() method's objective is to sort the population from lowest fitness (which is better) to highest fitness (which is worse). It starts by calling the previously defined __fitness() and then creates a list regrouping each individual with its fitness score. If we use the same example as before it would look like this: ordered_population = [(38, "7+6*9"), (27, "1*1-5), (22, "2+1-2")]. The fitness score needs to be before the individual because the sorted() function intrinsic to python will only look at the first element of each element in a list (read that twice!), which means that we can sort our population by fitness score. Once this list is ordered we extract each individual and re-create a population list, but this time in order. This means that still using our example list, the population would now be: population = ["2+1-2", "1*1-5", "7+6*9"].

Reproduction and mutation

    def __mutate(self, individual):
        components = ['123456789', '+-*']
        index_mutate = random.randint(0, len(individual)-1)
        
        if individual[index_mutate] in components[0]:
            new_character = random.choice(components[0])
        else:
            new_character = random.choice(components[1])
            
        new_individual = individual[:index_mutate] + new_character + individual[index_mutate+1:]
        return new_individual
        
    def __merge(self, ind1, ind2):
        half = len(ind1)//2
        return ind1[:half] + ind2[half:]
        
    def __new_population(self):
        keep_population = round(self.population_size*self.keep_population)
        self.population = self.population[:keep_population]
        
        while len(self.population) < self.population_size:
            ind1 = random.choice(self.population[:keep_population])
            ind2 = random.choice(self.population[:keep_population])
            individual = self.__merge(ind1, ind2)
            
            self.population.append(individual)

The __mutate() method takes an individual as argument and returns that individual with one changed digit or operation, for example, if we passed 5+2+4*8 through this method, we might get 2+2+4*8 or 5+2*4*8 or many other possibilities.
The __merge() method takes two individuals as arguments and returns an individual made up of halves of each parent. For example, taking "5+2+4*8" and "3*8-1+5" as arguments, we will get 5+2-1+5 or 3*8+4*8 depending on the order of the arguments.
Finally, we create a method called __new_population() that uses all methods created previously to make a new population out of the best individuals, the code is pretty straightforward here.

Finalizing our class

    def start(self):
        self.__initialize_population()
        self.__sort_by_fitness()
        
        average_fitness = sum(self.population_fitness)/self.population_size
        best_fitness = abs(self.target - eval(self.population[0]))
        
        return average_fitness, best_fitness
        
    def epoch(self):
        self.__new_population()
        if self.mutation_probability:
            for i, individual in enumerate(self.population):
                if random.random()<self.mutation_probability:
                    self.population[i] = self.__mutate(individual)
        self.__sort_by_fitness()
        
        average_fitness = sum(self.population_fitness)/self.population_size
        best_fitness = abs(self.target - eval(self.population[0]))
        
        return average_fitness, best_fitness

To make all of this work, we create 2 final methods (public this time) : start() and epoch.
start() creates the population and sorts it, it then calculates the average population fitness and the best fitness out of all the individuals and returns those values.
epoch() replaces the current population with the next generation and just like start(), it sorts it and returns the average and best fitness.

And before we run the code, this is what the final file looks like (notice I added a main function at the end):

import random

TARGET = 1234 #CHANGE THIS TO WHATEVER

class FindNumber:

    def __init__(self, target):
        self.target = target
        self.mutation_probability = 0.05
        self.individual_length = 7 #This represents the numbers in an individual, not total characters
        self.population_size = 1000
        self.keep_population = 0.2
        
        
    def __initialize_population(self):
        self.population = list()
        components = ['123456789', '+-*']
        
        for _ in range(self.population_size):
            individual = str()
            for _ in range(self.individual_length-1):
                individual = individual + random.choice(components[0])
                individual = individual + random.choice(components[1])
            individual = individual + random.choice(components[0])
            
            self.population.append(individual)
            
            
    def __fitness(self):
        self.population_fitness = list()
        
        for individual in self.population:
            value = eval(individual)
            distance = abs(self.target-value)
            
            self.population_fitness.append(distance)
            
            
    def __sort_by_fitness(self):
        self.__fitness()
        ordered_population = zip(self.population_fitness, self.population)
        ordered_population = sorted(ordered_population)
        
        self.population = list()
        for individual in ordered_population:
            self.population.append(individual[1])
            
    def __mutate(self, individual):
        components = ['123456789', '+-*']
        index_mutate = random.randint(0, len(individual)-1)
        
        if individual[index_mutate] in components[0]:
            new_character = random.choice(components[0])
        else:
            new_character = random.choice(components[1])
            
        new_individual = individual[:index_mutate] + new_character + individual[index_mutate+1:]
        return new_individual
        
    def __merge(self, ind1, ind2):
        half = len(ind1)//2
        return ind1[:half] + ind2[half:]
        
    def __new_population(self):
        keep_population = round(self.population_size*self.keep_population)
        self.population = self.population[:keep_population]
        
        while len(self.population) < self.population_size:
            ind1 = random.choice(self.population[:keep_population])
            ind2 = random.choice(self.population[:keep_population])
            individual = self.__merge(ind1, ind2)
            
            self.population.append(individual)
            
        
    def start(self):
        self.__initialize_population()
        self.__sort_by_fitness()
        
        average_fitness = sum(self.population_fitness)/self.population_size
        best_fitness = abs(self.target - eval(self.population[0]))
        
        return average_fitness, best_fitness
        
    def epoch(self):
        self.__new_population()
        if self.mutation_probability:
            for i, individual in enumerate(self.population):
                if random.random()<self.mutation_probability:
                    self.population[i] = self.__mutate(individual)
        self.__sort_by_fitness()
        
        average_fitness = sum(self.population_fitness)/self.population_size
        best_fitness = abs(self.target - eval(self.population[0]))
        
        return average_fitness, best_fitness 



def main(target):
    method = FindNumber(target)
    avg, best = method.start()
    print(f"#gen: 0 #average: {avg} #best: {best}")
    
    for i in range(100):
        avg, best = method.epoch()
        print(f"#gen: {i+1} #average: {avg} #best: {best}")
        if best==0:
            break
    
    print(f"{method.population[0]} = {eval(method.population[0])}")
            
if __name__ == '__main__':
    main(TARGET)
    input()

Running the code

I'm going to show what the results are for different individual lengths, going from longest to shortest :
Scroll down in the output to see how it expressed 1234. Also note that the result is random and if you run the program another time, it will probably find another way to express it.

  • individual length = 7
    Output : 6*6*4*9-8*7-6 = 1234
    #gen: 0 #average: 1283.497 #best: 45
    #gen: 1 #average: 1624.319 #best: 33
    #gen: 2 #average: 5443.837 #best: 9
    #gen: 3 #average: 11078.058 #best: 4
    #gen: 4 #average: 7854.75 #best: 4
    #gen: 5 #average: 9967.631 #best: 2
    #gen: 6 #average: 6061.968 #best: 2
    #gen: 7 #average: 10449.052 #best: 2
    #gen: 8 #average: 5800.774 #best: 2
    #gen: 9 #average: 6804.307 #best: 2
    #gen: 10 #average: 6410.493 #best: 2
    #gen: 11 #average: 7261.284 #best: 2
    #gen: 12 #average: 8808.024 #best: 2
    #gen: 13 #average: 3182.644 #best: 2
    #gen: 14 #average: 5649.5 #best: 2
    #gen: 15 #average: 5097.521 #best: 2
    #gen: 16 #average: 5472.465 #best: 2
    #gen: 17 #average: 370.246 #best: 2
    #gen: 18 #average: 336.828 #best: 0
    6*6*4*9-8*7-6 = 1234
  • individual length = 6
    Output : 9+5*7*1*5*7 = 1234
    #gen: 0 #average: 1303.262 #best: 11
    #gen: 1 #average: 1555.56 #best: 11
    #gen: 2 #average: 3538.95 #best: 3
    #gen: 3 #average: 2367.064 #best: 3
    #gen: 4 #average: 1944.92 #best: 3
    #gen: 5 #average: 1359.517 #best: 3
    #gen: 6 #average: 1172.492 #best: 3
    #gen: 7 #average: 965.504 #best: 3
    #gen: 8 #average: 932.787 #best: 3
    #gen: 9 #average: 651.214 #best: 3
    #gen: 10 #average: 807.595 #best: 3
    #gen: 11 #average: 1030.949 #best: 3
    #gen: 12 #average: 1221.126 #best: 1
    #gen: 13 #average: 103.367 #best: 0
    9+5*7*1*5*7 = 1234
  • individual length = 5
    Output : 7*5*5*7+9 = 1234
    #gen: 0 #average: 1203.101 #best: 145
    #gen: 1 #average: 1204.109 #best: 5
    #gen: 2 #average: 1392.504 #best: 5
    #gen: 3 #average: 1017.989 #best: 5
    #gen: 4 #average: 860.794 #best: 5
    #gen: 5 #average: 725.785 #best: 5
    #gen: 6 #average: 670.127 #best: 5
    #gen: 7 #average: 628.187 #best: 5
    #gen: 8 #average: 405.353 #best: 5
    #gen: 9 #average: 318.836 #best: 5
    #gen: 10 #average: 188.848 #best: 1
    #gen: 11 #average: 30.342 #best: 0
    7*5*5*7+9 = 1234
  • individual length = 4
    Output : 5*7*5*7 = 1225
    #gen: 0 #average: 1204.587 #best: 26
    #gen: 1 #average: 1062.097 #best: 9
    #gen: 2 #average: 737.667 #best: 9
    #gen: 3 #average: 394.063 #best: 9
    #gen: 4 #average: 286.486 #best: 9
    #gen: 5 #average: 207.212 #best: 9
    #gen: 6 #average: 98.09 #best: 9
    #gen: 7 #average: 81.662 #best: 9
    #gen: 8 #average: 76.739 #best: 9
    #gen: 9 #average: 64.103 #best: 9
    #gen: 10 #average: 56.901 #best: 9
    #gen: 11 #average: 72.082 #best: 9
    #gen: 12 #average: 63.886 #best: 9
    #gen: 13 #average: 71.025 #best: 9
    #gen: 14 #average: 56.678 #best: 9
    #gen: 15 #average: 71.073 #best: 9
    #gen: 16 #average: 64.129 #best: 9
    #gen: 17 #average: 77.108 #best: 9
    #gen: 18 #average: 79.855 #best: 9
    #gen: 19 #average: 103.666 #best: 9
    #gen: 20 #average: 117.171 #best: 9
    #gen: 21 #average: 134.503 #best: 9
    #gen: 22 #average: 156.947 #best: 9
    #gen: 23 #average: 36.257 #best: 9
    #gen: 24 #average: 38.516 #best: 9
    #gen: 25 #average: 49.014 #best: 9
    #gen: 26 #average: 36.746 #best: 9
    #gen: 27 #average: 39.686 #best: 9
    #gen: 28 #average: 37.657 #best: 9
    #gen: 29 #average: 41.802 #best: 9
    #gen: 30 #average: 32.015 #best: 9
    #gen: 31 #average: 38.399 #best: 9
    #gen: 32 #average: 35.048 #best: 9
    #gen: 33 #average: 40.86 #best: 9
    #gen: 34 #average: 32.855 #best: 9
    #gen: 35 #average: 32.892 #best: 9
    #gen: 36 #average: 35.576 #best: 9
    #gen: 37 #average: 41.061 #best: 9
    #gen: 38 #average: 39.714 #best: 9
    #gen: 39 #average: 48.719 #best: 9
    #gen: 40 #average: 38.182 #best: 9
    #gen: 41 #average: 42.599 #best: 9
    #gen: 42 #average: 40.68 #best: 9
    #gen: 43 #average: 38.966 #best: 9
    #gen: 44 #average: 39.667 #best: 9
    #gen: 45 #average: 47.506 #best: 9
    #gen: 46 #average: 40.718 #best: 9
    #gen: 47 #average: 38.979 #best: 9
    #gen: 48 #average: 46.27 #best: 9
    #gen: 49 #average: 32.961 #best: 9
    #gen: 50 #average: 41.688 #best: 9
    #gen: 51 #average: 36.572 #best: 9
    #gen: 52 #average: 46.256 #best: 9
    #gen: 53 #average: 44.951 #best: 9
    #gen: 54 #average: 42.005 #best: 9
    #gen: 55 #average: 43.246 #best: 9
    #gen: 56 #average: 26.846 #best: 9
    #gen: 57 #average: 34.736 #best: 9
    #gen: 58 #average: 43.15 #best: 9
    #gen: 59 #average: 36.646 #best: 9
    #gen: 60 #average: 43.788 #best: 9
    #gen: 61 #average: 33.342 #best: 9
    #gen: 62 #average: 46.356 #best: 9
    #gen: 63 #average: 42.165 #best: 9
    #gen: 64 #average: 45.391 #best: 9
    #gen: 65 #average: 41.978 #best: 9
    #gen: 66 #average: 40.736 #best: 9
    #gen: 67 #average: 52.285 #best: 9
    #gen: 68 #average: 37.754 #best: 9
    #gen: 69 #average: 35.377 #best: 9
    #gen: 70 #average: 36.597 #best: 9
    #gen: 71 #average: 41.606 #best: 9
    #gen: 72 #average: 35.752 #best: 9
    #gen: 73 #average: 31.369 #best: 9
    #gen: 74 #average: 41.644 #best: 9
    #gen: 75 #average: 42.229 #best: 9
    #gen: 76 #average: 28.423 #best: 9
    #gen: 77 #average: 40.902 #best: 9
    #gen: 78 #average: 37.607 #best: 9
    #gen: 79 #average: 43.581 #best: 9
    #gen: 80 #average: 26.735 #best: 9
    #gen: 81 #average: 34.304 #best: 9
    #gen: 82 #average: 43.786 #best: 9
    #gen: 83 #average: 41.965 #best: 9
    #gen: 84 #average: 31.633 #best: 9
    #gen: 85 #average: 42.292 #best: 9
    #gen: 86 #average: 30.567 #best: 9
    #gen: 87 #average: 29.47 #best: 9
    #gen: 88 #average: 41.595 #best: 9
    #gen: 89 #average: 44.918 #best: 9
    #gen: 90 #average: 40.125 #best: 9
    #gen: 91 #average: 36.224 #best: 9
    #gen: 92 #average: 33.657 #best: 9
    #gen: 93 #average: 36.966 #best: 9
    #gen: 94 #average: 46.021 #best: 9
    #gen: 95 #average: 33.586 #best: 9
    #gen: 96 #average: 43.349 #best: 9
    #gen: 97 #average: 38.567 #best: 9
    #gen: 98 #average: 39.415 #best: 9
    #gen: 99 #average: 50.545 #best: 9
    #gen: 100 #average: 36.911 #best: 9
    5*7*5*7 = 1225
    
  • individual length = 3
    Output : 9*9*9 = 729
    #gen: 0 #average: 1211.794 #best: 667
    #gen: 1 #average: 1111.828 #best: 586
    #gen: 2 #average: 894.121 #best: 505
    #gen: 3 #average: 726.419 #best: 505
    #gen: 4 #average: 610.22 #best: 505
    #gen: 5 #average: 548.688 #best: 505
    #gen: 6 #average: 522.487 #best: 505
    #gen: 7 #average: 526.195 #best: 505
    #gen: 8 #average: 522.154 #best: 505
    #gen: 9 #average: 520.039 #best: 505
    #gen: 10 #average: 529.939 #best: 505
    #gen: 11 #average: 518.14 #best: 505
    #gen: 12 #average: 524.305 #best: 505
    #gen: 13 #average: 524.953 #best: 505
    #gen: 14 #average: 522.775 #best: 505
    #gen: 15 #average: 522.109 #best: 505
    #gen: 16 #average: 516.412 #best: 505
    #gen: 17 #average: 520.336 #best: 505
    #gen: 18 #average: 518.419 #best: 505
    #gen: 19 #average: 527.095 #best: 505
    #gen: 20 #average: 522.487 #best: 505
    #gen: 21 #average: 517.933 #best: 505
    #gen: 22 #average: 517.249 #best: 505
    #gen: 23 #average: 525.862 #best: 505
    #gen: 24 #average: 520.021 #best: 505
    #gen: 25 #average: 523.405 #best: 505
    #gen: 26 #average: 524.161 #best: 505
    #gen: 27 #average: 523.279 #best: 505
    #gen: 28 #average: 526.438 #best: 505
    #gen: 29 #average: 522.064 #best: 505
    #gen: 30 #average: 525.502 #best: 505
    #gen: 31 #average: 522.289 #best: 505
    #gen: 32 #average: 520.057 #best: 505
    #gen: 33 #average: 527.095 #best: 505
    #gen: 34 #average: 524.35 #best: 505
    #gen: 35 #average: 520.444 #best: 505
    #gen: 36 #average: 529.021 #best: 505
    #gen: 37 #average: 530.56 #best: 505
    #gen: 38 #average: 525.79 #best: 505
    #gen: 39 #average: 527.671 #best: 505
    #gen: 40 #average: 530.812 #best: 505
    #gen: 41 #average: 530.272 #best: 505
    #gen: 42 #average: 525.322 #best: 505
    #gen: 43 #average: 516.799 #best: 505
    #gen: 44 #average: 524.422 #best: 505
    #gen: 45 #average: 520.003 #best: 505
    #gen: 46 #average: 527.797 #best: 505
    #gen: 47 #average: 520.354 #best: 505
    #gen: 48 #average: 520.633 #best: 505
    #gen: 49 #average: 525.682 #best: 505
    #gen: 50 #average: 524.701 #best: 505
    #gen: 51 #average: 525.88 #best: 505
    #gen: 52 #average: 527.383 #best: 505
    #gen: 53 #average: 525.34 #best: 505
    #gen: 54 #average: 525.664 #best: 505
    #gen: 55 #average: 525.457 #best: 505
    #gen: 56 #average: 525.43 #best: 505
    #gen: 57 #average: 520.21 #best: 505
    #gen: 58 #average: 522.163 #best: 505
    #gen: 59 #average: 529.309 #best: 505
    #gen: 60 #average: 526.708 #best: 505
    #gen: 61 #average: 524.035 #best: 505
    #gen: 62 #average: 525.943 #best: 505
    #gen: 63 #average: 527.797 #best: 505
    #gen: 64 #average: 526.663 #best: 505
    #gen: 65 #average: 525.115 #best: 505
    #gen: 66 #average: 518.401 #best: 505
    #gen: 67 #average: 521.083 #best: 505
    #gen: 68 #average: 530.839 #best: 505
    #gen: 69 #average: 521.452 #best: 505
    #gen: 70 #average: 529.858 #best: 505
    #gen: 71 #average: 524.494 #best: 505
    #gen: 72 #average: 526.546 #best: 505
    #gen: 73 #average: 524.521 #best: 505
    #gen: 74 #average: 528.625 #best: 505
    #gen: 75 #average: 525.34 #best: 505
    #gen: 76 #average: 526.843 #best: 505
    #gen: 77 #average: 521.173 #best: 505
    #gen: 78 #average: 519.643 #best: 505
    #gen: 79 #average: 530.776 #best: 505
    #gen: 80 #average: 522.649 #best: 505
    #gen: 81 #average: 525.52 #best: 505
    #gen: 82 #average: 525.853 #best: 505
    #gen: 83 #average: 526.339 #best: 505
    #gen: 84 #average: 517.672 #best: 505
    #gen: 85 #average: 519.517 #best: 505
    #gen: 86 #average: 522.901 #best: 505
    #gen: 87 #average: 521.92 #best: 505
    #gen: 88 #average: 522.289 #best: 505
    #gen: 89 #average: 520.606 #best: 505
    #gen: 90 #average: 525.196 #best: 505
    #gen: 91 #average: 527.023 #best: 505
    #gen: 92 #average: 521.011 #best: 505
    #gen: 93 #average: 523.153 #best: 505
    #gen: 94 #average: 521.479 #best: 505
    #gen: 95 #average: 528.625 #best: 505
    #gen: 96 #average: 524.512 #best: 505
    #gen: 97 #average: 520.984 #best: 505
    #gen: 98 #average: 518.257 #best: 505
    #gen: 99 #average: 521.641 #best: 505
    #gen: 100 #average: 526.663 #best: 505
    9*9*9 = 729

We can observe, that for the last two test runs (lengths of 3 and 4) the algorithm is unable to find a solution and just tries to minimize the fitness score. We could give it a lower number to see it succeed at a lower individual length.

If you reached the end, thanks a lot for reading and you can subscribe to my newsletter, it's free and keeps me motivated!