Selecting the optimal parameters for machine learning tasks is challenging. Some results may be bad not because the data is noisy or because the learning algorithm is weak, but because of a poor selection of parameter values. This article provides a brief introduction to Evolutionary Algorithms (EAs) and describes the Genetic Algorithm (GA), one of the simplest random-based EAs.
Suppose a data scientist has an image dataset divided into several classes and needs to build an image classifier. After investigating the dataset, K-Nearest Neighbors (KNN) seems to be a good fit. The KNN algorithm has one important parameter, K. Suppose an initial value of 3 is selected and the learning process is started with K=3. The trained model achieves a classification accuracy of 85%. Is this percentage acceptable? If not, can we achieve better classification accuracy than we currently have? We cannot say that 85% is the best accuracy achievable without running different experiments, and to run another experiment we definitely need to change something, such as the K value used in the KNN algorithm. We cannot definitively say that 3 is the best value to use unless we try different values of K and observe how the classification accuracy varies. The question is "how do we find the value of K that maximizes the classification performance?" This is what is called optimization.
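As a concrete illustration of this kind of parameter search, here is a minimal sketch that tries several candidate values of K and records the validation accuracy of each. The dataset, the split, and the candidate K values are all assumptions chosen only to show the idea, not the setup described above.

```python
# Hypothetical sketch: searching for a good K for a KNN classifier.
# The dataset, the split, and the candidate K values are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # stand-in image dataset
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9, 11]:  # candidate values of K
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)  # classification accuracy on the validation split
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"Best K = {best_k} with validation accuracy {best_acc:.2%}")
```

This brute-force search works when there is a single small parameter, but it quickly becomes impractical as the number of parameters and their ranges grow, which is where optimization algorithms such as GA come in.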
In optimization, we start with some initial values for the variables used in the experiment. Because these values may not be the best to use, we change them until we find the best ones. In some cases, these values are generated by complex functions that we cannot easily solve manually. But optimization is very important, because a classifier can produce poor classification accuracy not because, for example, the data is noisy or the learning algorithm is weak, but because of a bad selection of the initial parameter values. As a result, Operations Research (OR) researchers have suggested different optimization techniques to do this work. According to [1], optimization techniques are classified into four main categories:
- Constrained optimization
- Multimodal optimization
- Multi-objective optimization
- Combinatorial optimization
By looking at various natural species, we can see how they evolve and adapt to their environments. We can take advantage of such existing natural systems and their natural evolution to create artificial systems that do the same job. This is called bionics. For example, the airplane is inspired by how birds fly, radar is inspired by bats, the submarine is inspired by fish, and so on. As a result, the principles of some optimization algorithms come from nature. For example, the Genetic Algorithm (GA) takes its central idea from Charles Darwin's theory of natural evolution, "survival of the fittest."
Before going into the details of how GA works, let us get a general idea of Evolutionary Algorithms (EAs).
Optimization can be carried out by evolutionary algorithms (EAs). The difference between traditional algorithms and EAs is that EAs are not static but dynamic, as they can evolve over time.
Evolutionary algorithms have three main characteristics:
- Population based: Evolutionary algorithms optimize a process by maintaining a set of current solutions from which new, better solutions are generated. That set of current solutions is called the population.
- Fitness oriented: If there are several solutions, how can we say that one solution is better than another? Each individual solution has a fitness value calculated by a fitness function; this value reflects how good the solution is.
- Variation driven: If there is no acceptable solution in the current population according to the fitness of each individual, something must be done to generate new, better solutions. The individual solutions therefore undergo a series of variations to generate new solutions.
We will now move on to GA and apply these terms.
The genetic algorithm is a classical evolutionary algorithm with a random basis. By random here we mean that, to find a solution using GA, random changes are applied to current solutions to generate new ones. Please note that GA may be called Simple GA (SGA) due to its simplicity compared to other EAs.
GA is based on Darwin's theory of evolution. Evolution is a slow, gradual process that works by making slight changes over time. Similarly, GA makes small, gradual changes to its solutions until it reaches the best solution.
Here is the description of how GA works:
GA works with a population consisting of some solutions, where the population size (popsize) is the number of solutions. Each solution is called an individual. Each individual solution has a chromosome. The chromosome is represented as a set of parameters (features) that define the individual. Each chromosome has a set of genes. Each gene is represented in some way, for example as a string of 0's and 1's, as shown in Figure 1.
Furthermore, each individual has a fitness value. To select the best individuals, a fitness function is used. The result of the fitness function is the fitness value, which represents the quality of the solution. The higher the fitness value, the higher the quality of the solution. Selection of the best individuals based on their quality is applied to generate what is called a mating pool, where higher-quality individuals are more likely to be selected.
The individuals in the mating pool are called parents. Every two parents selected from the mating pool will produce two offspring (children). By mating only high-quality individuals, we expect the offspring to be of better quality than their parents. This excludes bad individuals from generating more bad individuals. By continuing to select and mate high-quality individuals, there is a better chance of keeping only the good properties of the individuals and leaving out the bad ones. Ultimately, this leads to the desired optimal or acceptable solution.
But the offspring generated from the selected parents so far have only the traits of their parents and nothing else. Nothing new is added, so the same drawbacks present in the parents will also exist in the offspring. To overcome this problem, some changes are applied to each offspring to create new individuals. The set of all newly generated individuals becomes the new population, which replaces the previously used old population. Each population created is called a generation. The process of replacing the old population with the new one is called replacement. Figure 2 summarizes the GA steps.
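To make the flow summarized in Figure 2 concrete, here is a minimal, hypothetical sketch of the GA loop in Python. The population size, chromosome length, number of generations, and operator choices are assumptions made only for illustration; the individual steps (evaluation, selection, crossover, mutation, replacement) are explained in detail in the rest of the article.

```python
import random

# Minimal GA skeleton following the steps summarized in Figure 2.
# Population size, chromosome length, and generation count are arbitrary choices.
POP_SIZE, CHROM_LEN, GENERATIONS = 8, 4, 20

def fitness(chromosome):
    # Placeholder fitness: decode the binary chromosome and apply f(x) = 2x + 2,
    # the same fitness function used later in the article.
    x = int("".join(map(str, chromosome)), 2)
    return 2 * x + 2

def random_individual():
    return [random.randint(0, 1) for _ in range(CHROM_LEN)]

population = [random_individual() for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    # Evaluation: compute the fitness value of every individual.
    ranked = sorted(population, key=fitness, reverse=True)
    # Selection: the best half of the population forms the mating pool.
    mating_pool = ranked[: POP_SIZE // 2]
    # Variation: crossover and mutation produce the offspring.
    offspring = []
    while len(offspring) < POP_SIZE:
        p1, p2 = random.sample(mating_pool, 2)
        point = random.randint(1, CHROM_LEN - 1)   # single-point crossover
        child = p1[:point] + p2[point:]
        gene = random.randrange(CHROM_LEN)          # bit-flip mutation of one gene
        child[gene] = 1 - child[gene]
        offspring.append(child)
    population = offspring                          # replacement

best = max(population, key=fitness)
print("Best individual:", best, "fitness:", fitness(best))
```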
There are two questions that need to be answered to get a complete idea about GA:
- How are the two offspring generated from the two parents?
- How is each offspring slightly changed to become a new individual?
We will answer these questions later.
There are different representations available for the chromosome, and selecting the appropriate representation is problem specific. A good representation is one that makes the search space smaller and therefore easier to search.
Available chromosome representations include:
- Binary: Each chromosome is represented as a string of 0's and 1's.
- Permutation: Useful for ordering problems such as the traveling salesman problem.
- Value: The actual value is encoded as is.
For example, if we want to encode the number 7 in binary, it might look like Figure 3.
Each part of the chromosome above is called a gene. Each gene has two properties. The first is its value (allele) and the second is the location (locus) within the chromosome, which is the number above its value.
Each chromosome has two representations.
- Genotype: The set of genes that represent the chromosome.
- Phenotype: The actual physical representation of the chromosome.
In the example above, the binary string 0111 is the genotype and 7 is the phenotype.
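As a small illustration of the two representations, the snippet below converts between the genotype (the bit string) and the phenotype (the decoded number). It is only a sketch of the idea; the 4-bit encoding is an assumption taken from the example above.

```python
# Genotype: the bit string 0111; phenotype: the decoded integer 7.
genotype = [0, 1, 1, 1]

# Decode genotype -> phenotype.
phenotype = int("".join(str(bit) for bit in genotype), 2)
print(phenotype)  # 7

# Encode phenotype -> genotype (4-bit binary string).
encoded = [int(bit) for bit in format(phenotype, "04b")]
print(encoded)    # [0, 1, 1, 1]
```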
After representing each chromosome in a way that serves the search space well, the next step is to calculate the fitness value of each individual. Suppose the fitness function used in our example is:
f(x)=2x+2
where x is the chromosome value
Then the fitness value of the previous chromosome is:
f(7)=2(7)+2=16
The process of calculating the fitness value of a chromosome is called evaluation.
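Using the same fitness function f(x) = 2x + 2, a minimal sketch of the evaluation step might look like the following. The small example population is an assumption chosen only for illustration.

```python
# Hypothetical example population of 4-bit chromosomes (genotypes).
population = [[0, 1, 1, 1],   # x = 7
              [1, 0, 0, 1],   # x = 9
              [0, 0, 1, 0]]   # x = 2

def decode(chromosome):
    """Convert the bit-string genotype into its integer phenotype."""
    return int("".join(str(bit) for bit in chromosome), 2)

def fitness(chromosome):
    """Fitness function f(x) = 2x + 2 applied to the decoded value."""
    return 2 * decode(chromosome) + 2

for chrom in population:
    print(chrom, "-> x =", decode(chrom), ", fitness =", fitness(chrom))
# e.g. [0, 1, 1, 1] -> x = 7 , fitness = 16
```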
Once you understand how to represent each individual, the next step is to initialize the population by creating the appropriate number of individuals within it.
Then a number of individuals are selected from the population into the mating pool. Based on the previously calculated fitness values, the best individuals are selected according to a threshold. After this step, we have selected a subset of the population as the mating pool.
Based on the individuals in the mating pool, parents are selected for mating. Each pair of parents can be selected sequentially (1–2, 3–4, and so on), or the parents can be selected randomly.
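A minimal sketch of these two pairing schemes is shown below; the mating pool itself is a hypothetical list of individuals that have already been selected by fitness.

```python
import random

# Hypothetical mating pool: individuals already selected based on fitness.
mating_pool = ["ind1", "ind2", "ind3", "ind4"]

# Sequential pairing: (1-2), (3-4), ...
sequential_pairs = [(mating_pool[i], mating_pool[i + 1])
                    for i in range(0, len(mating_pool) - 1, 2)]

# Random pairing: each pair is drawn at random from the pool.
random_pairs = [tuple(random.sample(mating_pool, 2))
                for _ in range(len(mating_pool) // 2)]

print(sequential_pairs)
print(random_pairs)
```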
For every two parents selected, there are several variation operators to apply, such as:
- Crossover (recombination)
- Mutation
Figure 4 gives an example for these operators.
Crossover in GA generates new offspring in the same way natural mating does. By mating the parents of the previous generation, the offspring of the new generation carry genes from both parents. How many genes each offspring takes from each parent is random; remember that GA is a random-based EA. Sometimes the offspring take half of their genes from one parent and half from the other, and sometimes that ratio changes. For every two parents, crossover takes place by selecting a random point on the chromosome and exchanging the genes before and after that point between the two parents. The resulting chromosomes are the offspring. Because a single point is used, this operator is called single-point crossover.
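A minimal sketch of single-point crossover on two bit-string parents might look like this; the parents and the randomly chosen crossover point are assumptions for illustration.

```python
import random

def single_point_crossover(parent1, parent2):
    """Exchange the genes after a randomly chosen point between two parents."""
    point = random.randint(1, len(parent1) - 1)   # crossover point
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

parent1 = [0, 1, 1, 1]
parent2 = [1, 0, 0, 1]
offspring1, offspring2 = single_point_crossover(parent1, parent2)
print(offspring1, offspring2)
```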
Keep in mind that crossover is important; without it, each offspring would be identical to its parent.
The next variation operator is mutation. For each offspring, a few genes are selected and their values are changed. How the mutation is applied varies with the chromosome representation, and it is up to you to decide how to apply it. If the encoding is binary (that is, the value space of each gene has only the two values 0 and 1), flip the bit value of one or more genes.
But if the gene value comes from a space of more than two values, such as 1, 2, 3, 4, and 5, the bit-flip mutation is not applicable and we must find another way. One way is to select a random value from that set of values, as shown in Figure 5.
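The sketch below shows both kinds of mutation just described: flipping a bit when genes are binary, and picking a new random value when a gene's value comes from a larger set. The example offspring and the value set are hypothetical.

```python
import random

def binary_mutation(offspring, num_genes=1):
    """Flip the bit value of a few randomly chosen genes."""
    mutant = list(offspring)
    for gene in random.sample(range(len(mutant)), num_genes):
        mutant[gene] = 1 - mutant[gene]
    return mutant

def value_mutation(offspring, values=(1, 2, 3, 4, 5), num_genes=1):
    """Replace a few randomly chosen genes with random values from a given set."""
    mutant = list(offspring)
    for gene in random.sample(range(len(mutant)), num_genes):
        mutant[gene] = random.choice(values)
    return mutant

print(binary_mutation([0, 1, 1, 1]))   # e.g. [0, 0, 1, 1]
print(value_mutation([2, 5, 1, 4]))    # e.g. [2, 5, 3, 4]
```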
Note that without mutation, the offspring would have all the properties of its parents. To add new features, mutation is applied. But because mutation occurs randomly, it is not recommended to mutate a large number of genes.
The individual after the mutation is called a mutant.
[1] Eiben, Agoston E., and James E. Smith. Introduction to Evolutionary Computing. Vol. 53. Heidelberg: Springer, 2003.
The original article is available on LinkedIn: https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad
It is also shared on KDnuggets: https://www.kdnuggets.com/2018/03/introduction-optimization-with-genetic-algorithm.html
To contact the author:
LinkedIn: https://linkedin.com/in/ahmedfgad
Email: ahmed.f.gad@gmail.com