The aim of this project is to develop a complete methodology for training artificial neural networks (ANNs) using genetic algorithms (GAs) and evolutionary algorithms (EAs) as alternatives to the classical backpropagation approach. These optimization methods, inspired by natural evolution, iteratively improve a population of candidate solutions in order to find the best possible one.
In this project, we will explore the use of GAs and EAs to optimize ANNs in various ways, including the evolution of connection weights, architectures, hyperparameters, activation functions, and learning rules. While these techniques have several advantages over traditional derivative-based optimization methods, they can be computationally expensive, and the error landscape of an ANN can be complex and noisy.
To evaluate the performance of each method, we will work with synthetic data, which allows us to control the sample size, the amount of noise, and the difficulty of the problem. Specifically, we will create a classification problem and compare the results obtained with each method.
These are the libraries we will use throughout the project:
library(GA)
## Loading required package: foreach
## Loading required package: iterators
## Package 'GA' version 3.2.3
## Type 'citation("GA")' for citing this R package in publications.
##
## Attaching package: 'GA'
## The following object is masked from 'package:utils':
##
## de
library(cmaesr)
## Loading required package: ParamHelpers
## Loading required package: BBmisc
##
## Attaching package: 'BBmisc'
## The following object is masked from 'package:base':
##
## isFALSE
## Loading required package: checkmate
## Loading required package: smoof
library(nnet)
library(ggplot2)
library(mlbench)
library(tictoc)
library(caret)
## Loading required package: lattice
We will also set the seed:
set.seed(1)
In the following code snippet, we show how we can create synthetic data with mlbench.simplex(). These are the parameters we use:
Size: n (number of samples)
Difficulty: d (dimensionality of the simplex; the data has d features and d+1 classes)
Noise: sd (standard deviation of the Gaussian noise around each class centre)
data <- mlbench.simplex(n=1000,d=2, sd=0.25) #moderate noise
plot(data)
data <- mlbench.simplex(n=1000,d=2, sd=0.5) #high noise: the classes overlap considerably
plot(data)
data <- mlbench.simplex(n=1000,d=2, sd=0.1) #low noise: the classes are well separated
plot(data)
The following function receives a dataframe and the proportions used to split the data into training, validation and test sets. We will use 60%, 20% and 20% respectively.
split_data <- function(df, perc_train, perc_val, perc_test) {
#compute sample size
ssTrain <- floor(perc_train*nrow(df))
ssVal <- floor(perc_val*nrow(df))
ssTest <- floor(perc_test*nrow(df))
#create random indices separation
idTrain <- sort(sample(seq_len(nrow(df)), size=ssTrain))
idnotTrain <- setdiff(seq_len(nrow(df)), idTrain)
idVal <- sort(sample(idnotTrain, size=ssVal))
idTest <- setdiff(idnotTrain, idVal)
splittedData <- list("train" = df[idTrain,], "validation" = df[idVal,], "test" = df[idTest,])
return (splittedData)
}
#USAGE
data <- mlbench.simplex(n=1000,d=2, sd=0.25)
data <- as.data.frame(data)
splittedData <- split_data(data, 0.6, 0.2, 0.2)
train_data <- splittedData$train
validation_data <- splittedData$validation
test_data <- splittedData$test
summary(train_data)
## x.1 x.2 classes
## Min. :-1.190644 Min. :-1.01688 1:148
## 1st Qu.:-0.388814 1st Qu.:-0.37066 2:154
## Median : 0.021266 Median :-0.12154 3:148
## Mean : 0.006736 Mean :-0.01402
## 3rd Qu.: 0.393153 3rd Qu.: 0.39190
## Max. : 1.071177 Max. : 1.19497
summary(validation_data)
## x.1 x.2 classes
## Min. :-1.11032 Min. :-0.90915 1:49
## 1st Qu.:-0.38421 1st Qu.:-0.38130 2:45
## Median :-0.04684 Median :-0.02484 3:56
## Mean :-0.02996 Mean : 0.05155
## 3rd Qu.: 0.31637 3rd Qu.: 0.52201
## Max. : 0.99123 Max. : 1.09855
summary(test_data)
## x.1 x.2 classes
## Min. :-1.0663598 Min. :-0.93273 1:53
## 1st Qu.:-0.3300275 1st Qu.:-0.41094 2:51
## Median : 0.0131126 Median :-0.20340 3:46
## Mean : 0.0007551 Mean :-0.04815
## 3rd Qu.: 0.3745331 3rd Qu.: 0.39461
## Max. : 0.8543368 Max. : 1.03875
This is an implementation of the classical backpropagation technique to train artificial neural networks. The function returns a list with the trained model, the model's accuracy on the training, validation and test sets, the confusion matrix and the training time.
#Back propagation
train_BP <- function(dt, dimensions, n_units, max_it) {
#one hot encoder
dummy <- dummyVars(" ~ .", data=dt)
dt <- data.frame(predict(dummy, newdata=dt))
#split data
splittedData <- split_data(dt, 0.6, 0.2, 0.2)
train_dt <- splittedData$train
val_dt <- splittedData$validation
test_dt <- splittedData$test
target <- dimensions+1
input <- dimensions
tic('train') #init timer
#train nn with backpropagation technique
nn <- nnet(train_dt[,1:input], train_dt[,target:ncol(train_dt)], size = n_units, maxit = max_it, softmax=T, trace=F)
toc(log = T, quiet=T) #stop timer
log.lst <- tic.log(format = FALSE)
tic.clearlog()
#model evaluation
evaluate.CM <- function(true, pred) {
true <- max.col(true)
pred <- max.col(pred)
table(true, pred)
}
pred<-predict(nn, test_dt[,1:input])
confusionMat <- evaluate.CM(test_dt[,target:ncol(test_dt)], pred)
test_accuracy <- mean(max.col(test_dt[,target:ncol(test_dt)])==max.col(pred))
pred<-predict(nn, train_dt[,1:input])
train_accuracy <- mean(max.col(train_dt[,target:ncol(train_dt)])==max.col(pred))
pred<-predict(nn, val_dt[,1:input])
val_accuracy <- mean(max.col(val_dt[,target:ncol(val_dt)])==max.col(pred))
results <- list('nn' = nn, 'trainAccuracy' = train_accuracy, 'valAccuracy' = val_accuracy, 'testAccuracy' = test_accuracy, 'cm' = confusionMat, 'time' = unlist(lapply(log.lst, function(x) x$toc - x$tic))[1])
return(results)
}
#USAGE
dim <- 3
data <- mlbench.simplex(n=1000, d=dim, sd=0.3)
plot(data)
dt <- as.data.frame(data)
#results <- train_BP(dt, dim, 50, 1000)
Accuracy of the algorithm for each data portion:
Confusion matrix:
Time:
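Once the call above is uncommented, the returned list can be inspected directly. A minimal sketch (the element names are the ones built inside train_BP):
#inspect the results returned by train_BP
results$trainAccuracy #accuracy on the training split
results$valAccuracy #accuracy on the validation split
results$testAccuracy #accuracy on the test split
results$cm #confusion matrix on the test split
results$time #training time in seconds, measured with tictoc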
This is an implementation of the GA technique to train artificial neural networks. Here’s an overview of how genetic algorithms work:
Initialize the population: The population of solutions, also known as chromosomes, is typically initialized randomly, with each solution representing a potential solution to the optimization problem.
Evaluate the fitness of each solution: We compute the fitness function of each of the solutions. Our goal is to optimize such fitness function.
Select the fittest solutions: The fittest solutions are selected from the population using a selection function. This function typically assigns a higher selection probability to solutions with higher fitness values.
Breed new solutions: New solutions are created by combining the selected solutions using genetic operators such as crossover (combining two solutions to create a new solution) and mutation (randomly changing the values of a solution).
Repeat the process: The process is repeated for a number of iterations until a satisfactory solution is found.
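To make these steps concrete, the following is a minimal, self-contained sketch of the GA package maximizing a toy one-dimensional function; the toy function and its bounds are illustrative choices only, not part of our training pipeline:
#toy fitness function with its maximum at x = 2
toy_fitness <- function(x) -(x - 2)^2
toy_ga <- ga(type = "real-valued", fitness = toy_fitness, lower = -10, upper = 10, popSize = 20, maxiter = 50, monitor = F)
toy_ga@solution #best x found, close to 2
toy_ga@fitnessValue #best fitness found, close to 0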
For the implementation of the genetic algorithm, we use the ga() function provided by the R package "GA" [1]. In particular, we use this algorithm to maximize a custom fitness function. This function generates an MLP with the weights and the number of hidden units encoded by the chromosome and returns the accuracy of this network on the training data. To keep the chromosome length fixed, we set a maximum number of hidden units for the neural network architecture: the first gene in the chain encodes the number of hidden units, the next N genes are used as the weights of the neural network, and the remaining genes are discarded. Naturally, the number N of parameters depends on the number of hidden units chosen in each evaluation [3].
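As a concrete illustration of this encoding (using the same parameter-count formula as the fitness function below), consider a network with 2 input features, 3 output classes and 10 hidden units:
#input-to-hidden weights + hidden-to-output weights + hidden biases + output biases
num_param <- 2 * 10 + 10 * 3 + 10 + 3 #63 parameters
In this case gene 1 encodes the number of hidden units, genes 2 to 64 are used as weights and biases, and the remaining genes of the chromosome are ignored.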
As before, the following function returns a list with the trained model, the model’s accuracy on the training set, the model’s accuracy on the validation set, the model’s accuracy on the test set, the confusion matrix and the training time.
train_GA <- function(dt, dimensions, max_it) {
#one hot encoder
dummy <- dummyVars(" ~ .", data=dt)
dt <- data.frame(predict(dummy, newdata=dt))
#split data
splittedData <- split_data(dt, 0.6, 0.2, 0.2)
train_dt <- splittedData$train
val_dt <- splittedData$validation
test_dt <- splittedData$test
target <- dimensions+1
input <- dimensions
n_features <- ncol(train_dt[,1:input])
n_classes <- ncol(train_dt[,target:ncol(train_dt)])
#fitness function based on train accuracy
fitness <- function(x) {
n_hidden <- floor(x[1]) # Number of neurons
num_param <- n_features * n_hidden + n_hidden * n_classes + n_hidden + n_classes
Wts <- x[(2:(num_param+1))] # Weights
nn <- nnet(train_dt[,1:input], train_dt[,target:ncol(train_dt)], size = n_hidden, Wts = Wts, maxit = 0, trace=F, softmax=T)
pred<-predict(nn, train_dt[,1:input])
accuracy <- mean(max.col(train_dt[,target:ncol(train_dt)])==max.col(pred))
return(accuracy)
}
# Bounds for number of neurons
neuron_bounds = c(2,50)
max_param <- n_features * neuron_bounds[2] + neuron_bounds[2] * n_classes + neuron_bounds[2] + n_classes
# Bounds for weights
bounds <- matrix(nrow = max_param, ncol=2)
for(row in 1:max_param){
bounds[row,] <- c(-100,100)
}
tic('train') #start timer
#train nn with GA technique
gann <- ga(type = "real-valued", fitness = fitness, maxFitness = 1, lower = c(neuron_bounds[1], bounds[, 1]), upper = c(neuron_bounds[2], bounds[, 2]), popSize = 50, maxiter = max_it, run = 100, monitor=F)
toc(log = T, quiet=T) #stop timer
log.lst <- tic.log(format = FALSE)
tic.clearlog()
n_units <- floor(head(gann@solution, 1)[1]) # Number of neurons
num_param <- n_features * n_units + n_units * n_classes + n_units + n_classes # Number of weights
W <- head(gann@solution, 1)[(2:(num_param+1))] # Weights
nn <- nnet(train_dt[,1:input], train_dt[,target:ncol(train_dt)], size = n_units, Wts = W, maxit = 0, trace=F, softmax=T) #Result NN
#Model evaluation
evaluate.CM <- function(true, pred) {
true <- max.col(true)
pred <- max.col(pred)
table(true, pred)
}
pred<-predict(nn, test_dt[,1:input])
confusionMat <- evaluate.CM(test_dt[,target:ncol(test_dt)], pred)
test_accuracy <- mean(max.col(test_dt[,target:ncol(test_dt)])==max.col(pred))
pred<-predict(nn, train_dt[,1:input])
train_accuracy <- mean(max.col(train_dt[,target:ncol(train_dt)])==max.col(pred))
pred<-predict(nn, val_dt[,1:input])
val_accuracy <- mean(max.col(val_dt[,target:ncol(val_dt)])==max.col(pred))
results <- list('ga' = gann, 'nn' = nn, 'trainAccuracy' = train_accuracy, 'valAccuracy' = val_accuracy, 'testAccuracy' = test_accuracy, 'cm' = confusionMat, 'time' = unlist(lapply(log.lst, function(x) x$toc - x$tic))[1])
return(results)
}
#USAGE
dim <- 2
data <- mlbench.simplex(n=1000, d=dim, sd=0.3)
plot(data)
dt <- as.data.frame(data)
#results <- train_GA(dt, dim, 1000)
#plot(results$ga)
Accuracy of the algorithm for each data portion:
Confusion matrix:
Time:
Summary:
One key difference between GAs and EAs is the way in which they represent candidate solutions. In GAs, candidate solutions are typically represented as strings of symbols, called chromosomes, which encode the potential solutions to a problem. These chromosomes can be modified through processes such as crossover (exchange of genetic material between two chromosomes) and mutation (random modification of a chromosome). In EAs, candidate solutions can be represented in a variety of ways, including as numerical values, vectors, or even complex objects such as neural networks.
Another difference between GAs and EAs is the way in which they explore the space of possible solutions. GAs typically use a fixed set of operations, such as crossover and mutation, to generate new candidate solutions. EAs, on the other hand, can use a wider range of techniques, such as mutation, recombination and a variety of selection schemes, to generate and improve candidate solutions.
This is an implementation of the EA technique to train artificial neural networks. In this case, we use the "cmaesr" library [2]. The implementation in this section is very similar to the genetic algorithm described in the previous section. The main difference is that the Covariance Matrix Adaptation Evolution Strategy minimizes the objective function [4]; for this reason, the objective function here returns the error (1 - accuracy) of the neural network instead of its accuracy. As before, the function returns a list with the trained model, the model's accuracy on the training, validation and test sets, the confusion matrix and the training time.
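Before looking at train_EA itself, here is a minimal sketch of the cmaesr workflow it relies on; the toy objective (a 2-dimensional sphere function), its bounds and the iteration limit are illustrative choices only:
#toy objective: minimize the sum of squares in two dimensions
obj.fn <- makeSingleObjectiveFunction(
name = "sphere",
fn = function(x) sum(x^2),
par.set = makeNumericParamSet("x", len = 2, lower = -5, upper = 5)
)
res <- cmaes(obj.fn, monitor = NULL, control = list(stop.ons = c(list(stopOnMaxIters(50)), getDefaultStoppingConditions())))
res$best.param #best solution found, close to the origin
res$best.fitness #best objective value found, close to 0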
train_EA <- function(dt, dimensions, max_it) {
dummy <- dummyVars(" ~ .", data=dt)
dt <- data.frame(predict(dummy, newdata=dt))
#split data
splittedData <- split_data(dt, 0.6, 0.2, 0.2)
train_dt <- splittedData$train
val_dt <- splittedData$validation
test_dt <- splittedData$test
target <- dimensions+1
input <- dimensions
n_features <- ncol(train_dt[,1:input])
n_classes <- ncol(train_dt[,target:ncol(train_dt)])
# fitness function based on error
fitness <- function(x) {
n_hidden <- floor(x[1]) # Number of neurons
num_param <- n_features * n_hidden + n_hidden * n_classes + n_hidden + n_classes
Wts <- x[(2:(num_param+1))] # Weights
nn <- nnet(train_dt[,1:input], train_dt[,target:ncol(train_dt)], size = n_hidden, Wts = Wts, maxit = 0, trace=F, softmax=T)
pred<-predict(nn, train_dt[,1:input])
accuracy <- mean(max.col(train_dt[,target:ncol(train_dt)])==max.col(pred))
return(1-accuracy)
}
# Bounds for number of neurons
neuron_bounds = c(2,50)
max_param <- n_features * neuron_bounds[2] + neuron_bounds[2] * n_classes + neuron_bounds[2] + n_classes
# Bounds for weights
bounds <- matrix(nrow = max_param, ncol=2)
for(row in 1:max_param){
bounds[row,] <- c(-100,100)
}
# custom objective function
obj.fn = makeSingleObjectiveFunction(
name = "fitness",
fn = fitness,
par.set = makeNumericParamSet("x", len = max_param+1, lower = c(neuron_bounds[1], bounds[, 1]), upper = c(neuron_bounds[2], bounds[, 2]))
)
tic('train') #start timer
#train nn with ea technique
ea = cmaes(
obj.fn,
monitor = NULL,
control = list(
sigma = 1.5, # initial step size
lambda = 50, # number of offspring
stop.ons = c(
list(stopOnMaxIters(max_it),stopOnOptValue(0, tol = 1e-08)), # stop after max_it iterations or once the training error reaches 0
getDefaultStoppingConditions() # or after default stopping conditions
)
)
)
toc(log = T, quiet=T)# stop timer
log.lst <- tic.log(format = FALSE)
tic.clearlog()
n_units <- floor(ea$best.param[1]) # Number of neurons
num_param <- n_features * n_units + n_units * n_classes + n_units + n_classes
W <- ea$best.param[(2:(num_param+1))] # Weights
nn <- nnet(train_dt[,1:input], train_dt[,target:ncol(train_dt)], size = n_units, Wts = W, maxit = 0, trace=F, softmax=T) # result nn
# Model evaluation
evaluate.CM <- function(true, pred) {
true <- max.col(true)
pred <- max.col(pred)
table(true, pred)
}
pred<-predict(nn, test_dt[,1:input])
confusionMat <- evaluate.CM(test_dt[,target:ncol(test_dt)], pred)
test_accuracy <- mean(max.col(test_dt[,target:ncol(test_dt)])==max.col(pred))
pred<-predict(nn, train_dt[,1:input])
train_accuracy <- mean(max.col(train_dt[,target:ncol(train_dt)])==max.col(pred))
pred<-predict(nn, val_dt[,1:input])
val_accuracy <- mean(max.col(val_dt[,target:ncol(val_dt)])==max.col(pred))
results <- list('ea' = ea, 'nn' = nn, 'trainAccuracy' = train_accuracy, 'valAccuracy' = val_accuracy, 'testAccuracy' = test_accuracy, 'cm' = confusionMat, 'time' = unlist(lapply(log.lst, function(x) x$toc - x$tic))[1])
return(results)
}
#USAGE
dim <- 2
data <- mlbench.simplex(n=1000, d=dim, sd=0.3)
dt <- as.data.frame(data)
#results <- train_EA(dt, dim, 1000)
Accuracy of the algorithm for each data portion:
Confusion matrix:
Time:
Summary:
In this section, we compare the three techniques we have used to train ANNs. We base our conclusions on three measures: the computational time, the accuracy obtained, and the size of the architecture, i.e. the number of hidden units, which serves as a rough proxy for model complexity. With these, we will try to determine which technique is preferable and how their performance changes with the data they are given.
First, though, let us look at how the size of the data set affects these measures.
In the following cell, we create 15 different data sets, each with a different size. We save the training time, the three accuracies and, for GA and EA, the number of hidden neurons. Afterwards, we plot these quantities against the number of samples in each data set.
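The data-collection loop can be sketched as follows, mirroring the standard-deviation experiment shown later; the exact sequence of 15 sample sizes is an assumption for illustration:
dim <- 2
sizes <- seq(from = 200, to = 3000, by = 200) #15 assumed data set sizes
time_BP <- c()
acc_BP <- c()
time_GA <- c()
acc_GA <- c()
n_hidden_GA <- c()
time_EA <- c()
acc_EA <- c()
n_hidden_EA <- c()
for(n_samples in sizes){
data <- mlbench.simplex(n=n_samples, d=dim, sd=0.3)
dt <- as.data.frame(data)
# BP data
results_BP <- train_BP(dt, dim, 15, 1000)
time_BP <- c(time_BP, results_BP$time)
acc_BP <- rbind(acc_BP, c(results_BP$trainAccuracy, results_BP$valAccuracy, results_BP$testAccuracy))
# GA data
results_GA <- train_GA(dt, dim, 1000)
time_GA <- c(time_GA, results_GA$time)
acc_GA <- rbind(acc_GA, c(results_GA$trainAccuracy, results_GA$valAccuracy, results_GA$testAccuracy))
n_hidden_GA <- c(n_hidden_GA, floor(results_GA$ga@solution[1]))
# EA data
results_EA <- train_EA(dt, dim, 500)
time_EA <- c(time_EA, results_EA$time)
acc_EA <- rbind(acc_EA, c(results_EA$trainAccuracy, results_EA$valAccuracy, results_EA$testAccuracy))
n_hidden_EA <- c(n_hidden_EA, floor(results_EA$ea$best.param[1]))
}
The time, accuracy and hidden-unit plots are then produced in the same way as in the standard-deviation experiment below, with the number of samples on the x-axis.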
Surprisingly, while the first (BP) plot exhibits a roughly linear trend, the other plots are noisier, with fluctuating values. In both of the latter cases, however, the tendency is for the times to increase as the number of samples grows, which is expected. In the GA plot, the times start at around 20 seconds and, in the worst cases, reach almost ten times that amount. In the EA plot, the algorithm takes even longer, starting at 60 seconds and reaching almost 250 seconds in some cases.
Regardless of this, the time needed to train the ANN is considerably lower with BP, where even the biggest data set is processed in less than a second. Time is therefore a clear drawback of both GAs and EAs.
One of the main reasons that these techniques can take a lot of time is because they involve a large number of iterations, each of which requires computation and evaluation. Depending on the size and complexity of the problem being solved, the algorithm may need to evaluate thousands or even millions of possible solutions in order to find the optimal one. Additionally, these algorithms may need to run for a long time in order to explore the entire solution space and find the global optimal solution, rather than getting stuck in a local minimum.
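For instance, with the settings used above, a single GA run may require up to popSize x maxiter = 50 x 1000 = 50,000 fitness evaluations, and a single CMA-ES run up to lambda x max_it = 50 x 500 = 25,000, each of which builds an nnet with the candidate weights and computes predictions on the whole training split.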
As we can see in the previous plots, the accuracy we obtain does not seem to be significantly affected by the size of the data, regardless of which split of the data we refer to. The last plot exhibits more variability than the other two, with some values reaching as low as 0.5, but we observe no clear trend and such fluctuations appear to be random.
Another remark we can extract from the plots is that the Train, Validation and Test lines almost overlap. This is a good indicator: it means that no overfitting is taking place, i.e. the ANNs trained with the three techniques generalize well to unseen data.
We have just analysed how the time, the accuracy and the optimal number of hidden units are affected by the size of the data set. We can now ask how the same variables are affected by changing the standard deviation. We expect the accuracy to drop significantly, since as we increase the sd parameter the classes become more mixed and much harder to separate.
dim <- 2
sequence <- seq(from = 0.1, to = 1, by = 0.1)
time_BP <- c()
acc_BP <- c()
time_GA <- c()
acc_GA <- c()
n_hidden_GA <- c()
time_EA <- c()
acc_EA <- c()
n_hidden_EA <- c()
for(st_dv in sequence){
data <- mlbench.simplex(n=1000, d=dim, sd=st_dv)
dt <- as.data.frame(data)
# BP data
results_BP <- train_BP(dt, dim, 15, 1000)
time_BP <- c(time_BP, results_BP$time)
acc_BP <- rbind(acc_BP, c(results_BP$trainAccuracy, results_BP$valAccuracy, results_BP$testAccuracy))
# GA data
results_GA <- train_GA(dt, dim, 1000)
time_GA <- c(time_GA, results_GA$time)
acc_GA <- rbind(acc_GA, c(results_GA$trainAccuracy, results_GA$valAccuracy, results_GA$testAccuracy))
n_hidden_GA <- c(n_hidden_GA, floor(results_GA$ga@solution[1]))
# EA data
results_EA <- train_EA(dt, dim, 500)
time_EA <- c(time_EA, results_EA$time)
acc_EA <- rbind(acc_EA, c(results_EA$trainAccuracy, results_EA$valAccuracy, results_EA$testAccuracy))
n_hidden_EA <- c(n_hidden_EA, floor(results_EA$ea$best.param[1]))
}
# BP
plot(sequence, time_BP, type = 'l', ylab = "BP Time", col = 'red', xlab = "Standard deviation", lwd = 2)
title("Computational time in terms of standard deviation - BP")
# GA
plot(sequence, time_GA, type = 'l', ylab = "GA Time", col = 'red', xlab = "Standard deviation", lwd = 2)
title("Computational time in terms of standard deviation - GA")
# EA
plot(sequence, time_EA, type = 'l', xlab = "Standard deviation", col = 'red', lwd = 2, ylab = "EA Time")
title("Computational time in terms of standard deviation - EA")
As we saw in the previous section, the BP technique consistently requires less time to run than the GA and EA techniques. However, when the standard deviation of the data is high, we observe a subtle upward trend in running time, especially for GA and EA. This is because the high standard deviation makes it harder to label the samples accurately, which pushes these two approaches towards more hidden units; as a result, more time is needed for the additional weights that have to be evaluated.
# BP
plot(sequence, acc_BP[,1], type = 'l', ylab = "BP Accuracy", ylim = c(0.3,1), xlab = "Standard deviation", lwd = 2)
lines(sequence, acc_BP[,2], type = 'l', col = 'green', lwd = 2)
lines(sequence, acc_BP[,3], type = 'l', col = 'red', lwd = 2)
legend("topright", c("Train", "Validation", "Test"),
lty = 1, col = c("black", "green", "red"), lwd = 2)
title("Accuracy in terms of standard deviation - BP")
# GA
plot(sequence, acc_GA[,1], type = 'l', ylab = "GA Accuracy", ylim = c(0.3,1), xlab = "Standard deviation", lwd = 2)
lines(sequence, acc_GA[,2], type = 'l', col = 'green', lwd = 2)
lines(sequence, acc_GA[,3], type = 'l', col = 'red', lwd = 2)
legend("topright", c("Train", "Validation", "Test"),
lty = 1, col = c("black", "green", "red"), lwd = 2)
title("Accuracy in terms of standard deviation - GA")
# EA
plot(sequence, acc_EA[,1], type = 'l', ylab = "EA Accuracy", ylim = c(0,1), xlab = "Standard deviation", lwd = 2)
lines(sequence, acc_EA[,2], type = 'l', col = 'green', lwd = 2)
lines(sequence, acc_EA[,3], type = 'l', col = 'red', lwd = 2)
legend("topright", c("Train", "Validation", "Test"),
lty = 1, col = c("black", "green", "red"), lwd = 2)
title("Accuracy in terms of standard deviation - EA")
As expected, the accuracy significantly deteriorates as the sd parameter increases. This is likely because the data becomes more challenging to classify. In most cases, we see a decrease from a perfect score to around 0.5 as the sd parameter increases. However, the EA case exhibits a lower initial accuracy, with a relatively poor performance compared to the other cases.
The Nemenyi test is a non-parametric post-hoc procedure used to compare multiple methods over several runs and determine whether there are significant differences between them. It operates on the mean ranks of the methods rather than on their raw scores. In the critical-difference plots we produce below, the methods appear on the x-axis and their mean ranks on the y-axis, and each method is drawn with a vertical interval whose total length equals the critical difference (CD). If the intervals of two methods do not overlap, i.e. the difference between their mean ranks exceeds the CD, then those methods are significantly different.
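The critical difference used in the plotting function below follows Demšar [5]:
$$ CD = q_{\alpha}\sqrt{\frac{k(k+1)}{6N}} $$
where $k$ is the number of methods being compared (here $k = 3$, for which $q_{\alpha} \approx 2.343$ at $\alpha = 0.05$) and $N$ is the number of test runs per method.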
In this section, we create a standard data set and compute the three techniques to train our ANN. This is repeated 5 times, in which we save the time and the accuracy of the test split. With these values we will plot two graphs, one for the time and the other for the test accuracy, which will allow us to compare the three methods and see which is better in terms of which variable.
dim <- 2
data <- mlbench.simplex(n=1000, d=dim, sd=0.3)
dt <- as.data.frame(data)
time_BP <- c()
acc_BP <- c()
time_GA <- c()
acc_GA <- c()
time_EA <- c()
acc_EA <- c()
set.seed(NULL)
for(iter in 1:5){
# BP data
results_BP <- train_BP(dt, dim, 15, 1000)
time_BP <- c(time_BP, results_BP$time)
acc_BP <- c(acc_BP, results_BP$testAccuracy)
# GA data
results_GA <- train_GA(dt, dim, 1000)
time_GA <- c(time_GA, results_GA$time)
acc_GA <- c(acc_GA, results_GA$testAccuracy)
# EA data
results_EA <- train_EA(dt, dim, 500)
time_EA <- c(time_EA, results_EA$time)
acc_EA <- c(acc_EA, results_EA$testAccuracy)
}
df_time <- as.data.frame(NULL)
df_time <- rbind(df_time, time_BP)
df_time <- rbind(df_time, time_GA)
df_time <- rbind(df_time, time_EA)
df_time <- cbind(c("bp", "ga", "ea"), df_time)
colnames(df_time) <- c("id", "test1", "test2", "test3", "test4", "test5")
df_time
## id test1 test2 test3 test4 test5
## 1 bp 0.09 0.09 0.08 0.08 0.09
## 2 ga 22.61 24.75 21.50 18.51 26.19
## 3 ea 105.39 94.53 79.65 96.35 87.92
df_acc <- as.data.frame(NULL)
df_acc <- rbind(df_acc, acc_BP)
df_acc <- rbind(df_acc, acc_GA)
df_acc <- rbind(df_acc, acc_EA)
df_acc <- cbind(c("bp", "ga", "ea"), df_acc)
colnames(df_acc) <- c("id", "test1", "test2", "test3", "test4", "test5")
df_acc
## id test1 test2 test3 test4 test5
## 1 bp 0.9133333 0.9200000 0.9000000 0.9133333 0.9066667
## 2 ga 0.9400000 0.9466667 0.9400000 0.8800000 0.8933333
## 3 ea 0.8933333 0.9066667 0.6666667 0.7600000 0.8866667
The following function takes a dataframe like the ones we have just computed and visualizes the corresponding Nemenyi [5] critical-difference plot.
CDdiagram <- function(df, plot_title) {
id <- c("bp", "ga", "ea")
#compute ranks
for(i in 2:ncol(df)) { # for-loop over columns
df[ , i] <- rank(df[, i], ties.method = "average")
}
#compute mean
mean_rank <- rowMeans(df[2:ncol(df)])
#compute critical difference
k = nrow(df) #k different models to compare
N = ncol(df)-1 #N results for each model
qa = 2.343 #q alpha Nemenyi critical value for k=3 models at alpha=0.05
CD = qa*sqrt((k*(k+1))/(6*N))
#cd-diagram as a ggplot
data <- data.frame(
training_method=id,
rank=mean_rank,
sd=rep(CD, 3)
)
diagram <- ggplot() +
geom_errorbar(data=data, aes(x=training_method, ymin=rank-(sd/2), ymax=rank+(sd/2)), width=0.2, color='blue', size=0.8) +
geom_point(data=data, mapping=aes(x=training_method, y=rank), size=5, shape=21, fill="white") + ggtitle(plot_title)
return(diagram)
}
CDdiagram(df_time, "Time")
CDdiagram(df_acc, "Test accuracy")
As we can see in the previous plots, the BP method definitely comes first in terms of time, which is something we already observed in the previous sections. The fact that its mean rank is exactly 1 tells us that, in the 5 tests we ran, this method was always the fastest one. On the other hand, EA was always the most computationally expensive, with a mean rank of 3.
Since the BP and EA intervals do NOT overlap, we conclude that these two methods are significantly different in terms of time. The GA interval, however, overlaps both of the others, which tells us that GA is not significantly different from either of them.
Turning to the accuracy obtained, we see that EA scored the worst in all cases. Its interval overlaps with BP's but not with GA's, so EA and GA are significantly different accuracy-wise. On the other hand, BP and GA have almost the same performance, with GA being slightly better.
As a conclusion to this test, we see that, at least under the conditions and data we have worked with, EA does not give good results, since it is the slowest method and the one with the lowest test accuracy. GA is remarkably better in terms of accuracy but, given the time it requires, we assume that BP would in most cases be the preferable option.
In conclusion, our study showed that the performance of backpropagation, genetic algorithms and evolutionary algorithms can vary significantly in terms of accuracy, computational time and number of hidden units. While BP was consistently the fastest method, EA was generally the slowest. We also observed that GA and EA tended to require more hidden units as the standard deviation of the data increased, indicating that they may be more suitable for more complex problems. However, it is difficult to determine which algorithm is the best overall, as this ultimately depends on the nature of the problem at hand. In this particular case, GA proved to be a strong performer, especially given its ability to optimize hyperparameters such as the number of hidden units, while BP was the top choice when speed matters.
[1] Luca Scrucca. “Package ‘GA’ manual”. In: Cran R project (October 19, 2022). https://cran.r-project.org/web/packages/GA/GA.pdf (accessed: 22.12.2022).
[2] Jakob Bossek. “Package ‘cmaesr’ manual”. In: Cran R project (October 12, 2022). https://cran.r-project.org/web/packages/cmaesr/cmaesr.pdf (accessed: 22.12.2022).
[3] David J. Montana and Lawrence Davis. “Training Feedforward Neural Networks Using Genetic Algorithms”. In: Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI-89), 1989. https://www.ijcai.org/Proceedings/89-1/Papers/122.pdf (accessed: 26.12.2022).
[4] Alejandro Baldominos, Yago Saez, Pedro Isasi. “Evolutionary convolutional neural networks: An application to handwriting recognition”. In: Neurocomputing, 283, pp. 38-52 (2018). https://www.sciencedirect.com/science/article/abs/pii/S0925231217319112?via%3Dihub (accessed: 27.12.2022).
[5] Janez Demšar. “Statistical Comparisons of Classifiers over Multiple Data Sets”. In: Journal of Machine Learning Research 7, pp. 1-30 (2006). https://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf (accessed: 27.12.2022).