This is because, once a large amount of noisy data has been removed, the algorithm can better avoid premature convergence and explore the whole data space more effectively. Moreover, the obtained feature subset is far better than those produced by these five classical feature selection methods, both in size and in the performance improvement it brings to the classifier. Selection is adopted to keep the overall size of the population constant. It gives the algorithm the capability of local random search. This helps to reduce the complexity of the algorithm and keeps it from falling into local optima. In this paper, the encoding is a binary vector of length n: each bit corresponds to a feature, and n is the size of the feature space. In addition, different adaptive adjustment factors are introduced in the mutation and update phases of the algorithm. The results show that the feature reduction rate for all datasets is above 99%, and the performance improvement for the classifier is between 5% and 48.33%. In terms of the computational cost of the algorithm, the following results can be drawn from the data in Table 6. The computational cost on 80% of the datasets is below 70% of the lowest value of the other methods. Moreover, on 80% of the datasets, the number of features in the optimal feature subset obtained by the HFIA method is below 57% of the minimum value of the other methods. The purpose is to obtain a feature subset with fewer features while striving for higher classification accuracy. The following conclusions can be drawn intuitively from the figure, which plots the relationship between the Fisher score screening threshold and the classification accuracy on GLI-85. Finally, the principle of the immune clonal selection algorithm is introduced. These feature selection methods include five classical feature selection algorithms and 14 hybrid feature selection methods reported in the latest literature. On 28% of the datasets, it improves classification performance by more than 20%. For classification problems, however, too much redundant feature data causes a serious decline in classifier performance and can even lead to the curse of dimensionality. It has the characteristics of fast convergence speed and strong global optimization ability. According to the experimental analysis of the Fisher algorithm in Section 3.5, this paper sets its filter threshold to 200. Feature selection is widely used for dimension reduction in different fields, such as optimal gene screening in biomedicine [7], hot-topic recognition in text mining [8], and optimal pixel and color selection in image analysis [9].
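Because each candidate solution is simply such a binary vector, the encoding can be illustrated with a minimal Python sketch (the names random_antibody and select_features are illustrative and do not come from the paper):

import numpy as np

def random_antibody(n_features, rng):
    # One candidate solution: a binary vector in which bit i == 1 means feature i is selected.
    return (rng.random(n_features) < 0.5).astype(np.int8)

def select_features(X, antibody):
    # Keep only the columns whose bits are set in the antibody.
    return X[:, antibody.astype(bool)]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))          # toy data: 100 samples, 2000 features
antibody = random_antibody(X.shape[1], rng)
X_sub = select_features(X, antibody)
print(antibody.sum(), X_sub.shape)        # number of selected features and reduced shape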
This construction of the fitness function is also widely used in other literature [34-37] to evaluate the quality of feature subsets. The datasets whose optimal feature subsets contain the smallest proportion of features are Lymphoma and Pixraw10P. In this paper, the value of this parameter is set to 0.2. Hypermutation is an important mechanism through which the biological immune system recognizes external invasion. In order to verify the necessity of each functional module in the proposed model, ablation experiments are also performed. For feature selection, a variety of hybrid algorithms have been used to find the optimal feature subset of high-dimensional data. They have good global search ability, and there is no need to provide domain knowledge or prior assumptions about the search space. These two parameters determine the shape of the Cauchy distribution, and Figure 1 compares the probability density functions of the Cauchy and Gaussian distributions. In this section, the implementation details of the HFIA algorithm and several improvements to the clonal selection algorithm are described in detail. Secondly, the implementation details of the proposed method are described in Section 3. The filter method uses feature correlation criteria to select feature subsets at a lower computational cost. On 70% of the datasets, it is below 10% of the minimum of the other methods. When the change in the fitness value of the local optimal antibody meets the mutation conditions, the lethal mutation is triggered; Figure 2 describes the genetic changes when an antibody performs the lethal mutation operation. Its effect is to accelerate the decay rate of the genes of the elite antibodies. In addition, people always want better results from machine learning models. In addition, the probability distributions of many random variables take the normal distribution as their limit distribution under certain conditions. The algorithm evaluates and ranks all features in the data space by the Fisher scoring function in the first stage. On 30% of the datasets, HFIA outperforms the maximum values of the other methods by more than 10%. In this paper, a hybrid feature selection method (HFIA) combining a filter feature selection method with a multi-objective artificial immune algorithm is proposed. Next is PSO, which outperforms the CSO method on all datasets. For feature selection methods based on metaheuristic algorithms, a binary coding strategy is mostly used to represent the feature space.
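The exact fitness formula is not reproduced in the extracted text; the sketch below assumes the weighted-sum construction commonly used in wrapper-style feature selection, with the weight set to the 0.2 value mentioned for the paper's parameter (which term this parameter actually weights is an assumption):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Assumed weighting; the paper sets one such parameter to 0.2, but its exact role
# in the original fitness function is not recoverable from the text above.
ALPHA = 0.2

def fitness(antibody, X, y, alpha=ALPHA):
    # Assumed form: fitness = (1 - alpha) * accuracy + alpha * (1 - p / q),
    # where q is the total number of features and p the number selected.
    mask = antibody.astype(bool)
    if not mask.any():                      # an empty subset is useless
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=5).mean()
    p, q = mask.sum(), mask.size
    return (1 - alpha) * acc + alpha * (1 - p / q)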
Therefore, this paper adopts an artificial immune algorithm with good global search performance to perform a secondary search on the feature subset obtained after the initial screening. From the comparative analysis of the above two aspects, it can be concluded that HFIA achieves better classification accuracy than the entire feature space while using a very small number of features. On the other hand, a linear incremental regulator is added in the population update phase to improve the search performance of the CSA algorithm. These search technologies are mainly divided into three categories. The Gaussian distribution is a very important probability distribution in many fields such as mathematics, physics, and engineering. In terms of improving the performance of the classifier, the following results are obtained by observing and comparing the data in Table 7. The algorithm combines the Fisher filtering algorithm and an improved clonal selection algorithm to explore the search space of the optimal feature subset. On 80% of the datasets, HFIA outperforms the maximum classification accuracy obtained by the other methods by more than 5%. Also, all algorithms are run on the different datasets using the same settings. This paper conducts comparative experiments with five classical feature selection methods on 10 benchmark datasets. Firstly, the algorithm generates the real-valued initial population through the standard Cauchy distribution function. It should be noted that the results in Table 3 and Figure 5 are the average number of features and the average classification accuracy of the optimal feature subsets of each dataset after 20 repeated executions. The algorithm combines a filter algorithm with an improved clonal selection algorithm to explore the feature space of high-dimensional data. With the continuously deepening understanding of research objects and the development of data acquisition technology, high-dimensional data have become more and more common. Here, mu_i represents the mean of all data samples on the ith feature. Among the six feature selection methods involved in the experiment, the HFIA method proposed in this paper outperforms the other five methods on 87.5% of the datasets. They play the role of noise in the pattern recognition model and greatly increase the computational cost of the model. This fully shows that, compared with these five feature selection methods, the optimal feature subset selected by the HFIA method has a clear advantage in terms of redundancy. From the performance of the KNN algorithm in classification accuracy on all datasets, the fusion method of HFIA has a better performance advantage than any other individual method.
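A minimal sketch of such a Cauchy-based initialization is given below; the conversion from real-valued Cauchy draws to a binary antibody is only described qualitatively in the text, so the rule used here (a locus is set to 1 when the absolute draw exceeds the conversion threshold) is an assumption:

import numpy as np

def init_population(pop_size, n_features, threshold=0.5, rng=None):
    # Draw real-valued antibodies from the standard Cauchy distribution, then
    # binarize each locus with the conversion threshold (assumed rule).
    rng = rng or np.random.default_rng()
    real_pop = rng.standard_cauchy(size=(pop_size, n_features))
    return (np.abs(real_pop) > threshold).astype(np.int8)

pop = init_population(pop_size=30, n_features=2000, threshold=0.5,
                      rng=np.random.default_rng(42))
print(pop.shape, pop.mean())   # fraction of loci switched on across the population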
The datasets used in the experiments are available at https://github.com/primekangkang/Genedata. In the HFIA workflow, the filtering algorithm (Fisher score) is first applied to generate candidate feature subsets, and each gene locus is then transformed according to the threshold value. Its purpose is to dynamically adjust the number of antibody genes newly added to the population as the iterations progress. The hybrid scheme of multiple algorithms greatly increases the probability of finding the optimal solution efficiently and quickly. It is a commonly used validation technique and is widely used to evaluate the performance of machine learning models. The HFIA algorithm framework is shown in Figure 3. The acceleration factor is computed from t, the current number of iterations, and T, the total number of iterations. The five methods are as follows: CFS (statistical-based) [47], FCBF (information theoretical-based) [48], ReliefF (similarity-based) [49], SBMLR (sparsity-based) [50], and SPEC (graph theory-based) [51]. If the value of the scale parameter is larger, the peak of the probability density function is lower and its width is larger. The algorithm is used to solve the feature selection problem of high-dimensional biomedical datasets. For its specific implementation and application, please refer to the algorithm framework code in the next section. Based on the traditional evolutionary algorithm, it introduces the mechanisms of affinity maturation, cloning, and memory. The algorithm combines the Fisher filter algorithm and an improved artificial immune algorithm to optimize the search process of the optimal feature subset for high-dimensional data. On these datasets, the performance of the classifier is improved by more than 40%. This fully shows that the HFIA method has the best performance in eliminating redundant features compared with these five methods. It describes the properties of the acquired immunity of the biological immune system. Feature selection provides the optimal subset of features for data mining models. The authors of [15] proposed a hybrid feature selection method based on a genetic algorithm and embedded regularization. This is even more unrealistic for large-scale datasets. Metaheuristic algorithms are derived from heuristic algorithms and combine random search with local search. A metaheuristic algorithm is also called an intelligent optimization algorithm. Moreover, its performance on 60% of the datasets achieved an average classification accuracy of 100%. Compared with the classical filter feature selection methods, the quality of the optimal feature subset obtained by the HFIA algorithm has great advantages, and its computational cost is also very competitive. Among them, Ovarian and Lung have the lowest computational cost, which is less than 20% of the lowest value of the other methods. This increases the risk of the search algorithm falling into a local optimum and reduces the convergence speed of the algorithm.
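The acceleration-factor formula itself is missing from the extracted text; the short sketch below therefore assumes a simple linear ramp t/T for the regulator that controls how many new antibody genes are injected per iteration, which matches the description of a linear incremental adjustment but is not the paper's exact formula:

def acceleration_factor(t, T):
    # Assumed linear form: grows from near 0 at the first iteration to 1 at the last.
    return t / T

def n_new_antibodies(t, T, n_max=20):
    # Number of fresh antibodies injected at iteration t (n_max is illustrative),
    # growing over time so that population diversity is reinforced in later iterations.
    return max(1, round(acceleration_factor(t, T) * n_max))

for t in (1, 25, 50, 100):
    print(t, n_new_antibodies(t, T=100))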
The computational cost on 62.5% of the datasets is below 10% of the lowest value of the other methods. On these datasets, the number of features in the optimal feature subset obtained by HFIA is less than 32% of the minimum value of the other methods. In recent years, feature selection methods based on metaheuristics have been the focus of scholars because of their good global search ability [5]. Compared with the Gaussian distribution, the Cauchy distribution decays more slowly and allows a larger mutation step. Compared with the five classical filter feature selection methods, the computational cost of HFIA is lower than that of two of them, and it is far better than all five algorithms in terms of feature reduction rate and classification accuracy improvement. A variety of feature selection methods have been proposed for different application fields to improve the recognition performance of the model. Table 6 reports the computational cost comparison of the different feature selection methods. The ablation study results of the proposed method on the 25 datasets are also reported. This does not mean that the population will continue to grow during the iterations. The optimal feature subset obtained by the HFIA algorithm can improve the classification accuracy of the classifier to a great extent. Since Burnet [26] fully elaborated the principle of clonal selection in 1959, the theory has been generally recognized by the immunology community. It is compared with 5 classical feature selection methods and 14 hybrid feature selection methods for high-dimensional data reported in the latest literature. Here, mu_ik and sigma_ik^2 are the mean and variance of category k on the ith feature, respectively. In fact, many biological phenomena follow the probability distributions of continuous random variables. It should be noted that the experimental data of the comparison algorithms are all taken from the corresponding literature, and our algorithm adopts the same settings as in that literature. The comparative experimental results fully demonstrate the advanced nature of the algorithm. This fully shows that, compared with these five feature selection methods, the optimal feature subset obtained by HFIA has a strong competitive advantage in classification performance. In terms of controlling the number of features in the optimal feature subset, the following results can be obtained by comparing the data in Table 11. In terms of the classification accuracy of the selected optimal feature subset, the following results are obtained by comparing the data in Table 13.
The fundamental purpose of feature selection is to find a better subset of features to represent the entire feature space, that is, a subset with less feature redundancy and higher classification accuracy. Compared with other intelligent algorithms based on metaheuristics, the search of the algorithm is not completely random. It can find a satisfactory near-optimal solution in an acceptable time, although this is not necessarily the unique optimal solution [13]. Due to the semi-blindness of the clonal selection algorithm in search problems, scholars have proposed various mutation strategies to improve the algorithm. The average classification accuracy of KNN is used to evaluate the quality of the selected optimal feature subset in this paper. The method solves the problem of selecting optimal feature subsets for high-dimensional data through a two-stage screening operation. Some are memory cells that function as antigen markers. In theory, the more information is obtained, the more accurate the judgment about the object can be. Moreover, they can make a better trade-off between the exploration and exploitation abilities of the algorithm. Moreover, this can also make better use of the advantages of the algorithm itself. The experimental results are shown in Tables 13 and 14. This can be seen from the schematic diagram of the probability density function of the Cauchy distribution in Figure 1. Combining the evaluation results for the two indicators of optimal feature subset quality and computational cost, the comparative analysis of the above experimental results shows that the HFIA method has an excellent competitive advantage in feature selection for high-dimensional data over the 14 feature selection methods reported in the latest literature. The wrapper method uses the classification algorithm to evaluate the quality of the selected features, so as to obtain a higher-quality feature subset. Table 14 compares the average size of the optimal feature subsets obtained by these methods. The method first uses the ReliefF algorithm to calculate feature weights and then searches for the optimal feature subset through PSO. Because of these advantages, the metaheuristic algorithm has attracted extensive attention from researchers. The datasets whose optimal feature subsets contain the smallest proportion of features are Leukemia1, Brain Tumor2, Leukemia3, and Lung. According to the statistics in Table 3, on these datasets, the classification accuracy obtained by HFIA is 4.11%-32% higher than the maximum value of the other five algorithms. In addition, they are flexible and intuitive and can be adapted to the specific problems to be solved.
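A compact sketch of the two-stage idea follows, using scikit-learn's ANOVA F-score as a stand-in for the Fisher filter and a random search as a placeholder for the immune-based second stage (both substitutions are assumptions for illustration, not the paper's method):

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def two_stage_selection(X, y, k_filter=200, rng=None):
    rng = rng or np.random.default_rng()
    # Stage 1: keep the k_filter top-ranked features (the paper uses a threshold of 200).
    stage1 = SelectKBest(f_classif, k=min(k_filter, X.shape[1])).fit(X, y)
    X1 = stage1.transform(X)
    # Stage 2 (placeholder): random search over sparse masks, scored by KNN cross-validation.
    best_mask, best_acc = None, -1.0
    for _ in range(50):
        mask = rng.random(X1.shape[1]) < 0.1
        if not mask.any():
            continue
        acc = cross_val_score(KNeighborsClassifier(), X1[:, mask], y, cv=5).mean()
        if acc > best_acc:
            best_mask, best_acc = mask, acc
    return stage1.get_support(), best_mask, best_acc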
On these datasets, the number of features in the optimal feature subset obtained by HFIA is less than 6% of the minimum value of the other methods. Its computational cost on 40% of the datasets is below 30% of the lowest value of the other methods. Moreover, it has been reported that, even in terms of maintaining diversity, the Cauchy distribution is better than the Gaussian distribution in the search process of evolutionary algorithms [39-41]. These four feature selection methods are PSO, CSO, AMSO, and VLPSO. But how to combine different algorithms more effectively is worthy of further study by scholars. In recent years, hybrid algorithms have received extensive attention in solving optimization problems. This fully shows that the HFIA method is strongly competitive in improving the performance of the classifier compared with these five feature selection methods. In the experiments, the cross-validated classification results of the KNN algorithm are used as the basis for the performance evaluation of the algorithm. From a broad point of view, these methods may generally be divided into filter and wrapper methods. Moreover, they can effectively deal with complex problems that are difficult for traditional optimization algorithms, without being limited by the nature of the problem. Finally, the pseudo-code of the algorithm is given at the end of this section, and the symbols used in the algorithm are also explained. This paper compares this method and four other advanced feature selection algorithms on 10 microarray datasets. From a biological point of view, although lethal mutations are detrimental to the affected individuals, they are beneficial for maintaining the heterozygosity of the population. Compared with the hybrid feature selection methods proposed in the latest literature, the HFIA algorithm obtains the smallest selected feature subsets and better average classification accuracy at a lower computational cost. Affinity evaluation: calculate the affinity of each antibody in the antibody pool. In principle, this goal can be achieved through the simplest exhaustive search strategy. Tables 7-9 describe the comparison of experimental results between HFIA and the feature selection methods mentioned in [23]. q is the total number of features in the dataset, while p is the number of selected features in the feature subset. The standard Cauchy distribution function is shown in formula (3).
Moreover, some hybrid algorithms combine the best characteristics of different algorithms to develop new algorithms. Experimental comparisons with 19 state-of-the-art feature selection methods are conducted on 25 high-dimensional benchmark datasets. Table 2 shows the details of these datasets. In addition, Table 1 describes the important identifiers used in the algorithm. According to the target requirements of feature selection in high-dimensional data, this method greatly improves the initialization and mutation strategies of the antibody population in the clonal selection algorithm. First, this paper uses the classification results of KNN as the criterion for evaluating the quality of candidate feature subsets. Among them, the threshold is a given real number in (0, 1). However, the actual situation is often far from ideal, because the data frequently contain much redundant information that is irrelevant to the research objectives. Firstly, the feature selection methods and their related domain knowledge are summarized in Section 2. For the feature selection problem in high-dimensional data space, the main idea is to use a hybrid feature selection method. In this section, the datasets used in the experiment are first introduced, then the performance evaluation criteria of the classification test are explained, and finally the parameter settings of the HFIA algorithm in the experiment are described. On 50% of the datasets, it is below 32% of the minimum of the other methods. During this process, mutations in cloned individuals are inversely proportional to antigen affinity. The probability density function of the one-dimensional Cauchy distribution is f(x; x0, gamma) = 1 / (pi * gamma * [1 + ((x - x0) / gamma)^2]). The Cauchy distribution has two parameters, x0 and gamma: x0 is the location parameter and gamma is the scale parameter. The method is used to solve the problems of numerical optimization and feature selection. These improvements include the population initialization, the mutation strategy, and the population update mechanism of the antibodies. Therefore, most of them have the advantage of low computational cost. The solution obtained by this kind of algorithm is called the optimal solution or a satisfactory solution. Among them, 11Tumor and Lung have the lowest computational cost, which is less than 12% of the lowest value of the other methods.
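The heavier tails of the Cauchy distribution can be checked numerically; the small SciPy comparison below illustrates why Cauchy-driven mutation produces large steps far more often than Gaussian-driven mutation (the probed points are illustrative):

from scipy.stats import cauchy, norm

# Compare the standard Cauchy and standard normal densities at a few points.
for x in (0.0, 1.0, 3.0, 5.0):
    print(f"x={x}: cauchy pdf={cauchy.pdf(x):.4f}, normal pdf={norm.pdf(x):.4f}")

# Probability of drawing a step larger than 3 in absolute value.
print("P(|X|>3):", 2 * cauchy.sf(3), "vs", 2 * norm.sf(3))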
Moreover, they perform better in improving the classification performance of the classifier. Song et al. [29] improved the basic CSA in terms of population initialization, clonal selection method, and population update, so as to obtain better convergence when solving multi-objective optimization problems. In this paper, cross-validation [46] is used to evaluate the accuracy of the classification algorithm. It outperforms the FCBF and ReliefF methods on 20% of the datasets. The datasets with the highest classification performance improvement are NCI9 and CNS. Compared with the 14 hybrid feature selection methods reported in the latest literature, the average winning rates in terms of classification accuracy, feature reduction rate, and computational cost are 85.83%, 88.33%, and 96.67%, respectively. The experimental results are shown in Tables 4-6. The production of antibodies is the learning process of the immune system. They are the exponential selection strategy, the sequential selection strategy, and the random selection strategy [10]. Hybrid algorithms are those that combine different algorithms and develop a new or improved algorithm to solve more complex optimization problems. This also fully proves the superiority of the HFIA algorithm proposed in this paper. Among them, SMK-CAN-187, TOX-171, and Pixraw10P have the lowest computational cost, which is less than 7.5% of the lowest value of the other methods. Feature selection methods are usually implemented by searching the solution space with the goal of maximizing the correlation with the target class and minimizing the redundancy of the selected features [11]. Among the 10 datasets participating in the experiment, the classification accuracy of HFIA is higher than that of the other 5 feature selection methods on all datasets. In terms of improving the performance of the classifier, the following results can be obtained from the observation and comparison of the data in Table 4. For the sake of simplicity, only the classification results of the KNN algorithm are used as the analysis indicators in the experiments. The hybrid feature selection method combines the advantages of different methods. Song et al. [25] proposed a hybrid feature selection algorithm based on surrogate sample-assisted particle swarm optimization (SS-PSO). The experimental results are analyzed and discussed in Section 5. Among them, TOX-171 has the largest proportion of the average number of features in the optimal feature subset, with a ratio of 0.34%. The mutation mechanism plays an important role in the operation steps of the clonal selection algorithm.
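For orientation, a generic CLONALG-style loop that follows these steps of affinity evaluation, proportional cloning, affinity-inverse hypermutation, and constant-size selection is sketched below; this is not the paper's exact HFIA procedure, and the population size, clone factor, and mutation rates are illustrative. The fitness function sketched earlier in this section can be passed in as fitness_fn.

import numpy as np

def clonal_selection(fitness_fn, n_features, pop_size=30, n_gen=50,
                     n_select=10, clone_factor=5, rng=None):
    # Generic CLONALG-style loop: evaluate affinity, clone the best antibodies in
    # proportion to their rank, hypermutate clones inversely to affinity, and keep
    # the overall population size constant via selection.
    rng = rng or np.random.default_rng()
    pop = (rng.random((pop_size, n_features)) < 0.5).astype(np.int8)
    for _ in range(n_gen):
        affinity = np.array([fitness_fn(ab) for ab in pop])
        elites = pop[np.argsort(affinity)[::-1][:n_select]]
        clones = []
        for rank, ab in enumerate(elites):
            n_clones = max(1, clone_factor * (n_select - rank) // n_select)
            for _ in range(n_clones):
                clone = ab.copy()
                rate = 0.02 + 0.2 * rank / n_select   # better antibodies mutate less
                flip = rng.random(n_features) < rate
                clone[flip] ^= 1
                clones.append(clone)
        pool = np.vstack([pop, np.array(clones)])
        pool_aff = np.array([fitness_fn(ab) for ab in pool])
        pop = pool[np.argsort(pool_aff)[::-1][:pop_size]]  # keep the size constant
    return pop[np.argmax([fitness_fn(ab) for ab in pop])]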
These comparative analyses include the classification accuracy, the number of features in the optimal feature subset, and the computational cost of the algorithm. They are the number of features and the classification accuracy of the obtained optimal feature subset and the computational cost paid by the algorithm, respectively. The approximate optimal solution is searched for by continuous loop iteration until one of the termination conditions is satisfied. However, its calculation cost is unacceptable. (1) The algorithm improves the classification performance of the classifier. The statistical results of the proposed HFIA method on all datasets are very similar to those of the CFS-iBPSO method. The experimental results show that, after screening with different thresholds, the classification accuracy of the features scored and ranked by the Fisher score always oscillates within a certain range. This fully proves the advanced nature of the HFIA algorithm in solving the feature selection problem of high-dimensional data. First, the feature subsets obtained by the filter method have low accuracy, which requires manual analysis of different datasets and the selection of specific filter thresholds for each of them. In each iteration, it is first necessary to determine the mutation loci of each antibody in the mutation set C. The mutation loci are jointly determined by the generated Cauchy random number sequence and the transformation threshold value. However, current feature selection methods for high-dimensional data also require a better balance between feature subset quality and computational cost. Therefore, compared with a single feature selection method, the hybrid method has better application value. These datasets cover feature counts ranging from 2,000 to 22,283. Hussain et al. [20] proposed a hybrid optimization method integrating the sine-cosine algorithm into the Harris hawks optimizer. The Fisher algorithm is an effective filtering feature selection method. The computational cost on 30% of the datasets is below 40% of the lowest value of the other methods. Thirdly, because the selected optimal feature subsets differ between classifiers, a fusion framework combining multiple feature selection algorithms and classification algorithms can be considered in order to obtain more effective results. Algorithm 2 lists the main steps of the lethal mutation operation performed on the elite antibodies selected from the population in each iteration. The rest of this paper is organized as follows. The normal distribution is also known as the Gaussian distribution. For the model proposed in this paper, the fusion scheme of HFIA is effective, which is very helpful for improving the performance of the classifier.
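A rough sketch of these two mutation operators follows; both the Cauchy-threshold rule for choosing mutation loci and the gene-decay behaviour of the lethal mutation are treated as assumptions about details the text only outlines:

import numpy as np

def mutate_antibody(antibody, threshold=0.5, rng=None):
    # Assumed rule: draw a Cauchy random number per locus and flip the loci whose
    # absolute value exceeds the transformation threshold.
    rng = rng or np.random.default_rng()
    loci = np.abs(rng.standard_cauchy(antibody.size)) > threshold
    mutated = antibody.copy()
    mutated[loci] ^= 1
    return mutated

def lethal_mutation(elite, decay=0.5, rng=None):
    # Assumed sketch of the lethal mutation: aggressively switch off a share of the
    # elite antibody's selected genes to accelerate gene decay and preserve diversity.
    rng = rng or np.random.default_rng()
    mutated = elite.copy()
    on = np.flatnonzero(mutated == 1)
    if on.size:
        drop = rng.choice(on, size=int(decay * on.size), replace=False)
        mutated[drop] = 0
    return mutated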
Table 3 shows the quantitative comparison between the optimal feature subset (avgNfs) obtained using HFIA and the full feature set of each dataset. In theory, the higher the dimension of the data, the more detailed the description of things. These experimental analyses cover the following three aspects. In order to eliminate the risk of falling into a local optimum due to excessively rapid fitness decay, it is necessary to enhance the diversity of the population during the iterative process. Among the five feature selection methods participating in the comparative experiments, the VLPSO method has the lowest computational cost. Given a set of labelled data samples, let c be the number of categories and n_k the number of data samples in the kth category. It obtains a higher affinity for the antigen through the mutation mechanism of the antibody gene. In the basic clonal selection algorithm, the initial population is generated at random. The datasets whose optimal feature subsets contain the smallest proportion of features are CNS, ALL-AML, and MLL. Figure 4 shows the relationship between the Fisher score threshold and the classification accuracy on GLI-85. In all experiments, the parameter configuration of the HFIA algorithm is as follows. In this paper, we also adopt the binary coding strategy. It performs well in terms of both classification accuracy improvement and computational cost. This fully shows that, compared with these five feature selection methods, the HFIA method has the lowest computational cost. All experiments are performed on a PC with an Intel Core i5 and 8 GB of RAM. They are based on computational intelligence mechanisms for solving complex optimization problems. Therefore, in order to obtain the optimal feature subset more quickly, the Cauchy distribution is applied to the initialization, mutation, and update stages of the population in this paper. This solution can meet various requirements in practical applications when the number of features is not too large. Therefore, for feature selection problems, binary encoding is usually adopted to represent the individuals in the solution space.
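Using the standard Fisher score definition from the cited literature, F_i = sum_k n_k (mu_ik - mu_i)^2 / sum_k n_k sigma_ik^2, with the per-class means and variances defined above (this standard form is assumed to match the paper's formulation), the first-stage ranking can be sketched as:

import numpy as np

def fisher_score(X, y):
    # F_i = sum_k n_k * (mu_ik - mu_i)^2 / sum_k n_k * var_ik for each feature i.
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for k in classes:
        Xk = X[y == k]
        nk = Xk.shape[0]
        num += nk * (Xk.mean(axis=0) - overall_mean) ** 2
        den += nk * Xk.var(axis=0)
    return num / np.maximum(den, 1e-12)   # guard against zero within-class variance

def top_k_features(X, y, k=200):
    # Rank all features by Fisher score and keep the k best (the paper's threshold is 200).
    return np.argsort(fisher_score(X, y))[::-1][:k]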
However, as the data dimension continues to expand, especially when the number of features reaches the thousands, the computational cost increases exponentially. Accordingly, the clonal selection algorithm has evolved into many variants and has been applied in different research fields. The experimental results are shown in Tables 10-12. Tables 10-12 describe the comparison of experimental results between HFIA and the feature selection methods mentioned in [45]. Among them, Full represents the method with all feature subset evaluation modules removed. The datasets with the smallest proportion of selected features are DLBCL, Prostate Tumor, Leukemia3, and Lung. Among the five feature selection methods involved in the comparative experiments, CFS-iBPSO obtained the highest classification accuracy, outperforming the other four methods on 70% of the datasets. It is an intelligent algorithm inspired by the principles, functions, and models of biological immunity. Moreover, in some cases, they steer the learning process toward a weak model, resulting in wrong results. Secondly, the related work on current hybrid feature selection methods is summarized. However, with the expansion of the search space, especially when the number of features reaches the thousands, the calculation cost increases exponentially [6]. This is followed by the AMSO and PSO-EMT methods, which are close in computational cost on 60% of the datasets and outperform the rest of the methods. We compared the experimental results obtained by HFIA with the results obtained without feature selection. To this end, the strategy of incremental update is adopted. When the value of the conversion threshold is smaller, there are more 1-loci in an antibody, that is, more loci are involved in mutation, and vice versa. In this paper, an efficient hybrid feature selection method (HFIA) based on an artificial immune algorithm is proposed. It is iterative in nature. Based on clonal selection theory, de Castro and Von Zuben proposed the famous clonal selection algorithm (CLONALG, also known as CSA) in 2000 [27]. Wan et al. [24] proposed a hybrid feature selection method that combines neighborhood rough sets and conditional mutual information. It is obtained by evaluating this subset of features with an evaluator (usually a classifier). We compared the experimental results with the results of 19 feature selection methods mentioned in other pieces of literature.
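A minimal sketch of such an incremental update is given below, under the assumption that a small number of fresh antibodies is drawn from the Cauchy distribution each iteration and replaces the weakest members, so that the population size stays constant:

import numpy as np

def incremental_update(pop, affinities, n_new, threshold=0.5, rng=None):
    # Replace the n_new weakest antibodies with fresh Cauchy-initialized ones,
    # keeping the overall population size unchanged.
    rng = rng or np.random.default_rng()
    n_features = pop.shape[1]
    fresh = (np.abs(rng.standard_cauchy((n_new, n_features))) > threshold).astype(np.int8)
    worst = np.argsort(affinities)[:n_new]     # indices of the weakest antibodies
    updated = pop.copy()
    updated[worst] = fresh
    return updated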
In order to obtain more accurate feature evaluation information, a fusion scheme of multiple metrics, such as rough set theory, can be considered for the evaluation of feature subsets in future research. The classification accuracy, the number of features in the optimal feature subset, and the mean and deviation of the computational cost reported in the experimental results are all statistics obtained after the algorithm was run 20 times independently on each dataset. On these 10 datasets, the optimal feature subset obtained by HFIA contains fewer features than those of the other methods. Dai et al. [30] proposed a clonal selection algorithm based on bidirectional quantum crossover. In addition, most algorithms also suffer from high computational cost due to the complexity of the algorithm itself or insufficient optimization. On 80% of the datasets, the number of features in the optimal feature subset obtained by the HFIA method is below 14% of the minimum value of the other methods. In terms of the computational cost of the algorithm, the following results can be drawn from the data in Table 9. The authors of [22] proposed a hybrid feature selection method that fuses multiple algorithms. Across the 10 datasets participating in the experiment, the classification accuracy of HFIA is higher than that of the other five classical feature selection methods on every dataset. They are used to improve the search speed of the algorithm and to enhance the diversity of the population, respectively. Then, Section 4 provides a detailed description of the experimental datasets, the evaluation metrics, and the algorithm parameter settings. In all tabular data, the best result for each criterion is identified in bold. Theoretically, for any filter-type feature selection method, as long as an optimal threshold value is selected, the desired feature subset can be obtained. On these datasets, the classification accuracy obtained by HFIA is more than 17% higher than the maximum value of the other methods. In terms of the computational cost of the algorithm, the following results can be obtained after comparing the data in Table 12. The classification accuracy data in the table are the best values obtained after 20 runs on each dataset. The wrapper method focuses on the interaction between different feature combinations and classifiers. The four feature selection methods are mRMR-mid [57], QPFS [58], SPECCMI [59], and CGA [60], respectively. In all datasets participating in the experiment, the number of features in the optimal feature subset obtained by the HFIA method is less than 50% of the minimum value of the other methods. Using a multi-objective optimization strategy to model the feature selection problem, a set of nondominated feature subsets can be obtained. This fully shows that the HFIA method is the best at improving the performance of the classifier compared with these five feature selection methods.
The effective combination of these mechanisms enables the algorithm to obtain better search ability and lower computational cost. Moreover, its computational cost on 90% of the datasets is better than that of these five methods. Table 4 presents a comparison of the classification accuracy of the optimal feature subsets obtained by the different feature selection methods. Across all datasets participating in the experiment, the HFIA method outperformed the other five methods on 75% of the datasets, requiring only 28.57%-88.57% of the minimum values of the other methods; on 62.5% of the datasets, the number of features selected by the HFIA method is below 42% of the minimum values of the other methods. In this method, the bidirectional quantum crossover mechanism from quantum jump theory is used to replace the hypermutation operation and realize information exchange between antibodies. The results show that the computational cost of this algorithm is comparable to classical feature selection methods known for their speed. A value of 1 indicates that the feature at this location is selected; otherwise, it is not selected. Therefore, for the feature selection problem, we have two optimization objectives: classification accuracy and the number of selected features. Compared with the Gaussian distribution, the Cauchy distribution has a slower decay rate and a larger range of values. Across all datasets participating in the experiment, the HFIA method outperformed the other five methods on 90% of the datasets, requiring only 13.46%-74.24% of the minimum values of the other methods. Table 7 describes the comparison of the average classification accuracy between the HFIA algorithm and the five methods after 20 repetitions on all experimental datasets. Compared with the classical filter feature selection methods and other methods based on intelligent algorithms, they can reduce feature redundancy and computational cost to a greater extent. The removal rate of feature redundancy for all datasets is above 99%. From the perspective of improving the classification performance of the classification algorithm, the HFIA method improves classifier performance by 5%-48.33% on all datasets. The following conclusions can be drawn from the above analysis. Since it is too expensive to evaluate all possible feature subsets, a method that is acceptable in terms of computational complexity needs to be used to find suitable feature subsets [12]. As shown in Figure 6, the experimental results under each metric are presented. Through the observation and analysis of the experimental results, the following results can be obtained. It identifies the importance of features through the calculated mean and variance. That is, the number of updates for HFIA is N, while the number of updates in the classical algorithm is d (d