
ExhauFS: Feature Selection Based on Exhaustive Search for Cancer Survival Classification and Regression

The Laboratory of Molecular Physiology at HSE University, together with colleagues from the Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow State University, the Moscow Center for Fundamental and Applied Mathematics, and the Institute of Nanotechnologies for Microelectronics of the Russian Academy of Sciences, has developed ExhauFS, a tool that performs an exhaustive search over feature subsets to build the most powerful cancer survival classification and regression models.

The source code and documentation for ExhauFS are available on GitHub. A scientific article describing how the program works was published in PeerJ.

Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The simplest approach to feature selection is exhaustive search: enumerate all possible combinations of features and select the model with the highest accuracy. This method, along with its optimizations, is actively used in biomedical research, but until now there has been no publicly available implementation.
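To make the idea concrete, here is a minimal sketch of exhaustive search over feature subsets. It is not the ExhauFS implementation: the nearest-centroid classifier, the function names `centroid_accuracy` and `exhaustive_search`, and the toy data are all assumptions chosen for illustration. Features 1 and 3 in the toy data are deliberately spurious (they separate the classes in training but flip on validation), so only the subset of features 0 and 2 generalizes.

```python
from itertools import combinations


def centroid_accuracy(train, valid, features):
    """Fit a nearest-centroid classifier on the chosen feature columns
    and return its accuracy on the validation samples."""
    grouped = {}
    for x, label in train:
        grouped.setdefault(label, []).append([x[f] for f in features])
    # One centroid (per-feature mean) for each class.
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }

    def predict(x):
        point = [x[f] for f in features]
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(point, centroids[label])
            ),
        )

    hits = sum(predict(x) == label for x, label in valid)
    return hits / len(valid)


def exhaustive_search(train, valid, n_features, k):
    """Score every k-feature subset and return (score, subset) pairs,
    ranked best first."""
    scored = [
        (centroid_accuracy(train, valid, subset), subset)
        for subset in combinations(range(n_features), k)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical toy data: (feature vector, class label).
train = [
    ([0.1, 0.0, 0.2, 0.0], 0),
    ([0.2, 0.0, 0.1, 0.0], 0),
    ([0.9, 10.0, 0.8, 10.0], 1),
    ([0.8, 10.0, 0.9, 10.0], 1),
]
valid = [
    ([0.15, 10.0, 0.15, 10.0], 0),
    ([0.85, 0.0, 0.85, 0.0], 1),
]

ranked = exhaustive_search(train, valid, n_features=4, k=2)
best_score, best_subset = ranked[0]
print(best_subset, best_score)  # prints: (0, 2) 1.0
```

With n features and subsets of size k, the loop evaluates C(n, k) models, which is why the approach is only practical for small k or a pre-filtered feature list.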

ExhauFS is a convenient command-line implementation of the exhaustive search approach for survival classification and regression. In addition to describing the tool, the accompanying scientific article presents three example applications of ExhauFS, which together cover the functionality implemented in the program.

As the first example of ExhauFS use, the article considers a cervical cancer toy dataset, on which the authors illustrate the main concepts. The researchers also used multi-cohort breast cancer microarray datasets to construct gene signatures that classify 5-year recurrence. Notably, most of the signatures built using ExhauFS passed the 0.65 threshold for both sensitivity and specificity on all datasets, including the validation set. Moreover, a number of gene signatures demonstrated reliable performance on an independent RNA-seq dataset without any retuning of the coefficients; that is, they turned out to be cross-platform. Finally, Cox survival regression models for isomiR signatures were used to predict the overall survival of patients with colorectal cancer. As in the previous example, most of the models passed the pre-set concordance index threshold of 0.65 across all datasets.
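The quality criteria mentioned above can be sketched as follows. This is an illustrative pure-Python version, not code from ExhauFS: the function names, the dataset labels, and the numbers are hypothetical, and the concordance index is computed in its basic Harrell form (a pair is comparable when the patient with the shorter time had an event; tied risk scores count as half).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs in which the patient who
    experienced the event earlier also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # Comparable only if i had an observed event before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def passes_everywhere(scores_by_dataset, threshold=0.65):
    """A model is kept only if it reaches the threshold on every dataset."""
    return all(score >= threshold for score in scores_by_dataset.values())


# Hypothetical survival data: follow-up time, event flag (0 = censored),
# and the risk score predicted by some model.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 pairs concordant

# Hypothetical classification labels and predictions.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])

# Hypothetical per-dataset scores for two candidate models.
kept = passes_everywhere({"training": 0.80, "validation": 0.66})
dropped = passes_everywhere({"training": 0.80, "validation": 0.60})
```

Requiring the threshold on every dataset, rather than on average, is what filters out models that only look good on the data they were fitted to.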

Additionally, in both real-world scenarios (the breast and colorectal cancer datasets), ExhauFS was compared with state-of-the-art feature selection methods, including L1-regularized sparse models. With these alternative approaches, however, the researchers were unable to build reliable cross-platform classifiers.