HSE Laboratory Research Fellow Anton Zhiyanov Successfully Defends PhD Thesis in Computer Science

On 27 February 2026, Anton P. Zhiyanov, a research fellow at the Laboratory of Molecular Physiology, Faculty of Biology and Biotechnology, HSE University, successfully defended his dissertation “Methods for Evaluating the Quality of Linear Classifiers for microRNA Sequence Analysis” before the HSE University Dissertation Council in Computer Science. The thesis develops an axiomatic framework for comparing classification metrics, proposes new statistical tools for verifying linear classifiers, and applies them to the bioinformatics problem of microRNA isoform prediction. By its decision of 26 March 2026, the Dissertation Council conferred the degree of Candidate of Computer Science on him.

A linear classifier distinguishing between homogeneous and heterogeneous cleavage at the 5′ end of the 3′ arm based on the nucleotide sequence in the region of the microRNA cleavage position

Anton Zhiyanov

The defense was held before the Dissertation Council in Computer Science at HSE University on 27 February 2026. The members of the Defense Committee praised the thesis highly and unanimously recommended that Anton Zhiyanov be awarded the degree of Candidate of Computer Science. At its meeting on 26 March 2026, the Dissertation Council decided to confer the degree and to issue the corresponding diploma.

The research was carried out at HSE University under the supervision of Alexander G. Tonevitsky, Doctor of Biological Sciences, Full Member of the Russian Academy of Sciences, and Full Professor. The thesis lies at the intersection of machine learning, probability and statistics, and molecular biology, focusing on problems that arise when analyzing the expression of microRNAs and their isoforms in biomedical datasets. The results underlying the dissertation were obtained under the HSE University Basic Research Program within the framework of the ‘Centers of Excellence’ project.

The first chapter introduces an axiomatic framework for evaluating and comparing classification quality metrics. A set of desirable properties (axioms) is formalized and checked for a range of widely used measures, including accuracy, balanced accuracy, Matthews correlation coefficient, Cohen’s kappa, F-measure, and others. A key theoretical result is an incompatibility theorem showing that monotonicity, unbiasedness, and the metric (distance) property cannot all be satisfied by a single binary classification measure. As a constructive response, the thesis proposes a new family of “generalized mean” metrics, which includes Matthews correlation and symmetric balanced accuracy and satisfies most of the axioms.
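To make the comparison of measures concrete, the sketch below computes several of the metrics named above from the four cells of a binary confusion matrix. It uses the standard textbook definitions, not the thesis's own formalism. The function name `binary_metrics` and the example counts are illustrative assumptions, and the code assumes that every row and column total of the confusion matrix is nonzero.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts.

    Assumes every marginal (tp+fp, tp+fn, tn+fp, tn+fn) is nonzero.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    cohen_kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
        "cohen_kappa": cohen_kappa,
    }


# Illustrative, made-up counts for an imbalanced problem (10 positives, 90 negatives).
example = binary_metrics(tp=5, fp=5, fn=5, tn=85)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

On these made-up counts the accuracy is 0.90, while the Matthews correlation coefficient and Cohen's kappa are both about 0.44. This illustrates why measures can rank the same classifier very differently on imbalanced data.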

The second chapter focuses on the statistical verification of linear classifiers. The author studies the probability of near-linear separability of two samples in a two-dimensional feature space, which makes it possible to distinguish between genuine structure in the data and random artifacts due to multiple testing. New, sharper upper bounds on this probability are derived, including bounds for normally distributed samples, which are particularly relevant for biomedical applications. Based on these results, a homogeneity test is constructed that uses the observed number of classification errors to assess the statistical significance of a given classifier.
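The idea of judging a classifier by its number of errors can be illustrated with a simple Monte Carlo stand-in. The sketch below is not the thesis's method: the thesis derives analytic upper bounds, whereas this code estimates a comparable quantity by shuffling labels. It finds the smallest number of errors any line can achieve on a two-dimensional sample and then checks how often randomly relabeled data do at least as well. The function names, the permutation scheme, and the assumption that no three points are collinear are all introduced here for illustration.

```python
import numpy as np


def min_linear_errors(points, labels):
    """Smallest number of misclassified points over all lines in the plane.

    Assumes general position (no three collinear points), so it is enough to
    check lines through pairs of points; the two points on the line can be put
    on either side by an arbitrarily small shift or rotation.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(points)
    # Trivial classifier that assigns every point to the majority class.
    best = int(min(np.sum(labels == 0), np.sum(labels == 1)))
    for i in range(n):
        for j in range(i + 1, n):
            d = points[j] - points[i]
            # Sign tells on which side of the line through i and j a point lies.
            side = d[0] * (points[:, 1] - points[i, 1]) - d[1] * (
                points[:, 0] - points[i, 0]
            )
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            err = int(np.sum((side[mask] > 0) != (labels[mask] == 1)))
            # Try both orientations of the line.
            best = min(best, err, (n - 2) - err)
    return best


def permutation_p_value(points, labels, n_perm=200, seed=0):
    """Share of label shuffles whose best line makes no more errors than observed."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = min_linear_errors(points, labels)
    hits = sum(
        min_linear_errors(points, rng.permutation(labels)) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value from `permutation_p_value` suggests that the near-separability of the two samples is unlikely to be a random artifact. The exhaustive search is cubic in the sample size, so this sketch is only practical for small samples; closed-form bounds such as those developed in the thesis avoid this cost.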

Special attention is paid to low-dimensional linear classifiers that rely on a small number of features, a common setting in bioinformatics where diagnostic and prognostic signatures are built from the expression of a handful of genes or microRNAs. The developed bounds and tests are shown to effectively account for multiple testing effects and to confirm the “non-randomness” of such classifiers even when independent test sets are relatively small. As an illustration, the methods are applied to previously proposed classifiers for predicting recurrence in ER-positive breast cancer.

The third, applied chapter addresses the prediction of microRNA isoform (isomiR) formation. Using RNA-seq data from The Cancer Genome Atlas (TCGA), the thesis analyzes microRNA isoform expression across multiple human tumor types and investigates how pri-miRNA sequence and hairpin structure affect the precision of Dicer cleavage. A linear classifier is constructed that predicts, from the nucleotide sequence near the cleavage site, whether processing will yield a single canonical microRNA or multiple isoforms.

The classifier achieves an accuracy of 0.71, and the p-value of the linear homogeneity test is below 0.05, indicating that the observed performance is statistically significant. Feature analysis reveals sequence motifs associated with distinct cleavage outcomes: the AGCU motif at the 5′ end of the 3′ arm of the pri-miRNA hairpin is linked to the absence of isomiRs, whereas the CCAG motif is associated with their formation. These motifs were experimentally validated using the MDA-MB-231 cell line transduced with short hairpin RNAs processed by Drosha and Dicer in the same way as endogenous microRNAs.
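As a rough illustration of how a linear classifier can operate on a nucleotide sequence, the sketch below one-hot encodes a short window and trains a classic perceptron on a handful of made-up four-nucleotide windows. None of this comes from the thesis: the window length, the encoding, the training algorithm, and the toy sequences are all assumptions chosen for readability. Only the two motifs AGCU and CCAG are taken from the results described above.

```python
import numpy as np

ALPHABET = "ACGU"


def one_hot(seq):
    """Encode a nucleotide window as a flat 0/1 vector of length 4 * len(seq)."""
    vec = np.zeros((len(seq), len(ALPHABET)))
    for pos, nt in enumerate(seq):
        vec[pos, ALPHABET.index(nt)] = 1.0
    return vec.ravel()


def train_perceptron(X, y, epochs=20):
    """Classic perceptron updates; labels in y must be +1 or -1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
    return w, b


def predict(w, b, seq):
    """Return +1 (multiple isoforms) or -1 (single canonical product)."""
    return 1 if one_hot(seq) @ w + b > 0 else -1


# Hypothetical toy windows, NOT real data:
# -1 = homogeneous cleavage, +1 = heterogeneous cleavage.
toy = [
    ("AGCU", -1), ("AGCA", -1), ("AGGU", -1),
    ("CCAG", +1), ("CCAU", +1), ("CGAG", +1),
]
X = np.array([one_hot(seq) for seq, _ in toy])
y = np.array([label for _, label in toy])
w, b = train_perceptron(X, y)

print(predict(w, b, "AGCU"), predict(w, b, "CCAG"))
```

Because the model is linear in the one-hot features, each weight corresponds to one nucleotide at one position. This is what makes it possible to read sequence motifs directly off a trained classifier of this kind.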

The main results of the thesis have been published in three peer-reviewed papers: two articles in journals indexed in Web of Science and Scopus and one paper in the proceedings of an A-level conference. The findings were presented at the NeurIPS 2021 conference, published in RNA Biology, and discussed at several seminars and conferences in Russia and abroad.

The Laboratory of Molecular Physiology warmly congratulates Anton Zhiyanov on his successful PhD defense and wishes him continued success in his scientific career!