News

Predicting Monoterpene Indole Alkaloid-Related Genes from Expression Data with Artificial Neural Networks

  • Health-Sciences-Technology
  • Pharmacy
Date(s)

23 February 2022 to 23 February 2025

Location(s)

Grandmont site

EA2106 BBV - Biomolécules et Biotechnologies végétales (Plant Biomolecules and Biotechnologies)

In this chapter of the book "Catharanthus roseus, Methods and Protocols", using Catharanthus roseus as an example, we show that an artificial neural network (ANN) trained on a minimal set of bait genes yields many true positives (correctly predicted genes) while keeping false positives low (the false positives may contain possible candidate genes).

From: Dugé de Bernonville T, Amor Stander E, Dugé de Bernonville G, Besseau S, Courdavault V. Predicting Monoterpene Indole Alkaloid-Related Genes from Expression Data with Artificial Neural Networks. Methods Mol Biol. 2022;2505:131-140. doi: 10.1007/978-1-0716-2349-7_10.

Abstract

Elucidating the biological pathways that lead to specialized metabolites remains a complex task. It is, however, a mandatory step for enabling bioproduction in heterologous hosts. Many steps have already been identified using conventional approaches, enlarging the space of known possible chemical steps. In recent years, the identification of missing steps has been fueled by the generation of genomic and transcriptomic data for nonmodel species. The analysis of gene expression profiles has revealed that, in many cases, genes encoding enzymes involved in the same biosynthetic pathway are coexpressed across different tissue types and environmental conditions. Hence, coexpression studies, whether in the form of differential gene expression, gene coexpression networks, or unsupervised clustering methods, have helped decipher missing steps and complete knowledge of biosynthetic pathways. Already identified biosynthetic steps can be used as baits to capture the remaining unknown steps. The present protocol shows how supervised machine learning in the form of artificial neural networks (ANNs) can efficiently classify genes as specialized metabolism related or not according to their expression levels. Using Catharanthus roseus as an example, we show that an ANN trained on a minimal set of bait genes yields many true positives (correctly predicted genes) while keeping false positives low (the false positives may contain possible candidate genes).

Keywords: Alkaloid biosynthesis; Artificial neural networks; Deep learning; Gene expression; RNA-seq
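
The abstract describes training an ANN on known "bait" genes so that it can score other genes from their expression levels. The chapter's actual protocol, software, and network architecture are not given on this page, so the following is only a minimal sketch under stated assumptions: the expression profiles are synthetic, the six-sample pathway pattern is invented, unlabeled background genes are treated as negatives during training, and the names `make_gene` and `TinyANN` are hypothetical. It shows the general idea of a small feed-forward network with one hidden layer trained by stochastic gradient descent.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical co-expression pattern over six samples (not taken from the chapter).
PATHWAY_PATTERN = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]

def make_gene(pathway_like):
    """Return a synthetic, already normalized expression profile."""
    if pathway_like:
        return [v + random.gauss(0.0, 0.5) for v in PATHWAY_PATTERN]
    return [random.gauss(0.0, 1.0) for _ in PATHWAY_PATTERN]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class TinyANN:
    """One tanh hidden layer and one sigmoid output unit."""

    def __init__(self, n_in, n_hidden):
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        d_out = y - target  # cross-entropy gradient at the output pre-activation
        for j, hj in enumerate(h):
            # Backpropagate through tanh using the weight value before its update.
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hid
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
        self.b2 -= lr * d_out

# Small bait set (label 1) and background genes assumed to be negatives (label 0).
baits = [(make_gene(True), 1.0) for _ in range(10)]
background = [(make_gene(False), 0.0) for _ in range(40)]
training_set = baits + background

net = TinyANN(n_in=len(PATHWAY_PATTERN), n_hidden=4)
for epoch in range(200):
    random.shuffle(training_set)
    for profile, label in training_set:
        net.train_step(profile, label, lr=0.1)

# Score genes the network has never seen.
hidden_pathway_scores = [net.predict(make_gene(True)) for _ in range(5)]
other_scores = [net.predict(make_gene(False)) for _ in range(20)]
print("mean score, pathway-like genes:", sum(hidden_pathway_scores) / 5)
print("mean score, other genes:", sum(other_scores) / 20)
```

In this toy setting, genes that score high without being in the bait set play the role of the "false positives" mentioned in the abstract: they are the ones worth inspecting as possible candidates. With real RNA-seq data, the profiles would first need normalization, and an established library would normally replace this hand-written network.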