Deep Learning in Biology
The increase in biological data and the formation of various biomolecule interaction databases enable us to obtain diverse biological networks. These networks provide a wealth of raw material for understanding biological systems, studying complex diseases, and searching for therapeutic drugs. However, the growth in data also makes biological network analysis more difficult. Algorithms that can handle large, heterogeneous, and complex data are therefore needed to analyze these network structures and extract useful information from them. [1]
Deep-learning algorithms rely on neural networks, a computational model first proposed in the 1940s, in which layers of neuron-like nodes mimic how human brains analyze information. [2]
Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels. [3]
Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected.
Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.[3]
To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item (a different number) over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number (the sum of the weighted inputs) along all its outgoing connections. [3]
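The node behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function names `node_output` and `layer_output` are hypothetical, and representing "passes no data" as the value 0.0 is an assumption made here for simplicity.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs; the node "fires" only above the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each incoming number by its weight and add the products.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Assumption: "no data passed on" is modelled as 0.0.
    return total if total > threshold else 0.0


def layer_output(inputs, weight_rows, thresholds):
    """Apply every node of one layer to the same inputs (feed-forward step)."""
    return [node_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


# Two nodes reading the same two inputs.
print(layer_output([1.0, 2.0], [[0.5, 0.25], [1.0, 1.0]], [0.5, 5.0]))
```

In this example the first node computes 1.0 * 0.5 + 2.0 * 0.25 = 1.0, which exceeds its threshold of 0.5, so it fires. The second node's sum of 3.0 stays below its threshold of 5.0, so it sends nothing on.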
When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer (the input layer) and it passes through the successive layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs. [3]
In biology, deep-learning algorithms dive into data in ways that humans can’t, detecting features that might otherwise be impossible to catch. Researchers are using the algorithms to classify cellular images, make genomic connections, advance drug discovery, and even find links across different data types, from genomics and imaging to electronic medical records. [2]
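The training procedure described earlier (random initial weights and thresholds, adjusted until examples with the same label give the same output) can be illustrated on a single node. The sketch below uses the classic perceptron learning rule, chosen here for brevity. It is not the method that deep networks use in practice, which adjust many layers at once. The toy data set (the logical AND of two inputs), the function names, the learning rate, and the random seed are all assumptions made for this example.

```python
import random

# Toy training set (an assumption): label is 1 only when both inputs are 1.
AND_EXAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]


def predict(inputs, weights, threshold):
    """Return 1 if the weighted sum exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(examples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Perceptron rule: nudge weights and threshold after every mistake."""
    rng = random.Random(seed)
    # Weights and threshold start at random values.
    weights = [rng.uniform(-1.0, 1.0) for _ in examples[0][0]]
    threshold = rng.uniform(-1.0, 1.0)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(inputs, weights, threshold)
            if error != 0:
                mistakes += 1
                # Move each weight toward the correct answer.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                # Lower the threshold if the node should have fired, raise it otherwise.
                threshold -= learning_rate * error
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, threshold


weights, threshold = train(AND_EXAMPLES)
print([predict(x, weights, threshold) for x, _ in AND_EXAMPLES])
```

Because the AND data can be separated by a single threshold, this rule is guaranteed to stop making mistakes after a finite number of passes. Real biological data sets rarely have that property, which is one reason multi-layer networks are used instead.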
A key challenge in biomedicine is the accurate classification of diseases and disease subtypes. In oncology, current ‘gold standard’ approaches include histology, which requires interpretation by experts, and assessment of molecular markers such as cell surface receptors or gene expression. One example is the PAM50 approach to classifying breast cancer, where the expression of 50 marker genes divides breast cancer patients into four subtypes. Substantial heterogeneity remains within these four subtypes. Given the increasing wealth of molecular data available, more comprehensive subtyping seems possible. Several studies have used deep learning methods to categorize breast cancer patients better: for instance, denoising autoencoders, an unsupervised approach, can be used to cluster breast cancer patients, and convolutional neural networks (CNNs) can help count mitotic divisions in histological images, a feature that is highly correlated with disease outcome. Despite these recent advances, several challenges remain in this area of research, most notably the integration of molecular and imaging data with other disparate types of data such as electronic health records (EHRs). [4]
Deep learning can also be applied to answer more fundamental biological questions; it is especially suited to leveraging large amounts of data from high-throughput ‘omics’ studies. One classic biological problem where machine learning, and now deep learning, has been extensively applied is molecular target prediction. For example, deep recurrent neural networks (RNNs) have been used to predict gene targets of microRNAs (miRNAs), and CNNs have been applied to predict protein residue-residue contacts and secondary structure. Other recent exciting applications of deep learning include recognition of functional genomic elements such as enhancers and promoters, and prediction of the harmful effects of nucleotide polymorphisms. [3,4,5]
Deep learning methods have transformed the analysis of natural images and video, and similar examples are beginning to emerge with medical images. Deep learning has been used to classify lesions and nodules; localize organs, regions, landmarks, and lesions; segment organs, organ substructures, and lesions; retrieve images based on content; generate and enhance images; and combine images with clinical reports. [4]
Artificial intelligence is a wave that is just beginning to impact science and industry, and it has great potential for helping to solve intractable problems. [5]
By:
Hasti Mehraei, a member of the Systems Artificial Intelligence Network (SAIN) Interest Group
References:
1- https://academic.oup.com/bib/articleabstract/22/2/1902/5826499?redirectedFrom=fulltext
2- https://www.nature.com/articles/d41586-018-02174-z
3- https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414
4- https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0387
5- https://wyss.harvard.edu/news/deep-learning-takes-on-synthetic-biology