The Limited Times


Mathematics to create a cellular map of diseases

2022-04-26T20:52:32.898Z


To characterize all human cells, we need mathematical methods that extract all the relevant information and, at the same time, simplify it.


The human body is estimated to contain 30 trillion cells organized into tissues.

Each human cell contains 6.4 billion DNA nucleotides, which are structured around 20,000 coding genes, and each gene can give rise to multiple proteins.

An international consortium of scientists is trying to compile an atlas (the Human Cell Atlas) to characterize, molecularly (DNA, genes, proteins) and morphologically, all the cells that make up the human body.

This tremendous technical and economic effort has to incorporate mathematical methods that make it possible to extract all the relevant information and at the same time simplify it, to make it interpretable.

To meet this challenge, dimensionality reduction techniques have become popular in recent years for single-cell data analysis.

Currently we can characterize each cell very exhaustively.

On the one hand, thanks to complex molecular biology techniques, we can identify the mutations present in the DNA of a specific cell or quantify the expression of the catalog of genes and proteins specifically expressed in it.

This information is stored in a matrix with more than 20,000 rows (the approximate number of genes expressed in an experiment) and as many columns as there are cells being analyzed, currently tens of thousands.
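As a rough illustration of that layout, the sketch below builds a small synthetic genes-by-cells matrix with NumPy. The sizes, the Poisson rate and the variable names are assumptions made for the example; real values would come from a sequencing experiment.

```python
import numpy as np

# Hypothetical sizes, scaled down from the ~20,000 genes and tens of
# thousands of cells mentioned in the text so the sketch runs instantly.
N_GENES = 2_000
N_CELLS = 500

rng = np.random.default_rng(seed=0)

# Synthetic counts: rows are genes, columns are cells.
# Real data would be measured, not drawn from a random generator.
expression = rng.poisson(lam=0.3, size=(N_GENES, N_CELLS))

# Each cell is one column, that is, one point in an N_GENES-dimensional space.
one_cell = expression[:, 0]

print("matrix shape (genes, cells):", expression.shape)
print("dimensions describing a single cell:", one_cell.shape[0])
```

Seen this way, every cell is a point in a space with as many dimensions as there are genes, which is what makes a two-dimensional summary attractive.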

On the other hand, imaging techniques, with ever higher resolution, are used to explore changes in the shape, size or structure of each cell.

Our ability to study this large amount of jointly generated data is very limited, due to both its dimensionality and its heterogeneity.

Dimensionality reduction techniques allow cell maps to be created in just two dimensions, chosen so that as much information as possible is preserved while it is condensed; this facilitates the identification of groups of similar cells, or clusters, as well as their visualization and subsequent interpretation.

Thanks to these maps, it has been possible to identify and quantify new cell subtypes associated with the genesis and development of different complex diseases, from cancer to cardiovascular diseases.

The most traditional dimensionality reduction techniques, such as principal component analysis, proposed by Karl Pearson more than a century ago, are based on the linear projection of the information onto a hyperplane, just as a photograph projects the three-dimensional world onto the focal plane.
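A minimal sketch of such a linear projection, assuming NumPy and synthetic data: the data are centered and projected onto the two directions of greatest variance, obtained here from a singular value decomposition. The function name and the toy data are hypothetical. Note that cells are placed in rows here, so a genes-by-cells matrix would first be transposed.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of `data` (samples x features) onto the top principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, explained[:n_components]

rng = np.random.default_rng(seed=0)

# Synthetic stand-in: 300 "cells" x 50 "genes" whose variation is driven
# by two hidden factors plus a little noise.
hidden = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 50))
cells_by_genes = hidden @ loadings + 0.1 * rng.normal(size=(300, 50))

embedding, explained = pca_project(cells_by_genes)
print("embedding shape:", embedding.shape)
print("fraction of variance kept by two components:", explained.sum())
```

Because this toy data set is linear by construction, two components retain almost all of its variance; data with nonlinear structure would not be summarized as well by a projection of this kind.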

These techniques have the advantage of respecting real distances relatively well in low-dimensional space, but they are often unable to capture all the complexity contained in the data, especially if the relationship between the system variables is nonlinear, as is the case with the molecular and phenotypic variables that can be measured in a cell.

For this reason, in the last decade new non-linear dimensionality reduction techniques have been proposed.

The idea behind them is to identify a new two-dimensional space that summarizes as much information as possible, preserving distances locally at the cost of losing, to a certain extent, the global structure.

This allows us to identify groups of similar elements, for example cells, in the two-dimensional representation, even though the distances between the different groups are distorted.

Its behavior is similar to that of the Mercator cartographic projection, the most used to make world maps, which increases the distortion of areas and distances as we get closer to the poles.

At a local level, distances are maintained, that is, geographically close areas are also close on the map, but remote areas do not maintain their distances when they cross meridians, which does not prevent the map from continuing to be useful.
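The growth of that distortion can be made concrete with the standard scale factor of the spherical Mercator projection, 1/cos(latitude), a textbook formula that is not stated in the article itself. The sketch below, with an illustrative function name, prints how much lengths and areas are stretched at a few latitudes.

```python
import math

def mercator_scale(latitude_deg):
    """Linear scale factor of the spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))

for lat in (0, 30, 60, 80):
    k = mercator_scale(lat)
    # Lengths are stretched by k in every direction, so areas are stretched by k squared.
    print(f"latitude {lat:>2} deg: lengths x{k:.2f}, areas x{k * k:.2f}")
```

At 60 degrees of latitude, for instance, lengths are stretched by a factor of about 2 and areas by about 4, while near the equator the map is almost undistorted.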

To achieve their goal, these new methods use iterative algorithms based on directed graphs, built from the calculation of distances between data neighborhoods, generating attractive or repulsive forces in the new representation space depending on their similarity.
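The attract-and-repel idea can be imitated with a deliberately simplified toy, sketched below with NumPy. It is neither t-SNE nor UMAP: the neighborhood definition (k nearest neighbors), the form of the forces, the function names and all parameter values are assumptions chosen only to show the mechanics, and the result is not tuned to produce good maps.

```python
import numpy as np

def knn_indices(data, k):
    """Indices of the k nearest neighbors of every row (the row itself excluded)."""
    sq_dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)
    return np.argsort(sq_dists, axis=1)[:, :k]

def toy_force_embedding(data, k=10, n_iter=200, step=0.05, seed=0):
    """Iteratively move 2-D points: pulled toward neighbors, pushed from random points."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Directed neighbor graph: row i lists the points that i considers its neighbors.
    neighbors = knn_indices(data, k)
    layout = rng.normal(scale=0.01, size=(n, 2))
    for _ in range(n_iter):
        # Attraction: each point moves toward the mean position of its neighbors.
        attract = layout[neighbors].mean(axis=1) - layout
        # Repulsion: each point is pushed away from one randomly sampled point,
        # with a strength that fades for points that are already far apart.
        others = rng.integers(0, n, size=n)
        diff = layout - layout[others]
        repel = diff / (1.0 + (diff**2).sum(axis=1, keepdims=True))
        layout = layout + step * (attract + repel)
    return layout

# Synthetic data: two well-separated groups of 60 points in 20 dimensions.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, size=(60, 20))
blob_b = rng.normal(loc=10.0, size=(60, 20))
data = np.vstack([blob_a, blob_b])

layout = toy_force_embedding(data)
print("layout shape:", layout.shape)
```

Real algorithms differ in how they weight the graph edges and shape the forces, and they rely on much more careful optimization than this fixed-step loop.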

The way in which the neighborhood of each data point is defined, together with how and under what circumstances these forces are generated, is the key and the main difference between the available algorithms, such as t-distributed stochastic neighbor embedding (t-SNE) or the more recent Uniform Manifold Approximation and Projection (UMAP).

The mathematical theory behind the latter combines concepts from algebraic topology, Riemannian geometry, and fuzzy logic to generate a representation of the data in the form of a graph, and from probability theory, optimization, and mathematical programming to reproduce that representation as faithfully as possible in a lower-dimensional space.

The result is a powerful, fast, and scalable dimensionality reduction method that is highly useful in multidimensional data analysis, and in particular, in single-cell molecular data analysis.

Despite its strengths, understanding the underlying mathematics is crucial to interpreting its results correctly.

These new dimensionality reduction algorithms are a good example of the type of methodologies that we must continue to develop in order to analyze the large amounts of biomedical data that are being generated, the volume and complexity of which will continue to increase in the coming decades.

Only with the right mathematics will we be able to continue advancing in the understanding of the causal mechanisms of complex diseases, from cancer to Alzheimer's and cardiovascular diseases, and thus, in the implementation of precision medicine.

Fátima Sánchez Cabo is director of the Bioinformatics Unit of the National Center for Cardiovascular Research (CNIC) and associate professor at the Autonomous University of Madrid; Daniel Jiménez Carretero is a senior technician at the CNIC's Bioinformatics Unit.

Coffee and Theorems is a section dedicated to mathematics and the environment in which it is created, coordinated by the Institute of Mathematical Sciences (ICMAT), in which researchers and members of the center describe the latest advances in this discipline, share meeting points between mathematics and other social and cultural expressions and remember those who marked their development and knew how to transform coffee into theorems.

The name evokes the definition of the Hungarian mathematician Alfred Rényi: “A mathematician is a machine that transforms coffee into theorems”.

Editing and coordination: Ágata A. Timón G Longoria (ICMAT).


Source: elparis
