Commit a2037b01 authored by Carlos GO

clarifications

parent 12cb0a6a
......@@ -1445,3 +1445,13 @@ CONCLUSION: Adaptation time of molecular quasispecies to a given environment is
Publisher = {Oxford Univ Press},
Title = {{NNDB}: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure},
Year = {2009}}

@article{ivica2013paradox,
Author = {Ivica, Nikola A and Obermayer, Benedikt and Campbell, Gregory W and Rajamani, Sudha and Gerland, Ulrich and Chen, Irene A},
Journal = {Journal of Molecular Evolution},
Number = {3},
Pages = {55--63},
Publisher = {Springer},
Title = {The paradox of dual roles in the {RNA} world: resolving the conflict between stable folding and templating ability},
Volume = {77},
Year = {2013}}
%!TEX root = main_maternal.tex
\section{Discussion}
We provided evidence that, in the absence of selective pressure, the structure of the evolutionary landscape could have helped promote the emergence of an RNA-based form of life. To support this hypothesis, we built a comprehensive representation of the evolutionary landscape of RNA molecules and investigated scenarios based on distinct hypotheses.
We provided evidence that, in the absence of selective pressure, the structure of the \st{evolutionary} \hlt{mutational} landscape could have helped promote the emergence of an RNA-based form of life. To support this hypothesis, we built a comprehensive representation of the \st{evolutionary} \hlt{mutational} landscape of RNA molecules and investigated scenarios based on distinct hypotheses.
Our results offer solid foundations for parsimonious evolutionary scenarios based on undirected molecular self-replication with occasional mutations. In these simple models, the GC content appears to be a key determinant of the probability of discovering stable multi-branched secondary structures. In particular, intermediate GC contents (i.e., $0.5$) result in a drift of the population toward a sub-space of the evolutionary landscape that drastically increases the probability of discovering thermodynamically stable complex shapes that are essential for the emergence of life at the molecular level.
Our results offer solid foundations for parsimonious evolutionary scenarios based on undirected molecular self-replication with occasional mutations. In these simple models, the GC content appears to be a key determinant of the probability of discovering stable multi-branched secondary structures. In particular, intermediate GC contents (i.e., $0.5$) result in a drift of \hlt{randomly replicating} populations toward a sub-space of the evolutionary landscape \hlt{uncovered by \RNAmutants} that drastically increases the probability of discovering thermodynamically stable complex shapes that are essential for the emergence of life at the molecular level.
The preservation of intermediate GC content values appears to us to be a reasonable assumption, which could reflect the availability of the various nucleotides in the prebiotic milieu. This nucleotide composition bias can be interpreted as an intrinsic force that favoured the emergence of life. It also offers novel insights into fundamental properties of the genetic alphabet \citep{Gardner:2003aa}.
......@@ -13,7 +14,7 @@ Eventually, our results could be used to put in perspective earlier findings sug
Our analysis complements recent studies that aimed to characterize fundamental properties of genotype-phenotype maps \citep{Greenbury:2015aa,Manrubia:2017aa} and showed that their structure may contribute to the emergence of functional molecules \citep{Dingle:2015aa}. It also emphasizes the relevance of theoretical models based on a thermodynamic view of prebiotic evolution \citep{Pascal:2013aa}.
The length of the RNA sequences considered in this study was fixed at 50 nucleotides. This length appears to be the current upper limit for non-enzymatic synthesis \citep{Hill:1993aa}, and therefore maximizes the expressivity of our evolutionary scenario. Variations in population size or in RNA sequence length could eventually be considered with the implementation of dedicated algorithms \citep{Waldispuhl:2002aa}, although we do not expect them to have a major impact on our conclusions.
The length of the RNA sequences considered in this study was fixed at 50 nucleotides. This length appears to be the current upper limit for non-enzymatic synthesis \citep{Hill:1993aa}, and therefore maximizes the expressivity of our evolutionary scenario. Variations in population size, or in RNA sequence length \hlt{resulting from indels}, could eventually be considered with the implementation of dedicated algorithms \citep{Waldispuhl:2002aa}, although we do not expect them to have a major impact on our conclusions.
The error rates considered in this study were chosen to match values used in previous related work \citep[e.g.][]{manrubia2007modular}. This choice is also supported by recent experiments suggesting that early-life scenarios could sustain high error rates \citep{Rajamani:2010aa}. Nevertheless, lower mutation rates would only increase the number of generations needed to reach the asymptotic behaviour (See \textbf{Fig.~\ref{fig:tamura}}), and thus would not affect our results.
......
......@@ -54,7 +54,7 @@ In the most commonly accepted scenarios, the establishment of a stable, autonomo
Interestingly, \textit{in vitro} experiments revealed the extreme versatility of random nucleic acids \citep{Beaudry:1992aa,Bartel:1993aa,Schultes:2005aa}. Other studies have also suggested that essential RNA molecules such as the hammerhead ribozyme have multiple origins \citep{Salehi-Ashtiani:2001aa}. Taken together, these observations reinforce the plausibility of a spontaneous emergence of multiple functional sub-units. However, they also raise questions about the likelihood of such events and about the existence of intrinsic forces promoting these phenomena.
% models boosting structural complexity
Various theoretical models have been proposed to highlight mechanisms that may have favoured the birth and growth of structural complexity from replications of small monomers. Computational studies have been of tremendous help to validate these theories and quantify their impact. In particular, numerical simulations have made it possible to explore the effects of polymerization on mineral surfaces \citep{Szabo:2002aa,Briones:2009aa} or the importance of spatial distribution \citep{Shay:2015aa}. Still, the debate about the necessity of such hypotheses remains open.
Various theoretical models have been proposed to highlight mechanisms that may have favoured the birth and growth of structural complexity from replications of small monomers. Computational studies have been of tremendous help to validate these theories and quantify their impact. In particular, numerical simulations have made it possible to explore the effects of polymerization on mineral surfaces \citep{Szabo:2002aa,Briones:2009aa} or the importance of spatial distribution \citep{Shay:2015aa}. \hlt{Another important aspect} of early-life models is the trade-off between stability and structural complexity. Stable folds often lack the complexity necessary to support novel functions but are more resilient to harsh pre-cellular environments~\citep{ivica2013paradox}. \todo{GC content?} Still, the debate about the necessity of such hypotheses remains open.
%\subsection{Our contribution}
In this work, we show that structural complexity can naturally emerge without the help of any sophisticated molecular mechanism. We reveal subtle topological features of RNA mutational networks that helped promote the discovery of functional RNAs at the early stages of the RNA world. We demonstrate that, in the absence of selective pressure, self-replicating RNA populations naturally drift toward \st{a singular region} \hlt{regions} of the sequence landscape enriched in complex structures, allowing for the simultaneous discovery of all the molecular components needed to form a complete functional system.
......
......@@ -57,7 +57,7 @@ $\\\small$^1$ School of Computer Science, McGill University, Montreal, Canada\\\
\begin{abstract}
The RNA world hypothesis relies on the ability of ribonucleic acids to replicate and spontaneously acquire complex structures capable of supporting essential biological functions. Multiple sophisticated evolutionary models have been proposed, but they often assume specific conditions.
%
In this work, we explore a simple and parsimonious scenario describing the emergence of complex molecular structures at the early stages of life. We show that at specific GC-content regimes, an undirected replication model is sufficient to explain the appearance of multi-branched RNA secondary structures -- a structural signature of many essential ribozymes. We ran a large-scale computational study to map energetically stable structures on complete mutational networks of 50-nucleotide-long RNA sequences. Our results reveal a distinct region of the sequence landscape enriched with multi-branched structures bearing strong similarities to those observed in databases. A random replication mechanism preserving a $50\%$ GC-content suffices to explain a natural drift of RNA populations toward this particular region.
In this work, we explore a simple and parsimonious scenario describing the emergence of complex molecular structures at the early stages of life. We show that at specific GC-content regimes, an undirected replication model is sufficient to explain the appearance of multi-branched RNA secondary structures -- a structural signature of many essential ribozymes. We ran a large-scale computational study to map energetically stable structures on complete mutational networks of 50-nucleotide-long RNA sequences. Our results reveal \st{a distinct region} \hlt{regions} of the sequence landscape enriched with multi-branched structures bearing strong similarities to those observed in databases. A random replication mechanism preserving a $50\%$ GC-content suffices to explain a natural drift of RNA populations toward \st{this particular region} \hlt{complex stable structures}.
\end{abstract}
......
......@@ -45,7 +45,7 @@ When a sequence is selected for replication, the child sequence is formed by cop
\subsubsection{Controlling population GC content}
\label{sec:gc_control}
There are two obstacles to maintaining evolving populations within the desired GC content range (target $\pm 0.1$). First, an initial population of random sequences sampled uniformly from the full alphabet naturally has a GC content close to $0.5$. To avoid this, we sample each nucleotide so that the probability of drawing G or C equals the desired GC content, and the probability of drawing A or U equals its complement. This way, our initial population has the desired nucleotide distribution. Second, when running the simulation, random mutations can move replicating sequences outside of the desired range. Given that we are selecting for stable structures, selection is likely to drive the population toward higher GC contents. To avoid this, at the selection stage, we reject mutations that would take the sequence outside of this range. Specifically, if a mutation takes a replicating sequence outside the GC range, we simply repeat the mutation process on the parent sequence until the child sequence has the appropriate GC content (See {\bf Alg.~\ref{alg:gc}}).\\
There are two obstacles to maintaining evolving populations within the desired GC content range (target $\pm 0.1$). First, an initial population of random sequences sampled uniformly from the full alphabet naturally has a GC content close to $0.5$. To avoid this, we sample each nucleotide so that the probability of drawing G or C equals the desired GC content, and the probability of drawing A or U equals its complement. This way, our initial population has the desired nucleotide distribution. Second, when running the simulation, random mutations can move replicating sequences outside of the desired range, \hlt{especially at extreme mutation rates and GC contents.} To avoid this drift, at the selection stage, we reject mutations that would take the sequence outside of this range. Specifically, if a mutation takes a replicating sequence outside the GC range, we simply repeat the mutation process on the parent sequence until the child sequence has the appropriate GC content (See {\bf Alg.~\ref{alg:gc}}). Since populations are initialized within the appropriate GC range, valid mutants are likely to be found relatively quickly, and populations never drift away from the target GC content.\\
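The two GC-control steps described above (biased initial sampling and repeated mutation of the parent until the child falls back in range) can be sketched as follows. This is a minimal illustration, not the \maternal code: the function names, the per-site point-mutation model with rate \texttt{mu}, and the small floating-point tolerance are all assumptions made for the example.

```python
import random

def gc_content(seq):
    """Fraction of positions in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def sample_sequence(length, target_gc, rng):
    """Draw each nucleotide as G/C with probability target_gc, otherwise as A/U."""
    return "".join(
        rng.choice("GC") if rng.random() < target_gc else rng.choice("AU")
        for _ in range(length)
    )

def point_mutate(seq, mu, rng):
    """Assumed mutation model: each base is replaced, with probability mu,
    by one of the three other bases."""
    return "".join(
        rng.choice([b for b in "ACGU" if b != base]) if rng.random() < mu else base
        for base in seq
    )

def mutate_within_gc(parent, mu, target_gc, tol, rng, max_tries=10_000):
    """Re-mutate the parent until the child's GC content lies in target_gc +/- tol."""
    for _ in range(max_tries):
        child = point_mutate(parent, mu, rng)
        # Small epsilon guards against floating-point error at the range boundary.
        if abs(gc_content(child) - target_gc) <= tol + 1e-9:
            return child
    raise RuntimeError("no valid mutant found; is the parent inside the GC range?")

# Usage: a 50-nt parent with GC content exactly 0.5.
parent = "GCAU" * 12 + "GA"
child = mutate_within_gc(parent, mu=0.05, target_gc=0.5, tol=0.1, rng=random.Random(42))
print(len(child), gc_content(child))
```

Because an unmutated copy of an in-range parent is always acceptable, the retry loop terminates quickly whenever the parent already satisfies the constraint, which matches the argument in the text.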
\IncMargin{1em}
\begin{algorithm}[H]
......
......@@ -90,7 +90,7 @@ Eventually, we also note a smaller peak of multi-loop occurences closer from the
Our \RNAmutants simulations suggest that mutational networks traversed in an energy-based manner can yield stable and diverse structures. We revealed that multi-branched structures reside in \st{specific} regions of the mutational landscape at fixed mutational distances (primarily at 30--40 mutations from random seeds) and GC contents ($0.5$ for the majority of structures). At this point, our main question is whether a natural selection process, independent of a particular target, is capable of reaching these regions.
To address this question, we built an evolutionary algorithm named \maternal, where fitness increases with folding stability (i.e., lower folding energies yield higher fitness). Intuitively, \maternal enables us to simulate the behaviour of a population of RNAs that selects the most functional (i.e., the most stable) sequences, regardless of the structures used to carry out these functions. The selected structures are therefore by-products of intrinsic adaptive forces.
To address this question, we built an evolutionary algorithm named \maternal, where fitness increases with folding stability (i.e., lower folding energies yield higher fitness). Intuitively, \maternal enables us to simulate the behaviour of a population of RNAs that selects the most functional (i.e., the most stable) sequences, regardless of the structures used to carry out these functions. These selected structures are therefore by-products of intrinsic adaptive forces.
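Energy-based selection of this kind can be sketched as roulette-wheel (fitness-proportional) sampling. The sketch below assumes fitness is the magnitude of a negative folding energy, $\max(0, -E)$; the exact fitness function of \maternal is not restated here. The \texttt{toy\_energy} function is a hypothetical placeholder only; a real simulation would call a minimum-free-energy folding routine instead (for instance ViennaRNA, whose Python bindings expose \texttt{RNA.fold}).

```python
import random

def toy_energy(seq):
    """Hypothetical stand-in for a folding-energy model: -1 kcal/mol per G or C.

    Not a thermodynamic model; used only to make the sketch runnable.
    """
    return -1.0 * sum(base in "GC" for base in seq)

def select_parents(population, energy_fn, n, rng):
    """Roulette-wheel selection with fitness = max(0, -energy).

    More negative (more stable) energies receive proportionally more offspring.
    """
    fitness = [max(0.0, -energy_fn(seq)) for seq in population]
    if sum(fitness) == 0:
        # No sequence has a negative energy: fall back to uniform (neutral) sampling.
        return rng.choices(population, k=n)
    return rng.choices(population, weights=fitness, k=n)

# Usage with a tiny illustrative population.
rng = random.Random(0)
population = ["AUAUAUAU", "GCAUAUAU", "GCGCGCGC"]
parents = select_parents(population, toy_energy, n=10, rng=rng)
print(parents)
```

The uniform fallback is needed because \texttt{random.choices} rejects weight vectors that sum to zero.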
We started all simulations from random populations of $1000$ sequences of length $50$, and performed $50$ independent simulations for each GC content and for various mutation rates. All simulations were run for $1000$ generations. {\bf Fig.~\ref{fig:energy}} shows the mean folding energy of \maternal sequences binned by their distance (i.e., number of mutations) from the initial population. We find that populations of random RNA sequences quickly find low-energy solutions (in fewer than $100$ generations and $\sim 12$ mutations from the initial populations; See {\bf Fig.~\ref{fig:mat_gens}}). Furthermore, we note that although \maternal on average selects more stable sequences than \RNAmutants in the vicinity of the seeds (i.e., the initial population), this does not hold at larger mutational distances and higher GC contents.
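The binning of folding energies by mutational distance can be illustrated with a short sketch. Here the distance is assumed to be the Hamming distance between an evolved sequence and the seed it descends from; this underestimates the true number of mutations when sites mutate more than once or revert, so counting mutations during the simulation would be the exact alternative. The record format \texttt{(seed, sequence, energy)} is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def mean_energy_by_distance(records):
    """Average folding energy per mutational distance.

    records: iterable of (seed, evolved_sequence, energy) triples, where seed is
    the initial-population sequence the evolved sequence descends from.
    Returns a dict {distance: mean energy}, ordered by distance.
    """
    bins = defaultdict(list)
    for seed, seq, energy in records:
        bins[hamming(seed, seq)].append(energy)
    return {d: mean(energies) for d, energies in sorted(bins.items())}

# Usage with made-up illustrative values (not simulation results).
records = [("AAAA", "AAAA", -1.0), ("AAAA", "AAAG", -2.0),
           ("AAAA", "AAGG", -3.0), ("AAAA", "AAAC", -4.0)]
print(mean_energy_by_distance(records))
```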
......@@ -99,7 +99,7 @@ More precisely, we observe that at short mutational distances, evolutionary appr
Interestingly, we find that varying the mutation rate does not have a strong effect on the energy of the resulting populations (See {\bf Fig.~\ref{fig:mat_muts}}). The average population energy remains within $\pm 5\:\kcalmol$ for all mutation rates except the highest ($0.1$). This apparent property of energy-based evolutionary models suggests that the search for stable structures is robust to external conditions such as varying mutation rates, and may provide a mechanism for better exploring phenotype space without sacrificing stability.
%\subsubsection{Structural diversity}
While we observe very efficient energy optimization from the natural selection process, \maternal fails to generate the structural complexity found in real populations and successfully uncovered by \RNAmutants (See {\bf Table~\ref{tab:matmut_table}}). This is likely due to the rapid depletion of diversity inherent to selection under fixed population sizes and to the strong selection for highly stable yet simple folds. However, our model does allow for some control over the degree of complexity obtained. Notably, the mutation rate has a strong impact on the mean stack, internal-loop, and multi-loop content across populations, with higher mutation rates promoting the discovery of more complex structures (See {\bf Fig.~\ref{fig:matcomplexity}}). This increased occurrence of complex motifs at high mutation rates, notably multi-loops, does not result in the fixation of these structures in the population, but rather in their occasional sampling and subsequent disappearance.
While we observe very efficient energy optimization from the natural selection process, \maternal fails to generate the structural complexity found in \hlt{databases} and successfully uncovered by \RNAmutants (See {\bf Table~\ref{tab:matmut_table}}). This is likely due to the rapid depletion of diversity inherent to selection under fixed population sizes and to the strong selection for highly stable yet simple folds. However, our model does allow for some control over the degree of complexity obtained. Notably, the mutation rate has a strong impact on the mean stack, internal-loop, and multi-loop content across populations, with higher mutation rates promoting the discovery of more complex structures (See {\bf Fig.~\ref{fig:matcomplexity}}). This increased occurrence of complex motifs at high mutation rates, notably multi-loops, does not result in the fixation of these structures in the population, but rather in their occasional sampling and subsequent disappearance.
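One way to measure stack, internal-loop, and multi-loop content is to classify the loop closed by each base pair of a pseudoknot-free dot-bracket structure. The sketch below counts loops rather than computing the exact normalized "content" statistics used in the figures, whose definitions are not restated here; bulges are counted as interior loops and the exterior loop is ignored. These conventions are assumptions of the example.

```python
def pair_table(db):
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    partner = [-1] * len(db)
    opened = []
    for i, char in enumerate(db):
        if char == "(":
            opened.append(i)
        elif char == ")":
            if not opened:
                raise ValueError(f"unmatched ')' at position {i}")
            j = opened.pop()
            partner[i], partner[j] = j, i
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if opened:
        raise ValueError("unmatched '('")
    return partner

def loop_counts(db):
    """Classify the loop closed by each base pair as stack, hairpin, interior, or multiloop."""
    partner = pair_table(db)
    counts = {"stack": 0, "hairpin": 0, "interior": 0, "multiloop": 0}
    for i, j in enumerate(partner):
        if j <= i:  # skip unpaired bases and closing brackets
            continue
        # Collect the base pairs directly enclosed by (i, j).
        branches = []
        k = i + 1
        while k < j:
            if partner[k] > k:
                branches.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not branches:
            counts["hairpin"] += 1
        elif len(branches) == 1:
            inner = branches[0]
            counts["stack" if inner == (i + 1, j - 1) else "interior"] += 1
        else:
            counts["multiloop"] += 1
    return counts

# Usage: a three-way junction closed by a two-pair helix.
print(loop_counts("((..((...))..((...))..))"))
```

A loop closed by a pair that directly encloses two or more helices is counted as a multi-loop, which is the multi-branched signature discussed throughout the text.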
\subsection{An undirected evolutionary scenario}
\label{sec:undirected_evolution}
......