Commit c5992274 authored by Vladimir Reinharz

updated tex

parent aa4fd49d
This source diff could not be displayed because it is too large. You can view the blob instead.
TeX/NAR/Figure5.png: 215 KB before, 112 KB after
TeX/NAR/Figure5b.png: 206 KB before, 108 KB after
TeX/NAR/Figure7.png: 302 KB before, 166 KB after
......@@ -10,7 +10,9 @@
\pubyear{XXXXX}
\jvolume{XXXXX}
\jissue{XXXXX}
\makeatletter
\newcommand{\removelatexerror}{\let\@latex@error\@gobble}
\makeatother
\usepackage{graphicx}
%\usepackage{caption}
\usepackage{amssymb}
......@@ -81,7 +83,7 @@
\usepackage[noend,ruled,vlined]{algorithm2e}
%\usepackage{tikz}
%\usepackage{todonotes}
\usepackage{todonotes}
%\newtheorem{theorem}{Theorem}[section]
%\newtheorem{lemma}[theorem]{Lemma}
......@@ -274,15 +276,15 @@ In RNA, a large proportion of observed covariations are adequately explained by
Since this structure is, to a large extent, already revealed by comparative analysis (and already present in the \rfam{} profile taken as input to the method), it does not constitute the primary object of interest of our study. In order to minimize the probability of detecting a local structural compensation, we require a minimal distance $\gamma$ between the mutation and the loops identified for their good NPMI values. This definition is formalized in Algo.~\ref{algo:pos}.
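
The distance filter described above can be sketched as follows. This is only an illustrative sketch and not the implementation of Algo.~\ref{algo:pos}: it assumes that the secondary structure is given as a pseudoknot-free dot-bracket string, that the structure graph has one node per nucleotide with backbone and base-pair edges, and that loop positions are simply the unpaired ones. The names \texttt{pair\_table}, \texttt{graph\_distances} and \texttt{far\_unpaired\_positions} are hypothetical.

```python
from collections import deque


def pair_table(structure):
    """Map each paired index to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def graph_distances(structure, source):
    """Breadth-first distances from `source` over backbone and base-pair edges."""
    pairs = pair_table(structure)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # Backbone neighbours, restricted to valid indices.
        neighbours = [v for v in (u - 1, u + 1) if 0 <= v < len(structure)]
        # Base-pair partner, if the position is paired.
        if u in pairs:
            neighbours.append(pairs[u])
        for v in neighbours:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def far_unpaired_positions(structure, mutation, gamma):
    """Unpaired positions at graph distance >= gamma from the mutated position."""
    dist = graph_distances(structure, mutation)
    return [i for i, ch in enumerate(structure)
            if ch == "." and dist[i] >= gamma]
```

With the toy structure \texttt{((..))..}, a mutation at index 0 and $\gamma = 3$, only the unpaired positions that are three or more edges away from the mutation are kept.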
\subsubsection{Interchain Positions}
\subsubsection{\todo{not really interchain anymore}Interchain Positions}
Since both negative and positive correlations can indicate
positions of interest, we use two different thresholds, $\zeta^-$ and $\zeta^+$, for the \NPMI{}s: $\zeta^+$ bounds the positive values of the \NPMI and $\zeta^-$ the negative ones. Due to the
high number of possible combinations, \NPMI{}s with value $-1$ are frequent and uninformative; they are discarded.
For those loops deemed as regions of interest, we predict that the positions having an \NPMI above $\zeta^+$ (resp. below $\zeta^-$)
have interchain interactions while the others do not.
\begin{algorithm}[t]
have \todo{idem}interchain interactions while the others do not.
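
As a rough illustration of the two-threshold filter, the sketch below derives $\zeta^-$ and $\zeta^+$ from a single percentile cutoff $\zeta_c$ and flags the positions outside them. It is not the code of Algo.~\ref{algo:npmi}. The rule for $\zeta^-$ follows the update shown for that algorithm, while the symmetric rule for $\zeta^+$, the linear-interpolation percentile, and the names \texttt{percentile}, \texttt{npmi\_thresholds} and \texttt{flag\_positions} are assumptions made for this example.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def npmi_thresholds(npmi, zeta_c):
    """Return (zeta_minus, zeta_plus) for a list of NPMI values."""
    positives = [a for a in npmi if a > 0]
    # Values equal to -1 are discarded as uninformative.
    negatives = [a for a in npmi if -1 < a < 0]
    zeta_plus = percentile(positives, zeta_c)
    # Percentile taken on magnitudes, then mapped back to a negative bound.
    zeta_minus = -1 * percentile([-1 * a for a in negatives], zeta_c)
    return zeta_minus, zeta_plus


def flag_positions(npmi, zeta_c):
    """Indices whose NPMI is above zeta_plus or below zeta_minus (and not -1)."""
    zeta_minus, zeta_plus = npmi_thresholds(npmi, zeta_c)
    return [i for i, a in enumerate(npmi)
            if a > -1 and (a > zeta_plus or a < zeta_minus)]
```

Taking the percentile on the magnitudes of the negative values makes the same cutoff $\zeta_c$ select the most extreme values on both sides.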
{\removelatexerror
\begin{algorithm}
\DontPrintSemicolon
\SetAlgoLined
\SetKwFunction{shapeDisruption}{shapeDisruption}
......@@ -300,9 +302,10 @@ $l = D = \varnothing$\;
\Return{D}
\caption{disruptiveMutations$\left(MaM, \delta\right)$}
\label{algo:mut}
\end{algorithm}
\end{algorithm}}
\begin{algorithm}[t]
{\removelatexerror
\begin{algorithm}[H]
\DontPrintSemicolon
\SetAlgoLined
\SetKwFunction{getAllNPMIs}{getAllNPMIs}
......@@ -326,9 +329,10 @@ $\zeta^-\leftarrow -1\times\percentile(-1\times a[-1<a<0], \zeta_c)$\;
\Return{q}
\caption{filterNPMI$\left(m, p, MSA, \zeta_c\right)$}
\label{algo:npmi}
\end{algorithm}
\end{algorithm}}
\begin{algorithm}[t]
{\removelatexerror
\begin{algorithm}[H]
\DontPrintSemicolon
\SetAlgoLined
\SetKwFunction{maxShortestPath}{distance}
......@@ -353,10 +357,10 @@ $g\leftarrow\SGraph(S)$\;
\Return{p}
\caption{filterNearbyPositions$\left(m, S, \gamma\right)$}
\label{algo:pos}
\end{algorithm}
\end{algorithm}}
\begin{algorithm}[t]
{\removelatexerror
\begin{algorithm}[H]
\DontPrintSemicolon
\SetAlgoLined
\SetKwFunction{filterMutations}{disruptiveMutations}
......@@ -371,7 +375,7 @@ $M_{\text{disruption}}\leftarrow \filterMutations(MaM, \delta)$\;
\Return{p}
\caption{aRNhAck$\left(S, MaM, MSA, \delta, \gamma, \zeta_c\right)$}
\label{algo:arnhack}
\end{algorithm}
\end{algorithm}}
%\caption{The \soft algorithm}
......@@ -388,16 +392,15 @@ The whole implementation is freely available at:
\subsection{Dataset}
To evaluate the efficiency of our method, data with the desired properties was available for {\color{red}six} RNAs~\cite{Cordero:2012aa}. The required properties are a mutate-and-map experiment data set, a determined three-dimensional structure interacting with other chain(s), and an \rfam alignment. Those {\color{red}six} RNAs are the 5S ribosomal RNA, {\color{red} the c-di-GMP riboswitch, the cobalamin riboswitch (Puzzle 6), the adenine riboswitch, the phenylalanine tRNA and the glycine riboswitch.}
To evaluate the efficiency of our method, data with the desired properties was available for {\color{red}six} RNAs~\cite{Cordero:2012aa}. The required properties are a mutate-and-map experiment data set, a determined three-dimensional structure interacting with other chain(s), and an \rfam alignment. Those {\color{red}six} RNAs are the 5S ribosomal RNA, {\color{red} the c-di-GMP riboswitch, the cobalamin riboswitch (Puzzle 6), the phenylalanine tRNA, the adenine riboswitch and the glycine riboswitch.}
{\color{red} REF Table SUP MAT FOR ALL BINDING SITES}
\todo{ REF Table SUP MAT FOR ALL BINDING SITES}
{\color{red}The latter gave poor results, potentially due to an artificial hairpin introduced in the sequence, which binds a small protein to help the crystallization. The protein was missing in the MaM experiments. It was thus omitted in the present analysis. The results are shown in supplementary material (Fig.~S7).}
\begin{table}[t]
\begin{table}[t!]
\centering
\colorbox{red}{
{\color{red}
\begin{tabular}{lc}
\multicolumn{1}{l|}{RNA} & Bit Score \\\hline
5S & \phantom{0}48.65 \\
......@@ -412,24 +415,29 @@ To evaluate the efficiency of our method, data with the desired properties was a
\label{table:ali_entropy}
\end{table}
The 5S ribosomal RNA is the family \texttt{RF00001} on \rfam. Its seed alignment consists of $713$ sequences. The family also provides the consensus structure. The mutate-and-map protocol was applied to the consensus sequence of $4$ structures with PDB identifiers \texttt{2WWQ}~\cite{2WWQ}, \texttt{3OAS} and \texttt{3OFC}~\cite{3OAS_3OFC}, and \texttt{3ORB}~\cite{3ORB}. We present in Fig.~\ref{fig:shape}a, for every position $i$, the value of $\Delta(S, S_i)$, with the aligned \rfam consensus secondary structure below. Those four determined structures have almost the same sequence, with slight differences in the length of their $5'$ and $3'$ extremities.
{\color{red}
The phenylalanine tRNA RF00005 we have binding to MG and MN
}
The c-di-GMP riboswitch is present in family \texttt{RF01051} in \rfam, which contains $156$ sequences in its seed alignment, and a consensus structure. The consensus sequence was also built from $4$ structures, with PDB identifiers \texttt{3IWN}~\cite{3IWN} and \texttt{3MXH}, \texttt{3MUV}, \texttt{3MUT}~\cite{3MXH_3MUV_3MUT}. We similarly present in Fig.~\ref{fig:shape}b the values of $\Delta(S, S_i)$, with the aligned \rfam consensus secondary structure below.
Importantly, c-di-GMP is known to bind a pocket inside the 3-way junction at positions 11, 12, 13, 40, 41 and 85 of the sequence on which the mutate-and-map experiments were run~\cite{smith2009structural, kulshina2009recognition}. It is also worth noting that, in order to facilitate the crystallization, the hairpin loop L2 of this molecule has been artificially designed to bind the U1A protein. Here, we included these positions in the set of interacting positions. Nonetheless, for completeness, we also show in the supplementary material the results obtained without these positions, hence with the c-di-GMP binding interface only.
The 5S ribosomal RNA is the family \texttt{RF00001} on \rfam. Its seed alignment consists of $713$ sequences. The family also provides the consensus structure. The mutate-and-map protocol was applied to the consensus sequence of $4$ structures with PDB identifiers \texttt{2WWQ}~\cite{2WWQ}, \texttt{3OAS} and \texttt{3OFC}~\cite{3OAS_3OFC}, and \texttt{3ORB}~\cite{3ORB}. Those four determined structures have almost the same sequence, with slight differences in the length of their $5'$ and $3'$ extremities.
{\color{red}
The cobalamin riboswitch RF00174
}
The c-di-GMP riboswitch is present in family \texttt{RF01051} in \rfam, which contains $156$ sequences in its seed alignment, and a consensus structure. The consensus sequence was also built from $4$ structures, with PDB identifiers \texttt{3IWN}~\cite{3IWN} and \texttt{3MXH}, \texttt{3MUV}, \texttt{3MUT}~\cite{3MXH_3MUV_3MUT}.
Importantly, c-di-GMP is known to bind a pocket inside the 3-way junction at positions 11, 12, 13, 40, 41 and 85 of the sequence on which the mutate-and-map experiments were run~\cite{smith2009structural, kulshina2009recognition}, and the MaM experiment was done in the presence of its ligand. It is also worth noting that, in order to facilitate the crystallization, the hairpin loop L2 of this molecule has been artificially designed to bind the U1A protein. {\color{red}Here, we included only the positions binding to its ligand. Nonetheless, for completeness, we also show in the supplementary material the results obtained with these positions, hence with the c-di-GMP binding interface only.}
{\color{red}
The adenine riboswitch RF00167
The cobalamin riboswitch is in family \texttt{RF00174}, which has $430$ seed sequences. The structure bound to its ligand is known (PDB identifier \texttt{4GXY}). The MaM experiments
were done in the presence of cobalamin ligands.
The yeast phenylalanine tRNA is included in family \texttt{RF00005}, which has $960$ seed sequences. Since its known crystal structure (PDB identifier \texttt{1EHZ}) is only available
in the presence of magnesium and manganese, and there is a lot of variability between species and tRNAs for their binding location, the anticodon site was designated as the binding interface.
The adenine riboswitch belongs to family \texttt{RF00167}, which has $133$ seed sequences. The structure with the adenine ligand has PDB identifier \texttt{1Y26}. Three different MaM experiments were done on that particular molecule. Experiments \texttt{Adenine\char`_2} and \texttt{Adenine\char`_3} were done in the presence of the ligand, and are shown in this paper.
Experiment \texttt{Adenine\char`_4}, which was done in the absence of the ligand, gave poor results, since the MaM experiment was performed on the unbound structure. Those additional results are shown in the Supp. Mat.
For each molecule, we present in Fig.~\ref{fig:shape}, for every position $i$, the disruption of the shape profile when a mutation occurs at that position (i.e. $\Delta(S, S_i)$), with the aligned \rfam consensus secondary structure below.
}
\begin{figure*}[ht!]
\centering
\colorbox{red}{
......@@ -448,6 +456,7 @@ We omitted the shortest sequences (i.e. \rfam families RF00032, RF00037 and RF00
\subsection{Experimental design}
The \texttt{Infernal 1.1}~\cite{nawrocki2009infernal} software was used with default parameter values to: 1) create a covariance model for each alignment, and 2) align the sequence from the mutate-and-map experiment with the generated covariance model. The consensus secondary structure was then restricted to gapless positions within the aligned sequence $\Seq$.
\todo{explain each case!!!}
For each mutation over the \shape profile percentile cutoff $\delta$, the data set was composed of the regions of interest given $\gamma$, i.e. the set of positions returned by Algo.~\ref{algo:pos}. For each PDB model, the positive data set is composed of the positions in those regions which have the center of any of their atoms at most $5$\AA\xspace from the center of any atom of another chain in the complex. An implementation using the PyMOL Python API is included in the provided code.
The remaining positions compose the negative dataset. The positions not present in the model were ignored. This highlights one of the challenges of benchmarking. For the 5S rRNA, out of 121 positions, two models had 3 nucleotides missing, one had 4 missing and the other 6. For c-di-GMP, out of 103 positions, one model had 8 nucleotides missing, two others 21 and the last 22.
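
The construction of the positive and negative sets can be illustrated with the sketch below. It is not the PyMOL-based implementation shipped with the code: it assumes that atom centers have already been extracted as plain coordinate tuples, and the function names \texttt{interface\_positions} and \texttt{split\_dataset}, as well as the input layout, are hypothetical.

```python
from math import dist


def interface_positions(atoms, rna_chain, cutoff=5.0):
    """Residues of `rna_chain` with any atom within `cutoff` angstroms of another chain.

    atoms: iterable of (chain_id, residue_number, (x, y, z)) tuples.
    """
    rna = [(res, xyz) for chain, res, xyz in atoms if chain == rna_chain]
    others = [xyz for chain, _, xyz in atoms if chain != rna_chain]
    return sorted({res for res, xyz in rna
                   if any(dist(xyz, other) <= cutoff for other in others)})


def split_dataset(region, positives, modelled):
    """Split a region of interest into positive and negative positions.

    Positions absent from the PDB model are ignored.
    """
    present = [p for p in region if p in modelled]
    pos = [p for p in present if p in positives]
    neg = [p for p in present if p not in positives]
    return pos, neg
```

The all-pairs distance check is quadratic in the number of atoms, which is acceptable for a sketch but would normally be replaced by a spatial index on full complexes.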
......@@ -464,15 +473,15 @@ It is important to recall that the set of positives and negatives is influenced
\begin{figure*}[ht!]
\centering
\colorbox{red}{
\begin{subfigure}{1\textwidth}
\colorbox{red}{% \includegraphics[width=0.96\textwidth]{Figure5.png}
\begin{subfigure}{\textwidth}
\includegraphics[width=0.96\textwidth]{Figure5.png}
\caption{MaM}
\label{fig:auc}
\end{subfigure}
}
\colorbox{red}{
\begin{subfigure}{1\textwidth}
\begin{subfigure}{\textwidth}
\includegraphics[width=0.96\textwidth]{Figure5b.png}
\caption{\remu}
\label{fig:auc_remu}
......@@ -499,7 +508,7 @@ We illustrate in Fig.~\ref{fig:roc}a the results for the 5S RNA given $\delta$
In Fig.~\ref{fig:roc}b we show the result for the c-di-GMP riboswitch given a \shape distance threshold at the $93^{\text{rd}}$
percentile and a $\gamma$ of $6$.
\begin{figure}[ht!]
\begin{figure}[t!]
\centering
\includegraphics[width=0.47\textwidth]{Figure6.png}
\caption{{\bf Detailed discrimination power analysis of \soft.} ROC curves for 5S rRNA (a.) with $\gamma$ at $23$ and $\delta$ at $96\%$, and c-di-GMP (b.) with $\delta$ at $93\%$ and $\gamma$ at 6.}
......@@ -523,14 +532,14 @@ structural differences. We present in Fig.~\ref{fig:dist} the distribution of pa
\begin{figure*}[t!]
\begin{figure}[htb!]
\centering
\colorbox{red}{
\includegraphics[width=0.96\textwidth]{Figure7.png}
\includegraphics[width=0.47\textwidth]{Figure7.png}
}
\caption{{\bf Distance distribution for pairs of secondary structure elements}, weighted by the numbers of non-shared nucleotides}
\label{fig:dist}%
\end{figure*}
\end{figure}
......@@ -541,7 +550,7 @@ similar to compensatory mutations, but at the level of the quaternary structure.
In this vision, a set of mutations contributes to reestablish the opportunity for participating in complexes,
by compensating the effect of a disruptive mutation.
\begin{figure}[t]
\begin{figure}[ht!]
\centering
\includegraphics[width=0.47\textwidth]{Figure8.png}
\caption{{\bf Predicted positions and interacting chains of the 5S rRNA \texttt{3OFC} structure.} In red on the top right behind purple spheres is the disrupting mutation, in green the predicted position with high mutual information. The spheres around the RNA represent the subset of nucleotides, from other chains in the complex, at most $5$\AA\xspace from the rRNA. The other spheres belong to other molecules. Each sphere is color-coded to indicate its chain as follows. Chain A is black, Z purple, W pink, V light blue, O beige, F yellow and M orange.}
......@@ -578,7 +587,7 @@ not provided in the PDB structures are considered as negative.
The poorest results are achieved in family RF01118. Interestingly, one of the conserved features of this family's structure is the presence of a pseudoknot, which is not modeled in the thermodynamic model underlying \remu. For those particular cases, only chemical experiments such as MaM can provide trustworthy information about the destabilization produced by single point mutations. This reinforces the importance of producing further experimental data to reach the best performance.
\begin{figure*}[ht!]
\begin{figure*}[t!]
\centering
\includegraphics[width=0.96\textwidth]{Figure9.png}
\caption{ {\bf Performance of \soft for \remu-predicted disruptions.} For each \rfam family, we consider all PDBs having fewer than 150 nucleotides, and having maximal matching score to the family. For a set of extreme percentile cutoffs of the \shape profile disruption in the first column (computational \remu disruption in the second
......
......@@ -225,14 +225,16 @@ The results for c-di-GMP having as positive positions the ones interacting in th
\end{figure}
\section{Dataset binding positions}
\begin{table}[ht!]
\rotatebox{90}{
\begin{tabular}{lllll}
RNA & Binding to & RFAM & PDB(s) &Binding Positions on PDB \\\hline\hline
5S & Prots. & RF00001 & 2WWQ & $7-13,27-33,38,41-57,59-60,70,73-84,88-104,112-116$\\
& & & 3OAS & $6-12,26-33,37-38,41-52,54-57,59,70,73-84,88-104,112-116$\\
& & & 3OFC & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-117$\\
& & & 3ORB & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-116$\\\hline
tRNA & Prots. and DNA & RF00005 &1EHZ & $1, 19, 34-36, 56-57, 73-76$\\\hline
tRNA & anticodon & RF00005 &1EHZ & $34-36$\\\hline
c-di-GMP ribo. & c-di-GMP & RF01051 & 3IWN & $8-10,28,38,53-64,66-72,82$\\
& & & 3MUT & $18-20,38,48,61-64,75,92$\\
& & & 3MUV & $18-20,34,38,48,60-64,75,92$\\
......@@ -241,7 +243,8 @@ The results for c-di-GMP having as positive positions the ones interacting in th
adenine ribo. & adenine & RF00167 & 1Y26 & $21-22,47,50-52,73-75$\\\hline
glycine ribo. & glycine & RF00504 & 3P49 & $35-39, 46, 48-42, 110-114,137, 139-143$\\
\end{tabular}
\caption{{\color{red}For each RNA some info}}
}
\caption{For each RNA some info}
\label{table:datasetinfo}
\end{table}
......