Commit c5c53b6c authored by Vladimir Reinharz

updated sup mat

parent 655bd888
TeX/NAR/Figure6.png

400 KB | W: | H:

TeX/NAR/Figure6.png

166 KB | W: | H:

TeX/NAR/Figure7.png

166 KB | W: | H:

TeX/NAR/Figure7.png

688 KB | W: | H:

TeX/NAR/Figure8.png

688 KB | W: | H:

TeX/NAR/Figure8.png

431 KB | W: | H:

TeX/NAR/FigureS7.png

167 KB | W: | H:

TeX/NAR/FigureS7.png

744 KB | W: | H:

......@@ -772,3 +772,14 @@
year={2007},
publisher={Public Library of Science}
}
@article{schwarz1976codon,
title={Codon-dependent rearrangement of the three-dimensional structure of phenylalanine {tRNA}, exposing the T-$\psi$-CG sequence for binding to the 50S ribosomal subunit},
author={Schwarz, Ulrich and Menzel, Heinrich M and Gassen, Hans G},
journal={Biochemistry},
volume={15},
number={11},
pages={2484--2490},
year={1976},
publisher={ACS Publications}
}
......@@ -277,13 +277,13 @@ In RNA, a large proportion of observed covariations are adequately explained by
Since this structure is, to a large extent, already revealed by comparative analysis (and already present in the \rfam{} profile taken as input to the method), it does not constitute the primary object of interest of our study. In order to minimize the probability of detecting a local structural compensation, we require a minimal distance $\gamma$ between the mutation and the loops identified for their good NPMI values. This definition is formalized in Algo.~\ref{algo:pos}.
\subsubsection{\todo{not really interchain anymore}Interchain Positions}
\subsubsection{{\color{red}Binding interfaces positions}}
Since both negative and positive correlations can indicate
positions of interest, we use two different thresholds, $\zeta^-$ and $\zeta^+$, for the \NPMI{}s: $\zeta^+$ is a bound on the positive values of the \NPMI{} and $\zeta^-$ on the negative ones. Due to the
high number of possible combinations, \NPMI{}s with value $-1$ are frequent and uninformative; they are discarded.
For those loops deemed as regions of interest, we predict that the positions having an \NPMI above $\zeta^+$ (resp. below $\zeta^-$)
have an \todo{idem}interchain interactions while the others do not.
{\color{red} are binding interfaces} while the others do not.
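For reference, the frequency of the value $-1$ can be read off the standard definition of the normalized pointwise mutual information. This is a sketch under that assumption: the events $x$, $y$ and the probability estimate $p$ are notation introduced here for illustration, and the estimator actually used by the method may differ in detail.
$$
\mathrm{NPMI}(x;y)=\frac{\log\dfrac{p(x,y)}{p(x)\,p(y)}}{-\log p(x,y)}\in[-1,1].
$$
Under this definition the value is $1$ when $x$ and $y$ always co-occur, $0$ when they are independent, and tends to $-1$ as $p(x,y)\to 0$, that is, when the two events are never observed together, which is the case for most of the possible combinations.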
{\removelatexerror
\begin{algorithm}
\DontPrintSemicolon
......@@ -395,7 +395,6 @@ The whole implementation is freely available at:
To evaluate the efficiency of our method, data with the desired properties were available for {\color{red}six} RNAs~\cite{Cordero:2012aa}. The required properties are a mutate-and-map experiment data set, a determined three-dimensional structure interacting with other chain(s), and an \rfam{} alignment. Those {\color{red}six} RNAs are the 5S ribosomal RNA, {\color{red} the c-di-GMP riboswitch, the cobalamin riboswitch (Puzzle 6), the phenylalanine tRNA, the adenine riboswitch and the glycine riboswitch.}
\todo{ REF Table SUP MAT FOR ALL BINDING SITES}
{\color{red}The latter gave poor results, potentially due to an artificial hairpin, introduced in the sequence, that binds a small protein to help crystallization. The protein was missing in the MaM experiments. The glycine riboswitch was thus omitted from the present analysis. The results are shown in supplementary material (Fig.~S7).}
......@@ -420,6 +419,11 @@ To evaluate the efficiency of our method, data with the desired properties was a
The 5S ribosomal RNA is the family \texttt{RF00001} on \rfam. Its seed alignment consists of $713$ sequences. The family also provides the consensus structure. The mutate-and-map protocol was applied to the consensus sequence of $4$ structures which have as PDB identifiers \texttt{2WWQ}~\cite{2WWQ}, \texttt{3OAS} and \texttt{3OFC}~\cite{3OAS_3OFC}, and \texttt{3ORB}~\cite{3ORB}. Those four determined structures have almost the same sequence, with slight differences in length at their $5'$ and $3'$ extremities.
{\color{red}
The yeast phenylalanine tRNA is included in family \texttt{RF00005}, which has $960$ seed sequences. Since its known crystal structure (PDB identifier \texttt{1EHZ}) is only
in the presence of magnesium and manganese, we also considered the two tRNAs in the structure of the yeast 80S ribosome-tRNA complexes (PDB identifier \texttt{3J78}).
}
The c-di-GMP riboswitch is present in family \texttt{RF01051} in \rfam, which contains $156$ sequences in its seed alignment, and a consensus structure. The consensus sequence was also built from $4$ structures, with PDB identifiers \texttt{3IWN}~\cite{3IWN} and \texttt{3MXH}, \texttt{3MUV}, \texttt{3MUT}~\cite{3MXH_3MUV_3MUT}.
Importantly, c-di-GMP is known to bind a pocket inside the 3-way junction at positions $11\mhyphen 13$, $40\mhyphen 41$ and 85 of the sequence on which the mutate-and-map experiments were run~\cite{smith2009structural, kulshina2009recognition}, and the MaM experiment was done in presence of its ligand. It is also worth noting that, in order to facilitate the crystallization, the hairpin loop L2 of this molecule has been artificially designed to bind the U1A protein. {\color{red}Here, we included only the positions binding to its ligand. Nonetheless, for completeness, we also show in the supplementary material the results obtained with these positions, hence with the c-di-gmp binding interface only.}
......@@ -427,12 +431,6 @@ Importantly, c-di-GMP is known to bind a pocket inside the 3-way junction at pos
The cobalamin riboswitch is in family \texttt{RF00174}, which has $430$ seed sequences. The structure bound to its ligand is known (PDB identifier \texttt{4GXY}). The MaM experiments
were done in the presence of cobalamin ligands.
The yeast phenylalanine tRNA is included in family \texttt{RF00005}, which has $960$ seed sequences. Since its known crystal structure (PDB identifier \texttt{1EHZ}) is only
in the presence of magnesium and manganese, and there is a lot of variability between species and tRNAs for their binding location, the anticodon site was designated as the binding interface.
Additionally, we located, in the two tRNAs in the structure of the yeast 80S ribosome-tRNA complexes (PDB identifier \texttt{3J78}), the positions at most $5$\AA\xspace from another chain.
We used \texttt{LocaRNA}\cite{will2007inferring} to align those tRNAs to our yeast phenylalanine tRNA sequence, resulting in the positions 1, 19, $34\mhyphen 36$, $56\mhyphen 57$, $73 \mhyphen76$ (containing the anticodon) as a positive set.
The results for those positions are shown in the Supp. Mat., where we notice extremely good results only at positions $56\mhyphen57$. They are known to follow the extremely conserved pseudouridine at position $55$~\cite{becker1997yeast}.
The adenine riboswitch belongs to family \texttt{RF00167}, which has $133$ seed sequences. The structure with the adenine ligand has PDB identifier \texttt{1Y26}. Three different MaM experiments were done on that particular molecule. Experiments \texttt{Adenine\char`_2} and \texttt{Adenine\char`_3} were done in the presence of the ligand, and are shown in this paper.
Experiment \texttt{Adenine\char`_4}, which was done in the absence of the ligand, gave poor results, since the MaM experiment was done on the unbound structure. Those additional results are shown in the Supp. Mat.
......@@ -471,9 +469,15 @@ For each mutation over the \shape profile percentile cutoff $\delta$, the data s
For the 5S RNA, for each PDB model, the positive data set is composed of the positions in those regions which have the center of any of their atoms at most $5$\AA\xspace from the center of any atom of another chain in the complex. An implementation using the PyMOL Python API is included in the provided code.
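The distance criterion above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the PyMOL-based implementation shipped with the code: the function name \texttt{positions\_near\_other\_chains} and the \texttt{(chain, position, xyz)} atom tuples are hypothetical and chosen only for the example.

```python
import math


def positions_near_other_chains(atoms, chain_id, cutoff=5.0):
    """Return the sorted positions of `chain_id` that have at least one atom
    whose center lies within `cutoff` angstroms of an atom of another chain.

    `atoms` is a list of (chain, position, (x, y, z)) tuples; this layout is
    an assumption made for the sketch, not the format used by the authors.
    """
    own = [(pos, xyz) for chain, pos, xyz in atoms if chain == chain_id]
    others = [xyz for chain, _, xyz in atoms if chain != chain_id]

    positive = set()
    for pos, xyz in own:
        if pos in positive:
            continue  # this position already qualified through another atom
        if any(math.dist(xyz, other) <= cutoff for other in others):
            positive.add(pos)
    return sorted(positive)
```

The positions of the chain that are not returned (and that are present in the model) would then form the negative set. The quadratic scan over atom pairs is kept for readability; a spatial index would be preferable on a full ribosome.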
For the tRNA, we only considered the three positions of the anticodon, identified using \texttt{tRNAscan-SE}~\cite{schattner2005trnascan}. Binding sites are quite different from a tRNA to the other and since there is no crystal structure of the tRNA binding with other proteins we didn't consider other possibilities. \todo{more explanations about tRNA?}
For the tRNA, we located positions at most $5$\AA\xspace from another chain in the two tRNAs inside
the structure of the yeast 80S ribosome-tRNA complexes (PDB identifier \texttt{3J78}).
Because those were not the phenylalanine tRNA, we used \texttt{LocARNA}~\cite{will2007inferring} to align them to our sequence, resulting in the positions 1, 19, $34\mhyphen 36$, $56\mhyphen 57$, $73 \mhyphen76$ (containing the anticodon) as the positive set. Given the poor results (shown in Supp. Mat.), we located two sets of positions known as binding interfaces:
the anticodon, identified using \texttt{tRNAscan-SE}~\cite{schattner2005trnascan}, and the T-$\psi$-C-G motif,
which is known to bind the 5S RNA in the 50S ribosomal subunit~\cite{schwarz1976codon}.
For the riboswitches, from their respective crystal structures we used \texttt{Ligand Explorer}~\cite{moreland2005molecular} to identify nucleotides at most $5$\AA\xspace from the ligand.
For the riboswitches, from their respective crystal structures we used \texttt{Ligand Explorer}~\cite{moreland2005molecular} to identify nucleotides at most $5$\AA\xspace from the ligand.}
The set of all positions is found in the Supp. Mat. Table 1.}
The remaining positions compose the negative dataset. The positions not present in the model were ignored. This highlights one of the challenges of benchmarking. For the 5S rRNA, out of 121 positions, two models had 3 nucleotides missing, one had 4 missing and the other 6. {\color{red} For c-di-GMP, out of 103 positions, one model had 8 nucleotides missing, two others 21 and the last 22, which explains some of the discrepancies between the models.}
......@@ -523,13 +527,13 @@ The ROC curves for the 4 PDB structures are shown on the same graph, for a speci
We illustrate in Fig.~\ref{fig:roc}a the results for the 5S RNA given $\delta$ at the $96^{\text{th}}$ percentile and a $\gamma$ of $23$.
In Fig.~\ref{fig:roc}b we show the result for the c-di-GMP riboswitch given a \shape distance threshold at the $93^{\text{th}}$
percentile and a $\gamma$ of $6$.
\begin{figure}[t!]
\centering
\includegraphics[width=0.47\textwidth]{Figure6.png}
\caption{{\bf Detailed discrimination power analysis of \soft.} ROC curves for 5S rRNA (a.) with $\gamma$ at $23$ and $\delta$ at $96\%$, and c-di-GMP (b.) with $\delta$ at $93\%$ and $\gamma$ at 6.}
\label{fig:roc}
\end{figure}
%
%\begin{figure}[t!]
% \centering
% \includegraphics[width=0.47\textwidth]{Figure6.png}
% \caption{{\bf Detailed discrimination power analysis of \soft.} ROC curves for 5S rRNA (a.) with $\gamma$ at $23$ and $\delta$ at $96\%$, and c-di-GMP (b.) with $\delta$ at $93\%$ and $\gamma$ at 6.}
% \label{fig:roc}
%\end{figure}
In the case of the 5S RNA,
we hypothesize that two main reasons explain the discrepancy between the different PDB models. A few nucleotides are missing from each,
......@@ -551,7 +555,7 @@ structural differences. We present in Fig.~\ref{fig:dist} the distribution of pa
\begin{figure}[htb!]
\centering
\colorbox{red}{
\includegraphics[width=0.47\textwidth]{Figure7.png}
\includegraphics[width=0.47\textwidth]{Figure6.png}
}
\caption{{\bf Distance distribution for pairs of secondary structure elements}, weighted by the numbers of non-shared nucleotides}
\label{fig:dist}%
......@@ -568,7 +572,7 @@ by compensating the effect of a disruptive mutation.
\begin{figure}[ht!]
\centering
\includegraphics[width=0.47\textwidth]{Figure8.png}
\includegraphics[width=0.47\textwidth]{Figure7.png}
\caption{{\bf Predicted positions and interacting chains of the 5S rRNA \texttt{3OFC} structure.} In red on the top right behind purple spheres is the disrupting mutation, in green the predicted position with high mutual information. The spheres around the RNA represent the subset of nucleotides at most at $5$\AA\xspace from the rRNA, from other chains in the complex. The other spheres belong to other molecules. Each sphere is color-coded to indicate its chain as follows. Chain A is black, Z purple, W pink, V light blue, O beige, F yellow and M orange.}
\label{fig:5s_3d}
\end{figure}
......@@ -585,12 +589,6 @@ The Boltzmann conformational ensemble $\mathds B_w$ of a sequence $w$ is the pro
\sum_{S\in \mathcal S}\mathbb P(S\mid\mathds B_{wt})\log\left(\frac{\mathbb P(S\mid\mathds B_{wt})}{\mathbb P(S\mid\mathds B_m)}\right).
$$
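As a toy illustration of the relative entropy above, the following Python sketch evaluates the sum over an explicitly enumerated set of structures. It is only a sketch: real Boltzmann ensembles are exponentially large, so explicit enumeration as done here is illustrative only, and the function name \texttt{relative\_entropy} and the dictionary representation of the two ensembles are assumptions made for the example.

```python
import math


def relative_entropy(p_wt, p_mut):
    """Sum over structures S of P(S | B_wt) * log(P(S | B_wt) / P(S | B_m)).

    `p_wt` and `p_mut` map each structure (e.g. a dot-bracket string) to its
    probability in the wild-type and mutant ensembles; this dictionary layout
    is a hypothetical choice for the sketch.
    """
    total = 0.0
    for structure, p in p_wt.items():
        if p == 0.0:
            continue  # terms with zero wild-type probability contribute nothing
        q = p_mut.get(structure, 0.0)
        if q == 0.0:
            return math.inf  # structure impossible in the mutant ensemble
        total += p * math.log(p / q)
    return total
```

Identical ensembles give a relative entropy of zero, and the value grows as the mutation shifts probability mass away from the structures favored by the wild type.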
%\begin{figure}[ht!]
% \centering
% \includegraphics[width=0.47\textwidth]{Figure9.png}
% \caption{AUC results where $\delta$ is the cutoff for the relative entropy, averaged over the 4 RNA structures of 5S in (a) and those of c-di-GMP in (b)}
% \label{fig:auc_remu}
%\end{figure}
We report our results in Fig.~\ref{fig:auc_remu}. Here again, our data unveil a signal that shows a correlation between the mutation identified with \soft and the RNA-binding interfaces. Nonetheless, the strength of the signal extracted with \remu is of lower magnitude than the one achieved with the \shape experiments and the mutate-and-map protocol.
......@@ -605,7 +603,7 @@ The poorest results are achieved in family RF01118. Interestingly, one of the co
\begin{figure*}[t!]
\centering
\includegraphics[width=0.96\textwidth]{Figure9.png}
\includegraphics[width=0.96\textwidth]{Figure8.png}
\caption{ {\bf Performance of \soft for \remu-predicted disruptions.} For each \rfam family, we consider all PDBs having less than 150 nucleotides, and having maximal matching score to family. For a set of extreme percentile cutoff of the \shape profile disruption in the first column (computational \remu disruption in the second
column) $\delta$ and a minimal distance $\gamma$ from the mutation. Note that the PDB models considered for the 5S family (RF00001) do not match those investigated by MaM, which explains the discrepancies observed between the results above and those of Fig.~\ref{fig:aucremumam}.}
\label{fig:rfam_best_bit}
......
......@@ -116,7 +116,7 @@ The first one is the $l^2$ norm between the whole profiles of w and wi, to evalu
\begin{figure}[ht!]
\centering
\includegraphics[width=0.49\textwidth]{FigureS1}
\includegraphics[width=1.1\textwidth]{FigureS1.png}
\caption{$\delta$ is evaluated on the $l^2$ norm between the whole profiles, to evaluate the global SHAPE disruption}
\label{fig:all}
\end{figure}
......@@ -126,7 +126,7 @@ The second is a variant which considers the maximally contributing window over t
\begin{figure}[ht!]
\centering
\includegraphics[width=0.49\textwidth]{FigureS2}
\includegraphics[width=1.1\textwidth]{FigureS2.png}
\caption{$\delta$ is evaluated on the maximally contributing window over the whole sequence}
\label{fig:max}
\end{figure}
......@@ -136,7 +136,7 @@ The second is a variant which considers the maximally contributing window over t
\begin{figure}[ht!]
\centering
\includegraphics[width=0.49\textwidth]{FigureS3}
\includegraphics[width=1.1\textwidth]{FigureS3.png}
\caption{$\delta$ is evaluated on the positions in the SSE where the mutation lies}
\label{fig:sse}
\end{figure}
......@@ -199,11 +199,11 @@ are:
\end{figure}
\subsection{glycine riboswitch}
Results for the glycine riboswitch with the 4 different distance models are shown in Fig.~\ref{fig:glyc_dist_4}.
Results for the glycine riboswitch with the 4 different distance MaM experiments are shown in Fig.~\ref{fig:glyc_dist_4}.
\begin{figure}[ht!]
\centering
\includegraphics[width=0.47\textwidth]{FigureS7.png}
\caption{AUC average for the glycine riboswitch four models}
\includegraphics[width=0.96\textwidth]{FigureS7.png}
\caption{AUC average for the glycine riboswitch on the four MaM experiments}
\label{fig:glyc_dist_4}
\end{figure}
......@@ -234,7 +234,9 @@ The results for c-di-GMP having as positive positions the ones interacting in th
& & & 3OAS & $6-12,26-33,37-38,41-52,54-57,59,70,73-84,88-104,112-116$\\
& & & 3OFC & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-117$\\
& & & 3ORB & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-116$\\\hline
tRNA & anticodon & RF00005 &1EHZ & $34-36$\\\hline
tRNA & anticodon & RF00005 & & $34-36$\\
 & T-$\psi$-C-G & & & $54-57$\\
& Prot. DNA & & 1EHZ & 1, 19, $34-36$, $56- 57$, $73-76$\\ \hline
c-di-GMP ribo. & c-di-GMP & RF01051 & 3IWN & $8-10, 28, 38, 82$\\
& & & 3MUT & $18-20, 38, 48, 92$\\
& & & 3MUV & $18-20, 38, 48, 92$\\
......