Commit d4a70d55 authored by Vladimir Reinharz

Merge branch 'master' of jwgitlab.cs.mcgill.ca:vreinharz/arnhack

parents de5cb1b5 3f701eac
......@@ -38,7 +38,7 @@ Dear Dr. Alan Kimmel,
We thank you for giving us an opportunity to improve our manuscript. We do agree with the reviewer that additional validations will improve the impact and significance of our work.\\
We thank the reviewer for pointing us to new mutate-and-map experiments that were released after the initial submission of our manuscript. In this revision, we include all new mutate-and-map experiments for which we could (i) identify evolutionarily conserved intermolecular interactions, (ii) find an experimentally determined structure in the PDB, and (iii) obtain an alignment from the Rfam family. We provide below a list of all experiments included in our updated benchmark, and a justification for the ones that we could not include.
We thank the reviewer for pointing us to new mutate-and-map experiments that were released after the initial submission of our manuscript. In this revision, we include all new mutate-and-map experiments for which we could (i) identify evolutionarily conserved intermolecular interactions, (ii) find an experimentally determined structure in the PDB, and (iii) obtain an alignment from the Rfam family. We provide below a list of all experiments included in our updated benchmark, and a justification for the ones that we could not include. We also include, at the end of this letter, detailed answers to the reviewer's comments.
\section*{Molecules included in the benchmark}
......@@ -83,7 +83,7 @@ The following molecules were not included in our new benchmark because we have n
\item Class I Ligase.
\end{itemize}
The following molecules were not integrated in our benchmark because these are artificial sequences:
The following molecules were not integrated in our benchmark because they are artificial sequences:
\begin{itemize}
\setlength{\itemsep}{-1pt}
\item Hobartner bistable switch
......@@ -94,13 +94,13 @@ The following molecules were not integrated in our benchmark because these are a
\section*{Detailed answers to the reviewer}
\textbf{Comment:}The authors have not used all the possible experimental data that are available. Just a quick look at a repository https://rmdb.stanford.edu/browse/, I would guess that many of these data sets correspond to structures that have been solved experimentally and for which there are RFAM alignments. These include "add Adenine Riboswitch, V. vulnificus", "16S rRNA Four-Way Junction", "tRNA Phenylalanine, S. cerevisiae", "RNA Puzzle 6" (this is the vitamin B12 riboswitch), and " RNA Puzzle 8" (a SAM riboswitch). These all bind to ligands or proteins, like the 5S RNA and the cyclic di-GMP riboswitch studies in the text, and most of the data sets are associated with publications. It seems that aRNhAck could receive a better test.\\
\textbf{Comment:} The authors have not used all the possible experimental data that are available. Just a quick look at a repository https://rmdb.stanford.edu/browse/, I would guess that many of these data sets correspond to structures that have been solved experimentally and for which there are RFAM alignments. These include "add Adenine Riboswitch, V. vulnificus", "16S rRNA Four-Way Junction", "tRNA Phenylalanine, S. cerevisiae", "RNA Puzzle 6" (this is the vitamin B12 riboswitch), and " RNA Puzzle 8" (a SAM riboswitch). These all bind to ligands or proteins, like the 5S RNA and the cyclic di-GMP riboswitch studies in the text, and most of the data sets are associated with publications. It seems that aRNhAck could receive a better test.\\
\textbf{Answer:} \textit{We thank the reviewer for pointing out the new experiments posted on the mutate-and-map repository. We do agree that extending the test set will strengthen our manuscript. In this revision, we included all experiments for which we could also find an Rfam alignment and a reference structure in the PDB (see above).}\\
\textbf{Comment:} I don't think delta or gamma are defined in the main text (I only figured out what they were by reading the algorithm pseudocode). In fact, there is a statement that delta 'was set to 10'; I think the authors meant lambda.\\
\textbf{Answer:} \textit{This is correct. We added formal definitions of $\delta$ and $\gamma$ in the text, and fixed the typo (Indeed, $\lambda$ is set to 10).}\\
\textbf{Answer:} \textit{We added a formal definition of $\delta$ in the subsection ``Structural Disruption'', clarified the definition of $\gamma$ in the subsection ``Proximity filtering'', and fixed the typo about $\lambda$. When appropriate, we also included a description of these parameters in the captions of the figures.}\\
\textbf{Comment:} The main text also skips a step near the end; while it explains which residue *pairs* were considered long-distance functional connections, it does not explain which residues were considered to be interface residues -- both members of the pair?\\
......@@ -108,7 +108,13 @@ The following molecules were not integrated in our benchmark because these are a
\textbf{Comment:} In Figure 8, it remains difficult to see the 'bridge' between the red and green parts of the RNA; I suggest putting arrows to mark the two positions, and showing more of the rest of the ribosome, perhaps as white spheres in a cutaway view, to show how the residues are connected via the surrounding complex.\\
\textbf{Answer:} \textit{We updated the Figure as suggested and added arrows to mark the positions. However, we did not manage to produce a clearer picture using a cutaway view.}
\textbf{Answer:} \textit{We updated the Figure as suggested and added arrows to mark the positions. Unfortunately, we did not manage to produce a clearer picture using a cutaway view. Nonetheless, we believe that the arrows already improve the clarity of the figure.}\\
Finally, in this revision we also removed Figures 4 (illustration of a disruption profile) and~6 (example of a ROC curve) from the previous manuscript, as we do not think they bring much information or help to clarify the message of this work.\\
Overall, we would like to thank the reviewer for the careful review and helpful comments.
\end{document}
......
......@@ -230,7 +230,7 @@ $$
We also tested three alternative measures: The first one is the $l^2$ norm between the whole profiles of $\Seq$ and $\Seq_i$, to evaluate the global \shape disruption (Fig.~S1); The second is a variant of $\Delta$ which considers the maximally contributing window over the whole sequence (instead of only considering the one centered on $i$ for $\Delta$), and aims at identifying non-local structural rearrangements (Fig.~S2); The third restricts the positions for the $l^2$ norm between $\Seq$ and $\Seq_i$ to the SSE where the mutation lies, in order to assess the local three-dimensional disruption of the \shape profile (Fig.~S3). Although some of these measures showed potential for applications, we only report our analysis on the $\Delta$ measure, whose signal was clearest.
{\color{red}Given a percentile $\delta$ and a mutate-and-map (MaM) experiment, the procedure selects the mutations at position $i$ such that the \shape profile disruption $\Delta(w, w_i)$ is in the $\delta$ percentile of all profile disruptions, as shown in Algo.~\ref{algo:mut}.}
{\color{red}In the following, we use a parameter $\delta$ to identify mutations associated with significant changes of the structure. More precisely, given a percentile $\delta$ and a mutate-and-map (MaM) experiment, we select the mutations at position $i$ with a \shape profile disruption $\Delta(w, w_i)$ in the $\delta$ percentile of all profile disruptions (see Algo.~\ref{algo:mut}).}
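One way to write this selection explicitly, under the assumption that ``in the $\delta$ percentile'' means at or above the $\delta$-th percentile (the symbols $q_\delta$ and $S_\delta$ are introduced here for illustration only and do not appear in the algorithm): let $q_\delta$ denote the $\delta$-th percentile of the values $\Delta(w, w_j)$ taken over all mutated positions $j$. The selected set of mutations would then be
$$
S_\delta = \left\{\, i \;:\; \Delta(w, w_i) \geq q_\delta \,\right\}.
$$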
\subsection{Evolutionary Information}
......@@ -252,10 +252,7 @@ To identify those pairs of nucleotides at specific positions which vary together
$$\displaystyle \text{\NPMI}(x,y)=\frac{\log\frac{\mathbb P\left(x,y\right)}{\mathbb P\left(x\right)\,\mathbb P\left(y\right)}}{-\log \mathbb P\left(x, y\right)} \in [-1,1]$$
where probabilities $\mathbb{P}(\cdot)$ are estimated from their frequencies in the multiple sequence alignment.
An \NPMI of $ -1$ indicates that $x$ and $y$ never appear together. On the opposite side of the spectrum, a value of $1$ signifies a perfect correlation. If $x$ and $y$ can be considered as two independent random variables, then the \NPMI will be $0$.
Starting from an \rfam alignment of total length $m$, the \NPMI of every $25{m\choose2}$ pairs of possible mutations is computed, where $m$ is the length of the alignment.
For every ${m\choose2}$ pair of positions, the nucleotides can be either \Ab, \Cb, \Gb, \Ub\xspace or a gap \gapb. The set of
all \NPMI{}s greater than $-1$ is called $\zeta$.
An \NPMI of $-1$ indicates that $x$ and $y$ never appear together. On the opposite side of the spectrum, a value of $1$ signifies a perfect correlation. If $x$ and $y$ can be considered as two independent random variables, then the \NPMI will be $0$. Starting from an \rfam alignment of total length $m$, the \NPMI of each of the $25{m\choose2}$ pairs of possible mutations is computed. For each of the ${m\choose2}$ pairs of positions, the nucleotides can be either \Ab, \Cb, \Gb, \Ub\xspace or a gap~\gapb. The set of all \NPMI{}s greater than $-1$ is called $\zeta$.
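As a quick check of these three reference values, consider the following worked illustration, which uses hypothetical probabilities chosen for convenience rather than frequencies from an actual alignment. If $\mathbb P(x)=\mathbb P(y)=\mathbb P(x,y)=\frac{1}{4}$, then
$$\displaystyle \text{\NPMI}(x,y)=\frac{\log\frac{1/4}{1/16}}{-\log\frac{1}{4}}=\frac{\log 4}{\log 4}=1.$$
If instead $\mathbb P(x)=\mathbb P(y)=\frac{1}{2}$ and $\mathbb P(x,y)=\frac{1}{4}=\mathbb P(x)\,\mathbb P(y)$, the numerator is $\log 1=0$ and the \NPMI is $0$. Finally, for fixed $\mathbb P(x)$ and $\mathbb P(y)$, letting $\mathbb P(x,y)$ tend to $0$ makes the ratio tend to $-1$, which is the value assigned to pairs that never appear together.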
The procedure to compute the positions over a cutoff percentile $\zeta_c$ given a mutation $m$, a list of positions $p$ and a multiple sequence alignment $MSA$ is described in Algo.~\ref{algo:npmi}.
......@@ -280,18 +277,17 @@ In RNA, a large proportion of observed covariations are adequately explained by
%\item Stems, due to their rigidity, constitute the scaffold on an RNA structure. It is however the loops that have the flexibility to create complex three dimensional structures~\cite{stombaugh2009frequency} \todo{probably better ref} allowing the RNAs to uphold their functions.
%Given our goal of detecting interfaces between the RNA and other chains, we restrict our attention to positions located inside loops;
%\item
Since the secondary structure is, to a large extent, already revealed by comparative analysis (and already present in the \rfam{} profile taken as input to the method), it does not constitute the primary object of interest of our study. In order to minimize the probability of detecting a local structural compensation, we require a minimal distance $\gamma$ between the mutation and the loops identified for their good NPMI values. This definition is formalized in Algo.~\ref{algo:pos}.
Since the secondary structure is, to a large extent, already revealed by comparative analysis (and already present in the \rfam{} profile taken as input to the method), it does not constitute the primary object of interest of our study. {\color{red} In order to minimize the probability of detecting local structural compensations, we require a minimal distance $\gamma$ between the index of the mutation and the position of the loops selected for their good NPMI values. This criterion is formally implemented in Algo.~\ref{algo:pos}.}
\subsubsection{{\color{red}Binding interfaces positions}}
Since both negative and positive correlations can indicate
positions of interest, we use two different thresholds for the \NPMI{}s, $\zeta^-$ and $\zeta^+$. The threshold $\zeta^+$ is a bound on the positive values of the \NPMI and $\zeta^-$ on the negative ones. Due to the
high number of possible combinations, \NPMI{}s with value $-1$ are frequent and uninformative, so they are discarded.
{\color{red} For those loops deemed as regions of interest, we predict that the set of positions with an \NPMI above $\zeta^+$ or below $\zeta^-$ are nucleotides in binding interfaces while the others are not.}
{\color{red} For those loops deemed as regions of interest, we predict that the set of positions with an \NPMI above $\zeta^+$ or below $\zeta^-$ are nucleotides in binding interfaces while the others are not.}\\
{\removelatexerror
\begin{algorithm}
\begin{algorithm}[H]
\DontPrintSemicolon
\SetAlgoLined
\SetKwFunction{shapeDisruption}{shapeDisruption}
......
......@@ -104,11 +104,30 @@
\newcommand{\rnasnp}{\texttt{RNAsnp}\xspace}
%% for last Table
\usepackage{array}
\newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
\usepackage{diagbox}
\newcommand*\mc[1]{\multicolumn{1}{c}{#1}}
\newcommand{\raisehdr}[1][.5\normalbaselineskip]{\raisebox{#1}}
% Title
\title{Supplementary Material for ``Combining structure probing data on RNA mutants with evolutionary information reveals RNA-binding interfaces''}
\author{Vladimir Reinharz$^1$, Yann Ponty$^2$, J\'er\^{o}me Waldisp\"{u}hl$^1$}
\date{\small $^1$ School of Computer Science, McGill University, Montreal, Canada\\$^2$ Laboratoire d'informatique, \'Ecole Polytechnique, Palaiseau, France.}
\begin{document}
\section{Supplementary data}
\subsection{Alternative Measures}
\maketitle
\section*{Alternative Measures}
This section reports the results obtained when different measures are used to evaluate the \shape profile distances.
......@@ -142,8 +161,7 @@ The second is a variant which considers the maximally contributing window over t
\end{figure}
\subsection{\remu and \shape intersection}
\section*{\remu and \shape intersection}
We rank the mutations by structural disruption according to the \shape experiments and to \remu. At every percentile, we look at the
intersection between the elements of those sets above their respective percentile. In Fig.~\ref{fig:percentile} we show on the $y$ axis the percentiles and on the $x$ axis the ratio of the size of this intersection over the maximal number possible.
\begin{figure}[ht!]
......@@ -163,7 +181,7 @@ We then evaluated the AUCs using for every cutoff percentile $\delta$ the mutati
\label{fig:remushape}
\end{figure}
\subsection{Rfam matches and remuRNA}
\section*{Rfam matches and remuRNA}
Evaluations of AUCs on every sequence in the PDB database matching an Rfam family with a length smaller than 150 nucleotides are shown in Fig.~\ref{fig:all_rfam_auc}. The 14 families
are:
\begin{itemize}
......@@ -191,7 +209,7 @@ are:
\label{fig:all_rfam_auc}
\end{figure}
\subsection{glycine riboswitch}
\section*{glycine riboswitch}
Results for the glycine riboswitch with the 4 different distance MaM experiments are shown in Fig.~\ref{fig:glyc_dist_4}.
\begin{figure}[ht!]
\centering
......@@ -200,64 +218,68 @@ Results for the glycine riboswitch with the 4 different distance MaM experiments
\label{fig:glyc_dist_4}
\end{figure}
\subsection{c-di-GMP pocket}
\section*{c-di-GMP pocket}
The results for c-di-GMP, taking as positive positions the ones interacting in the binding pocket, with or without the interactions with the other molecules in the PDB, are shown in Fig.~\ref{fig:cdigmp_hairpinornot}.
\begin{figure}[ht!]
\centering
\begin{subfigure}[b]{0.45\textwidth}
\includegraphics[width=1.4\textwidth]{FigureS8}
\begin{subfigure}[b]{0.4\textwidth}
\includegraphics[width=1\textwidth]{FigureS8}
\caption{MaM}
\end{subfigure}
\begin{subfigure}[b]{0.45\textwidth}
\includegraphics[width=1.4\textwidth]{FigureS8b}
\begin{subfigure}[b]{0.4\textwidth}
\includegraphics[width=1\textwidth]{FigureS8b}
\caption{remuRNA}
\end{subfigure}
\caption{On the first row the AUC values given the pocket and the protein, while on the second row we only consider the pocket as positive values.}
\label{fig:cdigmp_hairpinornot}
\end{figure}
\subsection{adenine MaM without ligand}
\newpage
\section*{adenine MaM without ligand}
We show in Fig.~\ref{fig:adenine_4} the results for the adenine riboswitch when the disruption is based on a MaM experiment in absence of the adenine ligand.
\begin{figure}[ht!]
\centering
\includegraphics[width=\textwidth]{FigureS9}
\includegraphics[width=0.4\textwidth]{FigureS9}
\caption{AUC results for the adenine riboswitch when the disruption is based on a MaM experiment in absence of the ligand}
\label{fig:adenine_4}
\end{figure}
\subsection{tRNA positions extrapolated from PDB \texttt{3J78}}
\section*{tRNA positions extrapolated from PDB \texttt{3J78}}
We show in Fig.~\ref{fig:trna} the results for the tRNA when binding interfaces are extrapolated from PDB \texttt{3J78}.
\begin{figure}[ht!]
\centering
\includegraphics[width=\textwidth]{FigureS10}
\includegraphics[width=0.4\textwidth]{FigureS10}
\caption{AUC results for the tRNA when the binding interfaces are extrapolated from PDB \texttt{3J78}}
\label{fig:trna}
\end{figure}
\vspace{0em}
\newpage
\section{Dataset binding positions}
\section*{Dataset of binding positions}
\begin{table}[ht!]
\rotatebox{90}{
\begin{tabular}{lllll}
%\rotatebox{90}{
\scalebox{0.8}{
\begin{tabular}{llllL{6cm}}
RNA & Binding to & RFAM & PDB(s) &Binding Positions on PDB \\\hline\hline
5S & Prots. & RF00001 & 2WWQ & $7-13,27-33,38,41-57,59-60,70,73-84,88-104,112-116$\\
& & & 3OAS & $6-12,26-33,37-38,41-52,54-57,59,70,73-84,88-104,112-116$\\
& & & 3OFC & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-117$\\
& & & 3ORB & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-116$\\\hline
5S & \raisehdr{Prots.} & \raisehdr{RF00001} & \raisehdr{2WWQ} & $7-13,27-33,38,41-57,59-60,70,73-84,88-104,112-116$\\
& & & \raisehdr{3OAS} & $6-12,26-33,37-38,41-52,54-57,59,70,73-84,88-104,112-116$\\
& & & \raisehdr{3OFC} & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-117$\\
& & & \raisehdr{3ORB} & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-116$\\\hline
tRNA & anticodon & RF00005 & & $34-36$\\
& T-$\psi$-C-G & & & $54-57$\\
& Prot. DNA & & 1EHZ & 1, 19, $34-36$, $56- 57$, $73-76$\\ \hline
c-di-GMP ribo. & c-di-GMP & RF01051 & c-di-GMP pocket & $11-13, 40-41, 83$\\\hline
cobalamin ribo. & B1Z & RF00174 & 4GXY & $41-43,64-66,72-78,106,108-109,124,148-150,155-157,159-162$\\\hline
cobalamin ribo. & \raisehdr{B1Z} & \raisehdr{RF00174} & \raisehdr{4GXY} & $41-43,64-66,72-78,106,108-109,124,148-150,155-157,159-162$\\\hline
adenine ribo. & adenine & RF00167 & 1Y26 & $21-22,47,50-52,73-75$\\\hline
glycine ribo. & glycine & RF00504 & 3P49 & $35-39, 46, 48-42, 110-114,137, 139-143$\\
glycine ribo. & \raisehdr{glycine} & \raisehdr{RF00504} & \raisehdr{3P49} & $35-39, 46, 48-42, 110-114,137, 139-143$\\
\end{tabular}
}
\caption{For each RNA some info}
\caption{Binding positions for the mutate-and-map dataset. This table also provides the reference PDB structure and Rfam family used in our benchmark.}
\label{table:datasetinfo}
\end{table}
......