Commit 59a57f61 authored by Vladimir Reinharz

update description data

parent c5992274
......@@ -730,3 +730,23 @@
publisher={Nature Publishing Group}
}
@article{schattner2005trnascan,
title={The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs},
author={Schattner, Peter and Brooks, Angela N and Lowe, Todd M},
journal={Nucleic acids research},
volume={33},
number={suppl 2},
pages={W686--W689},
year={2005},
publisher={Oxford Univ Press}
}
@article{moreland2005molecular,
title={The Molecular Biology Toolkit (MBT): a modular platform for developing molecular visualization applications},
author={Moreland, John L and Gramada, Apostol and Buzko, Oleksandr V and Zhang, Qing and Bourne, Philip E},
journal={BMC bioinformatics},
volume={6},
number={1},
pages={21},
year={2005},
publisher={BioMed Central Ltd}
}
......@@ -31,7 +31,7 @@
%\usepackage{natbib}
%\usepackage[disable]{todonotes}
\usepackage{soul}
%\usepackage{soul}
%\usepackage[margin=0.95in]{geometry}
%\usepackage{soul}
......@@ -456,10 +456,21 @@ We omitted the shortest sequences (i.e. \rfam families RF00032, RF00037 and RF00
\subsection{Experimental design}
The \texttt{Infernal 1.1}~\cite{nawrocki2009infernal} software was used with default parameter values to: 1) create a covariance model for each alignment, and 2) align the sequence from the mutate-and-map experiment to the generated covariance model. The consensus secondary structure was then restricted to gapless positions within the aligned sequence $\Seq$.
\todo{explain each case!!!}
For each mutation over the \shape profile percentile cutoff $\delta$, the data set was composed of the regions of interest given $\gamma$, i.e. the set of positions returned by Algo~\ref{algo:pos}. For each PDB model, the positive data set is composed of the positions in those regions that have the center of at least one of their atoms at most $5$\AA\xspace from the center of any atom of another chain in the complex. An implementation using the PyMOL Python API is included in the provided code.
For each mutation over the \shape profile percentile cutoff $\delta$, the data set was composed of the regions of interest given $\gamma$, i.e. the set of positions returned by Algo~\ref{algo:pos}.
{\color{red}
To determine the positive datasets, we proceeded in three different ways.
For the 5S rRNA, for each PDB model, the positive data set is composed of the positions in those regions that have the center of at least one of their atoms at most $5$\AA\xspace from the center of any atom of another chain in the complex. An implementation using the PyMOL Python API is included in the provided code.
For the tRNA, we only considered the three positions of the anticodon, identified using \texttt{tRNAscan-SE}~\cite{schattner2005trnascan}. Binding sites differ considerably from one tRNA to another, and since there is no crystal structure of the tRNA bound to other proteins, we did not consider other possibilities. \todo{more explanations about tRNA?}
For the riboswitches, we used \texttt{Ligand Explorer}~\cite{moreland2005molecular} on their respective crystal structures to identify the nucleotides at most $5$\AA\xspace from the ligand.}
The remaining positions compose the negative dataset. The positions not present in the model were ignored. This highlights one of the challenges of benchmarking. For the 5S rRNA, out of 121 positions, two models had 3 nucleotides missing, one had 4 missing and the other 6. For c-di-GMP, out of 103 positions, one model had 8 nucleotides missing, two others 21 and the last 22.
The remaining positions compose the negative dataset. The positions not present in the model were ignored. This highlights one of the challenges of benchmarking. For the 5S rRNA, out of 121 positions, two models had 3 nucleotides missing, one had 4 missing and the other 6. {\color{red} For c-di-GMP, out of 103 positions, one model had 8 nucleotides missing, two others 21 and the last 22. This explains some of the discrepancies between the models.}
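The $5$\AA\xspace contact criterion described above can be illustrated with a minimal Python sketch. This is not the PyMOL-based implementation shipped with the provided code: the function name `contact_positions`, the tuple-based input layout and the toy coordinates are all assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def contact_positions(rna_atoms, partner_atoms, cutoff=5.0):
    """Return the sorted positions having at least one atom center within
    `cutoff` angstroms of any partner atom center.

    rna_atoms:     iterable of (position, (x, y, z)) tuples (hypothetical layout)
    partner_atoms: iterable of (x, y, z) tuples from another chain or a ligand
    """
    partner_atoms = list(partner_atoms)
    positives = set()
    for position, xyz in rna_atoms:
        if position in positives:
            continue  # this position is already known to be in contact
        if any(dist(xyz, other) <= cutoff for other in partner_atoms):
            positives.add(position)
    return sorted(positives)


# Toy coordinates, not taken from any PDB model.
rna = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.0, 0.0, 0.0)),
    (2, (10.0, 0.0, 0.0)),
    (3, (20.0, 0.0, 0.0)),
]
partner = [(3.0, 3.0, 0.0), (12.0, 0.0, 0.0)]
print(contact_positions(rna, partner))
```

Positions absent from a model never appear in `rna_atoms`, so in this sketch they are neither positive nor negative, which mirrors the choice of ignoring them.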
The set $\zeta$ is composed of the NPMI between every pair of positions and every possible nucleotide (i.e. \Ab, \Cb, \Gb, \Ub\xspace and \gapb{}) in the resulting alignment. The thresholds on the NPMIs, $\zeta^+$ (resp. $\zeta^-$), were sliced from the $0^{\text{th}}$ to the $100^{\text{th}}$ percentile of the positive values of $\zeta$ (resp. negative values of $\zeta$).
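A small Python sketch of how such percentile thresholds could be generated follows. The percentile convention (linear interpolation between ranks), the step size and the names `percentile` and `threshold_grid` are assumptions, since the text does not specify them. The direction in which the negative values are swept is likewise assumed (here from most negative to least negative).

```python
def percentile(values, q):
    """Percentile of `values` for q in [0, 100], using linear interpolation
    between closest ranks (an assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty collection")
    rank = (len(ordered) - 1) * q / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def threshold_grid(zeta, step=10):
    """Return candidate thresholds (zeta_plus, zeta_minus), one per percentile
    from 0 to 100 in increments of `step` (hypothetical step size)."""
    positives = [v for v in zeta if v > 0]
    negatives = [v for v in zeta if v < 0]
    qs = range(0, 101, step)
    zeta_plus = [percentile(positives, q) for q in qs]
    zeta_minus = [percentile(negatives, q) for q in qs]
    return zeta_plus, zeta_minus


# Toy NPMI values, not taken from the paper's alignments.
toy_zeta = [-0.8, -0.4, -0.2, 0.1, 0.3, 0.5, 0.7, 0.9]
plus, minus = threshold_grid(toy_zeta, step=50)
print(plus, minus)
```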
......
......@@ -235,10 +235,10 @@ The results for c-di-GMP having as positive positions the ones interacting in th
& & & 3OFC & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-117$\\
& & & 3ORB & $6-12,27-31,33,37-38,41-52,54-59,73-84,88-104,112-116$\\\hline
tRNA & anticodon & RF00005 &1EHZ & $34-36$\\\hline
c-di-GMP ribo. & c-di-GMP & RF01051 & 3IWN & $8-10,28,38,53-64,66-72,82$\\
& & & 3MUT & $18-20,38,48,61-64,75,92$\\
& & & 3MUV & $18-20,34,38,48,60-64,75,92$\\
& & & 3MXH & $18-20,38,48,61-64,75,92$\\\hline
c-di-GMP ribo. & c-di-GMP & RF01051 & 3IWN & $8-10, 28, 38, 82$\\
& & & 3MUT & $18-20, 38, 48, 92$\\
& & & 3MUV & $18-20, 38, 48, 92$\\
& & & 3MXH & $18-20, 38, 48, 92$\\\hline
cobalamin ribo. & B1Z & RF00174 & 4GXY & $41-43,64-66,72-78,106,108-109,124,148-150,155-157,159-162$\\\hline
adenine ribo. & adenine & RF00167 & 1Y26 & $21-22,47,50-52,73-75$\\\hline
glycine ribo. & glycine & RF00504 & 3P49 & $35-39, 46, 48-42, 110-114,137, 139-143$\\
......