Commit 997bc595 authored by Carlos GO

beamer

parent 568639b8
File added
File added
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex 2016.5.22) 12 AUG 2019 10:01
entering extended mode
restricted \write18 enabled.
file:line:error style messages enabled.
%&-line parsing enabled.
**phd.tex
(./phd.tex
LaTeX2e <2016/03/31>
Babel <3.9r> and hyphenation patterns for 83 language(s) loaded.
./phd.tex:4: Undefined control sequence.
l.4 \psset
{xunit=.5pt,yunit=.5pt,runit=.5pt}
?
./phd.tex:4: Emergency stop.
l.4 \psset
{xunit=.5pt,yunit=.5pt,runit=.5pt}
End of file on the terminal!
Here is how much of TeX's memory you used:
5 strings out of 493014
111 string characters out of 6133351
53601 words of memory out of 5000000
3648 multiletter control sequences out of 15000+600000
3640 words of font info for 14 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
5i,0n,1p,62b,8s stack positions out of 5000i,500n,10000p,200000b,80000s
./phd.tex:4: ==> Fatal error occurred, no output PDF file produced!
File added
......@@ -902,4 +902,11 @@
organization={Springer}
}
@inproceedings{mairal2009supervised,
title={Supervised dictionary learning},
author={Mairal, Julien and Ponce, Jean and Sapiro, Guillermo and Zisserman, Andrew and Bach, Francis R},
booktitle={Advances in neural information processing systems},
pages={1033--1040},
year={2009}
}
......@@ -13,6 +13,7 @@
\usepackage[ruled,vlined,linesnumbered,noresetcount]{algorithm2e}
\usepackage{tikz}
\usepackage{chronosys}
\usepackage{enumerate}
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
......@@ -43,11 +44,11 @@
\section{Abstract}
RNA (Ribonucleic Acid) is a family of biological molecules which controls vital cellular processes in all kingdoms of life.
Specific sets of physical interactions within an RNA molecule defines its 3D architecture and thus determine its function.
The unique flexibility and complexity of RNA structures supports a diverse range of functions which can explain many important biological phenomena and disorders.
From a computational perspective, this phenomenon raises questions such as: Can we derive information about an RNA's function from its structure? Given a large set of RNA structures, which parts are functionally relevant?
Answering these questions is a necessary step for guiding further biological experimentation and to decoding the resulting data.
RNA (Ribonucleic Acids) is a family of biological molecules which controls vital cellular processes in all kingdoms of life.
Specific sets of physical interactions within an RNA molecule define its 3D architecture and thus determine function.
Interestingly, the unique flexibility and complexity of RNA structures supports a diverse range of functions which can explain many important biological phenomena and disorders.
From a computational perspective, this property raises questions such as: Can we derive information about an RNA's function from its structure? Given a large set of RNA structures, which parts are functionally relevant?
Answering these questions is a necessary step for guiding further biological experimentation and decoding the resulting data.
While a large body of algorithms has addressed these and many other questions using simplified models of RNA structure (2D), evidence is building that complex structural patterns occurring at the 3D level are key components of RNA function.
However, complex RNA structures are characterized by the presence of non-standard interactions which break algorithmic assumptions used by many established tools (e.g. planarity), and for which interaction energies are unknown.
In the absence of such theoretical constraints, data-driven methods are a promising alternative.
......@@ -256,7 +257,7 @@ Indeed, graphical representations of RNA base pairing networks have been develop
We show for the first time that base-pairing networks can be used to automatically predict the binding of small molecules to RNAs.
More specifically, we train a machine learning algorithm to use structural patterns in crystal structures of known RNA-ligand complexes to make predictions which allow us to identify potentially active ligands.
In machine-learning terms, the RNA-ligand complex is treated as an input-output pair where the target structure is the input to the model and the ligand is the output.
In machine learning terms, the RNA-ligand complex is treated as an input-output pair where the target structure is the input to the model and the ligand is the output.
In order to allow for ligand-based applications, we use molecular fingerprints of ligands as the outputs of our model.
These are vector-based representations of chemicals designed for ligand space similarity searches and which can be conveniently handled by machine learning models.
The prediction thus serves as a ligand-based tool since it can be used to search for active compounds in the ligand space.
......@@ -320,7 +321,7 @@ Therefore as future work we would like to:
\begin{itemize}
\item Include data from other sources (binding screens, docking, computational structure predictions)
\item Faster alignment method (see GARL)
\item Faster alignment method (see \garl)
\item GCN version for end to end learning
\end{itemize}
......@@ -331,13 +332,13 @@ Studying the networks of 3D interactions in RNA structures led to the observatio
This resulted in the hypothesis that RNA function at the 3D level can be encoded by an alphabet of simple sub-structures which we discover as recurrent substructures ~\cite{leontis2006building}.
Indeed, it has been shown that several of these sub-structures are linked to important functions ~\cite{nissen2001rna}.
The first motifs were identified through manual inspection of 3D structures in search for similar sub-structures.
Recent efforts have attempted to automate this process ~\cite{reinharz2018mining, djelloul2009algorithmes, parlea2016rna,ge2018novo} to increase the repertoire of known RNA 3D motifs.
Recent efforts have attempted to automate this process ~\cite{reinharz2018mining, djelloul2009algorithmes, parlea2016rna,ge2018novo} in order to increase the repertoire of known RNA 3D motifs.
Interestingly, several of those approaches rely on similar graph-based representations of RNA to those used in \rnamigos.
In computational terms, the task of identifying structural motifs from a large set of full RNA crystal structures requires (i) structure comparison (recognizing similar sub-structures), and (ii) sub-structure searching (a way to navigate large structures to test for similarity).
The main challenge is that comparing structures is computationally expensive, and the number of possible structures to explore explodes for even small structures.
State-of-the-art techniques for mining structural motifs rely on two major constraints to address the computational challenges.
The first is a limitation on which sub-structures to evaluate (i.e. only certain kinds of loops ~\cite{ge2018novo} or base-pairings ~\cite{reinharz2018mining}).
And the second is that they assume instances of motifs will be exactly identical to each other (with the exception of ~\cite{ge2018novo} which is limited in the first assumption).
And the second is the assumption that instances of motifs will be exactly identical to each other (with the exception of ~\cite{ge2018novo} which is limited in the first assumption).
For a molecule as flexible as RNA, it is very likely that motifs will adopt a range of possible conformations, not to mention that errors in 3D structure annotation will introduce noise in the graphs.
Therefore, our current view of the repertoire of RNA structural motifs remains limited.
......@@ -349,7 +350,7 @@ Because comparisons between vectors are much faster than comparing graphs direct
And the continuous nature of vector spaces allows us to obtain non-exact matches which will boost the diversity of motifs discovered.
As seen earlier, RNA graphs contain a set of pairwise interactions which can be modelled as relation types using a slight generalization of GCNs, Relational Graph Convolutional Networks (RGCNs) ~\cite{schlichtkrull2018modeling} (see {\bf Fig.~\ref{fig:rnamigos}} for an example of an RNA graph and its edge types).
An RGCN takes a graph as input and produces a $d$ dimensional layer-wise embedding for node $i$ $h_i \in \mathbb{R}^d$ as follows:
An RGCN takes a graph as input and produces a $d$-dimensional layer-wise embedding $z_i \in \mathbb{R}^d$ for node $i$ as follows:
\begin{equation}
z_i^{(l+1)} = \sigma \big(\sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)}z_j^{(l)} + W_0^{(l)}z_{i}^{(l)} \big)
......@@ -357,7 +358,7 @@ An RGCN takes a graph as input and produces a $d$ dimensional layer-wise embeddi
Where the node embedding $z$ at layer $(l+1)$ is a sigmoid-transformed sum over the embeddings of a node's neighbours $\mathcal{N}_{i}^r$ for each relation type $r \in \mathcal{R}$.
The aggregator has a trainable weight matrix $W_r$ for each relation type that operates on the embeddings of the neighbours $z_{j}$. This operation is repeated in `layers' to aggregate over larger neighbourhoods.
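As a small illustration of this update (a hypothetical toy case, not drawn from our data), consider a node $i$ with two neighbours $u$ and $v$ under a relation $r_1$ and a single neighbour $w$ under a relation $r_2$. Assuming the normalization constant is chosen as the neighbourhood size, $c_{i,r} = |\mathcal{N}_i^r|$ (one common choice; it can also be learned), the update above would read:
\begin{equation}
z_i^{(l+1)} = \sigma \big( \frac{1}{2} W_{r_1}^{(l)} (z_u^{(l)} + z_v^{(l)}) + W_{r_2}^{(l)} z_w^{(l)} + W_0^{(l)} z_i^{(l)} \big)
\end{equation}
Under this assumption, each relation type contributes the mean of its neighbours' embeddings, transformed by its own weight matrix, while the self-connection term $W_0^{(l)} z_i^{(l)}$ carries over the node's own embedding.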
In our setting, we tag edges with one of two relation types indicating whether the edge belongs to an input graph or to the mapping.
In our setting, we label edges with their corresponding 3D base-pairing geometry class.
We train the parameters of the embedding function to respect distance relationships between the nodes, together with a `motif' term to identify clusters of subgraphs.
For the node distance we define a function $k_{L}(u,v)$ which operates directly on the training graphs and returns a distance value taken over a radius of $L$ hops from each node.
......@@ -386,10 +387,10 @@ To bypass this costly step, we observe that the definition of a motif is not sim
Indeed, it is entirely possible that a motif contains a set of nodes that each receive different embeddings according to $k$ but that jointly recur in the data.
For this reason, we include a second term in the training loss which reflects this property.
We use a modification of dictionary learning which is typically used to learn implicit features of datasets (CITE).
We use a modification of dictionary learning which is typically used to learn implicit features of datasets ~\cite{mairal2009supervised}.
The model assumes that there exist $m$ motifs which we can represent as a matrix of orthogonal trainable vectors $E$ with the same dimensionality as the embeddings.
A second network takes node embeddings $Z \in \mathbb{R}^{n \times d}$ and for each node predicts a soft assignment $\sigma$ over the $m \ll n$ motifs.
Using the predicted assignments $\sigma$ for each node we can compute a resulting embedding for motif $i$ and dimension $j$ over all nodes $k=1$ to $n$:
Using the predicted assignments $\sigma$ for each node we can compute an aggregate embedding for motif $i$ and dimension $j$ over all nodes $k=1$ to $n$:
\begin{equation}
(\Sigma^T Z)_{ij} = \frac{\sum_{k=1}^{n} \sigma_{ik} z_{jk}}{\norm{\sigma_i}}
......@@ -450,9 +451,9 @@ We can then use these parsimonious alignments to highlight conserved regions or
With an alignment we also obtain a measure of similarity between the two objects, which is needed in many pattern recognition tasks ~\cite{bunke2008graph}.
Aligning RNA graphs has resulted in many interesting tools (including \rnamigos) such as CARNAVAL~\cite{reinharz2018mining} (discussed previously), and BayesPairing ~\cite{sarrazin2019automated} which uses RNA graph alignments to learn statistical models for sequence-based 3D structure prediction.
Unlike pairwise sequence alignments, graphs are very high dimensional objects which makes them difficult to align.
Indeed, optimally aligning two graphs is known to be NP-Hard.
Indeed, optimally aligning two graphs is known to be NP-Hard ~\cite{bunke2008graph}.
Furthermore, the majority of RNA tools built on graph comparison only accommodate exact matching between graphs.
This includes databases built by CARNAVAL, 3D motif hub ~\cite{petrov2013automated}, and BayesPairing.
This includes databases built by CARNAVAL, 3D motif hub ~\cite{petrov2013automated}, and BayesPairing ~\cite{sarrazin2019automated}.
However, RNAs are known to be highly flexible molecules and their graph representations often contain noise.
In order to properly capture biological variation, alignments need to accommodate inexact matching, and be guided by domain-specific cost functions.
In the case of RNA, this necessitates considerations such as edge type comparison ~\cite{stombaugh2009frequency}, additive stacking stability contributions ~\cite{zuker1981optimal}, and backbone ordering.
......@@ -539,13 +540,13 @@ As validation data we will be using the alignments produced in \rnamigos which u
\section{Conclusion}
The result of this thesis will be a set of tools that tackle several challenges in dealing with complex RNA structures:
The result of this thesis will be a set of tools that tackle several challenges posed by complex RNA structures:
\begin{enumerate}
\begin{enumerate}[(i)]
\item With \maternal we use evolutionary algorithms and dynamic programming methods to illustrate the landscape of RNA structural elements that can support loci of structural complexity (unpaired regions and high-degree junctions).
\item Looking closer at such regions we observe 3D structural signatures for important functional roles such as RNA-small molecule binding. We develop \rnamigos to exploit these signatures and assist in the discovery of novel RNA drugs.
\item While some of these local RNA structural signatures are unveiled by observing phenomena such as ligand binding, some are hidden in the 3D RNA data as recurrent structural patterns. We develop \vernal to automatically retrieve these patterns and discover new functionally relevant structural signatures (motifs).
\item In order to interpret these collections of motifs and build more robust statistical models of RNA structure we need efficient and customizable tools for comparing RNA structures. For this reason we develop \garl learns from expert knowledge of discriminating features between RNA structures to quickly align RNA.
\item In order to interpret these collections of motifs and build more robust statistical models of RNA structure we need efficient and customizable tools for comparing RNA structures. For this reason we develop \garl which learns from expert knowledge of discriminating features between RNA structures to quickly align RNA.
\end{enumerate}
\clearpage
......
\documentclass{beamer}
% \usepackage{beamerthemesplit} // Activate for custom appearance
\usepackage{chronosys}
\usepackage{xspace}
\newcommand{\maternal}{\texttt{mateRNAl}\xspace}
\newcommand{\rnamigos}{\texttt{RNAmigos}\xspace}
\newcommand{\vernal}{\texttt{veRNAl}\xspace}
\newcommand{\garl}{\texttt{garl}\xspace}
\newcommand{\rnamutants}{\texttt{RNAmutants}\xspace}
\newcommand{\norm}[1]{\left\lVert#1\right\rVert}
\graphicspath{{Figs/}}
\title{Computational Tools for Studying Complex RNA Structures}
\author{Carlos G. Oliver \\ Supervisor: Jerome Waldispuhl \\ PhD Proposal Exam}
\date{\today}
\begin{document}
\frame{\titlepage}
\section[Outline]{}
\frame{\tableofcontents}
\begin{frame}
\frametitle{Timeline}
\begin{figure}[h]
\setupchronology{startyear=2016,color=blue,stopyear=2021,dates=false,arrow=false}
\setupchronoevent{textstyle=\it,date=false}
\setupchronoperiode{dates=false,textdepth=-15pt}
\startchronology
\chronograduation{1}
\chronoevent[markdepth=70pt]{8/2019}{Proposal}
\chronoevent[markdepth=30pt]{6/8/2019}{\maternal Accepted}
\chronoevent[markdepth=30pt]{1/1/2017}{\maternal Submitted}
\chronoevent[markdepth=45pt]{1/2019}{\rnamigos Submitted}
\chronoevent[markdepth=70pt]{9/2017}{Comprehensive}
\chronoevent[markdepth=70pt]{9/2020}{Defense}
\chronoperiode[dates=false]{2016}{2017}{\maternal}
\chronoperiode{2017}{2019}{\rnamigos}
\chronoperiode{2019}{2020}{\vernal}
\chronoperiode{2020}{2021}{\garl}
\stopchronology
\caption{Timeline for my PhD.}
\label{fig:timeline}
\end{figure}
%\begin{itemize}
% \item {\bf Oliver, C. G.,} Reinharz, V., \& Waldispuhl, J. (2019). {\it The necessary emergence of structural complexity in self-replicating RNA populations.} bioRxiv, 218990. (Accepted for publication at the RNA Journal) ~\cite{oliver2017necessary}
% \item {\bf Oliver, C. G.,} Gendron, R. S., Moitessier, N., Mallet, V., Reinharz, V., \& Waldispuhl, J. (2019). {\it Extended RNA base pairing networks imprint small molecule binding preferences.} bioRxiv, 701326. (Submitted for publication at PNAS) ~\cite{oliver2019extended}
% \item Sarrazin-Gendron, R., Reinharz, V., {\bf Oliver, C. G.,} Moitessier, N., \& Waldispuhl, J. (2019). {\it Automated, customizable and efficient identification of 3D base pair modules with BayesPairing.} Nucleic acids research, 47(7), 3321-3332. ~\cite{sarrazin2019automated}
% \item Mallet, V., {\bf Oliver, C. G.,} Moitessier, N., \& Waldispuhl, J. (2019). {\it Leveraging binding-site structure for drug discovery with point-cloud methods.} arXiv preprint arXiv:1905.12033. (In preparation for resubmission) ~\cite{mallet2019leveraging}
%\end{itemize}
\end{frame}
\section{Background}
\begin{frame}
\frametitle{RNA}
\begin{figure}[h!]
\centering
\includegraphics[width=0.6\textwidth]{struc.jpeg}
\label{fig:rna}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{Overview of Projects}
\begin{figure}
\includegraphics[width=\textwidth]{phd.pdf}
\end{figure}
\end{frame}
\section{{\bf Project I:} \maternal}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{ml.pdf}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{maternal.pdf}
\end{figure}
\end{frame}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{matmut.png}
\end{figure}
\end{frame}
\section{{\bf Project II:} \rnamigos}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{rnamigos.pdf}
\end{figure}
\end{frame}
\section{{\bf Project III:} \vernal}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{vernal.pdf}
\end{figure}
\end{frame}
\section{{\bf Project IV:} \garl}
\begin{frame}
\frametitle{How do complex structures evolve?}
\begin{figure}
\includegraphics[width=\textwidth]{garl.pdf}
\end{figure}
\end{frame}
\end{document}