Commit 721587b8 authored by Roman Sarrazin-Gendron

Update README.md

parent 2234c543
......@@ -47,15 +47,15 @@ pip install .
BayesPairing2 comes with three pre-assembled datasets you can immediately start searching sequences with. To search with a specific dataset, use the ``-d`` option with the name of the dataset.
-* ``3DmotifAtlas_RELIABLE``: A subset of 60 modules from the RNA 3D Motif Atlas with the highest number of occurrences and highest sequence variation. We are confident in the prediction of those modules given the high quality data we have to train them. This is the default dataset we use.
-* ``3DmotifAtlas_ALL``: A dataset containing all the modules we were able to convert from the 3D Motif Atlas to BayesPairing2 models (426). Some of those only had one occurrence and/or may have been trained on limited/incomplete data.
-* ``rna3dmotif`` : A dataset containing the 75 most recurrent modules as identified via an exhaustive search of loops in the full PDB database with rna3dmotif.
+* ``RELIABLE``: A subset of 60 modules from the RNA 3D Motif Atlas with the highest number of occurrences and highest sequence variation. We are confident in the prediction of those modules given the high quality data we have to train them. This is the default dataset we use.
+* ``ALL``: A dataset containing all the modules we were able to convert from the 3D Motif Atlas to BayesPairing2 models (426). Some of those only had one occurrence and/or may have been trained on limited/incomplete data.
+* (DEPRECATED) ``rna3dmotif`` : A dataset containing the 75 most recurrent modules as identified via an exhaustive search of loops in the full PDB database with rna3dmotif.
-#### Interpreting dataset-specific output
+#### Interpreting dataset-specific output (currently under change)
-BayesPairing2 returns results by index, where the indexes correspond to modules in the relevant database. The rna3dmotif modules are described by graphs and sequence logos found in ``bayespairing/DBData/rna3dmotif``.
+BayesPairing2 returns results by index, where the indexes correspond to modules in the relevant database.
-The 3D Motif Atlas modules match to entries in that database. The correspondences between indexes of the two BayesPairing2 Atlas databases and the online 3D motif atlas database (with link to each relevant model) are found in the file ``bayespairing/DBData/3DmotifAtlas/3DmotifAtlas_info.csv``.
+The 3D Motif Atlas modules match to entries in that database. The details of the database can be found in ``bayespairing/models/``.
In this csv file, you can observe that sometimes, more than one module maps to the same entry of the Atlas; this is because the 3D Motif Atlas modules are clustered in 3D, and sometimes it is not possible to represent all occurrences accurately with the same graph, so they must be searched separately.
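The dataset selection described in this hunk can be illustrated with a minimal Python sketch. It only assembles the command line shown later in this diff, using the post-commit dataset names and the ``-seq``, ``-samplesize``, ``-t`` and ``-d`` options that appear in the README. The helper ``build_command`` and the ``DATASETS`` tuple are hypothetical names introduced for illustration; they are not part of BayesPairing2.

```python
import shlex

# Dataset names as they appear in the README after this commit.
# rna3dmotif is still listed, but marked as deprecated.
DATASETS = ("RELIABLE", "ALL", "rna3dmotif")


def build_command(sequence, dataset="RELIABLE", samplesize=1000, threshold=None):
    """Assemble the argument list for parse_sequences.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    cmd = ["python3", "parse_sequences.py",
           "-seq", sequence,
           "-samplesize", str(samplesize)]
    if threshold is not None:
        # -t is only added when a threshold is requested, as in the README example.
        cmd += ["-t", str(threshold)]
    cmd += ["-d", dataset]
    return cmd


seq = "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU"
# Print a shell-ready command line for the thresholded search.
print(shlex.join(build_command(seq, threshold=4)))
```

Rejecting names outside ``DATASETS`` would catch the pre-commit names (``3DmotifAtlas_RELIABLE``, ``3DmotifAtlas_ALL``) before the script is launched.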
......@@ -90,11 +90,11 @@ The scripts described in this section should be run from the ``bayespairing/src`
The first time you use BayesPairing with a full dataset, it will train all its models before searching a sequence. Those models will not need to be trained again. If you want to reset those models, you can use the ``init`` option. With the ``-d`` option, we are using the rna3dmotif dataset, which includes 75 pre-trained modules.
-``python3 parse_sequences.py -seq "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU" -samplesize 1000 -d rna3dmotif``
+``python3 parse_sequences.py -seq "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU" -samplesize 1000 -d RELIABLE``
The output is very large, so we can raise the threshold to have a better idea of the dominating modules.
-``python3 parse_sequences.py -seq "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU" -samplesize 1000 -t 4 -d rna3dmotif``
+``python3 parse_sequences.py -seq "UUUUUUAAGGAAGAUCUGGCCUUCCCACAAGGGAAGGCCAAAGAAUUUCCUU" -samplesize 1000 -t 4 -d RELIABLE``
```
=========================================================================================
......@@ -128,7 +128,7 @@ TOTAL TIME: 2.581
To assess what module the module ID matches, we can generate graphs and sequence logos for all modules and store them in the Graphs directory.
-``python3 display_modules.py -n "rna3dmotif"``
+``python3 display_modules.py -n "RELIABLE"``
![](bayespairing/DBData/rna3dmotif/default_logo28.png)
![](bayespairing/DBData/rna3dmotif/default_graph28.png)
......