Commit 1c086b5a authored by Carlos GO

readme

parent 545c9830
@@ -14,7 +14,6 @@ Data used for training, models and binding pocket visualizations available [here
NOTE: The user-friendly API is currently being built and tested. The sample usage described below is still being improved.
## Requirements
* Python 3.6+
* NetworkX 2.1+
* BioPython
@@ -91,7 +90,7 @@ You can convert this output to a distance matrix and a list indicating the graph
>>> from RNAmigos.post_ged import data_prepare
>>> geds = '../data/geds_delta.pickle'
>>> fps = '../data/all_rna_ligands_fingerprints.pickle'
->>> DM, L, graphlist = data_prepare(geds, fps)
+>>> DM, L, graphlist = prepare_data(geds, fps)
```
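The layout of `geds_delta.pickle` is not described here. As a minimal sketch, assuming the GED output is a dict mapping pairs of graph names to a distance, converting it into a symmetric distance matrix plus an ordered list of graph names could look like this (`pairs_to_matrix` is a hypothetical helper, not the RNAmigos loader):

```python
import numpy as np

def pairs_to_matrix(pairwise):
    """Build a symmetric distance matrix from {(name_a, name_b): distance}.

    Hypothetical helper: the real layout of geds_delta.pickle may differ.
    Returns the matrix and the graph names in row/column order.
    """
    # Collect every graph name appearing in any pair, in a stable (sorted) order.
    names = sorted({n for pair in pairwise for n in pair})
    index = {n: i for i, n in enumerate(names)}
    DM = np.zeros((len(names), len(names)))
    for (a, b), d in pairwise.items():
        # Distances are treated as symmetric here, so fill both triangles.
        DM[index[a], index[b]] = d
        DM[index[b], index[a]] = d
    return DM, names
```

The diagonal stays at zero, since a graph is at distance 0 from itself.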
The distance matrix can be passed to a prototype selector, which returns the indices of the graphs in `DM` that were selected as prototypes.
@@ -99,11 +98,34 @@ The distance matrix can be passed to a prototype selector to get the indices in
```python
>>> from RNAmigos.dissimilarity_embed import prototype_select
>>> prototypes = prototype_select(DM, 20, heuristic='spanning')
>>> prototypes
[288,
469,
593,
6,
428,
121,
503,
533,
548,
16,
368,
13,
240,
378,
526,
28,
86,
118,
145,
180]
```
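For intuition on what `heuristic='spanning'` does, here is a sketch of the usual spanning-prototype heuristic from the graph-embedding literature: pick the set median first, then repeatedly add the graph that is farthest from its nearest already-selected prototype. `spanning_prototypes` is a hypothetical stand-in; the RNAmigos `prototype_select` may differ in details such as tie-breaking.

```python
import numpy as np

def spanning_prototypes(DM, m):
    """Greedy spanning prototype selection on a square distance matrix.

    Hypothetical sketch, not necessarily identical to prototype_select.
    Returns a list of row indices into DM.
    """
    DM = np.asarray(DM, dtype=float)
    m = min(m, DM.shape[0])  # cannot pick more prototypes than graphs
    # Set median: the graph with the smallest total distance to all others.
    protos = [int(np.argmin(DM.sum(axis=1)))]
    # For every graph, the distance to its closest selected prototype.
    nearest = DM[:, protos[0]].copy()
    while len(protos) < m:
        nearest[protos] = -np.inf  # exclude graphs already chosen
        nxt = int(np.argmax(nearest))  # the worst-covered graph so far
        protos.append(nxt)
        nearest = np.minimum(nearest, DM[:, nxt])
    return protos
```

Because each new prototype is the graph least well covered by the current set, the selection tends to spread prototypes across the whole dataset instead of clustering them in dense regions.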
### Embedding a full dataset
-To embed the graphs used in the GED comparisons and select prototypes in one call to get a matrix of size N x r where N is the number of graphs in DM and r is the number of prototypes used.
+We can embed the graphs used in the GED comparisons and select prototypes in one call, obtaining a matrix of size N x r, where N is the number of graphs in DM and r is the number of prototypes used.
Here, we embed each graph in `graphlist` with 20 prototypes, using our previously computed distance matrix and the k-centers prototype selector.
```python
>>> from RNAmigos.dissimilarity_embed import full_embed
@@ -119,10 +141,22 @@ array([ 8., 10., 10., 8., 22., 10., 6., 8., 8., 18., 20., 8., 10.,
```
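When the full distance matrix is already available, the dissimilarity embedding needs no further GED calls: row i of the N x r matrix is simply the distances from graph i to the r prototypes, i.e. a column slice of `DM`. A minimal sketch, assuming `proto_idx` holds prototype indices like those returned by `prototype_select` (`embed_from_matrix` is a hypothetical name):

```python
import numpy as np

def embed_from_matrix(DM, proto_idx):
    """Return the N x r dissimilarity embedding for a precomputed DM.

    Row i holds the distances from graph i to each of the r prototypes,
    i.e. the columns of DM selected by proto_idx, in that order.
    """
    DM = np.asarray(DM, dtype=float)
    return DM[:, list(proto_idx)]
```

A graph that is itself a prototype gets a 0 in that prototype's column.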
### Embedding a single graph
```python
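>>> import pickle
>>> import networkx as nx
>>> from RNAmigos.dissimilarity_embed import graph_embed
>>> G = nx.read_gpickle('/data/1jau.nxpickle')
>>> with open('/data/sample_prototoypes.pickle', 'rb') as f:
...     prototypes = pickle.load(f)
...
>>> graph_embed(G, prototypes)
[1, 0, 2, 4, 1, 4, 5]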
```
User-friendly API coming soon.
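In `graph_embed`, each coordinate is the GED cost between `G` and one prototype. The sketch below mirrors that loop but takes the distance function as a parameter. As a cheap, clearly swapped-in stand-in for GED it uses the difference in node and edge counts, which is only a lower bound on unit-cost GED. The toy `(node set, edge set)` graph representation and both function names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Toy graph representation for this sketch only: (node set, edge set).
Graph = Tuple[FrozenSet[int], FrozenSet[Tuple[int, int]]]

def size_lower_bound(g1: Graph, g2: Graph) -> float:
    """Cheap lower bound on unit-cost GED: node- and edge-count differences."""
    return float(abs(len(g1[0]) - len(g2[0])) + abs(len(g1[1]) - len(g2[1])))

def embed_graph(graph: Graph, prototypes: Sequence[Graph],
                dist: Callable[[Graph, Graph], float] = size_lower_bound) -> List[float]:
    """Return the distances from `graph` to each prototype, in prototype order."""
    return [dist(graph, p) for p in prototypes]
```

Swapping `dist` for a real GED routine gives the embedding described above, at a much higher cost per prototype.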
## Fingerprint Prediction
Once all graphs are embedded, we have the standard machine-learning input matrix $X$, with $n$ examples as rows and $r$ features given by the distances to each prototype.
Any standard classifier can now be trained, using a label (output) vector for single-label classification or a label matrix for multi-output classification.
Alternatively, we can skip the embedding procedure and classify graphs with k-nearest neighbours directly on the GED distance matrix.
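The k-nearest-neighbour route only needs the distance matrix and the labels. A minimal sketch, assuming labels are stored as an N x d array aligned with the rows of `DM` (for example binary fingerprints, or a single column for one label); `knn_predict` is hypothetical and not part of RNAmigos:

```python
import numpy as np

def knn_predict(DM, Y, query_idx, train_idx, k=3):
    """Predict label vectors for query graphs from their k nearest training graphs.

    DM: full N x N distance matrix; Y: N x d label matrix.
    Returns a (len(query_idx), d) array holding the mean label vector
    of each query's k nearest training graphs.
    """
    DM = np.asarray(DM, dtype=float)
    Y = np.asarray(Y, dtype=float)
    train_idx = np.asarray(train_idx)
    # Distances from each query graph to every training graph.
    D = DM[np.ix_(query_idx, train_idx)]
    # Positions (within train_idx) of the k closest training graphs.
    nearest = np.argsort(D, axis=1, kind="stable")[:, :k]
    # Gather the neighbours' labels (q, k, d) and average over neighbours.
    return Y[train_idx[nearest]].mean(axis=1)
```

Averaging neighbours' label vectors gives a per-bit vote fraction, so the same function covers single-label and multi-output prediction; thresholding at 0.5 turns it into a majority vote.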
@@ -169,7 +169,7 @@ def k_centers(DM, k, return_assignments=False):
# return protos
-def prototypes(DM,m, heuristic='spanning'):
+def prototype_select(DM,m, heuristic='spanning'):
    """
    Compute a set of m prototype graphs.
@@ -202,9 +202,7 @@ def graph_embed(G, prototypes):
    """
    embedding = np.zeros(len(prototypes))
    for i, p in enumerate(prototypes):
-        g = pickle.load(open(G, 'rb'))
-        p = pickle.load(open(p, 'rb'))
-        ops,_,_ = ged((g1,p), source_only=True)
+        ops,_,_ = ged((G,p), source_only=True)
        # np.zeros gives a fixed-size array with no .append: fill slot i instead
        embedding[i] = ops.cost
    return embedding
@@ -217,7 +215,7 @@ def full_embed(D, m, DM=None, dist_mat=None, heuristic='spanning'):
    """
    if dist_mat is None:
        DM = distance_matrix_para(D)
-    P_idx = prototypes(DM,m, heuristic=heuristic)
+    P_idx = prototype_select(DM,m, heuristic=heuristic)
    logging.info("Embedding graphs.")
    embeddings = np.zeros((len(D), m))