Nextflow Pipeline for Cif/Pdb File Processing

This pipeline automates the process of querying a Neo4j database, downloading structures, rendering videos, and uploading files to an S3 bucket. The pipeline is managed using Nextflow, and requires cypher-shell to interact with the Neo4j database.

Prerequisites

Before running this pipeline, ensure you have the following installed:

Neo4j: Follow the instructions to install the Reactome Neo4j database Docker image, and make sure the container is running. If you use Neo4j Desktop instead, change the address in nextflow.config. To run with Docker:
```
docker run -p 7474:7474 -p 7687:7687 -e NEO4J_dbms_memory_heap_maxsize=8g reactome/graphdb:latest
```
Nextflow: Follow the installation instructions on the Nextflow website. Java is required.
Cypher Shell: This is required to run Cypher queries against the Neo4j database. Download it from the Neo4j website.
Molstar: Before installing the Mol* packages, make sure Node.js (17+) is installed. Both the python and python3 commands must also be available. How to install them depends on your system, so check the documentation for your platform. Then install the following system packages:

sudo apt-get update && sudo apt-get install -y \
    default-jre \
    wget \
    unzip \
    net-tools \
    ffmpeg \
    pkg-config \
    libx11-dev \
    libxi-dev \
    libcairo2-dev \
    libpango1.0-dev \
    libjpeg-dev \
    libgif-dev \
    librsvg2-dev \
    build-essential \
    libglx-dev \
    libgl-dev \
    libgl1-mesa-glx \
    libgl1-mesa-dri \
    xvfb \
    libsm6 \
    libxext6 \
    libgl1-mesa-dev \
    libosmesa6-dev \
    xorg \
    xserver-xorg \
    libxext-dev \
    libglapi-mesa \
    mesa-utils

Then build Mol*:

   cd molstar
   npm install 
   npm run rebuild
   cd ../

AWS: Install the AWS CLI and configure credentials that can reach the S3 bucket. Homebrew is shown here; use your system's package manager otherwise.

brew install awscli
aws configure

Before launching the pipeline, you may need to install the requests Python package.

   python3 -m pip install requests
   (on systems where Python is externally managed, you might need to add --break-system-packages)

or use a virtual environment:

   python3 -m venv 'path/to/venv'
   source path/to/venv/bin/activate
   python3 -m pip install requests
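
A quick way to confirm that the command-line prerequisites are on your PATH before launching is a small check like the one below. It is only a sketch: the tool list is assembled from the prerequisites named above, the executable names are assumed to match the ones on your system, and `missing_tools` is a hypothetical helper that is not part of this repository.

```python
import shutil

# Executables implied by the prerequisites above; adjust to your setup.
REQUIRED_TOOLS = [
    "java", "nextflow", "cypher-shell", "node", "npm", "ffmpeg", "aws", "python3",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The check only confirms that each executable exists; it does not verify versions such as Node.js 17+.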

Pipeline Parameters

input: Directory where pre-downloaded AlphaFold .gz files are saved.
output: Directory where PDB files will be saved (default: "Cif_files").
cypherScript: Path to the Cypher query script (default: "queryCyph.cyp").
version: Version of the pipeline (default: "0.1").
cyphershell_version: Version of Cypher Shell (default: "5.21.0").
address: Address of the Neo4j database (default: "bolt://localhost:7687").
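
Nextflow allows any `params` value to be overridden on the command line with a double-dash option (for example `--output`). As a sketch, the helper below assembles such a command from a dictionary of overrides. `build_nextflow_command` is a hypothetical name, and it is assumed that `main.nf` reads the parameters listed above through `params`.

```python
import shlex


def build_nextflow_command(overrides=None, script="main.nf", resume=False):
    """Build an argument list for `nextflow run`, overriding pipeline params.

    Pipeline parameters take a double dash (--output), while Nextflow's own
    options such as -resume take a single dash.
    """
    command = ["nextflow", "run", script]
    if resume:
        command.append("-resume")
    for name, value in (overrides or {}).items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    cmd = build_nextflow_command(
        {"output": "Cif_files", "address": "bolt://localhost:7687"}
    )
    # Print a shell command that can be copied into a terminal.
    print(shlex.join(cmd))
```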

Workflow

The pipeline consists of several processes:

1. Neo4j Query (`neo4j`)

Queries the Neo4j database using a Cypher script and extracts UniProt IDs.

Output: uniProtID.txt
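
To illustrate how a query result could become a list of IDs, here is a sketch that parses single-column `cypher-shell --format plain` output. The assumptions are that the first line of the output is the column header, that string values are wrapped in double quotes, and that missing values appear as `NULL`. The function name is hypothetical, and the pipeline's actual extraction step may differ.

```python
def parse_uniprot_ids(plain_output):
    """Extract unique IDs, in order, from single-column plain-format output."""
    ids = []
    seen = set()
    for line in plain_output.splitlines()[1:]:  # skip the header row
        value = line.strip().strip('"')
        if not value or value == "NULL":
            continue  # ignore blank lines and nodes without an ID
        if value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


if __name__ == "__main__":
    sample = 'n.uniProtID\n"P04637"\n"P38398"\nNULL\n"P04637"\n'
    print("\n".join(parse_uniprot_ids(sample)))
```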

2. Search and Download Structure (`searchAndDownloadStructure`)

Searches for PDB structures using the extracted UniProt IDs and downloads them.

Inputs: uniProtID

Outputs:

PDB structure files (${params.output}/${uniProtID}.cif)
Additional files in ${params.output}/files/
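
As an illustration of the output layout, the sketch below unpacks one pre-downloaded `.gz` structure into `<output>/<uniProtID>.cif`. This is not the code in `search_pdb.py`; the function name is hypothetical, and matching an archive to its UniProt ID is left to the caller.

```python
import gzip
import shutil
from pathlib import Path


def unpack_structure(gz_path, output_dir, uniprot_id):
    """Decompress `gz_path` to `<output_dir>/<uniprot_id>.cif` and return the path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{uniprot_id}.cif"
    with gzip.open(gz_path, "rb") as source, open(target, "wb") as destination:
        # Stream the data so large structures are not loaded into memory at once.
        shutil.copyfileobj(source, destination)
    return target
```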

3. Molstar Rendering (`molstar`)

Renders images of PDB structures using Mol*.

Inputs: PDB structure files (*.cif)

Outputs: Rendered images in output_molstar/

4. S3 JSON Upload (`s3_json`)

Uploads JSON files to an S3 bucket.

Inputs: path_s

Outputs: Files uploaded to s3://download.reactome.org/structures/

5. S3 Videos Upload (`s3_videos`)

Uploads video files to an S3 bucket.

Inputs: path_m

Outputs: Files uploaded to s3://download.reactome.org/structures/
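
Both upload steps amount to copying files to the same S3 prefix. The sketch below builds an `aws s3 cp` command for a file or a directory. `--recursive` and `--dryrun` are standard AWS CLI options, while the function names and local paths are hypothetical; the processes in `main.nf` may call the CLI differently.

```python
import subprocess

S3_DESTINATION = "s3://download.reactome.org/structures/"


def build_s3_upload_command(local_path, recursive=False, dry_run=True):
    """Return the `aws s3 cp` argument list for uploading `local_path`."""
    command = ["aws", "s3", "cp", str(local_path), S3_DESTINATION]
    if recursive:
        command.append("--recursive")  # needed when local_path is a directory
    if dry_run:
        command.append("--dryrun")  # list what would be copied without uploading
    return command


def upload(local_path, recursive=False, dry_run=True):
    """Run the upload; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_s3_upload_command(local_path, recursive, dry_run), check=True)
```

Defaulting to a dry run means an accidental call does not write to the bucket; pass `dry_run=False` once the listed files look right.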

Running the Pipeline

Configure Nextflow: Ensure Nextflow is properly installed and configured on your system.
Prepare Cypher Script: Create or modify the Cypher script (queryCyph.cyp) that contains the query to extract UniProt IDs.
Run Nextflow: Execute the following command to start the pipeline:
```
nextflow run main.nf
```
(If a run fails, rerun with `nextflow run main.nf -resume` to reuse the results of steps that already completed.)

Example Cypher Script (`queryCyph.cyp`)

An example Cypher script to extract UniProt IDs might look like this:

MATCH (n:Protein)
RETURN n.uniProtID

Notes

Ensure that the Neo4j database is running and accessible at the specified address.
Make sure AWS CLI is configured with the appropriate permissions to upload files to the specified S3 bucket.
The molstar process assumes that webm_renderer.js from the Mol* library is available at the specified path.
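
The first note can be checked before a run with a few lines of Python. This sketch only tests that a TCP connection to the Bolt port can be opened; it does not authenticate or run a query. The function names are hypothetical, and the default address is the one listed under Pipeline Parameters.

```python
import socket
from urllib.parse import urlparse


def parse_bolt_address(address):
    """Split an address such as bolt://localhost:7687 into (host, port)."""
    parsed = urlparse(address)
    if parsed.hostname is None:
        raise ValueError(f"Cannot read a host from {address!r}")
    return parsed.hostname, parsed.port or 7687  # 7687 is the default Bolt port


def neo4j_is_reachable(address="bolt://localhost:7687", timeout=3.0):
    """Return True if a TCP connection to the Bolt port can be opened."""
    host, port = parse_bolt_address(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```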

For any issues or questions, please refer to the Nextflow documentation or the relevant tool documentation.
