reactome/structure-video-pipeline

 
 

Nextflow Pipeline for CIF/PDB File Processing

This pipeline automates querying a Neo4j database, downloading structures, rendering videos, and uploading files to an S3 bucket. It is managed with Nextflow and requires cypher-shell to interact with the Neo4j database.

Prerequisites

Before running this pipeline, ensure you have the following installed:

  1. Neo4j: Follow the instructions to install the Reactome Neo4j database Docker image, and make sure it is running. If you use Neo4j Desktop instead, change the address in nextflow.config. To run with Docker, launch:
    docker run -p 7474:7474 -p 7687:7687 -e NEO4J_dbms_memory_heap_maxsize=8g reactome/graphdb:latest
  2. Nextflow: Follow the installation instructions on the Nextflow website. Java is required.
  3. Cypher Shell: This is required to run Cypher queries against the Neo4j database. Download it from the Neo4j website.
  4. Molstar: Before installing the Mol* packages, make sure Node.js (17+) is installed. Both the python and python3 commands must also be available; how to install them depends on your system. Then install the following system packages:
sudo apt-get update && sudo apt-get install -y \
    default-jre \
    wget \
    unzip \
    net-tools \
    ffmpeg \
    pkg-config \
    libx11-dev \
    libxi-dev \
    libcairo2-dev \
    libpango1.0-dev \
    libjpeg-dev \
    libgif-dev \
    librsvg2-dev \
    build-essential \
    libglx-dev \
    libgl-dev \
    libgl1-mesa-glx \
    libgl1-mesa-dri \
    xvfb \
    libsm6 \
    libxext6 \
    libgl1-mesa-dev \
    libosmesa6-dev \
    xorg \
    xserver-xorg \
    libxext-dev \
    libglapi-mesa \
    mesa-utils

Then build Mol*:

   cd molstar
   npm install 
   npm run rebuild
   cd ../
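  5. AWS: Install the AWS CLI and configure credentials that can reach the target S3 bucket. For example, with Homebrew:
brew install awscli
aws configure
  6. Before launching the pipeline, you may need to install the requests Python package:
   python3 -m pip install requests
   (if pip refuses to install into a system-managed Python, you may need to add --break-system-packages)

Or use a virtual environment (venv):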

   python3 -m venv 'path/to/venv'
   source path/to/venv/bin/activate
   python3 -m pip install requests 

Pipeline Parameters

  • input: Directory where pre-downloaded AlphaFold .gz files are saved.
  • output: Directory where PDB files will be saved (default: "Cif_files").
  • cypherScript: Path to the Cypher query script (default: "queryCyph.cyp").
  • version: Version of the pipeline (default: "0.1").
  • cyphershell_version: Version of Cypher Shell (default: "5.21.0").
  • address: Address of the Neo4j database (default: "bolt://localhost:7687").
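Nextflow lets any pipeline parameter be overridden on the command line as --name value. The sketch below assembles such a command from the parameters listed above. The helper name build_nextflow_command and the directory name alphafold_gz are illustrative assumptions, not part of the repository.

```python
import shlex

# Defaults copied from the parameter list above; "input" has no documented default.
DEFAULTS = {
    "output": "Cif_files",
    "cypherScript": "queryCyph.cyp",
    "version": "0.1",
    "cyphershell_version": "5.21.0",
    "address": "bolt://localhost:7687",
}


def build_nextflow_command(overrides=None, script="main.nf"):
    """Return the argv list for `nextflow run`, adding `--name value` per override."""
    cmd = ["nextflow", "run", script]
    for name, value in (overrides or {}).items():
        # Reject names that are not documented pipeline parameters.
        if name != "input" and name not in DEFAULTS:
            raise KeyError(f"unknown pipeline parameter: {name}")
        cmd += [f"--{name}", str(value)]
    return cmd


# Hypothetical values, printed as a shell-ready command line.
cmd = build_nextflow_command({"input": "alphafold_gz", "output": "Cif_files"})
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run; parameters that are not overridden keep the values set in nextflow.config.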

Workflow

The pipeline consists of several processes:

1. Neo4j Query (neo4j)

Queries the Neo4j database using a Cypher script and extracts UniProt IDs.

Output: uniProtID.txt
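As a rough illustration of this step, the sketch below turns raw cypher-shell output into a clean list of IDs. It assumes the query was run with --format plain, which prints a header row followed by one quoted value per line, with NULL for missing values. The function name parse_uniprot_ids is hypothetical, and the pipeline's own extraction logic may differ.

```python
def parse_uniprot_ids(raw_output):
    """Extract unique IDs from plain-format cypher-shell output, preserving order."""
    lines = [line.strip() for line in raw_output.splitlines() if line.strip()]
    ids = []
    seen = set()
    for line in lines[1:]:  # skip the header row (the returned column name)
        value = line.strip('"')  # plain format wraps strings in double quotes
        if value == "NULL" or value in seen:
            continue
        seen.add(value)
        ids.append(value)
    return ids


# Sample output for the example query; the accession values are illustrative.
sample = 'n.uniProtID\n"P04637"\n"Q9Y6K9"\nNULL\n"P04637"\n'
print(parse_uniprot_ids(sample))
```

Removing duplicates at this point keeps the download step from fetching the same structure twice.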

2. Search and Download Structure (searchAndDownloadStructure)

Searches for PDB structures using the extracted UniProt IDs and downloads them.

Inputs: uniProtID

Outputs:

  • PDB structure files (${params.output}/${uniProtID}.cif)
  • Additional files in ${params.output}/files/
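For the pre-downloaded AlphaFold archives in the input directory, the file handling could look like the sketch below. It assumes AlphaFold-style file names such as AF-P04637-F1-model_v4.cif.gz and writes the decompressed structure to <output>/<uniProtID>.cif, as described above. The function name extract_structure is hypothetical, and the actual process may also query remote services, which this sketch leaves out.

```python
import gzip
import shutil
from pathlib import Path


def extract_structure(input_dir, output_dir, uniprot_id):
    """Decompress the first matching .cif.gz for uniprot_id into output_dir/<id>.cif.

    Returns the output path, or None when no pre-downloaded file matches.
    """
    # Assumed AlphaFold-style name, e.g. AF-P04637-F1-model_v4.cif.gz
    matches = sorted(Path(input_dir).glob(f"AF-{uniprot_id}-*.cif.gz"))
    if not matches:
        return None
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{uniprot_id}.cif"
    # Stream the decompressed bytes so large structures are not held in memory.
    with gzip.open(matches[0], "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Returning None for a missing archive lets the caller decide whether to skip that ID or fall back to a download.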

3. Molstar Rendering (molstar)

Renders images of PDB structures using Mol*.

Inputs: PDB structure files (*.cif)

Outputs: Rendered images in output_molstar/

4. S3 JSON Upload (s3_json)

Uploads JSON files to an S3 bucket.

Inputs: path_s

Outputs: Files uploaded to s3://download.reactome.org/structures/

5. S3 Videos Upload (s3_videos)

Uploads video files to an S3 bucket.

Inputs: path_m

Outputs: Files uploaded to s3://download.reactome.org/structures/
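Both upload steps copy local files under the same S3 prefix. The sketch below builds an equivalent AWS CLI call, assuming a recursive aws s3 cp. The pipeline may use a different subcommand such as sync, and build_s3_upload_command is a hypothetical helper.

```python
S3_PREFIX = "s3://download.reactome.org/structures/"


def build_s3_upload_command(local_dir, s3_prefix=S3_PREFIX):
    """Return an `aws s3 cp --recursive` argv list copying local_dir under s3_prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    return ["aws", "s3", "cp", str(local_dir), s3_prefix, "--recursive"]


# Example: the command for uploading the Mol* output directory.
print(" ".join(build_s3_upload_command("output_molstar")))
```

Running the resulting list with subprocess.run(cmd, check=True) raises an error if the upload fails, for instance when the configured credentials lack write access to the bucket.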

Running the Pipeline

  1. Configure Nextflow: Ensure Nextflow is properly installed and configured on your system.

  2. Prepare Cypher Script: Create or modify the Cypher script (queryCyph.cyp) that contains the query to extract UniProt IDs.

  3. Run Nextflow: Execute the following command to start the pipeline:

    nextflow run main.nf

    (If a run fails, rerun the command with the -resume option so that tasks that already completed are reused from the cache.)
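The run-then-resume pattern can be wrapped in a small script. The sketch below is one possible wrapper, not part of the repository: run_pipeline is a hypothetical name, and the runner is passed in as an argument so the logic can be exercised without Nextflow installed.

```python
import subprocess


def run_pipeline(run=subprocess.call, max_attempts=2):
    """Run `nextflow run main.nf`, adding -resume on each retry after a failure.

    `run` takes an argv list and returns an exit code; it is injectable for testing.
    Returns the exit code of the last attempt.
    """
    cmd = ["nextflow", "run", "main.nf"]
    code = 1
    for attempt in range(max_attempts):
        # Only retries carry -resume, so the first attempt starts normally.
        argv = cmd + (["-resume"] if attempt > 0 else [])
        code = run(argv)
        if code == 0:
            break
    return code
```

A blind retry only helps with transient failures such as a dropped network connection; a misconfigured database address or missing dependency will fail again until it is fixed.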

Example Cypher Script (queryCyph.cyp)

An example Cypher script to extract UniProt IDs might look like this:

MATCH (n:Protein)
RETURN n.uniProtID
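To try the script by hand before running the whole pipeline, it can be passed to cypher-shell directly. The sketch below builds that command using the -a (address), -f (script file), and --format options of cypher-shell. The helper name and the optional credentials are assumptions; whether the database requires authentication depends on how it was started.

```python
def build_cypher_shell_command(script="queryCyph.cyp",
                               address="bolt://localhost:7687",
                               username=None, password=None):
    """Return the argv list that runs a Cypher script file through cypher-shell."""
    cmd = ["cypher-shell", "-a", address, "--format", "plain"]
    # Credentials are added only when provided (assumption: auth may be disabled).
    if username is not None:
        cmd += ["-u", username]
    if password is not None:
        cmd += ["-p", password]
    cmd += ["-f", script]
    return cmd


print(" ".join(build_cypher_shell_command()))
```

The defaults mirror the cypherScript and address pipeline parameters, so the printed command targets the same database the pipeline would query.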

Notes

  • Ensure that the Neo4j database is running and accessible at the specified address.
  • Make sure AWS CLI is configured with the appropriate permissions to upload files to the specified S3 bucket.
  • The molstar process assumes that webm_renderer.js from the Mol* library is available at the specified path.

For any issues or questions, please refer to the Nextflow documentation or the relevant tool documentation.
