This pipeline automates the process of querying a Neo4j database, downloading structures, rendering videos, and
uploading files to an S3 bucket. The pipeline is managed using Nextflow, and requires cypher-shell to interact with
the Neo4j database.
Before running this pipeline, ensure you have the following installed:
- Neo4j: Follow the instructions to install the Reactome Neo4j database with Docker, and make sure it is
running before you start the pipeline. If you use Neo4j Desktop instead, change the address in nextflow.config.
To run the database with Docker:
docker run -p 7474:7474 -p 7687:7687 -e NEO4J_dbms_memory_heap_maxsize=8g reactome/graphdb:latest
- Nextflow: Follow the installation instructions on the Nextflow website. Java is required.
- Cypher Shell: This is required to run Cypher queries against the Neo4j database. Download it from the Neo4j website.
- Molstar: Before installing the Mol* packages, make sure Node.js (17+) is installed. Both the python and python3 commands must also be available. How to install them depends on your system. Then install the following system packages:
sudo apt-get update && sudo apt-get install -y \
default-jre \
wget \
unzip \
net-tools \
ffmpeg \
pkg-config \
libx11-dev \
libxi-dev \
libcairo2-dev \
libpango1.0-dev \
libjpeg-dev \
libgif-dev \
librsvg2-dev \
build-essential \
libglx-dev \
libgl-dev \
libgl1-mesa-glx \
libgl1-mesa-dri \
xvfb \
libsm6 \
libxext6 \
libgl1-mesa-dev \
libosmesa6-dev \
xorg \
xserver-xorg \
libxext-dev \
libglapi-mesa \
mesa-utils
Then you can run:
cd molstar
npm install
npm run rebuild
cd ../
- AWS: Make sure you can connect to the target AWS S3 bucket.
brew install awscli
aws configure
- Before launching the pipeline, you may need to install the requests Python package.
python3 -m pip install requests
(you might need to add --break-system-packages), or use a virtual environment:
python3 -m venv 'path/to/venv'
source path/to/venv/bin/activate
python3 -m pip install requests
The pipeline accepts the following parameters:
- input: Directory where pre-downloaded AlphaFold .gz files are saved.
- output: Directory where PDB files will be saved (default: "Cif_files").
- cypherScript: Path to the Cypher query script (default: "queryCyph.cyp").
- version: Version of the pipeline (default: "0.1").
- cyphershell_version: Version of Cypher Shell (default: "5.21.0").
- address: Address of the Neo4j database (default: "bolt://localhost:7687").
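As a rough illustration of how these parameters might be overridden from a script, the sketch below turns a dictionary of overrides into a Nextflow command line. It assumes the parameters are declared under `params` in nextflow.config, so that Nextflow's `--name value` syntax applies. The helper name `nextflow_args` is hypothetical and not part of the pipeline.

```python
# Hypothetical helper, not part of the pipeline.
# Defaults are copied from the parameter list; "input" has no documented default.
DEFAULTS = {
    "output": "Cif_files",
    "cypherScript": "queryCyph.cyp",
    "version": "0.1",
    "cyphershell_version": "5.21.0",
    "address": "bolt://localhost:7687",
}


def nextflow_args(overrides):
    """Build a `nextflow run` argument list that overrides selected params."""
    known = set(DEFAULTS) | {"input"}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    args = ["nextflow", "run", "main.nf"]
    for name, value in overrides.items():
        # Skip values that match the documented default.
        if DEFAULTS.get(name) != value:
            args += [f"--{name}", str(value)]
    return args


if __name__ == "__main__":
    # Example: point the pipeline at a Neo4j instance on another (made-up) host.
    print(" ".join(nextflow_args({"address": "bolt://neo4j.example.org:7687"})))
```

Rejecting unknown names early catches typos such as `cypherscript` before Nextflow silently ignores them.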
The pipeline consists of several processes:
Queries the Neo4j database using a Cypher script and extracts UniProt IDs.
Output: uniProtID.txt
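The extraction step can be pictured as a small text filter over the raw query output. The sketch below assumes cypher-shell was run in its plain output format, which prints a header line followed by one double-quoted string per row. The function name `parse_uniprot_ids` and the handling of `NULL` rows are assumptions for illustration, not the pipeline's actual code.

```python
def parse_uniprot_ids(raw, header="n.uniProtID"):
    """Return unique UniProt IDs from plain-format query output, in first-seen order."""
    ids = []
    seen = set()
    for line in raw.splitlines():
        value = line.strip().strip('"')
        # Skip blank lines, the column header, and rows without an ID.
        if not value or value == header or value.upper() == "NULL":
            continue
        if value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


if __name__ == "__main__":
    sample = 'n.uniProtID\n"P12345"\n"Q9Y6K9"\nNULL\n"P12345"\n'
    # One ID per line, as it could be written to uniProtID.txt.
    print("\n".join(parse_uniprot_ids(sample)))
```

Removing duplicates here avoids fetching the same structure twice in the next process.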
Searches for PDB structures using the extracted UniProt IDs and downloads them.
Inputs: uniProtID
Outputs:
- PDB structure files (${params.output}/${uniProtID}.cif)
- Additional files in ${params.output}/files/
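To make the output layout concrete, here is a minimal sketch of how a pre-downloaded, gzip-compressed structure file might be unpacked to the documented `${params.output}/${uniProtID}.cif` location. The function name `unpack_structure` and the idea that each archive holds a single .cif file are assumptions; the pipeline's own download logic may differ.

```python
import gzip
import shutil
from pathlib import Path


def unpack_structure(gz_path, uniprot_id, output_dir="Cif_files"):
    """Decompress one .gz archive to <output_dir>/<uniprot_id>.cif and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{uniprot_id}.cif"
    # Stream the data so large structures are never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Naming the result after the UniProt ID keeps the file name aligned with the IDs produced by the query step.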
Renders images of PDB structures using Mol*.
Inputs: PDB structure files (*.cif)
Outputs: Rendered images in output_molstar/
Uploads JSON files to an S3 bucket.
Inputs: path_s
Outputs: Files uploaded to s3://download.reactome.org/structures/
Uploads video files to an S3 bucket.
Inputs: path_m
Outputs: Files uploaded to s3://download.reactome.org/structures/
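Both upload processes amount to copying local files to the same S3 prefix. The sketch below builds the corresponding `aws s3 cp` commands without running them, so they can be inspected first. The helper name `upload_commands` and the `.webm` extension for videos are assumptions; only the destination prefix comes from the description above.

```python
from pathlib import Path

S3_PREFIX = "s3://download.reactome.org/structures/"


def upload_commands(directory, suffix):
    """Return one `aws s3 cp` command per file in `directory` ending with `suffix`."""
    commands = []
    for path in sorted(Path(directory).glob(f"*{suffix}")):
        # Keep the local file name as the object name under the prefix.
        commands.append(["aws", "s3", "cp", str(path), S3_PREFIX + path.name])
    return commands


if __name__ == "__main__":
    # Print the commands for JSON and (assumed) .webm files in the current directory.
    for cmd in upload_commands(".", ".json") + upload_commands(".", ".webm"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run` once the AWS CLI is configured with write access to the bucket.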
- Configure Nextflow: Ensure Nextflow is properly installed and configured on your system.
- Prepare Cypher Script: Create or modify the Cypher script (queryCyph.cyp) that contains the query to extract UniProt IDs.
- Run Nextflow: Execute the following command to start the pipeline:
nextflow run main.nf
If a run fails, fix the problem and rerun with the -resume option so that Nextflow reuses the cached results of completed processes.
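The run-then-resume pattern can also be scripted. The following is a minimal sketch, assuming `nextflow` is on the PATH and main.nf is in the current directory. The wrapper name `run_pipeline` and its injectable `runner` argument are hypothetical conveniences for testing.

```python
import subprocess


def run_pipeline(runner=subprocess.call):
    """Run the pipeline once; on a non-zero exit code, retry once with -resume."""
    base = ["nextflow", "run", "main.nf"]
    code = runner(base)
    if code != 0:
        # -resume reuses cached results of processes that already completed.
        code = runner(base + ["-resume"])
    return code


if __name__ == "__main__":
    raise SystemExit(run_pipeline())
```

A blind retry only helps with transient failures such as a dropped connection; a configuration error will fail again on the second attempt.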
An example Cypher script to extract UniProt IDs might look like this:
MATCH (n:Protein)
RETURN n.uniProtID
- Ensure that the Neo4j database is running and accessible at the specified address.
- Make sure AWS CLI is configured with the appropriate permissions to upload files to the specified S3 bucket.
- The molstar process assumes that webm_renderer.js from the Mol* library is available at the specified path.
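The checks in these notes can be partly automated before launching a run. The sketch below looks for command-line tools on the PATH and tests whether the Bolt port accepts TCP connections. The tool list and all function names are assumptions for illustration, and an open port does not prove that Neo4j itself is healthy.

```python
import shutil
import socket
from urllib.parse import urlsplit

# Assumed tool list; adjust to match how the tools are installed locally.
REQUIRED_TOOLS = ["nextflow", "cypher-shell", "aws", "node", "ffmpeg"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def bolt_endpoint(address="bolt://localhost:7687"):
    """Split a bolt:// address into (host, port), defaulting to port 7687."""
    parts = urlsplit(address)
    return parts.hostname, parts.port or 7687


def neo4j_reachable(address="bolt://localhost:7687", timeout=3.0):
    """Return True if a TCP connection to the Bolt endpoint succeeds."""
    host, port = bolt_endpoint(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    print("Neo4j reachable:", neo4j_reachable())
```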
For any issues or questions, please refer to the Nextflow documentation or the relevant tool documentation.