LCAS EPrint Cache

This repository automatically exports and caches publication data from Figshare for LCAS (Lincoln Centre for Autonomous Systems) researchers.

Overview

The system:

Retrieves publication metadata from the Figshare repository
Processes author information and generates BibTeX entries
Exports data in CSV and BibTeX formats
Publishes the results to the Nexus repository for public access

Setup

Prerequisites

Python 3.10+
Figshare API token (required)

Configuration

Figshare API Token

This application requires a Figshare API token. To set one up:

Create a Figshare account: Visit https://figshare.com and create an account
Generate an API token:
- Log in to Figshare
- Go to Account Settings → Applications
- Create a new personal token
- Copy the token securely
For local development: Set the environment variable
```
export FIGSHARE_TOKEN="your_token_here"
```
For GitHub Actions: Add the token as a repository secret named FIGSHARE_TOKEN
- Go to repository Settings → Secrets and variables → Actions
- Create a new secret named FIGSHARE_TOKEN
- Paste your Figshare API token

Note: Without a valid API token, requests to the Figshare API will fail with 403 errors.
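
As a minimal sketch of how the token might be consumed, the snippet below reads `FIGSHARE_TOKEN` from the environment and fails early when it is missing. The helper name `build_auth_headers` is hypothetical and not taken from this repository. The `Authorization: token ...` header format is the one described in the Figshare API documentation.

```python
import os


def build_auth_headers(environ=None):
    """Return HTTP headers for the Figshare API, or raise if no token is set.

    Hypothetical helper for illustration; the repository's own code may differ.
    """
    # Allow a plain dict to be passed in so the function is easy to test.
    environ = os.environ if environ is None else environ
    token = environ.get("FIGSHARE_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "FIGSHARE_TOKEN is not set; Figshare API requests would fail with 403."
        )
    return {"Authorization": f"token {token}"}
```

Checking for the token before any request is made turns a confusing 403 response into an explicit configuration error.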

Installation

```
# Install dependencies
pip install -r requirements-frozen.txt
```

Usage

Command Line

```
# Run with default authors list
python figshare.py

# Run with specific authors
python figshare.py --authors "Marc Hanheide" "Tom Duckett"

# Run with authors from file
python figshare.py --authors-file staff.json

# Force refresh (ignore cache)
python figshare.py --force-refresh

# Adjust rate limiting (default is 1 second delay between requests)
python figshare.py --rate-limit-delay 2.0

# Enable debug logging
python figshare.py --debug

# Custom output filenames
python figshare.py --output my_articles.csv --output-all my_articles_all.csv
```

Arguments

-a, --authors: List of author names to process
-f, --authors-file: Path to file containing author names (one per line)
-s, --since: Process only publications since this date (YYYY-MM-DD), default: 2021-01-01
-o, --output: Output CSV filename for deduplicated publications, default: figshare_articles.csv
-O, --output-all: Output CSV filename for all publications (with duplicates), default: figshare_articles_all.csv
--force-refresh: Force refresh data instead of loading from cache
--rate-limit-delay: Delay in seconds between Figshare API requests, default: 1.0
--debug: Enable debug logging
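
The arguments listed above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the list, not the script's actual parser: the function name `build_parser`, the help strings, and the choice of `nargs="+"` for `--authors` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Export Figshare publications")
    parser.add_argument("-a", "--authors", nargs="+",
                        help="List of author names to process")
    parser.add_argument("-f", "--authors-file",
                        help="Path to file containing author names")
    parser.add_argument("-s", "--since", default="2021-01-01",
                        help="Process only publications since this date (YYYY-MM-DD)")
    parser.add_argument("-o", "--output", default="figshare_articles.csv",
                        help="Output CSV filename for deduplicated publications")
    parser.add_argument("-O", "--output-all", default="figshare_articles_all.csv",
                        help="Output CSV filename for all publications")
    parser.add_argument("--force-refresh", action="store_true",
                        help="Force refresh data instead of loading from cache")
    parser.add_argument("--rate-limit-delay", type=float, default=1.0,
                        help="Delay in seconds between Figshare API requests")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(
    ["--authors", "Marc Hanheide", "Tom Duckett", "--rate-limit-delay", "2.0"]
)
print(args.authors, args.since, args.rate_limit_delay)
```

Options that are not passed keep the defaults documented above, so `args.since` is `2021-01-01` in this run.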

Output Files

The script generates several output files:

lcas.bib: Combined BibTeX file with all publications (deduplicated)
figshare_articles.csv: CSV with deduplicated articles
figshare_articles_all.csv: CSV with all articles (includes duplicates when a publication is shared by multiple authors)
{author_name}.bib: Individual BibTeX files per author
{author_name}.csv: Individual CSV files per author
{author_name}.db: Cached data per author (shelve database)
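
The difference between the deduplicated and the "all" CSV files can be illustrated with a small order-preserving deduplication sketch. The function name `deduplicate`, the `id` key, and the sample rows are made up for illustration; the script itself may identify duplicates by a different field (for example a DOI).

```python
def deduplicate(articles, key="id"):
    """Keep the first occurrence of each article, preserving input order.

    `key` is an assumed identifier field, not necessarily the one the
    real script uses.
    """
    seen = set()
    unique = []
    for article in articles:
        identifier = article[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(article)
    return unique


# Made-up rows: the first two describe one publication found under two authors.
all_rows = [
    {"id": 1, "author": "Marc Hanheide", "title": "Paper A"},
    {"id": 1, "author": "Tom Duckett", "title": "Paper A"},
    {"id": 2, "author": "Tom Duckett", "title": "Paper B"},
]
unique_rows = deduplicate(all_rows)
print(len(all_rows), len(unique_rows))
```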

Cache Files

The application uses several cache files to minimize API calls:

figshare_cache.pkl: Cached Figshare API responses
bibtext_cache: Cached BibTeX entries from DOI lookups
shortdoi_cache: Cached short DOI mappings
crossref_cache.db: Cached Crossref API responses for DOI guessing
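
The caching behaviour can be sketched with the standard-library `shelve` module, which the per-author `.db` files are described as using. The names `cached_fetch`, `fake_fetch`, and `demo_cache` are hypothetical, and the real script may organise its caches differently.

```python
import os
import shelve
import tempfile


def cached_fetch(cache_path, key, fetch, force_refresh=False):
    """Return the cached value for `key`, calling `fetch()` only when needed."""
    with shelve.open(cache_path) as cache:
        if force_refresh or key not in cache:
            cache[key] = fetch()  # cache miss or forced refresh: store a fresh value
        return cache[key]


calls = []


def fake_fetch():
    """Stand-in for an API request; records how often it is invoked."""
    calls.append(1)
    return {"articles": ["example"]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_cache")
    cached_fetch(path, "Marc Hanheide", fake_fetch)  # miss: calls fake_fetch
    cached_fetch(path, "Marc Hanheide", fake_fetch)  # hit: served from the cache
    cached_fetch(path, "Marc Hanheide", fake_fetch, force_refresh=True)  # refetch

print(len(calls))
```

The `force_refresh` parameter plays the role of the `--force-refresh` flag: the cached entry is ignored and overwritten with fresh data.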

GitHub Actions Workflow

The workflow runs automatically:

Weekly on Tuesdays at 02:30 UTC (uses cache by default)
On push to main branch (uses cache by default)
On pull requests (uses cache by default)
Can be manually triggered via workflow_dispatch with optional force refresh

Manual Workflow Trigger

When manually triggering the workflow:

Go to Actions → figshare-cache workflow
Click "Run workflow"
Choose whether to force refresh:
- false (default): Uses cached data, faster and respects rate limits
- true: Ignores cache and fetches fresh data from Figshare API

Note: Force refresh should only be used when you need to ensure the latest data, as it makes many API requests and takes longer to complete.

Workflow Steps

Checkout repository
Restore cache
Install Python dependencies
Run Figshare exporter (with or without --force-refresh based on trigger)
Publish results to Nexus repository
Upload artifacts

Rate Limiting

The script waits between Figshare API requests to avoid hitting Figshare API rate limits. The delay is 1 second by default and can be changed with --rate-limit-delay. This helps keep runs reliable, including when requests are authenticated.
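
A fixed delay between requests can be enforced with a small helper like the one below. This is a sketch of the general technique, not the script's implementation; the class name `RateLimiter` and its injectable `clock` and `sleep` parameters are assumptions made to keep the example testable.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the delay that has not yet elapsed.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(delay=0.01)
for _ in range(3):
    limiter.wait()  # an API request would follow each wait
```

Measuring elapsed time with a monotonic clock means that slow requests are not penalised twice: if a request already took longer than the delay, the next call proceeds immediately.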

Troubleshooting

403 Forbidden Errors

If you encounter 403 errors when accessing the Figshare API:

Ensure the FIGSHARE_TOKEN environment variable is set
Verify the token is valid and hasn't expired
Check that the token has appropriate permissions (read access to public articles)

For detailed information about the 403 error and resolution steps, see FIGSHARE_API_RESEARCH.md.

Empty Results

If no articles are found:

Check that author names exactly match the names as they appear in Figshare
Verify the articles are in the Lincoln repository (https://repository.lincoln.ac.uk)
Use --debug flag for detailed logging

JSON Decode Errors

The application includes validation for JSON responses. If issues persist:

Check your internet connection
Verify Figshare API is accessible
Review logs for specific error messages
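
Validating a response body before using it might look like the following sketch. The function name `parse_json_response` is hypothetical; the repository's own validation may check more than this (for example the HTTP status code).

```python
import json


def parse_json_response(body):
    """Return the decoded JSON value, or None if `body` is empty or invalid."""
    if not body or not body.strip():
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # Typical cause: an HTML error page returned instead of JSON.
        return None


print(parse_json_response('[{"id": 1}]'))
print(parse_json_response("<html>Forbidden</html>"))
```

Returning `None` for an unparsable body lets the caller log the problem and skip the response instead of crashing partway through a run.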

Development

Running Tests

```
# Run with a single test author
python figshare.py --authors "Marc Hanheide" --debug
```

Code Structure

figshare.py: Main script with FigShare API client and processing logic
doi2bib: Class for DOI to BibTeX conversion
FigShare: Class for Figshare API interactions
Author: Class for author-specific processing

License

[Add license information here]

Contact

For issues or questions, please open an issue in the GitHub repository.
