This repository automatically exports and caches publication data from Figshare for LCAS (Lincoln Centre for Autonomous Systems) researchers.
The system:
- Retrieves publication metadata from the Figshare repository
- Processes author information and generates BibTeX entries
- Exports data in CSV and BibTeX formats
- Publishes the results to a Nexus repository for public access
- Python 3.10+
- Figshare API token (required)
This application requires a Figshare API token to function properly. To set up:
- Create a Figshare account: Visit https://figshare.com and create an account
- Generate an API token:
- Log in to Figshare
- Go to Account Settings → Applications
- Create a new personal token
- Copy the token securely
- For local development: Set the environment variable
export FIGSHARE_TOKEN="your_token_here"
- For GitHub Actions: Add the token as a repository secret named FIGSHARE_TOKEN
- Go to repository Settings → Secrets and variables → Actions
- Create a new secret named FIGSHARE_TOKEN
- Paste your Figshare API token
Note: Without a valid API token, requests to the Figshare API will fail with 403 errors.
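As a minimal sketch of how the token might be picked up at runtime, the snippet below reads FIGSHARE_TOKEN from the environment and builds a request header. The helper name `build_auth_headers` is hypothetical and not taken from figshare.py; the `Authorization: token <value>` form follows Figshare's API documentation for personal tokens.

```python
import os


def build_auth_headers(environ=None):
    """Return HTTP headers for Figshare requests (hypothetical helper)."""
    # Allow a custom mapping to be passed in so the function is easy to test.
    environ = os.environ if environ is None else environ
    token = environ.get("FIGSHARE_TOKEN", "").strip()
    if not token:
        # Fail early with a clear message instead of waiting for a 403.
        raise RuntimeError("FIGSHARE_TOKEN is not set or is empty")
    return {"Authorization": f"token {token}"}
```

Failing before any request is sent makes a missing token easier to diagnose than a 403 response from the API.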
# Install dependencies
pip install -r requirements-frozen.txt
# Run with default authors list
python figshare.py
# Run with specific authors
python figshare.py --authors "Marc Hanheide" "Tom Duckett"
# Run with authors from file
python figshare.py --authors-file staff.json
# Force refresh (ignore cache)
python figshare.py --force-refresh
# Adjust rate limiting (default is 1 second delay between requests)
python figshare.py --rate-limit-delay 2.0
# Enable debug logging
python figshare.py --debug
# Custom output filenames
python figshare.py --output my_articles.csv --output-all my_articles_all.csv
- -a, --authors: List of author names to process
- -f, --authors-file: Path to file containing author names (one per line)
- -s, --since: Process only publications since this date (YYYY-MM-DD), default: 2021-01-01
- -o, --output: Output CSV filename for deduplicated publications, default: figshare_articles.csv
- -O, --output-all: Output CSV filename for all publications (with duplicates), default: figshare_articles_all.csv
- --force-refresh: Force refresh data instead of loading from cache
- --rate-limit-delay: Delay in seconds between Figshare API requests, default: 1.0
- --debug: Enable debug logging
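The options listed above could be declared with `argparse` roughly as follows. This is a reconstruction from the documented flags and defaults, not the parser actually used in figshare.py; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Export Figshare publications")
    parser.add_argument("-a", "--authors", nargs="+",
                        help="List of author names to process")
    parser.add_argument("-f", "--authors-file",
                        help="Path to file containing author names")
    parser.add_argument("-s", "--since", default="2021-01-01",
                        help="Process only publications since this date (YYYY-MM-DD)")
    parser.add_argument("-o", "--output", default="figshare_articles.csv",
                        help="Output CSV filename for deduplicated publications")
    parser.add_argument("-O", "--output-all", default="figshare_articles_all.csv",
                        help="Output CSV filename for all publications")
    parser.add_argument("--force-refresh", action="store_true",
                        help="Ignore caches and fetch fresh data")
    parser.add_argument("--rate-limit-delay", type=float, default=1.0,
                        help="Delay in seconds between Figshare API requests")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    return parser
```

For example, `build_parser().parse_args(["--authors", "Marc Hanheide", "--debug"])` yields a namespace with `authors` set to a one-element list and `debug` set to `True`.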
The script generates several output files:
- lcas.bib: Combined BibTeX file with all publications (deduplicated)
- figshare_articles.csv: CSV with deduplicated articles
- figshare_articles_all.csv: CSV with all articles (includes duplicates when multiple authors)
- {author_name}.bib: Individual BibTeX files per author
- {author_name}.csv: Individual CSV files per author
- {author_name}.db: Cached data per author (shelve database)
The application uses several cache files to minimize API calls:
- figshare_cache.pkl: Cached Figshare API responses
- bibtext_cache: Cached BibTeX entries from DOI lookups
- shortdoi_cache: Cached short DOI mappings
- crossref_cache.db: Cached Crossref API responses for DOI guessing
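To illustrate the general idea behind these caches, here is a small sketch of a `shelve`-backed lookup that only calls the network on a cache miss or when a refresh is forced. The names `cached_fetch` and `demo` are hypothetical, and the real script may structure its caches differently.

```python
import os
import shelve
import tempfile


def cached_fetch(cache_path, key, fetch, force_refresh=False):
    """Return the cached value for key; call fetch() on a miss or forced refresh."""
    with shelve.open(cache_path) as cache:
        if force_refresh or key not in cache:
            cache[key] = fetch()
        return cache[key]


def demo():
    """Exercise the cache in a throwaway directory with a fake fetch function."""
    calls = []

    def fake_fetch():
        # Stands in for an API request; counts how often it is invoked.
        calls.append(1)
        return {"fetch_number": len(calls)}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_author")
        first = cached_fetch(path, "articles", fake_fetch)
        second = cached_fetch(path, "articles", fake_fetch)  # served from cache
        forced = cached_fetch(path, "articles", fake_fetch, force_refresh=True)
    return first, second, forced, len(calls)
```

In this sketch the second lookup never reaches `fake_fetch`, which mirrors how a warm cache avoids repeated API calls.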
The workflow runs automatically:
- Weekly on Tuesdays at 02:30 UTC (uses cache by default)
- On push to main branch (uses cache by default)
- On pull requests (uses cache by default)
- Can be manually triggered via workflow_dispatch with optional force refresh
When manually triggering the workflow:
- Go to Actions → figshare-cache workflow
- Click "Run workflow"
- Choose whether to force refresh:
- false (default): Uses cached data, faster and respects rate limits
- true: Ignores cache and fetches fresh data from Figshare API
Note: Force refresh should only be used when you need to ensure the latest data, as it makes many API requests and takes longer to complete.
- Checkout repository
- Restore cache
- Install Python dependencies
- Run Figshare exporter (with or without --force-refresh based on trigger)
- Publish results to Nexus repository
- Upload artifacts
The script includes built-in rate limiting with a 1-second delay between API requests to avoid hitting Figshare API rate limits. This helps ensure reliable operation even with authenticated requests.
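A fixed delay between requests can be sketched as below. This is an illustration under assumptions, not the code in figshare.py: the class name `RateLimiter` is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each API request keeps requests at least `delay` seconds apart, and the first request is never delayed.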
If you encounter 403 errors when accessing the Figshare API:
- Ensure the FIGSHARE_TOKEN environment variable is set
- Verify the token is valid and hasn't expired
- Check that the token has appropriate permissions (read access to public articles)
For detailed information about the 403 error and resolution steps, see FIGSHARE_API_RESEARCH.md.
If no articles are found:
- Check that author names match exactly as they appear in Figshare
- Verify the articles are in the Lincoln repository (https://repository.lincoln.ac.uk)
- Use the --debug flag for detailed logging
The application includes validation for JSON responses. If issues persist:
- Check your internet connection
- Verify that the Figshare API is accessible
- Review logs for specific error messages
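The kind of JSON validation mentioned above might look like the following sketch. The function name `parse_json_response` and the specific checks are assumptions for illustration; the actual validation in figshare.py is not shown here and may differ.

```python
import json


def parse_json_response(status_code, body):
    """Return decoded JSON, or raise ValueError with a readable message."""
    if status_code != 200:
        raise ValueError(f"Unexpected HTTP status {status_code}")
    if not body or not body.strip():
        raise ValueError("Empty response body")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Re-raise with context so the log shows why decoding failed.
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
```

Raising a single exception type with a descriptive message keeps the log output specific enough to tell an HTTP failure apart from a malformed body.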
# Run with a single test author
python figshare.py --authors "Marc Hanheide" --debug
- figshare.py: Main script with FigShare API client and processing logic
- doi2bib: Class for DOI to BibTeX conversion
- FigShare: Class for Figshare API interactions
- Author: Class for author-specific processing
[Add license information here]
For issues or questions, please open an issue in the GitHub repository.