hcvinzo/PlayWrightScraper

Analytics Report Automation

A .NET 8 console application that automates browser workflows using Playwright to log in to analytics websites, apply report filters, download CSV files, and return file paths for downstream processing.

Features

  • Browser Automation: Uses Playwright with Chromium for reliable web automation
  • Authentication: Handles login workflows with configurable selectors
  • Filter Management: Applies multiple report filters dynamically
  • CSV Download: Downloads reports and saves them to specified locations
  • Error Handling: Implements retry logic and comprehensive error handling
  • Flexible Configuration: Supports both config files and environment variables
  • Cross-Platform: Runs on Windows, macOS, and Linux with .NET 8

Quick Start

1. Setup and Installation

# Clone or download the project files
# Navigate to project directory

# Run the setup script (Windows PowerShell)
.\setup.ps1

# Or manually:
dotnet restore
dotnet build --configuration Release
pwsh bin/Release/net8.0/playwright.ps1 install

2. Configuration

Option A: Using config.json

{
  "websiteUrl": "https://your-analytics-site.com/login",
  "username": "your-username",
  "password": "your-password",
  "downloadPath": "C:\\Reports\\Downloads",
  "maxRetries": 3,
  "timeoutSeconds": 30,
  "headlessMode": true,
  "filters": {
    "dateRange": "last-30-days",
    "reportType": "summary"
  }
}

Option B: Using Environment Variables

# Windows (Command Prompt; in PowerShell use $env:NAME = "value" instead of set)
set ANALYTICS_URL=https://your-analytics-site.com
set ANALYTICS_USERNAME=your-username
set ANALYTICS_PASSWORD=your-password
set DOWNLOAD_PATH=C:\Reports
set REPORT_FILTERS={"dateRange":"last-30-days","reportType":"summary"}

# Linux/macOS
export ANALYTICS_URL=https://your-analytics-site.com
export ANALYTICS_USERNAME=your-username
export ANALYTICS_PASSWORD=your-password
export DOWNLOAD_PATH=/home/user/reports
export REPORT_FILTERS='{"dateRange":"last-30-days","reportType":"summary"}'
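Because REPORT_FILTERS carries JSON inside an environment variable, shell quoting mistakes are easy to make. The Python sketch below checks the value before the tool is launched. It assumes the variable should decode to a flat JSON object, as in the examples above; parse_report_filters is a hypothetical helper and not part of the application.

```python
import json
from typing import Optional


def parse_report_filters(raw: Optional[str]) -> dict:
    """Decode a REPORT_FILTERS value into a dict of filter name -> value.

    Assumption: the application expects a flat JSON object. An unset or
    empty variable is treated here as "no filters".
    """
    if not raw:
        return {}
    try:
        filters = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"REPORT_FILTERS is not valid JSON: {exc}") from exc
    if not isinstance(filters, dict):
        raise ValueError("REPORT_FILTERS must be a JSON object")
    # Normalise keys and values to strings so callers see a single type.
    return {str(key): str(value) for key, value in filters.items()}
```

A wrapper script could call parse_report_filters(os.environ.get("REPORT_FILTERS")) and stop early with a clear message if it raises ValueError.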

3. Running the Application

# Using config file
dotnet run --project . -- config.json

# Using environment variables
dotnet run --project .

# Published executable
.\publish\AnalyticsReportAutomation.exe config.json

Configuration Options

| Property | Description | Default | Required |
|---|---|---|---|
| websiteUrl | Login page URL | - | Yes |
| username | Login username/email | - | Yes |
| password | Login password | - | Yes |
| downloadPath | Directory to save CSV files | Temp directory | No |
| maxRetries | Number of retry attempts | 3 | No |
| timeoutSeconds | Element timeout in seconds | 30 | No |
| headlessMode | Run browser in headless mode | true | No |
| reportsPageUrl | Direct URL to reports page | - | No |
| filters | Key-value pairs of filters to apply | {} | No |
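The required fields and defaults in the table above can be checked before a run. The following Python sketch is one way to do that from a wrapper script; it is not the application's own loader. load_config, REQUIRED_KEYS, and DEFAULTS are hypothetical names, and mapping "Temp directory" to the system temp directory is an assumption.

```python
import copy
import json
import tempfile

# Settings the table marks as required.
REQUIRED_KEYS = ("websiteUrl", "username", "password")

# Defaults taken from the table. Using the system temp directory for
# downloadPath is an assumption about what "Temp directory" means.
DEFAULTS = {
    "downloadPath": tempfile.gettempdir(),
    "maxRetries": 3,
    "timeoutSeconds": 30,
    "headlessMode": True,
    "filters": {},
}


def load_config(text: str) -> dict:
    """Parse config JSON, check required keys, and fill in defaults."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    # Copy the defaults so the shared "filters" dict is never mutated.
    merged = copy.deepcopy(DEFAULTS)
    merged.update(config)
    return merged
```

Calling load_config on the contents of config.json returns the effective settings or raises ValueError naming the missing keys.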

Advanced Selector Configuration

Customize selectors for your specific website:

{
  "usernameSelector": "input[name='username']",
  "passwordSelector": "input[name='password']", 
  "loginButtonSelector": "button[type='submit']",
  "filterContainerSelector": ".filter-panel",
  "downloadButtonSelector": ".export-csv"
}

Output

The application outputs the absolute path of the downloaded CSV file to stdout on success:

C:\Reports\Downloads\analytics_report_20241218_143052.csv

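Error messages are written to stderr, which keeps them separate from the path on stdout and makes it easy to integrate with other scripts and systems. The integration examples below treat exit code 0 as success and any other exit code as failure.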
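Downstream scripts sometimes need the time a report was generated. The Python sketch below reads it from the file name, assuming every report follows the pattern of the single example above (a trailing _YYYYMMDD_HHMMSS before .csv). That pattern is inferred from the example and may not hold for all outputs; report_timestamp is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed naming pattern: ..._YYYYMMDD_HHMMSS.csv (inferred from one example).
_STAMP = re.compile(r"_(\d{8}_\d{6})\.csv$", re.IGNORECASE)


def report_timestamp(csv_path: str) -> datetime:
    """Extract the timestamp embedded in a report file name."""
    match = _STAMP.search(csv_path.strip())
    if match is None:
        raise ValueError(f"No timestamp found in file name: {csv_path!r}")
    return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
```

The regular expression works on the raw string, so it handles Windows and POSIX paths alike.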

Error Handling

  • Retry Logic: Automatically retries failed operations up to maxRetries times
  • Progressive Delays: Implements exponential backoff between retries
  • Graceful Failures: Provides detailed error messages for troubleshooting
  • Timeout Management: Configurable timeouts for all operations
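The retry behaviour listed above can be illustrated with a short Python sketch of retrying with exponential backoff. It shows the general technique only and is not the application's code. The base delay of one second, the doubling factor, and the reading of maxRetries as "retries after the first attempt" are all assumptions; retry_with_backoff is a hypothetical name.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying up to max_retries times after a failure.

    The wait doubles after each failed attempt: base_delay, then 2x, 4x, ...
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable when max_retries >= 0")
```

The sleep parameter is injectable so the schedule can be checked in tests without actually waiting.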

Integration Examples

PowerShell Integration

$csvPath = & ".\AnalyticsReportAutomation.exe" "config.json"
if ($LASTEXITCODE -eq 0) {
    Write-Host "Report downloaded: $csvPath"
    # Process the CSV file
    Import-Csv $csvPath | ForEach-Object { <# processing logic #> }
} else {
    Write-Error "Report automation failed"
}

Batch File Integration

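@echo off
AnalyticsReportAutomation.exe config.json > output.txt 2> error.txt
if %errorlevel% neq 0 (
    echo Error occurred. Check error.txt
    exit /b 1
)
REM Read the path outside the parenthesized block: %CSV_PATH% inside a block
REM is expanded when the block is parsed, before set /p has run.
set /p CSV_PATH=<output.txt
echo Success: %CSV_PATH%
REM Continue with processing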

Python Integration

import subprocess
import sys

import pandas as pd

try:
    result = subprocess.run(
        ['AnalyticsReportAutomation.exe', 'config.json'],
        capture_output=True, text=True, check=True,
    )
except subprocess.CalledProcessError as e:
    # The tool writes its error messages to stderr
    print(f"Error: {e.stderr}", file=sys.stderr)
    sys.exit(1)

# On success, stdout holds the absolute path of the downloaded CSV
csv_path = result.stdout.strip()
print(f"CSV downloaded: {csv_path}")

# Process the CSV
df = pd.read_csv(csv_path)
# Continue processing...

Troubleshooting

Common Issues

  1. Login Failures

    • Verify credentials and website URL
    • Check if the site uses CAPTCHA or 2FA
    • Try running with headlessMode: false to debug visually
  2. Element Not Found

    • Update selectors in config for your specific website
    • The application tries multiple common selector patterns
  3. Download Issues

    • Ensure download directory has write permissions
    • Check if the website requires specific filters before allowing downloads
  4. Timeout Issues

    • Increase timeoutSeconds for slower websites
    • Check network connectivity and website responsiveness

Debug Mode

Run with headlessMode: false to see the browser in action:

{
  "headlessMode": false,
  "timeoutSeconds": 60
}
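To avoid editing the main config by hand for each debugging session, a debug variant can be derived from it. The Python sketch below applies the two overrides shown above to an existing config; make_debug_config is a hypothetical helper, and raising the timeout to at least 60 seconds simply mirrors the example values.

```python
import json


def make_debug_config(config_text: str) -> str:
    """Return a copy of the config JSON with debug-friendly overrides."""
    config = json.loads(config_text)
    # Show the browser window so each step can be watched.
    config["headlessMode"] = False
    # Allow at least 60 seconds per wait; keep a larger existing value.
    config["timeoutSeconds"] = max(int(config.get("timeoutSeconds", 30)), 60)
    return json.dumps(config, indent=2)
```

The returned text could be written to a separate file (for example a hypothetical config.debug.json) and passed to the application in place of config.json.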

Security Considerations

  • Store credentials securely (for example in Azure Key Vault or environment variables) rather than as plaintext in a config.json that is committed or shared
  • Use least-privilege accounts for automation
  • Regularly rotate automation account passwords
  • Monitor for unusual authentication patterns
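One way to keep the password out of config.json is to pass credentials to the tool through its environment, using the variable names from the Configuration section. The Python sketch below assumes the application falls back to environment variables when started without a config file argument, as the Quick Start suggests. build_child_env and run_report are hypothetical helpers, and where the secret comes from (a vault, a prompt) is left to the caller.

```python
import os
import subprocess
from typing import Mapping, Optional


def build_child_env(
    username: str,
    password: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Copy the environment and add credentials for the child process only."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANALYTICS_USERNAME"] = username
    env["ANALYTICS_PASSWORD"] = password
    return env


def run_report(executable: str, env: Mapping[str, str]) -> str:
    """Run the tool without a config file and return the CSV path it prints."""
    result = subprocess.run(
        [executable], env=dict(env), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

The credentials exist only in the child process environment, so they are not written to disk and the parent's own environment is left unchanged.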

Deployment

Self-Contained Deployment

dotnet publish -c Release -r win-x64 --self-contained -o publish

Framework-Dependent Deployment

dotnet publish -c Release -o publish

Docker Deployment

FROM mcr.microsoft.com/dotnet/runtime:8.0
WORKDIR /app
COPY publish/ .

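# Install Playwright's OS dependencies and the Chromium browser.
# Note: the dotnet/runtime image does not include PowerShell, so pwsh must be
# installed first (or a base image that already provides it used instead).
RUN apt-get update && apt-get install -y wget
RUN pwsh playwright.ps1 install --with-deps chromium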

ENTRYPOINT ["./AnalyticsReportAutomation"]

Support

For issues related to:

  • Playwright: Check Playwright .NET documentation
  • Website-specific selectors: Use browser developer tools to identify correct selectors
  • Configuration: Refer to the config.json example and this README
