- 1. Introduction
- Recent Changes & Architectural Enhancements
- 2. System Architecture
- 3. Data Model (Google Cloud Firestore)
- 4. Component Deep Dive: `LoginManager.py`
- 5. Component Deep Dive: `AppIdApplicationOrchestrator.py`
- 5.1. Overview & Responsibilities
- 5.2. Deployment & Execution Environment
- 5.3. Optimized In-Memory User Cache
- 5.4. Real-time Firestore Listener
- 5.5. Job Polling & Discovery
- 5.6. Schedule Retrieval
- 5.7. Enhanced User-Job-Schedule Matching Logic
- 5.8. Application Submission Workflow (API & WebSocket)
- 5.9. GlobalJobIdMap Management
- 5.10. Concurrency Model
- 5.11. Recently Processed Job-Schedule Cache
- 5.12. Proactive Token Expiry Management
- 5.13. Logging & Error Handling
- 6. Workflow Diagrams
- 7. Configuration & Environment Variables
- 8. Setup, Running & Maintenance
- 9. Security Considerations
- 10. Future Enhancements & Roadmap
The system has undergone significant architectural improvements to enhance speed, reliability, and scalability. The major enhancements are:
Purpose: Dramatically reduce application submission latency by pre-creating application IDs during user login rather than during the critical job discovery phase.
Key Changes:
- LoginManager Enhancement: After successful login, the system now analyzes the user's preferences against the `GlobalJobIdMap` and pre-creates applications for all matching job types using the `create-application` API with `scheduleId: null`.
- Firestore Storage: Pre-created application IDs are stored in the `pre_created_applications` field within each user's `UserJobProfiles` document.
- Orchestrator Optimization: `AppIdApplicationOrchestrator.py` now checks for pre-existing application IDs before attempting to create new ones, skipping the create step and proceeding directly to the update step.
- Performance Impact: Reduces job-to-application latency by 50-200ms per application by eliminating the create API call from the critical path.
Purpose: Centralized mapping system that enables pre-creation and allows the system to dynamically learn about new job types.
Architecture:
- New Firestore Collection: `SystemConfiguration` → `GlobalJobIdMap_CA` document containing:
  - `job_map`: Maps structured keys (`LOCATION_UPPERCASE-HOUR_TYPE_UPPERCASE`) to job IDs
  - `reverse_job_map`: Maps job IDs to structured keys
  - `last_updated_timestamp`: Tracking updates
- Dynamic Learning: When the orchestrator encounters unknown job IDs, it derives the structured key from job schedules and updates the global map
- Consistency: Transactional updates ensure data consistency across multiple orchestrator instances
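As a rough illustration of the mapping described above, the following sketch builds a structured key and registers a job ID in both directions. The function names `make_structured_key` and `register_job` are hypothetical, plain dicts stand in for the Firestore document, and replacing spaces with underscores is an assumed normalization; in the real system the update would run inside a Firestore transaction.

```python
from typing import Dict


def make_structured_key(location: str, hour_type: str) -> str:
    """Build a LOCATION_UPPERCASE-HOUR_TYPE_UPPERCASE key (normalization rules are assumed)."""
    loc = location.strip().upper()
    hours = hour_type.strip().upper().replace(" ", "_")
    return f"{loc}-{hours}"


def register_job(
    job_map: Dict[str, str],
    reverse_job_map: Dict[str, str],
    job_id: str,
    location: str,
    hour_type: str,
) -> bool:
    """Record a newly learned job ID in both maps.

    Returns True if the maps changed, False if the mapping was already known.
    """
    key = make_structured_key(location, hour_type)
    if job_map.get(key) == job_id and reverse_job_map.get(job_id) == key:
        return False  # nothing to learn
    job_map[key] = job_id
    reverse_job_map[job_id] = key
    return True
```

Returning a flag lets the caller decide whether a write back to Firestore is needed at all.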
Purpose: Replace O(n×m) user iteration with O(1) indexed lookups for dramatic performance improvements.
Implementation:
- Indexed Caches:
  - `location_to_users_index`: Maps locations to sets of interested user IDs
  - `hours_to_users_index`: Maps hour preferences to sets of user IDs
- Algorithm Enhancement: Instead of checking every user against every job, the system now:
  - Gets candidate users by location lookup: `location_to_users_index[job_location]`
  - Gets users by hour preference: `hours_to_users_index[schedule_type]`
  - Computes the intersection for the final candidates
- Performance Impact: Replaces the per-job scan over all n users with two O(1) index lookups followed by a set intersection whose cost depends only on the size of the candidate sets
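The lookup-and-intersect step can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: `find_candidate_users` is a hypothetical name, and folding users who chose "ANY" into both lookups via a dedicated `"ANY"` bucket is an assumption about how that special value is indexed.

```python
from typing import Dict, Set


def find_candidate_users(
    location_to_users_index: Dict[str, Set[str]],
    hours_to_users_index: Dict[str, Set[str]],
    job_location: str,
    schedule_type: str,
) -> Set[str]:
    """Return user IDs whose location and hour preferences both match."""
    # Two dictionary lookups per index; "ANY" users are assumed to live in their own bucket.
    by_location = (
        location_to_users_index.get(job_location.upper(), set())
        | location_to_users_index.get("ANY", set())
    )
    by_hours = (
        hours_to_users_index.get(schedule_type.upper(), set())
        | hours_to_users_index.get("ANY", set())
    )
    # Only users present in both candidate sets are kept.
    return by_location & by_hours
```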
Purpose: Provide more granular and context-aware error handling, especially for application conflicts.
Key Improvements:
- APPLICATION_ALREADY_EXIST_CAN_BE_RESET Handling:
  - LoginManager Context: Sets user status to `LOGIN_FAILED` with a descriptive error message
  - Orchestrator Context: Sets user status to `APPLICATION_FAILED` and logs the conflict
- Auth Failure Differentiation: Clear distinction between token expiry vs. other API failures
- Contextual Logging: Error messages now include source context (LoginManager vs. Orchestrator)
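One way to picture the context-dependent handling is a small mapping function like the one below. It is only a sketch: `status_for_error` is a hypothetical helper, treating HTTP 401/403 as the signal for an auth failure is an assumption, and returning `None` (leave the status unchanged) for other failures is a placeholder, since the document does not specify that case.

```python
from typing import Optional

CONFLICT_CODE = "APPLICATION_ALREADY_EXIST_CAN_BE_RESET"


def status_for_error(source: str, error_code: Optional[str], http_status: int) -> Optional[str]:
    """Map an API failure to the user status to write, depending on which component saw it."""
    if error_code == CONFLICT_CODE:
        # Same error, different status depending on the source context.
        return "LOGIN_FAILED" if source == "LoginManager" else "APPLICATION_FAILED"
    if http_status in (401, 403):  # assumed indicators of an expired token
        return "LOGIN_REQUIRED"
    return None  # other failures: status left unchanged in this sketch
```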
Purpose: Comprehensive performance tracking for optimization and troubleshooting.
New Monitoring:
- Timing Logs: Millisecond-precision timing for all major operations:
  - `get_schedule`: Schedule fetching duration
  - `execute_api_calls`: Create and update API call timing
  - `connect_and_update_via_websocket`: WebSocket workflow duration
  - `apply_for_job_and_update_status`: End-to-end application timing
  - `load_or_refresh_global_job_id_map`: GlobalJobIdMap operations
  - `process_single_job_concurrently`: User matching performance
- PERF Log Level: Dedicated performance log category for easy filtering
- Cache Debug Logs: Detailed logging of index updates and cache state changes
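A timing log of this kind could be produced with a small context manager, sketched below. The name `perf_timer`, the `sink` parameter and the exact `[PERF] ...` line format are assumptions made for illustration; only the idea of a dedicated PERF category with millisecond timing comes from the description above.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def perf_timer(operation: str, sink: Callable[[str], None] = print) -> Iterator[None]:
    """Measure the wrapped block and emit one PERF line, even if the block raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        sink(f"[PERF] {operation} took {elapsed_ms:.3f} ms")
```

Wrapping a call as `with perf_timer("get_schedule"): ...` would then emit a single line that can be filtered by the `[PERF]` prefix.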
Migration Path:
- Backward Compatibility: The new `AppIdApplicationOrchestrator.py` maintains full compatibility with existing data structures
- Gradual Rollout: The pre-creation system gracefully handles cases where applications don't exist, falling back to the original create-then-update flow
- Zero Downtime: Changes can be deployed without service interruption
This document provides a comprehensive technical description of the Automated Job Application System. It details the system's architecture, components, data models, workflows, and operational aspects. It is intended for developers, maintainers, and anyone seeking to understand the inner workings of the system.
The primary purpose of this system is to automate the process of discovering and applying for job postings on a target website (e.g., Amazon Hiring). It aims to achieve this with high speed and reliability on behalf of multiple configured users, maximizing their chances of securing positions, especially those filled on a first-come, first-served basis.
Core Goals:
- Speed & Efficiency: Minimize latency from job discovery to application submission through asynchronous operations, optimized matching, and pre-created applications.
- Reliability & Robustness: Ensure continuous operation with comprehensive error handling, session management, and fault tolerance.
- Scalability: Design components to handle an increasing number of users and job application volumes through indexed caching and optimized algorithms.
- Accuracy: Precisely match jobs to user-defined preferences (location, hours).
- Security: Protect user credentials and sensitive session data at all stages.
- Maintainability: Provide clear logging and a modular design for ease of maintenance and future development.
In Scope:
- Secure storage and management of user credentials and job preferences.
- Automated login to the target job website with pre-creation of applications for matching job types.
- Real-time (high-frequency polling) job monitoring from a specified GraphQL API.
- Intelligent matching of jobs against user profiles using optimized indexed algorithms (location, job hours).
- High-speed, concurrent application submissions via the target website's internal APIs (REST and WebSocket).
- Dynamic learning and management of job type mappings through the GlobalJobIdMap system.
- Detailed logging of all operations, attempts, successes, and failures to Google Firestore and console.
- Management of user session validity, including proactive and reactive re-login triggers.
- Performance monitoring and optimization through comprehensive timing logs.
Out of Scope (for current version):
- A web-based administration UI (planned for the future).
- Direct interaction with email services for OTPs (currently uses a third-party API for OTPs associated with temporary emails).
- Advanced AI/ML for job description analysis or preference learning.
- Support for multiple target job websites or job source APIs without code modification.
| Term | Definition |
|---|---|
| User | An individual whose job preferences and credentials are managed by the system. |
| `user_id` | A unique identifier for each user, consistent across Firestore collections (typically the document ID). |
| Target Job Website | The specific website (e.g., hiring.amazon.ca) where users have accounts and jobs are applied for. |
| Job Source API | The external GraphQL API used to discover new job postings. |
| `candidate_id` | A user-specific identifier on the Target Job Website. |
| Session Data | website_access_token and website_cookies required to interact with the Target Job Website's APIs. |
| Pre-created Application | An application ID created during login for a specific job type, stored for later use during job discovery. |
| GlobalJobIdMap | Centralized mapping system that correlates structured keys (location-hour_type) with job IDs. |
| Structured Key | A standardized format (LOCATION_UPPERCASE-HOUR_TYPE_UPPERCASE) used for job type identification. |
| Firestore | Google Cloud Firestore: a NoSQL cloud database used for data storage and real-time synchronization. |
| Selenium | Browser automation framework used for logging into the Target Job Website. |
| ChromeDriver | A WebDriver executable that Selenium uses to control Google Chrome. |
| OTP | One-Time Password, used for login verification. |
| EC2 | Amazon Elastic Compute Cloud, where the ApplicationOrchestrator runs. |
| API | Application Programming Interface. |
| REST API | Representational State Transfer API. |
| WebSocket API | A communication protocol providing full-duplex communication channels over a single TCP connection. |
| GraphQL | A query language for APIs and a server-side runtime for executing those queries. |
| `aiohttp` | An asynchronous HTTP client/server framework for Python. |
| `websockets` (library) | A Python library for building WebSocket clients and servers. |
| Epoch Milliseconds (ms) | Time represented as the number of milliseconds that have elapsed since January 1, 1970, at 00:00:00 UTC. |
The system employs a decoupled, service-oriented architecture with significant performance and reliability enhancements. Two primary Python services, LoginManager.py and AppIdApplicationOrchestrator.py, operate independently but coordinate their activities through Google Cloud Firestore and the new GlobalJobIdMap system. This design allows for specialized deployment environments: the LoginManager (requiring a GUI for Selenium) runs locally, while the AppIdApplicationOrchestrator (headless, I/O-bound) runs on a cloud server (AWS EC2).
Firestore acts as the central nervous system, storing all persistent data (user profiles, preferences, session tokens, application logs, job mappings) and facilitating real-time updates between components.
- `LoginManager.py` (Local Machine) - Enhanced:
  - Purpose: Manages user login sessions and pre-creates applications for matching job types.
  - New Functionality: After successful login, analyzes user preferences against the GlobalJobIdMap and pre-creates applications using the create-application API with `scheduleId: null`.
  - Deployment: Runs on a user's local machine (Windows, macOS, or Linux with a desktop environment).
  - Interaction: Reads user credentials and GlobalJobIdMap from Firestore, performs login via Selenium, pre-creates applications, and writes updated session data and pre-created application IDs back to Firestore.
- `AppIdApplicationOrchestrator.py` (AWS EC2) - Optimized:
  - Purpose: Discovers new jobs, matches them against active user profiles using optimized algorithms, and submits applications at high speed using pre-created application IDs when available.
  - New Features:
    - Utilizes indexed user caches for O(1) user-job matching
    - Manages and updates the GlobalJobIdMap dynamically
    - Leverages pre-created application IDs to skip create API calls
  - Deployment: Runs on a headless Linux server (e.g., AWS EC2 instance).
  - Interaction:
    - Maintains optimized in-memory caches with preference indexes, synchronized with Firestore via a real-time listener.
    - Polls an external Job Source API (GraphQL).
    - Manages the GlobalJobIdMap for job type discovery and learning.
    - Interacts with the Target Job Website's internal APIs (REST and WebSocket) for application submission.
    - Logs all application attempts and outcomes to Firestore with detailed performance metrics.
- Google Cloud Firestore (Cloud Database) - Extended:
  - Purpose: Centralized data persistence, state synchronization, and job type mapping.
  - Key Roles:
    - Stores `UserJobProfiles` (credentials, preferences, session details, status, pre-created applications).
    - Stores `ApplicationLogs` (detailed records of each application attempt with performance metrics).
    - Stores `SystemConfiguration` (GlobalJobIdMap for job type management).
    - Enables real-time updates to the `ApplicationOrchestrator`'s optimized cache when user data changes.
- External APIs:
  - Job Source API (GraphQL): Polled by `ApplicationOrchestrator` for new job listings.
  - Target Job Website APIs (REST & WebSocket): Used by both components for application creation/submission.
  - OTP API (`temporamail.co`): Used by `LoginManager` to retrieve OTPs for specific email accounts.
| Category | Technology | Version (Typical) | Justification |
|---|---|---|---|
| Programming Language | Python | 3.9+ | Rich ecosystem, excellent libraries for web automation (Selenium), async operations (asyncio), and cloud integration. |
| Browser Automation | Selenium WebDriver | Latest | Industry standard for browser automation, necessary for complex login flows on the Target Job Website. |
| | ChromeDriver | Matching Chrome | WebDriver for Google Chrome. |
| Asynchronous HTTP | `aiohttp` | Latest | High-performance asynchronous HTTP client for the ApplicationOrchestrator, enabling concurrent API calls. |
| | `websockets` (library) | Latest | Efficient Python library for WebSocket client interactions, used by ApplicationOrchestrator. |
| Database | Google Cloud Firestore | N/A (Cloud Service) | Scalable, real-time NoSQL database. Excellent for distributed systems and synchronizing state. |
| Cloud SDK | `google-cloud-firestore`, `firebase-admin` | Latest | Official Python libraries for interacting with Firestore. |
| Environment Mgmt | `python-dotenv` | Latest | Manages environment variables from a `.env` file for configuration. |
| Deployment (Local) | Standard Python Environment | N/A | LoginManager runs directly. |
| Deployment (Cloud) | AWS EC2 (e.g., t3.micro/small with Amazon Linux 2) | N/A | Cost-effective and reliable platform for running the headless ApplicationOrchestrator. |
| Process Management | `systemd` (Linux) or `supervisor` (Alternative) | System Default | Ensures the ApplicationOrchestrator service runs continuously and restarts on failure on the EC2 instance. |
| Configuration | `.env` file, Python constants | N/A | Easy to manage and deploy different configurations. |
Firestore is used to store all persistent data for the system. The primary collections are:
Stores all information related to a user's profile, their preferences for job applications, their credentials for the target job website, their current session status, and their pre-created applications.
- Document ID: `user_id` (String) - A unique identifier for the user (e.g., "user_alpha", "john_doe").
- Fields:
  - `email`: (String) The email address used for logging into the Target Job Website. Example: "user@example.com"
  - `pin`: (String) The PIN or password for the Target Job Website. Note: `LoginManager.py` expects this in plaintext. For production, consider encrypting this field and decrypting it only within the `LoginManager`.
  - `token`: (String) A secondary identifier, often the same as `user_id` or a related token used for OTP retrieval (e.g., the part of the email before "@" if using a service like `temporamail.co`). Example: "user_alpha"
  - `status`: (String) Current operational status of the user profile. Enum:
    - `ACTIVE`: Profile is active, session is expected to be valid. Processed by `ApplicationOrchestrator`.
    - `INACTIVE`: Profile is manually disabled. Ignored by all services.
    - `LOGIN_REQUIRED`: Session is invalid, expired, or initial login needed. Processed by `LoginManager`.
    - `LOGIN_IN_PROGRESS`: `LoginManager` is currently attempting to log this user in.
    - `LOGIN_FAILED`: The last login attempt by `LoginManager` failed.
    - `APPLICATION_SUCCESS`: The `ApplicationOrchestrator` successfully submitted an application and completed the WebSocket workflow.
    - `APPLICATION_FAILED`: The `ApplicationOrchestrator` encountered an application conflict (e.g., APPLICATION_ALREADY_EXIST_CAN_BE_RESET).
    - `SHIFT_PICKED`: The `ApplicationOrchestrator` submitted an application, but the WebSocket workflow indicated a "shift picked" state (may imply success or a specific intermediate state).
  - `website_access_token`: (String, Nullable) The authentication token (e.g., Bearer token) obtained after successful login to the Target Job Website. Used for API calls by `ApplicationOrchestrator`.
  - `website_cookies`: (String (JSON) or Array of Objects, Nullable) Cookies obtained from the Target Job Website session. Stored as a JSON string by `LoginManager` if they are a list of dicts. `ApplicationOrchestrator` parses this. Example (as JSON string): `'[{"name": "session_id", "value": "xyz123"}, ...]'`
  - `candidate_id`: (String, Nullable) The unique identifier for the user on the Target Job Website (e.g., Amazon's `bbCandidateId`).
  - `user_agent`: (String) The User-Agent string to be used for API calls and WebSocket connections, mimicking the browser used for login. Example: "Mozilla/5.0..."
  - `login_expiry_timestamp`: (Integer, Nullable) Epoch milliseconds timestamp indicating when the current `website_access_token` and `website_cookies` are expected to expire.
  - `last_login_attempt_timestamp`: (Timestamp, Nullable) Firestore server timestamp of the last login attempt by `LoginManager`.
  - `last_login_success_timestamp`: (Timestamp, Nullable) Firestore server timestamp of the last successful login by `LoginManager`.
  - `last_error_message`: (String, Nullable) Details of the last error encountered by `LoginManager` or `ApplicationOrchestrator` for this user.
  - `location_preferences`: (Array of Strings) A list of preferred location codes/cities. Case-insensitive matching is performed. "ANY" is a special value. Example: `["Toronto", "Vancouver", "ANY"]`
  - `job_hours_preferences`: (String) User's preference for job hours. Case-insensitive matching. Expected values for direct matching: `"FULL_TIME"`, `"PART_TIME"`, `"FLEX_TIME"`. Special values: `"ANY"`, `"PART_TIME_OR_FLEX"`. Example: "ANY"
  - `pre_created_applications`: (Map, New) Stores pre-created application IDs organized by job ID for faster application submission.
    - Key: `job_id` (String) - The job ID from the GlobalJobIdMap
    - Value: (Map) containing:
      - `application_id`: (String) The pre-created application ID
      - `retrieved_at_timestamp`: (Timestamp) When this application ID was created/verified
      - `source_job_key`: (String) The structured key from GlobalJobIdMap that matched this job
  - `application_details`: (Map, Nullable) Stores details of the last successful application submission.
    - `application_id`: (String) The ID of the application created on the Target Job Website.
    - `job`: (Map) Information about the job applied for (e.g., `jobId`, `jobTitle`).
    - `schedule`: (Map) Information about the schedule chosen (e.g., `scheduleId`, `hoursPerWeek`).
    - `updated_at`: (Timestamp) Firestore server timestamp of when these details were last updated.
    - `updated_at_iso`: (String) ISO 8601 formatted string of `updated_at`.
  - `created_at`: (Timestamp) Firestore server timestamp when the user profile was created.
  - `updated_at`: (Timestamp) Firestore server timestamp of the last update to this document.
Records every job application attempt made by the ApplicationOrchestrator, providing a detailed audit trail with performance metrics.
- Document ID: Auto-generated by Firestore.
- Fields:
  - `user_id`: (String) Foreign key referencing the `user_id` in `UserJobProfiles`.
  - `job_id_external`: (String) The external identifier of the job from the Job Source API.
  - `schedule_id_external`: (String, Nullable) The external identifier of the schedule chosen for the job.
  - `application_id`: (String, Nullable) The application ID returned by the Target Job Website's API after successful creation.
  - `applied_at_timestamp`: (Timestamp) Firestore server timestamp when the application attempt was made.
  - `status`: (String) Outcome of the application attempt. Enum:
    - `SUCCESS_API`: API calls (`create-application`, `update-application`) were successful.
    - `SUCCESS_WS`: WebSocket workflow completed successfully after successful API calls.
    - `FAILED_AUTH`: Authentication failed during API calls (token likely expired).
    - `FAILED_API`: API calls failed for reasons other than auth (e.g., server error, bad request, schedule unavailable).
    - `FAILED_WS`: WebSocket connection or workflow failed after successful API calls.
    - `SKIPPED_RECENTLY_PROCESSED`: Skipped because this job-schedule was processed recently for another user.
    - (Other specific failure reasons could be added)
  - `api_response_status_code`: (Integer, Nullable) HTTP status code from the primary application API call (e.g., `create-application` or `update-application`).
  - `api_response_body`: (String, Nullable) A snippet or full body of the API response, especially for errors.
  - `error_details`: (String, Nullable) A more specific error message or traceback snippet.
  - `matched_preferences`: (Map, Nullable) Details of how the user's preferences matched the job/schedule.
    - `location_matched_on`: (String) The specific location that matched.
    - `user_location_preferences`: (Array of Strings) User's original location preferences.
    - `user_hours_preference`: (String) User's original hours preference.
    - `matched_schedule_type`: (String) The type of the schedule that matched (e.g., "FLEX_TIME").
    - `matched_schedule_hours`: (Number) The hours per week of the matched schedule.
  - `attempt_duration_ms`: (Float, Nullable) Time taken for the entire application attempt for this user and job, in milliseconds.
  - `country_code`: (String) The country code (e.g., "CA", "US") for which the application was made.
New collection that stores system-wide configuration and mapping data.
- Purpose: Centralized mapping between structured job keys and job IDs for the Canada region.
- Fields:
  - `job_map`: (Map) Forward mapping from structured keys to job IDs
    - Key: Structured key format (`LOCATION_UPPERCASE-HOUR_TYPE_UPPERCASE`)
    - Value: Job ID (e.g., "JOB-CA-0000000354")
    - Example: `"TORONTO-FULL_TIME": "JOB-CA-0000000267"`
  - `reverse_job_map`: (Map) Reverse mapping from job IDs to structured keys
    - Key: Job ID
    - Value: Structured key
    - Example: `"JOB-CA-0000000267": "TORONTO-FULL_TIME"`
  - `last_updated_timestamp`: (Timestamp) Firestore server timestamp of the last update
  - `source_script`: (String, Optional) Identifies which script last updated the map
  - `last_upload_time_utc`: (String, Optional) ISO timestamp for manual tracking
- `UserJobProfiles` Collection:
  - Index on `status` for the query `status == "ACTIVE"` (a single-field index suffices). Used by `ApplicationOrchestrator` for initial cache load and by the listener.
  - Index on `status` for the query `status == "LOGIN_REQUIRED"` (a single-field index suffices). Used by `LoginManager`.
  - Composite index on `(status, login_expiry_timestamp)` for the query `status == "ACTIVE"` AND `login_expiry_timestamp <= NOW + GRACE_PERIOD`. Used by `LoginManager` and potentially the proactive token checker to find users needing re-login due to impending expiry. Firestore supports range filters on one field per composite index.
- `ApplicationLogs` Collection:
  - Indexes on `user_id` and `applied_at_timestamp` would be beneficial for querying logs for specific users or time ranges.
  - Index on `status` for analyzing success/failure rates.
- `SystemConfiguration` Collection:
  - Single-field indexes on document IDs are sufficient for the current access patterns.
Firestore automatically creates single-field indexes. Composite indexes must be created manually, either in the Firebase console or in the index definition file (`firestore.indexes.json`, referenced from `firebase.json`), unless they are created by following the link in a query error.
The LoginManager.py script is a crucial component responsible for maintaining active login sessions for users on the Target Job Website and pre-creating applications for efficient job application submission. It automates the often complex login process, which may involve multi-factor authentication (OTP) and CAPTCHA challenges, and then strategically pre-creates applications for job types that match the user's preferences.
- Deployment: Runs on a user's local machine (Windows, macOS, or Linux with a desktop environment).
- Reasoning: This is due to its reliance on Selenium WebDriver, which automates a real web browser (Google Chrome). CAPTCHA solving extensions and the visual feedback for debugging OTP/login issues are more effectively managed in an environment with a GUI. Running this on a headless server would be significantly more complex and brittle.
- Execution: Designed to run periodically (e.g., via a cron job, Task Scheduler, or a simple `while True` loop with `time.sleep()`). The script itself implements a continuous loop that runs every 4 minutes by default.
The script operates in a main loop:
- Initialization:
  - Loads environment variables (e.g., `FIREBASE_SERVICE_ACCOUNT_KEY_PATH`).
  - Initializes the Firebase Admin SDK and Firestore client.
  - Fetches the `GlobalJobIdMap` from Firestore for use in pre-creation logic.
  - Enters a `while True` loop for continuous operation.
- User Identification (`get_users_needing_login` function):
  - Queries the `UserJobProfiles` collection in Firestore for users matching two criteria:
    - Users with `status == "LOGIN_REQUIRED"`.
    - Users with `status == "ACTIVE"` AND `login_expiry_timestamp <= (current_time_utc + PROACTIVE_RELOGIN_WINDOW_SECONDS)`. This proactively identifies users whose sessions are about to expire.
  - Converts the stream of Firestore documents to a list.
  - Returns a list of user data dictionaries.
- Per-User Login Processing (within `main` function loop):
  - If no users need processing, sleeps and continues the main loop.
  - For each identified user:
    - Updates the user's status in Firestore to `LOGIN_IN_PROGRESS`.
    - WebDriver Setup (`setup_chrome_driver`):
      - Configures Chrome options:
        - Disables automation-controlled flags (`--disable-blink-features=AutomationControlled`).
        - Disables infobars.
        - Excludes "enable-automation" switch.
        - Sets `useAutomationExtension` to `False`.
        - Configures preferences to disable password manager and enable JavaScript.
        - Loads the CAPTCHA solver extension (`NopeCHA`) from the local `extensions/gui` directory.
        - Optionally runs in headless mode if `RUN_HEADLESS` environment variable is "true" (though typically run with GUI for reliability).
      - Instantiates `webdriver.Chrome` with these options.
    - Login Sequence (`perform_login_sequence`):
      - Retrieves user's `email` and `pin` from the provided `user_data`.
      - Navigates to the CAPTCHA solver extension's configuration page (`CONFIG["CAPTCHA_SOLVER_CONFIG_URL"]`) to ensure it's active.
      - Navigates to the Target Job Website's login URL.
      - Checks if already logged in by verifying the current URL.
      - Handles cookie consent pop-ups (attempts to press Escape).
      - Enters email, clicks continue.
      - Enters PIN, clicks continue.
      - Handles radio button selection for OTP delivery method (if present).
      - Clicks "Send Verification Code" and records the current UTC milliseconds (`send_code_utc_ms`) for OTP validation.
      - CAPTCHA Handling (`wait_for_captcha_solve`):
        - Checks for the presence of the `awswaf-captcha` shadow DOM element.
        - If found, waits for the CAPTCHA solver extension to remove the CAPTCHA modal (up to a timeout).
      - OTP Retrieval (`get_latest_otp_with_retry`):
        - Calls the OTP API (`temporamail.co/api/otp?tokenName={user_token}`) with retries.
        - Validates that the OTP timestamp (part of the API response) is recent enough relative to `send_code_utc_ms`.
      - Enters the retrieved OTP into the OTP input field.
      - Clicks continue after OTP (if the button exists).
      - Success Verification: Waits for the URL to contain the job search page partial URL (`CONFIG["JOB_WEBSITE_JOBSEARCH_URL_PARTIAL"]`).
      - Data Extraction (on success):
        - Executes JavaScript to get `localStorage.getItem('accessToken')` and `localStorage.getItem('bbCandidateId')`.
        - Gets all browser cookies using `driver.get_cookies()`.
    - New: Pre-creation Logic (`process_pre_creation_for_user`):
      - Fetches fresh user data from Firestore after successful login
      - Analyzes user's location and hour preferences against the `GlobalJobIdMap`
      - For each matching job type:
        - Calls `_make_create_application_call` with `scheduleId: null`
        - Stores successful application IDs in `pre_created_applications`
      - Updates Firestore with the pre-created application map
      - Handles specific errors like `APPLICATION_ALREADY_EXIST_CAN_BE_RESET`
    - Error Handling: Enhanced to include pre-creation failures and specific error codes
    - Driver Teardown: `driver.quit()` is called in a `finally` block to ensure the browser closes.
  - A short random delay is introduced between processing users.
- Loop Continuation:
  - After processing all users in the current batch, sleeps for 4 minutes (or `CONFIG["LOGIN_CHECK_INTERVAL_SECONDS"]`).
  - Handles `KeyboardInterrupt` for graceful shutdown.
  - Catches any other exceptions in the main loop, logs them, and waits before retrying.
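The two selection criteria from the User Identification step can be expressed as a pure function over already-fetched profiles, as in the sketch below. It deliberately leaves out the Firestore query itself. The helper names, the 300-second window and the decision to skip `ACTIVE` users with no expiry are assumptions; note that the window is converted to milliseconds because `login_expiry_timestamp` is stored in epoch milliseconds.

```python
import time
from typing import Any, Dict, List, Optional

PROACTIVE_RELOGIN_WINDOW_SECONDS = 300  # assumed value, for illustration only


def needs_login(user: Dict[str, Any], now_ms: int,
                window_s: int = PROACTIVE_RELOGIN_WINDOW_SECONDS) -> bool:
    """True if the profile is LOGIN_REQUIRED, or ACTIVE with a session about to expire."""
    status = user.get("status")
    if status == "LOGIN_REQUIRED":
        return True
    expiry_ms = user.get("login_expiry_timestamp")
    if status == "ACTIVE" and expiry_ms is not None:
        # Seconds are converted to milliseconds before comparing.
        return expiry_ms <= now_ms + window_s * 1000
    return False


def select_users_needing_login(users: List[Dict[str, Any]],
                               now_ms: Optional[int] = None) -> List[Dict[str, Any]]:
    """Filter a list of user dicts down to those that should be (re-)logged in."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return [u for u in users if needs_login(u, now_ms)]
```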
- Chrome Options: The script uses several Chrome options to make Selenium less detectable and to configure the browser environment (see `setup_chrome_driver`).
- CAPTCHA Solver Extension: It relies on a pre-loaded Chrome extension (specified by `CONFIG["CAPTCHA_SOLVER_EXTENSION_PATH"]`, e.g., NopeCHA) to automatically solve CAPTCHAs encountered during login. The script navigates to the extension's configuration URL (`CONFIG["CAPTCHA_SOLVER_CONFIG_URL"]`) to ensure its settings are applied.
- The `wait_for_captcha_solve` function specifically looks for an `awswaf-captcha` element and waits for its shadow DOM content (the CAPTCHA modal) to disappear, indicating the extension has solved it.
- The `get_latest_otp_with_retry` function is responsible for fetching OTPs.
- It makes HTTP GET requests to an external API endpoint: `https://temporamail.co/api/otp?tokenName={token}`.
  - `{token}` is derived from the user's profile (field `token` in `UserJobProfiles`).
- The API response is expected to be JSON, containing the OTP and a timestamp (e.g., `{"otp": "123456_1678886400000", "exists": true}`).
- The script parses the OTP and its timestamp.
- Timestamp Validation: Crucially, it checks if the OTP's timestamp is greater than or equal to (`send_code_utc_ms` - `tolerance_ms`). This ensures that an old, potentially re-used OTP is not accepted if the "Send Code" button was clicked multiple times.
- Retries: The function retries fetching the OTP `max_retries` times with a `wait_time` interval if an OTP is not found or is too old.
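The parsing and freshness check can be sketched without any network code. The `code_timestamp` layout is inferred from the example response above; `parse_otp`, `accept_otp` and the 5000 ms default tolerance are hypothetical and only illustrate the comparison against `send_code_utc_ms`.

```python
from typing import Optional, Tuple


def parse_otp(raw: str) -> Tuple[str, int]:
    """Split an 'otp' value such as '123456_1678886400000' into (code, timestamp_ms)."""
    code, _, ts = raw.partition("_")
    if not code or not ts.isdigit():
        raise ValueError(f"unexpected OTP format: {raw!r}")
    return code, int(ts)


def accept_otp(payload: dict, send_code_utc_ms: int, tolerance_ms: int = 5000) -> Optional[str]:
    """Return the OTP code if it is fresh enough, otherwise None (the caller would retry)."""
    if not payload.get("exists") or not payload.get("otp"):
        return None
    code, otp_ms = parse_otp(payload["otp"])
    # Reject codes issued before the most recent "Send Verification Code" click.
    if otp_ms >= send_code_utc_ms - tolerance_ms:
        return code
    return None
```

A retry loop would call `accept_otp` on each response and sleep for `wait_time` whenever it returns `None`.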
- Extracted Data:
  - `accessToken` (from `localStorage`) -> `website_access_token`
  - `bbCandidateId` (from `localStorage`) -> `candidate_id`
  - Browser cookies (`driver.get_cookies()`) -> `website_cookies` (stored as a JSON string of the list of cookie objects)
- Expiry Calculation: `login_expiry_timestamp` is calculated as `datetime.now(timezone.utc) + timedelta(seconds=CONFIG["SESSION_DURATION_SECONDS"])`, then converted to epoch milliseconds. The default duration is 1 hour and 42 minutes (1.70 hours).
- Storage: All session data is written to the corresponding user's document in the `UserJobProfiles` collection in Firestore.
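The expiry calculation amounts to the following. The function name `compute_login_expiry_ms` is hypothetical, and the constant simply restates the documented default of 1 hour 42 minutes as 6120 seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SESSION_DURATION_SECONDS = 6120  # 1 hour 42 minutes, the documented default


def compute_login_expiry_ms(now: Optional[datetime] = None,
                            duration_s: int = SESSION_DURATION_SECONDS) -> int:
    """Return the session expiry as epoch milliseconds (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = now + timedelta(seconds=duration_s)
    return int(expiry.timestamp() * 1000)
```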
New Feature: After successful login, the system now pre-creates applications for job types that match the user's preferences.
Process:
- Preference Analysis: Compares the user's `location_preferences` and `job_hours_preferences` against the `GlobalJobIdMap`
- Matching Logic:
  - Location: Matches if user has "ANY" in preferences OR specific location matches
  - Hours: Handles "ANY", "PART_TIME_OR_FLEX", and specific hour type matching
- API Calls: For each match, calls the `create-application` API with:
  - `jobId`: From the GlobalJobIdMap
  - `scheduleId`: null (allows scheduling flexibility)
  - `activeApplicationCheckEnabled`: true
- Storage: Successful application IDs are stored in the user's `pre_created_applications` field
- Error Handling:
  - `APPLICATION_ALREADY_EXIST_CAN_BE_RESET`: Sets user to `LOGIN_FAILED` state
  - Auth failures: Sets user to `LOGIN_REQUIRED` for retry
Benefits:
- Reduces job application latency by 50-200ms
- Enables immediate application updates when jobs are discovered
- Provides better error handling for application conflicts
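The matching logic in the process above could look roughly like this. It is a sketch under stated assumptions: `hours_match` and `matching_job_ids` are hypothetical names, `PART_TIME_OR_FLEX` is assumed to expand to `PART_TIME` and `FLEX_TIME`, and the structured key is split on its last hyphen on the assumption that hour types never contain one.

```python
from typing import Dict, List

# Assumed expansion of the special hours value.
HOURS_EXPANSION = {"PART_TIME_OR_FLEX": {"PART_TIME", "FLEX_TIME"}}


def hours_match(user_pref: str, hour_type: str) -> bool:
    """Check a job's hour type against the user's (case-insensitive) hours preference."""
    pref = user_pref.upper()
    if pref == "ANY":
        return True
    return hour_type in HOURS_EXPANSION.get(pref, {pref})


def matching_job_ids(job_map: Dict[str, str], location_prefs: List[str],
                     hours_pref: str) -> List[str]:
    """Return the job IDs from job_map that a user would get pre-created applications for."""
    locations = {loc.upper() for loc in location_prefs}
    matches = []
    for key, job_id in job_map.items():
        location, _, hour_type = key.rpartition("-")  # "TORONTO-FULL_TIME" -> parts
        if ("ANY" in locations or location in locations) and hours_match(hours_pref, hour_type):
            matches.append(job_id)
    return matches
```

Each returned job ID would then be passed to the create call with `scheduleId: null`.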
- Logging (`log_step` function):
  - Provides timestamped, color-coded console output.
  - Log levels: `INFO`, `DEBUG`, `WARNING`, `ERROR`, `CRITICAL`.
  - Uses ANSI escape codes for colors.
- Error Handling:
  - Extensive `try-except` blocks cover Selenium errors, network issues, API errors, and OTP retrieval failures.
  - Detailed error messages, including tracebacks for critical errors, are logged.
  - User status in Firestore is updated to `LOGIN_FAILED` upon critical failures during their processing, along with a `last_error_message`.
  - The main loop has a catch-all exception handler to prevent the script from crashing entirely, logging the error and continuing after a delay.
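A timestamped, color-coded logger of the kind described above could be sketched like this. The real `log_step` signature, timestamp format, and color choices are not given in this document, so everything below (including the helper `format_log_line`) is an assumption for illustration only.

```python
from datetime import datetime, timezone

# ANSI escape codes; the level-to-color mapping is an assumed choice.
COLORS = {
    "INFO": "\033[32m",        # green
    "DEBUG": "\033[36m",       # cyan
    "WARNING": "\033[33m",     # yellow
    "ERROR": "\033[31m",       # red
    "CRITICAL": "\033[1;31m",  # bold red
}
RESET = "\033[0m"


def format_log_line(level, message, now=None):
    """Build one colored log line; `now` can be injected for testing."""
    now = now or datetime.now(timezone.utc)
    color = COLORS.get(level, "")
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return f"{color}[{stamp}] [{level}] {message}{RESET}"


def log_step(message, level="INFO"):
    """Print a timestamped, color-coded line to the console."""
    print(format_log_line(level, message))
```

Separating formatting from printing keeps the output easy to verify without capturing the console.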
`AppIdApplicationOrchestrator.py` is the core engine of the job application system. It runs continuously, identifying new job opportunities and applying for them on behalf of all eligible users, using indexed user matching and pre-created applications to keep latency low. It is designed for high concurrency, low latency, and dynamic learning of new job types.
- Deployment: Runs on a headless Linux server, typically an AWS EC2 instance.
- Reasoning: Its tasks are I/O-bound (network requests) and benefit from an `asyncio`-based architecture. No GUI is required. A cloud server provides the necessary uptime and network connectivity.
- Execution: Intended to be run as a persistent service, managed by a process manager like `systemd` or `supervisor`.
Enhanced from previous version with indexed data structures for O(1) user lookup:
- Purpose: Provide microsecond-level access to active user data and enable instant user-job matching through preference indexes.
- Structure: Three interconnected data structures:
  - `user_details_cache`: `Dict[str, Dict[str, Any]]` - Full user data by user_id
  - `location_to_users_index`: `Dict[str, Set[str]]` - Maps locations to user_id sets
  - `hours_to_users_index`: `Dict[str, Set[str]]` - Maps hour preferences to user_id sets
- New Cached Data:
  - `pre_created_applications`: Pre-created application IDs for faster submission
  - `processed_location_preferences`: Normalized uppercase location list
  - `processed_job_hours_preference`: Normalized uppercase hour preference
- Index Management:
  - `_add_user_to_indexes()`: Adds user to preference indexes
  - `_remove_user_from_indexes()`: Removes user from preference indexes
  - Automatically maintained during cache updates
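The three structures and their maintenance functions could be sketched as below. The function bodies are assumptions based only on the descriptions above; `upsert_user` is a hypothetical helper showing how the indexes might be kept consistent when a cached user changes.

```python
from collections import defaultdict
from typing import Any, Dict, Set

user_details_cache: Dict[str, Dict[str, Any]] = {}
location_to_users_index: Dict[str, Set[str]] = defaultdict(set)
hours_to_users_index: Dict[str, Set[str]] = defaultdict(set)


def _add_user_to_indexes(user_id: str, user: Dict[str, Any]) -> None:
    """Register the user under each location and under the hour preference."""
    for loc in user["processed_location_preferences"]:
        location_to_users_index[loc].add(user_id)
    hours_to_users_index[user["processed_job_hours_preference"]].add(user_id)


def _remove_user_from_indexes(user_id: str, user: Dict[str, Any]) -> None:
    """Undo _add_user_to_indexes using the user's previously cached data."""
    for loc in user["processed_location_preferences"]:
        location_to_users_index[loc].discard(user_id)
    hours_to_users_index[user["processed_job_hours_preference"]].discard(user_id)


def upsert_user(user_id: str, user: Dict[str, Any]) -> None:
    """Hypothetical cache update: drop stale index entries, then re-add."""
    old = user_details_cache.get(user_id)
    if old is not None:
        _remove_user_from_indexes(user_id, old)
    user_details_cache[user_id] = user
    _add_user_to_indexes(user_id, user)
```

Removing the old entries before adding the new ones matters: without it, a user who changes location would remain in the index for the previous location.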
Optimized from an O(n) scan over all users to O(1) index lookups per job, followed by set intersections proportional to the smaller set:
Previous Approach:
For each job:
For each user: # O(n) users
Check location match
Check hour preference match
New Approach:
For each job:
candidate_users = location_to_users_index[job_location] ∪ location_to_users_index["ANY"] # O(1)
For each schedule_category in [FLEX_TIME, FULL_TIME, PART_TIME]:
matching_hour_users = hours_to_users_index[relevant_hour_prefs] # O(1)
final_candidates = candidate_users ∩ matching_hour_users # O(min(sets))
Performance Impact: For typical scenarios with 100+ users and 10+ jobs, this reduces matching time from ~100ms to ~1ms per job.
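The new approach can be expressed with ordinary Python set operations. In this sketch the mapping `RELEVANT_HOUR_PREFS` (which user hour preferences are compatible with each schedule category) is an assumption, as is the function name `match_users`; the document does not spell out either.

```python
from typing import Dict, Set

# Assumed mapping from schedule category to the user preferences it satisfies.
RELEVANT_HOUR_PREFS = {
    "FLEX_TIME": ("ANY", "FLEX_TIME", "PART_TIME_OR_FLEX"),
    "FULL_TIME": ("ANY", "FULL_TIME"),
    "PART_TIME": ("ANY", "PART_TIME", "PART_TIME_OR_FLEX"),
}


def match_users(
    job_location: str,
    location_index: Dict[str, Set[str]],
    hours_index: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return candidate user IDs per schedule category for one job."""
    # Union of users who want this location and users who accept any location.
    candidates = location_index.get(job_location, set()) | location_index.get("ANY", set())
    result = {}
    for category, prefs in RELEVANT_HOUR_PREFS.items():
        hour_users = set().union(*(hours_index.get(p, set()) for p in prefs))
        # Intersection keeps only users matching both location and hours.
        result[category] = candidates & hour_users
    return result
```

Using `.get(..., set())` rather than indexing avoids creating empty entries in the indexes for locations no user has asked for.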
Enhanced with pre-created application support:
- `execute_api_calls` (Enhanced):
  - New Parameter: `pre_existing_application_id` (optional)
  - Optimization: If a pre-existing ID is provided, skips the create API call entirely
  - Fallback: If no pre-existing ID is available, uses the original create-then-update flow
  - Error Handling: Specific handling for `APPLICATION_ALREADY_EXIST_CAN_BE_RESET` in orchestrator context
- `apply_for_job_and_update_status` (Enhanced):
  - Checks the user's `pre_created_applications` for an existing application ID
  - Passes the pre-existing ID to `execute_api_calls` if available
  - Handles the new error status `ALREADY_EXISTS_ADVANCED_ORCH` from the orchestrator's create attempts
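The skip-create optimization can be illustrated with a small `asyncio` sketch. The real `execute_api_calls` signature is not shown in this document; here the create and update steps are passed in as hypothetical coroutine arguments (`create_application`, `update_application`) so the control flow can be read in isolation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def execute_api_calls(
    job_id: str,
    schedule_id: str,
    create_application: Callable[[str], Awaitable[str]],
    update_application: Callable[[str, str], Awaitable[None]],
    pre_existing_application_id: Optional[str] = None,
) -> str:
    """Create the application only if no pre-created ID exists, then update it."""
    application_id = pre_existing_application_id
    if application_id is None:
        # Fallback: original create-then-update flow.
        application_id = await create_application(job_id)
    # With a pre-created ID, this is the only call on the critical path.
    await update_application(application_id, schedule_id)
    return application_id
```

A caller such as `apply_for_job_and_update_status` would look up the user's `pre_created_applications` entry for the job and pass it as `pre_existing_application_id` when present.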
New Feature: Dynamic learning and management of job type mappings.
Components:
- `load_or_refresh_global_job_id_map()`: Loads the map from Firestore on startup and periodically
- `derive_representative_hour_type()`: Analyzes schedules to determine the job's hour type
- `update_global_job_id_map_in_firestore()`: Transactionally updates Firestore with new mappings

Dynamic Learning Process:
- When the orchestrator encounters an unknown `job_id`:
  - Fetches all schedules for the job
  - Derives representative hour type (FLEX_TIME, PART_TIME, FULL_TIME)
  - Creates structured key: `JOB_CITY_UPPER-DERIVED_HOUR_TYPE`
  - Asynchronously updates GlobalJobIdMap in Firestore
  - Updates local cache for immediate use
Conflict Resolution:
- Transactional updates prevent race conditions
- Existing mappings are preserved (first-come-first-serve)
- Detailed logging for conflict investigation
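The learning and conflict-resolution steps can be sketched against a local dictionary. Several details are assumptions: the derivation rule (most frequent hour type across schedules, ties broken by a fixed order), the schedule field name `hour_type`, and the helpers `build_map_key` and `learn_job`. The actual Firestore transaction is omitted; `dict.setdefault` stands in for its first-come-first-serve behavior.

```python
from collections import Counter
from typing import Dict, List, Optional

HOUR_TYPES = ("FLEX_TIME", "PART_TIME", "FULL_TIME")


def derive_representative_hour_type(schedules: List[dict]) -> Optional[str]:
    """Assumed rule: pick the most common hour type among the schedules."""
    counts = Counter(
        s["hour_type"] for s in schedules if s.get("hour_type") in HOUR_TYPES
    )
    if not counts:
        return None
    # max() keeps the first maximal item, so ties follow HOUR_TYPES order.
    return max(HOUR_TYPES, key=lambda t: counts[t])


def build_map_key(job_city: str, hour_type: str) -> str:
    """Build the JOB_CITY_UPPER-DERIVED_HOUR_TYPE key."""
    return f"{job_city.strip().upper()}-{hour_type}"


def learn_job(local_map: Dict[str, str], job_id: str, job_city: str,
              schedules: List[dict]) -> Optional[str]:
    """Record a new mapping unless the key already exists; return the winner."""
    hour_type = derive_representative_hour_type(schedules)
    if hour_type is None:
        return None
    key = build_map_key(job_city, hour_type)
    # Existing mappings are preserved (first-come-first-serve).
    return local_map.setdefault(key, job_id)
```

Returning the stored value lets the caller detect a conflict by comparing it with the `job_id` it tried to register, which is the point at which a conflict log entry would be written.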
Enhanced with comprehensive performance monitoring:
- New PERF Log Level: Dedicated category for performance metrics
- Timing Coverage:
  - `load_or_refresh_global_job_id_map`: GlobalJobIdMap operations
  - `update_global_job_id_map_in_firestore`: Firestore update operations
  - `process_single_job_concurrently`: User matching performance
  - All existing API and WebSocket timing logs
- Enhanced Error Context: All error messages now include source context (e.g., "Orchestrator:", "LoginManager:")
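One way to produce PERF entries with source context is a small timing context manager. This is a sketch under assumptions: the name `perf_timer`, the message format, and the `sink` parameter are all hypothetical and not taken from the orchestrator's code.

```python
import time
from contextlib import contextmanager


@contextmanager
def perf_timer(operation, source="Orchestrator", sink=print):
    """Time the enclosed block and emit one PERF line, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(f"[PERF] {source}: {operation} took {elapsed_ms:.2f} ms")
```

Wrapping a call site as `with perf_timer("load_or_refresh_global_job_id_map"):` would then log its duration. `time.perf_counter()` is used because it is monotonic and intended for measuring short intervals.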
- Python: Version 3.9 or higher.
- Google Cloud Project: With Firestore enabled.
- Firebase Service Account Key: Download the JSON key file.
- Google Chrome: Installed (for `LoginManager.py`).
- ChromeDriver: Corresponding version to your Chrome, placed in your system's PATH or specified directly.
- CAPTCHA Solver Extension: (e.g., NopeCHA) downloaded and placed in the `extensions/gui` directory.
- Required Python Packages: Install using `pip install -r requirements.txt`. The `requirements.txt` should include:
  - `firebase-admin`
  - `google-cloud-firestore`
  - `selenium`
  - `python-dotenv`
  - `aiohttp`
  - `websockets`
  - `requests` (for `LoginManager`'s OTP client)
- AWS Account & EC2 Instance: (For `AppIdApplicationOrchestrator.py`) A Linux EC2 instance configured with Python and necessary permissions (e.g., IAM role for Firestore access if not using service account key directly on EC2).
- Ensure all prerequisites for `LoginManager.py` are met on your local machine.
- Create a `.env` file in the project root and add `FIREBASE_SERVICE_ACCOUNT_KEY_PATH`.
- New Step: Initialize the GlobalJobIdMap by running `python tests/AddGlobalJobIdMap.py` once to populate the system with initial job type mappings.
- Populate Firestore `UserJobProfiles` with user data (email, pin, preferences, initial status `LOGIN_REQUIRED`).
- Run the script from the project root: `python src/LoginManager.py`
- Monitor the console output for login progress, pre-creation activity, and errors.
- Set up an AWS EC2 instance with Python and required packages.
- Copy the project files to the EC2 instance.
- Place the Firebase service account key JSON file on the EC2 instance and update the `.env` file with its path. Or, configure an IAM role for the EC2 instance with Firestore permissions.
- Set `TARGET_COUNTRY` in the `.env` file if not "CA".
- Updated: Run the enhanced orchestrator: `python src/AppIdApplicationOrchestrator.py`
- For production, use a process manager like `systemd` or `supervisor` to ensure the script runs continuously and restarts on failure.
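For reference, reading the two settings mentioned in the steps above might look like the following sketch. The function name `load_orchestrator_config` and the returned dictionary keys are hypothetical; the only facts taken from this document are the variable names and the "CA" default. In the real scripts the `.env` file would first be loaded into the environment, for example with `load_dotenv()` from `python-dotenv`.

```python
import os


def load_orchestrator_config(env=None):
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return {
        # May be None when credentials are supplied another way.
        "firebase_key_path": env.get("FIREBASE_SERVICE_ACCOUNT_KEY_PATH"),
        # "CA" is the default target country per the setup steps.
        "target_country": env.get("TARGET_COUNTRY", "CA"),
    }
```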
- Monitor Performance Logs: The new PERF logging provides detailed timing information. Monitor for performance degradation over time.
- GlobalJobIdMap Monitoring: Periodically review the GlobalJobIdMap for completeness and accuracy. New job types should be automatically discovered and added.
- Pre-creation Success Rates: Monitor LoginManager logs for pre-creation success rates and APPLICATION_ALREADY_EXIST_CAN_BE_RESET errors.
- Cache Performance: Monitor the indexed cache performance logs to ensure O(1) user lookup performance is maintained.
- Firestore Costs: The new system generates additional Firestore operations for GlobalJobIdMap management and pre-created application storage. Monitor costs accordingly.
- API Changes: [Same as before, with additional monitoring for the create-application API behavior with null scheduleId]
Enhanced from previous roadmap:
- Advanced Pre-creation Strategies:
  - Machine learning models to predict which job types are most likely to appear
  - Batch pre-creation optimization during low-traffic periods
  - Intelligent application ID recycling and management
- GlobalJobIdMap Enhancements:
  - Multi-region support (US, UK, etc.) with separate mapping documents
  - Automated job type validation and cleanup
  - Integration with job posting analytics for predictive mapping
- Performance Optimization:
  - Redis integration for ultra-fast cache operations
  - Advanced concurrency patterns with worker pools
  - WebSocket connection pooling and reuse
- Enhanced Monitoring & Analytics:
  - Real-time dashboards for pre-creation success rates
  - Performance trend analysis and alerting
  - User success rate analytics and optimization recommendations
- Admin Web Application (Enhanced):
  - Pre-creation management and troubleshooting tools
  - GlobalJobIdMap administration interface
  - Performance monitoring and optimization dashboard
  - Advanced user preference analysis and recommendation engine