Fix #630: Prevent duplicate HTTP task execution via status update and lock renewal #681
Fixes #630
What
Prevents duplicate execution of long-running HTTP tasks (and other async system tasks) by:
Persisting task status as IN_PROGRESS before blocking execution begins.
Renewing the workflow lock during long-running system task execution.
Why
Duplicate HTTP task execution was caused by two core issues:
Task status remained SCHEDULED in the database during execution. Because the status was only persisted after the blocking HTTP call completed, WorkflowRepairService incorrectly detected the task as stuck and re-queued it.
Workflow lock expired during long-running execution. The default 60-second lease expired while HTTP tasks ran for several minutes, allowing other workers to acquire the lock on the same workflow and execute the task again.
This resulted in:
Duplicate outbound HTTP requests
Concurrent workflow decisions
Inconsistent workflow state
Fixes
For async system tasks that actually block, the task is now marked IN_PROGRESS and persisted before invoking systemTask.start():
    // Persist IN_PROGRESS before the blocking call so that
    // WorkflowRepairService does not treat the task as stuck.
    if (systemTask.isAsync() && systemTask.isAsyncComplete(task)) {
        task.setStatus(TaskModel.Status.IN_PROGRESS);
        task.setWorkerId(Utils.getServerId());
        executionDAOFacade.updateTask(task);
    }
    systemTask.start(workflow, task, workflowExecutor);
This prevents premature re-queueing by WorkflowRepairService.
A periodic lock renewal mechanism is added using ScheduledExecutorService, following the same watchdog pattern used by Redisson distributed locks. This implements the long-term solution that maintainers suggested in PR #633 (Fix duplicate HTTP task execution via WorkflowRepairService race condition):
“Longer term – we should extend the lease while working on the long running system tasks.”
Locks are renewed at a fixed interval (half of the lease time) while the task is executing, and safely released in finally to prevent leaks.
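For reviewers who want the renewal pattern in isolation, here is a minimal, self-contained sketch of the watchdog idea described above. It is not the code in this PR: RenewableLock, runWithRenewal, and the lock ID are hypothetical names used only for illustration, and a per-call scheduler is used for brevity where a real implementation would more likely share one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockRenewalSketch {

    /** Hypothetical stand-in for the workflow lock; NOT Conductor's actual Lock interface. */
    interface RenewableLock {
        void renew(String lockId, long leaseMillis);
        void release(String lockId);
    }

    /**
     * Runs the blocking work while renewing the lease every leaseMillis / 2,
     * then stops the renewal and releases the lock in finally.
     */
    static <T> T runWithRenewal(RenewableLock lock, String lockId,
                                long leaseMillis, Callable<T> work) throws Exception {
        // Assumption: one scheduler per call keeps the sketch short.
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "lock-renewal");
                    t.setDaemon(true);
                    return t;
                });
        long interval = Math.max(1, leaseMillis / 2);
        // Note: if renew() throws, scheduleAtFixedRate suppresses later runs,
        // so a real implementation should catch and log inside the task.
        ScheduledFuture<?> renewal = scheduler.scheduleAtFixedRate(
                () -> lock.renew(lockId, leaseMillis),
                interval, interval, TimeUnit.MILLISECONDS);
        try {
            return work.call();
        } finally {
            renewal.cancel(false);    // stop renewing first
            scheduler.shutdownNow();
            lock.release(lockId);     // always release, even if the work failed
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger renewals = new AtomicInteger();
        AtomicInteger releases = new AtomicInteger();
        RenewableLock lock = new RenewableLock() {
            public void renew(String id, long lease) { renewals.incrementAndGet(); }
            public void release(String id) { releases.incrementAndGet(); }
        };
        // Simulated task that outlives a 100 ms lease several times over.
        String result = runWithRenewal(lock, "workflow-1", 100, () -> {
            Thread.sleep(350);
            return "done";
        });
        System.out.println(result + " renewals=" + renewals.get()
                + " releases=" + releases.get());
    }
}
```

Renewing at half the lease time leaves a full interval of slack before expiry if one renewal is delayed. Cancelling the renewal before releasing avoids renewing a lock that has already been given up.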
Testing
Manual
Long-running HTTP task (120s+ delay)
Verified:
Single execution (previously executed up to 4x)
No re-queue during execution
No workflow lock expiration
Unit Tests
Updated existing Spock test:
AsyncSystemTaskExecutorTest.groovy
Adjusted expectations for:
Early IN_PROGRESS persistence
Additional updateTask() call
All core tests pass:
./gradlew :conductor-core:test
Notes
This is my first contribution to Conductor. The lock renewal implementation directly follows standard distributed lock watchdog patterns and maintainer guidance from prior reviews. Happy to refine based on feedback.