@Deepak1101100

Fixes #630

What
Prevents duplicate execution of long-running HTTP tasks (and other async system tasks) by:
  • Persisting the task status as IN_PROGRESS before blocking execution begins.
  • Renewing the workflow lock while a long-running system task is executing.

Why
Duplicate HTTP task execution was caused by two core issues:
  • Task status remained SCHEDULED in the database during execution. Because the status was persisted only after the blocking HTTP call completed, WorkflowRepairService incorrectly treated the task as stuck and re-queued it.
  • The workflow lock expired during long-running execution. The default 60-second lease expired while HTTP tasks ran for several minutes, allowing other workers to acquire the same workflow and execute the task again.

This resulted in:
  • Duplicate outbound HTTP requests
  • Concurrent workflow decisions
  • Inconsistent workflow state
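To make the lease arithmetic concrete: with the default 60-second lease and a 120-second task, the lease lapses halfway through execution, whereas renewing every 30 seconds (half the lease) keeps it alive until the task finishes. The following is a minimal, self-contained model of that timeline, not Conductor code; `LeaseLockModel`, its methods, and the assumption that a second worker retries the lock once per second are all hypothetical.

```java
// Illustrative model only (not Conductor's lock implementation): a lease-based
// lock driven by a simulated millisecond clock so the race is deterministic.
final class LeaseLockModel {
    private String owner;      // current holder, or null if never acquired
    private long expiresAtMs;  // lease expiry on the simulated clock

    /** Succeeds if nobody holds a live lease at nowMs. */
    boolean tryAcquire(String worker, long nowMs, long leaseMs) {
        if (owner != null && nowMs < expiresAtMs) {
            return false;      // lock is still held under a live lease
        }
        owner = worker;
        expiresAtMs = nowMs + leaseMs;
        return true;
    }

    /** Extends the lease, but only if worker still holds a live lease. */
    boolean renew(String worker, long nowMs, long leaseMs) {
        if (worker.equals(owner) && nowMs < expiresAtMs) {
            expiresAtMs = nowMs + leaseMs;
            return true;
        }
        return false;
    }

    /**
     * Worker A takes the lock at t=0 and runs for taskMs; worker B retries once per
     * second. Returns the first time B obtains the lock while A is still running,
     * or -1 if it never does. renewEveryMs <= 0 disables renewal.
     */
    static long firstStealMs(long taskMs, long leaseMs, long renewEveryMs) {
        LeaseLockModel lock = new LeaseLockModel();
        lock.tryAcquire("A", 0, leaseMs);
        for (long t = 1; t < taskMs; t++) {
            if (renewEveryMs > 0 && t % renewEveryMs == 0) {
                lock.renew("A", t, leaseMs);      // watchdog tick for worker A
            }
            if (t % 1000 == 0 && lock.tryAcquire("B", t, leaseMs)) {
                return t;                         // B now "owns" A's workflow
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstStealMs(120_000, 60_000, 0));      // prints 60000
        System.out.println(firstStealMs(120_000, 60_000, 30_000)); // prints -1
    }
}
```

Without renewal, worker B acquires the lock at t = 60 s while A is still mid-call. With a 30-second renewal interval, A's lease is pushed to 90 s, 120 s and 150 s in turn, so B never gets in before A finishes.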

Fixes

  1. Persist IN_PROGRESS before blocking call
    For async system tasks that actually block, the task is now marked IN_PROGRESS and persisted before invoking systemTask.start():

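```java
// Mark the task IN_PROGRESS and persist it before the potentially blocking
// start() call, so the stored status is no longer SCHEDULED while it runs.
if (systemTask.isAsync() && systemTask.isAsyncComplete(task)) {
    task.setStatus(TaskModel.Status.IN_PROGRESS);
    task.setWorkerId(Utils.getServerId()); // record which server is executing the task
    executionDAOFacade.updateTask(task);    // persist before blocking
}
// For an HTTP task this may block until the remote call returns.
systemTask.start(workflow, task, workflowExecutor);
```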

This prevents premature re-queueing by WorkflowRepairService.
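The effect of the ordering change can be shown with a toy model. It is a deliberate simplification: `EarlyPersistDemo` is hypothetical, and its repair sweep simply re-queues any task whose persisted status is still SCHEDULED, standing in for the WorkflowRepairService behaviour described above rather than reproducing its actual checks.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not Conductor code: an in-memory task store plus a repair sweep
// that re-queues any task whose persisted status is still SCHEDULED.
final class EarlyPersistDemo {
    enum Status { SCHEDULED, IN_PROGRESS, COMPLETED }

    private final Map<String, Status> store = new HashMap<>();
    private int requeues = 0;

    /** Stand-in for the repair sweep: count every task still stored as SCHEDULED. */
    private void repairSweep() {
        store.forEach((id, status) -> {
            if (status == Status.SCHEDULED) {
                requeues++;
            }
        });
    }

    /** Runs one task; the repair sweep fires while the "HTTP call" is in flight. */
    int execute(boolean persistInProgressFirst) {
        store.put("http-task", Status.SCHEDULED);
        if (persistInProgressFirst) {
            store.put("http-task", Status.IN_PROGRESS); // the fix: persist before blocking
        }
        repairSweep();                                  // runs mid-call
        store.put("http-task", Status.COMPLETED);       // persisted after the call returns
        return requeues;
    }

    public static void main(String[] args) {
        System.out.println(new EarlyPersistDemo().execute(false)); // prints 1
        System.out.println(new EarlyPersistDemo().execute(true));  // prints 0
    }
}
```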

  2. Workflow lock renewal for long-running tasks
    A periodic lock renewal mechanism is added using ScheduledExecutorService, following the same watchdog pattern used by Redisson distributed locks. This implements the long-term solution the maintainers explicitly suggested in PR #633 (“Fix duplicate HTTP task execution via WorkflowRepairService race condition”):

“Longer term – we should extend the lease while working on the long running system tasks.”

Locks are renewed at a fixed interval (half the lease time) while the task is executing, and are released in a finally block so they cannot leak.
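The renewal loop can be sketched roughly as follows. This is not the PR's code: `RenewableLock`, `LeaseWatchdog`, `runWithLeaseRenewal` and `CountingLock` are hypothetical names, and the sketch assumes the lock exposes a renew operation that does nothing once the caller no longer owns the lock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock abstraction; Conductor's own lock interface may differ.
// Assumes renew() is a no-op once the caller no longer owns the lock.
interface RenewableLock {
    void renew(String lockId, long leaseMs);
    void release(String lockId);
}

final class LeaseWatchdog {
    // A single shared daemon thread ticks all renewals in this sketch;
    // daemon so it never keeps the JVM alive.
    private static final ScheduledExecutorService RENEWER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "workflow-lock-renewer");
                t.setDaemon(true);
                return t;
            });

    /** Runs work while renewing the lease every leaseMs / 2, then releases the lock. */
    static <T> T runWithLeaseRenewal(RenewableLock lock, String lockId, long leaseMs,
                                     Callable<T> work) throws Exception {
        long intervalMs = Math.max(1, leaseMs / 2);
        ScheduledFuture<?> renewal = RENEWER.scheduleAtFixedRate(() -> {
            try {
                lock.renew(lockId, leaseMs);
            } catch (RuntimeException e) {
                // An exception escaping a fixed-rate task would cancel all later runs.
                System.err.println("lock renewal failed for " + lockId + ": " + e);
            }
        }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        try {
            return work.call();          // e.g. the blocking systemTask.start(...)
        } finally {
            renewal.cancel(false);       // stop extending the lease first...
            lock.release(lockId);        // ...then release, even if work threw
        }
    }

    public static void main(String[] args) throws Exception {
        CountingLock lock = new CountingLock();
        String result = runWithLeaseRenewal(lock, "workflow-1", 100, () -> {
            Thread.sleep(260);           // stand-in for a long HTTP call
            return "done";
        });
        System.out.println(result + ", renewals=" + lock.renewals.get()
                + ", released=" + lock.released);
    }
}

// Test double that records renewals and release.
final class CountingLock implements RenewableLock {
    final AtomicInteger renewals = new AtomicInteger();
    volatile boolean released;

    @Override
    public void renew(String lockId, long leaseMs) {
        if (!released) {
            renewals.incrementAndGet();
        }
    }

    @Override
    public void release(String lockId) {
        released = true;
    }
}
```

Cancelling the renewal before releasing keeps the watchdog from extending a lease the task has already given up, and the try/finally guarantees the release even if the task throws. Because cancel(false) does not stop a renewal that is already running, a real renew call would also need to confirm ownership, as assumed above.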

Testing
Manual
Ran a long-running HTTP task (120s+ delay) and verified:
  • Single execution (previously executed up to 4x)
  • No re-queue during execution
  • No workflow lock expiration
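To reproduce a manual test like this locally, a stub endpoint along these lines could stand in for the slow downstream service. It is an assumption, not the harness used for this PR: `SlowEndpoint`, the `/slow` path and port 8085 are hypothetical, and it relies only on the JDK's built-in `com.sun.net.httpserver` package. Every request is counted on arrival, so a duplicated HTTP task shows up as a second hit.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stub for manual testing: answers GET /slow after a fixed delay
// and counts requests so duplicate executions become visible.
final class SlowEndpoint {
    final AtomicInteger hits = new AtomicInteger();
    private final HttpServer server;

    SlowEndpoint(int port, long delayMs) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/slow", exchange -> {
            int n = hits.incrementAndGet();       // count on arrival, before the delay
            System.out.println("request #" + n + " received");
            try {
                Thread.sleep(delayMs);            // simulate a long-running downstream call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = ("hit " + n).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        // Daemon worker threads so overlapping (duplicate) requests are served
        // concurrently without keeping the JVM alive after stop().
        server.setExecutor(Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "slow-endpoint");
            t.setDaemon(true);
            return t;
        }));
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws IOException {
        // Port 8085 and the 120 s delay are arbitrary choices for a local run.
        SlowEndpoint endpoint = new SlowEndpoint(8085, 120_000);
        System.out.println("Point the HTTP task at http://localhost:" + endpoint.port() + "/slow");
        // Expect exactly one "request #1 received" line per workflow run.
    }
}
```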

Unit Tests
Updated the existing Spock test AsyncSystemTaskExecutorTest.groovy, adjusting expectations for:
  • Early IN_PROGRESS persistence
  • The additional updateTask() call

All core tests pass:
./gradlew :conductor-core:test

Notes
This is my first contribution to Conductor. The lock renewal implementation directly follows standard distributed lock watchdog patterns and maintainer guidance from prior reviews. Happy to refine based on feedback.

PR Checklist

  • [✔] Bug reproduced locally
  • [✔] Root cause identified
  • [✔] Fix implemented
  • [✔] Existing unit tests updated
  • [✔] All core tests passing (./gradlew :conductor-core:test)

Linked issue: [Bug] Http Task executed twice after COMPLETED status [read thread for workaround]
