[telemetry] Add retry logic for partial telemetry push failures#1224
[telemetry] Add retry logic for partial telemetry push failures#1224akgitrepos wants to merge 4 commits intodatabricks:mainfrom
Conversation
|
@gopalldb @vikrantpuppala Hi, I've implemented retry logic for partial telemetry push failures which was one of the TODO item. Would appreciate a review. Thanks! |
samikshya-db
left a comment
There was a problem hiding this comment.
Thanks for the great addition here, @akgitrepos ! 🥳
Added a few comments.
|
|
||
| @Override | ||
| public void pushEvent(TelemetryRequest request) throws Exception { | ||
| pushEventWithRetry(request, maxRetries); |
There was a problem hiding this comment.
nit : can you move the contents of pushEventWithRetry in this function itself?
There was a problem hiding this comment.
Refactored the recursive pushEventWithRetry into pushEvent with an iterative approach.
| private static final int DEFAULT_MAX_RETRIES = 3; | ||
| private static final long INITIAL_BACKOFF_MS = | ||
| 1000; // 1 second - matches DatabricksHttpRetryHandler | ||
| private static final long MAX_BACKOFF_MS = 10000; // 10 seconds - matches codebase standard |
There was a problem hiding this comment.
The best way would be to extract calculateExponentialBackoff and doSleepForDelay from DatabricksHttpRetryHandler as an util, and reuse it here.
The reason I was saying this is because the backoff logic and thread interruption logic is more mature there. (Also to honor DRY)
There was a problem hiding this comment.
Created RetryUtil shared utility class. Both DatabricksHttpRetryHandler and TelemetryPushClient now reuse the same backoff calculation and thread-sleep logic.
Implements retry logic for partial telemetry push failures. When the telemetry service returns a partial success response, the client now automatically retries up to 3 times with exponential backoff. This addresses the TODO comment in TelemetryPushClient.java. Signed-off-by: Akshey Sigdel <sigdelakshey@gmail.com> # Conflicts: # src/test/java/com/databricks/jdbc/telemetry/TelemetryPushClientTest.java
Signed-off-by: Akshey Sigdel <sigdelakshey@gmail.com>
cf71661 to
576535f
Compare
|
@samikshya-db Any additional feedback on this? |
|
This PR has been marked as Stale because it has been open for 30 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR. |
Description
Implements retry logic for partial telemetry push failures in
TelemetryPushClient. When the telemetry service returns a partial success response (numProtoSuccess < totalEvents), the client now automatically retries up to 3 times with exponential backoff (1s → 2s → 4s, max 10s).This addresses the TODO comment in
TelemetryPushClient.java.Changes Made
maxRetriesparameter (default: 3)DatabricksHttpRetryHandler)Testing
pushEvent_noRetryOnFullSuccess- Verifies no retry when all events succeedpushEvent_retriesOnPartialSuccess- Verifies retry is triggered on partial failureAdditional Notes