380069a to 25ea9d5
Codecov Report
❌ Your patch check has failed because the patch coverage (71.57%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
... and 3 files with indirect coverage changes
@@             Coverage Diff              @@
##   development/9.3    #2723      +/-   ##
=============================================
+ Coverage    74.48%   74.62%   +0.14%
=============================================
  Files          200      200
  Lines        13603    13681      +78
=============================================
+ Hits         10132    10210      +78
  Misses        3461     3461
  Partials        10       10
8316f88 to 408c96c
408c96c to e1c5b13
e1c5b13 to aefb677
  const log = this.logger.newRequestLogger();
- const start = new Date();
+ const start = Date.now();
+ this._scanId = uuid();
Hmm, we're storing the scan ID as a "global" field variable, but it sounds like it is really relevant/used only inside this function (through indirect calls). Could we drop the global field and instead pass it through to whatever uses it? Maybe in _createBucketTaskMessages?
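One way to act on this suggestion, as a minimal sketch: create the ID as a local variable in the scan entry point and hand it to the helper that builds the messages. The class below is a stand-in rather than the actual conductor; `ConductorSketch`, `processBuckets`, the two-argument form of `_createBucketTaskMessages`, and the message shape are assumptions made for illustration.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical stand-in for the conductor; names and message shape are illustrative.
class ConductorSketch {
    // Builds one message per bucket. The scan ID arrives as an argument,
    // so no instance field is needed to share it.
    _createBucketTaskMessages(buckets, scanId) {
        return buckets.map(bucket => ({
            target: { bucket },
            contextInfo: { conductorScanId: scanId },
        }));
    }

    processBuckets(buckets) {
        const start = Date.now();
        const scanId = randomUUID(); // local to this scan, never stored on `this`
        const messages = this._createBucketTaskMessages(buckets, scanId);
        return { scanId, messages, elapsedMs: Date.now() - start };
    }
}
```

With this shape, two overlapping scans cannot overwrite each other's ID, since each call owns its own `scanId`.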
extensions/lifecycle/bucketProcessor/LifecycleBucketProcessor.js
725c3df to 11a94ea
a2128cf to a464b39
LGTM
This commit adds new Prometheus metrics to track lifecycle conductor scan health:
- s3_lifecycle_conductor_full_scan_elapsed_seconds: scan duration
- s3_lifecycle_conductor_scan_count: buckets/workflows per scan
- s3_lifecycle_bucket_processor_scan_start_time: active scan timestamp
- s3_lifecycle_bucket_processor_buckets_count: processed buckets

It also generates a unique conductorScanId (UUID) per conductor scan and passes it through all listing methods (zookeeper, mongodb, bucketd). scanId and bucketSource are included in Kafka messages via contextInfo. Scan-level metrics (elapsed time, bucket count, workflow count, lifecycle bucket count) are recorded at the end of each full scan.

LifecycleBucketProcessor detects scan boundaries by comparing the incoming conductorScanId with the current one and emits scan start/end metrics. It tracks per-bucket completion after successful task processing and resets the scan gauge when the internal task queue drains.

The onProcessBuckets() call moves to scan start and onConductorFullScan() is added at scan end for accurate timing. Date.now() replaces new Date() for consistent elapsed-time calculations.

Issue: BB-740
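The scan-boundary detection described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `ScanBoundaryTracker` and the `onScanStart` / `onScanEnd` / `onBucketDone` callback names are hypothetical, and the queue-drain reset is left out.

```javascript
// Hypothetical tracker; the real logic lives in LifecycleBucketProcessor.
class ScanBoundaryTracker {
    constructor(metrics) {
        this._metrics = metrics; // object exposing the three callbacks used below
        this._currentScanId = null;
    }

    // Called for each incoming bucket-task message.
    // Returns true when the message opens a new scan.
    onMessage(contextInfo) {
        const scanId = contextInfo && contextInfo.conductorScanId;
        if (!scanId || scanId === this._currentScanId) {
            return false; // no ID, or still inside the same scan
        }
        if (this._currentScanId !== null) {
            this._metrics.onScanEnd(this._currentScanId);
        }
        this._currentScanId = scanId;
        this._metrics.onScanStart(scanId);
        return true;
    }

    // Called after a bucket task has been processed successfully.
    onBucketDone() {
        if (this._currentScanId !== null) {
            this._metrics.onBucketDone(this._currentScanId);
        }
    }
}
```

A change of ID is the only signal used here, so the processor needs no coordination with the conductor beyond the ID already carried in each message.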
Here we forward conductorScanId from the conductor Kafka message contextInfo through all lifecycle task entries and action queue entries:
- LifecycleTask: includes conductorScanId in contextInfo when creating bucket re-queue entries, version listing entries, and all expiration and transition action entries
- LifecycleTaskV2: same propagation for v2 listing, re-queue, and action entries
- LifecycleDeleteObjectTask, LifecycleUpdateExpirationTask, LifecycleUpdateTransitionTask: extract conductorScanId from the action entry and add it as a log default field for end-to-end traceability

This ensures every log line emitted during lifecycle processing can be correlated back to the originating conductor scan.

Issue: BB-740
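The propagation pattern can be illustrated with a small sketch. The helper `withScanContext` and the `SketchLogger` class are hypothetical stand-ins written for this example; the entry fields other than `contextInfo.conductorScanId` are assumptions, and the logger only mimics the idea of default fields merged into every line.

```javascript
// Copies the scan ID from the incoming message's contextInfo into a new entry.
// Returns the entry unchanged when no scan ID is available.
function withScanContext(sourceContextInfo, entry) {
    const conductorScanId = sourceContextInfo && sourceContextInfo.conductorScanId;
    if (!conductorScanId) {
        return entry;
    }
    return {
        ...entry,
        contextInfo: { ...entry.contextInfo, conductorScanId },
    };
}

// Minimal logger stand-in: default fields are merged into every log line.
class SketchLogger {
    constructor() {
        this._defaults = {};
        this.lines = [];
    }

    addDefaultFields(fields) {
        Object.assign(this._defaults, fields);
    }

    info(message, fields = {}) {
        this.lines.push({ message, ...this._defaults, ...fields });
    }
}
```

A task consuming an action entry would read `contextInfo.conductorScanId` once, register it as a default field, and every later log line would then carry it without further plumbing.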
Here we add new Grafana dashboard panels to visualize lifecycle conductor scan health and bucket processor activity:
- Conductor full scan duration (elapsed seconds)
- Conductor scan counts (total buckets, lifecycle buckets, workflows)
- Bucket processor scan start time with idle detection
- Bucket processor buckets processed per scan

Issue: BB-740
In this commit we add unit tests for the new LifecycleMetrics scan monitoring methods:
- onConductorFullScan: verify elapsed seconds, bucket/workflow/lifecycle_bucket counts are set correctly
- onBucketProcessorScanStart/ScanEnd: verify scan timestamp gauge is set and reset to 0
- onBucketProcessorBucketDone: verify bucket counter is incremented
- Error handling: verify errors are caught and logged

LifecycleConductor unit tests were also updated to verify that:
- conductorScanId is present in Kafka bucket-task messages
- bucketSource is included in contextInfo
- scanId is a valid UUID generated per scan

BackbeatTestConsumer was updated to normalize conductorScanId and bucketSource fields for deep equality comparison in functional tests.

Issue: BB-740
253df18 to 7a649cc
…dle signals

The drain callback on _internalTaskScheduler fires every time the queue goes from non-empty to empty. Under light load this happens between Kafka message batches within the same conductor scan, prematurely resetting the scan start time gauge to 0 and setting _processorScanMetricsActive to false. Because subsequent messages carry the same conductorScanId, the new-scan branch is skipped and onBucketProcessorScanStart is never called again, leaving the Bucket Processor Scan Progress dashboard panel hidden for the remainder of the scan.

Fix: replace the immediate reset with a 30-second debounced timeout. If new messages from the same scan arrive during the debounce window, the timeout is cancelled, keeping the scan active. A new scan also cancels any pending debounce before performing a full metric reset.

Issue: BB-740
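A minimal sketch of the debounce described in this fix, assuming a tracker object separate from the real bucket processor. `DebouncedScanTracker`, `onDrain`, and the callback options are hypothetical names; only `_scanEndTimeout` and the 30-second window come from the discussion.

```javascript
// Hypothetical tracker showing the debounced scan-end reset.
class DebouncedScanTracker {
    constructor({ debounceMs = 30000, onScanStart, onScanEnd }) {
        this._debounceMs = debounceMs;
        this._onScanStart = onScanStart;
        this._onScanEnd = onScanEnd;
        this._currentScanId = null;
        this._active = false;
        this._scanEndTimeout = null;
    }

    _cancelPendingEnd() {
        if (this._scanEndTimeout !== null) {
            clearTimeout(this._scanEndTimeout);
            this._scanEndTimeout = null;
        }
    }

    // Called for each incoming message carrying a conductor scan ID.
    onMessage(scanId) {
        // Any message cancels a pending end: the scan is clearly not idle.
        this._cancelPendingEnd();
        if (scanId !== this._currentScanId) {
            this._currentScanId = scanId;
            this._active = true;
            this._onScanStart(scanId); // full reset for the new scan
        }
    }

    // Called when the internal task queue drains.
    onDrain() {
        if (!this._active) {
            return;
        }
        this._cancelPendingEnd();
        // Defer the reset instead of applying it immediately.
        this._scanEndTimeout = setTimeout(() => {
            this._scanEndTimeout = null;
            this._active = false;
            this._onScanEnd(this._currentScanId);
        }, this._debounceMs);
    }

    get active() {
        return this._active;
    }
}
```

The trade-off is latency: the end of a scan is reported up to one debounce window late, in exchange for not flapping between batches of the same scan.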
Review by Claude Code
- Clear _scanEndTimeout in close() to prevent the callback firing on a partially torn-down bucket processor instance
- Propagate the conductor's start timestamp (Date.now() at scan start) via contextInfo.scanStartTimestamp instead of using Date.now() in the bucket processor, so the scan progress dashboard accurately reflects the full scan duration
- Document that bucketProcessorBucketsCount.reset() clears all label combinations (acceptable: single bucket source per deployment)
- Update tests and BackbeatTestConsumer normalizer for the new scanStartTimestamp field
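The first two points above can be sketched as follows. This is an illustration only: `BucketProcessorSketch`, `scheduleScanEnd`, and the plain `scanStartTime` property standing in for the gauge are hypothetical, and the fallback to the local clock when `scanStartTimestamp` is missing is an assumption, not something stated in the PR.

```javascript
// Hypothetical stand-in for the bucket processor's scan bookkeeping.
class BucketProcessorSketch {
    constructor() {
        this._scanEndTimeout = null;
        this.scanStartTime = 0; // stands in for the scan start time gauge
        this.scanEnded = false;
    }

    onScanStart(contextInfo) {
        const ts = contextInfo && contextInfo.scanStartTimestamp;
        // Prefer the conductor's timestamp; the local-clock fallback is an assumption.
        this.scanStartTime = Number.isFinite(ts) ? ts : Date.now();
    }

    scheduleScanEnd(delayMs) {
        clearTimeout(this._scanEndTimeout); // clearing null is a no-op
        this._scanEndTimeout = setTimeout(() => {
            this._scanEndTimeout = null;
            this.scanStartTime = 0;
            this.scanEnded = true;
        }, delayMs);
    }

    // Clear the pending timer so its callback cannot run after teardown.
    close() {
        if (this._scanEndTimeout !== null) {
            clearTimeout(this._scanEndTimeout);
            this._scanEndTimeout = null;
        }
    }
}
```

Clearing the timer in `close()` also keeps a pending timeout from holding the Node.js process open during shutdown.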
LGTM
Issue: BB-740