feat(optimizer): [3/N] Analyzer#533
Merged
Merged
Conversation
Introduces the optimizer service module with: - MySQL/H2 schema for table_operations, table_stats, table_stats_history, and table_operations_history - JPA entities with JSON column support (vladmihalcea hibernate-types) - All model/DTO/enum types: OperationType, OperationStatus, TableStats, CompleteOperationRequest, JobResult, OperationMetrics, etc. - JPA AttributeConverters for JobResult and OperationMetrics JSON columns - MapStruct mapper (OptimizerMapper) for entity→DTO conversion - Spring Boot application shell and build wiring (settings.gradle, build.gradle dockerPrereqs) No repositories, controllers, or service layer yet — those follow in subsequent PRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read directly from table_stats instead of duplicating into operations - Remove orphanFilesDeleted/orphanBytesDeleted from history entity, DTO, and schema; operation-specific data belongs in the result JSON - Add addedSizeBytes to CommitDelta for tracking write volume - Fix OperationType javadoc to describe current state, not roadmap - Fix TableOperationsHistoryRow javadoc: written on operation complete, not by Spark app directly - Add field comments to all DTOs and request objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spring Data JPA repositories for all four optimizer tables with filtered query support. Includes tests exercising save/find, filtered queries, upsert semantics, and append-only history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address PR review comments: rename findFiltered → find across all repos, remove redundant findByTableUuid/findByTableUuidSince from history repos, add explicit assertion to context test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Service interface and implementation for all optimizer CRUD operations including complete-operation lifecycle, stats upsert with history double-write, and filtered queries. Three REST controllers expose the endpoints. The apps/optimizer shared module provides lightweight entity/repo copies for the analyzer and scheduler apps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Align OptimizerDataServiceImpl with renamed repository methods from optimizer-1 review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shared JPA entities and repositories for optimizer apps (analyzer, scheduler). All repos expose a single find method with optional filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve repo conflicts by taking optimizer-1's clean find-only versions. Scheduler-specific methods and streamAll removed per review feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These fields never belonged in the data model — remove them at the source rather than adding then deleting in a later PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
H2 integration tests for OptimizerDataServiceImpl covering completeOperation (write history, not-found) and upsertTableStats (create, update, history append). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strengthen upsertTableStats test to verify history rows contain the raw delta stats from each call, not just the row count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ling Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table in table_stats against pluggable OperationAnalyzer strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker. Key design choices: - Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables). - Uses the existing generic find() repository methods with null params. - Pure unit tests with Mockito — no Spring context needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
mkuchenbecker
commented
Apr 7, 2026
- AnalyzerRunner.analyzeDatabase: @transactional; try/catch+continue per table save; per-table log dropped to DEBUG, aggregate INFO per database; comment on dual defaultLimit (Mockito tests vs @value); replace TODO(scale-test) inline comment with link to BDP-102738. - CadencePolicy: extract intervalFor(status) with exhaustive switch + throwing default so new HistoryStatusDto values surface explicitly. - CadenceBasedOrphanFilesDeletionAnalyzer: drop redundant @Autowired on the single constructor. - CadenceBasedOrphanFilesDeletionAnalyzerTest: rename SUCCESS_INTERVAL/ FAILURE_INTERVAL to TEST_*; type tableWithProperty as boolean; drop redundant tableWithoutProperty (covered by _whenTablePropertiesEmpty). - TableOperationsHistoryDto: new after() method (replaces ad-hoc AnalyzerRunner::moreRecentHistory comparator); trusts service-set completedAt invariant. - application.properties: ofd.success-retry-hours 24 -> 16 so a re-eval is guaranteed within any rolling 24h window regardless of when the prior run landed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Member
|
Regarding the directory structure, can we make follow two options:
|
Address PR #533 review (abhisheknath2011): > Root project can be apps/optimizer/analyzer Move: - apps/optimizer-analyzer/ → apps/optimizer/analyzer/ - package com.linkedin.openhouse.analyzer → com.linkedin.openhouse.optimizer.analyzer - Gradle module :apps:optimizer-analyzer → :apps:optimizer:analyzer Same shape will land on opt-4 (apps/optimizer-scheduler → apps/optimizer/ scheduler) in a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker
added a commit
that referenced
this pull request
May 26, 2026
…duler Brings the analyzer rename from opt-3 (apps/optimizer-analyzer → apps/optimizer/analyzer, package com.linkedin.openhouse.analyzer → com.linkedin.openhouse.optimizer.analyzer). Applies the same shape to the scheduler per PR #533 review: - apps/optimizer-scheduler/ → apps/optimizer/scheduler/ - package com.linkedin.openhouse.scheduler → com.linkedin.openhouse.optimizer.scheduler - Gradle module :apps:optimizer-scheduler → :apps:optimizer:scheduler settings.gradle conflict resolved to include both nested modules.
…nalyzer Per PR #533 review (abhisheknath2011, 2026-05-26): > As analyzer is going to a service with APIs exposed, can we move the > code to the services module instead of keeping under apps module? Move: - apps/optimizer/analyzer/ → services/optimizer/analyzer/ - Gradle module :apps:optimizer:analyzer → :services:optimizer:analyzer - Package unchanged (com.linkedin.openhouse.optimizer.analyzer) Same shape will land on opt-4 (apps/optimizer/scheduler → services/ optimizer/scheduler). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Member
Somehow I don't see previous comment to move Analyzer from |
Collaborator
Author
|
PTAL @abhisheknath2011 Comment is here: https://github.com/linkedin/openhouse/pull/533/changes#r3262568853 |
…per-database Per PR #533 review (abhisheknath2011): > Is this related to max default number of tables to be processed by on > execution cycle of Analyzer run? Can we update the doc here and reflect > the same on the property name as well? Yes — it bounds the per-database working set across the three pre-load reads. Rename for clarity: - Property: optimizer.repo.default-limit → analyzer.max-tables-per-database - Field: defaultLimit → maxTablesPerDatabase - Drop the Mockito-flavored implementation note; replace with a javadoc describing semantics (per-database cap; tables beyond the bound defer to the next cycle). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per PR #533 review (abhisheknath2011): > Going by the paging default limit of 10k, the DBs with more that 10k > tables would bot be processed. Can we ensure that all the tables within > a given DB are process by adjusting the pagination list until we reach > the end? Previous code fetched page 0 only — tables past index 10000 were silently dropped, not "deferred to the next cycle" as the stale doc claimed. Now analyzeDatabase iterates pages until the page comes back smaller than the page size (terminates on partial or empty page). - Wrap the read + per-table loop in a while-pageNumber loop. - Re-load currentOps and latestHistory per page (per-page bound on the in-memory maps; affected tables whose ops fall in a different page get treated as "no current op" → may produce a duplicate PENDING that the scheduler's cancelDuplicates handles). - Rename property: analyzer.max-tables-per-database → analyzer.tables-page-size (it caps a page, not the cycle) - Rename field: maxTablesPerDatabase → tablesPageSize - New test: analyze_iteratesAllPages_processesEveryTableAcrossPageBoundary exercises 3 tables across 2 pages with page size 2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…se/optimizer/analyzer/AnalyzerRunner.java Co-authored-by: Abhishek Nath <anath1@linkedin.com>
mkuchenbecker
commented
May 26, 2026
c3e782b to
21dbf34
Compare
…ps/optimizer/analyzerapp (#602) ## Status **(WIP)** — inspection branch on top of `mkuchenb/optimizer-3` (PR #533). Splits the analyzer into library + deployable. ## Summary Library/deployable split: - `services/optimizer/analyzer/` — library (analysis logic; `AnalyzerRunner`, `OperationAnalyzer`, `CadencePolicy`, `CadenceBasedOrphanFilesDeletionAnalyzer`) - `apps/optimizer/analyzerapp/` — deployable Spring Boot wrapper (just `AnalyzerApplication` + `application.properties`) The analysis code stays in `services/`; only the `@SpringBootApplication` entry point moves to `apps/`. ## Why `analyzerapp` and not `analyzer` as the leaf name Two Gradle leaf projects both named `analyzer` — one at `:services:optimizer:analyzer`, one at `:apps:optimizer:analyzer` — produced a self-referential `compileJava → compileJava` cycle. Disambiguating the apps leaf to `analyzerapp` avoids it. The exact Gradle mechanism wasn't pinned down (could be the leaf collision, could be the also-colliding implicit parent at `:apps:optimizer` vs explicit `:services:optimizer`), but the rename is the smallest change that makes it build. ## Build changes `services/optimizer/analyzer/build.gradle`: - `api project(':services:optimizer')` so `OperationTypeDto`, repos, etc. are visible on consumers' compile classpath. - `bootJar { enabled = false }` — no `@SpringBootApplication` here anymore. - `jar.archiveClassifier = ''` so the library publishes `analyzer.jar`. `apps/optimizer/analyzerapp/build.gradle` (new): - `openhouse.springboot-ext-conventions` + Spring Boot 2.7.8 plugins. - `implementation project(':services:optimizer:analyzer')` + Spring Boot starter + JPA + MySQL driver. `settings.gradle`: `include ':apps:optimizer:analyzerapp'`. ## Testing - `./gradlew :services:optimizer:analyzer:test` — 20 tests green. - `./gradlew :apps:optimizer:analyzerapp:bootJar` — produces `build/analyzerapp/libs/analyzerapp.jar`. ## What does NOT change - Package: every class keeps `com.linkedin.openhouse.optimizer.analyzer`. `AnalyzerApplication`'s `@SpringBootApplication` component-scan picks up the library classes because they sit in the same package, just in a different module on the classpath. - The scheduler (opt-4) is unchanged in this PR. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
abhisheknath2011
approved these changes
May 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Optimizer Stack
Summary
PR 3 of N in the optimizer stack.
Introduces
apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table intable_statsagainst pluggableOperationAnalyzerstrategies. The first strategy,OrphanFilesDeletionAnalyzer, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker.Key design choices:
find()repository methods with null params.Changes
Core:
AnalyzerRunner— loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, circuit breaker logic.Strategy interface:
OperationAnalyzer—isEnabled(table),shouldSchedule(table, currentOp, latestHistory),getCircuitBreakerThreshold().Cadence policy:
CadencePolicy— encapsulates time-based retry logic shared across operation types.OFD analyzer:
OrphanFilesDeletionAnalyzer— enabled viamaintenance.optimizer.ofd.enabledtable property.Testing Done
25 unit tests:
AnalyzerRunnerTest(7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold passOrphanFilesDeletionAnalyzerTest(18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinationsAdditional Information