
Add onprem-data-verifier tool #31

Merged
firat-kaya merged 3 commits into main from on-prem-verify on Mar 11, 2026

Conversation

@firat-kaya
Contributor

This commit introduces a new Go application, onprem-data-verifier, designed to validate Looker on-premise backup artifacts before migration to Looker Cloud. The tool performs several checks, including:

  • Workspace structure verification
  • MD5 checksum integrity checks
  • GPG key validation against the customer's LUID
  • SQL dump analysis (version, charset, extended inserts, critical tables)
  • CMK validation (raw and base64)

Unit tests are included for the individual components.

@firat-kaya firat-kaya requested a review from a team as a code owner February 19, 2026 13:39
@gemini-code-assist

Summary of Changes

Hello @firat-kaya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical new utility designed to enhance the reliability and security of Looker on-premise to cloud migrations. By systematically verifying backup artifacts, the tool aims to prevent common data integrity and security issues, ensuring a smoother transition process. It provides a structured, step-by-step validation workflow, outputting clear results and a comprehensive metadata report.

Highlights

  • New Tool Introduction: A new Go application, onprem-data-verifier, has been added to validate Looker on-premise backup artifacts before migration to Looker Cloud.
  • Comprehensive Validation Pipeline: The tool performs a series of checks including workspace structure, MD5 checksum integrity, GPG key validation, SQL dump analysis (version, charset, extended inserts, critical tables), and Customer Master Key (CMK) validation.
  • CLI Interface and Reporting: The tool features a Cobra-based command-line interface with required flags for backup directory, customer name, and LUID. It provides colored console output and generates a detailed JSON report (metadata.json).
  • Modular Design with Unit Tests: The validation logic is organized into distinct packages (cmd, logger, metadata, validator) and is thoroughly covered by unit tests for checksums, CMK, configuration, GPG, metadata, and SQL structure.
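The flag handling described above can be sketched with the standard library alone. The actual tool uses spf13/cobra with MarkFlagRequired; the options struct and the parse/validate helpers below are hypothetical stand-ins that only illustrate the required-flag behavior:

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options mirrors the four flags described in the PR.
type options struct {
	backupDir    string
	customerName string
	luid         string
	outputFile   string
}

// validate approximates Cobra's MarkFlagRequired for the three
// mandatory flags; --output falls back to metadata.json.
func validate(o options) error {
	if o.backupDir == "" || o.customerName == "" || o.luid == "" {
		return errors.New("--backupDir, --customerName and --luid are required")
	}
	return nil
}

func parse(args []string) (options, error) {
	fs := flag.NewFlagSet("onprem-data-verifier", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.backupDir, "backupDir", "", "path to the backup workspace")
	fs.StringVar(&o.customerName, "customerName", "", "customer identifier")
	fs.StringVar(&o.luid, "luid", "", "customer LUID for GPG validation")
	fs.StringVar(&o.outputFile, "output", "metadata.json", "report destination")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	return o, validate(o)
}

func main() {
	o, err := parse(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("verifying %s for %s (%s) -> %s\n",
		o.backupDir, o.customerName, o.luid, o.outputFile)
}
```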


Changelog
  • .gitignore
    • Added .idea directory to ignore IDE-specific files.
  • onprem-data-verifier/.idea/vcs.xml
    • Added IntelliJ IDEA VCS mapping configuration for the new project.
  • onprem-data-verifier/README.md
    • Added comprehensive documentation for the onprem-data-verifier tool, detailing its purpose, features, prerequisites, build instructions, usage, workspace requirements, validation pipeline, and output formats.
  • onprem-data-verifier/cmd/root.go
    • Added the main Cobra command structure for the onprem-data-verifier CLI tool.
    • Implemented flag parsing for backupDir, customerName, luid, and outputFile.
    • Integrated the validator.Orchestrator to run the validation process and generate the final report.
    • Updated error handling to use the custom logger package for consistent output.
  • onprem-data-verifier/go.mod
    • Added Go module definition for onprem-data-verifier.
    • Included dependencies for github.com/spf13/cobra, github.com/spf13/viper, and github.com/stretchr/testify.
  • onprem-data-verifier/go.sum
    • Added Go module checksums for all direct and indirect dependencies.
  • onprem-data-verifier/logger/logger.go
    • Added a new logger package to provide standardized, colored console output for different log levels (Step, Success, Warn, Error, Info, Title, Fatal, Completion).
    • Implemented a formatBytes helper function for human-readable size output.
  • onprem-data-verifier/main.go
    • Added the main entry point for the onprem-data-verifier application, calling cmd.Execute().
  • onprem-data-verifier/metadata/metadata.go
    • Added a Report struct to define the structure of the validation output metadata.
    • Implemented ToJSON method for serializing the report to JSON.
    • Implemented Save method for writing the report to a specified file path, with directory validation and fallback logic.
  • onprem-data-verifier/tests/checksum_test.go
    • Added unit tests for CalculateMD5 to verify MD5 hash generation for files.
    • Added unit tests for ParseMD5Manifest to ensure correct parsing of MD5 manifest files, including handling absolute paths.
    • Added unit tests for VerifyFile to confirm file integrity against expected checksums.
  • onprem-data-verifier/tests/cmk_validator_test.go
    • Added unit tests for ValidateCMK to verify Customer Master Key formats (Raw and Base64) and handle invalid lengths or corrupt data.
  • onprem-data-verifier/tests/config_test.go
    • Added unit tests for GetDefaultConfig to ensure the default validation policies, supported versions, and table checks are correctly loaded.
  • onprem-data-verifier/tests/gpg_verify_test.go
    • Added unit tests for ParseGPGColons to extract GPG key IDs from GPG output.
    • Added unit tests for ParseGPGColons_NotFound to handle cases where no keys are found in GPG output.
  • onprem-data-verifier/tests/metadata_test.go
    • Added unit tests for metadata.Report JSON serialization and field mapping.
    • Added unit tests for Report.Save to test saving to default, custom, and non-existent directories with fallback behavior.
  • onprem-data-verifier/tests/sql_structure_test.go
    • Added unit tests for IsLookerVersionSupported to check version compatibility.
    • Added comprehensive unit tests for AnalyzeSQLDump to verify extraction of Looker version, detection of extended inserts, charset, collation, and critical tables from SQL dumps, including gzipped files and edge cases.
  • onprem-data-verifier/validator/ValidationConfig.go
    • Added ValidationConfig struct to define validation rules and policies.
    • Implemented GetDefaultConfig to provide hardcoded default validation settings for supported Looker versions and critical tables.
  • onprem-data-verifier/validator/checksum.go
    • Added CalculateMD5 function for streaming MD5 hash calculation.
    • Implemented ParseMD5Manifest to read and normalize MD5 manifest files.
    • Added VerifyFile function to compare file hashes against expected values.
  • onprem-data-verifier/validator/cmk_validator.go
    • Added ValidateCMK function to check the validity and format (Raw or Base64) of Customer Master Keys.
  • onprem-data-verifier/validator/gpg_verify.go
    • Added ParseGPGColons to extract GPG key IDs from gpg --list-keys output.
    • Implemented GetKeyIDsFromEmail to query the local GPG keyring for keys associated with a specific email.
    • Added VerifyRecipient to check if an encrypted file is encrypted for any of the valid GPG key IDs.
  • onprem-data-verifier/validator/sql_structure.go
    • Added SQLAnalysisResult struct to store results from SQL dump analysis.
    • Implemented AnalyzeSQLDump for a single-pass scan of SQL files (including gzipped) to extract Looker version, detect extended inserts, charsets, collations, and identify critical tables.
    • Added IsLookerVersionSupported helper function for version compatibility checks.
  • onprem-data-verifier/validator/validator.go
    • Added the Orchestrator struct to manage the overall validation workflow.
    • Implemented NewOrchestrator for initializing the validator with customer-specific paths and configuration.
    • Added the Run method to execute all validation steps sequentially, including workspace verification, integrity checks, GPG key validation, database analysis, CMK validation, and filesystem analysis.
    • Implemented individual validation methods (validateDatabase, validateCMK, validateFileSystem, verifyWorkspace, verifyIntegrity, verifyGPGKeys) that update the internal report and log progress.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Go application, onprem-data-verifier, to validate Looker on-premise backup artifacts. The tool is well-structured, with clear separation of concerns into validation logic, command-line interface, and logging. The inclusion of unit tests is also a great practice. My review focuses on improving correctness, code clarity, and adherence to best practices. I've provided suggestions to fix an invalid Go version, correct a broken example in the command's help text, use a standard timestamp format, and remove an unnecessary IDE file from version control.

Contributor

@drstrangelooker drstrangelooker left a comment


Please address the Gemini issues, then commit.

@firat-kaya
Contributor Author

/gemini review

@firat-kaya firat-kaya merged commit e36ffde into main Mar 11, 2026
2 checks passed

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Go application, onprem-data-verifier, designed to validate Looker on-premise backup artifacts. The tool is well-structured with a clear separation of concerns and includes a comprehensive suite of validations and tests. However, two significant vulnerabilities were identified: a weak GPG recipient verification that can be bypassed by crafted GPG packets, and a potential Denial of Service vulnerability via gzip bombs. Addressing these issues will significantly improve the robustness and security of the migration process. Additionally, opportunities for improvement were noted in logging, output formatting, and the removal of IDE-specific files and unreachable code to refine the implementation and enhance user-facing output for clarity.

Comment on lines +61 to +65
for _, id := range validKeyIDs {
if strings.Contains(output, id) {
return true, nil
}
}


Severity: high (security)

The VerifyRecipient function uses a weak check (strings.Contains(output, id)) to verify if a file is encrypted for a specific Key ID. This is insecure for several reasons:

  1. Substring Matching: It can match substrings of other Key IDs (e.g., Key ID '1234' would match '0123456789ABCDEF'), leading to false positives.
  2. Security Bypass: An attacker can bypass this check by crafting a GPG message that includes the target Key ID in a non-recipient field that gpg --list-packets displays, such as a literal data packet filename or a fake signature packet.
  3. Empty ID Match: If the Key ID list contains an empty string (e.g., due to malformed GPG output), strings.Contains will always return true, bypassing the check for any file.

This allows an attacker to provide a malicious backup that appears to be encrypted for the correct recipient, bypassing a critical security control.

	// Check if the file is encrypted for ANY of the valid IDs
	// Use a more robust check to ensure the ID is associated with a public key encryption packet
	for _, id := range validKeyIDs {
		if id == "" {
			continue
		}
		// Match the specific pattern for a recipient key ID in gpg --list-packets output
		pattern := fmt.Sprintf("keyid %s", id)
		if strings.Contains(output, pattern) {
			return true, nil
		}
	}
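To see why anchoring the match to the keyid field matters, here is a runnable sketch of the stricter check. hasRecipient is a hypothetical helper, and the --list-packets line format shown is an assumption that varies across gpg versions:

```go
package main

import (
	"fmt"
	"regexp"
)

// hasRecipient looks for a valid key ID in the "keyid" field of gpg
// --list-packets output, instead of a bare substring match. QuoteMeta
// plus a trailing word boundary prevents one ID from matching inside a
// longer one, and empty IDs are skipped entirely.
func hasRecipient(output string, validKeyIDs []string) bool {
	for _, id := range validKeyIDs {
		if id == "" {
			continue
		}
		re := regexp.MustCompile(`keyid ` + regexp.QuoteMeta(id) + `\b`)
		if re.MatchString(output) {
			return true
		}
	}
	return false
}

func main() {
	out := ":pubkey enc packet: version 3, algo 1, keyid 0123456789ABCDEF"
	fmt.Println(hasRecipient(out, []string{"1234"}))             // false
	fmt.Println(hasRecipient(out, []string{"0123456789ABCDEF"})) // true
	fmt.Println(hasRecipient(out, []string{""}))                 // false
}
```

This addresses all three weaknesses above: no substring false positives, no match outside the keyid field, and no empty-string bypass.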

Comment on lines +1 to +6
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="$PROJECT_DIR$/.." vcs="Git" />
</component>
</project>

Severity: high

IDE-specific configuration files and directories (like .idea) should not be committed to the version control system. While .gitignore has been updated to exclude .idea, this directory was likely added before the ignore rule was in place. Please remove the entire .idea directory from the repository to maintain a clean project structure.

Comment on lines +54 to +60
if strings.HasSuffix(filePath, ".gz") {
gzReader, err = gzip.NewReader(file)
if err != nil {
return nil, fmt.Errorf("failed to create gzip reader: %w", err)
}
defer gzReader.Close()
scanner = bufio.NewScanner(gzReader)

Severity: medium (security)

The AnalyzeSQLDump function uses gzip.NewReader to decompress SQL dumps without any limits on the decompression ratio or the total size of the decompressed data. An attacker can provide a specially crafted 'gzip bomb' (a small compressed file that expands to an enormous size) that can exhaust system resources (CPU and memory), leading to a Denial of Service (DoS) when the tool attempts to scan the file.

Suggested change

-	if strings.HasSuffix(filePath, ".gz") {
-		gzReader, err = gzip.NewReader(file)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create gzip reader: %w", err)
-		}
-		defer gzReader.Close()
-		scanner = bufio.NewScanner(gzReader)
+	if strings.HasSuffix(filePath, ".gz") {
+		gzReader, err = gzip.NewReader(file)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create gzip reader: %w", err)
+		}
+		defer gzReader.Close()
+		// Wrap the reader with a limit to prevent gzip bombs (e.g., a 100 GiB cap)
+		// Note: requires importing the "io" package
+		limitedReader := io.LimitReader(gzReader, 100*1024*1024*1024)
+		scanner = bufio.NewScanner(limitedReader)
+	}

--backupDir ./workspace \
--customerName lookersre-scotty-1 \
--luid "u-12345-6789" \
--output metadata.json"`,

Severity: medium

The example usage in the Long help text includes an extraneous double quote at the end of the line, which could be confusing for users. It should be removed.

Suggested change

-	--output metadata.json"`,
+	--output metadata.json`,

div *= unit
exp++
}
return fmt.Sprintf("%.2f %cB", float64(b)/float64(div), "KMGTPE"[exp])

Severity: medium

The current formatBytes function produces slightly awkward output (e.g., 1.23 K B). To improve clarity and align with IEC standards for units based on powers of 1024, it would be better to format the output as 1.23 KiB, 1.23 MiB, etc.

Suggested change

-	return fmt.Sprintf("%.2f %cB", float64(b)/float64(div), "KMGTPE"[exp])
+	return fmt.Sprintf("%.2f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
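For reference, a complete formatBytes with the suggested IEC units looks roughly like this (a sketch; the surrounding code in the logger package may differ):

```go
package main

import "fmt"

// formatBytes renders a byte count using IEC binary units (KiB, MiB, ...),
// dividing by powers of 1024 and picking the unit letter from "KMGTPE".
func formatBytes(b int64) string {
	const unit = 1024
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.2f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(formatBytes(512))   // 512 B
	fmt.Println(formatBytes(1536))  // 1.50 KiB
	fmt.Println(formatBytes(3<<20)) // 3.00 MiB
}
```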

Comment on lines +45 to +49
if finalPath != "metadata.json" && finalPath == reportPath {
logger.Info("Saving report to the user provided output path: %s", finalPath)
} else if finalPath == "metadata.json" {
logger.Info("Saving report to the current directory: %s", finalPath)
}

Severity: medium

The logging logic for the report's save path is incomplete. When a user specifies a path in a non-existent directory, the tool correctly falls back to the current directory but fails to log the actual save location, which could cause confusion. Simplifying the logging to always report the final path would be more robust.

	logger.Info("Saving report to: %s", finalPath)

Comment on lines +168 to +170
if len(stats.DetectedCharsets) > 0 {
logger.Success("Database Charset: utf8mb4")
}

Severity: medium

In validateDatabase, if no DEFAULT CHARSET definitions are found in the SQL dump, nothing is logged about the charset check. While this isn't a failure, it's an important piece of information. It would be beneficial to log a warning in this scenario to inform the user.

Suggested change

-	if len(stats.DetectedCharsets) > 0 {
-		logger.Success("Database Charset: utf8mb4")
-	}
+	if len(stats.DetectedCharsets) > 0 {
+		logger.Success("Database Charset: utf8mb4")
+	} else {
+		logger.Warn("No 'DEFAULT CHARSET' definitions found in the SQL dump.")
+	}

Comment on lines +207 to +209
// Execution will never reach here but we need to add a return to keep compiler happy
o.Report.CmkStatus = "Invalid"
return fmt.Errorf("CMK is invalid")

Severity: medium

This block of code is unreachable. The ValidateCMK helper function returns an error on failure, which is handled in the preceding if err != nil block. If no error occurs, isValid is guaranteed to be true, and the function will return from within the if isValid block. This unreachable code should be removed to improve code clarity and maintainability.

