Conversation
This commit introduces a new Go application, onprem-data-verifier, designed to validate Looker on-premise backup artifacts before migration to Looker Cloud. The tool performs several checks, including:

- Workspace structure verification
- MD5 checksum integrity checks
- GPG key validation against the customer's LUID
- SQL dump analysis (version, charset, extended inserts, critical tables)
- CMK validation (raw and base64)

Includes unit tests for various components. A .gitignore file is also added to exclude IDE files.
Summary of Changes

Hello @firat-kaya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a critical new utility designed to enhance the reliability and security of Looker on-premise to cloud migrations. By systematically verifying backup artifacts, the tool aims to prevent common data integrity and security issues, ensuring a smoother transition process. It provides a structured, step-by-step validation workflow, outputting clear results and a comprehensive metadata report.
Code Review
This pull request introduces a new Go application, onprem-data-verifier, to validate Looker on-premise backup artifacts. The tool is well-structured, with clear separation of concerns into validation logic, command-line interface, and logging. The inclusion of unit tests is also a great practice. My review focuses on improving correctness, code clarity, and adherence to best practices. I've provided suggestions to fix an invalid Go version, correct a broken example in the command's help text, use a standard timestamp format, and remove an unnecessary IDE file from version control.
drstrangelooker left a comment
please address the Gemini issues then commit.
/gemini review
Code Review
This pull request introduces a new Go application, onprem-data-verifier, designed to validate Looker on-premise backup artifacts. The tool is well-structured with a clear separation of concerns and includes a comprehensive suite of validations and tests. However, two significant vulnerabilities were identified: a weak GPG recipient verification that can be bypassed by crafted GPG packets, and a potential Denial of Service vulnerability via gzip bombs. Addressing these issues will significantly improve the robustness and security of the migration process. Additionally, opportunities for improvement were noted in logging, output formatting, and the removal of IDE-specific files and unreachable code to refine the implementation and enhance user-facing output for clarity.
for _, id := range validKeyIDs {
	if strings.Contains(output, id) {
		return true, nil
	}
}
The VerifyRecipient function uses a weak check (strings.Contains(output, id)) to verify if a file is encrypted for a specific Key ID. This is insecure for several reasons:
- Substring Matching: It can match substrings of other Key IDs (e.g., Key ID '1234' would match '0123456789ABCDEF'), leading to false positives.
- Security Bypass: An attacker can bypass this check by crafting a GPG message that includes the target Key ID in a non-recipient field that gpg --list-packets displays, such as a literal data packet filename or a fake signature packet.
- Empty ID Match: If the Key ID list contains an empty string (e.g., due to malformed GPG output), strings.Contains will always return true, bypassing the check for any file.
This allows an attacker to provide a malicious backup that appears to be encrypted for the correct recipient, bypassing a critical security control.
// Check if the file is encrypted for ANY of the valid IDs
// Use a more robust check to ensure the ID is associated with a public key encryption packet
for _, id := range validKeyIDs {
	if id == "" {
		continue
	}
	// Match the specific pattern for a recipient key ID in gpg --list-packets output
	pattern := fmt.Sprintf("keyid %s", id)
	if strings.Contains(output, pattern) {
		return true, nil
	}
}

<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="VcsDirectoryMappings">
    <mapping directory="$PROJECT_DIR$/.." vcs="Git" />
  </component>
</project>
IDE-specific configuration files and directories (like .idea) should not be committed to the version control system. While .gitignore has been updated to exclude .idea, this directory was likely added before the ignore rule was in place. Please remove the entire .idea directory from the repository to maintain a clean project structure.
if strings.HasSuffix(filePath, ".gz") {
	gzReader, err = gzip.NewReader(file)
	if err != nil {
		return nil, fmt.Errorf("failed to create gzip reader: %w", err)
	}
	defer gzReader.Close()
	scanner = bufio.NewScanner(gzReader)
The AnalyzeSQLDump function uses gzip.NewReader to decompress SQL dumps without any limits on the decompression ratio or the total size of the decompressed data. An attacker can provide a specially crafted 'gzip bomb' (a small compressed file that expands to an enormous size) that can exhaust system resources (CPU and memory), leading to a Denial of Service (DoS) when the tool attempts to scan the file.
if strings.HasSuffix(filePath, ".gz") {
	gzReader, err = gzip.NewReader(file)
	if err != nil {
		return nil, fmt.Errorf("failed to create gzip reader: %w", err)
	}
	defer gzReader.Close()
	// Wrap the reader with a limit to prevent gzip bombs (e.g., 100GB limit)
	// Note: Requires importing the "io" package
	limitedReader := io.LimitReader(gzReader, 100*1024*1024*1024)
	scanner = bufio.NewScanner(limitedReader)
}
	--backupDir ./workspace \
	--customerName lookersre-scotty-1 \
	--luid "u-12345-6789" \
	--output metadata.json"`,
	div *= unit
	exp++
}
return fmt.Sprintf("%.2f %cB", float64(b)/float64(div), "KMGTPE"[exp])
The current formatBytes function labels 1024-based values with SI-style units (e.g., 1.23 KB). To improve clarity and align with IEC standards for units based on powers of 1024, it would be better to format the output as 1.23 KiB, 1.23 MiB, etc.
- return fmt.Sprintf("%.2f %cB", float64(b)/float64(div), "KMGTPE"[exp])
+ return fmt.Sprintf("%.2f %ciB", float64(b)/float64(div), "KMGTPE"[exp])
if finalPath != "metadata.json" && finalPath == reportPath {
	logger.Info("Saving report to the user provided output path: %s", finalPath)
} else if finalPath == "metadata.json" {
	logger.Info("Saving report to the current directory: %s", finalPath)
}
The logging logic for the report's save path is incomplete. When a user specifies a path in a non-existent directory, the tool correctly falls back to the current directory but fails to log the actual save location, which could cause confusion. Simplifying the logging to always report the final path would be more robust.
logger.Info("Saving report to: %s", finalPath)

if len(stats.DetectedCharsets) > 0 {
	logger.Success("Database Charset: utf8mb4")
}
In validateDatabase, if no DEFAULT CHARSET definitions are found in the SQL dump, nothing is logged about the charset check. While this isn't a failure, it's an important piece of information. It would be beneficial to log a warning in this scenario to inform the user.
if len(stats.DetectedCharsets) > 0 {
	logger.Success("Database Charset: utf8mb4")
} else {
	logger.Warn("No 'DEFAULT CHARSET' definitions found in the SQL dump.")
}
// Execution will never reach here but we need to add a return to keep compiler happy
o.Report.CmkStatus = "Invalid"
return fmt.Errorf("CMK is invalid")
This block of code is unreachable. The ValidateCMK helper function returns an error on failure, which is handled in the preceding if err != nil block. If no error occurs, isValid is guaranteed to be true, and the function will return from within the if isValid block. This unreachable code should be removed to improve code clarity and maintainability.