santize sheet name and file name #55

Merged
Divyanshu Tiwari (divyanshu-tiwari) merged 5 commits into main from handle-space-in-file on Apr 2, 2026

Divyanshu Tiwari (divyanshu-tiwari) commented Apr 1, 2026

Description

This pull request standardizes how file and sheet names are sanitized throughout the pipeline. It replaces context-specific sanitization with a single Sanitize function and adds optional sheet name sanitization to XLSX conversion.

Sanitization and naming improvements:

  • Replaced all usages of custom or local file/sheet name sanitization with the shared Sanitize function from converter, ensuring consistent normalization of names across file, archive, and converter tasks.
  • Renamed sanitizeColumnName to Sanitize and exported it from the converter package for use in other modules.
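
To illustrate the idea of one shared sanitizer, here is a minimal sketch. The normalization rules of the real converter.Sanitize are not shown in this PR, so the rule used here (lowercase, trim, and join whitespace-separated parts with underscores) is an assumption chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Sanitize is a hypothetical stand-in for converter.Sanitize.
// ASSUMPTION: the real rules are not visible in this PR. This sketch
// lowercases the input and joins whitespace-separated parts with "_".
func Sanitize(name string) string {
	return strings.Join(strings.Fields(strings.ToLower(name)), "_")
}

func main() {
	// The same function serves every context that needs a normalized name.
	fmt.Println(Sanitize("Order ID"))           // column header: order_id
	fmt.Println(Sanitize("  Q1  Sales "))       // sheet name:    q1_sales
	fmt.Println(Sanitize("Monthly Report.csv")) // file name:     monthly_report.csv
}
```

Routing every context through one function means a change to the rules only has to be made, and tested, in one place.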

XLSX converter enhancements:

  • Added a new SanitizeSheetNames option to the XLSX converter, allowing users to optionally sanitize sheet names in addition to headers.
  • Refactored the XLSX sheet reading logic to use a method receiver, simplifying access to converter options and improving code clarity.
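
The two XLSX changes fit together: once sheet handling is a method on the converter, it can read the option directly instead of receiving it as a parameter. The sketch below shows that shape only. The xlsx type, the sheetName method, and the sanitize helper are hypothetical; only the SanitizeSheetNames option name comes from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize stands in for converter.Sanitize.
// ASSUMPTION: lowercase and join whitespace-separated parts with "_".
func sanitize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), "_")
}

// xlsx is a hypothetical converter type holding user options.
type xlsx struct {
	SanitizeSheetNames bool // opt-in, so the default keeps names unchanged
}

// sheetName is a method, so it can consult the converter's options
// without the caller passing them along.
func (x *xlsx) sheetName(raw string) string {
	if x.SanitizeSheetNames {
		return sanitize(raw)
	}
	return raw
}

func main() {
	off := &xlsx{}
	on := &xlsx{SanitizeSheetNames: true}
	fmt.Println(off.sheetName("Q1 Sales")) // Q1 Sales
	fmt.Println(on.sheetName("Q1 Sales"))  // q1_sales
}
```

Keeping the option off by default preserves existing output names for pipelines that do not set it.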

Type of change

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

Copilot AI left a comment

Pull request overview

This PR aims to standardize name sanitization across the pipeline by promoting the existing column/header sanitizer to a public converter.Sanitize function, applying it to file/archive context filenames, and adding an opt-in XLSX sheet-name sanitization option.

Changes:

  • Promotes sanitizeColumnName to converter.Sanitize and updates CSV/XLSX header sanitization call sites.
  • Applies converter.Sanitize to file and archive task context values used for output filenames.
  • Adds sanitize_sheet_names support in the XLSX converter and refactors sheet reading into an xlsx method.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| internal/pkg/pipeline/task/file/file.go | Sanitizes CtxKeyFileNameWrite context value using converter.Sanitize. |
| internal/pkg/pipeline/task/archive/tar.go | Sanitizes TAR entry base names when setting CtxKeyArchiveFileNameWrite. |
| internal/pkg/pipeline/task/archive/zip.go | Sanitizes ZIP entry base names when setting CtxKeyArchiveFileNameWrite. |
| internal/pkg/pipeline/task/converter/converter.go | Exports sanitizer as Sanitize (previously sanitizeColumnName). |
| internal/pkg/pipeline/task/converter/csv.go | Switches CSV header init to use Sanitize. |
| internal/pkg/pipeline/task/converter/xlsx.go | Adds sanitize_sheet_names option and uses Sanitize for headers (and optionally sheet names). |
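
The archive rows describe sanitizing the base name of each TAR or ZIP entry before it is stored in the task context. A rough sketch of that step follows. The entryFileName helper and the sanitize rules are assumptions for illustration; the real code in tar.go and zip.go is not shown here, and whether the real sanitizer leaves the extension untouched is not confirmed by this PR.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// sanitize stands in for converter.Sanitize.
// ASSUMPTION: lowercase and join whitespace-separated parts with "_".
func sanitize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), "_")
}

// entryFileName is a hypothetical helper: it drops the directory part of
// an archive entry and sanitizes what remains. Archive entry names use
// forward slashes, so package path is used rather than path/filepath.
func entryFileName(entry string) string {
	return sanitize(path.Base(entry))
}

func main() {
	fmt.Println(entryFileName("reports/2026/Q1 Sales.xlsx")) // q1_sales.xlsx
}
```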

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Divyanshu Tiwari (divyanshu-tiwari) changed the title from "Handle space in file" to "sanitize sheet name and file name" on Apr 1, 2026
Divyanshu Tiwari (divyanshu-tiwari) changed the title from "sanitize sheet name and file name" to "santize sheet name and file name" on Apr 1, 2026
Divyanshu Tiwari (divyanshu-tiwari) merged commit e210f2c into main on Apr 2, 2026
7 checks passed
Divyanshu Tiwari (divyanshu-tiwari) deleted the handle-space-in-file branch on April 2, 2026, 06:12