Sanitize filenames to remove URI-illegal characters #54

Merged
Mayuresh Pawar (Mayureshpawar29) merged 5 commits into main from eml-sanitize-attachment-filenames
Apr 3, 2026

Mayuresh Pawar (Mayureshpawar29) commented Apr 1, 2026

Description

  • Introduce a centralized textutil.SlugifyFileName function that lowercases filenames, replaces URI-unsafe characters (spaces, parentheses, brackets, non-ASCII characters, etc.) with underscores, collapses consecutive
    underscores, and falls back to attachment when the name contains no alphanumeric characters
  • Replace the previous converter.SanitizeFileName with textutil.SlugifyFileName across all tasks (EML, CSV, XLSX, tar, zip, file) for consistent filename handling

Test Cases

SlugifyFileName

| Input | Output |
| --- | --- |
| `report.csv` | `report.csv` |
| `report-000000000000 (1).csv` | `report_000000000000_1.csv` |
| `My Report [Final].CSV` | `my_report_final.csv` |
| `résumé données.pdf` | `r_sum_donn_es.pdf` |
| `...` | `attachment` |
| `file with spaces.txt` | `file_with_spaces.txt` |
| `(attachment).csv` | `attachment.csv` |

Slugify

| Input | Output |
| --- | --- |
| `Report (1)` | `report_1` |
| `My Report [Final]` | `my_report_final` |
| `hello world` | `hello_world` |
| `résumé données` | `r_sum_donn_es` |
| `___already_trimmed___` | `already_trimmed` |
| `UPPER CASE` | `upper_case` |
| `...` | (empty string) |
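
The test cases above constrain the behaviour fairly tightly, so a minimal Go sketch that reproduces every listed row may help reviewers reason about edge cases. This is not the code merged in this PR: the regular expression, the `fallbackName` constant, and the decision to slugify the extension separately from the base name are all assumptions inferred from the tables.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// nonAlnum matches runs of anything that is not an ASCII lowercase letter or
// digit. (Assumption: the real implementation may use a different pattern.)
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// fallbackName is a hypothetical constant for names with no usable characters.
const fallbackName = "attachment"

// Slugify lowercases s, replaces each run of other characters with a single
// underscore, and trims leading and trailing underscores.
func Slugify(s string) string {
	s = nonAlnum.ReplaceAllString(strings.ToLower(s), "_")
	return strings.Trim(s, "_")
}

// SlugifyFileName slugifies the base name and the extension separately so the
// dot between them survives, and falls back to fallbackName for an empty base.
func SlugifyFileName(name string) string {
	ext := filepath.Ext(name)
	base := Slugify(strings.TrimSuffix(name, ext))
	if base == "" {
		base = fallbackName
	}
	if e := Slugify(strings.TrimPrefix(ext, ".")); e != "" {
		return base + "." + e
	}
	return base
}

func main() {
	for _, in := range []string{
		"report-000000000000 (1).csv",
		"My Report [Final].CSV",
		"résumé données.pdf",
		"...",
	} {
		fmt.Printf("%q -> %q\n", in, SlugifyFileName(in))
	}
}
```

In this sketch, slugifying the extension on its own is what lets `...` collapse to `attachment` with no trailing dot. Whether the merged implementation takes the same route is not visible from the PR description.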

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

Add sanitizeFilename() to the EML converter that replaces characters not
  in [a-zA-Z0-9_-.] with hyphens, collapses consecutive hyphens, and trims
  leading/trailing hyphens. Applied to all converter outputs (attachments,
  inlines, body parts, headers).
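
For comparison with the later slugify approach, the hyphen-based rule described in this commit message could be sketched roughly as follows. Only the function name `sanitizeFilename` comes from the commit message; the body, the variable names, and the two-pass regular-expression strategy are assumptions. The hyphen is placed last in the character class so that it is read literally rather than as a range.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// unsafeChars matches any single character outside letters, digits,
	// underscore, dot, and hyphen.
	unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9_.-]`)
	// hyphenRuns matches two or more consecutive hyphens.
	hyphenRuns = regexp.MustCompile(`-{2,}`)
)

// sanitizeFilename replaces disallowed characters with hyphens, collapses
// consecutive hyphens, and trims hyphens from both ends. (Illustrative only.)
func sanitizeFilename(name string) string {
	s := unsafeChars.ReplaceAllString(name, "-")
	s = hyphenRuns.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

func main() {
	for _, in := range []string{"report-000000000000 (1).csv", "(draft)", "a   b"} {
		fmt.Printf("%q -> %q\n", in, sanitizeFilename(in))
	}
}
```

Under this sketch, `report-000000000000 (1).csv` becomes `report-000000000000-1-.csv`: case is preserved and a stray hyphen is left before the extension, because the name is processed as a single string rather than as base plus extension.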
Replaced the existing converter.SanitizeFileName function with textutil.SlugifyFileName across multiple files, including tar, zip, file, and various converter implementations. This change centralizes filename sanitization logic and improves consistency in handling filenames. Additionally, a new textutil package was introduced to encapsulate the slugification logic.
Mayuresh Pawar (Mayureshpawar29) changed the title from "fix: sanitize EML attachment filenames to remove URI-illegal characters" to "Sanitize filenames to remove URI-illegal characters" on Apr 3, 2026
Mayuresh Pawar (Mayureshpawar29) merged commit bf6eb72 into main on Apr 3, 2026
7 checks passed
Mayuresh Pawar (Mayureshpawar29) deleted the eml-sanitize-attachment-filenames branch on April 3, 2026 at 11:53