Filename Sanitization
sanitize_filename() converts arbitrary Unicode strings into safe filenames that work across operating systems. It handles transliteration, illegal character removal, reserved name detection, and length truncation.
These examples are executed in CI
Every Python block on this page runs against the shipped wheel, and its
asserted outputs are checked, so the results below cannot silently rot
(see #154). Each assert is the guaranteed return value.
Basic usage
from disarm import sanitize_filename
assert sanitize_filename("my<file>:v2.txt") == "my_file_v2.txt"
assert sanitize_filename("café résumé.pdf") == "cafe_resume.pdf"
assert sanitize_filename("../../../etc/passwd") == "_.etcpasswd"
assert sanitize_filename("CON.txt") == "_CON.txt" # Windows reserved name
use disarm::api::{self, Platform};
// sanitize_filename(text, separator, max_length, platform, lang, preserve_extension)
assert_eq!(api::sanitize_filename("my<file>:v2.txt", "_", 255, Platform::Universal, None, true).unwrap(), "my_file_v2.txt");
assert_eq!(api::sanitize_filename("café résumé.pdf", "_", 255, Platform::Universal, None, true).unwrap(), "cafe_resume.pdf");
assert_eq!(api::sanitize_filename("../../../etc/passwd", "_", 255, Platform::Universal, None, true).unwrap(), "_.etcpasswd");
// CON.txt is a Windows reserved name
assert_eq!(api::sanitize_filename("CON.txt", "_", 255, Platform::Universal, None, true).unwrap(), "_CON.txt");
require "disarm"
Disarm.sanitize_filename("my<file>:v2.txt") # => "my_file_v2.txt"
Disarm.sanitize_filename("café résumé.pdf") # => "cafe_resume.pdf"
Disarm.sanitize_filename("CON.txt") # => "_CON.txt"
import { sanitizeFilename } from 'disarm'
sanitizeFilename('my<file>:v2.txt') // => 'my_file_v2.txt'
sanitizeFilename('café résumé.pdf') // => 'cafe_resume.pdf'
sanitizeFilename('CON.txt') // => '_CON.txt'
Parameters
separator
Character used to replace illegal characters (default: "_"):
assert sanitize_filename("hello:world", separator="-") == "hello-world"
use disarm::api::{self, Platform};
assert_eq!(api::sanitize_filename("hello:world", "-", 255, Platform::Universal, None, true).unwrap(), "hello-world");
Disarm.sanitize_filename("hello:world", separator: "-") # => "hello-world"
import { sanitizeFilename } from 'disarm'
sanitizeFilename('hello:world', { separator: '-' }) // => 'hello-world'
max_length
Maximum filename length in bytes (default: 255):
assert len(sanitize_filename("a" * 300)) == 255
When preserve_extension=True, the extension counts toward the limit and is kept intact, so only the stem is shortened:
assert sanitize_filename("a" * 300 + ".pdf", max_length=20) == "aaaaaaaaaaaaaaaa.pdf"
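Because the limit is measured in bytes rather than characters, a name that still contains non-ASCII text can exceed the limit even when its character count looks fine. The following is a minimal sketch, not disarm's implementation, of cutting a string to a UTF-8 byte budget without splitting a multi-byte character. `truncate_utf8` is a hypothetical helper name used only for illustration.

```python
def truncate_utf8(name: str, max_bytes: int) -> str:
    """Cut `name` so its UTF-8 encoding fits in `max_bytes` (hypothetical helper)."""
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    # A cut may land inside a multi-byte sequence. Decoding with
    # errors="ignore" drops that trailing partial character instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# "é" takes two bytes in UTF-8, so a 5-byte budget keeps only two of them.
assert truncate_utf8("ééé", 5) == "éé"
assert len(truncate_utf8("a" * 300, 255).encode("utf-8")) == 255
```

Counting bytes matters because common filesystems cap a name component at 255 bytes, not 255 characters.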
platform
Target platform for sanitization rules:
# Universal (default) — safe on all platforms
assert sanitize_filename("my:file?.txt", platform="universal") == "my_file.txt"
# POSIX — only / and NUL are illegal
assert sanitize_filename("my:file?.txt", platform="posix") == "my:file?.txt"
# Windows — additionally forbids < > : " | ? * and reserved names
assert sanitize_filename("CON.txt", platform="windows") == "_CON.txt"
use disarm::api::{self, Platform};
// Universal (default) — safe on all platforms
assert_eq!(api::sanitize_filename("my:file?.txt", "_", 255, Platform::Universal, None, true).unwrap(), "my_file.txt");
// POSIX — only / and NUL are illegal
assert_eq!(api::sanitize_filename("my:file?.txt", "_", 255, Platform::Posix, None, true).unwrap(), "my:file?.txt");
// Windows — additionally forbids < > : " | ? * and reserved names
assert_eq!(api::sanitize_filename("CON.txt", "_", 255, Platform::Windows, None, true).unwrap(), "_CON.txt");
# Universal (default) — safe on all platforms
Disarm.sanitize_filename("my:file?.txt", platform: :universal) # => "my_file.txt"
# POSIX — only / and NUL are illegal
Disarm.sanitize_filename("my:file?.txt", platform: :posix) # => "my:file?.txt"
# Windows — additionally forbids < > : " | ? * and reserved names
Disarm.sanitize_filename("CON.txt", platform: :windows) # => "_CON.txt"
import { sanitizeFilename } from 'disarm'
sanitizeFilename('my:file?.txt', { platform: 'universal' }) // => 'my_file.txt'
sanitizeFilename('my:file?.txt', { platform: 'posix' }) // => 'my:file?.txt'
sanitizeFilename('CON.txt', { platform: 'windows' }) // => '_CON.txt'
| Platform | Illegal characters | Reserved names |
|---|---|---|
| "universal" | Union of POSIX + Windows rules | CON, PRN, AUX, NUL, COM1–9, LPT1–9 |
| "posix" | /, NUL | None |
| "windows" | < > : " / \\ \| ? *, control chars | CON, PRN, AUX, NUL, COM1–9, LPT1–9 |
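The reserved-name column can be pictured as a set lookup on the part of the name before the first dot. The sketch below is an illustration under stated assumptions, not disarm's code: `WINDOWS_RESERVED` and `escape_reserved` are hypothetical names, and the case-insensitive comparison is an assumption based on how Windows itself treats device names.

```python
# Reserved device names from the table above: CON, PRN, AUX, NUL, COM1-9, LPT1-9.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def escape_reserved(name: str, platform: str = "universal") -> str:
    """Prefix reserved names with "_" (hypothetical helper)."""
    if platform == "posix":
        return name  # POSIX has no reserved names
    stem = name.split(".", 1)[0]
    # Assumption: compare case-insensitively, as Windows does for device names.
    if stem.upper() in WINDOWS_RESERVED:
        return "_" + name
    return name


assert escape_reserved("CON.txt") == "_CON.txt"
assert escape_reserved("CONFIG.txt") == "CONFIG.txt"
```

Checking the stem rather than the whole name matters because Windows treats CON.txt as the CON device regardless of the extension.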
lang
Language profile for transliteration of non-ASCII characters:
# German profile expands umlauts (ä → ae)
assert sanitize_filename("Ärger.txt", lang="de") == "Aerger.txt"
# Default profile strips the diaeresis (ä → a)
assert sanitize_filename("Ärger.txt") == "Arger.txt"
use disarm::api::{self, Platform};
// German profile expands umlauts (ä → ae)
assert_eq!(api::sanitize_filename("Ärger.txt", "_", 255, Platform::Universal, Some("de"), true).unwrap(), "Aerger.txt");
// Default profile strips the diaeresis (ä → a)
assert_eq!(api::sanitize_filename("Ärger.txt", "_", 255, Platform::Universal, None, true).unwrap(), "Arger.txt");
# German profile expands umlauts (ä → ae)
Disarm.sanitize_filename("Ärger.txt", lang: "de") # => "Aerger.txt"
# Default profile strips the diaeresis (ä → a)
Disarm.sanitize_filename("Ärger.txt") # => "Arger.txt"
import { sanitizeFilename } from 'disarm'
sanitizeFilename('Ärger.txt', { lang: 'de' }) // => 'Aerger.txt'
sanitizeFilename('Ärger.txt') // => 'Arger.txt'
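The difference between the two profiles can be pictured as a language-specific mapping applied before generic accent stripping. This is a minimal sketch using only the standard library. `GERMAN_MAP` and `transliterate` are hypothetical names, and disarm's real transliteration tables are presumably far more complete.

```python
import unicodedata

# Hypothetical German profile: umlauts expand to two letters.
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate(text: str, lang=None) -> str:
    """Apply a language mapping, then strip combining marks (illustrative only)."""
    # Compose first so "A" + combining diaeresis matches the "Ä" key.
    text = unicodedata.normalize("NFC", text)
    if lang == "de":
        text = "".join(GERMAN_MAP.get(ch, ch) for ch in text)
    # Decompose and drop combining marks: "é" becomes "e", "Ä" becomes "A".
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


assert transliterate("Ärger.txt", lang="de") == "Aerger.txt"
assert transliterate("Ärger.txt") == "Arger.txt"
```

Running the language mapping first is the key design choice here: once the diaeresis has been stripped, "Ä" and "A" are indistinguishable and the expansion can no longer be applied.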
preserve_extension
Whether to preserve the file extension during truncation (default: True):
assert sanitize_filename("long_name.pdf", max_length=12, preserve_extension=True) == "long_nam.pdf"
assert sanitize_filename("long_name.pdf", max_length=12, preserve_extension=False) == "long_name.pd"
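The two results above differ only in where the cut is made. As a rough sketch of that behaviour (not the library's implementation), assuming the name is already ASCII so characters and bytes coincide, and using a hypothetical helper `truncate_name`:

```python
import os


def truncate_name(name: str, max_length: int, preserve_extension: bool = True) -> str:
    """Shorten `name` to `max_length`, optionally keeping the extension (illustrative)."""
    if len(name) <= max_length:
        return name
    if not preserve_extension:
        return name[:max_length]  # plain cut, may slice into the extension
    stem, ext = os.path.splitext(name)
    if len(ext) >= max_length:
        # Assumption: if the extension alone does not fit, fall back to a plain cut.
        return name[:max_length]
    # Shorten only the stem so that stem + extension fits the budget exactly.
    return stem[: max_length - len(ext)] + ext


assert truncate_name("long_name.pdf", 12, preserve_extension=True) == "long_nam.pdf"
assert truncate_name("long_name.pdf", 12, preserve_extension=False) == "long_name.pd"
```

The fallback for an oversized extension is a guess; the source does not say how disarm handles that case.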
Pipeline
The sanitization pipeline executes in this order:
- Transliterate non-ASCII characters (using lang if set)
- Strip OS-illegal characters (per platform)
- Replace stripped characters with separator
- Collapse consecutive separators
- Handle reserved names (prefix with _)
- Truncate to max_length (respecting preserve_extension)
- Strip leading/trailing separators and dots
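To make the ordering concrete, steps 2 to 4 of the list above can be sketched as a single pass followed by a collapse. This is an illustration only: `ILLEGAL_CHARS` and `replace_illegal` are hypothetical names, the character sets are taken from the platform table on this page, and the sketch deliberately leaves out the other steps, so it will not reproduce every output asserted elsewhere in this document.

```python
import re

# Character sets taken from the platform table (illustrative, not disarm's tables).
ILLEGAL_CHARS = {
    "posix": "/\x00",
    "windows": '<>:"/\\|?*',
}
ILLEGAL_CHARS["universal"] = ILLEGAL_CHARS["posix"] + ILLEGAL_CHARS["windows"]


def replace_illegal(text: str, separator: str = "_", platform: str = "universal") -> str:
    """Steps 2-4: swap illegal characters for the separator, then collapse runs."""
    illegal = ILLEGAL_CHARS[platform]
    # Steps 2-3: the Windows and universal rule sets also reject ASCII control
    # characters (code points below 0x20).
    replaced = "".join(
        separator if ch in illegal or (platform != "posix" and ord(ch) < 0x20) else ch
        for ch in text
    )
    # Step 4: a run of two or more separators becomes a single separator.
    return re.sub(f"(?:{re.escape(separator)}){{2,}}", separator, replaced)


assert replace_illegal("my<file>:v2.txt") == "my_file_v2.txt"
assert replace_illegal("hello:world", separator="-") == "hello-world"
```

Collapsing after replacement is what keeps `>:` in the first example from turning into a double underscore.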