About Unicode Filenames in Different Operating Systems

Operating systems differ in how they store characters with diacritical marks in file names.

  • Windows and Linux typically use Unicode Normalization Form C (NFC, composed): for example, "ü" is a single Unicode code point (U+00FC).
  • macOS traditionally uses Unicode Normalization Form D (NFD, decomposed): for example, "ü" is two code points, "u" (U+0075) followed by the combining diaeresis (U+0308).
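
The difference between the two forms can be seen directly in Python. This is a minimal sketch using the standard unicodedata module; the variable names are illustrative only.

```python
import unicodedata

# The same visible character "ü" in both normalization forms.
composed = "\u00fc"      # NFC: one code point, LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # NFD: "u" followed by COMBINING DIAERESIS

# Converting between the forms with the standard library.
to_nfd = unicodedata.normalize("NFD", composed)
to_nfc = unicodedata.normalize("NFC", decomposed)

print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False: the strings differ
print(to_nfd == decomposed)             # True
print(to_nfc == composed)               # True

# The UTF-8 bytes written to disk differ as well.
print(composed.encode("utf-8"))         # b'\xc3\xbc'
print(decomposed.encode("utf-8"))       # b'u\xcc\x88'
```

Both strings render the same on screen, yet they do not compare equal and encode to different bytes.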

When you copy files from macOS to a Linux server, be aware that the file name may end up on the Linux file system in a different normalization form than the one your code uses. Most Linux file systems compare names byte by byte, so if a Python script tries to read a known file from the disk by name, it might not find it, even though the name looks identical on screen.
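
If the file names cannot be changed, one possible workaround is to normalize both names before comparing them. The sketch below assumes a byte-exact file system such as those commonly used on Linux; find_normalized is a hypothetical helper written for this example, not part of any library, and the demo file name is made up.

```python
import os
import tempfile
import unicodedata
from typing import Optional


def find_normalized(directory: str, wanted: str) -> Optional[str]:
    """Return the path of the entry in `directory` whose name matches
    `wanted` after both are normalized to NFC, or None if none matches."""
    target = unicodedata.normalize("NFC", wanted)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return os.path.join(directory, entry)
    return None


# Demo: create a file with a decomposed (NFD) name, as it might arrive
# from macOS, then look it up using the composed (NFC) spelling.
with tempfile.TemporaryDirectory() as tmp:
    nfd_name = unicodedata.normalize("NFD", "\u00fcber.txt")
    open(os.path.join(tmp, nfd_name), "w").close()
    found = find_normalized(tmp, "\u00fcber.txt")

print(found is not None)
```

The helper lists the directory on every call, which is fine for a handful of files. For large directories, it may be better to build a dictionary of normalized names once and reuse it.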

My recommended practice is to always use only ASCII characters in file names.
