SDB:Converting Files or File Names to UTF-8 Encoding
Version: 9.1
Symptom
Special characters in files or file names are not properly displayed.
Cause
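As of SUSE LINUX version 9.1, the default system locale is "lang_LANG.UTF-8", so the system expects files and file names to be encoded in UTF-8. Files and file names created with an older encoding, such as Latin-1, are therefore no longer displayed correctly. In addition, file names from Windows file systems might use yet another encoding.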
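To illustrate the cause, the following sketch prints the bytes of the word "café" in Latin-1 and after conversion to UTF-8. The helper name hex_bytes is hypothetical and only used for this illustration; the sketch assumes that iconv, od and tr are installed.

```shell
# hex_bytes: hypothetical helper that prints standard input as a plain hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# In Latin-1, "é" is the single byte 0xE9 (octal 351).
printf 'caf\351' | hex_bytes; echo
# prints 636166e9

# In UTF-8, the same character is the two-byte sequence 0xC3 0xA9.
printf 'caf\351' | iconv -f latin1 -t utf-8 | hex_bytes; echo
# prints 636166c3a9
```

A program that expects UTF-8 cannot interpret the lone byte 0xE9, which is why such names and contents show up as question marks or other placeholder characters.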
Solution
There are several approaches for the conversion to UTF-8 encoding:
If file names are displayed incorrectly, use the script "convmv" to convert them to UTF-8. To install the script, run:
sudo zypper in convmv
Then, for example, to convert all file names in and below the current directory from Latin-1 to UTF-8:
convmv --notest -r -f latin1 -t utf-8 .
If the files are from a Windows file system, you can change the encoding of their names with:
convmv -r -f cp1252 -t utf-8 .
Add --notest to the line above to actually rename the files. Without it, convmv only shows what it would do, so you can check that you have chosen the correct input encoding before you convert. Another encoding that might apply to English-language systems is cp437.
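Before running convmv, it can help to see which names actually need conversion. The following is a minimal sketch, not part of convmv: the function names is_utf8 and list_non_utf8_names are hypothetical, and the check relies on the assumption that iconv exits with an error when its input is not valid UTF-8. It only looks at entries directly inside the given directory and skips hidden files.

```shell
# is_utf8 STRING: succeeds if STRING is a valid UTF-8 byte sequence.
# Pure ASCII strings also count as valid UTF-8.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# list_non_utf8_names DIR: print the entries directly inside DIR
# whose names are not valid UTF-8 (non-recursive, no hidden files).
list_non_utf8_names() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue      # skip the unexpanded pattern in an empty directory
        name=${path##*/}                # strip the directory part
        is_utf8 "$name" || printf '%s\n' "$path"
    done
}
```

For example, `list_non_utf8_names .` lists the candidates in the current directory. Names that are already valid UTF-8 should not be converted a second time.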
If file contents are displayed incorrectly, use the command "iconv" to convert them to UTF-8. For example:
iconv -f latin1 -t utf-8 document.txt > document_new.txt
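Converting a file that is already UTF-8 a second time garbles its special characters. The following sketch guards against that by checking the input first. The function name convert_to_utf8 is hypothetical, and the validity check is a heuristic: a Latin-1 file can, in rare cases, also happen to be valid UTF-8.

```shell
# convert_to_utf8 SRC DST [FROM]: write a UTF-8 copy of SRC to DST.
# FROM is the assumed source encoding (default: latin1).
# DST must be a different file than SRC.
convert_to_utf8() {
    src=$1
    dst=$2
    from=${3:-latin1}
    if iconv -f UTF-8 -t UTF-8 "$src" >/dev/null 2>&1; then
        # Already valid UTF-8 (or plain ASCII): copy unchanged.
        cp "$src" "$dst"
    else
        iconv -f "$from" -t UTF-8 "$src" > "$dst"
    fi
}
```

Usage, with the file names from the example above: `convert_to_utf8 document.txt document_new.txt latin1`.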
To switch back to the ISO encoding, open the "Language selection" module in the "System" section of the "YaST Control Center". The language currently in use is preselected when the module starts. Click "Details" and disable the use of UTF-8 encoding in the dialog that appears. Accept the modified settings and close YaST.
The release notes for SUSE LINUX version 9.1 include a section on this subject:
UTF-8 Encoding Is Default
See http://www.suse.de/~mfabian/suse-cjk/locales.html
Non-UTF-8 File Names
Files from file systems created with SUSE LINUX versions up to 9.0 do not use UTF-8 encoding for the file names (unless otherwise specified). If these files include non-ASCII characters, they are not properly displayed with SUSE LINUX 9.1 or newer versions. To avoid this, use the script convmv to convert the files to UTF-8.
Keywords: utf8 | utf-8 | charset | special | character | specialcharacter | update

