Properly viewing UTF-8

Do not let poorly written software such as Notepad or WordPad touch any files from your hosting environment.

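Notepad and other poorly written software does not handle the BOM (Byte Order Mark) the way *nix software expects. When Notepad saves a file as UTF-8, it silently prepends a BOM (the bytes EF BB BF) to the file, and most *nix software does not expect those bytes. Some software (such as MediaWiki and others) will stop working completely after one of its files is edited with Notepad, while other *nix software will generate errors.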
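If you suspect a file has already been damaged this way, you can check for the BOM yourself. The following is only a minimal sketch in Python, assuming the file is small enough to read into memory; the function names are my own and not part of any tool mentioned here.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM; other input is unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at `path` without its BOM.

    Returns True if a BOM was found and removed, False if the file
    was left untouched. The whole file is read into memory.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

Make a backup copy before running strip_bom_in_file on anything in your hosting environment, since it overwrites the file in place.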

Notepad++ is much better than Notepad and is safe to use with BOM-sensitive files, but it should still only be used by a knowledgeable person who has read its documentation.

Also, when you open a file, check which encoding Notepad++ auto-detected under the Encoding menu. The most universal option for Unix, Linux, Apache, and CMSes is "Encode in UTF-8 without BOM".

This is not the default setting. By default Notepad++ behaves as a plain 8-bit (ANSI) viewer, and you must keep this in mind. Change this option whenever you are working with non-ASCII files.


However, even with all of this done, and with the options in the Encoding menu set properly, you may still see strange characters when viewing a UTF-8 encoded file. I have found that a viewer which prompts you for the encoding, such as Microsoft Office or Open Office (but not LibreOffice), helps here: explicitly set Unicode on that prompt screen, and you can then check whether your non-ASCII or non-Latin characters are properly stored in Unicode.
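The same check can be done without a viewer. The sketch below (Python, with helper names I made up for illustration) tests whether the bytes decode strictly as UTF-8 and lists the non-ASCII characters found. It also shows where the typical "strange characters" come from: correct UTF-8 bytes displayed as if they were a legacy 8-bit encoding.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_ascii_chars(data: bytes) -> list:
    """Decode as UTF-8 and list (index, character, code point) for every
    non-ASCII character. Raises UnicodeDecodeError on invalid UTF-8."""
    text = data.decode("utf-8")
    return [(i, ch, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text) if ord(ch) > 127]


good = "café".encode("utf-8")      # stored properly as UTF-8
legacy = "café".encode("latin-1")  # stored in a legacy 8-bit encoding

print(is_valid_utf8(good))     # True
print(is_valid_utf8(legacy))   # False
print(non_ascii_chars(good))   # [(3, 'é', 'U+00E9')]

# Valid UTF-8 read with the wrong encoding gives the familiar garbage:
print(good.decode("latin-1"))  # cafÃ©
```

If is_valid_utf8 returns True but your editor still shows garbage, the file is most likely fine and the viewer is using the wrong encoding.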

To check the encoding of the file itself, you can run the *nix command:

file filename
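If the file command is not available (for example on a Windows desktop), a rough check can be scripted. This is only a sketch in Python and not a reimplementation of file's heuristics: the function name and the labels it returns are my own, and it ignores UTF-16 and other encodings entirely.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Coarsely classify bytes as BOM-prefixed UTF-8, ASCII, UTF-8,
    or some unknown 8-bit encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown 8-bit"


def guess_file_encoding(path: str) -> str:
    """Read the whole file and classify its contents."""
    with open(path, "rb") as f:
        return guess_encoding(f.read())
```

Call guess_file_encoding with the path to the file you want to inspect. A result of "unknown 8-bit" means the file is not valid UTF-8 and was probably saved in a legacy code page.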




Page last modified 06-Jan-13 20:51:51 EST