When converting customer documents from Word format to HTML, it seems inevitable that some extra Microsoft-specific characters, such as smart quotes, come along for the ride. Unfortunately, many web browsers and email clients do not display this extended character set correctly, so we have to go through our HTML files and clean all of it out.
I normally use vim to edit all HTML files, and I have finally found a reliable method of locating these ‘bad characters’ in vim.
The process consists of two steps:
1) Make sure the ‘file encoding’ (the fileencoding option, fenc for short) is set to an 8-bit encoding
:setlocal fenc=latin1
2) Use the 8g8 command in Normal mode, which jumps to the next illegal UTF-8 byte sequence at or after the cursor (see “:help 8g8”)
This process lets me locate each bad character so that I can replace it with a UTF-8 character that web browsers and email clients can display. If anyone out there has a better or easier way of doing this, please let me know.
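For comparison, the same find-and-replace idea can be sketched outside vim in Python. This is only a rough sketch, not a drop-in replacement for the vim workflow: the names find_invalid_utf8, repair_to_text and cp1252_fallback are made up for illustration, and it assumes the stray bytes are Windows-1252 (the usual encoding behind Word's smart quotes), which may not hold for every file.

```python
import codecs


def find_invalid_utf8(data: bytes) -> list:
    """Return the byte offsets where UTF-8 decoding fails (roughly what 8g8 jumps to)."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid UTF-8
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)  # err.start is relative to the slice
            pos += err.end  # resume just past the rejected bytes
    return offsets


def _cp1252_fallback(err):
    """Decode error handler: reinterpret rejected bytes as Windows-1252 (an assumption)."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Bytes that Windows-1252 leaves undefined become U+FFFD instead of raising.
    return bad.decode("cp1252", errors="replace"), err.end


# Hypothetical handler name; any unused name would do.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def repair_to_text(data: bytes) -> str:
    """Decode as UTF-8, falling back to Windows-1252 for bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="cp1252_fallback")
```

For the sample bytes b"caf\xc3\xa9 \x93hi\x94", find_invalid_utf8 reports offsets 6 and 9 (the two Windows-1252 quote bytes), and repair_to_text returns the text with proper Unicode quotes, which can then be written back out with .encode("utf-8").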






Removing smartquotes from text in Linux