Whenever I see lots of question marks, boxes, or empty spaces in place of specific characters in a rendered string, it means the FONT did not contain a symbol (glyph) for that character.
This MOST OFTEN means you are using a UTF encoded string with a font that does not contain the UTF characters it uses (few fonts contain ALL the UTF characters!), OR you used a special symbol or artistic font that contains even fewer characters than the default ASCII set.
In your case, your string was probably generated by Word, in which case the quotes are probably Word (Windows-1252) character codes, or it contains UTF special codes such as 'open and close quotes' rather than the ASCII quote character (which is used for both open and close).
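To see the difference between the two kinds of quote codes, here is a minimal Python sketch (Python is my choice for illustration; the original post only uses Vim, perl and tr). It prints the bytes each quote character becomes in UTF-8 and in Windows-1252 ("cp1252"), assuming that is the code page Word used when saving the text. The helper name show_encodings is hypothetical.

```python
from typing import List, Tuple


def show_encodings(text: str) -> List[Tuple[str, str, str]]:
    """Return (code point, UTF-8 bytes, Windows-1252 bytes) for each character."""
    rows = []
    for ch in text:
        utf8 = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
        try:
            # Assumption: the Word text was saved in Windows-1252 ("cp1252").
            cp1252 = ch.encode("cp1252").hex(" ")
        except UnicodeEncodeError:
            cp1252 = "n/a"  # character has no single-byte cp1252 code
        rows.append((f"U+{ord(ch):04X}", utf8, cp1252))
    return rows


if __name__ == "__main__":
    # Curly single quotes, curly double quotes, then the plain ASCII double quote.
    for row in show_encodings("\u2018\u2019\u201c\u201d\""):
        print(*row)
```

The curly quotes take three bytes in UTF-8 (E2 80 98 to E2 80 9D) and a single byte (91 to 94) in Word's code page, while the ASCII quote is the same single byte (22) in both. These are the byte values the filters below look for.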
There are two solutions:
- Use a UTF font (like Arial from Windows), or a special UTF font that contains special symbols for math or dingbats, such as Mincho (Windows), DejaVu-Sans-Book (from the DejaVu Linux package), or STIXGeneral (from the Adobe Acrobat Reader package).
- Convert your text to use only ASCII characters.
The latter I do quite regularly for text files I download from the web (most browsers understand BOTH Word and UTF character codes), using some conversion macros that run perl filters from the VIM editor.
:" Unicode Characters (General Punctuation)
:%!perl -pe "s/\xE2\x80[\x98\x99]/'/g"
:%!perl -pe 's/\xE2\x80[\x9C\x9d]/"/g'
:%!perl -pe "s/\xE2\x80[\x92-\x95\223]/ - /g"
:%!perl -pe "s/\xE2\x80[\xA5\xA6]/.../g"
:%!perl -pe "s/\xC3\xA2\xC2\x80\xC2\x91/-/g"
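:" Latin-1 bytes are only replaced when not followed by a UTF-8 continuation byte
:%!perl -pe "s/\xC3[\x80-\x85]|[\xC0-\xC5](?=[^\x80-\xBF])/A/g"
:%!perl -pe "s/\xC3[\x88-\x8B]|[\xC8-\xCB](?=[^\x80-\xBF])/E/g"
:%!perl -pe "s/\xC3[\x8C-\x8F]|[\xCC-\xCF](?=[^\x80-\xBF])/I/g"
:%!perl -pe "s/\xC3[\x92-\x96]|[\xD2-\xD6](?=[^\x80-\xBF])/O/g"
:%!perl -pe "s/\xC3[\x99-\x9C]|[\xD9-\xDC](?=[^\x80-\xBF])/U/g"
:%!perl -pe "s/\xC3[\xA0-\xA5]|[\xE0-\xE5](?=[^\x80-\xBF])/a/g"
:%!perl -pe "s/\xC3[\xA8-\xAB]|[\xE8-\xEB](?=[^\x80-\xBF])/e/g"
:%!perl -pe "s/\xC3[\xAC-\xAF]|[\xEC-\xEF](?=[^\x80-\xBF])/i/g"
:%!perl -pe "s/\xC3[\xB2-\xB6]|[\xF2-\xF6](?=[^\x80-\xBF])/o/g"
:%!perl -pe "s/\xC3[\xB9-\xBC]|[\xF9-\xFC](?=[^\x80-\xBF])/u/g"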
:%!perl -pe "s/·/·/g"
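If Vim and perl are not at hand, the same idea can be sketched in Python. This is an illustration, not a port of the filters above: it works on decoded text rather than raw bytes, maps the same punctuation with str.translate, and swaps the hand-written accent tables for Unicode NFKD decomposition, which also strips accents from letters the filters do not cover (such as ç and ñ). The names PUNCT and to_ascii are hypothetical.

```python
import unicodedata

# Punctuation replacements mirroring the perl filters (hypothetical table name).
PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",      # left/right single quotes
    "\u201c": '"', "\u201d": '"',      # left/right double quotes
    "\u2012": " - ", "\u2013": " - ",  # figure dash, en dash
    "\u2014": " - ", "\u2015": " - ",  # em dash, horizontal bar
    "\u2025": "...", "\u2026": "...",  # two-dot leader, ellipsis
})


def to_ascii(text: str) -> str:
    """Replace typographic punctuation, then strip accents via NFKD."""
    text = text.translate(PUNCT)
    # NFKD splits an accented letter into base letter + combining mark,
    # e.g. "é" -> "e" + U+0301; dropping the marks leaves the plain letter.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    print(to_ascii("\u201cCaf\u00e9\u201d, d\u00e9j\u00e0 vu\u2026 1990\u20132000"))
```

The main block prints "Cafe", deja vu... 1990 - 2000. Letters with no decomposition, such as ß or ø, are left as they are, so follow up with text.encode("ascii", "replace") if strict ASCII is required.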
:" Word Text File Specials
:%!tr '\021\022\023\024' \'\'\"\"
:%!tr '\221\222\223\224\226\227\213' \'\'\"\"---
:%!tr '\240' '\040'
:%!perl -pe 's/\x85/... /g'
:%s/[ɼ]/.../g
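In VIM, ':%!' at the start of a line means 'filter the whole file through this command', ':%s' is VIM's own substitute command applied to every line, and ':"' marks a comment, which VIM ignores.
The first set of filters is for UTF text; the second works on single-byte Word codes, so run it only on Word text files, because those same byte values also occur inside UTF multi-byte characters.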
For example, octal character code \023 is Word's open quote, so I replace it with a regular double quote ("),
while hexadecimal code \x85 is the 'ellipsis' (...) character, so I replace that one character code with 4 characters ('... ').
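Octal and hexadecimal are just two spellings of the same byte, which makes the two tr lines easier to compare. The Python sketch below mirrors the 'Word Text File Specials' filters byte-for-byte and prints the codes mentioned above in octal, hex, and as Windows-1252 characters. The names WORD_SPECIALS, word_bytes_to_ascii and describe are hypothetical. Note that \023 and \223 differ only in the 0x80 bit; one plausible reading (an assumption, not stated in the post) is that \021 to \024 are Word quotes whose high bit was stripped.

```python
# Byte-for-byte mirror of the "Word Text File Specials" filters.
# WORD_SPECIALS, word_bytes_to_ascii and describe are hypothetical names.
WORD_SPECIALS = {
    0o021: "'", 0o022: "'", 0o023: '"', 0o024: '"',  # first tr line
    0x91: "'", 0x92: "'",    # cp1252 left/right single quote
    0x93: '"', 0x94: '"',    # cp1252 left/right double quote
    0x96: "-", 0x97: "-",    # cp1252 en dash, em dash
    0x8B: "-",               # cp1252 single left angle quote (tr maps it to '-')
    0xA0: " ",               # no-break space
    0x85: "... ",            # cp1252 ellipsis: one byte becomes 4 characters
}


def word_bytes_to_ascii(data: bytes) -> bytes:
    """Replace Word special bytes; all other bytes pass through unchanged."""
    out = bytearray()
    for b in data:
        if b in WORD_SPECIALS:
            out += WORD_SPECIALS[b].encode("ascii")
        else:
            out.append(b)
    return bytes(out)


def describe(code: int) -> str:
    """Show one byte value as octal, hex and its Windows-1252 character."""
    return f"octal {code:03o} = hex {code:02X} = {bytes([code]).decode('cp1252')!r}"


if __name__ == "__main__":
    for code in (0o023, 0o223, 0x85):
        print(describe(code))
    # \023 and \223 differ only in the high bit (0x80).
    print(0o023 | 0x80 == 0o223)
    print(word_bytes_to_ascii(b"\x93Hi\x94\xa0there\x85"))
```

Bytes outside the table, including accented Latin-1 letters, pass through unchanged; like the tr version, it is meant for Word text files, not UTF files.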
There are probably other text conversion methods (let me know if anyone has some), but the above has generally (95% of the time) been enough to handle most character-coding problems, at least for the text files I have been dealing with.