I launched a blog aggregation website using MagpieRSS recently. I had all sorts of strange errors with encoding.
I had strings that had black diamonds with question marks inside of them.
I had strings that looked like: Today?s president obama said ?blah blah?
I tried (and wasted a lot of time) debugging my own code for far longer than I care to admit.
I wrote functions to replace certain patterns in strings, tried encoding and decoding the inputs, and so on.
What actually solved the issues was patching MagpieRSS.
I changed MAGPIE_OUTPUT_ENCODING and MAGPIE_INPUT_ENCODING in rss_fetch.inc to 'UTF-8'.
Around line 358 you should see the block below. The original lines (commented out here) are the ones I crossed out; replace them with the UTF-8 versions:
if ( !defined('MAGPIE_OUTPUT_ENCODING') ) {
    // define('MAGPIE_OUTPUT_ENCODING', 'ISO-8859-1');
    define('MAGPIE_OUTPUT_ENCODING', 'UTF-8');
}
if ( !defined('MAGPIE_INPUT_ENCODING') ) {
    // define('MAGPIE_INPUT_ENCODING', null);
    define('MAGPIE_INPUT_ENCODING', 'UTF-8');
}
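Since both defaults are wrapped in !defined() checks, it may also be possible to get the same result without editing rss_fetch.inc, by defining the constants in your own script before MagpieRSS is loaded. This is only a sketch of that idea: I patched the file instead, and the include path in the last comment is an assumption about where your copy lives.

```php
<?php
// Define the encodings first, in your own script.
define('MAGPIE_OUTPUT_ENCODING', 'UTF-8');
define('MAGPIE_INPUT_ENCODING', 'UTF-8');

// Same guard pattern as in rss_fetch.inc: because the constant is
// already defined, the ISO-8859-1 default is skipped.
if ( !defined('MAGPIE_OUTPUT_ENCODING') ) {
    define('MAGPIE_OUTPUT_ENCODING', 'ISO-8859-1');
}

echo MAGPIE_OUTPUT_ENCODING, "\n"; // prints UTF-8

// In a real page the include would follow here (path is an assumption):
// require_once 'magpierss/rss_fetch.inc';
```

The upside of this approach is that an upgrade of MagpieRSS would not silently undo the fix.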
Also make sure your database is using UTF-8. I chose the general collation, but choose what suits you.
The only problem left was converting all the content to proper UTF-8 format, which I did with this function:
function encodeutf8($string) {
    // Decode any entities already in the text, then encode everything once.
    return htmlentities(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}
I call html_entity_decode first because if anything in the text is already encoded (and many RSS feeds are),
htmlentities will encode the ampersand that marks the start of the entity. So you run into double-encoding issues
with &#8217; becoming &amp;#8217;, which displays the literal text '&#8217;' instead of a single quote ’.
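To make the double-encoding problem concrete, here is a small sketch comparing plain htmlentities with the encodeutf8 function from above. The sample string is made up for illustration; it stands in for a feed title that arrives with a numeric entity already in it.

```php
<?php
function encodeutf8($string) {
    // Decode existing entities first, then encode everything exactly once.
    return htmlentities(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');
}

// Hypothetical feed text that already contains an encoded right single quote.
$raw = 'Today&#8217;s news';

// Encoding directly escapes the ampersand of the existing entity.
$double = htmlentities($raw, ENT_QUOTES, 'UTF-8');
echo $double, "\n"; // Today&amp;#8217;s news

// Decoding first avoids that, and the quote is encoded once.
$clean = encodeutf8($raw);
echo $clean, "\n"; // Today&rsquo;s news

// Running the function again on its own output changes nothing.
var_dump(encodeutf8($clean) === $clean); // bool(true)
```

One thing to keep in mind: htmlentities also escapes < and >, so if a feed item contains real markup such as <b> tags, those would show up as literal text after this function.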
If you still encounter problems, send a Content-Type header from the PHP page that displays the content. For me this also
helped remove strange characters and fix encoding issues. It has to be sent before the page produces any output:
header('Content-Type: text/html; charset=UTF-8');
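Because header() only works before any output has been sent, a small guard can save some head scratching. This is a minimal sketch; send_utf8_header is a hypothetical helper name of mine, not something that ships with MagpieRSS.

```php
<?php
// Hypothetical helper: sends the charset header only while that is still possible.
function send_utf8_header() {
    if (headers_sent()) {
        // Output has already started, so the header can no longer be changed.
        return false;
    }
    header('Content-Type: text/html; charset=UTF-8');
    return true;
}

// Call this at the very top of the page, before any HTML or echo.
$sent = send_utf8_header();
```

If it returns false, look for stray whitespace or echo calls that run before it, for example in an included file.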
Those are all the tricks I've picked up so far while trying to make MagpieRSS function properly. Hopefully that
saves others some pain and frustration. Feel free to leave any other tips or ask questions in the comments.