(You can choose or or both)

Wednesday, November 22, 2006

Excel CSV UTF-8 unicode accents characters


(Désolé, encore un article technique, en compensation - une photo de 'stinguette prise ce matin. J'expérimente aussi avec les 'catégories' dans ce blog - vous verrez que cet article est 'techno'; alors technophobes s'abstenir).

(Sorry, another technical post. But at least this time there's a photo of Little My, taken this morning. I'm also experimeting with 'labels' on this blog - you will notice that this one is 'Techno' - though not of the musical sort).

(UPDATE 2008-09-05: This is by far the most read post on my blog. Leave a little comment for me if you find it useful, I like to hear your stories)
Hello, welcome to my blog if you came here searching in desperation for a solution to a stupid problem.

The problem is that Excel won't correctly interpret accents in a csv file with utf-8 encoding. So the following csv:
id;name
4;Hélène

Looks like this when opened in Excel:
id name
4 Hélène

Do you want to know the solution?
Do you really really want to know?
Are you prepared to abandon all coding niceties and foibles about doing things the right way?

The solution (the only one that I have found/worked after trawling through lots of google) is to save your data as html, but to give the file a .xls extension.

Don't ask me why, but despite being flummoxed by the csv, Excel is smart enough to eat the html, accents and all.

Here is an example of the same data - with the necessary charset declaration.
<html>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8" />
<table>
<tr>
<th>id</th>
<th>name</th>
</tr>
<tr>
<td>4</td>
<td>Hélène</td>
</tr>
</table>
</html>

You can even do a bit of styling on the results:
<style>
table, td, th {border:solid black}
table {border-collapse:collapse;border-width:0 0.5pt 0.5pt 0}
td, th {border-width:0.5pt 0 0 0.5pt}
</style>

[Postscript for Anonymous (in comments below) - if using Notepad or equivalent, you have to Save As UTF-8, otherwise it won't work.]

7 comments yet :

Anonymous said...

I tried this by copying ur html into a file called test.xls and opened in excel but it does not show the accentuated characters correctly

Anonymous said...

Thanks for your reply. BTW, after trail and error of your html snippet, I found that if you change the charset="Unicode" instead of "UTF-8" excel was able to read that html and show the data correctly.

Thanks for your help. I have looked around everywhere to get this to work, looks like html is the way to go. I am going to give it a shot.

Rudy said...

Yeah, the nice professionals at MS hardcoded UTF-16. Save it with that encoding, and all will work. I really despise them, because they cost me too much time every time I have to use their crap... Amateurs!

Tevye Krynski said...

It looks to me like excel will render UTF-8 correctly if the first three characters of your CSV file are "" (without quotes).

Benjol said...

@Tevye, you may be right, though I would have thought I'd have tried that at the time. Too long ago now to remember!

@Rudy, I don't know the history, but my understanding is that it is more CSV that is broken than Excel. As for character encoding, I doubt whether there are more than a dozen people in the world who are qualified to throw stones...

Anonymous said...

it helped me thanks.
Although there is one more way of achieving this. As excel has something called web options where you can choose to show content as Unicode.

Anonymous said...

A simple solution is found here.
http://www.accompa.com/kb/answer.html?answer_id=262

Open the csv in Notepad. Select "Save As" and select UTF-8 as the encoding.

Now it opens in Excel with accents.