Error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte

I'm a beginner at Python. I want to read multiple CSV files. When I read them with encoding="ISO-8859-1", I get characters like this in my data: "D°faut". So I tried encoding="utf-8" instead, and I get this error: 'utf-8' codec can't decode byte 0xb0 in position 14: invalid start byte. Can someone help me, please? Thank you!
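The behaviour described in the question can be reproduced without any CSV file, using a single byte string. This is a minimal sketch that assumes the offending byte is 0xB0, as the error message reports; the sample bytes are hypothetical and are not taken from the asker's file.

```python
# Hypothetical input: 0xB0 is the byte named in the error message.
raw = b"D\xb0faut"

# ISO-8859-1 maps every byte to some character, so decoding never fails;
# 0xB0 simply becomes the degree sign.
print(raw.decode("iso-8859-1"))  # D°faut

# UTF-8 refuses 0xB0 as the first byte of a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0xb0 in position 1: invalid start byte
```

The position differs from the one in the question only because the sample string is shorter.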


1 Answer

If you decode with UTF-8, the file must also have been encoded with UTF-8. Apart from basic Latin letters, digits and the usual ASCII symbols, UTF-8 needs more than one byte to store a character. Because the decoder reads the file byte by byte, each byte has to tell it what role it plays, and that is signalled by its most significant bits: a byte starting with 0 is a complete ASCII character, a byte starting with 11 begins a multi-byte sequence, and a byte starting with 10 is a continuation byte that may only appear inside such a sequence. 0xb0 is 1011 0000 in binary. Its top bits are 10, so it is a continuation byte, and when the decoder meets it where a new character should begin, it fails with "invalid start byte". That lone byte is in your file because the file was written in a single-byte encoding such as ISO-8859-1, where 0xb0 by itself is the degree symbol (°). In UTF-8, the degree symbol is encoded as the two bytes 0xC2 0xB0.
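To make the bit patterns concrete, here is a small sketch that classifies a byte by its leading bits, roughly the way a UTF-8 decoder does: 10xxxxxx marks a continuation byte, while 110xxxxx, 1110xxxx and 11110xxx start 2-, 3- and 4-byte sequences. The function utf8_byte_role is a hypothetical helper written for illustration, and it deliberately ignores the extra rules that make some bytes (0xC0, 0xC1, 0xF5 to 0xF7) invalid anyway.

```python
def utf8_byte_role(b: int) -> str:
    """Classify one byte by its leading bits, as a UTF-8 decoder does.

    Only the bit patterns are checked; the extra rules that make
    0xC0, 0xC1 and 0xF5-0xF7 invalid are ignored in this sketch.
    """
    if b < 0x80:   # 0xxxxxxx
        return "ASCII character"
    if b < 0xC0:   # 10xxxxxx
        return "continuation byte"
    if b < 0xE0:   # 110xxxxx
        return "start of a 2-byte sequence"
    if b < 0xF0:   # 1110xxxx
        return "start of a 3-byte sequence"
    if b < 0xF8:   # 11110xxx
        return "start of a 4-byte sequence"
    return "invalid in UTF-8"


print(f"{0xB0:08b}", utf8_byte_role(0xB0))  # 10110000 continuation byte

# The degree sign in UTF-8: a lead byte followed by 0xB0 as its continuation.
degree_utf8 = "°".encode("utf-8")
print(degree_utf8)  # b'\xc2\xb0'
print([utf8_byte_role(b) for b in degree_utf8])
# ['start of a 2-byte sequence', 'continuation byte']
```

The same 0xB0 byte is legal in UTF-8 only when a lead byte such as 0xC2 comes right before it.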

In any case: always decode with the same encoding that was used to encode the file. If you need characters outside a single-byte code page, use UTF-8; in general, using one of the UTF encodings is good advice.
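If you know which encoding the files were actually written in, one approach is to read them with that encoding and, if you want UTF-8 from then on, write them back out as UTF-8. The sketch below uses only the standard csv module. It assumes the source files really are ISO-8859-1, and the function name convert_csv_encoding, the file names and the sample row are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def convert_csv_encoding(src: Path, dst: Path, src_encoding: str) -> None:
    """Decode src with src_encoding and write the same rows to dst as UTF-8."""
    # newline="" is what the csv module expects for files it reads and writes.
    with open(src, newline="", encoding=src_encoding) as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        csv.writer(fout).writerows(csv.reader(fin))


def demo() -> bytes:
    """Round-trip a small ISO-8859-1 file and return the UTF-8 bytes written."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input_latin1.csv"   # hypothetical file names
        dst = Path(tmp) / "output_utf8.csv"
        # In ISO-8859-1 the degree sign is the single byte 0xB0.
        src.write_bytes("name,value\r\nangle,90°\r\n".encode("iso-8859-1"))
        convert_csv_encoding(src, dst, "iso-8859-1")
        return dst.read_bytes()


print(demo())  # b'name,value\r\nangle,90\xc2\xb0\r\n'
```

If you load the files with pandas instead, pandas.read_csv accepts an encoding argument in the same way. Either way, the encoding you pass must match how the file was written; if it does not, you get either an error or wrong characters.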
