Ciro Santilli

Deprecated in favor of: https://cirosantilli.com


Category Archives: character encoding

Python read ISO 8859-1 (Latin1) encoding

Posted on 2012-07-1 by cirosantilli



import codecs

# path is the name of the file to read; its bytes are decoded as ISO 8859-1 (Latin1)
with codecs.open(path, 'r', 'iso-8859-1') as f:
    text = f.read()
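On Python 3 the built-in open accepts an encoding argument directly, so codecs is not needed there. Below is a minimal sketch under that assumption; the helper name read_latin1 and the temporary sample file are made up for illustration only.

```python
import os
import tempfile


def read_latin1(path):
    """Return the contents of the file at `path`, decoded as ISO 8859-1."""
    with open(path, 'r', encoding='iso-8859-1') as f:
        return f.read()


# Write a small sample file containing raw Latin1 bytes, then read it back.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xe1\xe7\xf6')  # the ISO 8859-1 bytes for á, ç, ö
text = read_latin1(sample_path)
os.remove(sample_path)
```

After this runs, text holds the three decoded characters as a normal Python string, which can then be written out again in any other encoding.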

Posted in character encoding, python

ISO 8859-1 (Latin1) encoding

Posted on 2012-07-1 by cirosantilli


If you ever meet characters that

are not valid UTF-8
become rubbish when decoded as UTF-8
should be Latin character variations such as á, ç, or ö
are stored as a single byte with a high value (for example 0xF2, which is ò), which is the case for the most common Latin character variations

suspect this encoding. It is quite common. For a list see

http://www.w3schools.com/tags/ref_entities.asp
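The checks above can be sketched in Python 3 as a "try UTF-8 first, then fall back" helper. This is only a rough heuristic, not a real encoding detector: the name decode_guess is hypothetical, and the fallback cannot tell ISO 8859-1 apart from other single byte encodings.

```python
def decode_guess(data):
    """Decode bytes as UTF-8 if they are valid UTF-8, else as ISO 8859-1.

    Returns a (text, encoding_name) pair.
    """
    try:
        return data.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        # Python's iso-8859-1 codec maps every byte value to a character,
        # so this fallback never raises; it may still be the wrong guess.
        return data.decode('iso-8859-1'), 'iso-8859-1'


latin1_bytes = 'áçö'.encode('iso-8859-1')  # one byte per character
utf8_bytes = 'áçö'.encode('utf-8')         # two bytes per character

latin1_result = decode_guess(latin1_bytes)
utf8_result = decode_guess(utf8_bytes)
```

Here latin1_bytes is rejected by the UTF-8 decoder, because 0xE1 must be followed by continuation bytes and 0xE7 is not one, so the helper falls back to ISO 8859-1.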

Posted in character encoding
