11-03-2011

Character encoding gotchas

Just when you think you’ve got your Spring web application nicely under
control, your first customer from a Scandinavian country tries to place
an order. And then you are hit by the evil character encoding monster.
Your customer doesn’t live in København but in K�benhavn, and their last
name is now MÃ¥rtensson instead of Mårtensson. Chances are your
customers from China will be treated even worse by your web app.
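
Both kinds of damage come from bytes being written in one encoding and read back in another. Here is a minimal sketch that reproduces them with nothing but the JDK; the class and method names are made up for illustration and are not part of our application. The strings use \u escapes so the result does not depend on how the source file itself is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the two kinds of garbling shown above.
class MojibakeDemo {

    // Encode the text with one charset, then decode the same bytes with another.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String name = "M\u00e5rtensson"; // Mårtensson
        String city = "K\u00f8benhavn";  // København

        // UTF-8 bytes read as ISO-8859-1: one character turns into two.
        String garbledName = misread(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // ISO-8859-1 bytes read as UTF-8: the invalid byte becomes U+FFFD.
        String garbledCity = misread(city, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(garbledName.equals("M\u00c3\u00a5rtensson")); // true
        System.out.println(garbledCity.equals("K\ufffdbenhavn"));        // true
    }
}
```

So the "Ã¥" pattern usually means UTF-8 data was read as Latin-1 somewhere along the way, and the "�" pattern means the opposite.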

No problem, you think, “Just need to set the Tomcat default encoding to
UTF-8 and we’re in worldwide business”. Well, if life were that easy we
programmers would be out of jobs really quickly. Here’s the list of
tricks I needed to perform to make sure our expansion to Scandinavia and
China could begin:

1. set Tomcat default encoding


In conf/server.xml set the attribute URIEncoding="UTF-8" on the Connector
entries.
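
To see why URIEncoding matters: Tomcat versions of that era decode the percent-escapes in a request URI as ISO-8859-1 unless told otherwise. The sketch below uses the JDK's URLDecoder as a stand-in for that decoding step (it is not Tomcat's own code, and the class name is hypothetical) to show what happens to a query parameter a browser sent as UTF-8. It needs Java 10 or later for the Charset overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: URLDecoder stands in for the container's URI decoding.
class UriDecodingDemo {

    // Decode a percent-encoded query parameter value with the given charset.
    static String decodeParam(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset); // Java 10+
    }

    public static void main(String[] args) {
        String raw = "K%C3%B8benhavn"; // "København" as a browser sends it in UTF-8

        String asUtf8 = decodeParam(raw, StandardCharsets.UTF_8);
        String asLatin1 = decodeParam(raw, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("K\u00f8benhavn"));         // true
        System.out.println(asLatin1.equals("K\u00c3\u00b8benhavn")); // true, the garbled variant
    }
}
```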

2. in web.xml add a character encoding filter


<filter>
<filter-name>characterEncodingFilter</filter-name>
<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>


and map it to the requests that you need to be treated as UTF-8:

<filter-mapping>
<filter-name>characterEncodingFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
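
For the curious, this is roughly what the two init-params decide. The sketch is my simplified model of the filter's rule in plain Java, not the Spring source, and the class and method names are made up: without forceEncoding the filter only fills in an encoding when the client did not declare one, with forceEncoding it always wins.

```java
// Hypothetical, simplified model of the rule CharacterEncodingFilter applies.
class EncodingFilterModel {

    // Returns the encoding the request body would end up being read with.
    // declaredByClient: charset from the request's Content-Type, or null if absent.
    // configured: the filter's "encoding" init-param.
    // forceEncoding: the filter's "forceEncoding" init-param.
    static String requestEncoding(String declaredByClient, String configured, boolean forceEncoding) {
        if (configured != null && (forceEncoding || declaredByClient == null)) {
            return configured;
        }
        return declaredByClient;
    }

    public static void main(String[] args) {
        System.out.println(requestEncoding(null, "UTF-8", false));         // UTF-8
        System.out.println(requestEncoding("ISO-8859-1", "UTF-8", false)); // ISO-8859-1
        System.out.println(requestEncoding("ISO-8859-1", "UTF-8", true));  // UTF-8
    }
}
```

Since the request encoding has to be set before anything reads the request parameters, it is safest to declare this filter before the other filters in web.xml.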


3. make sure your database is in UTF-8


Especially when using MySQL you need to be aware that by default it
creates databases in latin1 format. If, by accident, you didn’t pay
attention to this small detail when you first created your database,
here’s what you can do to change it afterwards:

alter database my_database default charset utf8 collate
utf8_general_ci;


followed, just to be sure, by the following statement for all your
tables:

alter table my_table convert to character set utf8 collate
utf8_general_ci;
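
A quick way to convince yourself why latin1 cannot work for the Chinese customers, and one caveat about MySQL's utf8: that charset stores at most three bytes per character, so characters that need four bytes in UTF-8 only fit in utf8mb4 (available since MySQL 5.5.3). The sketch below uses the JDK's ISO-8859-1 as a stand-in for latin1 (MySQL's latin1 is really closer to cp1252); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: checks what a charset can represent.
class CharsetFitDemo {

    // True if every character of the text can be encoded in the charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Number of bytes UTF-8 needs for a single Unicode code point.
    static int utf8Bytes(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(fits("M\u00e5rtensson", StandardCharsets.ISO_8859_1)); // true
        System.out.println(fits("\u4e2d\u6587", StandardCharsets.ISO_8859_1));    // false (Chinese text)
        System.out.println(utf8Bytes(0x4E2D));  // 3, fits in MySQL's utf8
        System.out.println(utf8Bytes(0x2000B)); // 4, a rarer CJK ideograph that needs utf8mb4
    }
}
```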

4. make sure your DB connection also uses UTF-8, all the time


We’re using the DBCP connection pool, configured like this:

<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
p:connectionProperties="characterEncoding=UTF-8;useUnicode=true;">
...other properties...
</bean>
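
If you are not on DBCP, the same two MySQL Connector/J properties can also go straight into the JDBC URL. A small sketch, assuming a hypothetical helper and a made-up host and port:

```java
// Hypothetical helper: appends the two Connector/J properties to a JDBC URL.
class JdbcUrlDemo {

    static String withUnicode(String baseUrl) {
        // Use '?' for the first parameter and '&' for any further ones.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Host and port are assumptions for the sake of the example.
        System.out.println(withUnicode("jdbc:mysql://localhost:3306/my_database"));
        // prints jdbc:mysql://localhost:3306/my_database?useUnicode=true&characterEncoding=UTF-8
    }
}
```

When such a URL is written in a Spring XML file, the & has to be escaped as &amp;.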


5. instruct FreeMarker to use UTF-8 when processing its templates


<bean id="freemarkerConfiguration" class="org.springframework.ui.freemarker.FreeMarkerConfigurationFactoryBean">
<property name="templateLoaderPath" value="classpath:/mailTemplates" />
<property name="freemarkerSettings">
<props>
<prop key="default_encoding">UTF-8</prop>
<prop key="output_encoding">UTF-8</prop>
</props>
</property>
</bean>


6. when using the Spring RestTemplate, make it use UTF-8


We were using RestTemplate to POST form data from one web app to
another. By default, it uses ISO-8859-1 for its request parameters. This
must be overridden like this:

<bean id="restTemplate" class="org.springframework.web.client.RestTemplate">
<property name="messageConverters">
<list>
<bean class="org.springframework.http.converter.StringHttpMessageConverter" />
<bean class="org.springframework.http.converter.FormHttpMessageConverter" >
<property name="charset" value="UTF-8" />
</bean>
</list>
</property>
</bean>
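
To make the difference visible, here is what a form field looks like on the wire in either charset. This is a sketch with the JDK's URLEncoder standing in for the form converter (it is not Spring's code, the class and field names are hypothetical, and it needs Java 10 or later):

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: URLEncoder stands in for the form message converter.
class FormEncodingDemo {

    // Build one name=value pair of an application/x-www-form-urlencoded body.
    static String formField(String name, String value, Charset charset) {
        return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset); // Java 10+
    }

    public static void main(String[] args) {
        String lastName = "M\u00e5rtensson"; // Mårtensson

        System.out.println(formField("lastName", lastName, StandardCharsets.UTF_8));
        // prints lastName=M%C3%A5rtensson
        System.out.println(formField("lastName", lastName, StandardCharsets.ISO_8859_1));
        // prints lastName=M%E5rtensson
    }
}
```

A receiving app that expects UTF-8 cannot decode the lone %E5 byte, which is how the garbled names show up on the other side.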


That was all it took!

See the original blog post on Wwwilpower.

 

Jolien
