Batch convert files to UTF-8

In my daily job as a DBA I need to convert a lot of files from ISO-8859-1 to UTF-8. These files are usually SQL scripts generated in Windows environments, and I have to execute them on Linux servers (Red Hat or Ubuntu) against UTF-8 databases. The scripts often contain Spanish and Catalan characters (accented vowels, ñ, ç, etc.), so I need to convert them to UTF-8 to avoid errors caused by invalid characters.

I tried iconv, giving the input file name as the output file as well:

$ iconv -f ISO-8859-1 -t utf-8 file.sql -o file.sql

but if the file is larger than 32 KB I get this:

Bus error

To avoid this I’ve created a script (2utf8.sh):

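#!/bin/bash
# Convert files in place from ISO-8859-1 to UTF-8.
if [ $# -lt 1 ]
then
  echo "Usage: $0 <file_name>..." >&2
  echo "Convert files from ISO-8859-1 to UTF-8" >&2
  exit 1
fi

for i in "$@"
do
  if [ ! -f "$i" ]; then # Skip anything that is not a regular file
    continue
  fi
  # Write to a temp file first to avoid the Bus error
  if iconv -f ISO-8859-1 -t UTF-8 -o "$i.tmp" "$i"; then
    mv "$i.tmp" "$i"
  else
    # Keep the original untouched if the conversion fails
    echo "Conversion failed: $i" >&2
    rm -f "$i.tmp"
  fi
done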
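
One thing to watch with a script like this: running it twice on the same file converts the already converted text again and mangles every non-ASCII character. A possible guard is sketched below. It is only a sketch and rests on a few assumptions: the function name to_utf8_once is made up for illustration, mktemp is available, and iconv exits with a non-zero status when its input is not valid UTF-8 (GNU iconv does).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): convert one file
# from ISO-8859-1 to UTF-8, but leave it alone if it already decodes as UTF-8.
to_utf8_once() {
  local f=$1 tmp
  [ -f "$f" ] || return 1

  # A file that passes a UTF-8 -> UTF-8 round trip is already valid UTF-8
  # (pure ASCII files also end up here and need no conversion).
  if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
    return 0
  fi

  # Convert into a temp file next to the original, then replace the original.
  tmp=$(mktemp "$f.XXXXXX") || return 1
  if iconv -f ISO-8859-1 -t UTF-8 "$f" >"$tmp"; then
    mv "$tmp" "$f"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

The check is a heuristic: an ISO-8859-1 file whose bytes happen to form valid UTF-8 sequences would be skipped, although that is unlikely for normal Spanish or Catalan text. Also note that mktemp creates the temp file with restrictive permissions, so the converted file may end up with different permissions than the original.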

You just have to run it with the name of the file to convert, or using wildcards:

$ 2utf8.sh *.sql
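
The wildcard only matches files in the current directory. If the scripts are spread over subdirectories, find can collect them and hand them to the converter. A minimal sketch, assuming the converter accepts any number of file names as arguments (as 2utf8.sh does); the wrapper name convert_tree and its two parameters are made up for illustration:

```shell
#!/bin/bash
# Hypothetical wrapper: run a converter script on every *.sql file
# found under a directory, including subdirectories.
#   $1 = directory to search
#   $2 = path to the converter script (for example ./2utf8.sh)
convert_tree() {
  local dir=$1 converter=$2
  # "-exec ... {} +" passes many file names to each converter call
  find "$dir" -type f -name '*.sql' -exec "$converter" {} +
}
```

It would be called as, for example, convert_tree /path/to/scripts ./2utf8.sh. File names containing spaces only survive this if the converter quotes its variables.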

Hope this helps!

Categories: linux, script
  1. June 8, 2017 at 2:02

    You. are. AMAZING!

    Thanks for this. I know not many people are going to run into these issues but you’ve honestly saved my butt. I would never have thought this was the reason why my file was limited to 32768 bytes.

    If I ever run into you, I owe you a beer :)!
