Gibberish instead of Hebrew in AWS Athena?

The problem is usually the original encoding of the source file.

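But sometimes it is the end-of-line problem that comes from a different OS (Windows ends lines with CRLF, Unix with LF). In that case just run dos2unix (on macOS you can install it with Homebrew), and try not to open the files on Windows.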

brew install dos2unix

dos2unix filename.csv
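If you'd rather not install anything, the same CRLF-to-LF fix can be sketched in Python. This is a minimal sketch, not a full dos2unix replacement: it only handles Windows CRLF pairs, and it works on raw bytes, which is safe for UTF-8 and single-byte encodings such as windows-1255 but not for UTF-16. The function names are my own.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF.

    Works on raw bytes, so the text encoding itself is left untouched.
    """
    return data.replace(b"\r\n", b"\n")


def fix_line_endings(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))
```

For example, fix_line_endings("filename.csv") rewrites the file in place, much like dos2unix filename.csv.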


Check the file's encoding from the command line (-I is the macOS flag; on Linux use file -i):

file -I filename.csv

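If the result looks like the line below, the file is not UTF-8, which is the encoding Athena reads by default, so it needs converting. Note that file only guesses: Hebrew text saved on Windows is often reported as iso-8859-1 even though it is really windows-1255 (or iso-8859-8).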

text/plain; charset=iso-8859-1
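Since file only guesses, another quick check is to try decoding the bytes as UTF-8 in Python. A minimal sketch (the function names are hypothetical); keep in mind that plain ASCII also passes, because ASCII is valid UTF-8:

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path: str) -> bool:
    """Read the file as raw bytes and check whether it is valid UTF-8."""
    with open(path, "rb") as f:
        return is_utf8(f.read())
```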

Then the challenge is converting it.

Three ways to convert your gibberish text file into readable Hebrew (UTF-8):

  1. Microsoft Excel: rename the file to filename.txt and open it. Excel opens an import wizard that lets you choose the encoding.
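  2. Command line, with iconv. Convert from the file's real encoding, which for Hebrew saved on Windows is usually windows-1255 (iso-8859-8 is the other common one). Converting from iso-8859-1, as reported by file, just turns the Hebrew into Latin characters such as ùìåí:

    iconv -f windows-1255 -t utf-8 < filename.csv > filename.utf8.csv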

  3. An online UTF-8 converter. The one I used: https://subtitletools.com/convert-text-files-to-utf8-online
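The iconv conversion can also be sketched in Python. The default source encoding of windows-1255 is an assumption about where your file came from; pass "iso-8859-8" if that matches better. Decoding is strict, so bytes that are undefined in the source encoding raise an error instead of quietly turning into more gibberish. The function names are hypothetical.

```python
from pathlib import Path


def to_utf8(data: bytes, source_encoding: str = "windows-1255") -> bytes:
    """Re-encode bytes from `source_encoding` to UTF-8.

    Assumption: windows-1255 is the usual encoding of Hebrew text saved on
    Windows. Strict decoding raises UnicodeDecodeError on undefined bytes.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "windows-1255") -> None:
    """Write a UTF-8 copy of `src` to `dst`, leaving the original untouched."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

For example, convert_file("filename.csv", "filename.utf8.csv") writes a UTF-8 copy next to the original, like the iconv command above.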


Test the file locally first: if the Hebrew displays correctly in a UTF-8 text editor on your desktop, it should display correctly in Athena as well.
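To automate that local test, here is a small sketch (the helper names are hypothetical) that checks a file is valid UTF-8 and actually contains Hebrew letters (U+05D0 to U+05EA). The second condition matters because a file converted from the wrong encoding is still valid UTF-8, just full of Latin characters.

```python
def contains_hebrew(text: str) -> bool:
    """True if any character falls in the Hebrew letter range U+05D0..U+05EA."""
    return any("\u05d0" <= ch <= "\u05ea" for ch in text)


def hebrew_utf8_ok(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 and the decoded text contains Hebrew letters."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not UTF-8 at all
    return contains_hebrew(text)


def check_file(path: str) -> bool:
    """Read the file as raw bytes and run hebrew_utf8_ok on it."""
    with open(path, "rb") as f:
        return hebrew_utf8_ok(f.read())
```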

Have fun!
