Monday, December 12, 2011

Export HDFS over CIFS (Samba3)


Three weeks ago I played with libhdfs and NFS, but I did not get the results I expected. My next idea was: why not use Samba? Samba 3.x is stable, and most operating systems can mount an exported share.
The main task was to research the setup and performance of this scenario, because Samba has a lot of tuning options. Let's go!

I used RHEL 5.7 and the RPMs shipped with it:
 #> rpm -qa|grep samba
 samba-3.0.33-3.29.el5_7.4.x86_64
 samba-common-3.0.33-3.29.el5_7.4.x86_64

As I described in "NFS exported HDFS", I mounted HDFS over FUSE at /123/hdfs via /etc/fstab:

 #> cat /etc/fstab
 [..]
 hadoop-fuse-dfs#dfs://NAMENODE:9000 /123/hdfs fuse usetrash,rw 0 0

and checked it:
 #> mount
 [..]
 fuse on /123/hdfs type fuse (rw,nosuid,nodev,allow_other,default_permissions)

 #> ls -la /123
 total 16
 drwxr-xr-x  3 root root   4096 Dec  9 16:36 .
 drwxr-xr-x 27 root root   4096 Dec  9 12:11 ..
 drwxr-xr-x  5 hdfs nobody 4096 Dec  9 02:14 hdfs
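
The share is only useful while the FUSE mount is up, so the check above can also be scripted. The following is a minimal sketch, assuming a Linux host where /proc/mounts is available; the function name is_fuse_mounted is made up for this example.

```shell
# Succeeds if MOUNTPOINT is listed with a fuse filesystem type in a mounts table.
# The optional second argument defaults to /proc/mounts and mainly eases testing.
is_fuse_mounted() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 ~ /^fuse/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}
```

Usage could look like this: is_fuse_mounted /123/hdfs || echo "HDFS is not mounted"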

The next step is to configure Samba. I ended up with this configuration:

 #> cat /etc/samba/smb.conf
[global]
        bind interfaces only = yes
        deadtime = 15
        default case = lower
        disable netbios = yes
        interfaces = eth0
        dns proxy = no
        workgroup = HDFS
        server string = Samba Server Version %v
        socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536 SO_SNDBUF=65536
        load printers = no
        max connections = 30
        strict sync = no
        sync always = no
        syslog = 1
        syslog only = yes
        security = user
        smb passwd file = /etc/samba/smbpasswd
        
[hdfs]
        comment = HDFS
        path = /123/hdfs
        public = yes
        writable = yes
        printable = no
        create mask = 0744
        force user = hdfs
        force group = nobody
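
Before restarting the server it can help to let Samba validate the file. A small sketch using testparm, which ships with Samba; the wrapper name check_smb_conf is an assumption of this example, not part of Samba.

```shell
# Validate a Samba configuration file with testparm (default: /etc/samba/smb.conf).
# Returns 127 if testparm is not installed, otherwise testparm's exit status.
check_smb_conf() {
    conf="${1:-/etc/samba/smb.conf}"
    if ! command -v testparm >/dev/null 2>&1; then
        echo "testparm not found; is Samba installed?" >&2
        return 127
    fi
    # -s suppresses the "press enter" prompt; the parsed dump is discarded.
    testparm -s "$conf" >/dev/null
}
```

Usage could look like this: check_smb_conf && service smb restart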

Then I created the Samba user and password. I used the hdfs system user (id=hdfs, group=nobody):

 #> smbpasswd -a hdfs

Finally I started the server:
 #> service smb restart

Test cases
For testing I used another RHEL 5.7 server and mounted the exported share at /test:
 #> mount -t cifs -o username=hdfs,rw //SAMBASERVER/hdfs /test
 Password: HERE_THE_PASSWORD
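
For unattended mounts the password prompt gets in the way. mount.cifs accepts a credentials= option pointing to a file with username= and password= lines. The sketch below writes such a file; the helper name write_cifs_credentials and the path /root/.smbcred are only examples.

```shell
# Write a CIFS credentials file that only its owner can read.
# The password is read from stdin so it does not appear in the process list.
write_cifs_credentials() {
    cred_file="$1"
    user="$2"
    IFS= read -r password
    (
        umask 077
        printf 'username=%s\npassword=%s\n' "$user" "$password" > "$cred_file"
        chmod 600 "$cred_file"   # also tighten a file that already existed
    )
}

# Example usage (the path is hypothetical):
#   write_cifs_credentials /root/.smbcred hdfs
#   mount -t cifs -o credentials=/root/.smbcred,rw //SAMBASERVER/hdfs /test
```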

check:
 #> ls -la /test/
 total 8
 drwxr-xr-x  5 hdfs nobody    0 Dec  9 02:14 .
 drwxr-xr-x 25 root root   4096 Dec  9 15:03 ..
 drwxr-xr-x  3 hdfs nobody    0 Dec  9 02:12 mapred
 drwxr-xr-x  3 hdfs nobody    0 Dec  9 02:13 opt
 drwxr-xr-x  6 hdfs nobody    0 Dec  9 15:56 user

Now the HDFS of my test cluster is exported via Samba. So far, so good.

My first test measured read performance. I ran rsync on a smaller collection of log files:
 #> cd /tmp/rsync-test
 #> rsync -av /test/hdfs/user/flume/weblogs/2011-12-07/ .
 sent 20478888644 bytes  received 92606 bytes  17377158.46 bytes/sec
 total size is 20475835998
 (19GB, 16 MB/s) 

How many files did I sync?
 #> find . -type f |wc -l
 4665

Okay, that worked. Then I tested the write speed, using a plain file I created with

 #> dd if=/dev/zero of=/tmp/13GB bs=128M count=100

and copied it into the CIFS mount, measuring the copy with "time":
 #> time cp /tmp/13GB /test/hdfs/user/
 real 7m57.864s
 user 0m0.328s
 sys 0m20.602s

= around 27 MB/s

Then I checked for correct permissions and groups on HDFS:

 hdfs#> hadoop dfs -ls /user
 Found 1 item
 -rw-r--r--   3 hdfs supergroup 13421772800 2011-12-09 15:56 /user/13GB

For comparison, I ran a write test with scp:
 #> scp /tmp/13GB hdfs@SAMBASERVER:/123/hdfs/user

and got
 13GB  100%   13GB  47.8MB/s   04:28

which is much faster (47.8 MB/s versus around 27 MB/s through the CIFS mount). The Samba layer clearly costs performance.
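
The write rates can be recomputed from the file size and the elapsed times. A minimal sketch, assuming 1 MB is counted as 1048576 bytes (which matches the rate scp prints); the function name mib_per_sec is made up for this example.

```shell
# Print throughput in MiB/s (1 MiB = 1048576 bytes) with one decimal place.
# Arguments: size in bytes, elapsed time in seconds.
mib_per_sec() {
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / 1048576 / secs }'
}

mib_per_sec 13421772800 477.864   # cp into the CIFS mount (7m57.864s) -> 26.8
mib_per_sec 13421772800 268       # scp (04:28)                        -> 47.8
```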

Conclusion
It is possible to export an HDFS filesystem to clients over libhdfs and Samba and get acceptable results. That makes some tasks easier, including the use of HDFS as (limited) cluster storage.

Links:
Samba-Tuning: https://calomel.org/samba_optimize.html

Friday, December 2, 2011

OSX improved shell environment


This is my favorite all-purpose environment on my MacBook. It comes with an improved zsh and an extended .vimrc with highlighting, checking, TextMate and Solarized. Some tools have to be installed first, so here is a recipe.
Features include arrow-key navigation through directories and files, developer-friendly colors, command highlighting, improved history, option autocompletion, ssh autocompletion (if the keys are known) and a lot more useful things.

Get Xcode:
 AppStore => Xcode => Install Xcode

From now on we use a terminal window.

Install Brew:
 /usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"

Install git and wget:
 brew install git
 brew install wget
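
The remaining steps rely on several command-line tools. A quick way to see whether any of them is still missing is sketched below; the function name missing_tools is an assumption of this example.

```shell
# Print the name of every given command that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Usage could look like this: missing_tools git wget curl zsh vim rake
An empty output means everything is in place.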

Install oh-my-zsh:
 wget --no-check-certificate https://github.com/robbyrussell/oh-my-zsh/raw/master/tools/install.sh -O - | sh

The script wants to change your shell from /bin/bash to /bin/zsh; you have to provide your password for this.

To change the theme, edit line 8 of ~/.zshrc:
 ZSH_THEME="YOUR_FAV_THEME"
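
The line number may differ between oh-my-zsh versions, so the theme can also be switched by matching the ZSH_THEME line itself. A minimal sketch, assuming theme names consist only of letters, digits and dashes; the function name set_zsh_theme is made up for this example.

```shell
# Replace the ZSH_THEME line in a zshrc file (default: ~/.zshrc).
# Goes through a temporary file because "sed -i" differs between BSD and GNU sed.
set_zsh_theme() {
    theme="$1"
    zshrc="${2:-$HOME/.zshrc}"
    tmp="$(mktemp)" || return 1
    sed "s/^ZSH_THEME=.*/ZSH_THEME=\"$theme\"/" "$zshrc" > "$tmp" &&
        cat "$tmp" > "$zshrc"    # keeps permissions and symlinks of the original
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage could look like this: set_zsh_theme YOUR_FAV_THEME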

Install and enable zsh-syntax-highlighting:
 git clone http://github.com/nicoulaj/zsh-syntax-highlighting.git ~/.oh-my-zsh/plugins/zsh-syntax-highlighting
Then edit .zshrc and add the plugin to the plugins list in line 27 (space separated):
  zsh-syntax-highlighting
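
Instead of editing the list by hand, the plugin name can be appended with a small helper. This is a sketch under the assumption that .zshrc defines the list on a single line of the form plugins=(...); the function name enable_omz_plugin is made up for this example.

```shell
# Append a plugin to the single-line plugins=(...) entry of a zshrc file,
# unless the name already appears on that line.
enable_omz_plugin() {
    plugin="$1"
    zshrc="${2:-$HOME/.zshrc}"
    if grep -q "^plugins=(.*$plugin" "$zshrc"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    sed "s/^plugins=(\(.*\))/plugins=(\1 $plugin)/" "$zshrc" > "$tmp" &&
        cat "$tmp" > "$zshrc"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage could look like this: enable_omz_plugin zsh-syntax-highlighting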

Solarized and Pathogen
(http://ethanschoonover.com/solarized)
(https://github.com/tpope/vim-pathogen)

 mkdir -p ~/.vim/autoload ~/.vim/bundle
 curl -so ~/.vim/autoload/pathogen.vim \
     https://raw.github.com/tpope/vim-pathogen/HEAD/autoload/pathogen.vim

 cd ~/.vim/bundle
 git clone https://github.com/altercation/vim-colors-solarized.git

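Extended .vimrc (note that this replaces any existing ~/.vim directory; rake has to run inside the cloned repository):
 cd ~/ && git clone https://github.com/ikaros/vim-configuration.git && rm -rf .vim && mv vim-configuration .vim && cd .vim && rake -T && rake place_vim_config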

I remove some bundles for now (inside ~/.vim):
 git rm --cached bundle/cucumber
 git rm --cached bundle/rubytest
 git rm --cached bundle/snipmate

and add snipmate again from its own repository, because the bundled one does not work:
 git submodule add https://github.com/msanders/snipmate.vim.git bundle/snipmate

update all bundles:
 git submodule update --init

To switch the Solarized theme, edit the background setting in your .vimrc (set background=light or set background=dark).

Finished. Enjoy your new shell!