Monday, December 12, 2011
Three weeks ago I played with libhdfs and NFS, but I did not get the results I expected. My next idea was: why not use Samba? Samba 3.x is stable, and most operating systems can mount an exported share.
The main task was to research the setup and performance of this scenario, because Samba has a lot of tuning options. Let's go!
I used RHEL 5.7 and the RPMs shipped with it:
#> rpm -qa|grep samba
As described in "NFS exported HDFS", I mounted HDFS over fuse at /123/hdfs via /etc/fstab:
#> cat /etc/fstab
hadoop-fuse-dfs#dfs://NAMENODE:9000 /123/hdfs fuse usetrash,rw 0 0
and checked it:
fuse on /123/hdfs type fuse (rw,nosuid,nodev,allow_other,default_permissions)
#> ls -la /123
drwxr-xr-x 3 root root 4096 Dec 9 16:36 .
drwxr-xr-x 27 root root 4096 Dec 9 12:11 ..
drwxr-xr-x 5 hdfs nobody 4096 Dec 9 02:14 hdfs
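If you want to script this check instead of eyeballing it, something like the following could work. It is only a sketch: is_fuse_mount is a helper name I made up, and it assumes the mount output has the same "X on PATH type fuse" layout as the line above.

```shell
# Hypothetical helper: succeeds if the given mount point shows up as a
# fuse mount in `mount`-style output read from standard input.
is_fuse_mount() {
    awk -v mp="$1" '
        # Expected layout: <source> on <mount point> type <fstype> (<options>)
        $2 == "on" && $3 == mp && $4 == "type" && $5 ~ /^fuse/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live system (mount point taken from the fstab entry):
#   mount | is_fuse_mount /123/hdfs && echo "HDFS is mounted"
```

Reading from standard input keeps the helper easy to try out with a saved line of mount output.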
The next step is to configure Samba. I came up with this configuration:
#> cat /etc/samba/smb.conf
[global]
bind interfaces only = yes
deadtime = 15
default case = lower
disable netbios = yes
interfaces = eth0
dns proxy = no
workgroup = HDFS
server string = Samba Server Version %v
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536 SO_SNDBUF=65536
load printers = no
max connections = 30
strict sync = no
sync always = no
syslog = 1
syslog only = yes
security = user
smb passwd file = /etc/samba/smbpasswd
[hdfs]
comment = HDFS
path = /123/hdfs
public = yes
writable = yes
printable = no
create mask = 0744
force user = hdfs
force group = nobody
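Before restarting the server I like to double-check single settings. The following is a minimal sketch of how that could be scripted; smb_value is a hypothetical helper of mine, not a Samba tool, and it simply returns the first matching "key = value" line without any knowledge of sections.

```shell
# Hypothetical helper: print the value of the first "key = value" line
# whose key equals $1, reading smb.conf-style text from standard input.
smb_value() {
    awk -F= -v key="$1" '
        index($0, "=") > 0 {
            k = $1
            sub(/^[ \t]+/, "", k); sub(/[ \t]+$/, "", k)
            if (k == key) {
                # Everything after the first "=" is the value, so values
                # that contain "=" themselves (socket options) stay intact.
                v = substr($0, index($0, "=") + 1)
                sub(/^[ \t]+/, "", v); sub(/[ \t]+$/, "", v)
                print v
                exit
            }
        }
    '
}

# Usage on the server:
#   smb_value "path" < /etc/samba/smb.conf
```

For a full syntax check, Samba's own testparm is the better tool; this helper only covers quick lookups in scripts.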
Then I created the Samba user and password; here I used the hdfs system user (id=hdfs, group=nobody):
#> smbpasswd -a username
Finally, I started the server:
#> service smb restart
For testing I used another RHEL 5.7 server and mounted the exported share at /test:
#> mount -t cifs -o username=hdfs,rw //SAMBASERVER/hdfs /test
#> ls -la /test/
drwxr-xr-x 5 hdfs nobody 0 Dec 9 02:14 .
drwxr-xr-x 25 root root 4096 Dec 9 15:03 ..
drwxr-xr-x 3 hdfs nobody 0 Dec 9 02:12 mapred
drwxr-xr-x 3 hdfs nobody 0 Dec 9 02:13 opt
drwxr-xr-x 6 hdfs nobody 0 Dec 9 15:56 user
Now the HDFS from my test cluster is exported via Samba. So far, so good.
My first test covered read performance; for this I chose an rsync of a smaller log file collection:
#> cd /tmp/rsync-test
#> rsync -av /test/hdfs/user/flume/weblogs/2011-12-07/ .
sent 20478888644 bytes received 92606 bytes 17377158.46 bytes/sec
total size is 20475835998
(about 19 GB at roughly 16.6 MB/s)
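The MB/s figure is just the bytes/sec value from the rsync summary divided by 1024 * 1024. A small sketch of that conversion, where to_mib_per_s is a helper name I chose for illustration:

```shell
# Hypothetical helper: convert a bytes-per-second rate to MiB per second.
to_mib_per_s() {
    # LC_ALL=C keeps the decimal point independent of the system locale.
    LC_ALL=C awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate / (1024 * 1024) }'
}

to_mib_per_s 17377158.46   # rate from the rsync summary; prints 16.57
```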
How many files did I sync?
#> find . -type f |wc -l
Okay, that worked. Then I tested the write speed. For this I used a plain file that I created with
#> dd if=/dev/zero of=/tmp/13GB bs=128M count=100
and copied it into the CIFS mount, measuring the run with "time":
#> time cp /tmp/13GB /test/hdfs/user/
Result: around 27 MB/s
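To get the rate directly instead of calculating it by hand from the "time" output, the copy can be wrapped in a small function. This is a rough sketch under my own assumptions: timed_copy is a made-up name, it only has one-second resolution, and it measures until cp returns, which is not necessarily the moment the data has reached HDFS.

```shell
# Hypothetical helper: copy a file and print the average rate in MiB/s.
timed_copy() {
    src=$1
    dst=$2
    bytes=$(wc -c < "$src")
    start=$(date +%s)
    cp "$src" "$dst" || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    # Avoid a division by zero for copies that finish within one second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    LC_ALL=C awk -v b="$bytes" -v s="$elapsed" \
        'BEGIN { printf "%.1f MiB/s\n", b / s / (1024 * 1024) }'
}

# Usage with the test file from above:
#   timed_copy /tmp/13GB /test/hdfs/user/13GB
```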
Then I checked the permissions and group on HDFS:
hdfs#> hadoop dfs -ls /user
Found 1 item
-rw-r--r-- 3 hdfs supergroup 13421772800 2011-12-09 15:56 /user/13GB
For comparison, I ran a write test with scp:
#> scp /tmp/13GB hdfs@SAMBASERVER:/123/hdfs/user
13GB 100% 13GB 47.8MB/s 04:28
That is much faster than the copy through Samba (47.8 MB/s versus around 27 MB/s), so the Samba layer clearly costs performance.
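As a sanity check, the scp figure can be reproduced from the numbers in this post: the dd parameters give the file size, and scp reported a duration of 04:28. The variable names below are my own.

```shell
size_bytes=$((128 * 1024 * 1024 * 100))   # dd: bs=128M count=100
elapsed_s=$((4 * 60 + 28))                # scp reported 04:28

rate=$(LC_ALL=C awk -v b="$size_bytes" -v s="$elapsed_s" \
    'BEGIN { printf "%.1f", b / s / (1024 * 1024) }')

echo "$size_bytes bytes in $elapsed_s s = $rate MiB/s"
# prints: 13421772800 bytes in 268 s = 47.8 MiB/s
```

The byte count matches the size shown by hadoop dfs -ls, and the rate matches what scp printed.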
It is possible to export an HDFS filesystem over libhdfs and Samba to clients and get acceptable results. That makes some tasks easier, including the use of HDFS as (limited) cluster storage.
Friday, December 2, 2011
Features include arrow-key navigation through directories and files, developer-friendly colors, command highlighting, an improved history, option autocompletion, ssh autocompletion (if the keys are known) and a lot more useful things.
AppStore => Xcode => Install Xcode
From now on we use a terminal window.
/usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"
Install git and wget:
brew install git
brew install wget
wget --no-check-certificate https://github.com/robbyrussell/oh-my-zsh/raw/master/tools/install.sh -O - | sh
The script wants to change your shell from /bin/bash to /bin/zsh; for this you have to provide your password.
To change the theme, open ~/.zshrc and edit line 8:
Install and enable zsh-syntax-highlighting:
git clone http://github.com/nicoulaj/zsh-syntax-highlighting.git ~/.oh-my-zsh/plugins/zsh-syntax-highlighting
Edit .zshrc and add in line 27 (space separated):
Solarized and Pathogen
mkdir -p ~/.vim/autoload ~/.vim/bundle; \
curl -so ~/.vim/autoload/pathogen.vim \
git clone https://github.com/altercation/vim-colors-solarized.git
cd ~/ && git clone https://github.com/ikaros/vim-configuration.git && rm -rf .vim && mv vim-configuration .vim && rake -T && rake place_vim_config
I removed some bundles for now:
git rm --cached bundle/cucumber
git rm --cached bundle/rubytest
git rm --cached bundle/snipmate
and added snipmate back as a submodule, because the bundle above does not work:
git submodule add https://github.com/msanders/snipmate.vim.git bundle/snipmate
git submodule update --init
To change the Solarized theme, edit the background setting in your .vimrc (set background=light or set background=dark).
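If you switch between the two variants often, the edit can be scripted. The following is a minimal sketch, assuming your .vimrc already contains a "set background=..." line; set_vim_background is a helper name I made up.

```shell
# Hypothetical helper: rewrite the "set background=" line in a vimrc file.
set_vim_background() {
    vimrc=$1
    mode=$2
    case $mode in
        light|dark) ;;
        *) echo "mode must be light or dark" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy the content back, so the
    # permissions of the original file are kept.
    sed "s/^\([[:space:]]*set background=\).*/\1$mode/" "$vimrc" > "$tmp" \
        && cat "$tmp" > "$vimrc"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage:
#   set_vim_background ~/.vimrc light
```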
Finished. Enjoy your new shell!