2pk03 over AI, ML, BigData and data processing

Showing posts from November, 2011

NFS exported HDFS (CDH3)

By Anonymous - November 30, 2011

For several reasons it can be a good idea to make an HDFS filesystem available across networks as an exported share. Here I describe a working scenario with Linux and Hadoop, using tools both have on board. I used fuse and libhdfs to mount an HDFS filesystem. Change namenode.local and <PORT> to fit your environment.

Install:

    yum install hadoop-0.20-fuse.x86_64 hadoop-0.20-libhdfs.x86_64

Create a mountpoint:

    mkdir /hdfs-mount

Mount your HDFS (testing):

    hadoop-fuse-dfs dfs://namenode.local:<PORT> /hdfs-mount -d

You will see output like this:

    INFO fuse_options.c:162 Adding FUSE arg /hdfs-mount
    INFO fuse_options.c:110 Ignoring option -d
    unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
    INIT: 7.10 flags=0x0000000b max_readahead=0x00020000
    INFO fuse_init.c:101 Mounting namenode.local:<PORT>
    INIT: 7.8 flags=0x00000001 max_readahead=0x00020000 max_write=0x00020000
    unique: 1, error: 0 (Success), outsize: 40

Hit Ctrl-C after you see "Su
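
If you want to check the test mount from a script instead of reading the debug output by eye, the reply line shown above can be matched with a regular expression. This is only a minimal sketch: the function name init_succeeded and the sample text are my own illustration, and the pattern assumes the reply format exactly as it appears in the log above (other FUSE versions may print it differently).

```python
import re

# Reply line as it appears in the debug output above, e.g.
# "unique: 1, error: 0 (Success), outsize: 40".
# Assumption: this format is taken from the log shown in the post.
REPLY = re.compile(r"unique: (\d+), error: (-?\d+) \(([^)]*)\), outsize: (\d+)")


def init_succeeded(debug_output: str) -> bool:
    """Return True if the reply to request 1 (the INIT request) reports error 0."""
    for match in REPLY.finditer(debug_output):
        unique = int(match.group(1))
        error = int(match.group(2))
        if unique == 1:
            return error == 0
    # No reply to the INIT request was found in the text.
    return False


sample = """\
INFO fuse_options.c:162 Adding FUSE arg /hdfs-mount
INFO fuse_options.c:110 Ignoring option -d
unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
INFO fuse_init.c:101 Mounting namenode.local:<PORT>
unique: 1, error: 0 (Success), outsize: 40
"""

print(init_succeeded(sample))
```

The request line ("opcode: INIT") is skipped on purpose, because it carries no error field; only the reply line matches the pattern.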

All in one HDFS Cluster for your pocket

By Anonymous - November 19, 2011

Update 1 (Nov 21, 2011):
- added 3rd interface as host-only adapter (hadoop1)
- enabled trusted device eth2

About one year ago, I created a small Xen environment for my engineering purposes. When I was traveling for hours, it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That was the reason I switched to VirtualBox [1], which runs on OSX, Linux and Windows as well. I could play with my servers, and when I had configured them to death, I reimported them into a clean setup. I think this will also be a good start for people new to the Hadoop ecosystem, letting them see its power without the pain of configuring a multi-node environment. The appliance is created with VirtualBox, because it runs on OSX and Windows very easily. The idea behind it is to check new settings in a small environment rather easily; the appliance is designed for research, not for development and certainly not for production. The a

HDFS debugging scenario

By Anonymous - November 03, 2011

The first step in debugging issues in a running Hadoop environment is to take a look at the stack traces, which are easily accessible at jobtracker/stacks and show all running threads in a jstack-like view. You will see the running processes; as an example I discuss a lab testing scenario, see below.

    http://jobtracker:50030/stacks

    Process Thread Dump:
    43 active threads
    Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):
      State: TIMED_WAITING
      Blocked count: 6
      Waited count: 7
      Stack:
        java.lang.Object.wait(Native Method)
        org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:676)
        org.apache.hadoop.ipc.Client$Connection.run(Client.java:719)

In this case the RPC connection is in state "TIMED_WAITING", with a blocked count of 6 and a waited count of 7. That means the namenode could not answer the RPC request fast enough. The problem lies in the setup, as I often see in production environments. For demonstration I use an ESX cluster with a VM for the namen
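
With 43 active threads it is tedious to read the dump by hand, so the thread headers and states can be pulled out with a few lines of Python. This is a sketch under assumptions: the function name parse_thread_dump is hypothetical, the dump text is assumed to have been saved from the /stacks page already, and the parser only relies on the "Thread <id> (<name>):" and "State:" lines as they appear in the dump above.

```python
import re
from collections import Counter

# Header line as shown in the dump above:
# "Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):"
THREAD_HEADER = re.compile(r"^Thread (\d+) \((.*)\):$")


def parse_thread_dump(text):
    """Return one dict per thread with its id, name, and state."""
    threads = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = {
                "id": int(header.group(1)),
                "name": header.group(2),
                "state": None,
            }
            threads.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
    return threads


dump = """\
Process Thread Dump:
43 active threads
Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):
  State: TIMED_WAITING
  Blocked count: 6
  Waited count: 7
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:676)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:719)
"""

threads = parse_thread_dump(dump)
states = Counter(t["state"] for t in threads)

# List the IPC client threads that are currently in a timed wait.
for t in threads:
    if t["state"] == "TIMED_WAITING" and t["name"].startswith("IPC Client"):
        print(t["id"], t["name"])
```

Counting the states across a full dump gives a quick picture of how many threads are waiting on the namenode compared with those that are runnable.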