2pk03 over AI, ML, BigData and data processing

Showing posts from February, 2015

Hadoop and trusted MiTv5 Kerberos with Active Directory

By Anonymous - February 16, 2015

As an up-to-date reference, here is an example of how to set up an MIT Kerberos v5 <=> Active Directory trust from scratch. It should work out of the box; just replace the following names with your own servers and realms:

HADOOP1.INTERNAL = local server (KDC)
ALO.LOCAL = local Kerberos realm
AD.REMOTE = AD realm

The KDC should be inside your Hadoop network; the remote AD can be located elsewhere.

1. Install the bits

At the KDC server (CentOS, RHEL; other OSes should have nearly the same packages):

    yum install krb5-server krb5-libs krb5-workstation -y

At the clients (Hadoop nodes):

    yum install krb5-libs krb5-workstation -y

Install Java's JCE policy (see Oracle documentation) on all Hadoop nodes.

2. Configure your local KDC

/etc/krb5.conf

    [libdefaults]
    default_realm = ALO.LOCAL
    dns_lookup_realm = false
    dns_lookup_kdc = false
    kdc_timesync = 1
    ccache_type = 4
    forwardable = true
    proxiable = true
    fcc-mit-ticketflags = true
    max_life = 1d
    max_renewable_life = 7d
    renew_lifetime = 7d
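Before rolling the file out to every node, the [libdefaults] values shown above can be sanity-checked with a short script. The following is only a minimal sketch, not part of the original setup: the helper names (parse_flat_sections, duration_seconds, check_libdefaults) are hypothetical, the parser handles only flat "key = value" sections, and the duration parser accepts only simple forms such as 1d or 1d12h.

```python
import re

# Excerpt of the [libdefaults] section, values copied from the post.
KRB5_CONF = """\
[libdefaults]
default_realm = ALO.LOCAL
dns_lookup_realm = false
dns_lookup_kdc = false
forwardable = true
max_life = 1d
max_renewable_life = 7d
renew_lifetime = 7d
"""

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_flat_sections(text):
    """Parse '[section]' headers and 'key = value' lines into nested dicts.

    Simplification (assumption): brace-delimited subsections, as used in
    [realms], are not handled; only flat sections such as [libdefaults].
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def duration_seconds(value):
    """Convert simple durations like '1d', '7d' or '1d12h' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


def check_libdefaults(conf_text):
    """Return a list of human-readable warnings (empty if none found)."""
    lib = parse_flat_sections(conf_text).get("libdefaults", {})
    problems = []
    realm = lib.get("default_realm", "")
    if not realm:
        problems.append("default_realm is missing")
    elif realm != realm.upper():
        problems.append("default_realm is not upper case (the usual convention)")
    if "renew_lifetime" in lib and "max_renewable_life" in lib:
        requested = duration_seconds(lib["renew_lifetime"])
        allowed = duration_seconds(lib["max_renewable_life"])
        if requested > allowed:
            problems.append("renew_lifetime is larger than max_renewable_life")
    return problems


if __name__ == "__main__":
    for problem in check_libdefaults(KRB5_CONF) or ["no problems found"]:
        print(problem)
```

To check a real file, pass the contents of /etc/krb5.conf to check_libdefaults instead of the embedded excerpt. An empty result only means these two checks passed; it says nothing about the realm trust itself.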

Hadoop based SQL engines

By Anonymous - February 09, 2015

Apache Hadoop is increasingly becoming a focus of business-critical architectures and applications. Naturally, SQL-based solutions are the first to be considered, but the market is evolving and new tools keep appearing, often unnoticed. Below is an overview of currently available Hadoop-based SQL technologies. The must-haves are: open source (various contributors), low-latency querying possible, and support for CRUD (mostly!) and statements like CREATE, INSERT INTO, SELECT * FROM (limit..), UPDATE Table SET A1=2 WHERE, DELETE, and DROP TABLE.

- Apache Hive (SQL-like, with interactive SQL (Stinger))
- Apache Drill (ANSI SQL support)
- Apache Spark (Spark SQL, queries only, add data via Hive, RDD or Parquet)
- Apache Phoenix (built atop Apache HBase, lacks full transaction support, relational operators and some built-in functions)
- Presto from Facebook (can query Hive, Cassandra, relational DBs, etc. Doesn't seem to be designed for low-latency responses acro