2pk03 over AI, ML, BigData and data processing

Posts

Showing posts from December, 2013

How HDFS protects your data

By Anonymous - December 27, 2013

Often we get questions how HDFS protects data and what the mechanisms are to prevent data corruption. Eric Sammer explain this en detail in Hadoop Operations . Additional to the points below, you can also have a second cluster to sync the files, simply to prevent human being failures, like deleting a subset of data. If you have enough space in your cluster, enabling the trash per core-site.xml and setting to a higher value then a day helps too. <property> <name>fs.trash.interval</name> <value>1440</value> <description>Number of minutes after which the checkpoint gets deleted. If zero, the trash feature is disabled. 1440 means 1 day </description> </property> <property> <name>fs.trash.checkpoint.interval</name> <value>15</value> <description>Number of minutes between trash checkpoints. Should be smaller or equal to fs.trash.interval. Every time the checkpointer runs it creates a new check