Apache Hadoop is a framework written in Java for running applications on large cluster built of commodity hardware.
The Hadoop framework transparently provides both reliability and data motion to sapplications. Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which can execute or re-execute on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster.
All the modules in Hadoop are designed with a fundamental assumption that hardware failures (of individual machines or racks of machines) are common and thus should be automatically handled in software by the framework. Apache Hadoop's MapReduce and HDFS components originally derived respectively from Google's MapReduce and Google File System (GFS) papers.
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who was working at Yahoo at the time, named it after his son's toy elephant. It was originally developed to support distribution for the Nutch search engine project.