Advantages of Hadoop
- Hadoop provides both distributed storage and computational capabilities to address big data challenges.
- Hadoop scales horizontally on commodity hardware: capacity and throughput grow by adding nodes to the cluster.
- HDFS, the storage component of Hadoop, is optimized for high throughput: it uses large block sizes (on the order of 64-128 MB), which suits workloads that read very large files (gigabytes and beyond) sequentially.
- HDFS also provides fault tolerance through replication: each block is stored a configurable number of times (three by default), and blocks lost on failed nodes are automatically re-replicated on healthy ones. Both settings appear in the configuration sketch after this list.
- Hadoop uses the MapReduce framework, a batch-based, distributed computing framework that processes large volumes of data in parallel while hiding distributed-systems complexity from the developer. MapReduce decomposes a job into Map and Reduce tasks and schedules them for remote execution on the slave (data) nodes of a Hadoop cluster; a minimal word-count example follows this list.
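The block size and replication factor mentioned above are cluster-wide defaults set in hdfs-site.xml (both can also be overridden per file). A minimal sketch, assuming Hadoop 2.x property names; the values shown are illustrative, not tuning advice:

```xml
<!-- hdfs-site.xml (sketch): large blocks and block replication -->
<configuration>
  <!-- Default block size for new files: 128 MB -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <!-- Copies kept of each block; 3 is the shipped default -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

The replication factor of files that already exist can also be changed from the command line with `hdfs dfs -setrep`.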
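To make the Map/Reduce decomposition concrete, here is the canonical word-count job in Java, closely following the example in the Hadoop MapReduce tutorial; the input and output paths are supplied as command-line arguments:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: runs on the node holding the input split and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: receives all counts emitted for one word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on map nodes
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles input splitting, task placement near the data, shuffling map output to reducers, and retrying failed tasks; the developer writes only the two functions above.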
Disadvantages of Hadoop
The following are the most commonly cited weaknesses of the Hadoop framework:
- Until the Hadoop 2.x release, HDFS and MapReduce each had a single point of failure, because each is built around a single master process (the NameNode and the JobTracker, respectively); Hadoop 2.x addresses this for HDFS with NameNode high availability (see the sketch after this list).
- Security is disabled by default because it is complex to configure (see the Kerberos sketch after this list).
- Storage-level and network-level encryption, which many deployments require for data security, is also absent from Hadoop.
- HDFS is inefficient at handling large numbers of small files and lacks transparent compression. Because HDFS is optimized for sustained throughput, it performs poorly on random reads over small files; a common workaround is to pack small files into archives (see the example after this list).
- MapReduce is a batch-based architecture, so it is a poor fit for workloads that need real-time data access.
- MapReduce is a shared-nothing architecture, so tasks that require global synchronization or shared mutable state are not a good fit; this poses challenges for some algorithms, such as iterative computations that must share state between passes.
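For reference, the Hadoop 2.x remedy for the HDFS single point of failure is NameNode high availability. A minimal sketch of the relevant hdfs-site.xml properties follows; the nameservice ID mycluster and the host names are illustrative, and a working setup additionally needs shared edit storage (e.g. JournalNodes), a client failover proxy provider, and fencing:

```xml
<!-- hdfs-site.xml (sketch): two NameNodes behind one logical nameservice -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
</configuration>
```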
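As an illustration of the configuration burden, turning security on starts with switching authentication from the default "simple" mode to Kerberos in core-site.xml. This is a sketch only; a working deployment also needs a KDC, per-service principals, and keytab files on every node:

```xml
<!-- core-site.xml (sketch): enable Kerberos authentication and authorization -->
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value> <!-- default is "simple", i.e. no authentication -->
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```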
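One common workaround for the small-files problem is to pack many small files into a single Hadoop Archive (HAR), which occupies one entry in the NameNode's namespace while its contents remain readable; the paths below are illustrative:

```sh
# Pack everything under /user/alice/logs into logs.har in /user/alice/archives
hadoop archive -archiveName logs.har -p /user/alice/logs /user/alice/archives

# Archived files remain accessible through the har:// scheme
hdfs dfs -ls har:///user/alice/archives/logs.har
```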