-Rack Awareness and Bandwidth: Large HDFS instances run on a cluster of computers that commonly spans many racks.
In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines
in different racks. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an
entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas
in the cluster, which makes it easy to balance load on component failure.
However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
-Proximity of DataNode: To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a block allocation or read request
from a replica that is closest to the reader.
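
The two points above can be illustrated with a minimal Python sketch. It is not HDFS code: the function names, the "/rack/node" path format, and the hop-count distance are assumptions made for illustration only. It places one replica per rack, counts how many replicas land outside the writer's rack (the extra write cost), and then picks the replica closest to a reader.

```python
def place_on_unique_racks(nodes_by_rack, replication):
    """Pick one DataNode from each of `replication` distinct racks (hypothetical helper)."""
    racks = sorted(rack for rack, nodes in nodes_by_rack.items() if nodes)
    if replication > len(racks):
        raise ValueError("need at least one non-empty rack per replica")
    # Taking the first node of each rack keeps the sketch deterministic.
    return [f"/{rack}/{nodes_by_rack[rack][0]}" for rack in racks[:replication]]


def rack_of(path):
    """Return the rack component of an assumed '/rack/node' path."""
    return path.strip("/").split("/")[0]


def remote_rack_count(writer, replicas):
    """Count replicas stored outside the writer's rack (cross-rack transfers)."""
    return sum(1 for r in replicas if rack_of(r) != rack_of(writer))


def distance(a, b):
    """Hops from each path up to their closest common ancestor (assumed metric)."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader, replicas):
    """Choose the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda r: distance(reader, r))


cluster = {"rack1": ["dn1", "dn2"], "rack2": ["dn3"], "rack3": ["dn4", "dn5"]}
replicas = place_on_unique_racks(cluster, 3)
print(replicas)                                   # ['/rack1/dn1', '/rack2/dn3', '/rack3/dn4']
print(remote_rack_count("/rack1/dn2", replicas))  # 2
print(closest_replica("/rack3/dn5", replicas))    # /rack3/dn4
```

With this metric a replica on the reader's own node has distance 0, one on the same rack has distance 2, and one on another rack has distance 4, so a same-rack replica is preferred whenever one exists.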