HDFS stays in safe mode because reported blocks do not reach 0.9990 of total blocks

After a node failure and restarting HDFS, the NameNode reports:

“The reported blocks 1968810 needs additional 5071 blocks to reach the threshold 0.9990 of total blocks 1975856. Safe mode will be turned off automatically.”

in the log.
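The numbers in the message are consistent: 0.9990 of the 1975856 total blocks is 1973881 blocks (rounded up from 1973880.1), and 1973881 - 1968810 = 5071, which is exactly the number of additional blocks the NameNode says it still needs.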

Why does this happen, and how can it be fixed?

About why the NameNode stays in safe mode:

At startup, the NameNode reads its namespace from disk (the FSImage and edits files). This includes all the HDFS file names and block lists it should know about, but not the mapping of block replicas to DataNodes. It then waits in safe mode for all or most of the DataNodes to send their initial block reports, which let the NameNode build its map of which blocks have replicas on which DataNodes. It keeps waiting until dfs.namenode.safemode.threshold-pct of the blocks it knows about from the FSImage have been reported from at least dfs.namenode.replication.min (default 1) DataNodes. Once this threshold is reached, it logs that it is ready to leave safe mode, waits for dfs.namenode.safemode.extension milliseconds (30 seconds by default), then automatically leaves safe mode and generates replication requests for any under-replicated blocks (by default, those with replication < 3).
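You can check the values of these parameters and the current safe mode state on a running cluster, for example (these are the property names used by Hadoop 2.x and later; older releases used names like dfs.safemode.threshold.pct):

hdfs getconf -confKey dfs.namenode.safemode.threshold-pct   # default 0.999
hdfs getconf -confKey dfs.namenode.replication.min          # default 1
hdfs getconf -confKey dfs.namenode.safemode.extension       # default 30000 (milliseconds)
hdfs dfsadmin -safemode get                                 # prints whether safe mode is ON or OFF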

and

If it does not reach the “safe replication for all known blocks” threshold, it will not leave safe mode automatically. It logs the condition and waits for an admin to decide what to do, because this generally means whole DataNodes, or sets of DataNodes, did not come up or cannot communicate with the NameNode. Hadoop wants a human to look at the situation before it starts madly generating re-replication commands for under-replicated blocks and deleting blocks that have no remaining replicas.
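Before forcing the NameNode out of safe mode, it helps to confirm which DataNodes the NameNode is actually missing. One simple check is:

hdfs dfsadmin -report    # lists live and dead DataNodes and the number of blocks each one reports

If whole DataNodes show up as dead, bringing them back (or fixing their connectivity to the NameNode) is usually enough for the block report threshold to be reached and safe mode to end on its own.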

By Matthew Foley.

If you are sure that the missing blocks will never be reported, you can force the NameNode to leave safe mode with

hadoop dfsadmin -safemode leave

You may then run hdfs fsck / -move or hdfs fsck / -delete to move or delete the corrupted files, if you are sure you will not need the affected files anymore.
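To see which files are affected before moving or deleting anything, you can first ask fsck to list the corrupt blocks, for example:

hdfs fsck / -list-corruptfileblocks     # list missing/corrupt blocks and the files they belong to
hdfs fsck / -files -blocks -locations   # detailed per-file report of blocks and their locations

Note that -move places the corrupted files under /lost+found rather than deleting them outright.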

Eric Ma

