After doing some research, I thought I would share a list of Hadoop interview questions that can help you in the interview process.
The main components of Hadoop are the storage unit, HDFS (NameNode, DataNode), and the processing framework, YARN (ResourceManager, NodeManager).
1. Why are nodes added and removed frequently in a Hadoop cluster?
On commodity hardware, “DataNodes” crash quite frequently in a Hadoop cluster. Hadoop also scales easily as data volume grows. Both of these require the Hadoop administrator to add (commission) and remove (decommission) “DataNodes” frequently in a Hadoop cluster.
2. Can NameNode and DataNode be commodity hardware?
DataNodes can be commodity hardware: like personal laptops and computers, they only store the data and are needed in large numbers. The NameNode, however, is the master node that stores metadata about all the blocks stored in HDFS. Since it holds this metadata in memory, the NameNode should be a high-end machine with plenty of RAM.
3. Can you define Rack Awareness in Hadoop?
Rack Awareness is the algorithm by which the “NameNode” decides how blocks and their replicas are placed. Placement is based on the rack definitions, so that network traffic between racks is reduced by preferring “DataNodes” within the same rack, while replicas are still kept on more than one rack for fault tolerance.
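The placement idea can be illustrated with a small simulation. This is only a sketch, not HDFS code: it assumes a replication factor of 3 and the commonly described default policy (first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack). The function name `place_replicas` and the `topology` dictionary are hypothetical names made up for this example.

```python
import random


def place_replicas(writer, topology, rng=random):
    """Pick three DataNodes for one block (illustrative sketch only).

    topology maps a rack name to the list of DataNodes on that rack.
    writer is the DataNode the client is writing from.
    """
    # Find the rack that holds the writer's DataNode.
    local_rack = next(rack for rack, nodes in topology.items() if writer in nodes)

    # Replica 1: the writer's own node, so the first copy needs no network hop.
    first = writer

    # Replica 2: a node on a different rack, so losing one rack cannot lose the block.
    # Assumption: at least one remote rack has two or more nodes.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != local_rack and len(nodes) >= 2
    ]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])

    # Replica 3: another node on the same remote rack, so the third copy
    # travels inside that rack instead of crossing racks again.
    third = rng.choice([node for node in topology[remote_rack] if node != second])

    return [first, second, third]


if __name__ == "__main__":
    topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(place_replicas("dn1", topology))
```

With this layout only one cross-rack transfer is needed per block, yet the block survives the failure of either rack.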
4. What are the three modes in which Hadoop can run?
- Standalone (Local) Mode
- Pseudo-Distributed Mode
- Fully Distributed Mode
5. What is MapReduce and what is the syntax to run a MapReduce program?
MapReduce is a framework for processing large data sets across a cluster of computers using parallel programming. The syntax to run a MapReduce program is: hadoop jar hadoop_jar_file.jar /input_path /output_path.
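To make the programming model concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop at all; it only imitates the three stages (map, shuffle, reduce) in a single process, and the function names are hypothetical ones chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["big data", "Big cluster"]))
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data over the network; the structure of the computation is the same.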
6. Why do you use “RecordReader” in Hadoop?
The “RecordReader” loads the data from its source and converts it into key-value pairs that the “Mapper” task can read. The “InputFormat” defines the “RecordReader” instance.
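As a rough illustration of what a record reader does, the sketch below turns a byte stream into (key, value) pairs in the style commonly described for plain text input: the key is the byte offset of the line and the value is the line's text. It is a plain Python imitation, not the Hadoop API, and `line_record_reader` is a hypothetical name.

```python
import io


def line_record_reader(stream):
    """Yield (byte_offset, line_text) pairs from a binary stream."""
    offset = 0
    for raw in stream:
        # Strip the line terminator from the value, but count it in the offset.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


if __name__ == "__main__":
    data = io.BytesIO(b"hello\nworld\n")
    print(list(line_record_reader(data)))  # [(0, 'hello'), (6, 'world')]
```

Each pair produced this way is what a mapper would receive as one input record.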
7. What do you know about “SequenceFileInputFormat”?
“SequenceFileInputFormat” is an input format for reading sequence files. A sequence file is a binary key-value file format with optional compression, well suited to passing data from the output of one “MapReduce” job to the input of another “MapReduce” job. Sequence files can also be generated as the output of other MapReduce tasks.
8. Which relational operators in “Pig Latin” do you know about?
Different relational operators are:
- Foreach
- Order by
- Filter
- Group
- Join
- Distinct
- Limit
9. What is “WAL” in HBase?
WAL, or Write Ahead Log, is a file attached to every Region Server in the distributed environment. The WAL stores new data that has not yet been persisted to permanent storage. If a Region Server fails, the WAL is used to recover that data.
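The write-ahead idea itself is simple enough to sketch in a few lines. The toy class below is not HBase code: `TinyStore`, its JSON-lines log format, and the in-memory dictionary standing in for unflushed data are all assumptions made for illustration. The point it shows is the ordering: every write is appended to the log first, so the in-memory state can be rebuilt after a crash.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memstore = {}  # in-memory data that has not been flushed anywhere

    def put(self, key, value):
        # Step 1: append the edit to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # Step 2: only then apply the edit in memory.
        self.memstore[key] = value

    def recover(self):
        # Rebuild the in-memory state by replaying the log in order.
        self.memstore = {}
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                entry = json.loads(line)
                self.memstore[entry["key"]] = entry["value"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    store = TinyStore(path)
    store.put("row1", "a")
    store.put("row1", "b")

    restarted = TinyStore(path)  # simulates a restart after a crash
    restarted.recover()
    print(restarted.memstore)  # {'row1': 'b'}
```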
10. Do you have a working knowledge of Hadoop?
Answer this one a little diplomatically by mentioning the live projects you have worked on. If you have completed a certification and have the skill to apply the knowledge practically, now is the time to take on live projects.
I hope these Hadoop interview questions are helpful and help you crack your next interview with ease.