In a MapReduce job with 500 map tasks, how many map task attempts will there be?
Answer : D
Explanation:
From Cloudera Training Course:
A task attempt is a particular instance of an attempt to execute a task.
There will be at least as many task attempts as there are tasks.
If a task attempt fails, another will be started by the JobTracker.
Speculative execution can also result in more task attempts than completed tasks.
With 500 map tasks there are therefore at least 500 map task attempts, and possibly more.
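The counting argument above can be illustrated with a toy model. This is only a sketch, not how the JobTracker schedules work: the function name count_attempts, the failure and speculation probabilities, and the per-task cap of 4 attempts are all assumptions chosen for illustration.

```python
import random


def count_attempts(num_tasks, fail_prob=0.1, speculative_prob=0.05,
                   max_attempts=4, seed=0):
    """Toy model: return the total number of task attempts for a job.

    All parameters are illustrative assumptions, not Hadoop settings.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(num_tasks):
        # Every task gets at least one attempt.
        attempts = 1
        # A failed attempt triggers another one, up to the assumed cap.
        while attempts < max_attempts and rng.random() < fail_prob:
            attempts += 1
        # A slow task may also get one speculative duplicate attempt.
        if rng.random() < speculative_prob:
            attempts += 1
        total += attempts
    return total


# The total can never drop below the number of tasks.
print(count_attempts(500) >= 500)
```

With both probabilities set to zero the model returns exactly 500, which is the lower bound the explanation describes.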
Workflows expressed in Oozie can contain:
Answer : A
Explanation: An Oozie workflow is a collection of actions (e.g. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph), which specifies the order in which the actions execute. This graph is specified in hPDL (an XML Process Definition Language). hPDL is a fairly compact language, using a limited number of flow control and action nodes.
Control nodes define the flow of execution and include the beginning and end of a workflow (start, end, and fail nodes) and mechanisms to control the workflow execution path (decision, fork, and join nodes).
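To make the DAG structure concrete, the sketch below models a workflow as a plain transition table and checks that it contains no cycles. It does not use Oozie or hPDL itself; the node names (split, pig-job, mr-job, merge) and the helper is_dag are hypothetical, chosen only to mirror the control and action nodes described above.

```python
# Hypothetical workflow: each node maps to the nodes it can transition to.
workflow = {
    "start": ["split"],
    "split": ["pig-job", "mr-job"],   # fork node: two parallel paths
    "pig-job": ["merge", "fail"],     # action node: success or error path
    "mr-job": ["merge", "fail"],      # action node: success or error path
    "merge": ["end"],                 # join node: waits for both paths
    "fail": [],                       # terminal node for errors
    "end": [],                        # terminal node for success
}


def is_dag(graph):
    """Return True if the transition graph has no cycles (depth-first search)."""
    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}

    def visit(node):
        state[node] = in_progress
        for nxt in graph[node]:
            if state[nxt] == in_progress:
                return False          # back edge, so there is a cycle
            if state[nxt] == unvisited and not visit(nxt):
                return False
        state[node] = done
        return True

    return all(visit(node) for node in graph if state[node] == unvisited)


print(is_dag(workflow))
```

A transition from merge back to split would introduce a cycle, and is_dag would then return False.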
Note: Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store:
Workflow definitions
Currently running workflow instances, including instance states and variables
Reference: Introduction to Oozie
Consider the following two relations, A and B.
Answer : B
You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?
Answer : C
Explanation: The usage of the hadoop jar command is:
Usage: hadoop jar <jar> [mainClass] args...
To make commons-math3.jar available to all tasks, do either of the following:
1. Copy the JAR file into the $HADOOP_HOME/lib directory
or
2. Use the generic option -libjars. This requires the job driver to parse generic options, for example by running through ToolRunner.
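As a rough illustration of the second option, the helper below assembles a hadoop jar command line with -libjars placed after the main class and before the application arguments, which is where generic options are expected. The helper name hadoop_jar_command, the job JAR stats-job.jar, the driver class com.example.StatsDriver, and the /input and /output paths are all hypothetical; the sketch only builds the argument list and does not run Hadoop.

```python
def hadoop_jar_command(job_jar, main_class, lib_jars, app_args):
    """Build the argument list for `hadoop jar` using the -libjars generic option."""
    cmd = ["hadoop", "jar", job_jar, main_class]
    if lib_jars:
        # -libjars takes a comma-separated list of JARs and goes
        # before the application's own arguments.
        cmd += ["-libjars", ",".join(lib_jars)]
    return cmd + list(app_args)


# Hypothetical job JAR, driver class, and paths, for illustration only.
cmd = hadoop_jar_command(
    "stats-job.jar",
    "com.example.StatsDriver",
    ["commons-math3.jar"],
    ["/input", "/output"],
)
print(" ".join(cmd))
# prints: hadoop jar stats-job.jar com.example.StatsDriver -libjars commons-math3.jar /input /output
```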
Which one of the following statements is true about a Hive-managed table?
Answer : B