Hortonworks HDPCD - Hortonworks Data Platform Certified Developer Exam

In a MapReduce job with 500 map tasks, how many map task attempts will there be?

  • A. It depends on the number of reduces in the job.
  • B. Between 500 and 1000.
  • C. At most 500.
  • D. At least 500.
  • E. Exactly 500.


Answer : D

Explanation:
From Cloudera Training Course:
  • A task attempt is a particular instance of an attempt to execute a task.
  • There will be at least as many task attempts as there are tasks.
  • If a task attempt fails, another will be started by the JobTracker.
  • Speculative execution can also result in more task attempts than completed tasks.
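For reference, this behavior is governed by job configuration. A rough sketch of the relevant settings (Hadoop 2.x / MRv2 property names; older releases use mapred.* equivalents):

    mapreduce.map.maxattempts = 4       (maximum attempts per map task before the job fails)
    mapreduce.map.speculative = true    (permit speculative duplicate attempts for slow map tasks)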

Workflows expressed in Oozie can contain:

  • A. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.
  • B. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
  • C. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
  • D. Iterative repetition of MapReduce jobs until a desired answer or state is reached.


Answer : A

Explanation: An Oozie workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph), specifying a sequence of actions to execute. This graph is specified in hPDL (an XML Process Definition Language). hPDL is a fairly compact language, using a limited amount of flow-control and action nodes.
Control nodes define the flow of execution and include the beginning and end of a workflow
(start, end and fail nodes) as well as mechanisms to control the workflow execution path (decision, fork and join nodes).
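To make the control-node vocabulary concrete, a minimal hPDL sketch with a fork and a join might look like this (workflow, node, and variable names are illustrative placeholders, not part of any exam exhibit):

    <workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="fork-node"/>
        <fork name="fork-node">
            <path start="mr-node"/>
            <path start="pig-node"/>
        </fork>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
            </map-reduce>
            <ok to="join-node"/>
            <error to="fail"/>
        </action>
        <action name="pig-node">
            <pig>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <script>script.pig</script>
            </pig>
            <ok to="join-node"/>
            <error to="fail"/>
        </action>
        <join name="join-node" to="end"/>
        <kill name="fail">
            <message>Workflow failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Here start, fork, join, kill and end are control nodes, while map-reduce and pig are action nodes.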

Note: Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store:
  • Workflow definitions
  • Currently running workflow instances, including instance states and variables
Reference: Introduction to Oozie

Consider the following two relations, A and B.


Which Pig statement combines A by its first field and B by its second field?

  • A. C = JOIN B BY a1, A by b2;
  • B. C = JOIN A by a1, B by b2;
  • C. C = JOIN A a1, B b2;
  • D. C = JOIN A $0, B $1;


Answer : B
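Since the exhibit with relations A and B is not reproduced above, a small self-contained example illustrates the syntax the answer relies on (file names and the field names a1/a2 and b1/b2 are assumptions):

    A = LOAD 'a.txt' AS (a1:int, a2:chararray);
    B = LOAD 'b.txt' AS (b1:chararray, b2:int);
    -- join A on its first field and B on its second field
    C = JOIN A BY a1, B BY b2;
    DUMP C;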

You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte
Java archive (JAR) file. Which is the best way to make this library available to your
MapReduce job at runtime?

  • A. Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
  • B. Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
  • C. When submitting the job on the command line, specify the -libjars option followed by the JAR file path.
  • D. Package your code and the Apache Commons Math library into a zip file named JobJar.zip


Answer : C

Explanation: The usage of the hadoop jar command is:
Usage: hadoop jar <jar> [mainClass] args...
If you want commons-math3.jar to be available to all the tasks, you can do either of the following:
1. Copy the JAR file into the $HADOOP_HOME/lib directory, or
2. Use the generic option -libjars.
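For example (the driver class, JAR path, and input/output paths below are placeholders; the driver must use ToolRunner/GenericOptionsParser so that generic options such as -libjars are parsed):

    hadoop jar myjob.jar com.example.MyDriver \
        -libjars /path/to/commons-math3.jar \
        /input/path /output/path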

Which one of the following statements is true about a Hive-managed table?

  • A. Records can only be added to the table using the Hive INSERT command.
  • B. When the table is dropped, the underlying folder in HDFS is deleted.
  • C. Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.
  • D. Hive dynamically defines the schema of the table based on the format of the underlying data.


Answer : B
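A short illustration of the difference between managed and external tables (table names and the HDFS location are made up):

    CREATE TABLE managed_t (id INT, name STRING);        -- managed: data stored under the Hive warehouse directory
    CREATE EXTERNAL TABLE external_t (id INT, name STRING)
        LOCATION '/data/external_t';                      -- external: Hive tracks only the metadata
    DROP TABLE managed_t;     -- removes the metadata AND deletes the underlying HDFS folder
    DROP TABLE external_t;    -- removes the metadata only; /data/external_t is left in place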
