Professional Machine Learning Engineer v1.0

Page 1 of 11 (153 questions total)

You are using transfer learning to train an image classifier based on a pre-trained EfficientNet model. Your training dataset has 20,000 images. You plan to retrain the model once per day. You need to minimize the cost of infrastructure. What platform components and configuration environment should you use?

  • A. A Deep Learning VM with 4 V100 GPUs and local storage.
  • B. A Deep Learning VM with 4 V100 GPUs and Cloud Storage.
  • C. A Google Kubernetes Engine cluster with a V100 GPU Node Pool and an NFS Server
  • D. An AI Platform Training job using a custom scale tier with 4 V100 GPUs and Cloud Storage


Answer : D
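
For reference, a minimal sketch of what option D could look like with the Vertex AI Python SDK (the managed successor to AI Platform Training); the project, bucket, machine type, and image names are placeholders, not part of the question:

    # Submit a one-off managed training job with 4 V100 GPUs; you pay only
    # while the daily job runs, and data/artifacts live in Cloud Storage.
    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",                       # placeholder project ID
        location="us-central1",
        staging_bucket="gs://my-staging-bucket",    # Cloud Storage for job artifacts
    )

    job = aiplatform.CustomJob(
        display_name="efficientnet-daily-retrain",
        worker_pool_specs=[{
            "machine_spec": {
                "machine_type": "n1-standard-16",
                "accelerator_type": "NVIDIA_TESLA_V100",
                "accelerator_count": 4,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": "gcr.io/my-project/efficientnet-trainer:latest",  # placeholder
            },
        }],
    )
    job.run()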

While conducting an exploratory analysis of a dataset, you discover that categorical feature A has substantial predictive power, but it is sometimes missing. What should you do?

  • A. Drop feature A if more than 15% of values are missing. Otherwise, use feature A as-is.
  • B. Compute the mode of feature A and then use it to replace the missing values in feature A.
  • C. Replace the missing values with the values of the feature with the highest Pearson correlation with feature A.
  • D. Add an additional class to categorical feature A for missing values. Create a new binary feature that indicates whether feature A is missing.


Answer : D
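
A minimal pandas sketch of option D; the column and category names are illustrative only:

    import pandas as pd

    df = pd.DataFrame({"feature_a": ["red", None, "blue", None, "green"]})

    # Binary indicator for whether feature A was missing.
    df["feature_a_missing"] = df["feature_a"].isna().astype(int)

    # Treat missingness itself as an additional class of the categorical feature.
    df["feature_a"] = df["feature_a"].fillna("missing")

    print(df)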

You work for a large retailer and have been asked to segment your customers by their purchasing habits. The purchase history of all customers has been uploaded to BigQuery. You suspect that there may be several distinct customer segments; however, you are unsure how many there are, and you don’t yet understand the commonalities in their behavior. You want to find the most efficient solution. What should you do?

  • A. Create a k-means clustering model using BigQuery ML. Allow BigQuery to automatically optimize the number of clusters.
  • B. Create a new dataset in Dataprep that references your BigQuery table. Use Dataprep to identify similarities within each column.
  • C. Use the Data Labeling Service to label each customer record in BigQuery. Train a model on your labeled data using AutoML Tables. Review the evaluation metrics to understand whether there is an underlying pattern in the data.
  • D. Get a list of the customer segments from your company’s Marketing team. Use the Data Labeling Service to label each customer record in BigQuery according to the list. Analyze the distribution of labels in your dataset using Data Studio.


Answer : A
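
A rough sketch of option A, run through the BigQuery Python client; the project, dataset, and table names are placeholders. Omitting NUM_CLUSTERS leaves the choice of cluster count to BigQuery ML, as the option describes:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project ID

    sql = """
    CREATE OR REPLACE MODEL `my_dataset.customer_segments`
    OPTIONS (model_type = 'KMEANS') AS
    SELECT * EXCEPT (customer_id)
    FROM `my_dataset.purchase_history`
    """

    client.query(sql).result()  # training runs entirely inside BigQuery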

You recently designed and built a custom neural network that uses critical dependencies specific to your organization’s framework. You need to train the model using a managed training service on Google Cloud. However, the ML framework and related dependencies are not supported by AI Platform Training. Also, both your model and your data are too large to fit in memory on a single machine. Your ML framework of choice uses the scheduler, workers, and servers distribution structure. What should you do?

  • A. Use a built-in model available on AI Platform Training.
  • B. Build your custom container to run jobs on AI Platform Training.
  • C. Build your custom containers to run distributed training jobs on AI Platform Training.
  • D. Reconfigure your code to a ML framework with dependencies that are supported by AI Platform Training.


Answer : C
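
A rough sketch of option C, expressed with the Vertex AI Python SDK's worker pools (which map onto a scheduler / workers / parameter-servers layout); the container image, machine shapes, and replica counts are placeholders:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1",
                    staging_bucket="gs://my-staging-bucket")  # placeholders

    image = "gcr.io/my-project/custom-framework-trainer:latest"  # custom container

    worker_pool_specs = [
        {   # pool 0: scheduler / chief
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 1,
            "container_spec": {"image_uri": image},
        },
        {   # pool 1: workers
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": 4,
            "container_spec": {"image_uri": image},
        },
        {   # pool 2: parameter servers
            "machine_spec": {"machine_type": "n1-highmem-8"},
            "replica_count": 2,
            "container_spec": {"image_uri": image},
        },
    ]

    aiplatform.CustomJob(
        display_name="custom-framework-distributed-training",
        worker_pool_specs=worker_pool_specs,
    ).run()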

While monitoring your model training’s GPU utilization, you discover that you have a naive synchronous implementation. The training data is split into multiple files. You want to reduce the execution time of your input pipeline. What should you do?

  • A. Increase the CPU load
  • B. Add caching to the pipeline
  • C. Increase the network bandwidth
  • D. Add parallel interleave to the pipeline


Answer : D
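
A minimal tf.data sketch of option D; the file pattern is a placeholder:

    import tensorflow as tf

    files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")  # placeholder

    # Read the shards concurrently instead of one file at a time.
    dataset = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=tf.data.AUTOTUNE,
        num_parallel_calls=tf.data.AUTOTUNE,
    ).prefetch(tf.data.AUTOTUNE)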

Your data science team is training a PyTorch model for image classification based on a pre-trained ResNet model. You need to perform hyperparameter tuning to optimize for several parameters. What should you do?

  • A. Convert the model to a Keras model, and run a Keras Tuner job.
  • B. Run a hyperparameter tuning job on AI Platform using custom containers.
  • C. Create a Kubeflow Pipelines instance, and run a hyperparameter tuning job on Katib.
  • D. Convert the model to a TensorFlow model, and run a hyperparameter tuning job on AI Platform.


Answer : B
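
A rough sketch of the training-code side of option B: inside the custom PyTorch container, the tuned values arrive as command-line flags and the metric is reported back with the cloudml-hypertune helper. Flag names and the metric value are placeholders; the training loop itself is elided:

    import argparse
    import hypertune  # pip install cloudml-hypertune

    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=32)
    args = parser.parse_args()

    # ... train the ResNet-based PyTorch model with args.learning_rate / args.batch_size ...
    validation_accuracy = 0.0  # placeholder: computed from the validation loop

    hypertune.HyperTune().report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag="accuracy",
        metric_value=validation_accuracy,
        global_step=1,
    )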

You have a large corpus of written support cases that can be classified into 3 separate categories: Technical Support, Billing Support, or Other Issues. You need to quickly build, test, and deploy a service that will automatically classify future written requests into one of the categories. How should you configure the pipeline?

  • A. Use the Cloud Natural Language API to obtain metadata to classify the incoming cases.
  • B. Use AutoML Natural Language to build and test a classifier. Deploy the model as a REST API.
  • C. Use BigQuery ML to build and test a logistic regression model to classify incoming requests. Use BigQuery ML to perform inference.
  • D. Create a TensorFlow model using Google’s BERT pre-trained model. Build and test a classifier, and deploy the model using Vertex AI.


Answer : B
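
For context, a minimal sketch of calling a deployed AutoML Natural Language classifier (option B) from client code; the project, location, model ID, and example text are placeholders:

    from google.cloud import automl

    prediction_client = automl.PredictionServiceClient()
    model_name = automl.AutoMlClient.model_path(
        "my-project", "us-central1", "TCN1234567890"  # placeholder IDs
    )

    payload = automl.ExamplePayload(
        text_snippet=automl.TextSnippet(
            content="I was charged twice for my subscription this month.",
            mime_type="text/plain",
        )
    )

    response = prediction_client.predict(name=model_name, payload=payload)
    for result in response.payload:
        print(result.display_name, result.classification.score)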

You need to quickly build and train a model to predict the sentiment of customer reviews with custom categories without writing code. You do not have enough data to train a model from scratch. The resulting model should have high predictive performance. Which service should you use?

  • A. AutoML Natural Language
  • B. Cloud Natural Language API
  • C. AI Hub pre-made Jupyter Notebooks
  • D. AI Platform Training built-in algorithms


Answer : A

You need to build an ML model for a social media application to predict whether a user’s submitted profile photo meets the requirements. The application will inform the user if the picture meets the requirements. How should you build a model to ensure that the application does not falsely accept a non-compliant picture?

  • A. Use AutoML to optimize the model’s recall in order to minimize false negatives.
  • B. Use AutoML to optimize the model’s F1 score in order to balance the accuracy of false positives and false negatives.
  • C. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that meet the profile photo requirements.
  • D. Use Vertex AI Workbench user-managed notebooks to build a custom model that has three times as many examples of pictures that do not meet the profile photo requirements.


Answer : A

You lead a data science team at a large international corporation. Most of the models your team trains are large-scale models using high-level TensorFlow APIs on AI Platform with GPUs. Your team usually takes a few weeks or months to iterate on a new version of a model. You were recently asked to review your team’s spending. How should you reduce your Google Cloud compute costs without impacting the model’s performance?

  • A. Use AI Platform to run distributed training jobs with checkpoints.
  • B. Use AI Platform to run distributed training jobs without checkpoints.
  • C. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs with checkpoints.
  • D. Migrate to training with Kubeflow on Google Kubernetes Engine, and use preemptible VMs without checkpoints.


Answer : C
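
A minimal Keras sketch of the checkpointing that makes preemptible VMs viable in option C: state is saved periodically and restored automatically when a preempted job restarts. The backup path is a placeholder (in practice it would point at Cloud Storage), and the data is synthetic:

    import numpy as np
    import tensorflow as tf

    backup_dir = "/tmp/train-backup"  # placeholder; e.g. a gs://my-bucket/backup path in production

    model = tf.keras.Sequential([tf.keras.layers.Input(shape=(4,)),
                                 tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    # Saves state each epoch and resumes from it if the VM is preempted and restarted.
    callback = tf.keras.callbacks.BackupAndRestore(backup_dir=backup_dir)

    x = np.random.rand(128, 4).astype("float32")
    y = np.random.rand(128, 1).astype("float32")
    model.fit(x, y, epochs=3, callbacks=[callback])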

You need to train a regression model based on a dataset containing 50,000 records that is stored in BigQuery. The data includes a total of 20 categorical and numerical features with a target variable that can include negative values. You need to minimize effort and training time while maximizing model performance. What approach should you take to train this regression model?

  • A. Create a custom TensorFlow DNN model
  • B. Use BQML XGBoost regression to train the model.
  • C. Use AutoML Tables to train the model without early stopping.
  • D. Use AutoML Tables to train the model with RMSLE as the optimization objective.


Answer : B
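
A rough sketch of option B via the BigQuery Python client; the dataset, table, and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project ID

    sql = """
    CREATE OR REPLACE MODEL `my_dataset.demand_regressor`
    OPTIONS (
      model_type = 'BOOSTED_TREE_REGRESSOR',   -- XGBoost-based regression in BigQuery ML
      input_label_cols = ['target']            -- target may take negative values
    ) AS
    SELECT *
    FROM `my_dataset.training_data`
    """

    client.query(sql).result()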

You are building a linear model with over 100 input features, all with values between –1 and 1. You suspect that many features are non-informative. You want to remove the non-informative features from your model while keeping the informative ones in their original form. Which technique should you use?

  • A. Use principal component analysis (PCA) to eliminate the least informative features.
  • B. Use L1 regularization to reduce the coefficients of uninformative features to 0.
  • C. After building your model, use Shapley values to determine which features are the most informative.
  • D. Use an iterative dropout technique to identify which features do not degrade the model when removed.


Answer : B
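
A minimal Keras sketch of option B: a linear model whose weights carry an L1 penalty, which pushes the coefficients of uninformative features toward zero while leaving the remaining features in their original form. The penalty strength shown is illustrative:

    import tensorflow as tf

    n_features = 100
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(
            1,
            kernel_regularizer=tf.keras.regularizers.l1(0.01),  # L1 (lasso) penalty
        ),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")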

You work for a global footwear retailer and need to predict when an item will be out of stock based on historical inventory data. Customer behavior is highly dynamic since footwear demand is influenced by many different factors. You want to serve models that are trained on all available data, but track your performance on specific subsets of data before pushing to production. What is the most streamlined and reliable way to perform this validation?

  • A. Use the TFX ModelValidator tools to specify performance metrics for production readiness.
  • B. Use k-fold cross-validation as a validation strategy to ensure that your model is ready for production.
  • C. Use the last relevant week of data as a validation set to ensure that your model is performing accurately on current data.
  • D. Use the entire dataset and treat the area under the receiver operating characteristics curve (AUC ROC) as the main metric.


Answer : A
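
A rough sketch of the kind of configuration option A implies, using TensorFlow Model Analysis (which backs the TFX Evaluator/ModelValidator): the candidate model is evaluated on named data slices and gated on a metric threshold. The feature, label, and threshold values are placeholders:

    import tensorflow_model_analysis as tfma

    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key="out_of_stock")],
        slicing_specs=[
            tfma.SlicingSpec(),                         # overall performance
            tfma.SlicingSpec(feature_keys=["region"]),  # track specific subsets of data
        ],
        metrics_specs=[
            tfma.MetricsSpec(metrics=[
                tfma.MetricConfig(
                    class_name="AUC",
                    threshold=tfma.MetricThreshold(
                        value_threshold=tfma.GenericValueThreshold(
                            lower_bound={"value": 0.8}
                        )
                    ),
                )
            ])
        ],
    )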

You have deployed a model on Vertex AI for real-time inference. During an online prediction request, you get an “Out of Memory” error. What should you do?

  • A. Use batch prediction mode instead of online mode.
  • B. Send the request again with a smaller batch of instances.
  • C. Use base64 to encode your data before using it for prediction.
  • D. Apply for a quota increase for the number of prediction requests.


Answer : B
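
A minimal sketch of option B with the Vertex AI Python SDK: the oversized request is split into smaller batches of instances. The endpoint resource name, payload, and batch size are placeholders:

    from google.cloud import aiplatform

    endpoint = aiplatform.Endpoint(
        "projects/my-project/locations/us-central1/endpoints/1234567890"  # placeholder
    )

    instances = [{"feature": float(i)} for i in range(1000)]  # placeholder payload
    batch_size = 50

    predictions = []
    for start in range(0, len(instances), batch_size):
        response = endpoint.predict(instances=instances[start:start + batch_size])
        predictions.extend(response.predictions)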

You work at a subscription-based company. You have trained an ensemble of trees and neural networks to predict customer churn, which is the likelihood that customers will not renew their yearly subscription. The average prediction is a 15% churn rate, but for a particular customer the model predicts that they are 70% likely to churn. The customer has a product usage history of 30%, is located in New York City, and became a customer in 1997. You need to explain the difference between the actual prediction, a 70% churn rate, and the average prediction. You want to use Vertex Explainable AI. What should you do?

  • A. Train local surrogate models to explain individual predictions.
  • B. Configure sampled Shapley explanations on Vertex Explainable AI.
  • C. Configure integrated gradients explanations on Vertex Explainable AI.
  • D. Measure the effect of each feature as the weight of the feature multiplied by the feature value.


Answer : B
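
A rough sketch of option B: sampled Shapley attributions (suitable for non-differentiable models such as tree ensembles) configured when uploading the model to Vertex AI. The artifact URI, serving container, path count, and feature names are placeholders:

    from google.cloud import aiplatform

    parameters = aiplatform.explain.ExplanationParameters(
        {"sampled_shapley_attribution": {"path_count": 25}}
    )
    metadata = aiplatform.explain.ExplanationMetadata(
        inputs={"features": {}},                # placeholder input mapping
        outputs={"churn_probability": {}},      # placeholder output name
    )

    model = aiplatform.Model.upload(
        display_name="churn-ensemble",
        artifact_uri="gs://my-bucket/churn-model/",  # placeholder
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"  # placeholder
        ),
        explanation_parameters=parameters,
        explanation_metadata=metadata,
    )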
