3.1 Frame business problems as machine learning problems.
● Determine when to use/when not to use ML
● Know the difference between supervised and unsupervised learning
● Select from among classification, regression, forecasting, clustering,
recommendation, etc.
3.2 Select the appropriate model(s) for a given machine learning problem.
● XGBoost, logistic regression, K-means, linear regression, decision trees, random
forests, RNN, CNN, ensembles, transfer learning
● Express intuition behind models
3.3 Train machine learning models.
● Train/validation/test split, cross-validation
● Optimizer, gradient descent, loss functions, local minima, convergence, batches,
probability, etc.
● Compute choice (GPU vs. CPU, distributed vs. non-distributed, platform [Spark vs.
non-Spark])
● Model updates and retraining
○ Batch vs. real-time/online
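As an illustration of the optimizer, loss function, batch, and convergence topics in 3.3, the following is a minimal sketch of mini-batch gradient descent on a mean squared error loss. It is not taken from the exam guide: the function names (train_linear_sgd, mse), the synthetic data, and the learning rate, epoch count, and batch size are all assumptions chosen for the example.

```python
import random


def train_linear_sgd(xs, ys, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on mean squared error.

    Hypothetical helper for illustration; hyperparameter defaults are assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new batch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of mean((w*x + b - y)^2) over the current batch
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Step against the gradient, scaled by the learning rate
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic, noise-free data generated from y = 3x + 1 (an assumption for the demo)
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

w, b = train_linear_sgd(xs, ys)
print(round(w, 3), round(b, 3), mse(xs, ys, w, b))
```

Because this loss is convex, there is a single minimum and no local minima to get stuck in; raising lr too far makes the updates diverge, while lowering it slows convergence.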
3.4 Perform hyperparameter optimization.
● Regularization
○ Dropout
○ L1/L2
● Cross-validation
● Model initialization
● Neural network architecture (layers/nodes), learning rate, activation functions
● Tree-based models (# of trees, # of levels)
● Linear models (learning rate)
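To make the link between regularization and cross-validation in 3.4 concrete, the sketch below picks an L2 penalty by k-fold cross-validation over a small grid. It is an illustrative example under stated assumptions, not a prescribed method: the helper names (k_fold_indices, fit_ridge_1d, cv_mse, grid_search), the one-feature model without an intercept, the candidate grid, and the data are all hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit_ridge_1d(xs, ys, l2):
    """Closed-form one-feature ridge fit without intercept: w = sum(x*y) / (sum(x^2) + l2)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)


def cv_mse(xs, ys, l2, k=5):
    """Average validation MSE across k folds for one L2 penalty value."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], l2)
        errors.append(sum((w * xs[i] - ys[i]) ** 2 for i in val) / len(val))
    return sum(errors) / len(errors)


def grid_search(xs, ys, candidates, k=5):
    """Return the candidate with the lowest cross-validated MSE, plus all scores."""
    scores = {l2: cv_mse(xs, ys, l2, k) for l2 in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical noise-free data from y = 2x, so no regularization should win
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best_l2, scores = grid_search(xs, ys, candidates=[0.0, 0.1, 1.0, 10.0])
print(best_l2, scores)
```

On noisy or high-dimensional data a nonzero penalty often scores better; the same loop applies to any hyperparameter, such as the number of trees or tree depth.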
3.5 Evaluate machine learning models.
● Avoid overfitting/underfitting (detect and handle bias and variance)
● Metrics (AUC-ROC, accuracy, precision, recall, RMSE, F1 score)
● Confusion matrix
● Offline and online model evaluation, A/B testing
● Compare models using metrics (time to train a model, quality of model, engineering
costs)
● Cross-validation
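Several of the metrics listed in 3.5 follow directly from the confusion matrix. The sketch below computes accuracy, precision, recall, and F1 for binary labels, plus RMSE for regression. The function names and the sample labels are assumptions made for this example; returning 0.0 when a denominator is zero is one possible convention, not a standard.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    total = tp + fp + fn + tn
    # Assumption: report 0.0 when a metric's denominator is zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def rmse(y_true, y_pred):
    """Root mean squared error for regression outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(y_true, y_pred))        # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
```

Accuracy can look good on imbalanced classes even when recall for the rare class is poor, which is why precision, recall, and F1 are usually reported alongside it.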