The Ultimate Cheat Sheet: Picking the Right Model, Optimizer & LR for Every Scenario

Anthony Sandesh
Supervised learning, unsupervised learning, time series, deep learning, and reinforcement learning each have their own “sweet spot” of algorithms, solvers/optimizers, and hyperparameter defaults. Below is a practical guide to choosing models, optimizers (or solvers), and learning-rate heuristics, and to knowing when to reach for each technique.

1. Regression problems

| Scenario | Models | Solver / Optimizer | Learning Rate & Tips |
|---|---|---|---|
| Simple, low-dimensional data | Linear Regression | Closed-form (normal equation) | No LR needed; just scale features |
| Multicollinear features | Ridge, Lasso | Coordinate descent | α (regularization) ≈ 1e-3–1; pick α via cross-validation |
| Sparse → feature selection | Lasso | Coordinate descent | Increase α to induce sparsity; monitor the number of nonzero coefficients |
| Nonlinear but interpretable | Decision Trees | Greedy splitting | max_depth ≈ 3–10; min_samples_leaf ≥ 5 |
| Better nonlinear fit, less overfitting | Random Forest, GBM | Tree-based (no LR for RF) | n_estimators 100–500; learning_rate (GBM) 0.01–0.1 |
| State-of-the-art boosting | XGBoost, LightGBM | Histogram-based gradient boosting | LR ≈ 0.01; early stopping; max_depth 4–8; subsample ≈ 0.5 |
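For the regularized rows above, scikit-learn's CV estimators pick α automatically. A minimal sketch (the α grid and synthetic data are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with only 5 truly informative features out of 30.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV cross-validates over the alpha grid internally;
# logspace(-3, 0, 20) covers the 1e-3 .. 1 range from the table.
alphas = np.logspace(-3, 0, 20)
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))
lasso.fit(X, y)

coefs = lasso.named_steps["lassocv"].coef_
print("chosen alpha:", lasso.named_steps["lassocv"].alpha_)
print("nonzero coefficients:", int(np.sum(coefs != 0)), "of", coefs.size)
```

Counting the nonzero coefficients after fitting is how you monitor the sparsity the table mentions.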

2. Classification problems

| Scenario | Models | Solver / Optimizer | Learning Rate & Tips |
|---|---|---|---|
| Binary, linearly separable | Logistic Regression | LBFGS / liblinear | C = 1.0; scale inputs; try both L1 and L2 penalties |
| Small data, non-parametric | K-Nearest Neighbors (KNN) | — | k ≈ √n_samples; standardize features so distances are comparable |
| Margin-based, high-dimensional | Support Vector Machines (SVM) | SMO | C = 1; kernel='rbf'; γ = 1/n_features |
| Probabilistic (soft) clustering | Gaussian Mixture Models (GMM) | EM | Choose n_components via an elbow plot; compare covariance_type 'full' vs 'diag' |
| Few samples, deep features | SVM with Word2Vec/GloVe embeddings | SMO | Tune C; build embeddings via gensim |
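The SVM row translates almost directly into scikit-learn; a sketch on synthetic data (in `SVC`, `gamma='auto'` corresponds to the 1/n_features heuristic from the table):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Scaling first matters because the RBF kernel is distance-based.
clf = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="auto"))
scores = cross_val_score(clf, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```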

3. Unsupervised learning & dimensionality reduction

| Task | Models & Techniques | Solver / Optimizer | Tips |
|---|---|---|---|
| Dimensionality reduction | PCA | SVD | n_components ≈ 0.95 explained variance; whiten=False |
| Density clustering | DBSCAN | Ball tree / k-d tree | eps ≈ 0.5 × average neighbor distance; min_samples ≈ 5; handles arbitrary cluster shapes |
| Mixture modeling | Gaussian Mixture (GMM) | EM | Use BIC/AIC to choose the number of components |
| Anomaly detection | Isolation Forest, One-Class SVM | Tree-based / SMO | contamination = 0.01–0.1; subsample size ≈ 256 |
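The "n_components ≈ 0.95 explained variance" tip maps to a scikit-learn convenience: passing a float in (0, 1) keeps the smallest number of components whose cumulative explained variance reaches that fraction. A sketch on the bundled digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64-dimensional pixel features

# A float n_components means "keep enough components to explain
# at least this fraction of the variance".
pca = PCA(n_components=0.95, whiten=False)
X_reduced = pca.fit_transform(X)

print("components kept:", pca.n_components_, "of", X.shape[1])
print("explained variance:", round(float(pca.explained_variance_ratio_.sum()), 3))
```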

4. Time-series modelling

| Scenario | Models | Solver / Optimizer | Tips |
|---|---|---|---|
| Stationary, short history | ARIMA, SARIMA | Maximum likelihood | Pick p, d, q via ACF/PACF; seasonal P, D, Q with period m |
| Trends and seasonality | Exponential Smoothing (Holt-Winters) | Grid search over smoothing params | Tune α, β, γ via grid search; test additive vs multiplicative seasonality |
| Nonlinear, external regressors | Gradient Boosting (GBM) | Tree-based | Add lagged features; learning_rate = 0.05; n_estimators = 200 |
| Deep sequence modeling | LSTM, GRU | Adam | lr = 1e-3; batch_size = 32; clip gradients at 1.0; tune sequence length |
| Distance-based similarity | Dynamic Time Warping (DTW) | — | Warping window ≈ 10% of series length; use as the distance for 1-NN |
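"Add lagged features" is the key trick for using tree ensembles on time series: each row's inputs are the previous n values and the target is the next value. A sketch on a toy sine series (the lag count and split ratio are illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy series: noisy sine wave.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)

def make_lagged(series, n_lags):
    """Turn a 1-D series into (X, y): each row holds the n_lags
    previous values and y is the value that follows them."""
    X = np.column_stack([series[i:len(series) - n_lags + i]
                         for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

X, y = make_lagged(series, n_lags=5)
split = int(0.8 * len(X))                  # chronological split, no shuffling
model = GradientBoostingRegressor(learning_rate=0.05, n_estimators=200)
model.fit(X[:split], y[:split])
print("test R^2:", round(model.score(X[split:], y[split:]), 3))
```

Note the split is chronological: shuffling before splitting would leak future values into training.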

5. Deep-learning architectures

| Goal | Models | Optimizer | LR & Scheduling Tips |
|---|---|---|---|
| Tabular / MLP | Feedforward MLP (Multilayer Perceptron) | Adam / SGD | lr = 1e-3 (Adam) or 1e-2 (SGD); weight_decay = 1e-4; step LR decay |
| Image tasks | CNN (ResNet, custom conv stacks) | AdamW / SGD | lr = 1e-3 (AdamW) or 0.1 (SGD with momentum); cosine annealing |
| Sequence-to-sequence / NLP | RNN / LSTM / GRU with attention | Adam | lr = 5e-4; warmup steps, then linear decay |
| Pretrained transformer fine-tuning | BERT / GPT / T5 | AdamW | lr = 2e-5–5e-5; linear warmup over the first ~10% of total steps |
| Representation learning | Autoencoder, VAE | Adam | lr = 1e-3; start VAE with β = 1; increase β (β-VAE) to encourage disentangled representations |
| Generative modeling | GAN, DCGAN | Adam | lr_D = lr_G = 2e-4; β1 = 0.5, β2 = 0.999; optionally train D for more steps per G step |
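The MLP row can be tried without a deep-learning framework: scikit-learn's `MLPClassifier` exposes the same knobs (`solver='adam'`, `learning_rate_init` for the 1e-3 learning rate, `alpha` for L2 weight decay). A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# solver='adam' with learning_rate_init=1e-3 mirrors the table's MLP row;
# alpha is the L2 penalty (weight decay).
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 64), solver="adam",
                  learning_rate_init=1e-3, alpha=1e-4,
                  max_iter=500, random_state=0),
)
mlp.fit(X, y)
print("train accuracy:", round(mlp.score(X, y), 3))
```

For the larger architectures in the table (CNNs, transformers) you would move to PyTorch or JAX, but the hyperparameter vocabulary carries over.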

6. Reinforcement learning

| Setting | Models | Optimizer | LR & Stability Tips |
|---|---|---|---|
| Value-based, discrete actions | DQN | Adam | lr = 1e-4; replay buffer = 1e6; update target every 1000 steps |
| Policy gradient, continuous | Policy Gradient / Actor-Critic | Adam | lr = 3e-4; entropy_coeff = 0.01; normalize rewards |
| Model-based RL | World Model + MPC | Adam | lr = 1e-3; tune the planning horizon |
| On-policy, stochastic policy | PPO, A2C | Adam | lr = 2.5e-4; clip = 0.2; n_steps = 2048 |
| Off-policy, continuous actions | DDPG, SAC | Adam | lr = 3e-4; τ = 0.005; entropy coefficient α ≈ 0.2 |
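The τ = 0.005 entry refers to Polyak averaging of target-network weights in DDPG/SAC. A framework-free NumPy sketch (the parameter shapes are illustrative):

```python
import numpy as np

def soft_update(target, online, tau=0.005):
    """Polyak-average each parameter: target <- tau*online + (1-tau)*target.
    With a small tau the target network trails the online network slowly,
    which stabilizes off-policy bootstrapping."""
    return [tau * w + (1.0 - tau) * t for w, t in zip(online, target)]

online = [np.ones((4, 4)), np.ones(4)]     # stand-in for online-network weights
target = [np.zeros((4, 4)), np.zeros(4)]   # target starts elsewhere

for _ in range(1000):
    target = soft_update(target, online, tau=0.005)

# After many updates the target has drifted most of the way to the online
# weights: the remaining gap is (1 - tau)**1000 ≈ 0.007.
gap = max(float(np.abs(w - t).max()) for w, t in zip(online, target))
print("max gap:", gap)
```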

7. Bringing it all together: model selection

Regardless of domain, the first step is to benchmark a handful of diverse models and pick the one that “just works” before diving deep into tuning. Here’s a minimal Python snippet to automate that process using scikit-learn:
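A sketch of what that can look like (the dataset and candidate models are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm":    SVC(C=1.0, kernel="rbf"),
    "rf":     RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    pipe = make_pipeline(StandardScaler(), model)   # consistent preprocessing
    scores = cross_val_score(pipe, X, y, cv=5)      # 5-fold CV estimate
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```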
How it works:
  1. Pipeline ensures consistent preprocessing (e.g. scaling) inside every fold.
  2. cross_val_score gives an unbiased estimate of performance.
  3. Inspecting mean ± std accuracy lets you compare stability as well as accuracy.
Once you’ve identified the leading candidate, dive into grid/random search for fine-tuning hyperparameters (e.g. learning rate, regularization strength, tree depth or number of layers).
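That fine-tuning step can be sketched with `GridSearchCV` (the parameter grid here is illustrative, assuming the SVM won the benchmark):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Named steps let the grid address nested parameters as "step__param".
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", "auto", 0.01]}

search = GridSearchCV(pipe, grid, cv=5)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Swap `GridSearchCV` for `RandomizedSearchCV` when the grid grows beyond a handful of combinations.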

Key takeaways
  • Start with simple, interpretable models.
  • Match model complexity to data size and noise.
  • Use well-known default optimizers/solvers (Adam for DL, coordinate descent for L1/L2).
  • Always benchmark multiple approaches before heavy tuning.
  • Automate model comparison with cross-validation pipelines—then optimize the winner.
Happy modeling!
