#### Intro

- machine learning - applied statistics that uses computers to estimate functions from data.
- hyperparameters - settings chosen outside the learning algorithm, not learned from the training data.
- components of ML: optimization algorithm, cost function, model, dataset
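
As a rough illustration of how the four components fit together, here is a minimal sketch in plain Python. The dataset (points on the line y = 2x + 1), the function names `model`, `cost`, and `fit`, and the learning rate and step count are all assumptions made up for this example, not part of the notes.

```python
# Dataset: a few points on the line y = 2x + 1 (made-up illustrative data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def model(x, w, b):
    """Model: a straight line with learnable parameters w and b."""
    return w * x + b


def cost(w, b):
    """Cost function: mean squared error of the model over the dataset."""
    return sum((model(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.05, steps=2000):
    """Optimization algorithm: plain gradient descent on the cost.

    learning_rate and steps are hyperparameters: they are chosen here,
    outside the learning algorithm, rather than learned from the data.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (model(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit()
```

With these assumed settings, `fit()` should recover parameters close to w = 2 and b = 1, since the data was generated from exactly that line.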

#### Neural Networks

- a single neuron can compute a NAND gate
- NAND gates are universal building blocks for computation, so networks of neurons can compute any logical function
- learning algorithms tune weights and biases
- logistic (sigmoid) function: sigma(z) = 1 / (1 + e^(-z)), where z = dot_product(w, x) + b
- feedforward nets - no loops; information only flows forward from input to output
- recurrent nets - feedback loops allowed; neurons fire for a limited duration before going quiet
- objective (cost) function - commonly mean squared error
- gradient descent - used to minimize the cost function
- softmax output layer with log-likelihood cost => outputs form a probability distribution
- validation data helps find good hyperparameters
- regularization - penalizes large weights so the network prefers to learn small weights; helps prevent overfitting
- regularized networks generalize better
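- dropout - randomly disable a portion of the hidden neurons on each training pass; roughly like averaging the results of many different networks
- hyperparameters - e.g. learning rate, L2 regularization parameter, mini-batch size
- grid search - automated hyperparameter optimization that tries each combination on a grid of candidate values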
- deep learning
  - convolutional layers
  - dropout layers for regularization
  - rectified linear units (ReLU) instead of sigmoid neurons
  - trained on GPUs
  - generate more training samples (e.g. by perturbing existing ones) to improve generalization
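
Several of the formulas in this list are short enough to sketch directly. The following is a minimal illustration in plain Python, not a reference implementation: the helper names (`sigmoid`, `neuron`, `nand`, `softmax`, `relu`) and the specific NAND weights (-20, -20) and bias 30 are assumptions picked for the example; any sufficiently large weights of that shape would work.

```python
import math


def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Sigmoid neuron: z = dot_product(w, x) + b, output = sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def nand(x1, x2):
    """NAND gate from a single neuron (hand-picked, assumed weights).

    The weighted sum is 30, 10, 10, -10 for inputs (0,0), (0,1), (1,0),
    (1,1), so the sigmoid output is near 1 except when both inputs are 1.
    """
    return round(neuron((x1, x2), (-20.0, -20.0), 30.0))


def softmax(zs):
    """Softmax: turns raw scores into a probability distribution.

    Subtracting the maximum first avoids overflow in exp without
    changing the result.
    """
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


truth_table = [nand(a, b) for a in (0, 1) for b in (0, 1)]
probs = softmax([1.0, 2.0, 3.0])
```

Here `nand(1, 1)` rounds to 0 and the other three input pairs round to 1, and the values in `probs` are positive and sum to 1 (up to floating-point error), which is what lets a softmax layer be read as a probability distribution.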

- deep belief network - generative model; the network can be run backwards to generate inputs