Neural network notes

a perceptron can compute a NAND gate; NAND gates are universal building blocks for computation
so networks of perceptrons can compute any logical function
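A minimal sketch of the NAND claim, assuming a step-activation perceptron with weights -2, -2 and bias 3 (these values are one illustrative choice, not the only one). The xor helper is a hypothetical example of building another gate purely out of NANDs.

```python
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    """Step-activation perceptron: outputs 1 if w.x + b > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def nand(a, b):
    # With weights (-2, -2) and bias 3 the perceptron behaves as a NAND gate.
    return perceptron(a, b)


def xor(a, b):
    # XOR composed from four NAND gates, illustrating universality.
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))


# Truth tables over inputs (0,0), (0,1), (1,0), (1,1)
nand_table = [nand(a, b) for a in (0, 1) for b in (0, 1)]
xor_table = [xor(a, b) for a in (0, 1) for b in (0, 1)]
```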
learning algorithms tune weights and biases
logistic (sigmoid) function: sigma(z) = 1 / (1 + e^(-z)), where z = dot_product(w, x) + b
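The formula above can be sketched directly in code. The function names sigmoid and neuron_output, and the sample weights and inputs, are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(w, x, b):
    # z is the weighted input: dot product of weights and inputs, plus bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Weights cancel out here, so z = 0 and the output sits at the midpoint.
out = neuron_output([0.5, -0.5], [1.0, 1.0], 0.0)
```

Unlike a step function, small changes in w or b produce small changes in the output, which is what makes gradual learning possible.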
feedforward nets, no loops
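A rough sketch of a feedforward pass: each layer's output feeds only the next layer, never back. The layer sizes and weight values are made up for illustration, and sigmoid activation is assumed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(layers, x):
    """layers is a list of (weights, biases); weights[j] is the row for neuron j."""
    a = x
    for weights, biases in layers:
        # Activations of this layer depend only on the previous layer's output.
        a = [sigmoid(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


layers = [
    ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 hidden
    ([[0.3, -0.1, 0.2]], [0.05]),                                 # 3 hidden -> 1 output
]
output = feedforward(layers, [1.0, 0.5])
```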
recurrent nets: feedback loops allowed; neurons fire for a limited duration before going quiet
objective (cost) function: commonly mean squared error
gradient descent is used to minimize the cost function
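A minimal sketch of gradient descent on a mean squared error cost, assuming a one-variable linear model y = w*x + b rather than a full network. The toy data, the learning rate eta, and the 1/(2n) scaling of the cost are illustrative choices.

```python
def mse(w, b, data):
    """Mean squared error with the common 1/(2n) scaling."""
    return sum((w * x + b - y) ** 2 for x, y in data) / (2 * len(data))


def gradient_step(w, b, data, eta):
    # Partial derivatives of the cost above with respect to w and b.
    n = len(data)
    grad_w = sum((w * x + b - y) * x for x, y in data) / n
    grad_b = sum((w * x + b - y) for x, y in data) / n
    # Move a small step against the gradient.
    return w - eta * grad_w, b - eta * grad_b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1
w, b = 0.0, 0.0
initial_cost = mse(w, b, data)
for _ in range(2000):
    w, b = gradient_step(w, b, data, eta=0.1)
final_cost = mse(w, b, data)
```

After the loop, w and b should be close to 2 and 1, the values used to generate the data.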
softmax output layer with log-likelihood cost => outputs form a probability distribution
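A small sketch of softmax with the log-likelihood cost. The input values are arbitrary, and subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result.

```python
import math


def softmax(z):
    m = max(z)  # shift for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood_cost(probs, target_index):
    # Cost is low when the network assigns high probability to the correct class.
    return -math.log(probs[target_index])


probs = softmax([2.0, 1.0, 0.1])
cost = log_likelihood_cost(probs, 0)
```

The outputs are all positive and sum to 1, so they can be read as the network's estimated class probabilities.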
validation data helps find good hyperparameters
regularization: penalizes large weights so the network prefers to learn smaller ones; helps prevent overfitting
regularized networks generalize better
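A sketch of how L2 regularization (weight decay) pushes weights toward smaller values, assuming the common form where (lmbda / 2n) * sum(w^2) is added to the cost. The names l2_cost, l2_update, lmbda, and the sample numbers are assumptions for illustration.

```python
def l2_cost(base_cost, weights, lmbda, n):
    """Unregularized cost plus the L2 penalty (lmbda / 2n) * sum of squared weights."""
    return base_cost + (lmbda / (2 * n)) * sum(w * w for w in weights)


def l2_update(weights, grads, eta, lmbda, n):
    # The penalty's gradient rescales each weight by (1 - eta * lmbda / n)
    # before the usual gradient step, hence the name "weight decay".
    decay = 1 - eta * lmbda / n
    return [decay * w - eta * g for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
# With zero data gradient, the update only shrinks the weights.
shrunk = l2_update(weights, grads=[0.0, 0.0], eta=0.5, lmbda=0.1, n=10)
penalized = l2_cost(1.0, weights, lmbda=0.1, n=10)
```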
dropout: randomly disable a portion of neurons on each training pass; roughly equivalent to averaging many different networks
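A rough sketch of a dropout mask applied to one layer's activations. This uses the "inverted dropout" variant, which rescales the kept activations during training; that variant is an assumption here, not something these notes specify. The drop probability and seed are illustrative.

```python
import random


def dropout(activations, p_drop, rng):
    """Zero each activation with probability p_drop; rescale survivors by 1 / keep."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
acts = [1.0] * 1000
dropped = dropout(acts, p_drop=0.5, rng=rng)
```

A fresh mask is drawn on every training pass, so each pass effectively trains a different thinned network; at test time no neurons are dropped.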
hyperparameters: learning rate, L2 regularization strength, mini-batch size
grid search: a simple form of automatic hyperparameter optimization
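A minimal sketch of grid search over the hyperparameters listed above. The validation_score function is a hypothetical stand-in for "train the network, then measure performance on validation data"; the grid values are also made up.

```python
from itertools import product


def validation_score(eta, lmbda, batch_size):
    # Hypothetical placeholder: a real version would train a network with these
    # settings and return its accuracy on the validation set.
    return -((eta - 0.1) ** 2) - ((lmbda - 1.0) ** 2) - ((batch_size - 10) / 100) ** 2


grid = {
    "eta": [0.01, 0.1, 1.0],
    "lmbda": [0.1, 1.0, 10.0],
    "batch_size": [10, 100],
}

# Try every combination in the grid and keep the best-scoring one.
best_params = max(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=lambda params: validation_score(**params),
)
```

The cost grows multiplicatively with each added hyperparameter (here 3 * 3 * 2 = 18 training runs), which is the main drawback of the approach.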
deep learning
- convolution layers
- dropout layers for regularization
- rectified linear units instead of sigmoid
- GPUs used to speed up training
- generate more training samples (artificially expanding the data) to improve generalization
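One item in the list above, rectified linear units instead of sigmoid, can be illustrated with a short sketch. It compares the derivatives of the two activations at a large input; the value z = 10 is an arbitrary choice.

```python
import math


def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def relu_prime(z):
    return 1.0 if z > 0 else 0.0


z = 10.0
# Sigmoid saturates for large z, so its gradient nearly vanishes; ReLU's stays at 1.
grads = (sigmoid_prime(z), relu_prime(z))
```

A near-zero gradient slows learning in saturated sigmoid neurons, which is one commonly cited reason for preferring ReLU in deep networks.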
deep belief network: generative model; the network can be run backwards to generate inputs