Implement neural network
Philosophy
- Many variables determine the final result, but we don't know how many there are or how they combine (e.g. what polynomial degree to use). If we pick the wrong form, we risk overfitting or underfitting.
- Can we have a model that, like the human brain, is not bound to a fixed family of functions, and instead automatically finds the best fit for our data and generalizes well?
- Hidden layers represent hidden variables that cannot be measured or observed directly. The final output is computed only from the last hidden layer, but that layer summarizes the knowledge accumulated by all the layers before it.
The structure of a neural network
- This is a 3-layer neural network, since the input layer is not counted.
- E.g. $a^{[1]}_1$ is the activation (or value) of the first neuron in the first hidden layer.
- The weights of each layer are stored in a matrix of shape (number of units in the current layer, number of units in the previous layer), and the biases in a column vector of shape (number of units in the current layer, 1).
E.g. $W^{[1]}$.shape = (2, 3) and $b^{[1]}$.shape = (2, 1); the entry $W^{[1]}_{12}$ is the weight connecting $x_2$ to $a^{[1]}_1$.
$a^{[1]}_1 = g(f(x_1, x_2, x_3))$, where $f(x_1, x_2, x_3) = W^{[1]}_{11}x_1 + W^{[1]}_{12}x_2 + W^{[1]}_{13}x_3 + b^{[1]}_1$ is a linear function using the corresponding weights and bias, and $g$ is an activation function (see Activation functions below).
$W$ and $b$ are the parameters we want to learn.
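A minimal sketch of the forward computation for one layer, $a = g(Wx + b)$, using plain Python lists. Assumptions: the sigmoid stands in for $g$, the bias is stored as a flat list rather than an (n, 1) column, `layer_forward` is a hypothetical helper name, and the numbers in `W1`, `b1`, `x` are made up for illustration.

```python
import math
from typing import Callable, List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W: List[List[float]], b: List[float], x: List[float],
                  g: Callable[[float], float] = sigmoid) -> List[float]:
    """Hypothetical helper: a = g(W x + b) for one layer.

    W has shape (units in current layer, units in previous layer): one row per unit.
    b holds one bias per unit in the current layer (a flat list, not an (n, 1) column).
    """
    if len(W[0]) != len(x) or len(W) != len(b):
        raise ValueError("shape mismatch between W, b and x")
    a = []
    for row, bias in zip(W, b):
        z = sum(w * xk for w, xk in zip(row, x)) + bias  # linear part f
        a.append(g(z))                                   # activation g
    return a

# First hidden layer: 2 units fed by 3 inputs, so W1 is 2 x 3 and b1 has 2 entries.
# W1[0][1] is the weight connecting input x2 (index 1) to the first hidden unit.
W1 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.5]]
b1 = [0.0, 0.1]
x = [1.0, 2.0, 3.0]
a1 = layer_forward(W1, b1, x)
print(a1)  # two activations, one per hidden unit, each in (0, 1)
```

Stacking layers is just calling the same function again with the previous layer's activations as the new `x`.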
Implementation
Goal
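Minimize the MSE (mean squared error) to learn $W$ and $b$:
$J(W, b) = \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$
where $m$ is the number of training examples, $\hat{y}^{(i)}$ is the network's output for the $i$-th example, and $y^{(i)}$ is its target value.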
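A short sketch of the MSE cost, assuming the standard definition $\frac{1}{m}\sum_i (\hat{y}^{(i)} - y^{(i)})^2$ over $m$ examples with one output each. Some texts use $\frac{1}{2m}$ instead so the derivative loses its factor of 2; that only rescales the gradient.

```python
from typing import Sequence

def mse(y_hat: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error J = (1/m) * sum_i (y_hat_i - y_i)^2 over m examples."""
    m = len(y)
    if m == 0 or len(y_hat) != m:
        raise ValueError("need the same, non-zero number of predictions and targets")
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / m

print(mse([0.5, 1.0, 2.0], [0.0, 1.0, 3.0]))  # (0.25 + 0 + 1) / 3
```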
Backpropagation - Find the gradients of the cost with respect to each of the weights and biases, which tell us how to tune them
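A sketch of backpropagation for a small network (3 inputs, 2 sigmoid hidden units, 1 sigmoid output), assuming a single training example with squared-error cost $J = \sum_j (\hat{y}_j - y_j)^2$. The helper names (`forward`, `cost`, `backprop`) and parameter values are hypothetical. The chain rule runs from the output back to the first layer, and one entry is compared against a central finite difference, a common way to test a backprop implementation.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Return hidden activations a1 and output activations a2 (sigmoid in both layers)."""
    W1, b1, W2, b2 = params
    a1 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    a2 = [sigmoid(sum(w * ai for w, ai in zip(row, a1)) + b) for row, b in zip(W2, b2)]
    return a1, a2

def cost(params, x, y):
    """Squared error of one example: sum_j (a2_j - y_j)^2."""
    _, a2 = forward(params, x)
    return sum((p - t) ** 2 for p, t in zip(a2, y))

def backprop(params, x, y):
    """Gradients of cost() with respect to W1, b1, W2, b2 via the chain rule."""
    W2 = params[2]
    a1, a2 = forward(params, x)
    # Output layer: dJ/dz2 = dJ/da2 * sigmoid'(z2), and sigmoid'(z) = a * (1 - a)
    dz2 = [2.0 * (a - t) * a * (1.0 - a) for a, t in zip(a2, y)]
    dW2 = [[d * aj for aj in a1] for d in dz2]
    db2 = list(dz2)
    # Hidden layer: send dz2 back through W2 (multiply by W2 transposed), then through sigmoid'
    dz1 = [sum(W2[i][j] * dz2[i] for i in range(len(dz2))) * a1[j] * (1.0 - a1[j])
           for j in range(len(a1))]
    dW1 = [[d * xk for xk in x] for d in dz1]
    db1 = list(dz1)
    return dW1, db1, dW2, db2

# Hypothetical parameters: W1 is (2, 3), b1 has 2 entries, W2 is (1, 2), b2 has 1 entry.
params = ([[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]], [0.0, 0.1], [[0.7, -0.3]], [0.2])
x, y = [1.0, 2.0, 3.0], [1.0]
dW1, db1, dW2, db2 = backprop(params, x, y)

# Gradient check on W1[0][1] with a central finite difference
eps = 1e-6
W1_plus = [row[:] for row in params[0]]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in params[0]]
W1_minus[0][1] -= eps
numeric = (cost((W1_plus,) + params[1:], x, y)
           - cost((W1_minus,) + params[1:], x, y)) / (2 * eps)
print(dW1[0][1], numeric)  # the two values should agree to several decimal places
```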
Optimizer to achieve the goal -- Gradient descent
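A minimal sketch of gradient descent, assuming a single linear neuron $\hat{y} = wx + b$ with no activation so the gradients of the MSE have a short closed form; in the full network those gradients would come from backpropagation instead. The learning rate `alpha = 0.05`, the step count, and the toy data are arbitrary choices for illustration. Each step moves every parameter against its gradient: $w \leftarrow w - \alpha \frac{\partial J}{\partial w}$, $b \leftarrow b - \alpha \frac{\partial J}{\partial b}$.

```python
from typing import List, Tuple

def gradient_descent(xs: List[float], ys: List[float],
                     alpha: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minimizing J = (1/m) * sum_i (w*x_i + b - y_i)^2."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to w and b
        dw = 2.0 / m * sum(e * x for e, x in zip(errors, xs))
        db = 2.0 / m * sum(errors)
        # Step against the gradient, scaled by the learning rate alpha
        w -= alpha * dw
        b -= alpha * db
    return w, b

# Data generated from y = 2x + 1, so the fit should approach w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = gradient_descent(xs, ys)
print(w, b)
```

If `alpha` is too large the updates overshoot and diverge; for this particular data that happens above roughly 0.15.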
Terminology
Layers of a network
Activation functions
Why
Sigmoid
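A small sketch of the standard logistic sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$ and its derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, which is the factor backpropagation multiplies by at each sigmoid unit. Splitting on the sign of `z` inside `sigmoid` is an implementation choice that keeps `math.exp` from overflowing for very negative inputs. Printing a few values shows the derivative is largest (0.25) at $z = 0$ and nearly zero once $|z|$ is large.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)); maps any real z into [0, 1]."""
    # Split on sign so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_prime(z: float) -> float:
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z), sigmoid_prime(z))  # at z = 0: 0.5 and 0.25
```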
Disadvantage: