ANN Building Blocks part 1

Bengt Sennblad, NBIS

Biological Neurons

[Figure: biological neuron]

Algorithm

  • Multiple inputs (on/off):
    • from 1-several neurons
  • Processing:
    • Combination: of inputs
    • Activation: on or off state
  • Single output (on/off): to 1-several neurons

Artificial neurons

[Figure: artificial neuron]

Algorithm

  • Multiple inputs:
    • from 1-several neurons
  • Processing:
    • Combination: of inputs – linear model
    • Activation: activation function
  • Single output: to 1-several neurons



Weighted linear combination of inputs:

\[ \begin{eqnarray*} z_j &=& \sum_{i} w_{i,j} a'_{i} + b_j\\ \textrm{weights}&& w_{i,j}\\ \textrm{bias}&& b_j \end{eqnarray*} \]

Activation function:
  • e.g., the Sigmoid (logistic) activation function
\[a_j = \sigma(z_j)\]
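A minimal numpy sketch of this computation (the input values, weights, and bias below are made-up illustration values, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a_in, w, b):
    """One artificial neuron: weighted linear combination + activation."""
    z = np.dot(w, a_in) + b   # z_j = sum_i w_{i,j} a'_i + b_j
    return sigmoid(z)         # a_j = sigma(z_j)

# made-up inputs and parameters, purely for illustration
a_in = np.array([0.2, 0.9, 0.4])
w = np.array([0.5, -0.3, 0.8])
b = 0.1
print(neuron(a_in, w, b))
```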

The Sigmoid Neuron




Weighted linear combination of inputs:
  • \(z_j = \sum_{i} w_{i,j} a'_{i} + b_j\)
Sigmoid/logistic activation function
  • \(a_j=\sigma(z_j) = \frac{1}{1+e^{-z_j}}\)
Compare with logistic GLM
  • Weighted linear combination of inputs:
    • \(z = \sum_{i} \beta_{i} x_{i} + \alpha\)
  • Sigmoid/logistic link function
    • \(Pr[y=1|x] = p = \sigma(z) = \frac{1}{1+e^{-z}}\)

… or equivalently

\(\begin{eqnarray} \sigma^{-1}(p) &=& \log\left(\frac{p}{1-p}\right) = \mathrm{logit}(p)\\ \mathrm{logit}(p) &=& \sum_{i}\beta_{i} x_{i} + \alpha \end{eqnarray}\)
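A quick numerical check of this equivalence (the value of \(z\) is arbitrary): the logit is the inverse of the sigmoid, so applying it to \(p\) recovers the linear predictor.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    return np.log(p / (1.0 - p))

z = 1.7                          # arbitrary linear predictor value
p = sigmoid(z)                   # Pr[y=1|x] under the logistic model
print(np.isclose(logit(p), z))   # True: logit(p) recovers z
```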

Example

[Figure: example neuron with weights and bias]

Let inputs be:
\(\begin{eqnarray} a'_1&=&1\\ a'_2&=&0\\ a'_3&=&1 \end{eqnarray}\)


and we have
\(\begin{eqnarray} z_1 &=& \sum_i w_{i,1}a'_i + b_1\\ a_1 &=& \sigma(z_1) \end{eqnarray}\)


With weights \(w_{1,1}=0.3\), \(w_{2,1}=0.8\), \(w_{3,1}=0.2\) and bias \(b_1=-0.5\) (as in the figure):

\(z_1 = 0.3 \times 1 + 0.8 \times 0 + 0.2 \times 1 - 0.5 = 0\)

\(a_1 = \sigma(z_1) = \frac{1}{1+e^{-0}} = 0.5\)
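The same arithmetic checked in code (weights and bias as in the example above):

```python
import numpy as np

a_prime = np.array([1.0, 0.0, 1.0])   # inputs a'_1, a'_2, a'_3
w = np.array([0.3, 0.8, 0.2])         # weights w_{i,1}
b = -0.5                              # bias b_1

z1 = np.dot(w, a_prime) + b           # 0.3*1 + 0.8*0 + 0.2*1 - 0.5 = 0.0
a1 = 1.0 / (1.0 + np.exp(-z1))        # sigma(0) = 0.5
print(z1, a1)                         # 0.0 0.5
```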

So, if a sigmoid artificial neuron is just another way of doing logistic regression …

… then what’s all the fuss about?

The fuss happens when you connect several neurons into a network

[Figure: neurons connected into a network]

Feed-forward artificial neural networks (ffANN)

Layers

  • “Columns” of 1-many neurons

  • A single Input layer

    • Input neurons receive data input and pass it to the next layer
  • 1-many Hidden layer(s)

    • Artificial neurons process their input and deliver output to the next layer
  • A single Output layer

    • Artificial neurons process their input and deliver the final output \(\hat{y}\)
      • output \(\hat{y}_j = a_j\)
      • Continuous \(\hat{y}\): Regression
      • Discrete \(\hat{y}\): Classification

Connectivity between layers

  • ffANNs are fully connected (“dense” layers)
    • each neuron in a layer is connected to each neuron in the next layer (see the sketch below)

[Figure: ANN1x]
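A minimal sketch of such a fully connected forward pass in numpy; the layer sizes and the random weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    """Fully connected layer: every input neuron feeds every neuron here."""
    return sigmoid(W @ a_in + b)

# made-up architecture: 3 inputs -> 4 hidden -> 2 hidden -> 1 output
sizes = [3, 4, 2, 1]
params = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

a = rng.normal(size=3)   # input layer activations
for W, b in params:      # two hidden layers, then the output layer
    a = dense(a, W, b)
print(a)                 # final output y_hat
```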

ffANN examples

[Figure: ANN1]

Other drawing style, omitting \(w\) and \(b\).

[Figure: ANN1alt]

[Figure: ANN2]

Often layers are ‘boxed’

[Figure: ANN2alt]

ffANN examples

Layers with >1 dimension (e.g., images) – (messy!)

[Figure: multi-dimensional layers with all nodes and arrows drawn]

Simplify! Nodes and arrows implicit.

[Figure: simplified drawing, nodes and arrows implicit]

Collect similar layers into ‘blocks’.

[Figure: similar layers collected into blocks]

ffANN examples

Also other types of layers/blocks (cf. coming lectures)

[Figure: other layer/block types]

Hidden Layers

Intuitive function of hidden layers?

  • Each layer can be viewed as transforming the original data to a new multi-dimensional space.
  • A hidden layer should, in practice, have at least two neurons to be meaningful
    • A single-neuron layer collapses information and forms a bottleneck
    • An early bottleneck heavily constrains the NN (see the sketch below)
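A shape-level illustration of the bottleneck point (layer sizes made up): a single-neuron hidden layer squeezes everything before it into one number, which is all that later layers get to see.

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(size=10)               # 10-dimensional input
W1 = rng.normal(size=(1, 10))         # bottleneck: a hidden layer with ONE neuron
h = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # shape (1,): all information collapsed
print(h.shape)                        # (1,) -- later layers see a single value
```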

Depth of ANN

  • the number of hidden layers + the output layer

Deep Learning

  • Formally, ANNs with depth > 1
    • (often including more advanced layer types as well)

Why Deep Learning?

For Regression

  • Single layer \(\approx\) logistic regression
  • More layers \(\rightarrow\)
    • more complex, non-linear models



For Classification

  • Single layer \(\approx\) one hyperplane
  • Adding layers \(\rightarrow\)
    • more hyperplanes \(\rightarrow\)
    • more advanced classification
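As a concrete illustration, here is a hand-wired sketch (weights picked by hand, not trained) of a one-hidden-layer network computing XOR, a classification that no single hyperplane can achieve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights: hidden unit 1 ~ OR, hidden unit 2 ~ AND,
# output ~ (OR and not AND), i.e. XOR. Large weights saturate the sigmoids.
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])
b2 = -10.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = sigmoid(W1 @ np.array(x, dtype=float) + b1)   # hidden layer
    y = sigmoid(W2 @ h + b2)                          # output layer
    print(x, round(float(y)))   # 0, 1, 1, 0: XOR
```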

Mini exercise

  • http://playground.tensorflow.org/
    • Try different input “problems”
    • Investigate how depth and width affect classification
      • number of hidden layers (depth)
      • number of neurons per layer (width)
    • Run for several epochs (=iterations)