ML: Neural Networks and Deep Learning

Introduction to Neural Networks

If you’ve ever wondered how machines attempt to mimic the human brain—without the crippling existential dread—welcome to neural networks. The idea is simple: neurons connected in layers, processing information, making decisions, and occasionally making us question reality.

Definition and History of Neural Networks

Neural networks have been around longer than you’d think—since the 1940s, actually. They went from an obscure academic concept to powering everything from recommendation systems to deepfake horrors. Frank Rosenblatt’s perceptron (1958) was the OG model, but the real breakthrough came in the 1980s, when backpropagation made it practical to train networks with more than one layer and actually learn something useful.

Biological Inspiration and Artificial Neuron Models

Biological neurons are messy, slow, and suffer from emotional instability. Artificial neurons, on the other hand, are simple mathematical functions that take inputs, multiply them by weights, add them up (plus a bias), and spit out an output. The most famous model? The perceptron. Think of it as the dumbest possible decision-maker: it either fires (1) or doesn’t (0).
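
To make that concrete, here is a minimal sketch of a single perceptron in plain Python: weighted sum plus bias, then a step function. The weights and inputs are made-up numbers chosen purely for illustration.

# A single perceptron: weighted sum of inputs plus a bias, passed through a step function.
def perceptron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0   # fires (1) or doesn't (0)

# These made-up weights happen to implement a logical AND of two binary inputs.
print(perceptron([1, 1], [0.5, 0.5], -0.7))  # 1
print(perceptron([1, 0], [0.5, 0.5], -0.7))  # 0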

Perceptron and Multilayer Perceptron (MLP)

A perceptron is great—if you only need to classify things like “cat or no cat.” But what if you need something more nuanced, like distinguishing between a cat, a dog, and your existential crisis? Enter the multilayer perceptron (MLP), which stacks multiple layers of neurons and learns increasingly complex features.

Forward and Backward Propagation

Forward propagation is when data moves from input to output, blissfully unaware of how wrong it is. Backpropagation is when the network realizes its mistakes: it computes the gradient of the loss with respect to every weight, and gradient descent then nudges each weight in the direction that reduces the error. It’s essentially an AI version of regret.
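
Here is a minimal sketch of that regret loop using PyTorch’s autograd, with a single weight and a single made-up data point: the forward pass makes a prediction, the loss measures how wrong it was, backward() computes the gradient, and the weight takes a small step downhill.

import torch

w = torch.tensor(2.0, requires_grad=True)          # one weight, deliberately wrong
x, y_true = torch.tensor(3.0), torch.tensor(9.0)   # made-up data point: we want w * x == 9

for step in range(3):
    y_pred = w * x                    # forward propagation
    loss = (y_pred - y_true) ** 2     # how wrong are we?
    loss.backward()                   # backpropagation: compute d(loss)/dw
    with torch.no_grad():
        w -= 0.01 * w.grad            # gradient descent step
        w.grad.zero_()
    print(step, loss.item(), w.item())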

Building Deep Neural Networks with TensorFlow & PyTorch

Deep learning frameworks like TensorFlow and PyTorch exist to make your life easier—or at least keep you from writing matrix operations by hand.

Understanding Layers, Weights, and Biases

A neural network consists of layers (input, hidden, output), weights (determining how strongly each input influences each neuron), and biases (helping the network shift decision boundaries). Think of weights as your effort in a relationship and bias as your unfair advantage.
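
Under the hood, each dense layer computes something like activation(inputs @ weights + bias). A quick NumPy sketch with made-up shapes (10 input features feeding 16 neurons), just to show the arithmetic:

import numpy as np

x = np.random.rand(1, 10)          # one sample with 10 input features
W = np.random.rand(10, 16)         # weights: how much each input matters to each of 16 neurons
b = np.zeros(16)                   # biases: shift each neuron's decision boundary

hidden = np.maximum(0, x @ W + b)  # one dense layer with a ReLU activation
print(hidden.shape)                # (1, 16)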

Implementing a Feedforward Neural Network in TensorFlow

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# A small feedforward network for binary classification:
# 10 input features -> 16 -> 8 -> 1 output probability.
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10,)),  # hidden layer 1
    layers.Dense(8, activation='relu'),                       # hidden layer 2
    layers.Dense(1, activation='sigmoid')                     # output layer: probability of class 1
])

# Binary cross-entropy pairs with the sigmoid output; Adam is a sensible default optimizer.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
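
To actually train it, you call fit on some data. Below is a sketch using random dummy arrays (not a real dataset, and the epoch/batch values are arbitrary), just to show the shape of the API:

import numpy as np

# Dummy data: 100 samples, 10 features, binary labels (purely illustrative).
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100,))

model.fit(X, y, epochs=5, batch_size=16, validation_split=0.2)
loss, acc = model.evaluate(X, y)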

Implementing a Feedforward Neural Network in PyTorch

import torch
import torch.nn as nn

# Same architecture as the Keras model above: 10 -> 16 -> 8 -> 1.
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 16)    # input -> hidden layer 1
        self.fc2 = nn.Linear(16, 8)     # hidden layer 1 -> hidden layer 2
        self.fc3 = nn.Linear(8, 1)      # hidden layer 2 -> output
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))   # squash the output to a probability
        return x

model = SimpleNN()
print(model)
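
Unlike Keras, PyTorch makes you write the training loop yourself. Here is a minimal sketch on random dummy data (assuming the model above; data shapes and hyperparameters are made up for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

# Dummy data: 100 samples, 10 features, binary labels (purely illustrative).
X = torch.rand(100, 10)
y = torch.randint(0, 2, (100, 1)).float()

criterion = nn.BCELoss()                              # pairs with the sigmoid output
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(X)               # forward pass
    loss = criterion(outputs, y)     # how wrong were we?
    loss.backward()                  # backpropagation
    optimizer.step()                 # weight update
    print(f"epoch {epoch}: loss {loss.item():.4f}")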

Handling Overfitting with Dropout and Batch Normalization

Deep networks love to overfit—memorizing training data instead of generalizing. Dropout randomly zeroes out a fraction of neurons during training (savage but effective), while batch normalization normalizes each layer’s activations, which stabilizes and accelerates training.
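
As a sketch, here is how those two layers might be slotted into the Keras model from earlier; the layer sizes and the 0.3 dropout rate are arbitrary choices for illustration, not recommendations.

from tensorflow import keras
from tensorflow.keras import layers

regularized = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10,)),
    layers.BatchNormalization(),   # normalize activations to stabilize and speed up training
    layers.Dropout(0.3),           # randomly zero 30% of activations during training
    layers.Dense(8, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')
])

PyTorch offers the same tools as nn.Dropout and nn.BatchNorm1d.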

Optimizers: SGD, Adam, RMSprop

Choosing the right optimizer is like picking the right weapon in a video game—each has strengths, weaknesses, and a tendency to ruin your life if misused.

Understanding Gradient Descent and Stochastic Gradient Descent (SGD)

Gradient descent is how networks learn: they adjust weights in the direction that minimizes the error. Stochastic gradient descent (SGD) speeds things up by estimating the gradient from small random subsets (mini-batches) of the data, making training noisier but often more effective.
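
Stripped of framework magic, the update rule is just w = w - learning_rate * gradient. A bare-bones sketch on a made-up one-parameter problem:

# Minimize f(w) = (w - 3)^2 by hand; the gradient is 2 * (w - 3).
w, lr = 0.0, 0.1
for step in range(20):
    grad = 2 * (w - 3)
    w = w - lr * grad        # the entire update rule: step opposite the gradient
print(w)                     # ~2.97, crawling toward the minimum at 3

SGD does exactly the same update, except the gradient is estimated from a random mini-batch rather than the full dataset.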

Adaptive Optimization Methods: Adam and RMSprop

Adam (Adaptive Moment Estimation) is the overachiever of optimizers—fast, adaptive, and usually reliable. RMSprop (Root Mean Square Propagation) smooths learning by normalizing updates. Choose wisely.

Choosing the Right Optimizer for Different Problems

  • Use SGD if you enjoy slow, painful convergence but want a theoretically sound approach.
  • Use Adam if you prefer faster convergence with reasonable accuracy.
  • Use RMSprop if you’re dealing with recurrent neural networks (RNNs) or other erratic models.

Activation Functions: ReLU, Sigmoid, Softmax

Activation functions prevent your network from being a boring linear classifier.

Importance of Activation Functions in Deep Learning

Without activation functions, your network is just a stack of linear equations pretending to be smart. They introduce non-linearity, which is essential for learning complex patterns.
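
You can see the problem directly: two linear layers with no activation in between collapse into a single linear map. A small NumPy sketch with made-up weights:

import numpy as np

x = np.random.rand(4)                                # made-up input with 4 features
W1, W2 = np.random.rand(4, 8), np.random.rand(8, 3)

deep = (x @ W1) @ W2                 # two "layers" with no activation in between...
shallow = x @ (W1 @ W2)              # ...are just one linear layer with merged weights
print(np.allclose(deep, shallow))    # True: stacking bought us nothing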

Rectified Linear Unit (ReLU) and Its Variants

ReLU is the go-to function—fast, simple, and effective. The downside? Neurons can “die”: if a neuron’s inputs stay negative, its output and gradient are stuck at zero and it stops learning. Leaky ReLU fixes this by allowing a small negative slope.
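
Both are one-liners; the 0.01 slope for Leaky ReLU below is a common default, not a magic number:

import numpy as np

def relu(x):
    return np.maximum(0, x)               # negative inputs become exactly 0

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)  # negative inputs keep a small gradient

print(relu(np.array([-2.0, 3.0])))        # [0. 3.]
print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.]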

Sigmoid and Hyperbolic Tangent (Tanh) Functions

Sigmoid squashes inputs between 0 and 1, making it great for probabilities but terrible for deep networks (hello, vanishing gradients). Tanh is like sigmoid but squashes between -1 and 1, which at least keeps the outputs zero-centered.
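
A quick sketch of both, showing how large inputs saturate (which is exactly where gradients vanish):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes to (0, 1)

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid(x))   # [~0.00005, 0.5, ~0.99995] -- flat at the extremes
print(np.tanh(x))   # [~-1.0, 0.0, ~1.0]        -- zero-centered, still flat at the extremes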

Softmax for Multi-Class Classification

Softmax is your go-to for multi-class classification, turning raw scores into probabilities. Just don’t use it for binary problems—sigmoid exists for that.
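
Softmax exponentiates the scores and normalizes them so they sum to 1; subtracting the maximum first is the standard trick to avoid numerical overflow. A sketch with made-up logits:

import numpy as np

def softmax(logits):
    z = logits - np.max(logits)        # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])     # raw scores for 3 classes (made up)
print(softmax(scores))                 # roughly [0.66, 0.24, 0.10] -- sums to 1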

Hands-On Exercises

Exercise 1: Building a Simple Neural Network

Implement a basic neural network using TensorFlow and PyTorch (code above). Train and evaluate it on dummy data.

Exercise 2: Comparing Optimizers

Train the same model using different optimizers and compare their performance.

import torch.optim as optim

# Note: for a fair comparison, each optimizer should start from identically
# initialized weights (e.g. re-create the model before attaching each one).
optimizers = {
    "SGD": optim.SGD(model.parameters(), lr=0.01),
    "Adam": optim.Adam(model.parameters(), lr=0.01),
    "RMSprop": optim.RMSprop(model.parameters(), lr=0.01)
}
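
One way to run the comparison, sketched with random dummy data and assuming the SimpleNN class from earlier; each optimizer gets a freshly initialized model so the race is fair:

import torch
import torch.nn as nn
import torch.optim as optim

X = torch.rand(100, 10)                      # dummy features (illustrative only)
y = torch.randint(0, 2, (100, 1)).float()    # dummy binary labels

optimizer_classes = {"SGD": optim.SGD, "Adam": optim.Adam, "RMSprop": optim.RMSprop}
results = {}

for name, opt_cls in optimizer_classes.items():
    net = SimpleNN()                          # fresh weights for each optimizer
    opt = opt_cls(net.parameters(), lr=0.01)
    criterion = nn.BCELoss()
    for epoch in range(20):
        opt.zero_grad()
        loss = criterion(net(X), y)
        loss.backward()
        opt.step()
    results[name] = loss.item()

print(results)                                # final training loss per optimizer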

Summary

  • Neural networks are loosely inspired by the human brain (but with fewer existential crises).
  • TensorFlow and PyTorch make deep learning easier.
  • Optimizers determine how well your network learns.
  • Activation functions make your network capable of learning complex patterns.
