Part 2: Types of Machine Learning
Exploring the Learning Paradigms That Power AI Systems
In our first session, we covered the fundamentals of machine learning. Now let's explore the main approaches to machine learning, each with its own strengths and ideal use cases. Understanding these different paradigms will help you choose the right approach for your specific problems.
Supervised Learning: Learning from Examples
Supervised learning is the most common form of machine learning and likely what most people think of when they hear "AI." It's called "supervised" because the algorithm learns from labeled examples - much like a student learning from exercises where the correct answers are provided.
How Supervised Learning Works
You provide a dataset where each example has:
Features (inputs): The attributes or characteristics
Labels (outputs): The correct answers
The algorithm learns to map inputs to outputs by finding patterns in the training data
Once trained, the model can predict outputs for new, unseen inputs
Think of it like this: You're teaching a child to identify fruits. You show them many examples - "This is an apple" (round, red), "This is a banana" (long, yellow) - and eventually, they learn to recognize new fruits they've never seen before based on their characteristics.
The Mathematics Behind Supervised Learning
At its core, supervised learning is about finding a function f(x) that maps inputs x to outputs y. Mathematically:
y = f(x) + ε
Where:
x represents the input features
y represents the output (label)
f is the function we're trying to learn
ε represents error or noise
The learning process involves minimizing a loss function that measures the difference between predicted outputs and actual labels. For classification, this might be cross-entropy loss; for regression, it might be mean squared error.
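To make the two loss functions concrete, here is a minimal sketch in plain Python. The function names and the toy numbers are illustrative assumptions, not part of any particular library.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between labels and predictions (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so we never take log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Regression: errors of 1 and 2 give (1 + 4) / 2 = 2.5
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])

# Classification: confident correct predictions score lower than confident wrong ones
bce_good = binary_cross_entropy([1, 0], [0.9, 0.1])
bce_bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Both functions return a single number that gets smaller as predictions get closer to the labels, which is what training tries to achieve.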
Types of Supervised Learning Tasks
There are two main categories of supervised learning tasks:
Classification: Predicting a category or class
Example problems:
Is this email spam or not spam?
Which digit (0-9) is in this image?
What species of iris is this flower?
Popular algorithms:
Logistic Regression: Creates a decision boundary by estimating probabilities using a logistic function
Decision Trees: Creates a tree-like model of decisions based on feature values
Random Forests: Ensembles of decision trees, each trained on a random (bootstrap) sample of the data and considering a random subset of features at each split
Support Vector Machines: Finds the hyperplane that best separates classes with maximum margin
Neural Networks: Layers of interconnected nodes that learn complex patterns through backpropagation
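As one concrete instance of classification, here is a rough sketch of logistic regression trained with gradient descent on a single feature. The data (hours studied versus pass/fail), the learning rate, and the number of epochs are all made-up values for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) on 1-D inputs by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average cross-entropy loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Classify as 1 when the estimated probability is at least 0.5."""
    return int(sigmoid(w * x + b) >= 0.5)

# Hypothetical toy data: hours studied -> passed (1) or failed (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)
```

After training, the point where `w * x + b` equals zero is the decision boundary: inputs on one side are classified as 0, inputs on the other as 1.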
Regression: Predicting a continuous value
Example problems:
What will this house sell for?
How many units of this product will sell next month?
What will the temperature be tomorrow?
Popular algorithms:
Linear Regression: Models the relationship as a linear equation (y = mx + b)
Polynomial Regression: Extends linear regression to capture non-linear relationships
Ridge and Lasso Regression: Add regularization to prevent overfitting
Decision Tree/Random Forest Regression: Create tree-like models that predict continuous values
Neural Networks: With enough hidden units, can approximate any continuous function on a bounded input range to arbitrary accuracy
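For the simplest regression case, y = mx + b with one feature, the best-fitting line has a closed-form solution. The sketch below assumes made-up house sizes and prices that lie exactly on the line y = 2x + 1, so the fit should recover those values.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x          # slope
    b = mean_y - m * mean_x     # intercept
    return m, b

# Hypothetical data generated from y = 2x + 1 with no noise
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(sizes, prices)   # m is 2.0 and b is 1.0 for this data
```

With real, noisy data the line won't pass through every point; it is the line that minimizes the mean squared error.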
The Supervised Learning Process in Detail
Data Collection: Gather relevant labeled examples representative of your problem
Data Preparation:
Clean data (handle missing values, outliers)
Split into training (70-80%), validation (10-15%), and test sets (10-15%)
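Normalize or standardize numerical features (often to mean 0, standard deviation 1), using statistics computed on the training set only so that no information leaks from the validation or test sets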
Encode categorical features (one-hot encoding, label encoding)
Model Selection: Choose an algorithm based on problem type, data size, interpretability needs
Training:
Algorithm learns parameters that minimize error on training data
For neural networks, this means adjusting weights through backpropagation
For decision trees, this means finding optimal splits of the data
Validation:
Tune hyperparameters (settings that aren't learned during training)
Use techniques like cross-validation to ensure robustness
Testing: Evaluate final performance on previously unseen test data
Deployment and Monitoring: Use the model and track performance over time
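The data preparation steps above can be sketched in a few lines. This is a simplified illustration that assumes a 70/15/15 split and a single numeric feature; the helper names are hypothetical.

```python
import random
import statistics

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and split them into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

def fit_standardizer(values):
    """Learn mean and standard deviation from the training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std

def standardize(values, mean, std):
    """Rescale values using previously learned statistics."""
    return [(v - mean) / std for v in values]

data = list(range(100))                      # stand-in for 100 feature values
train, val, test = train_val_test_split(data)
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)   # reuse the training statistics
```

Note that the test set is scaled with the mean and standard deviation learned from the training set, so the final evaluation stays an honest estimate of performance on unseen data.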
A central concept in supervised learning is the balance between underfitting (the model is too simple to capture the pattern) and overfitting (the model memorizes the training data but doesn't generalize to new data). Techniques like regularization, cross-validation, and choosing an appropriate model complexity help find the right balance.
Unsupervised Learning: Finding Hidden Patterns
While supervised learning works with labeled data, unsupervised learning tackles a more open-ended challenge: finding structure in data without explicit guidance. It's like giving a child a pile of toys and watching how they naturally sort them - by color, size, or function - without telling them how.
The Theory Behind Unsupervised Learning
Unsupervised learning algorithms try to model the underlying structure or distribution of data to learn more about it. Unlike supervised learning, there's no straightforward way to evaluate success - it depends on how useful the discovered patterns are for your application.
Mathematically, unsupervised learning often involves:
Finding clusters that minimize within-cluster variance
Identifying lower-dimensional representations that preserve important relationships
Modeling the probability distribution that generated the data
Types of Unsupervised Learning Tasks
Clustering: Grouping similar data points together
Mathematical foundations:
Distance metrics (Euclidean, Manhattan, cosine similarity)
Similarity and dissimilarity measures
Centroid-based vs. density-based approaches
Popular algorithms:
K-Means: Iteratively assigns points to the nearest centroid, then updates centroids
Hierarchical Clustering: Builds a tree of clusters by iteratively merging or splitting groups
DBSCAN: Defines clusters as dense regions separated by sparse regions, controlled by a neighborhood radius ε and a minimum number of points required within that radius
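Here is a minimal sketch of the K-Means loop on 2-D points. To keep the run reproducible, the starting centroids are passed in explicitly, which is an assumption of this example; in practice they are usually chosen at random or with a smarter scheme such as k-means++.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [(point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids]
    return dists.index(min(dists))

def kmeans(points, initial_centroids, iters=20):
    """Alternate between assigning points and updating centroids."""
    centroids = list(initial_centroids)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:                         # assignment step
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):   # update step
            if members:                          # keep old centroid if cluster is empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Two made-up, well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, initial_centroids=[(0, 0), (0, 1)])
```

Even though both starting centroids sit inside the first group, the loop pulls one of them over to the second group within a couple of iterations. K-Means can still get stuck in poor solutions with unlucky starting points, which is why it is commonly run several times.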
Dimensionality Reduction: Simplifying data while preserving important information
Mathematical foundations:
Linear vs. non-linear transformations
Variance preservation
Information theory concepts (mutual information, entropy)
Popular algorithms:
Principal Component Analysis (PCA): Projects data onto orthogonal axes of maximum variance
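t-SNE: Models pairwise similarities as probability distributions in both the high-dimensional and low-dimensional spaces, then arranges the low-dimensional points so the two distributions match as closely as possible; mainly used for visualization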
Autoencoders: Neural networks that compress data through a bottleneck layer, forcing it to learn efficient representations
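To illustrate PCA's "axis of maximum variance" idea, the sketch below finds the first principal component of 2-D points. It relies on a closed-form angle that only works for the 2x2 case; real implementations use a general eigenvalue or SVD routine. The sample points are made up.

```python
import math

def first_principal_component(points):
    """Direction of maximum variance for 2-D points, via the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # For a symmetric 2x2 matrix, the leading eigenvector has this closed-form angle
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))
    # Project the centered points onto that direction: 2-D data becomes 1-D
    projected = [
        (p[0] - mx) * direction[0] + (p[1] - my) * direction[1] for p in points
    ]
    return direction, projected

points = [(1, 1), (2, 2), (3, 3), (4, 4)]   # all points lie on the line y = x
direction, projected = first_principal_component(points)
```

Because these points lie exactly on a diagonal line, the principal direction is the diagonal itself and the 1-D projection loses no information.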
Association Rule Learning: Discovering relationships between variables
Mathematical foundations:
Support (frequency of itemsets)
Confidence (conditional probability)
Lift (ratio of the observed support of a rule to the support expected if its items were independent; values above 1 suggest a positive association)
Popular algorithm:
Apriori algorithm: Finds frequent itemsets level by level (breadth-first), discarding any candidate that contains an infrequent subset
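Support, confidence, and lift are easy to compute by hand on a small example. The shopping baskets below are invented for illustration, and the sketch only evaluates a single rule rather than running the full Apriori search.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by how often the consequent appears overall."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})        # 2 of 4 baskets = 0.5
c = confidence(baskets, {"bread"}, {"milk"})   # 0.5 / 0.75, about 0.67
l = lift(baskets, {"bread"}, {"milk"})         # about 0.67 / 0.75, about 0.89
```

A lift below 1, as here, means bread buyers are slightly less likely than average to buy milk in this toy dataset.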
Evaluation in Unsupervised Learning
Since there are no labels to compare against, evaluation is challenging and often domain-specific:
Clustering metrics:
Silhouette score: Measures how similar points are to their own cluster compared to other clusters (-1 to 1, higher is better)
Davies-Bouldin index: For each cluster, takes the worst ratio of within-cluster spread to between-cluster separation, then averages over clusters (lower is better)
Visual inspection: Often crucial for confirming meaningful clusters
Dimensionality reduction:
Reconstruction error: How well original data can be recovered
Downstream task performance: How useful the reduced representation is for classification/regression
Visualization quality: For 2D/3D projections, how well separated known groups appear (when some labels or categories are available for checking)
The interpretation of results often requires domain expertise to determine if the discovered patterns are meaningful or just statistical artifacts.
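The silhouette score mentioned above can be sketched as follows. This is a straightforward, unoptimized version that assumes Euclidean distance and at least two clusters; the four sample points are made up.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:              # a cluster of one point gets a score of 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])   # groups nearby points together
bad = silhouette_score(points, [0, 1, 0, 1])    # mixes far-apart points
```

The sensible grouping scores close to 1, while the grouping that pairs distant points scores below 0.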
Reinforcement Learning: Learning from Experience
Reinforcement learning (RL) is fundamentally different from both supervised and unsupervised learning. Instead of learning from static data, RL agents learn by interacting with an environment and receiving feedback on their actions. It's like teaching a dog new tricks through treats and praise.
The Mathematical Framework of Reinforcement Learning
RL is typically formalized as a Markov Decision Process (MDP) with:
A set of states S
A set of actions A
A transition function P(s'|s,a) defining the probability of moving from state s to s' when taking action a
A reward function R(s,a,s') giving the immediate reward
A discount factor γ ∈ [0,1] controlling the importance of future rewards
The goal is to learn a policy π(a|s) that maximizes expected cumulative reward:
E[∑ₜ γᵗ R(sₜ, aₜ, sₜ₊₁)], where the sum runs over time steps t = 0, 1, 2, ...
This leads to two important concepts:
Value function: V(s) - The expected return starting from state s and following policy π afterwards
Q-function: Q(s,a) - The expected return starting from state s, taking action a, and following policy π afterwards
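The "return" that both functions estimate is just a discounted sum of rewards. A small sketch, with made-up reward sequences, shows how the discount factor γ changes its value:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * reward_t over a finite episode."""
    total = 0.0
    for r in reversed(rewards):   # work backwards: G_t = r_t + gamma * G_{t+1}
        total = r + gamma * total
    return total

g_half = discounted_return([1, 1, 1], gamma=0.5)     # 1 + 0.5 + 0.25 = 1.75
g_late = discounted_return([0, 0, 10], gamma=0.9)    # 10 * 0.9**2, about 8.1
```

A smaller γ makes the agent care mostly about immediate rewards; a γ close to 1 makes distant rewards count almost as much as immediate ones.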
Core RL Algorithms and Approaches
Value-Based Methods:
Learn the value of states or state-action pairs
Examples: Q-learning, Deep Q-Networks (DQN)
Update value estimates from observed rewards and transitions using the Bellman equation
Policy-Based Methods:
Directly learn the policy without an intermediate value function
Examples: REINFORCE, Proximal Policy Optimization (PPO)
Often use gradient ascent to maximize expected rewards
Actor-Critic Methods:
Combine value-based and policy-based approaches
Actor (policy) determines actions; Critic (value function) evaluates those actions
Examples: Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG)
Model-Based Methods:
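Learn a model of the environment dynamics (or use a known one, such as the rules of a board game)
Use the model for planning or imagined experiences
Examples: MuZero (learns its own model), AlphaZero (plans using the known game rules)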
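To make the value-based family concrete, here is a small sketch of tabular Q-learning. The environment is a hypothetical five-state corridor invented for this example: the agent starts at the left end, can step left or right, and receives a reward of 1 for reaching the right end. The hyperparameter values are arbitrary choices.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D chain with actions 0 (left) and 1 (right)."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Explore with probability epsilon (and whenever the values are tied)
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = q[s].index(max(q[s]))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Bellman target: immediate reward plus discounted best future value
            target = reward if s_next == goal else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q

q = q_learning_chain()
greedy_policy = [row.index(max(row)) for row in q[:-1]]   # best action per state
```

After training, the greedy policy chooses "right" in every state, and the learned values shrink by a factor of γ for each extra step away from the goal.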
The Exploration-Exploitation Dilemma
A fundamental challenge in RL is balancing:
Exploration: Trying new actions to discover better strategies
Exploitation: Using known good actions to maximize reward
Common strategies include:
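ε-greedy: Take a random action with probability ε, and the best-known action otherwise
Boltzmann (softmax) exploration: Sample actions with probability proportional to exp(Q(s,a)/τ), where the temperature τ controls how random the choice is
Upper Confidence Bound (UCB): Favor actions whose estimated value plus an uncertainty bonus is highest, so rarely tried actions still get explored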
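The three strategies can be sketched as small action-selection functions. The function names and default parameters are illustrative assumptions; the UCB variant shown is the common UCB1 form for bandit problems.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return q_values.index(max(q_values))

def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over estimated values; higher temperature means more randomness."""
    m = max(q_values)   # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def ucb1(q_values, counts, t, c=2.0):
    """Pick the action with the highest value plus exploration bonus.

    counts[a] is how often action a was tried; t is the total number of tries.
    """
    for a, n in enumerate(counts):
        if n == 0:      # try every action at least once
            return a
    scores = [q + math.sqrt(c * math.log(t) / n) for q, n in zip(q_values, counts)]
    return scores.index(max(scores))

rng = random.Random(0)
greedy_choice = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)  # always action 1
probs = boltzmann_probs([1.0, 2.0])
ucb_choice = ucb1([1.0, 0.9], counts=[100, 1], t=101)
```

In the UCB call, action 1 has a slightly lower estimate but has only been tried once, so its uncertainty bonus is large and it gets selected.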
Applications of Reinforcement Learning
RL has shown remarkable success in several domains:
Gaming: DeepMind's AlphaGo defeated world champions in Go, a game with more legal board positions than there are atoms in the observable universe
Robotics: Teaching robots to walk, grasp objects, or navigate complex environments
Resource management: Optimizing data center cooling, power grid management
Recommendation systems: Balancing exploration (suggesting new content) with exploitation (recommending what users likely enjoy)
Autonomous vehicles: Learning driving policies through simulation
Specialized Learning Paradigms
Beyond the three main types, several specialized approaches combine elements or address specific scenarios:
Semi-Supervised Learning
This approach uses a small amount of labeled data with a large amount of unlabeled data. It's practical when labeling is expensive or time-consuming.
Theoretical foundations:
Smoothness assumption: Points close to each other likely have similar labels
Cluster assumption: Points in the same cluster likely have the same label
Manifold assumption: Data lies on a lower-dimensional manifold within the feature space
Common approaches:
Self-training: Model trained on labeled data makes predictions on unlabeled data, then retrains using high-confidence predictions as additional labeled examples
Co-training: Multiple models trained on different views of the data teach each other
Graph-based methods: Propagate labels through a similarity graph of examples
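Self-training is the easiest of these to sketch. The example below uses a deliberately simple nearest-centroid classifier on 1-D data with two classes, and a made-up confidence score based on distance ratios; both are assumptions chosen to keep the code short.

```python
def fit_centroids(xs, ys):
    """Nearest-centroid classifier on 1-D inputs: store one mean per class."""
    return {
        c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
        for c in set(ys)
    }

def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a simple distance ratio in [0.5, 1]."""
    (d1, label), (d2, _) = sorted((abs(x - m), c) for c, m in centroids.items())[:2]
    confidence = d2 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return label, confidence

def self_train(xs, ys, unlabeled, threshold=0.75, max_rounds=5):
    """Repeatedly add high-confidence predictions to the training set."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:            # nothing new to learn from
            break
        for x, label in confident:   # pseudo-labels join the training set
            xs.append(x)
            ys.append(label)
        pool = [x for x, _ in remaining]
    return fit_centroids(xs, ys), pool

# Two labeled examples plus six unlabeled ones (all values are made up)
centroids, leftover = self_train(
    xs=[0.0, 10.0], ys=[0, 1], unlabeled=[1.0, 2.0, 4.0, 6.0, 8.0, 9.0]
)
```

Points near a class center are absorbed as pseudo-labeled examples, while the ambiguous points in the middle (4.0 and 6.0) stay unlabeled. The threshold matters: set it too low and the model can reinforce its own mistakes.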
Self-Supervised Learning
A clever approach where the data provides its own supervision:
Key concept: Create a pretext task from unlabeled data by hiding or transforming part of the input and training the model to predict what was hidden or changed
Examples in natural language processing:
Next word prediction: "The cat sat on the ___" (mat)
Masked language modeling: "The ___ sat on the mat" (cat)
Order prediction: Predict whether a sequence of words or sentences is in its original order
Examples in computer vision:
Image rotation: Predict how much an image was rotated
Jigsaw puzzles: Rearrange shuffled patches of an image
Colorization: Predict colors from grayscale images
Self-supervised learning often produces representations that capture meaningful semantic properties, making them useful for downstream tasks with minimal fine-tuning.
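The two NLP pretext tasks above amount to turning raw text into (input, label) pairs without any human labeling. A minimal sketch, assuming simple whitespace tokenization and a hypothetical "[MASK]" placeholder token:

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, rng):
    """Hide one token; the hidden token becomes the training label."""
    position = rng.randrange(len(tokens))
    masked = list(tokens)
    target = masked[position]
    masked[position] = MASK
    return masked, position, target

def make_next_word_examples(tokens):
    """Every prefix of the sentence predicts the word that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

sentence = "the cat sat on the mat".split()
masked, position, target = make_masked_example(sentence, random.Random(0))
pairs = make_next_word_examples(sentence)
# The last pair is (["the", "cat", "sat", "on", "the"], "mat")
```

A single six-word sentence already yields five next-word training pairs, which hints at why large text collections provide so much training signal for free.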
Transfer Learning
This technique reuses knowledge gained from solving one problem to help solve a different but related problem:
Types of transfer learning:
Domain adaptation: Same task, different domains (e.g., sentiment analysis for electronics reviews vs. restaurant reviews)
Task adaptation: Same domain, different tasks (e.g., image classification vs. object detection on similar images)
Multi-task learning: Learning multiple related tasks simultaneously
Common approaches:
Feature extraction: Reuse pre-trained model's representations without modifying them
Fine-tuning: Start with pre-trained weights and update them on new data
Few-shot learning: Adapt a model to new classes with very few examples
Underlying idea: Many tasks share lower-level features (such as edges and textures in images, or word and grammar patterns in text), so only the higher-level representations need to be specialized for a new task. This is why starting from a pre-trained model has become standard practice in NLP and computer vision.
Thought Exercise: Understanding ML in Everyday Applications
Rather than a technical coding exercise, let's engage in a thought exercise that requires no prior programming knowledge:
Scenario: Imagine an email spam filter using machine learning.
Questions to consider:
What type of machine learning would this be? (Supervised, unsupervised, or reinforcement?)
What features (inputs) might be useful for the model to consider? Think about:
Words or phrases in the email
Sender information
Time of day
Email structure elements
How would you collect labeled data to train this system? What challenges might arise?
What would happen if:
You trained your model only on business emails?
Spammers changed their tactics?
You accidentally labeled some legitimate emails as spam?
How might you evaluate if your spam filter is performing well?
Reflection: This exercise demonstrates the practical application of supervised learning concepts without requiring coding knowledge. It shows how machine learning systems in our daily lives make decisions based on patterns in data, and the importance of representative training data.
Key Takeaways
Supervised learning requires labeled data and excels at making specific predictions
Unsupervised learning discovers hidden structure in data without predetermined categories
Reinforcement learning learns optimal behavior through interaction and feedback
Semi-supervised and self-supervised learning leverage limited or implicit supervision
Transfer learning reuses knowledge across tasks, often greatly reducing the amount of labeled data a new task needs
The machine learning approach you choose depends on:
The nature of your problem (prediction, pattern discovery, sequential decision-making)
The data available (labeled, unlabeled, interactive environment)
Your goal (accuracy, interpretability, or novel insights)
With a solid understanding of these learning paradigms, you're ready to explore the exciting world of generative AI in our next session, where we'll see how models can create entirely new content like text, images, and more.
Discussion Question: Think about a smartphone app or website you use regularly. How might it be using machine learning? What type of learning would be most appropriate for its features? What data do you think it collects to make its predictions or recommendations?