Sigmoid Function Explained: Definition, Formula, Graph, and Applications in Machine Learning
Giovanni Romero (giovanniromero.dev)

The sigmoid function is one of the most classic and important mathematical functions in machine learning, especially in logistic regression and binary classification. Although modern deep learning frequently uses ReLU and other activations, understanding the sigmoid is essential for anyone learning neural networks or probability-based models.

In this guide, we cover what the sigmoid function is, why it matters, how it works mathematically, its advantages and disadvantages, and its use in real code examples.


What Is the Sigmoid Function?

The sigmoid function (also known as the logistic function) transforms any real number into a value between 0 and 1.
This makes it ideal for probability estimation.

Mathematically, it is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Intuition Behind the Sigmoid Function

The sigmoid function has an S-shaped curve, meaning:

  • Large positive values of x push the output close to 1
  • Large negative values of x push the output close to 0
  • At x = 0, the output is exactly 0.5

This behavior makes sigmoid perfect for answering questions like:

"How likely is the input to be 1 instead of 0?"


Shape of the Sigmoid Function

The curve smoothly transitions between 0 and 1:

  • Output range: (0, 1)
  • Center point: 0.5
  • Smooth gradient everywhere (but small near extremes)

Mathematical Properties

1. Derivative

The derivative of the sigmoid is elegant and extremely useful:

$$\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$$

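Because the derivative is expressed in terms of the sigmoid's own output, backpropagation can reuse the value already computed in the forward pass instead of evaluating another exponential.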
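
One way to convince yourself of the derivative formula is to compare it against a numerical estimate. The sketch below uses only the standard library; the helper names (`sigmoid_grad`, `numeric_grad`) and the step size `h` are illustrative choices, not part of any library.

```python
import math

def sigmoid(x):
    """Logistic sigmoid for a single float."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Compare the closed form with the numerical estimate at a few points
for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    analytic = sigmoid_grad(x)
    numeric = numeric_grad(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to several decimal places, and the largest value (0.25) appears at x = 0.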

2. Limits

$$\lim_{x \to \infty} \sigma(x) = 1$$

$$\lim_{x \to -\infty} \sigma(x) = 0$$

Why Is the Sigmoid Function Used in Machine Learning?

1. Models probabilities

Sigmoid converts raw model outputs into normalized probabilities.

2. Smooth gradients

Its continuous derivative enables effective gradient descent.

3. Used in binary classification

In logistic regression, the model predicts:

$$P(y = 1 \mid x) = \sigma(w \cdot x + b)$$
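
As a rough illustration of the formula above, here is a minimal prediction step for a single example. The weights `w`, bias `b`, and input `x` are made-up numbers, and the 0.5 decision threshold is a common default rather than a requirement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x, b):
    """P(y=1 | x) = sigmoid(w . x + b) for one example."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and input, chosen only for illustration
w = [0.8, -1.2]
b = 0.1
x = [2.0, 1.0]

p = predict_proba(w, x, b)       # z = 0.8*2.0 - 1.2*1.0 + 0.1 = 0.5
label = 1 if p >= 0.5 else 0     # assumed threshold of 0.5
print(f"P(y=1|x) = {p:.4f}, predicted label = {label}")
```

In a real model, `w` and `b` would be learned from data rather than written by hand.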

4. Appears in neural networks

Common in:

  • Output layers for binary classification
  • Some recurrent architectures (e.g., the gates in LSTMs use sigmoid)

Limitations of the Sigmoid Function

Even though it's widely used, sigmoid has some downsides:

Vanishing gradients

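For inputs with large magnitude (strongly positive or strongly negative), the sigmoid saturates and its gradient becomes tiny, which slows learning. The gradient never exceeds 0.25, so stacking many sigmoid layers shrinks it further.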
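
A small numerical sketch makes the effect concrete. It prints the local gradient at a few inputs, then the best-case factor contributed by a chain of sigmoids. The chain calculation is a simplification that assumes every unit sits at x = 0 and ignores the weights between layers.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Local gradient: peaks at 0.25 for x = 0 and decays quickly as |x| grows
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  gradient={sigmoid_grad(x):.2e}")

# Best case through a chain of sigmoids (simplifying assumption:
# each unit is at x = 0 and all weights are 1), so each layer
# multiplies the gradient by at most 0.25
for depth in (1, 5, 10):
    print(f"depth={depth:2d}  max gradient factor={0.25 ** depth:.2e}")
```

Even in this best case, ten stacked sigmoids scale the gradient by less than one millionth.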

Outputs not zero-centered

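Outputs are always positive, so the gradients for all of the weights feeding a unit in the next layer share the same sign. This can make gradient updates zig-zag and slow convergence.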

Not ideal for deep hidden layers

ReLU and its variants do not saturate for positive inputs and often train faster in deep hidden layers.


Sigmoid Function in Code

Python (NumPy)

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))      # 0.5
print(sigmoid(5))      # ~0.993
print(sigmoid(-5))     # ~0.0067
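
The one-line formula above evaluates `exp(-x)`, which overflows when x is a large negative number (NumPy emits an overflow warning in that case, and `math.exp` raises `OverflowError`). A common workaround, sketched here for scalars with the standard library, is to branch on the sign of x so the exponential is only ever taken of a non-positive number. The function name `stable_sigmoid` is an illustrative choice; SciPy's `scipy.special.expit` is a ready-made alternative if SciPy is available.

```python
import math

def stable_sigmoid(x):
    """Sigmoid that never calls exp() on a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # x < 0, so z lies in (0, 1) or underflows to 0
    return z / (1.0 + z)     # algebraically equal to 1 / (1 + exp(-x))

print(stable_sigmoid(0.0))
print(stable_sigmoid(-1000.0))   # no overflow error
print(stable_sigmoid(1000.0))
```

Both branches compute the same function; they differ only in which form of the expression is evaluated.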

TensorFlow / Keras

import tensorflow as tf

x = tf.constant([0.0, 2.0, -2.0])
y = tf.nn.sigmoid(x)
print(y.numpy())       # approximately [0.5, 0.881, 0.119] (assumes TensorFlow 2.x eager mode)

PyTorch

import torch
import torch.nn as nn

sigmoid = nn.Sigmoid()
print(sigmoid(torch.tensor([0.0, 3.0, -3.0])))  # approximately [0.5, 0.953, 0.047]

When Should You Use Sigmoid?

Use sigmoid when:

  • You need probability outputs
  • You're solving a binary classification problem
  • You're modeling yes/no decisions
  • You need a smooth "soft threshold"

Do not use sigmoid:

  • In deep hidden layers
  • When vanishing gradients become a problem
  • For multi-class classification with mutually exclusive classes (use softmax instead)
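
To see how softmax relates to sigmoid, the sketch below implements a small softmax with the standard library and checks that, for two classes, it gives the same probability as a sigmoid applied to the difference of the two logits. The logit values are arbitrary examples.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Softmax with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Three mutually exclusive classes: the probabilities sum to 1
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Two classes: softmax reduces to a sigmoid of the logit difference
a, b = 1.5, -0.5
print(softmax([a, b])[0], sigmoid(a - b))
```

In that sense, sigmoid is the two-class special case of softmax, which is why it fits binary problems and softmax fits multi-class ones.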

FAQ About the Sigmoid Function

Is sigmoid the same as logistic regression?

Not exactly. Sigmoid is a mathematical function, while logistic regression is a model that applies sigmoid to a linear combination of the inputs to produce a probability.

Why is sigmoid used at the output layer?

Because it maps values to the range (0, 1), ideal for probability.

Is sigmoid still used in modern neural networks?

Yes, but mostly in:

  • Binary output layers
  • Certain recurrent architectures (gates in LSTMs)

Conclusion

The sigmoid function remains a foundational concept in machine learning. Its ability to map real numbers to the range (0, 1) makes it indispensable for probability modeling, logistic regression, and binary classification.

Even though newer activation functions dominate deep learning, mastering sigmoid is critical for understanding how neural networks and probabilistic models operate.

If you're learning AI, this is one function you absolutely must understand.
