In deep learning, logits refer to the raw output values produced by the final layer of a model before applying any normalization, such as Softmax. Understanding logits is crucial as they serve as the foundation for transforming model outputs into probabilities and making predictions.
This article covers:
Definition and Purpose of Logits
Relationship Between Logits and Softmax
Practical Example: From Logits to Probabilities
Applications and Considerations of Logits
Key Takeaways and Insights
1. Definition and Purpose of Logits
What Are Logits?
Logits are the raw scores output by the model's final layer (often a dense layer). They represent the unnormalized confidence of the model for each class. Logits are not probabilities—they can be positive, negative, or extremely large/small numbers.
Shape: If the model predicts C classes for a single input, the logits are a vector of size [C].
Properties:
They do not satisfy the constraints of probabilities (e.g., summing to 1 or being between 0 and 1).
They are a direct representation of the model's tendency or 'preference' for each class.
Why Are Logits Important?
Logits are an intermediate result in a model's prediction pipeline. They are transformed into probabilities through activation functions like Softmax, which are then used for decision-making or loss computation.
2. Relationship Between Logits and Softmax
The Softmax Function
Softmax is a mathematical function that transforms logits into probabilities. It is defined as:
P(y_i) = exp(z_i) / sum_{j=1..C} exp(z_j)
Where:
z_i: Logit for class i.
C: Total number of classes.
P(y_i): Normalized probability for class i, satisfying: sum(P(y_i)) = 1
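To make the definition concrete, here is a minimal sketch of the Softmax function in plain Python (the function name and example logits are illustrative, not from a specific library):

```python
import math

def softmax(logits):
    # Exponentiate each logit, then normalize so the outputs sum to 1.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Example: three arbitrary logits become a valid probability distribution.
probs = softmax([1.0, -0.5, 3.0])
```

Note that the outputs are all in [0, 1] and sum to 1, regardless of the sign or scale of the input logits.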
Difference Between Logits and Probabilities
Logits (Unnormalized Scores):
Range: any real number, from −∞ to +∞.
No probabilistic interpretation.
Softmax Output (Probabilities):
Range: [0, 1].
Represents the model's confidence in each class, summing to 1.
3. Practical Example: From Logits to Probabilities
Let's take a text classification task where the model predicts the category of a sentence. Suppose the task has three categories:
Class A: News
Class B: Entertainment
Class C: Technology
Input: The sentence "The new smartphone has amazing features."
Model's Logits Output: logits = [2.0, 1.0, 0.1]
Step 1: Apply Softmax
To convert logits to probabilities, apply the Softmax function.
The exponentials of the logits are exp(2.0) ≈ 7.39, exp(1.0) ≈ 2.72, and exp(0.1) ≈ 1.11, so the total sum of exponentials is:
sum = 7.39 + 2.72 + 1.11 ≈ 11.22
The probabilities for each class are:
P(A) = 7.39 / 11.22 ≈ 0.66
P(B) = 2.72 / 11.22 ≈ 0.24
P(C) = 1.11 / 11.22 ≈ 0.10
Interpretation
The model assigns the highest probability to Class A (News), indicating the sentence is most likely about news.
Class B (Entertainment) and Class C (Technology) have lower probabilities, reflecting weaker confidence in those predictions.
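The computation above can be reproduced in a few lines of plain Python (the variable names are illustrative):

```python
import math

logits = [2.0, 1.0, 0.1]           # Class A, Class B, Class C
exps = [math.exp(z) for z in logits]
total = sum(exps)                  # ≈ 11.22
probs = [e / total for e in exps]  # ≈ [0.66, 0.24, 0.10]
```

The largest probability lands on index 0 (Class A, News), matching the interpretation above.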
4. Applications and Considerations of Logits
Applications
Classification Tasks:
Logits are used as input to loss functions like cross-entropy, which compares the logits (or their normalized probabilities) with ground truth labels.
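One way to see why loss functions can accept raw logits: cross-entropy can be computed directly from them via the log-sum-exp identity, -log(softmax(z)[t]) = logsumexp(z) - z_t. A plain-Python sketch (the helper name is illustrative):

```python
import math

def cross_entropy_from_logits(logits, target):
    # -log(softmax(logits)[target]) rewritten as
    # logsumexp(logits) - logits[target], which avoids explicitly
    # forming probabilities and stays numerically stable.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Loss for the earlier example logits with ground-truth class A (index 0).
loss = cross_entropy_from_logits([2.0, 1.0, 0.1], target=0)
```

This matches -log(0.66) ≈ 0.42, the negative log of the probability the model assigned to the correct class.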
Inference:
During inference, instead of computing probabilities, we can directly use the index of the largest logit value for the predicted class:
Predicted Class = argmax(logits)
Because exponentiation is monotonically increasing, the largest logit always corresponds to the largest Softmax probability, so this shortcut avoids unnecessary computation while yielding the same predicted class.
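As a sketch, the argmax shortcut in plain Python, reusing the earlier example (the lambda-based argmax is just one idiom):

```python
logits = [2.0, 1.0, 0.1]
labels = ["News", "Entertainment", "Technology"]

# argmax over raw logits: no Softmax needed, since exp() preserves ordering.
predicted_index = max(range(len(logits)), key=lambda i: logits[i])
predicted_class = labels[predicted_index]
```

The predicted class is identical to what Softmax followed by argmax would produce.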
Considerations
Numerical Stability:
Large or small logits can cause numerical overflow or underflow during Softmax computation. To mitigate this, subtract the maximum logit from all logits before applying Softmax:
P(y_i) = exp(z_i - max(z)) / sum(exp(z_j - max(z)))
This adjustment ensures stable calculations without affecting the final probabilities.
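A sketch of the stabilized computation in plain Python (the function name is illustrative). On the logits below, a naive Softmax would fail outright, since math.exp(1000.0) raises OverflowError:

```python
import math

def stable_softmax(logits):
    # Subtracting max(z) scales numerator and denominator by the same
    # factor exp(-max(z)), so the probabilities are unchanged, but the
    # largest exponent is now exp(0) = 1, which cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = stable_softmax([1000.0, 999.0, 998.0])
```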
Gradient Behavior:
The magnitude of logits affects gradients during backpropagation, influencing model training dynamics. Proper initialization and regularization can help manage this.
Interpretability:
Logits are not human-readable probabilities but provide insight into how confident the model is about different classes before normalization.
5. Key Takeaways and Insights
Logits Are the Raw Model Outputs:
They represent unnormalized scores indicating the model's inclination toward different classes.
Softmax Converts Logits to Probabilities:
This transformation is essential for interpreting model predictions and training with probability-based loss functions.
Numerical Stability Is Critical:
Subtracting the maximum logit during Softmax computation avoids overflow and ensures robust results.
Efficiency in Inference:
For classification tasks, the maximum logit directly gives the predicted class, eliminating the need for Softmax in inference pipelines.
By understanding logits and their transformation into probabilities, you gain deeper insights into the inner workings of deep learning models and how they make predictions. With practical examples and careful considerations, logits can be harnessed effectively for various machine learning tasks.