Feature Scaling in Data Science
Feature scaling is a vital preprocessing step in data science, ensuring that all features contribute fairly to machine learning models. This module explains normalization and standardization in depth, with formulas, examples, and guidance on when to use each technique.

Feature scaling is a crucial data preprocessing step in machine learning and data science. Many algorithms rely on distance metrics or gradient-based optimization. If the features of a dataset exist on different scales, the algorithm may become biased toward those with larger magnitudes, leading to poor performance or unstable training.
Consider an example with two features: Age (0–100) and Income (0–100,000). If we use a distance-based algorithm such as k-nearest neighbors (kNN), the differences in income values will dominate the distance computation, while the contribution of age becomes negligible. Similarly, gradient-based algorithms like logistic regression or neural networks will converge more slowly if features are on vastly different scales.
To resolve this, we apply feature scaling. This transforms features into comparable ranges or distributions, ensuring that each contributes appropriately to the model. The two most common methods are Normalization and Standardization.
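As a quick sketch of the age/income example (assuming NumPy is available; the two people below are hypothetical), the raw Euclidean distance used by a method like kNN is dominated by income, while scaled features contribute comparably:

```python
import numpy as np

# Two hypothetical people described by [age, income]
a = np.array([25, 40_000])
b = np.array([60, 41_000])

# Raw Euclidean distance: a 1,000 income gap swamps a 35-year age gap
print(np.linalg.norm(a - b))                 # ≈ 1000.6

# Min-max scaling with the ranges quoted above (age 0-100, income 0-100,000)
a_scaled = np.array([25 / 100, 40_000 / 100_000])
b_scaled = np.array([60 / 100, 41_000 / 100_000])

# After scaling, the age difference drives the distance
print(np.linalg.norm(a_scaled - b_scaled))   # ≈ 0.35
```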
1. Normalization (Min–Max Scaling)
Definition:
Normalization is a technique that rescales the values of a feature into a fixed range, typically between 0 and 1. This makes the data easier to compare across features, especially when they have very different scales.
Formula:
𝑥' = (𝑥 − 𝑥min) / (𝑥max − 𝑥min)
Where:
- 𝑥 = original value
- 𝑥min = minimum value of the feature
- 𝑥max = maximum value of the feature
- 𝑥' = normalized value (between 0 and 1)
Example:
Let's consider the following brief example:
- Income = [30,000, 50,000, 70,000, 90,000, 110,000]
- Minimum (𝑥min) = 30,000
- Maximum (𝑥max) = 110,000
- For 50,000, for instance: 𝑥' = (50,000 − 30,000) / (110,000 − 30,000) = 0.25
Normalized values = [0, 0.25, 0.5, 0.75, 1].
Example in Python:
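The following is a minimal sketch using scikit-learn's MinMaxScaler with the income values from the example above (it assumes NumPy and scikit-learn are installed):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Income values from the example above, shaped as a single-feature column
income = np.array([[30_000], [50_000], [70_000], [90_000], [110_000]])

scaler = MinMaxScaler()                  # default feature_range is (0, 1)
normalized = scaler.fit_transform(income)

print(normalized.ravel().tolist())
```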
Output:
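```
[0.0, 0.25, 0.5, 0.75, 1.0]
```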
When to Use:
- Distance-based algorithms: k-Nearest Neighbors (kNN), clustering (e.g., k-Means), where distance is influenced by scale.
- Neural networks: Helps stabilize gradient descent by keeping input values bounded within a range.
Limitations:
- Sensitive to outliers: Extreme values will stretch the range, making most values very small.
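A quick sketch of this sensitivity, using the same incomes plus one hypothetical extreme value:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same incomes as before, but with one extreme outlier
income = np.array([[30_000], [50_000], [70_000], [90_000], [1_000_000]])
scaled = MinMaxScaler().fit_transform(income)

# The ordinary incomes are squeezed into a narrow band near 0
print([round(float(v), 3) for v in scaled.ravel()])   # [0.0, 0.021, 0.041, 0.062, 1.0]
```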
2. Standardization (Z-Score Scaling)
Definition:
Standardization transforms data such that the resulting distribution has a mean of 0 and a standard deviation of 1. This process is also known as Z-score scaling. Unlike normalization, which rescales values to a fixed range (e.g., [0, 1]), standardization does not limit values to a specific interval. Instead, it centers the data and accounts for variability, making different features comparable.
Formula:
z = (x − μ) / σ
Where:
- x = original feature value
- μ = mean of the feature
- σ = standard deviation of the feature
- z = standardized value
Example:
Suppose we have the following income values:
- Income = [30,000, 50,000, 70,000, 90,000, 110,000]
- Compute the mean: μ = 70,000
- Compute the population standard deviation (the convention used by scikit-learn's StandardScaler): σ = √((40,000² + 20,000² + 0² + 20,000² + 40,000²) / 5) ≈ 28,284
- Apply the formula:
- For 30,000: z = (30,000 − 70,000) / 28,284 ≈ −1.41
- For 50,000: z = (50,000 − 70,000) / 28,284 ≈ −0.71
- For 70,000: z = (70,000 − 70,000) / 28,284 = 0
- For 90,000: z ≈ 0.71
- For 110,000: z ≈ 1.41
So, the standardized values are approximately: [−1.41, −0.71, 0, 0.71, 1.41]
Example in Python:
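The following is a minimal sketch using scikit-learn's StandardScaler, which centers each feature at 0 and divides by its population standard deviation (it assumes NumPy and scikit-learn are installed):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Income values from the example above, shaped as a single-feature column
income = np.array([[30_000], [50_000], [70_000], [90_000], [110_000]])

scaler = StandardScaler()                # mean 0, unit variance
standardized = scaler.fit_transform(income)

print([round(float(v), 2) for v in standardized.ravel()])
```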
Output:
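```
[-1.41, -0.71, 0.0, 0.71, 1.41]
```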
When to Use:
Standardization is useful when:
- The algorithm assumes (or works best with) roughly normally distributed data, for example:
  - Linear Regression
  - Logistic Regression
  - Principal Component Analysis (PCA)
  - Support Vector Machines (SVMs)
- Features are measured in different units and need to be made comparable.
Advantages:
- Not restricted to a fixed range, unlike normalization.
- Less sensitive to outliers because it centers around the mean and considers spread.
- Often the default choice when scaling is required for machine learning algorithms.
3. Normalization vs. Standardization
The following table outlines the key differences between normalization and standardization:
| Aspect | Normalization | Standardization |
|---|---|---|
| Range | Fixed (e.g., 0–1) | No fixed range |
| Based On | Minimum & Maximum values | Mean & Standard Deviation |
| Best For | Distance-based methods (kNN, clustering, neural nets) | Gaussian-based methods (regression, PCA, SVM) |
| Outlier Sensitivity | High | Moderate |
| Interpretability | Values always bounded | Values centered at 0, variance = 1 |
4. Practical Considerations
In data science, it is useful to follow these practices:
- Try both methods: In practice, data scientists often experiment with both normalization and standardization to see which works best, as performance may vary by dataset.
- Pipelines: Split the data first, then fit the scaler on the training set only (e.g., inside a scikit-learn Pipeline) and reuse it to transform the test set; fitting the scaler on the full dataset leaks test-set information into training. See the sketch after this list.
- Scaling with categorical data: Only apply scaling to numeric features; categorical features need separate preprocessing (e.g., one-hot encoding).
- Inverse transforms: Libraries like scikit-learn allow reversing scaling for interpretability after modeling.
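The following is a minimal sketch of the pipeline, leakage, and inverse-transform points above (the toy data, step names, and k value are illustrative assumptions, not part of the module):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two numeric features on very different scales (age, income)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 100, 200), rng.uniform(0, 100_000, 200)])
y = (X[:, 1] > 50_000).astype(int)

# Split first, then let the pipeline fit the scaler on the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
model.fit(X_train, y_train)                  # scaler statistics come from X_train only
print("Test accuracy:", model.score(X_test, y_test))

# Inverse transform: recover the original units for interpretability
scaler = model.named_steps["scale"]
print(scaler.inverse_transform(scaler.transform(X_test[:1])))
```

Because the scaler is fitted inside the pipeline after the split, its statistics come from the training data only, which is exactly what the data-leakage point above requires.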
5. Summary
Feature scaling is essential to prevent larger-magnitude features from dominating smaller ones.
- Normalization rescales features to a bounded range (0–1) — ideal for distance-based algorithms.
- Standardization centers features at 0 with unit variance — ideal for algorithms assuming normal distributions.
Choosing between them depends on the algorithm and data distribution, but both are indispensable tools in preprocessing.