Normal Distribution

📊 What is the Normal Distribution?

The normal distribution is a bell-shaped curve that shows how data is spread around the average. Most values cluster around the center (the mean), and fewer values appear at the extremes. Many real-world quantities are approximately normal: heights, test scores, measurement errors!


import numpy as np
import matplotlib.pyplot as plt

# Generate normal distribution data
data = np.random.normal(100, 15, 1000)  # mean=100, std=15
plt.hist(data, bins=30)
plt.title("Normal Distribution - Bell Curve")
plt.show()

68% within 1 STD
95% within 2 STD
Bell shape

Normal Distribution Concepts

🔔 Bell Shape

Symmetric curve around the mean

# Create bell curve
x = np.linspace(-3, 3, 100)
y = (1/np.sqrt(2*np.pi)) * np.exp(-0.5*x**2)
plt.plot(x, y)
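
For reference, the expression used for `y` above is the standard normal density. A sketch of the general formula follows, where μ (the mean) and σ (the standard deviation) are the usual textbook symbols rather than names taken from the snippet:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Setting μ = 0 and σ = 1 reduces this to the expression in the code.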

🎯 68-95-99.7 Rule

Standard deviation boundaries

# 68% within 1 standard deviation
mean = 100
std = 15
print(f"About 68% between {mean-std} and {mean+std}")

📏 Mean = Median

Because the curve is symmetric, the mean and median coincide (in a sample they are close, not identical)

data = np.random.normal(50, 10, 1000)
print(f"Mean: {np.mean(data):.1f}")
print(f"Median: {np.median(data):.1f}")

🌍 Everywhere!

Heights, scores, errors

# Heights example
heights = np.random.normal(170, 10, 100)
print(f"Average height: {np.mean(heights):.1f}cm")

🔔 Understanding the Bell Curve

The normal distribution has a symmetric bell shape, and a histogram of samples drawn from it approximates that shape. Let's see it in code:

🔹 Creating Your First Bell Curve

import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random numbers from normal distribution
data = np.random.normal(0, 1, 1000)  # mean=0, std=1

# Plot histogram
plt.hist(data, bins=30, alpha=0.7)
plt.title("My First Bell Curve")
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.show()

🔹 Different Means, Same Shape

# Three bell curves with different centers
curve1 = np.random.normal(0, 1, 1000)   # centered at 0
curve2 = np.random.normal(5, 1, 1000)   # centered at 5
curve3 = np.random.normal(-3, 1, 1000)  # centered at -3

plt.hist(curve1, bins=30, alpha=0.5, label="Mean=0")
plt.hist(curve2, bins=30, alpha=0.5, label="Mean=5")
plt.hist(curve3, bins=30, alpha=0.5, label="Mean=-3")
plt.legend()
plt.show()
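
To check numerically that shifting the mean moves the curve without changing its width, one option is to compare the sample statistics of the three curves. This is a minimal sketch; the seed and the use of `default_rng` are assumptions added here for repeatable output.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatable output

# Same standard deviation, three different centers
centers = [0, 5, -3]
curves = {c: rng.normal(c, 1, 1000) for c in centers}

for c, samples in curves.items():
    print(f"center={c:>2}: mean={samples.mean():.2f}, std={samples.std():.2f}")
```

Each sample mean should land near its center, while every standard deviation stays near 1.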

📏 The 68-95-99.7 Rule

This rule tells us where most of our data lives:

📊 The Magic Numbers:

About 68% of data falls within 1 standard deviation of the mean
About 95% of data falls within 2 standard deviations of the mean
About 99.7% of data falls within 3 standard deviations of the mean
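
The percentages in this rule are rounded. A minimal sketch of where they come from, using the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; the helper name `prob_within` is hypothetical:

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} std: {prob_within(k) * 100:.2f}%")

# Output:
# Within 1 std: 68.27%
# Within 2 std: 95.45%
# Within 3 std: 99.73%
```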

🔹 Test Scores Example

# Test scores: mean=75, std=10
mean_score = 75
std_score = 10

print("Test Score Ranges:")
print(f"About 68% of students score between {mean_score-std_score} and {mean_score+std_score}")
print(f"About 95% of students score between {mean_score-2*std_score} and {mean_score+2*std_score}")
print(f"About 99.7% of students score between {mean_score-3*std_score} and {mean_score+3*std_score}")

# Output:
# Test Score Ranges:
# About 68% of students score between 65 and 85
# About 95% of students score between 55 and 95
# About 99.7% of students score between 45 and 105
# Note: 105 is the model's bound; on a test capped at 100, real scores are only approximately normal.

🔹 Checking Real Data

# Generate test scores
scores = np.random.normal(75, 10, 1000)

# Count how many fall within 1 standard deviation
within_1_std = np.sum((scores >= 65) & (scores <= 85))
percentage = (within_1_std / len(scores)) * 100

print(f"Actually {percentage:.1f}% are within 1 std deviation")
# Should be close to 68%
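
The same check extends to all three bands of the rule. This is a sketch under assumptions: the helper `percent_within` is hypothetical, the seed is arbitrary, and the bands are built from the sample's own mean and standard deviation rather than the hard-coded 65 and 85.

```python
import numpy as np

def percent_within(values, k):
    """Percentage of values within k sample standard deviations of the sample mean."""
    center = values.mean()
    spread = values.std()
    inside = np.abs(values - center) <= k * spread
    return inside.mean() * 100

rng = np.random.default_rng(1)        # assumed seed
scores = rng.normal(75, 10, 10_000)   # larger sample than above, for steadier percentages

for k in (1, 2, 3):
    print(f"Within {k} std: {percent_within(scores, k):.1f}%")
```

The printed values should sit close to 68%, 95%, and 99.7%, with small differences from run to run if the seed changes.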

🌍 Real World Examples

Many real-world measurements are approximately normally distributed:

🔹 Human Heights

# Adult male heights (cm)
heights = np.random.normal(175, 7, 1000)

print(f"Average height: {np.mean(heights):.1f}cm")
print(f"About 68% of men are between {175-7}cm and {175+7}cm")

plt.hist(heights, bins=30)
plt.title("Distribution of Male Heights")
plt.xlabel("Height (cm)")
plt.show()

🔹 IQ Scores

# IQ scores: mean=100, std=15
iq_scores = np.random.normal(100, 15, 1000)

print(f"Average IQ: {np.mean(iq_scores):.1f}")
print("About 68% of people have IQ between 85 and 115")
print("About 95% of people have IQ between 70 and 130")

plt.hist(iq_scores, bins=30)
plt.title("IQ Score Distribution")
plt.show()
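
Beyond simulating scores, the same model can answer percentile questions directly. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the 99th-percentile question is an illustrative choice, not something taken from the example above.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # same mean and std as the simulation above

# Fraction of people expected to score below 130 (2 std above the mean)
below_130 = iq.cdf(130)
print(f"Share below 130: {below_130:.1%}")

# Score that separates the top 1% from the rest
top_1_cutoff = iq.inv_cdf(0.99)
print(f"Top 1% starts near IQ {top_1_cutoff:.0f}")
```

The share below 130 is consistent with the rule: about 95% fall between 70 and 130, and roughly half of the remainder lies above 130.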

🔹 Measurement Errors

# Measuring a 10cm object with small errors
true_length = 10.0
measurements = np.random.normal(true_length, 0.1, 100)

print(f"True length: {true_length}cm")
print(f"Average measurement: {np.mean(measurements):.2f}cm")
print(f"Standard deviation: {np.std(measurements):.3f}cm")

plt.hist(measurements, bins=20)
plt.title("Measurement Errors")
plt.show()
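
One reason the average lands so close to the true length is that random errors partly cancel when averaged. A rough sketch under assumptions: the seed is arbitrary, and `standard_error` uses the textbook formula std / sqrt(n), which the example above does not mention.

```python
import numpy as np

rng = np.random.default_rng(2)  # assumed seed
true_length = 10.0
measurements = rng.normal(true_length, 0.1, 100)

# Typical error of one measurement (sample standard deviation)
single_error = measurements.std(ddof=1)
# Typical error of the average of all measurements
standard_error = single_error / np.sqrt(len(measurements))

print(f"Typical single-measurement error: {single_error:.3f}cm")
print(f"Typical error of the average:     {standard_error:.3f}cm")
```

With 100 measurements, the average is expected to be about ten times more precise than any single reading.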

🛠️ Working with Normal Distribution

Practical tools for normal distribution analysis:

🔹 Generating Normal Data

# Different ways to create normal data
data1 = np.random.normal(50, 10, 100)    # mean=50, std=10, 100 points
data2 = np.random.randn(100) * 10 + 50   # Same distribution, different method (the values themselves differ)

print(f"Method 1 mean: {np.mean(data1):.1f}")
print(f"Method 2 mean: {np.mean(data2):.1f}")
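
NumPy's documentation recommends the newer `Generator` interface for new code, and seeding it makes results repeatable. A minimal sketch; the seed value 42 is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, assumed for reproducibility

data1 = rng.normal(50, 10, 100)              # mean=50, std=10, 100 points
data2 = rng.standard_normal(100) * 10 + 50   # scale and shift a standard normal

# Re-creating the generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
data1_again = rng_again.normal(50, 10, 100)

print(f"Method 1 mean: {data1.mean():.1f}")
print(f"Method 2 mean: {data2.mean():.1f}")
print(f"Reproducible: {np.array_equal(data1, data1_again)}")
```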

🔹 Testing if Data is Normal

# Create normal and non-normal data
normal_data = np.random.normal(0, 1, 1000)
uniform_data = np.random.uniform(0, 1, 1000)

# Simple visual test
plt.subplot(1, 2, 1)
plt.hist(normal_data, bins=30)
plt.title("Normal Data")

plt.subplot(1, 2, 2)
plt.hist(uniform_data, bins=30)
plt.title("Not Normal Data")
plt.show()
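
A histogram is a judgment call, so a numeric summary can back it up. The sketch below computes skewness and excess kurtosis by hand with NumPy; both are near 0 for normal data, while uniform data has excess kurtosis near -1.2. The helper `shape_stats` and the seed are hypothetical. For a formal hypothesis test, `scipy.stats` provides functions such as `normaltest` and `shapiro`.

```python
import numpy as np

def shape_stats(values):
    """Return (skewness, excess kurtosis) computed from standardized values."""
    z = (values - values.mean()) / values.std()
    skewness = np.mean(z**3)
    excess_kurtosis = np.mean(z**4) - 3
    return skewness, excess_kurtosis

rng = np.random.default_rng(3)  # assumed seed
normal_data = rng.normal(0, 1, 1000)
uniform_data = rng.uniform(0, 1, 1000)

for name, values in [("normal", normal_data), ("uniform", uniform_data)]:
    skew, kurt = shape_stats(values)
    print(f"{name:>7}: skew={skew:+.2f}, excess kurtosis={kurt:+.2f}")
```

Skewness alone would not separate these two samples, since both are symmetric; the kurtosis is what flags the flat-topped uniform data.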

🔹 Z-Scores (Standardizing)

# Convert any normal distribution to standard normal (mean=0, std=1)
test_scores = np.random.normal(75, 10, 100)

# Calculate z-scores
z_scores = (test_scores - np.mean(test_scores)) / np.std(test_scores)

print(f"Original mean: {np.mean(test_scores):.1f}")
print(f"Z-score mean: {np.mean(z_scores):.1f}")  # ~0 (may print -0.0 due to floating-point rounding)
print(f"Z-score std: {np.std(z_scores):.1f}")    # 1.0
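
A z-score also says how unusual a single value is. A minimal sketch, assuming a hypothetical score of 90 on a test with mean 75 and std 10, and using `statistics.NormalDist` from the standard library to turn the z-score into a percentile:

```python
from statistics import NormalDist

mean_score, std_score = 75, 10
score = 90  # hypothetical student score

z = (score - mean_score) / std_score
percentile = NormalDist().cdf(z) * 100  # standard normal: mean=0, std=1

print(f"z-score: {z:.1f}")
print(f"Percentile: {percentile:.1f}")

# Output:
# z-score: 1.5
# Percentile: 93.3
```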