Scatter Plot
Visualize relationships between two variables with dots
📈 What is a Scatter Plot?
A scatter plot shows the relationship between two variables by plotting dots on a graph. Each dot represents one data point with an X and Y value. It's perfect for spotting patterns, trends, and correlations in your data!
import matplotlib.pyplot as plt
# Simple scatter plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.scatter(x, y)
plt.title("My First Scatter Plot")
plt.show()
Scatter Plot Concepts
Plot Points
Each dot is one data point
x = [1, 2, 3]
y = [4, 5, 6]
plt.scatter(x, y)
plt.show()
Show Relationships
See how variables connect
# Height vs Weight
height = [160, 170, 180]
weight = [60, 70, 80]
plt.scatter(height, weight)
plt.show()
Customize Colors
Make plots beautiful
x2, y2 = [1, 2, 3], [6, 5, 4] # Made-up second group, for illustration only
plt.scatter(x, y, color='red', s=100)
plt.scatter(x2, y2, color='blue', s=50)
plt.show()
Find Patterns
Spot trends and outliers
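# Look for an upward trend
import numpy as np
x, y = [1, 2, 3], [4, 5, 6]
correlation = np.corrcoef(x, y)[0, 1] # Pearson correlation between x and y
if correlation > 0.7:
    print("Strong positive relationship!")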
📍 Creating Your First Scatter Plot
Let's start with the basics by plotting dots on a graph:
🔹 Basic Scatter Plot
import matplotlib.pyplot as plt
# Our data points
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Create scatter plot
plt.scatter(x, y)
plt.title("Hours Studied vs Test Score")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.show()
🔹 Adding More Points
# More realistic data
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [45, 55, 65, 70, 80, 85, 90, 95]
plt.scatter(hours, scores)
plt.title("Study Time vs Test Scores")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score (%)")
plt.grid(True) # Add grid for easier reading
plt.show()
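If you want to put a number on the upward trend in the study-time data, one option is a least-squares line through the points. The sketch below is only an illustration: the helper name `fit_line` is made up for this example, and it uses plain Python so it runs without extra libraries.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x varies on its own, and how much x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [45, 55, 65, 70, 80, 85, 90, 95]

slope, intercept = fit_line(hours, scores)

# Points on the fitted line, one per value in hours
trend = [slope * h + intercept for h in hours]

print(f"Each extra hour adds about {slope:.1f} points")
```

To draw the line over the dots, you could call plt.plot(hours, trend) after plt.scatter(hours, scores) and before plt.show(). A straight line is an assumption here; it suits this sample but will not suit every dataset.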
🔹 Random Data Example
import numpy as np
# Generate random data
x = np.random.randn(50) # 50 random numbers from a standard normal distribution
y = np.random.randn(50) # 50 more, independent of x
plt.scatter(x, y)
plt.title("Random Scatter Plot")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
🎨 Customizing Scatter Plots
Make your plots look professional and informative:
🔹 Colors and Sizes
# Different colors and sizes
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.scatter(x, y,
            color='red', # Red dots
            s=100,       # Size of dots (marker area, in points squared)
            alpha=0.7)   # Transparency (0 = invisible, 1 = solid)
plt.title("Customized Scatter Plot")
plt.show()
🔹 Multiple Groups
# Boys vs Girls test scores
boys_hours = [2, 4, 6, 8]
boys_scores = [60, 70, 80, 90]
girls_hours = [3, 5, 7, 9]
girls_scores = [65, 75, 85, 95]
plt.scatter(boys_hours, boys_scores, color='blue', label='Boys')
plt.scatter(girls_hours, girls_scores, color='pink', label='Girls')
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.legend() # Show the labels
plt.show()
🔹 Color by Value
# Color dots based on a third variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
colors = [10, 20, 30, 40, 50] # Third variable
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar() # Show color scale
plt.title("Colored by Third Variable")
plt.show()
🌍 Real World Examples
See how scatter plots reveal relationships in real data:
🔹 Height vs Weight
# People's height and weight
heights = [150, 160, 165, 170, 175, 180, 185] # cm
weights = [45, 55, 60, 65, 70, 75, 80] # kg
plt.scatter(heights, weights, color='green', s=80)
plt.title("Height vs Weight Relationship")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()
# You can see: taller people tend to weigh more!
🔹 House Size vs Price
# House data
sizes = [1000, 1200, 1500, 1800, 2000, 2200] # sq ft
prices = [200, 250, 300, 350, 400, 450] # thousands
plt.scatter(sizes, prices, color='orange', s=100)
plt.title("House Size vs Price")
plt.xlabel("Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.show()
# Pattern: in this sample, bigger houses cost more!
🔹 Temperature vs Ice Cream Sales
# Weather and sales data
temperature = [60, 65, 70, 75, 80, 85, 90, 95] # Fahrenheit
ice_cream = [100, 150, 200, 300, 400, 500, 600, 700] # sales
plt.scatter(temperature, ice_cream, color='skyblue', s=120)
plt.title("Temperature vs Ice Cream Sales")
plt.xlabel("Temperature (°F)")
plt.ylabel("Ice Cream Sales")
plt.show()
# Hot days = more ice cream sales!
🔍 Reading Scatter Plots
Learn to spot different patterns in your data:
📊 Common Patterns:
- Positive correlation: As X increases, Y increases (upward trend)
- Negative correlation: As X increases, Y decreases (downward trend)
- No correlation: No clear pattern (random dots)
- Outliers: Dots far from the main pattern
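The patterns above can also be checked with a number. A common choice is the Pearson correlation coefficient, which runs from -1 (perfect downward trend) to +1 (perfect upward trend). The sketch below is a minimal illustration: the helper names `pearson_r` and `describe` are made up for this example, and the 0.7 cut-off is an assumption, not a standard rule.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe(r, threshold=0.7):
    """Turn r into a label; the 0.7 threshold is an illustrative assumption."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "weak or none"

hours = [1, 2, 3, 4, 5, 6, 7, 8]
grades_up = [50, 60, 65, 70, 75, 80, 85, 90]    # rises with hours
grades_down = [90, 85, 80, 75, 70, 65, 60, 50]  # falls with hours

r_up = pearson_r(hours, grades_up)
r_down = pearson_r(hours, grades_down)

print(f"r = {r_up:.2f}: {describe(r_up)}")
print(f"r = {r_down:.2f}: {describe(r_down)}")
```

If you already have NumPy imported, np.corrcoef(x, y)[0, 1] gives the same coefficient in one call. Remember that a strong correlation shows the variables move together, not that one causes the other.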
🔹 Positive Correlation Example
# Study time vs grades (positive correlation)
study_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades = [50, 60, 65, 70, 75, 80, 85, 90]
plt.scatter(study_time, grades, color='green')
plt.title("Positive Correlation: More Study = Better Grades")
plt.xlabel("Study Hours")
plt.ylabel("Grade")
plt.show()
🔹 Negative Correlation Example
# TV time vs grades (negative correlation)
tv_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades = [90, 85, 80, 75, 70, 65, 60, 50]
plt.scatter(tv_time, grades, color='red')
plt.title("Negative Correlation: More TV = Lower Grades")
plt.xlabel("TV Hours")
plt.ylabel("Grade")
plt.show()
🔹 Finding Outliers
# Data with one outlier
x = [1, 2, 3, 4, 5, 6, 7, 8, 15] # 15 is unusual
y = [2, 4, 6, 8, 10, 12, 14, 16, 5] # 5 is low for x=15
plt.scatter(x, y, color='blue')
plt.title("Spot the Outlier!")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
# The point at (15, 5) doesn't fit the pattern!
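Spotting an outlier by eye works for small plots, but you can also ask the data which point fits worst. One possible heuristic, sketched below, is to leave each point out in turn, fit a straight line to the remaining points, and measure how far the left-out point lands from that line. The helper names `fit_line` and `leave_one_out_errors` are made up for this example, and the approach assumes the main pattern is roughly a straight line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_errors(xs, ys):
    """For each point, fit a line to the OTHER points and measure the miss."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(rest_x, rest_y)
        predicted = slope * xs[i] + intercept
        errors.append(abs(ys[i] - predicted))
    return errors

x = [1, 2, 3, 4, 5, 6, 7, 8, 15]
y = [2, 4, 6, 8, 10, 12, 14, 16, 5]

errors = leave_one_out_errors(x, y)
worst = errors.index(max(errors))  # index of the point that fits worst

print("Most suspicious point:", (x[worst], y[worst]))
```

On this data the sketch points at (15, 5), the same dot that stands apart in the plot. Leaving the point out before fitting matters here: a far-away point pulls an ordinary fitted line toward itself, which can hide how unusual it is.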