Scatter Plot

Visualize relationships between two variables with dots

📈 What is a Scatter Plot?

A scatter plot shows the relationship between two variables by plotting dots on a graph. Each dot represents one data point with an X and Y value. It's perfect for spotting patterns, trends, and correlations in your data!


import matplotlib.pyplot as plt

# Simple scatter plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.scatter(x, y)
plt.title("My First Scatter Plot")
plt.show()
                                    
2 Variables | Dots Show Data | Trends Revealed

Scatter Plot Concepts

📍

Plot Points

Each dot is one data point

x = [1, 2, 3]
y = [4, 5, 6]
plt.scatter(x, y)
plt.show()
📊

Show Relationships

See how variables connect

# Height vs Weight
height = [160, 170, 180]
weight = [60, 70, 80]
plt.scatter(height, weight)
plt.show()
🎨

Customize Colors

Make plots beautiful

x2, y2 = [1, 2, 3], [6, 5, 4]  # second group (made-up example values)
plt.scatter(x, y, color='red', s=100)
plt.scatter(x2, y2, color='blue', s=50)
plt.show()
🔍

Find Patterns

Spot trends and outliers

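import numpy as np

# Look for an upward trend: compute the correlation coefficient of x and y
correlation = np.corrcoef(x, y)[0, 1]
if correlation > 0.7:
    print("Strong positive relationship!")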

📍 Creating Your First Scatter Plot

Let's start with the basics - plotting dots on a graph:

🔹 Basic Scatter Plot

import matplotlib.pyplot as plt

# Our data points
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create scatter plot
plt.scatter(x, y)
plt.title("Hours Studied vs Test Score")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.show()

🔹 Adding More Points

# More realistic data
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [45, 55, 65, 70, 80, 85, 90, 95]

plt.scatter(hours, scores)
plt.title("Study Time vs Test Scores")
plt.xlabel("Hours Studied")
plt.ylabel("Test Score (%)")
plt.grid(True)  # Add grid for easier reading
plt.show()

🔹 Random Data Example

import numpy as np

# Generate random data (standard normal: mean 0, standard deviation 1)
x = np.random.randn(50)  # 50 random values
y = np.random.randn(50)  # 50 more, independent of x

plt.scatter(x, y)
plt.title("Random Scatter Plot")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
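
The random example above produces different dots on every run. If you want the same plot each time, one option is a seeded generator. This is a minimal sketch: the seed value 42 and the names rng, rng_again, and x_again are arbitrary choices for illustration, not part of the example above.

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run
rng = np.random.default_rng(42)   # 42 is an arbitrary seed (assumption)
x = rng.standard_normal(50)       # 50 values, mean 0, standard deviation 1
y = rng.standard_normal(50)

# Re-creating the generator with the same seed repeats the same sequence
rng_again = np.random.default_rng(42)
x_again = rng_again.standard_normal(50)
print(np.array_equal(x, x_again))  # True
```

Pass x and y to plt.scatter(x, y) exactly as in the example above.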

🎨 Customizing Scatter Plots

Make your plots look professional and informative:

🔹 Colors and Sizes

# Different colors and sizes
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.scatter(x, y,
            color='red',  # Red dots
            s=100,        # Dot size (area in points squared)
            alpha=0.7)    # Transparency (0 = invisible, 1 = solid)
plt.title("Customized Scatter Plot")
plt.show()
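
The s argument also accepts one size per dot, so dot size can carry extra information. A small sketch, assuming a made-up sizes list (these values are illustrative only):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
sizes = [20, 60, 120, 200, 300]  # hypothetical values, one size per dot

# Passing a list to s gives each dot its own size
points = plt.scatter(x, y, s=sizes, color='red', alpha=0.7)
plt.title("One Size per Dot")
plt.show()
```

The list passed to s should have the same length as x and y.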

🔹 Multiple Groups

# Boys vs Girls test scores
boys_hours = [2, 4, 6, 8]
boys_scores = [60, 70, 80, 90]
girls_hours = [3, 5, 7, 9]
girls_scores = [65, 75, 85, 95]

plt.scatter(boys_hours, boys_scores, color='blue', label='Boys')
plt.scatter(girls_hours, girls_scores, color='pink', label='Girls')
plt.xlabel("Hours Studied")
plt.ylabel("Test Score")
plt.legend()  # Show the labels
plt.show()

🔹 Color by Value

# Color dots based on a third variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
colors = [10, 20, 30, 40, 50]  # Third variable

plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()  # Show color scale
plt.title("Colored by Third Variable")
plt.show()

🌍 Real World Examples

See how scatter plots reveal relationships in real data:

🔹 Height vs Weight

# People's height and weight
heights = [150, 160, 165, 170, 175, 180, 185]  # cm
weights = [45, 55, 60, 65, 70, 75, 80]         # kg

plt.scatter(heights, weights, color='green', s=80)
plt.title("Height vs Weight Relationship")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()

# You can see: taller people tend to weigh more!
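
To make the trend easier to see, you could draw a straight line through the dots. The sketch below uses np.polyfit with degree 1, which is one common way to fit a line; it is an addition for illustration, and the names slope and intercept are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

heights = np.array([150, 160, 165, 170, 175, 180, 185])  # cm
weights = np.array([45, 55, 60, 65, 70, 75, 80])          # kg

# Fit a straight line: weight ≈ slope * height + intercept
slope, intercept = np.polyfit(heights, weights, 1)

plt.scatter(heights, weights, color='green', s=80)
plt.plot(heights, slope * heights + intercept, color='black')  # trend line
plt.title("Height vs Weight with Trend Line")
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()

print(f"slope = {slope:.2f} kg per cm")
```

A positive slope matches what the dots show: weight rises with height in this sample.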

🔹 House Size vs Price

# House data
sizes = [1000, 1200, 1500, 1800, 2000, 2200]  # sq ft
prices = [200, 250, 300, 350, 400, 450]        # thousands

plt.scatter(sizes, prices, color='orange', s=100)
plt.title("House Size vs Price")
plt.xlabel("Size (sq ft)")
plt.ylabel("Price ($1000s)")
plt.show()

# Pattern: bigger houses cost more!

🔹 Temperature vs Ice Cream Sales

# Weather and sales data
temperature = [60, 65, 70, 75, 80, 85, 90, 95]  # Fahrenheit
ice_cream = [100, 150, 200, 300, 400, 500, 600, 700]  # sales

plt.scatter(temperature, ice_cream, color='skyblue', s=120)
plt.title("Temperature vs Ice Cream Sales")
plt.xlabel("Temperature (°F)")
plt.ylabel("Ice Cream Sales")
plt.show()

# Hot days = more ice cream sales!

🔍 Reading Scatter Plots

Learn to spot different patterns in your data:

📊 Common Patterns:

  • Positive correlation: As X increases, Y increases (upward trend)
  • Negative correlation: As X increases, Y decreases (downward trend)
  • No correlation: No clear pattern (random dots)
  • Outliers: Dots far from the main pattern
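
You can also put a number on these patterns. The sketch below uses np.corrcoef to compute the correlation coefficient r, which ranges from -1 (perfect downward trend) through 0 (no linear trend) to +1 (perfect upward trend). The data is borrowed from the study-time and TV-time examples in this section; the names grades_up, grades_down, r_up, and r_down are our own.

```python
import numpy as np

study_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades_up = [50, 60, 65, 70, 75, 80, 85, 90]    # grades rise with study time

tv_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades_down = [90, 85, 80, 75, 70, 65, 60, 50]  # grades fall as TV time rises

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is r between the two inputs
r_up = np.corrcoef(study_time, grades_up)[0, 1]
r_down = np.corrcoef(tv_time, grades_down)[0, 1]

print(f"study time vs grades: r = {r_up:.2f}")
print(f"TV time vs grades:    r = {r_down:.2f}")
```

A value near +1 or -1 means the dots sit close to a straight line. A value near 0 means there is no clear linear pattern.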

🔹 Positive Correlation Example

# Study time vs grades (positive correlation)
study_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades = [50, 60, 65, 70, 75, 80, 85, 90]

plt.scatter(study_time, grades, color='green')
plt.title("Positive Correlation: More Study = Better Grades")
plt.xlabel("Study Hours")
plt.ylabel("Grade")
plt.show()

🔹 Negative Correlation Example

# TV time vs grades (negative correlation)
tv_time = [1, 2, 3, 4, 5, 6, 7, 8]
grades = [90, 85, 80, 75, 70, 65, 60, 50]

plt.scatter(tv_time, grades, color='red')
plt.title("Negative Correlation: More TV = Lower Grades")
plt.xlabel("TV Hours")
plt.ylabel("Grade")
plt.show()

🔹 Finding Outliers

# Data with one outlier
x = [1, 2, 3, 4, 5, 6, 7, 8, 15]  # 15 is unusual
y = [2, 4, 6, 8, 10, 12, 14, 16, 5]  # 5 is low for x=15

plt.scatter(x, y, color='blue')
plt.title("Spot the Outlier!")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()

# The point at (15, 5) doesn't fit the pattern!
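
Spotting outliers by eye works for small plots. For larger data you might flag them in code. The sketch below uses a simple z-score check on the X values, which is only one heuristic among many. The threshold of 2 and the names z and is_outlier are assumptions for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 15])
y = np.array([2, 4, 6, 8, 10, 12, 14, 16, 5])

# z-score: how many standard deviations each x value is from the mean
z = (x - x.mean()) / x.std()

# Flag points more than 2 standard deviations away (arbitrary threshold)
is_outlier = np.abs(z) > 2

outlier_points = list(zip(x[is_outlier].tolist(), y[is_outlier].tolist()))
print(outlier_points)  # [(15, 5)]
```

To highlight the flagged dots, draw them a second time in another color, for example plt.scatter(x[is_outlier], y[is_outlier], color='red'). This check looks only at X, so it would miss a point with an ordinary X value but an unusual Y value.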

🧠 Test Your Knowledge

What does each dot in a scatter plot represent?

In a positive correlation, what happens to Y as X increases?

Which parameter controls the size of dots in plt.scatter()?