Ruby Regular Expressions

Pattern matching and text processing with RegEx

🔍 What are Regular Expressions?

Regular expressions (RegEx) are patterns used to match, search, and manipulate text. Ruby supports them natively through /.../ literals and the Regexp class, so matching, validation, and text processing need no extra libraries.


# Simple regex match
text = "Hello, World!"
if text =~ /World/
  puts "Match found!"
end

Output:

Match found!

Key RegEx Concepts

🎯 Matching

Find patterns in text

text =~ /pattern/
text.match(/pattern/)

🔄 Substitution

Replace matched patterns

text.sub(/old/, "new")
text.gsub(/old/, "new")

✂️ Splitting

Split strings by pattern

text.split(/,/)
text.split(/\s+/)

📋 Extraction

Extract matched groups

match = text.match(/(\d+)/)
match[1]

🔹 Basic Pattern Matching

Use the =~ operator or the match method to find patterns in strings. The =~ operator returns the index where the first match starts, or nil if nothing matches. The match method returns a MatchData object describing the match, or nil if nothing matches.

# Using =~ operator (returns position or nil)
text = "The year is 2025"

position = text =~ /\d+/
if position
  puts "Found numbers at position: #{position}"
end

# Using match method (returns MatchData object)
match = text.match(/\d+/)
if match
  puts "Matched: #{match[0]}"
end

# Check if string starts with pattern (\A anchors at the start of the string)
email = "user@example.com"  # sample address
if email =~ /\A[a-z]/
  puts "Email starts with lowercase letter"
end

# Check if string ends with pattern (\z anchors at the end of the string)
filename = "document.pdf"
if filename =~ /\.pdf\z/
  puts "This is a PDF file"
end

# Case-insensitive matching
name = "RUBY"
if name =~ /ruby/i
  puts "Match found (case-insensitive)"
end

Output:

Found numbers at position: 12
Matched: 2025
Email starts with lowercase letter
This is a PDF file
Match found (case-insensitive)
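
Beyond match[0], a MatchData object also records where the match sits and what surrounds it, and match? gives a plain true/false answer when that is all you need. A minimal sketch reusing the sample string from above (the variable names has_digits and m are illustrative):

```ruby
text = "The year is 2025"

# match? only answers yes/no and does not build a MatchData object
has_digits = text.match?(/\d+/)
puts "Contains digits: #{has_digits}"

# MatchData exposes the match, its offset, and the surrounding text
m = text.match(/\d+/)
if m
  puts "Matched text: #{m[0]}"
  puts "Starts at index: #{m.begin(0)}"
  puts "Text before: #{m.pre_match.inspect}"
  puts "Text after: #{m.post_match.inspect}"
end
```

Here m.begin(0) reports the same index that =~ would return for this pattern.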

🔹 Common RegEx Patterns

Learn essential regex patterns for everyday text processing tasks. These patterns cover digits, letters, whitespace, word boundaries, and common validation scenarios like emails and phone numbers.

# Digits
text = "Order #12345"
numbers = text.scan(/\d+/)
puts "Numbers: #{numbers.join(", ")}"

# Letters only
name = "John123Doe"
letters = name.scan(/[a-zA-Z]+/)
puts "Letters: #{letters.join(", ")}"

# Whitespace
sentence = "Hello   World  Ruby"
words = sentence.split(/\s+/)
puts "Words: #{words.join(", ")}"

# Email pattern
email = "user@example.com"  # sample address
if email =~ /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  puts "Valid email format"
end

# Phone number pattern
phone = "123-456-7890"
if phone =~ /\A\d{3}-\d{3}-\d{4}\z/
  puts "Valid phone format"
end

# URL pattern (checks only the http:// or https:// scheme prefix)
url = "https://www.example.com"
if url =~ /\Ahttps?:\/\//
  puts "Valid URL"
end

Output:

Numbers: 12345
Letters: John, Doe
Words: Hello, World, Ruby
Valid email format
Valid phone format
Valid URL
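
Word boundaries are mentioned above but not shown in the samples. As a rough sketch, \b matches the position between a word character and a non-word character, which lets a pattern match whole words only. The sample sentence and variable names here are made up for illustration:

```ruby
text = "cat concatenate category"

# Without \b, "cat" also matches inside longer words
loose = text.scan(/cat/)

# With \b on both sides, only the standalone word matches
strict = text.scan(/\bcat\b/)

puts "Loose matches: #{loose.length}"
puts "Whole-word matches: #{strict.length}"
```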

🔹 Substitution and Replacement

Replace text using sub (first occurrence) or gsub (all occurrences). Both accept either a plain string or a regex as the pattern. When the pattern contains capture groups, the replacement string can refer to them as \1, \2, and so on, which is handy for reordering fields and cleaning data.

# Replace first occurrence with sub
text = "Hello World, Hello Ruby"
result = text.sub("Hello", "Hi")
puts result

# Replace all occurrences with gsub
result = text.gsub("Hello", "Hi")
puts result

# Replace using regex pattern
sentence = "I have 3 apples and 5 oranges"
result = sentence.gsub(/\d+/, "many")
puts result

# Remove all digits
text = "abc123def456"
result = text.gsub(/\d/, "")
puts "Without digits: #{result}"

# Replace with capture groups
date = "2025-01-15"
formatted = date.gsub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
puts "Formatted date: #{formatted}"

# Remove extra whitespace
messy = "Too    many     spaces"
clean = messy.gsub(/\s+/, " ")
puts "Clean: #{clean}"

Output:

Hi World, Hello Ruby
Hi World, Hi Ruby
I have many apples and many oranges
Without digits: abcdef
Formatted date: 15/01/2025
Clean: Too many spaces
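
When the replacement has to be computed from the match, gsub also accepts a block or a hash in place of a replacement string. A small sketch under made-up sample data (the strings and variable names are illustrative only):

```ruby
prices = "apple: 3, pear: 5"

# Block form: the block receives each match and returns its replacement
doubled = prices.gsub(/\d+/) { |n| (n.to_i * 2).to_s }
puts doubled

# Hash form: each matched text is looked up as a key in the hash
abbreviated = "cat and dog".gsub(/cat|dog/, { "cat" => "c", "dog" => "d" })
puts abbreviated
```

The block form suits arithmetic or formatting on each match, while the hash form works well for fixed lookup tables.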

🔹 Extracting Data with Capture Groups

Use parentheses to create capture groups that extract specific parts of matched text. This is invaluable for parsing structured data, extracting information from logs, or processing formatted text like dates and addresses.

# Extract date components
date_string = "Today is 2025-01-15"
match = date_string.match(/(\d{4})-(\d{2})-(\d{2})/)

if match
  year = match[1]
  month = match[2]
  day = match[3]
  puts "Year: #{year}, Month: #{month}, Day: #{day}"
end

# Extract email parts
email = "john.doe@example.com"
match = email.match(/^(.+)@(.+)\.(.+)$/)

if match
  puts "Username: #{match[1]}"
  puts "Domain: #{match[2]}"
  puts "Extension: #{match[3]}"
end

# Extract all numbers from text
text = "I bought 3 apples, 5 oranges, and 2 bananas"
numbers = text.scan(/\d+/)
puts "Numbers found: #{numbers.join(", ")}"

# Extract words
sentence = "Ruby is awesome!"
words = sentence.scan(/\w+/)
puts "Words: #{words.join(", ")}"

# Named capture groups
phone = "Contact: 123-456-7890"
match = phone.match(/(?<area>\d{3})-(?<prefix>\d{3})-(?<line>\d{4})/)

if match
  puts "Area: #{match[:area]}"
  puts "Prefix: #{match[:prefix]}"
  puts "Line: #{match[:line]}"
end

Output:

Year: 2025, Month: 01, Day: 15
Username: john.doe
Domain: example
Extension: com
Numbers found: 3, 5, 2
Words: Ruby, is, awesome
Area: 123
Prefix: 456
Line: 7890
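
Two related tools are worth a quick sketch: scan returns one array per match when the pattern contains capture groups, and named_captures converts a MatchData object into a hash keyed by group name. The sample strings and variable names below are assumptions for illustration:

```ruby
log = "alice=30, bob=25"

# With groups in the pattern, scan returns one array per match
pairs = log.scan(/(\w+)=(\d+)/)
pairs.each { |name, age| puts "#{name} is #{age}" }

# named_captures turns a MatchData into a hash keyed by group name
m = "2025-01-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)
parts = m.named_captures
puts "Year from hash: #{parts["year"]}"
```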

🔹 Validation Examples

Use regex for input validation to ensure data meets specific format requirements. These patterns help validate user input, check data integrity, and enforce business rules before processing or storing information.

# match? returns true or false; \A and \z anchor the whole string

# Validate email
def valid_email?(email)
  email.match?(/\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i)
end

puts "user@example.com: #{valid_email?("user@example.com")}"
puts "invalid.email: #{valid_email?("invalid.email")}"

# Validate phone number
def valid_phone?(phone)
  phone.match?(/\A\d{3}-\d{3}-\d{4}\z/)
end

puts "123-456-7890: #{valid_phone?("123-456-7890")}"
puts "12345: #{valid_phone?("12345")}"

# Validate password (8+ chars, 1 uppercase, 1 digit)
def valid_password?(password)
  password.match?(/\A(?=.*[A-Z])(?=.*\d).{8,}\z/)
end

puts "Pass123word: #{valid_password?("Pass123word")}"
puts "weak: #{valid_password?("weak")}"

# Validate username (letters, digits, underscores, 3-16 chars)
def valid_username?(username)
  username.match?(/\A[a-zA-Z0-9_]{3,16}\z/)
end

puts "john_doe: #{valid_username?("john_doe")}"
puts "ab: #{valid_username?("ab")}"

# Validate URL
def valid_url?(url)
  url.match?(/\Ahttps?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b/)
end

puts "https://example.com: #{valid_url?("https://example.com")}"
puts "not-a-url: #{valid_url?("not-a-url")}"

Output:

user@example.com: true
invalid.email: false
123-456-7890: true
12345: false
Pass123word: true
weak: false
john_doe: true
ab: false
https://example.com: true
not-a-url: false
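
One caveat for validation patterns: in Ruby, ^ and $ match at the start and end of each line, while \A and \z match only at the start and end of the whole string. A minimal sketch of the difference, using a made-up multi-line input:

```ruby
# Input with a valid-looking first line and an unwanted second line
input = "john_doe\n<script>"

# ^ and $ are satisfied by the first line alone
line_anchored = input.match?(/^[a-zA-Z0-9_]{3,16}$/)

# \A and \z require the entire string to fit the pattern
string_anchored = input.match?(/\A[a-zA-Z0-9_]{3,16}\z/)

puts "With ^ and $: #{line_anchored}"
puts "With \\A and \\z: #{string_anchored}"
```

The line-anchored pattern accepts this input because its first line looks valid on its own, which is why string anchors are the safer choice for checking user input.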
