Ruby Regular Expressions
Pattern matching and text processing with RegEx
🔍 What are Regular Expressions?
Regular expressions (RegEx) are patterns used to match, search, and manipulate text. Ruby supports them natively through regex literals (/pattern/) and the Regexp class, so text processing and validation need no extra libraries.
# Simple regex match
text = "Hello, World!"
if text =~ /World/
  puts "Match found!"
end
Output:
Match found!
Key RegEx Concepts
Matching
Find patterns in text
text =~ /pattern/
text.match(/pattern/)
Substitution
Replace matched patterns
text.sub(/old/, "new")
text.gsub(/old/, "new")
Splitting
Split strings by pattern
text.split(/,/)
text.split(/\s+/)
Extraction
Extract matched groups
match = text.match(/(\d+)/)
match[1]
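The four operations above can be tried together on one string. This is a minimal sketch; the sample string and variable names are made up for illustration.

```ruby
# Hypothetical sample data, for illustration only
line = "apples: 3, pears: 12"

# Matching: =~ gives the index of the first match, or nil
first_digit_at = line =~ /\d/          # 8

# Substitution: sub changes the first match, gsub changes every match
first_only = line.sub(/\d+/, "N")      # "apples: N, pears: 12"
all_of_them = line.gsub(/\d+/, "N")    # "apples: N, pears: N"

# Splitting: break on a comma followed by optional whitespace
parts = line.split(/,\s*/)             # ["apples: 3", "pears: 12"]

# Extraction: group 1 of the MatchData holds the captured digits
count = line.match(/pears: (\d+)/)[1]  # "12"

puts first_digit_at, first_only, all_of_them, parts.inspect, count
```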
🔹 Basic Pattern Matching
Use the =~ operator or the match method to find patterns in strings. The =~ operator returns the index of the first match, or nil if there is none, while match returns a MatchData object with details about the match (also nil if nothing matches).
# Using =~ operator (returns position or nil)
text = "The year is 2025"
if text =~ /\d+/
  puts "Found numbers at position: #{text =~ /\d+/}"
end
# Using match method (returns MatchData object)
match = text.match(/\d+/)
if match
  puts "Matched: #{match[0]}"
end
# Check if string starts with pattern
email = "john.doe@example.com"
if email =~ /^[a-z]/
  puts "Email starts with lowercase letter"
end
# Check if string ends with pattern
filename = "document.pdf"
if filename =~ /\.pdf$/
  puts "This is a PDF file"
end
# Case-insensitive matching
name = "RUBY"
if name =~ /ruby/i
  puts "Match found (case-insensitive)"
end
Output:
Found numbers at position: 12
Matched: 2025
Email starts with lowercase letter
This is a PDF file
Match found (case-insensitive)
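A MatchData object carries more than the matched text. The sketch below reuses the same sample sentence to show a few of its accessors, along with match?, which answers yes or no without building a MatchData object (available since Ruby 2.4).

```ruby
text = "The year is 2025"
m = text.match(/\d+/)

matched = m[0]          # "2025", the full matched text
start   = m.begin(0)    # 12, same index that =~ would return
before  = m.pre_match   # "The year is ", everything left of the match
after   = m.post_match  # "", everything right of the match

# match? returns true or false and does not set $~ or build MatchData
has_digits   = text.match?(/\d+/)        # true
has_acronym  = text.match?(/[A-Z]{2}/)   # false

# When nothing matches, match returns nil, so guard before indexing
missing = text.match(/xyz/)              # nil

puts matched, start, before.inspect, after.inspect
```

Prefer match? when only a boolean is needed; use match when the captured text or positions matter.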
🔹 Common RegEx Patterns
Learn essential regex patterns for everyday text processing tasks. These patterns cover digits, letters, whitespace, word boundaries, and common validation scenarios like emails and phone numbers.
# Digits
text = "Order #12345"
numbers = text.scan(/\d+/)
puts "Numbers: #{numbers.join(", ")}"
# Letters only
name = "John123Doe"
letters = name.scan(/[a-zA-Z]+/)
puts "Letters: #{letters.join(", ")}"
# Whitespace
sentence = "Hello World Ruby"
words = sentence.split(/\s+/)
puts "Words: #{words.join(", ")}"
# Email pattern
email = "john.doe@example.com"
if email =~ /^[\w+\-.]+@[a-z\d\-.]+\.[a-z]+$/i
  puts "Valid email format"
end
# Phone number pattern
phone = "123-456-7890"
if phone =~ /^\d{3}-\d{3}-\d{4}$/
  puts "Valid phone format"
end
# URL pattern (checks only for an http:// or https:// prefix)
url = "https://www.example.com"
if url =~ /^https?:\/\//
  puts "Valid URL"
end
Output:
Numbers: 12345
Letters: John, Doe
Words: Hello, World, Ruby
Valid email format
Valid phone format
Valid URL
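Word boundaries are mentioned above but not demonstrated. As a small sketch with a made-up sentence, \b restricts a match to whole words instead of any occurrence of the letters:

```ruby
# Hypothetical sample sentence, for illustration only
sentence = "cat concatenate cat's category"

# Without boundaries, "cat" is found inside longer words too
anywhere = sentence.scan(/cat/)             # 4 matches

# \b matches between a word character and a non-word character,
# so only the standalone "cat" and the one in "cat's" qualify
whole_words = sentence.scan(/\bcat\b/)      # 2 matches

replaced = sentence.gsub(/\bcat\b/, "dog")  # "dog concatenate dog's category"

puts anywhere.length, whole_words.length, replaced
```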
🔹 Substitution and Replacement
Replace text using sub (first occurrence) or gsub (all occurrences). You can use simple strings or complex patterns with capture groups for sophisticated text transformations and data cleaning.
# Replace first occurrence with sub
text = "Hello World, Hello Ruby"
result = text.sub("Hello", "Hi")
puts result
# Replace all occurrences with gsub
result = text.gsub("Hello", "Hi")
puts result
# Replace using regex pattern
sentence = "I have 3 apples and 5 oranges"
result = sentence.gsub(/\d+/, "many")
puts result
# Remove all digits
text = "abc123def456"
result = text.gsub(/\d/, "")
puts "Without digits: #{result}"
# Replace with capture groups
date = "2025-01-15"
formatted = date.gsub(/(\d{4})-(\d{2})-(\d{2})/, '\3/\2/\1')
puts "Formatted date: #{formatted}"
# Remove extra whitespace
messy = "Too    many     spaces"
clean = messy.gsub(/\s+/, " ")
puts "Clean: #{clean}"
Output:
Hi World, Hello Ruby
Hi World, Hi Ruby
I have many apples and many oranges
Without digits: abcdef
Formatted date: 15/01/2025
Clean: Too many spaces
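Beyond fixed replacement strings, gsub also accepts a block or a hash, which helps when the replacement depends on what was matched. A minimal sketch, with sample strings chosen for illustration:

```ruby
# Block form: the block receives each match and returns its replacement
sentence = "I have 3 apples and 5 oranges"
doubled = sentence.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }
# "I have 6 apples and 10 oranges"

# Hash form: each matched text is looked up as a key in the hash
# (the word pairs here are hypothetical)
renamed = "cat and dog".gsub(/cat|dog/, { "cat" => "feline", "dog" => "canine" })
# "feline and canine"

# Named groups can be referenced in the replacement with \k<name>
date = "2025-01-15"
formatted = date.sub(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '\k<d>/\k<m>/\k<y>')
# "15/01/2025"

puts doubled, renamed, formatted
```

Single quotes around the replacement string keep the backslashes literal, which avoids double escaping.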
🔹 Extracting Data with Capture Groups
Use parentheses to create capture groups that extract specific parts of matched text. This is invaluable for parsing structured data, extracting information from logs, or processing formatted text like dates and addresses.
# Extract date components
date_string = "Today is 2025-01-15"
match = date_string.match(/(\d{4})-(\d{2})-(\d{2})/)
if match
  year = match[1]
  month = match[2]
  day = match[3]
  puts "Year: #{year}, Month: #{month}, Day: #{day}"
end
# Extract email parts
email = "john.doe@example.com"
match = email.match(/^(.+)@(.+)\.(.+)$/)
if match
  puts "Username: #{match[1]}"
  puts "Domain: #{match[2]}"
  puts "Extension: #{match[3]}"
end
# Extract all numbers from text
text = "I bought 3 apples, 5 oranges, and 2 bananas"
numbers = text.scan(/\d+/)
puts "Numbers found: #{numbers.join(", ")}"
# Extract words
sentence = "Ruby is awesome!"
words = sentence.scan(/\w+/)
puts "Words: #{words.join(", ")}"
# Named capture groups
phone = "Contact: 123-456-7890"
match = phone.match(/(?<area>\d{3})-(?<prefix>\d{3})-(?<line>\d{4})/)
if match
  puts "Area: #{match[:area]}"
  puts "Prefix: #{match[:prefix]}"
  puts "Line: #{match[:line]}"
end
Output:
Year: 2025, Month: 01, Day: 15
Username: john.doe
Domain: example
Extension: com
Numbers found: 3, 5, 2
Words: Ruby, is, awesome
Area: 123
Prefix: 456
Line: 7890
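One detail worth knowing: when the pattern contains capture groups, scan returns an array of group arrays instead of whole matches. The sketch below uses a made-up key=value string to show this, plus two MatchData helpers for reading all captures at once.

```ruby
# Hypothetical sample data, for illustration only
log = "alice=30, bob=25"

# With groups in the pattern, each scan result is an array of captures
pairs = log.scan(/(\w+)=(\d+)/)   # [["alice", "30"], ["bob", "25"]]

# Turn the pairs into a hash, converting the digits to integers
ages = pairs.to_h { |name, age| [name, age.to_i] }

# captures lists the groups in order; named_captures maps name to text
m = "Contact: 123-456-7890".match(/(?<area>\d{3})-(?<prefix>\d{3})-(?<line>\d{4})/)
in_order = m.captures             # ["123", "456", "7890"]
by_name  = m.named_captures       # keys "area", "prefix", "line"

puts pairs.inspect, ages["alice"], in_order.inspect, by_name["area"]
```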
🔹 Validation Examples
Use regex for input validation to ensure data meets specific format requirements. These patterns help validate user input, check data integrity, and enforce business rules before processing or storing information.
# Each validator returns the match index (0 for a valid input) or nil,
# so valid inputs print 0 and invalid inputs print nothing after the colon
# Validate email
def valid_email?(email)
  email =~ /^[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+$/i
end
puts "john.doe@example.com: #{valid_email?("john.doe@example.com")}"
puts "invalid.email: #{valid_email?("invalid.email")}"
# Validate phone number
def valid_phone?(phone)
  phone =~ /^\d{3}-\d{3}-\d{4}$/
end
puts "123-456-7890: #{valid_phone?("123-456-7890")}"
puts "12345: #{valid_phone?("12345")}"
# Validate password (8+ chars, at least 1 uppercase letter and 1 digit)
def valid_password?(password)
  password =~ /^(?=.*[A-Z])(?=.*\d).{8,}$/
end
puts "Pass123word: #{valid_password?("Pass123word")}"
puts "weak: #{valid_password?("weak")}"
# Validate username (letters, digits, and underscores, 3-16 chars)
def valid_username?(username)
  username =~ /^[a-zA-Z0-9_]{3,16}$/
end
puts "john_doe: #{valid_username?("john_doe")}"
puts "ab: #{valid_username?("ab")}"
# Validate URL
def valid_url?(url)
  url =~ /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b/
end
puts "https://example.com: #{valid_url?("https://example.com")}"
puts "not-a-url: #{valid_url?("not-a-url")}"
Output:
john.doe@example.com: 0
invalid.email:
123-456-7890: 0
12345:
Pass123word: 0
weak:
john_doe: 0
ab:
https://example.com: 0
not-a-url:
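A caveat for validation code: in Ruby, ^ and $ match at the start and end of each line, not of the whole string, so multi-line input can slip past a pattern anchored with them. \A and \z anchor the entire string. The sketch below illustrates the difference and uses match? to get a true/false result; the method name strict_phone? and the sample input are made up for illustration.

```ruby
# Hypothetical input: a valid-looking first line followed by extra content
input = "123-456-7890\nanything else"

line_anchored   = /^\d{3}-\d{3}-\d{4}$/    # anchors to a line
string_anchored = /\A\d{3}-\d{3}-\d{4}\z/  # anchors to the whole string

loose  = input.match?(line_anchored)    # true, the first line alone matches
strict = input.match?(string_anchored)  # false, the string has extra content

# A stricter validator that returns true or false instead of 0 or nil
def strict_phone?(phone)
  phone.match?(/\A\d{3}-\d{3}-\d{4}\z/)
end

puts loose, strict
puts strict_phone?("123-456-7890")  # true
puts strict_phone?("12345")         # false
```

Returning a boolean also makes the printed results easier to read than the 0 and blank values produced by =~.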