C++ Views & Pipelines

Composable data processing with filter, transform, take, and drop

🔄 What are Views & Pipelines?

C++20 views are lightweight, lazy range adaptors declared in <ranges>. The pipe operator (|) chains adaptors such as filter, transform, take, and drop into a single pipeline that reads left to right and does no work until the result is iterated.


#include <ranges>
#include <vector>

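// data is any range; even and square are placeholder callables
auto result = data
    | std::views::filter(even)
    | std::views::transform(square)
    | std::views::take(5);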

Core Pipeline Operations

🔍 filter

Keep elements that match a condition

data | std::views::filter(predicate)

🔄 transform

Apply a function to each element

data | std::views::transform(func)

📥 take

Get the first N elements

data | std::views::take(n)

📤 drop

Skip the first N elements

data | std::views::drop(n)

🔹 Filter Operations

Filtering selects the elements of a sequence that satisfy a predicate. std::copy_if does this eagerly by copying matches into a destination, whereas the range adaptor std::views::filter does it lazily: each element is tested only when the view is iterated. Typical predicates select even numbers, values above a threshold, or a combination of conditions. Because a filter view only refers to the original collection, the source is left unmodified, which makes filtering a common first step for querying, cleaning datasets, and preparing inputs for further transformations.

#include <ranges>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    
    // Filter even numbers
    auto evens = numbers | std::views::filter([](int n) { 
        return n % 2 == 0; 
    });
    
    // Filter numbers greater than 5
    auto greater_than_five = numbers | std::views::filter([](int n) { 
        return n > 5; 
    });
    
    // Filter with multiple conditions
    auto even_and_large = numbers | std::views::filter([](int n) { 
        return n % 2 == 0 && n > 4; 
    });
    
    std::cout << "Even numbers: ";
    for (int n : evens) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Greater than 5: ";
    for (int n : greater_than_five) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Even and > 4: ";
    for (int n : even_and_large) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    return 0;
}

Output:

Even numbers: 2 4 6 8 10

Greater than 5: 6 7 8 9 10

Even and > 4: 6 8 10

🔹 Transform Operations

Transformation applies a function to each element, mapping a range to a new sequence of values. The std::transform algorithm does this eagerly by writing results to an output iterator; std::views::transform, used below, does it lazily and computes each value only when it is read. Typical uses include squaring numbers, doubling values, or converting types (e.g., numbers to strings). Each output corresponds to exactly one input element, which suits data normalization, format conversion, or feature computation in pipelines.

#include <ranges>
#include <vector>
#include <string>
#include <iostream>

int main() {
    std::vector<int> numbers = {1, 2, 3, 4, 5};
    
    // Square each number
    auto squares = numbers | std::views::transform([](int n) { 
        return n * n; 
    });
    
    // Double each number
    auto doubled = numbers | std::views::transform([](int n) { 
        return n * 2; 
    });
    
    // Convert to strings
    auto strings = numbers | std::views::transform([](int n) { 
        return std::to_string(n); 
    });
    
    std::cout << "Original: ";
    for (int n : numbers) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Squares: ";
    for (int n : squares) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Doubled: ";
    for (int n : doubled) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "As strings: ";
    for (const auto& s : strings) {
        std::cout << s << " ";
    }
    std::cout << std::endl;
    
    return 0;
}

Output:

Original: 1 2 3 4 5

Squares: 1 4 9 16 25

Doubled: 2 4 6 8 10

As strings: 1 2 3 4 5

🔹 Take and Drop Operations

Take and drop control how many elements are processed from the start of a sequence. std::views::take(n) keeps the first n elements, and std::views::take_while(pred) keeps elements until the predicate first returns false. std::views::drop(n) skips the first n elements, and std::views::drop_while(pred) skips elements while the predicate holds and keeps everything from the first failure onward. These adaptors are useful for pagination, windowed analysis, or ignoring headers in data streams, and because they are lazy, elements outside the selected window are not processed by later pipeline stages.

#include <ranges>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> data = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
    
    // Take first 5 elements
    auto first_five = data | std::views::take(5);
    
    // Drop first 3 elements
    auto skip_three = data | std::views::drop(3);
    
    // Take while condition is true
    auto take_while_small = data | std::views::take_while([](int n) { 
        return n < 60; 
    });
    
    // Drop while condition is true
    auto drop_while_small = data | std::views::drop_while([](int n) { 
        return n < 60; 
    });
    
    std::cout << "Original: ";
    for (int n : data) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "First 5: ";
    for (int n : first_five) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Skip 3: ";
    for (int n : skip_three) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Take while < 60: ";
    for (int n : take_while_small) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    std::cout << "Drop while < 60: ";
    for (int n : drop_while_small) {
        std::cout << n << " ";
    }
    std::cout << std::endl;
    
    return 0;
}

Output:

Original: 10 20 30 40 50 60 70 80 90 100

First 5: 10 20 30 40 50

Skip 3: 40 50 60 70 80 90 100

Take while < 60: 10 20 30 40 50

Drop while < 60: 60 70 80 90 100

🔹 Complex Pipeline Examples

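Complex pipelines combine several adaptors, such as filter, transform, take, and drop, into a single expressive data-processing flow. For instance, you might keep sales above a threshold, double them, and take the first three matches, or select a window of the data and apply a bonus. Note that take(3) returns the first three elements in source order, not the three largest; there is no sorting view in the standard library, so a "top N" query needs the data sorted first (for example with std::ranges::sort). Chaining views keeps the logic readable and efficient, because intermediate results are computed on demand without extra storage or extra passes over the data.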

#include <ranges>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> sales_data = {150, 200, 75, 300, 125, 400, 50, 275, 350, 100};
    
    // Pipeline 1: First 3 sales above 100 (in source order), doubled
    auto top_sales = sales_data 
        | std::views::filter([](int sale) { return sale > 100; })  // Above 100
        | std::views::transform([](int sale) { return sale * 2; }) // Double them
        | std::views::take(3);                                     // First 3 matches, not the 3 largest
    
    std::cout << "Top 3 doubled sales > 100: ";
    for (int sale : top_sales) {
        std::cout << sale << " ";
    }
    std::cout << std::endl;
    
    // Pipeline 2: Process middle range data
    auto middle_range = sales_data
        | std::views::drop(2)                                      // Skip first 2
        | std::views::take(6)                                      // Take next 6
        | std::views::filter([](int sale) { return sale >= 100; }) // Filter >= 100
        | std::views::transform([](int sale) { return sale + 50; }); // Add bonus
    
    std::cout << "Middle range with bonus: ";
    for (int sale : middle_range) {
        std::cout << sale << " ";
    }
    std::cout << std::endl;
    
    // Pipeline 3: Complex business logic
    // Note: std::views::enumerate requires C++23
    auto processed = sales_data
        | std::views::enumerate                                    // Add indices
        | std::views::filter([](auto pair) {                      // Filter by index and value
            auto [index, value] = pair;
            return index % 2 == 0 && value > 100;
        })
        | std::views::transform([](auto pair) {                    // Transform the values
            auto [index, value] = pair;
            return value * 1.1;  // 10% increase
        })
        | std::views::take(3);
    
    std::cout << "Even indices > 100 with 10% increase: ";
    for (double value : processed) {
        std::cout << value << " ";
    }
    std::cout << std::endl;
    
    return 0;
}

Output:

Top 3 doubled sales > 100: 300 400 600

Middle range with bonus: 350 175 450 325

Even indices > 100 with 10% increase: 165 137.5 385

🔹 Performance and Best Practices

Writing efficient pipelines requires attention to memory, computation, and readability to ensure optimal performance. Key strategies include avoiding unnecessary data copies by using views, leveraging lazy evaluation to defer calculations, composing operations for clarity, and reserving memory in advance when sizes are known. These practices reduce overhead, improve cache usage, and make your code more maintainable and scalable.

Best Practices:

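  • Lazy Evaluation: Views don't process data until they are iterated
  • Order Matters: Put filter before transform when the predicate does not depend on the transformed value, so the transform runs on fewer elements
  • Reusable Views: Store a view in a variable to iterate it more than once or to extend it with further adaptors
  • Memory Efficient: Views don't copy data, they reference it, so the source container must outlive the view
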
#include <ranges>
#include <vector>
#include <numeric>   // std::iota
#include <iostream>

int main() {
    std::vector<int> large_dataset(1000000);
    std::iota(large_dataset.begin(), large_dataset.end(), 1);
    
    // Efficient: filter first, then transform
    auto efficient = large_dataset
        | std::views::filter([](int n) { return n % 1000 == 0; })  // Reduces data first
        | std::views::transform([](int n) { return n * n; })       // Then transform less data
        | std::views::take(10);
    
    // Store view for reuse
    auto reusable_view = large_dataset 
        | std::views::filter([](int n) { return n % 2 == 0; })
        | std::views::take(100);
    
    // Use the view multiple times
    auto count = std::ranges::distance(reusable_view);
    std::cout << "Count: " << count << std::endl;
    
    // Views are composable
    auto extended = reusable_view 
        | std::views::transform([](int n) { return n / 2; });
    
    return 0;
}

Key Benefits:

Views let you express multi-step data processing without intermediate containers: each element flows through the whole pipeline only when it is requested. The main benefits are:

✓ Lazy evaluation saves computation

✓ Composable and readable code

✓ Memory efficient processing

🧠 Test Your Knowledge

Which operation should typically come first in a pipeline for better performance?