Why Does Processing a Sorted Array Run Faster Than Processing an Unsorted Array?

As a C++ and Java developer, I’ve always been fascinated by how seemingly minor changes in data organization can lead to significant differences in performance. Recently, I ran into a striking example: the same loop over the same values can run dramatically faster when the array is sorted first, despite the extra cost of sorting. Let’s dig into why this happens, using both C++ and Java examples.

The Original Code

First, let’s look at the C++ code. It fills an array with random integers between 0 and 255, sorts it, and then sums every element greater than or equal to 128. The outer loop repeats the summing pass 100,000 times so that the effect is large enough to measure with clock().

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>

int main() {
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;

    std::sort(data, data + arraySize);

    clock_t start = clock();
    long long sum = 0;
    for (unsigned i = 0; i < 100000; ++i) {
        for (unsigned c = 0; c < arraySize; ++c) {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << '\n';
    std::cout << "sum = " << sum << '\n';
}

And its Java counterpart behaves similarly:

import java.util.Arrays;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        int arraySize = 32768;
        int[] data = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt(256);

        Arrays.sort(data);

        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100000; ++i) {
            for (int c = 0; c < arraySize; ++c) {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

Unraveling the Mystery: Branch Prediction and Data Caching

Why does the sorted array perform better in both C++ and Java? The answer lies in two critical aspects of modern CPU architecture: branch prediction and data caching.

  1. Branch Prediction: Modern CPUs speed up execution by guessing which path a program will take before a condition is actually evaluated. For the conditional if (data[c] >= 128), the CPU predicts whether the branch will be taken. In a sorted array, all the false cases come first, followed by all the true cases, so the predictor is right almost every time; with random data it is wrong roughly half the time, and each misprediction forces the pipeline to be flushed and refilled. Fewer mispredictions mean faster execution.
  2. Data Caching: CPUs keep recently used data in fast cache memory. It is tempting to credit caching here, but in this particular benchmark both versions scan the array sequentially, so their cache behavior is essentially identical; the measured difference comes almost entirely from branch prediction. Sorting can improve cache behavior in other workloads, for example when it turns scattered lookups into sequential ones, but it is not the driver in this experiment.

In Practice

While the experiment shows that sorting speeds up processing, remember that sorting itself has a cost. If you only need to process the array once, sorting might not be beneficial. However, if multiple operations involving structured data access are required, pre-sorting might be a useful optimization.

By making the branch on each element almost perfectly predictable, sorting turns a frequently mispredicted comparison into nearly free work, an intriguing optimization for large data sets that are processed many times. The observation underscores how much software performance depends on understanding the underlying hardware, not just the algorithm’s asymptotic cost.

