Table Of Contents

Previous topic

Sum3 Problem Overview

Next topic

Sum3 Multi-Core OpenMP Implementation

This Page

Sum3 Simple Implementation (Serial)

For an input array of arbitrary integers (that is, we make no assumptions as to the range of the integers), Sum3 can be solved in O(n^2) time using relatively simple algorithms.

The purpose of this chapter is to emphasize the point that parallelism is not always the best solution to a problem. Sometimes, making algorithmic improvements (resulting in better theoretical complexity) can provide better speedups than parallelism; especially if you don’t have the hardware resources to take advantage of a parallel algorithm.

Using a Hash Table

The easiest approach to code is to take advantage of hash tables. Hash tables are implemented in many languages, so they are cheap from a coding perspective, and they offer O(1) time for inserting an element and determining if an element is in the hash table ... under ideal circumstances. The approach is to:

  1. Load all numbers in a hash table
  2. Generate all possible pairs of numbers
  3. Check if the negation of the sum of each pair of numbers is in the hash table.

However, this approach suffers from some drawbacks. First, hash table operations can degrade to linear time under certain conditions (although this can be managed rather easily). Second, care must be taken to ensure that unique triples are identified. For example, if the input array was -2, 0, 4, one must ensure that -2+-2+4 = 0 does not get counted as a valid triple (since -2 only appears once in the input).

Using a Sorted Input Array

A slightly more complicated O(n^2) algorithm requires that the input array be sorted. The algorithm contains a loop that will traverse the input array from first to last.

Assume a sorted input array array containing n elements. We will first find all triples that add up to 0 and involve the first element in array. We set a=array[0]. A triple is then formed with b=array[0+1] and c=array[n-1]. If a+b+c > 0, then we need to reduce the sum to get it closer to 0, so we set c=array[n-2] (decrease the largest number in the triple to get a smaller sum). If a+b+c < 0, then we need to increase the sum to get it closer to 0, so we set b=array[0+2] (increase the smaller number to get a larger sum). This process continues until b and c refer to the same array element. At the end of this process, all possible triples involving the first array element have been computed.

To compute all triples involving the first array element, we took a pair of numbers from the array. After examining the pair, one number was excluded from further consideration in future triples involving the first array element. Therefore, for n array elements, we will construct n-2 triples due to the following: the first array element is used in every triple, leaving us to create pairs of the remaining n-1 elements; the first pair uses two elements and eliminates one from further consideration; all following pairs will reuse 1 element form the previous pair, and eliminate one element from consideration. Thus, computing all triples involving a single element requires O(n) time. Repeating this for n elements in the input array requires O(n^2) time. A C++ implementation of this algorithm follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#include <iostream>
#include <sstream>

using namespace std;
int getRandInt( int *dataVec, int dataSize )
{
        // load up a vector with random integers
        int num;
        for( int i = 0; i < dataSize; i++ ) {
                // integers will be 1-100
                num = rand() % 100+1;
                if( rand( ) % 2 == 0 ) {
                        // make some integers negative
                        num *= -1;
                }
                dataVec[i] = num;
        }
}


int main(int argc, char * argv[] )
{
        int dataSize = 0;

        if( argc < 2 ) {
                std::cerr << "usage: exe  [num of nums] [optional seed value]" << std::endl;
                exit( -1 );
        }
        {
                std::stringstream ss1;
                ss1 << argv[1];
                ss1 >> dataSize;
        }
        if( argc >= 3 ) {
                std::stringstream ss1;
                int seed;
                ss1 << argv[2];
                ss1 >> seed;
                srand( seed );
        }
        else {
                srand( 0 );
        }

        // create a data vector
        int *data = new int[ dataSize ];

        // load it up with random data
        getRandInt( data, dataSize );
        // sort the data
        sort( data, data + dataSize );

        // do the Sum3 computation.  O(n^2)
        int count = 0;
        int a,b,c, sum; // array elements
        int j,k;   // array indices
        for (int i = 0; i < dataSize-2; i++){
                a = data[i];
                j = i+1;
                k = dataSize-1;
                while( j < k ) {
                        b = data[j];
                        c = data[k];
                        sum = a+b+c;
                        if( sum == 0 ){
                                cerr << a << " " << b << " " << c << endl;

                                count++;
                        }
                        if( sum < 0 ){
                                j++;
                        }
                        else {
                                k--;
                        }
                }
        }
        cout<< count <<endl;

}

One pitfall in this algorithm is that duplicate values cause problems.

Example

Consider the input string array = [-4, 1, 1, 3, 3]. There are 4 triples that sum to 0:

array[0] + array[1] + array[3] = 0
array[0] + array[1] + array[4] = 0
array[0] + array[2] + array[3] = 0
array[0] + array[2] + array[4] = 0

However, the above code will only find 2 triples. The algorithm progresses as follows, with items being examined in bold:

-4, 1, 1, 3, 3 sum = 0, so index k is decremented
-4, 1, 1, 3, 3 sum = 0, so index k is decremented
-4, 1, 1, 3, 3 sum = -2, so index i is incremented, resulting in j == k and the while loop exits

Effectively, the algorithm only finds triples involving the first 1, and none involving the second 1.

In fact, duplicates cause 3 problems that must be addressed:

  1. 3 or more 0s will independently sum to 0. In fact, for n > 2 0s, there will be n choose 3 triples that sum to 0.
  2. A sequence of repeating numbers exists such that a pair of those numbers along with a third forms a triple that sums to 0. For example: `` -4, 1, 2, 2, 2`` contains 3 triples that sum to 0. When such a sequence is detected with length n > 2, there will be n choose 2 triples that sum to 0. Note that a correct handling of this case can also handle the case where 3 or more 0s exist.
  3. The situation in the above example when two sequences of repeating numbers exist such that the repeated element of each sequence along with a third number not equal to those elements sum to 0. For sequences of non-equal numbers x and y with respective lengths m >= 1 and n >= 1, such that there exists a number z | x+y+z = 0, there will be x*y triples that sum to 0 for each individual copy of z.

The following program handles duplicates correctly. Problem 1 and Problem 2 are handled in a single conditional, the first if statement in the while loop. Note that when such a situation occurs, we have effectively found the point at which j and k converge, and so we break out of the while loop. Problem 3 is handled in the second if statement in the while loop; the modification of j and k are again handled specially in this situation (since they may be increased or decreased by more than 1). The third if statement in the while loop handles the situation in which duplicates do not occur. The final else block in the while loop handles the normal increment of j or decrement of k when a triple with a non-zero sum is visited. The complete solution for handling duplicates is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
#include <iostream>
#include <sstream>

using namespace std;
int getRandInt( int *dataVec, int dataSize )
{
        // load up a vector with random integers
        int num;
        for( int i = 0; i < dataSize; i++ ) {
                // integers will be 1-100
                num = rand() % 100 +1;
                if( rand( ) % 2 == 0 ) {
                        // make some integers negative
                        num *= -1;
                }
                dataVec[i] = num;
        }
}


int main(int argc, char * argv[] )
{
        int dataSize = 0;

        if( argc < 2 ) {
                std::cerr << "usage: exe  [num of nums] [optional seed value]" << std::endl;
                exit( -1 );
        }
        {
                std::stringstream ss1;
                ss1 << argv[1];
                ss1 >> dataSize;
        }
        if( argc >= 3 ) {
                std::stringstream ss1;
                int seed;
                ss1 << argv[2];
                ss1 >> seed;
                srand( seed );
        }
        else {
                srand( 0 );
        }

        // create a data vector
        int *data = new int[ dataSize ];

        // load it up with random data
        getRandInt( data, dataSize );
        // sort the data
        sort( data, data + dataSize );

        // do the Sum3 computation.  O(n^2)
        int count = 0;
        int a,b,c, sum; // array elements
        int j,k;   // array indices
        for (int i = 0; i < dataSize-2; i++){
                a = data[i];
                j = i+1;
                k = dataSize-1;
                while( j < k ) {
                        b = data[j];
                        c = data[k];
                        sum = a+b+c;
                        if( sum == 0 && b == c ) {
                                // case where b == c.  ie, -10 + 5 + 5
                                // or where a == b == c == 0
                                int num = k-j+1;
                                count += (num*(num-1))/2;
                                break;
                        }
                        else if( sum == 0 && (data[j+1] == b || data[k-1] == c )){
                                // case where there are multiple copies of b or c
                                // find out how many b's and c's there are
                                int startj = j;
                                while( data[j+1] == b ) j++;
                                int startk = k;
                                while( data[k-1] == c ) k--;
                                count += (j-startj+1) * (startk-k+1);
                                j++;
                                k--;
                        }
                        else if( sum == 0 ){
                                // normal case
                                count++;
                                j++;
                        } else {
                                // if sum is not 0, increment j or k
                                if( sum < 0 ) j++;
                                else k--;
                        }
                }
        }
        cout<< count <<endl;
}

Here are some running times for the previous program:

Array Size Time (s)
500 .005
1000 .008
1500 .013
2000 .019
2500 .026
3000 .033
3500 .043
4000 .054
10000 .299
20000 1.142

Here we show the graph of running times for the cubic and quadratic versions of the algorithm. Notice the vast difference in running times for the two programs. Again, the cubic and quadratic nature of the algorithms are visible.

_images/plot_quadratic_img.png

Exercise

Create a Makefile and use the Makefile to compile the program listed above. Run the program using the time command on linux to see some running times. Collect running times on two different machines. Write a brief report indicating your results and listing the machine specifications. List reasons why one machine is faster than the other.

To find out machine specs on linux, check out the lspci abd lscpu commands, and check out the /proc filesystem. Check the man pages and/or google for information on how to use/interpret these resources