DASH Algorithms

See overview of the STL algorithm collection

Non-modifying sequence operations

STL Algorithm Synopsis Effort DASH Equivalent Scope
all_of, any_of, none_of checks if a predicate is true trivial - local + reduce
for_each applies a function to a range of elements trivial - pure local
count, count_if number of elements satisfying a predicate trivial - local + reduce
mismatch find first position where two ranges differ trivial - local + reduce
equal if two sets of elements are the same trivial - local + reduce
find, find_if, find_if_not find first element satisfying predicate trivial - local + reduce
find_end finds the last sequence of elements in range moderate - local + global
find_first_of searches for any one of a set of elements trivial - local + reduce
adjacent_find find first two adjacent items that are equal trivial - local + reduce
search searches for a range of elements trivial - local + reduce
search_n searches for a number consecutive copies trivial - local + reduce

Modifying sequence operations

STL Algorithm Synopsis Effort DASH Equivalent Scope
copy, copy_if copies a range of elements to a new location high dash::copy global
copy_n copies a number of elements to a new location high - global
copy_backward copies a range of elements in backwards order high - global
move moves a range of elements to a new location moderate - global
move_backward moves a range of elements in backwards order moderate - global
fill assigns a range of elements a certain value trivial dash::fill pure local
fill_n assigns a value to a number of elements trivial - pure local
transform applies a function to a range of elements trivial dash::transform pure local
generate saves the result of a function in a range trivial dash::generate pure local
generate_n saves the result of N applications of a function trivial - pure local

Partitioning operations

STL Algorithm Synopsis Effort DASH Equivalent Scope
is_partitioned if range is partitioned by given predicate trivial - local + reduce
partition divides a range of elements into two groups moderate - local + global
partition_copy copy range dividing the elements into two groups moderate - local + global
stable_partition partition while preserving relative order moderate - local + global
partition_point locate partition point of a partitioned range trivial - local + global

Minimum/maximum operations

STL Algorithm Synopsis Effort DASH Equivalent Scope
max_element returns the largest element in a range trivial dash::max_element local + reduce
min_element returns the smallest element in a range trivial dash::min_element local + reduce
minmax_element smallest and the largest element in a range trivial - local + reduce

Numeric operations

STL Algorithm Synopsis Effort DASH Equivalent Scope
accumulate sums up a range of elements trivial - local + reduce
partial_sum computes the partial sum of a range of elements trivial - local + global
reduce like accumulate, except out of order trivial - local + reduce
exclusive_scan partial_sum excluding element i from sum i trivial - local + global
inclusive_scan partial_sum including element i from sum i trivial - local + global
transform_reduce applies a functor, then reduces out of order trivial - local + reduce

ForEach (dash::for_each)

Fill (dash::fill)

Assigns the given value to the elements in the range [first, last).

Synopsis

dash::fill<T>(
  ForwardIt first,
  ForwardIt last,
  const T & value
);

Generate (dash::generate, dash::generate_n)

Assigns each element in range [first, last) a value generated by the given function object gen.

Synopsis

dash::generate<T>(
  ForwardIt first,
  ForwardIt last,
  Generator gen
);

Copy (dash::copy)

See std::copy in C++ STL

Synopsis

OutputIt out_last =     // B[n], end of target range
  dash::copy(
    InputIt  src_first, // A[0], begin of origin range
    InputIt  src_last,  // A[n], end of origin range
    OutputIt dest_first // B[0], begin of target range
  );

Note that the return type is specified by the type of the target iterator dest_first. When passing a GlobAsyncIter for non-blocking copying, the return value implements a future.

Cases

Local to Global, blocking
Range Type Description
\(A\) *T Contained in single block
\(B\) GlobIter<T>
Global to Global / Global to Local, blocking
Range Type Description
\(A\) GlobIter<T> Contained in single block
\(B\) GlobIter<T>
Local to Global, non-blocking
Range Type Description
\(A\) *T Contained in single block
\(B\) GlobAsyncIter<T> also specifies return type (GlobAsyncIter<T>)

Example:

// Start asynchronous copying
GlobAsyncIter<T> dest_last =
    dash::copy(array_a.lbegin(),
               array_a.lend(),
               array_b.async[200]);
// Overlapping computation here
// ...
// Wait for completion of asynchronous copying:
dest_last.fence();
Global to Global / Global to Local, non-blocking
Range Type Description
\(A\) GlobAsyncIter<T> Contained in single block
\(B\) GlobAsyncIter<T> also specifies return type (GlobAsyncIter<T>)

Example (global to global):

// Start asynchronous copying
GlobAsyncIter<T> dest_last =
    dash::copy(array_a.async[100],
               array_a.async[200],
               array_b.async[100]);
// Overlapping computation here
// ...
// Wait for completion of asynchronous copying:
dest_last.fence();

Example (global to local):

// Start asynchronous copying
GlobAsyncIter<T> dest_last =
    dash::copy(array_a.async[100],
               array_a.async[200],
               array_b.async.lbegin());
// Overlapping computation here
// ...
// Wait for completion of asynchronous copying:
dest_last.fence();

Reduce (dash::reduce)

See reduce operations in OpenSHMEM

Transform (dash::transform)

Synopsis

GlobOutputIt out_last = 
  dash::transform(
    InputIt         in_a_first, // A[0]
    InputIt         in_a_last,  // A[n]
    GlobIntputIt    in_b_first, // B
    GlobOutputIt    out_first,  // C <- op(A, B)
    ReduceOperation op          // op
  );

Cases

With global disposition \(g\) and respective local disposition \(l\), assuming \(L_{l+x} = G_{g+x}\):

Global += Local:
Range Type Description
\(A\) *T Contained in single block
\(B\) GlobIter<T> \(B_g := op(B_g, A_l)\)
\(C\) GlobIter<T> \(= B\)

Example:

dash::transform(lvalues.lbegin(),    // A[0]
                lvalues.lend(),      // A[n]
                gdest[g],            // B
                gdest[g],            // C = B <- dash::plus<int>(B, A)
                dash::plus<int>());

First input range is does not need to be transformed as it is local (native) already, as expected by dart_accumulate_*.

Global += Global
Range Type Description
\(A\) GlobIter<T>
\(B\) GlobIter<T> \(B_g := op(B_g, A_g)\)
\(C\) GlobIter<T> \(= B\)

Example:

dash::transform(gvalues.begin(),     // A[0]
                gvalues.end(),       // A[n]
                gdest[g],            // B
                gdest[g],            // C = B <- dash::plus<int>(B, A)
                dash::plus<int>());

First input range gvalues has to be transformed to local subrange, which is then passed to dart_accumulate_*. In effect, every unit only accumulates local values of \(A\).

Global = Global + Local
Range Type Description
\(A\) GlobIter<T>
\(B\) GlobIter<T> \(\neq C\)
\(C\) GlobIter<T> \(C_g := op(B_g, A_l)\)

Example:

dash::transform(lvalues.lbegin(),    // A[0]
                lvalues.lend(),      // A[n]
                gvalues[g],          // B
                gdest[g],            // C <- dash::plus<int>(B, A)
                dash::plus<int>());