DASH Example bench.08.min-element

Synopsis:

Algorithm benchmark application for performance evaluation of the dash::min_element implementation.

Usage (DART-MPI):

$ DASH_MAX_UNIT_THREADS=<T> mpirun -n <P> ./bin/bench.08.min-element.mpi

Options

| Parameter | Description | Default |
|---|---|---|
| \(\texttt{T}\) | Number of threads per unit (process) | total CPU cores / \(P\) |
| \(\texttt{P}\) | Number of units (processes) | - |
| \(\texttt{-smin}~<S_{min}>\) | Minimum size of the distributed array to be searched | 8000000 |
| \(\texttt{-sb}~<S_{base}>\) | Exponential base of the size of the distributed array to be searched | 2 |
| \(\texttt{-rmax}~<R_{max}>\) | Maximum number of repetitions per test iteration | 2560 |
| \(\texttt{-rmin}~<R_{min}>\) | Minimum number of repetitions per test iteration | 10 |
| \(\texttt{-rb}~<R_{base}>\) | Exponential base of the number of repetitions per test iteration | 2 |
| \(\texttt{-i}~<I>\) | Number of iterations (1 measurement / iteration) | 8 |
| \(\texttt{-v}~<V>\) | Whether to verify results (1) or not (0) | 0 |

In iteration \(i\) (starting at \(i = 0\)), the array size \(S\) and the number of repetitions \(R\) are:

\[ \begin{aligned} S & = (S_{base})^i \cdot S_{min} \\ R & = \max(R_{min}, R_{max} / (R_{base})^i) \end{aligned} \]
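
The two formulas can be evaluated directly to predict the `size` and `repeats` columns of a run. The following Python sketch is illustrative only: the function name `schedule` and its keyword arguments are hypothetical and not part of the benchmark, and the use of integer division for \(R\) is an assumption that agrees with the sample output on this page.

```python
def schedule(s_min=8_000_000, s_base=2, r_max=2560, r_min=10, r_base=2,
             iterations=8):
    """Return a list of (array size, repetitions), one entry per iteration."""
    rows = []
    for i in range(iterations):
        # S = (S_base)^i * S_min
        size = s_base**i * s_min
        # R = max(R_min, R_max / (R_base)^i); integer division is assumed
        reps = max(r_min, r_max // r_base**i)
        rows.append((size, reps))
    return rows


if __name__ == "__main__":
    # Defaults from the options table
    for size, reps in schedule():
        print(f"{reps:>6}, {size:>12}")
```

With the default parameters the repetitions halve from 2560 to 20 while the array size doubles from 8000000 to 1024000000, so the number of elements searched per iteration stays constant.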

Evaluation of Locality-based Load-Balancing

Trace visualization illustrating locality-based load balancing

The implementation of dash::min_element uses OpenMP to parallelize each unit's local task.
Because SuperMIC is a heterogeneous system, the benchmark demonstrates the automatic load balancing that DASH provides based on locality discovery.

The trace timeline shows the time each unit spends in its local operation.
With load balancing, the time to completion should be balanced between units on hosts and units on MIC targets.
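
The idea of balancing completion times can be illustrated by splitting the array in proportion to each unit's relative capacity. The sketch below is not the DASH implementation: the function `balanced_block_sizes` and the integer weights are hypothetical, whereas DASH derives unit capacities from its locality discovery.

```python
def balanced_block_sizes(total_elements, unit_weights):
    """Split total_elements across units in proportion to integer weights.

    unit_weights holds one hypothetical relative-capacity value per unit;
    a unit with weight 3 receives about three times the elements of a unit
    with weight 1.
    """
    total_weight = sum(unit_weights)
    # Floor of the proportional share for every unit
    sizes = [total_elements * w // total_weight for w in unit_weights]
    # Distribute the few elements lost to rounding, one per unit
    remainder = total_elements - sum(sizes)
    for k in range(remainder):
        sizes[k % len(sizes)] += 1
    return sizes


if __name__ == "__main__":
    # Hypothetical example: two faster units (weight 3), two slower (weight 1)
    weights = [3, 3, 1, 1]
    sizes = balanced_block_sizes(8_000_000, weights)
    for w, s in zip(weights, sizes):
        # elements / weight approximates the relative time to completion
        print(w, s, s / w)
```

If the weights reflect the real throughput of each unit, the ratio of block size to weight is the same for all units, which corresponds to the balanced time to completion expected in the trace.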

See also the SuperMIC system description and developer notes.

Sample Output

projekt03: 4 units, 3 threads / unit

$ DASH_MAX_UNIT_THREADS=3 mpirun -n 4 numactl --physcpubind=0-11 ./bin/bench.08.min-element.mpi

Measurements:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
    4,    mpich,    2560,    8000000,   6.57,     2552.00,     2562.00,     4906.00,       49.33,    6.60, 2974.77
    4,    mpich,    1280,   16000000,   6.58,     5127.00,     5138.00,     5459.00,       21.68,    6.62, 2968.78
    4,    mpich,     640,   32000000,   6.55,    10202.00,    10224.00,    10497.00,       26.31,    6.61, 2983.94
    4,    mpich,     320,   64000000,   6.53,    20371.00,    20393.00,    20632.00,       29.50,    6.64, 2992.24
    4,    mpich,     160,  128000000,   6.52,    40698.00,    40732.00,    40977.00,       23.95,    6.74, 2996.77
    4,    mpich,      80,  256000000,   6.51,    81329.00,    81368.00,    81833.00,       57.97,    6.91, 3000.05
    4,    mpich,      40,  512000000,   6.51,   162610.00,   162671.00,   162772.00,       31.65,    7.32, 3001.60
    4,    mpich,      20, 1024000000,   6.51,   325232.00,   325283.00,   325344.00,       24.20,    8.14, 3002.19

Full output with preamble

projekt03: 4 units, 2 threads / unit

$ DASH_MAX_UNIT_THREADS=2 mpirun -n 4 numactl --physcpubind=0-11 ./bin/bench.08.min-element.mpi

Measurements:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
    4,    mpich,    2560,    8000000,   9.77,     3804.00,     3812.00,     6038.00,       47.99,    9.80, 1999.79
    4,    mpich,    1280,   16000000,   9.79,     7633.00,     7646.00,     7947.00,       26.84,    9.83, 1995.11
    4,    mpich,     640,   32000000,   9.75,    15220.00,    15232.00,    15638.00,       38.30,    9.81, 2002.59
    4,    mpich,     320,   64000000,   9.73,    30387.00,    30404.00,    30715.00,       53.13,    9.84, 2006.63
    4,    mpich,     160,  128000000,   9.72,    60711.00,    60758.00,    61288.00,       75.49,    9.94, 2008.51
    4,    mpich,      80,  256000000,   9.72,   121396.00,   121443.00,   121765.00,       89.94,   10.12, 2009.90
    4,    mpich,      40,  512000000,   9.71,   242766.00,   242808.00,   243295.00,      133.48,   10.58, 2010.46
    4,    mpich,      20, 1024000000,   9.71,   485498.00,   485619.00,   486344.00,      177.93,   11.29, 2010.89

Full output with preamble
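
The measurement tables are comma-separated with a header row, so they can be read with a few lines of code. The following Python sketch parses the first row of each of the two runs above and compares their throughput. The helper names `parse_measurements` and `throughput_ratio` are hypothetical and not shipped with the benchmark.

```python
SAMPLE_3T = """
units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
    4,    mpich,    2560,    8000000,   6.57,     2552.00,     2562.00,     4906.00,       49.33,    6.60, 2974.77
"""

SAMPLE_2T = """
units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
    4,    mpich,    2560,    8000000,   9.77,     3804.00,     3812.00,     6038.00,       47.99,    9.80, 1999.79
"""


def _convert(cell):
    """Convert a cell to int or float where possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell


def parse_measurements(text):
    """Parse a measurement table into a list of dicts keyed by column name."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split(",")]
    rows = []
    for ln in lines[1:]:
        cells = [_convert(c.strip()) for c in ln.split(",")]
        rows.append(dict(zip(header, cells)))
    return rows


def throughput_ratio(text_a, text_b, row=0):
    """Ratio of the mkeys/s values in the given row of two tables."""
    a = parse_measurements(text_a)[row]["mkeys/s"]
    b = parse_measurements(text_b)[row]["mkeys/s"]
    return a / b


if __name__ == "__main__":
    print(round(throughput_ratio(SAMPLE_3T, SAMPLE_2T), 2))
```

For the smallest array size, the run with 3 threads per unit reaches about 1.49 times the throughput of the run with 2 threads per unit, close to the ratio of the thread counts.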

Unbalanced: SuperMIC: Host: 16p x 1t, MICs: 2 x 6p x 40t

Job file template

$ subgenmiccmd ~/jobs/supermic/symmetric/bench.08.min-element.tpl <num nodes> 16 1 6 40

1 Node: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   28, intelmpi,    6400,    8000000,   8.11,     1165.00,     1207.00,   186115.00,     2312.26,    9.70,   6020.59
   28, intelmpi,    3200,   16000000,   7.22,     2154.00,     2206.00,     4536.00,      121.21,    8.13,   6758.88
   28, intelmpi,    1600,   32000000,   6.67,     3976.00,     4140.00,     5026.00,      110.06,    7.22,   7322.65
   28, intelmpi,     800,   64000000,   6.52,     7620.00,     8071.00,     9954.00,      330.15,    6.95,   7484.49
   28, intelmpi,     400,  128000000,   6.11,    14737.00,    15215.00,    16495.00,      314.95,    6.59,   7991.33
   28, intelmpi,     200,  256000000,   5.98,    28987.00,    29866.00,    31040.00,      463.84,    6.67,   8164.92
   28, intelmpi,     128,  512000000,   7.71,    57867.00,    60388.00,    61560.00,      709.47,    9.02,   8101.16
   28, intelmpi,     128, 1024000000,  15.06,   113894.00,   117721.00,   121243.00,     1775.94,   17.61,   8300.61

2 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   56, intelmpi,    6400,    8000000,   6.59,      768.00,     1003.00,   116102.00,     1447.72,   14.36,   7409.08
   56, intelmpi,    3200,   16000000,   4.97,     1324.00,     1484.00,     3622.00,      202.56,    9.11,   9823.13
   56, intelmpi,    1600,   32000000,   4.39,     2282.00,     2612.00,     4502.00,      375.74,    6.59,  11132.94
   56, intelmpi,     800,   64000000,   3.93,     4182.00,     4910.00,     6460.00,      195.29,    5.13,  12437.91
   56, intelmpi,     400,  128000000,   3.27,     7738.00,     8064.00,    11282.00,      348.53,    4.02,  14947.25
   56, intelmpi,     200,  256000000,   3.05,    14815.00,    15245.00,    16125.00,      246.63,    3.69,  15985.42
   56, intelmpi,     128,  512000000,   3.98,    29196.00,    31157.00,    31807.00,      501.32,    4.83,  15720.55
   56, intelmpi,     128, 1024000000,   7.89,    60392.00,    61574.00,    63773.00,      695.53,    9.34,  15843.19

4 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  112, intelmpi,    6400,    8000000,   5.30,      506.00,      783.00,   158868.00,     1979.05,   30.24,   9217.71
  112, intelmpi,    3200,   16000000,   3.63,      848.00,     1132.00,     1851.00,      151.35,   16.07,  13463.35
  112, intelmpi,    1600,   32000000,   2.97,     1392.00,     1754.00,     2795.00,      229.85,    9.28,  16464.01
  112, intelmpi,     800,   64000000,   2.24,     2359.00,     2693.00,     4500.00,      348.67,    5.50,  21759.82
  112, intelmpi,     400,  128000000,   1.91,     4362.00,     4698.00,     7401.00,      360.69,    3.63,  25528.54
  112, intelmpi,     200,  256000000,   1.70,     8008.00,     8393.00,    10947.00,      423.67,    2.70,  28727.33
  112, intelmpi,     128,  512000000,   2.34,    15303.00,    19059.00,    20032.00,     1514.33,    3.23,  26752.55
  112, intelmpi,     128, 1024000000,   4.01,    30452.00,    31354.00,    32848.00,      386.93,    5.18,  31163.63

8 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  224, intelmpi,    6400,    8000000,   6.81,      661.00,      873.00,   597949.00,     7464.86,  129.21,   7165.12
  224, intelmpi,    3200,   16000000,   3.56,      731.00,     1025.00,     2494.00,      202.22,   64.38,  13708.49
  224, intelmpi,    1600,   32000000,   2.49,     1134.00,     1451.00,     3504.00,      216.63,   32.97,  19617.50
  224, intelmpi,     800,   64000000,   2.08,     2127.00,     2658.00,     3420.00,      223.53,   17.29,  23523.00
  224, intelmpi,     400,  128000000,   1.66,     2670.00,     4174.00,     5483.00,      554.02,    9.33,  29370.88
  224, intelmpi,     200,  256000000,   1.29,     4679.00,     6518.00,     8855.00,     1139.26,    5.18,  37770.27
  224, intelmpi,     128,  512000000,   1.78,    10355.00,    14620.00,    16597.00,     1941.80,    4.41,  35020.32
  224, intelmpi,     128, 1024000000,   3.82,    25534.00,    30075.00,    31856.00,     1156.03,    6.62,  32690.44

Unbalanced: SuperMIC: Host: 16p x 2t, MICs: 2 x 6p x 40t

Job file template

$ subgenmiccmd ~/jobs/supermic/symmetric/bench.08.min-element.tpl <num nodes> 16 2 6 40

1 Node: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   28, intelmpi,    3200,   18000000,   5.86,     1688.00,     1740.00,   224097.00,     3930.12,    6.74,   9378.87
   28, intelmpi,    1600,   36000000,   5.38,     3297.00,     3337.00,     3794.00,       59.24,    5.90,  10216.01
   28, intelmpi,     800,   72000000,   5.16,     6359.00,     6428.00,     7090.00,       63.97,    5.51,  10653.75
   28, intelmpi,     400,  144000000,   5.04,    12488.00,    12585.00,    12964.00,       73.56,    5.39,  10899.55
   28, intelmpi,     200,  288000000,   4.98,    24751.00,    24890.00,    25236.00,       90.36,    5.49,  11027.44
   28, intelmpi,     128,  576000000,   6.33,    49262.00,    49425.00,    49955.00,      123.48,    7.25,  11111.81
   28, intelmpi,     128, 1152000000,  12.61,    98135.00,    98462.00,   105764.00,      667.84,   14.45,  11148.26
   28, intelmpi,     128, 2304000000,  25.15,   196080.00,   196440.00,   197094.00,      226.27,   28.68,  11184.35

2 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   56, intelmpi,    3200,   18000000,   3.40,      914.00,      999.00,   130060.00,     2283.87,    7.38,  16160.90
   56, intelmpi,    1600,   36000000,   2.95,     1726.00,     1848.00,     2254.00,       66.34,    5.07,  18615.68
   56, intelmpi,     800,   72000000,   2.72,     3336.00,     3387.00,     3794.00,       55.63,    3.87,  20168.75
   56, intelmpi,     400,  144000000,   2.60,     6408.00,     6477.00,     6961.00,       67.52,    3.26,  21150.67
   56, intelmpi,     200,  288000000,   2.53,    12540.00,    12645.00,    13075.00,       75.58,    3.04,  21701.84
   56, intelmpi,     128,  576000000,   3.20,    24829.00,    24971.00,    25684.00,      110.04,    3.83,  21984.29
   56, intelmpi,     128, 1152000000,   6.35,    49279.00,    49508.00,    60223.00,      951.65,    7.44,  22138.23
   56, intelmpi,     128, 2304000000,  12.61,    98207.00,    98501.00,    99079.00,      153.29,   14.57,  22302.89

4 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  112, intelmpi,    3200,   18000000,   2.57,      607.00,      691.00,   186093.00,     3278.42,   14.40,  21349.23
  112, intelmpi,    1600,   36000000,   1.83,      995.00,     1116.00,     3213.00,      120.65,    7.62,  29989.68
  112, intelmpi,     800,   72000000,   1.58,     1831.00,     1962.00,     3185.00,       88.00,    4.57,  34720.85
  112, intelmpi,     400,  144000000,   1.41,     3386.00,     3518.00,     4094.00,       78.54,    2.96,  38878.40
  112, intelmpi,     200,  288000000,   1.32,     6507.00,     6592.00,     7101.00,       85.11,    2.19,  41524.85
  112, intelmpi,     128,  576000000,   1.64,    12645.00,    12760.00,    13215.00,      102.19,    2.35,  42965.25
  112, intelmpi,     128, 1152000000,   3.21,    24887.00,    25053.00,    25626.00,      147.65,    4.18,  43778.33
  112, intelmpi,     128, 2304000000,   6.36,    49405.00,    49637.00,    50156.00,      144.99,    7.79,  44241.80

8 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  224, intelmpi,    3200,   18000000,   4.42,      576.00,     1335.00,   600904.00,    10606.40,   66.08,  12421.61
  224, intelmpi,    1600,   36000000,   2.02,      752.00,     1414.00,     4594.00,      307.05,   32.59,  27149.38
  224, intelmpi,     800,   72000000,   1.29,     1179.00,     1601.00,     2592.00,      330.16,   16.57,  42495.30
  224, intelmpi,     400,  144000000,   1.00,     1994.00,     2679.00,     3349.00,      336.83,    8.66,  55151.75
  224, intelmpi,     200,  288000000,   0.84,     3579.00,     4402.00,     5099.00,      379.67,    4.71,  65173.53
  224, intelmpi,     128,  576000000,   0.93,     6714.00,     7450.00,     8069.00,      360.12,    3.48,  75700.07
  224, intelmpi,     128, 1152000000,   1.72,    12825.00,    13635.00,    14178.00,      385.89,    4.40,  81856.38
  224, intelmpi,     128, 2304000000,   3.29,    25060.00,    25858.00,    27102.00,      378.56,    6.19,  85505.06

16 Nodes: (Full output)

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  448, intelmpi,     800,   18000000,   2.03,     1007.00,     1643.00,   638062.00,    22486.95,   70.62,   6763.99
  448, intelmpi,     400,   36000000,   0.68,     1104.00,     1659.00,     4345.00,      342.27,   33.98,  20115.94
  448, intelmpi,     200,   72000000,   0.38,     1324.00,     1874.00,     3141.00,      270.82,   17.05,  35925.76
  448, intelmpi,     128,  144000000,   0.28,     1617.00,     2139.00,     4740.00,      364.55,   11.00,  62307.93
  448, intelmpi,     128,  288000000,   0.40,     2532.00,     3072.00,     5466.00,      365.13,   11.10,  87572.52
  448, intelmpi,     128,  576000000,   0.60,     3905.00,     4553.00,     7642.00,      473.67,   11.33, 117611.10
  448, intelmpi,     128, 1152000000,   1.00,     6835.00,     7677.00,    13279.00,      785.81,   11.79, 140227.46
  448, intelmpi,     128, 2304000000,   1.79,    12966.00,    13809.00,    20651.00,      753.61,   12.71, 157445.45

Balanced: SuperMIC: Host: 16p x 2t, MICs: 2 x 6p x 40t

1 Node:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   28, intelmpi,    3200,   18000000,   5.38,     1574.00,     1617.00,    81116.00,     1408.07,    6.31,  10213.16
   28, intelmpi,    1600,   36000000,   5.00,     3053.00,     3096.00,     4738.00,       84.34,    5.55,  10985.86
   28, intelmpi,     800,   72000000,   4.75,     5859.00,     5923.00,     6396.00,       59.87,    5.12,  11565.41
   28, intelmpi,     400,  144000000,   4.63,    11493.00,    11560.00,    12083.00,       76.94,    5.00,  11860.26
   28, intelmpi,     200,  288000000,   4.57,    22727.00,    22822.00,    23405.00,       87.34,    5.10,  12026.77
   28, intelmpi,     128,  576000000,   5.81,    45201.00,    45371.00,    45742.00,      115.39,    6.81,  12102.61
   28, intelmpi,     128, 1152000000,  11.58,    90233.00,    90465.00,    91147.00,      144.58,   13.51,  12140.26
   28, intelmpi,     128, 2304000000,  23.13,   180253.00,   180664.00,   181519.00,      234.93,   26.96,  12160.96

2 Nodes:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
   56, intelmpi,    3200,   18000000,   3.32,      822.00,      967.00,    89915.00,     1574.87,    7.31,  16557.37
   56, intelmpi,    1600,   36000000,   2.78,     1600.00,     1690.00,     3058.00,      122.47,    4.91,  19771.57
   56, intelmpi,     800,   72000000,   2.50,     3025.00,     3099.00,     5022.00,      104.53,    3.65,  21948.74
   56, intelmpi,     400,  144000000,   2.37,     5838.00,     5898.00,     6581.00,       67.82,    3.04,  23225.73
   56, intelmpi,     200,  288000000,   2.31,    11382.00,    11519.00,    11901.00,       67.51,    2.83,  23804.83
   56, intelmpi,     128,  576000000,   2.92,    22641.00,    22772.00,    23399.00,      128.91,    3.58,  24081.48
   56, intelmpi,     128, 1152000000,   5.80,    45092.00,    45247.00,    45883.00,      132.91,    6.94,  24259.46
   56, intelmpi,     128, 2304000000,  11.56,    90039.00,    90324.00,    90789.00,      130.59,   13.73,  24327.70

4 Nodes:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  112, intelmpi,    3200,   18000000,   3.19,      671.00,      755.00,   554539.00,     9802.77,   15.34,  17240.91
  112, intelmpi,    1600,   36000000,   1.88,     1018.00,     1113.00,     1782.00,      123.08,    7.87,  29165.82
  112, intelmpi,     800,   72000000,   1.49,     1704.00,     1813.00,     2869.00,      124.13,    4.55,  36866.03
  112, intelmpi,     400,  144000000,   1.31,     3175.00,     3263.00,     3775.00,       76.16,    2.90,  41834.10
  112, intelmpi,     200,  288000000,   1.22,     5988.00,     6070.00,     6413.00,       81.05,    2.12,  45072.63
  112, intelmpi,     128,  576000000,   1.51,    11619.00,    11714.00,    16103.00,      564.01,    2.26,  46445.81
  112, intelmpi,     128, 1152000000,   2.95,    22883.00,    22984.00,    23649.00,      109.95,    3.94,  47729.03
  112, intelmpi,     128, 2304000000,   5.83,    45341.00,    45514.00,    46016.00,      125.44,    7.30,  48232.91

8 Nodes:

units, mpi.impl, repeats,       size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s,   mkeys/s
  224, intelmpi,    3200,   18000000,   4.54,      637.00,     1331.00,   621427.00,    10968.71,   66.76,  12091.51
  224, intelmpi,    1600,   36000000,   2.12,      871.00,     1158.00,     5575.00,      317.87,   32.97,  25877.84
  224, intelmpi,     800,   72000000,   1.41,     1253.00,     1689.00,     2872.00,      303.87,   16.78,  38998.22
  224, intelmpi,     400,  144000000,   1.06,     2106.00,     2796.00,     3475.00,      317.98,    8.77,  51741.61
  224, intelmpi,     200,  288000000,   0.89,     3703.00,     4545.00,     6853.00,      343.09,    4.77,  61429.47
  224, intelmpi,     128,  576000000,   0.91,     6135.00,     6927.00,    10256.00,      681.82,    3.49,  77069.83
  224, intelmpi,     128, 1152000000,   1.71,    11762.00,    13567.00,    16112.00,     1026.50,    4.40,  82360.67
  224, intelmpi,     128, 2304000000,   3.19,    22969.00,    23972.00,    30356.00,     1666.24,    6.14,  88227.13
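
To quantify the effect of load balancing, the throughput of the balanced runs can be compared with the unbalanced runs that use the same configuration (Host: 16p x 2t, MICs: 2 x 6p x 40t). The Python sketch below copies the `mkeys/s` values for the largest array size (2304000000) from the tables above and keys them by node count, taking 28 units per node. The function name `relative_gain` is hypothetical.

```python
# mkeys/s at size 2304000000, keyed by node count (28 units per node),
# copied from the unbalanced and balanced 16p x 2t tables.
UNBALANCED = {1: 11184.35, 2: 22302.89, 4: 44241.80, 8: 85505.06}
BALANCED = {1: 12160.96, 2: 24327.70, 4: 48232.91, 8: 88227.13}


def relative_gain(balanced, unbalanced):
    """Relative throughput gain of balanced over unbalanced runs per node count."""
    return {
        nodes: balanced[nodes] / unbalanced[nodes] - 1.0
        for nodes in sorted(balanced)
        if nodes in unbalanced
    }


if __name__ == "__main__":
    for nodes, gain in relative_gain(BALANCED, UNBALANCED).items():
        print(f"{nodes} node(s): {gain * 100:.1f} % higher throughput")
```

For this array size, the balanced runs achieve roughly 9 % higher throughput on 1, 2, and 4 nodes and about 3 % higher throughput on 8 nodes.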