DASH Example bench.08.min-element
Synopsis:
Algorithm benchmark application for performance evaluation of the dash::min_element
implementation.
Usage (DART-MPI):
$ DASH_MAX_UNIT_THREADS=<T> mpirun -n <P> ./bin/bench.08.min-element.mpi
Options
Parameter | Description | Default |
---|---|---|
\(\texttt{T}\) | Number of threads per unit (process) | total CPU cores / \(P\) |
\(\texttt{P}\) | Number of units (processes) | - |
\(\texttt{-smin}~<S_{min}>\) | Minimum size of the distributed array to be searched | 8000000 |
\(\texttt{-sb}~<S_{base}>\) | Base of the exponential growth of the array size per iteration | 2 |
\(\texttt{-rmax}~<R_{max}>\) | Maximum number of repetitions per test iteration | 2560 |
\(\texttt{-rmin}~<R_{min}>\) | Minimum number of repetitions per test iteration | 10 |
\(\texttt{-rb}~<R_{base}>\) | Base of the exponential decrease of the number of repetitions per iteration | 2 |
\(\texttt{-i}~<I>\) | Number of iterations (1 measurement / iteration) | 8 |
\(\texttt{-v}~<V>\) | Whether to verify results (1) or not (0) | 0 |
Array size \(S\) and number of repetitions \(R\) in iteration \(i\) (\(i = 0, \dots, I - 1\)) are:
\[ \begin{aligned} S & = (S_{base})^i \cdot S_{min} \\ R & = \max(R_{min}, R_{max} / (R_{base})^i) \end{aligned} \]
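The schedule defined by these two formulas can be tabulated with a few lines of Python. This is a minimal sketch, not part of the benchmark: the function name `schedule` is hypothetical, the iteration index is assumed to start at 0 (which matches the first row of the projekt03 tables below), and integer division is an assumption because the formula does not state how \(R\) is rounded.

```python
def schedule(s_min, s_base, r_max, r_min, r_base, iterations):
    """Return a list of (size, repetitions) for i = 0 .. iterations - 1."""
    plan = []
    for i in range(iterations):
        size = s_base ** i * s_min
        # Integer division is an assumption; the formula gives no rounding rule.
        reps = max(r_min, r_max // r_base ** i)
        plan.append((size, reps))
    return plan


if __name__ == "__main__":
    # Default option values from the table above.
    for size, reps in schedule(8_000_000, 2, 2560, 10, 2, 8):
        print(size, reps)
```

With the default options, the first pair is (8000000, 2560) and the last is (1024000000, 20), in line with the `size` and `repeats` columns of the projekt03 sample output.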
Evaluation of Locality-based Load-Balancing

Figure: Trace visualization illustrating locality-based load balancing.
The implementation of dash::min_element supports OpenMP to parallelize each unit’s local task.
As SuperMIC is a heterogeneous system, the benchmark demonstrates the automatic load balancing that DASH provides based on locality discovery.
The trace timeline shows the time each unit spends in its local operation.
The time to completion should be balanced between units on hosts and units on MIC targets.
Also see the SuperMIC system description and developer notes.
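The idea of balancing completion time on a heterogeneous system can be illustrated by splitting the array in proportion to each unit's capacity. The following is only a sketch of that general technique, not DASH's implementation: the function `partition_by_capacity` and the capacity weights (3 per host unit, 2 per MIC unit) are made up for illustration, whereas DASH derives unit capacities from locality discovery.

```python
def partition_by_capacity(total_elements, capacities):
    """Split total_elements into per-unit block sizes proportional to capacities."""
    total_capacity = sum(capacities)
    sizes = [total_elements * c // total_capacity for c in capacities]
    # Rounding down loses fewer elements than there are units;
    # hand them out one per unit, front to back.
    remainder = total_elements - sum(sizes)
    for u in range(remainder):
        sizes[u % len(sizes)] += 1
    return sizes


if __name__ == "__main__":
    # One SuperMIC node as configured below: 16 host units and 2 x 6 MIC units.
    # The weights 3 and 2 are hypothetical, not measured values.
    capacities = [3] * 16 + [2] * 12
    sizes = partition_by_capacity(18_000_000, capacities)
    print(sizes[0], sizes[-1], sum(sizes))
```

Units with a larger weight receive a larger local block, so that all units ideally finish their local search at about the same time.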
Sample Output
projekt03: 4 units, 3 threads / unit
$ DASH_MAX_UNIT_THREADS=3 mpirun -n 4 numactl --physcpubind=0-11 ./bin/bench.08.min-element.mpi
Measurements:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
4, mpich, 2560, 8000000, 6.57, 2552.00, 2562.00, 4906.00, 49.33, 6.60, 2974.77
4, mpich, 1280, 16000000, 6.58, 5127.00, 5138.00, 5459.00, 21.68, 6.62, 2968.78
4, mpich, 640, 32000000, 6.55, 10202.00, 10224.00, 10497.00, 26.31, 6.61, 2983.94
4, mpich, 320, 64000000, 6.53, 20371.00, 20393.00, 20632.00, 29.50, 6.64, 2992.24
4, mpich, 160, 128000000, 6.52, 40698.00, 40732.00, 40977.00, 23.95, 6.74, 2996.77
4, mpich, 80, 256000000, 6.51, 81329.00, 81368.00, 81833.00, 57.97, 6.91, 3000.05
4, mpich, 40, 512000000, 6.51, 162610.00, 162671.00, 162772.00, 31.65, 7.32, 3001.60
4, mpich, 20, 1024000000, 6.51, 325232.00, 325283.00, 325344.00, 24.20, 8.14, 3002.19
projekt03: 4 units, 2 threads / unit
$ DASH_MAX_UNIT_THREADS=2 mpirun -n 4 numactl --physcpubind=0-11 ./bin/bench.08.min-element.mpi
Measurements:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
4, mpich, 2560, 8000000, 9.77, 3804.00, 3812.00, 6038.00, 47.99, 9.80, 1999.79
4, mpich, 1280, 16000000, 9.79, 7633.00, 7646.00, 7947.00, 26.84, 9.83, 1995.11
4, mpich, 640, 32000000, 9.75, 15220.00, 15232.00, 15638.00, 38.30, 9.81, 2002.59
4, mpich, 320, 64000000, 9.73, 30387.00, 30404.00, 30715.00, 53.13, 9.84, 2006.63
4, mpich, 160, 128000000, 9.72, 60711.00, 60758.00, 61288.00, 75.49, 9.94, 2008.51
4, mpich, 80, 256000000, 9.72, 121396.00, 121443.00, 121765.00, 89.94, 10.12, 2009.90
4, mpich, 40, 512000000, 9.71, 242766.00, 242808.00, 243295.00, 133.48, 10.58, 2010.46
4, mpich, 20, 1024000000, 9.71, 485498.00, 485619.00, 486344.00, 177.93, 11.29, 2010.89
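The measurement lines are comma-separated and can be post-processed with Python's standard `csv` module. The sketch below compares the first row of the two projekt03 runs above. The helper names are hypothetical, and the relation used in `estimated_mkeys_per_s` is an assumption inferred from the sample rows (one "mkey" taken as 2^20 elements, `time.s` taken as the time for all repetitions); it is not taken from the benchmark source.

```python
import csv
import io

HEADER = ("units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, "
          "time.max.us, time.sdv.us, total.s, mkeys/s")


def parse_measurements(lines):
    """Parse data rows (without header) into a list of dicts keyed by column."""
    text = "\n".join([HEADER] + list(lines))
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))


def estimated_mkeys_per_s(row):
    # Assumption inferred from the sample rows, not from the benchmark source:
    # mkeys/s ~= (size / 2**20) * repeats / time.s
    return int(row["size"]) / 2**20 * int(row["repeats"]) / float(row["time.s"])


# First row of each projekt03 table above (3 and 2 threads per unit).
three_threads = parse_measurements(
    ["4, mpich, 2560, 8000000, 6.57, 2552.00, 2562.00, 4906.00, 49.33, 6.60, 2974.77"])[0]
two_threads = parse_measurements(
    ["4, mpich, 2560, 8000000, 9.77, 3804.00, 3812.00, 6038.00, 47.99, 9.80, 1999.79"])[0]

ratio = float(three_threads["mkeys/s"]) / float(two_threads["mkeys/s"])
print(f"throughput ratio, 3 vs 2 threads: {ratio:.2f}")
print(f"estimated mkeys/s, 3 threads: {estimated_mkeys_per_s(three_threads):.0f}")
```

The reported throughput ratio is close to 3/2, which suggests that on this host the local search scales roughly with the number of threads per unit.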
Unbalanced: SuperMIC: Host: 16p x 1t, MICs: 2 x 6p x 40t
$ subgenmiccmd ~/jobs/supermic/symmetric/bench.08.min-element.tpl <num nodes> 16 1 6 40
1 Node: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
28, intelmpi, 6400, 8000000, 8.11, 1165.00, 1207.00, 186115.00, 2312.26, 9.70, 6020.59
28, intelmpi, 3200, 16000000, 7.22, 2154.00, 2206.00, 4536.00, 121.21, 8.13, 6758.88
28, intelmpi, 1600, 32000000, 6.67, 3976.00, 4140.00, 5026.00, 110.06, 7.22, 7322.65
28, intelmpi, 800, 64000000, 6.52, 7620.00, 8071.00, 9954.00, 330.15, 6.95, 7484.49
28, intelmpi, 400, 128000000, 6.11, 14737.00, 15215.00, 16495.00, 314.95, 6.59, 7991.33
28, intelmpi, 200, 256000000, 5.98, 28987.00, 29866.00, 31040.00, 463.84, 6.67, 8164.92
28, intelmpi, 128, 512000000, 7.71, 57867.00, 60388.00, 61560.00, 709.47, 9.02, 8101.16
28, intelmpi, 128, 1024000000, 15.06, 113894.00, 117721.00, 121243.00, 1775.94, 17.61, 8300.61
2 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
56, intelmpi, 6400, 8000000, 6.59, 768.00, 1003.00, 116102.00, 1447.72, 14.36, 7409.08
56, intelmpi, 3200, 16000000, 4.97, 1324.00, 1484.00, 3622.00, 202.56, 9.11, 9823.13
56, intelmpi, 1600, 32000000, 4.39, 2282.00, 2612.00, 4502.00, 375.74, 6.59, 11132.94
56, intelmpi, 800, 64000000, 3.93, 4182.00, 4910.00, 6460.00, 195.29, 5.13, 12437.91
56, intelmpi, 400, 128000000, 3.27, 7738.00, 8064.00, 11282.00, 348.53, 4.02, 14947.25
56, intelmpi, 200, 256000000, 3.05, 14815.00, 15245.00, 16125.00, 246.63, 3.69, 15985.42
56, intelmpi, 128, 512000000, 3.98, 29196.00, 31157.00, 31807.00, 501.32, 4.83, 15720.55
56, intelmpi, 128, 1024000000, 7.89, 60392.00, 61574.00, 63773.00, 695.53, 9.34, 15843.19
4 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
112, intelmpi, 6400, 8000000, 5.30, 506.00, 783.00, 158868.00, 1979.05, 30.24, 9217.71
112, intelmpi, 3200, 16000000, 3.63, 848.00, 1132.00, 1851.00, 151.35, 16.07, 13463.35
112, intelmpi, 1600, 32000000, 2.97, 1392.00, 1754.00, 2795.00, 229.85, 9.28, 16464.01
112, intelmpi, 800, 64000000, 2.24, 2359.00, 2693.00, 4500.00, 348.67, 5.50, 21759.82
112, intelmpi, 400, 128000000, 1.91, 4362.00, 4698.00, 7401.00, 360.69, 3.63, 25528.54
112, intelmpi, 200, 256000000, 1.70, 8008.00, 8393.00, 10947.00, 423.67, 2.70, 28727.33
112, intelmpi, 128, 512000000, 2.34, 15303.00, 19059.00, 20032.00, 1514.33, 3.23, 26752.55
112, intelmpi, 128, 1024000000, 4.01, 30452.00, 31354.00, 32848.00, 386.93, 5.18, 31163.63
8 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
224, intelmpi, 6400, 8000000, 6.81, 661.00, 873.00, 597949.00, 7464.86, 129.21, 7165.12
224, intelmpi, 3200, 16000000, 3.56, 731.00, 1025.00, 2494.00, 202.22, 64.38, 13708.49
224, intelmpi, 1600, 32000000, 2.49, 1134.00, 1451.00, 3504.00, 216.63, 32.97, 19617.50
224, intelmpi, 800, 64000000, 2.08, 2127.00, 2658.00, 3420.00, 223.53, 17.29, 23523.00
224, intelmpi, 400, 128000000, 1.66, 2670.00, 4174.00, 5483.00, 554.02, 9.33, 29370.88
224, intelmpi, 200, 256000000, 1.29, 4679.00, 6518.00, 8855.00, 1139.26, 5.18, 37770.27
224, intelmpi, 128, 512000000, 1.78, 10355.00, 14620.00, 16597.00, 1941.80, 4.41, 35020.32
224, intelmpi, 128, 1024000000, 3.82, 25534.00, 30075.00, 31856.00, 1156.03, 6.62, 32690.44
Unbalanced: SuperMIC: Host: 16p x 2t, MICs: 2 x 6p x 40t
$ subgenmiccmd ~/jobs/supermic/symmetric/bench.08.min-element.tpl <num nodes> 16 2 6 40
1 Node: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
28, intelmpi, 3200, 18000000, 5.86, 1688.00, 1740.00, 224097.00, 3930.12, 6.74, 9378.87
28, intelmpi, 1600, 36000000, 5.38, 3297.00, 3337.00, 3794.00, 59.24, 5.90, 10216.01
28, intelmpi, 800, 72000000, 5.16, 6359.00, 6428.00, 7090.00, 63.97, 5.51, 10653.75
28, intelmpi, 400, 144000000, 5.04, 12488.00, 12585.00, 12964.00, 73.56, 5.39, 10899.55
28, intelmpi, 200, 288000000, 4.98, 24751.00, 24890.00, 25236.00, 90.36, 5.49, 11027.44
28, intelmpi, 128, 576000000, 6.33, 49262.00, 49425.00, 49955.00, 123.48, 7.25, 11111.81
28, intelmpi, 128, 1152000000, 12.61, 98135.00, 98462.00, 105764.00, 667.84, 14.45, 11148.26
28, intelmpi, 128, 2304000000, 25.15, 196080.00, 196440.00, 197094.00, 226.27, 28.68, 11184.35
2 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
56, intelmpi, 3200, 18000000, 3.40, 914.00, 999.00, 130060.00, 2283.87, 7.38, 16160.90
56, intelmpi, 1600, 36000000, 2.95, 1726.00, 1848.00, 2254.00, 66.34, 5.07, 18615.68
56, intelmpi, 800, 72000000, 2.72, 3336.00, 3387.00, 3794.00, 55.63, 3.87, 20168.75
56, intelmpi, 400, 144000000, 2.60, 6408.00, 6477.00, 6961.00, 67.52, 3.26, 21150.67
56, intelmpi, 200, 288000000, 2.53, 12540.00, 12645.00, 13075.00, 75.58, 3.04, 21701.84
56, intelmpi, 128, 576000000, 3.20, 24829.00, 24971.00, 25684.00, 110.04, 3.83, 21984.29
56, intelmpi, 128, 1152000000, 6.35, 49279.00, 49508.00, 60223.00, 951.65, 7.44, 22138.23
56, intelmpi, 128, 2304000000, 12.61, 98207.00, 98501.00, 99079.00, 153.29, 14.57, 22302.89
4 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
112, intelmpi, 3200, 18000000, 2.57, 607.00, 691.00, 186093.00, 3278.42, 14.40, 21349.23
112, intelmpi, 1600, 36000000, 1.83, 995.00, 1116.00, 3213.00, 120.65, 7.62, 29989.68
112, intelmpi, 800, 72000000, 1.58, 1831.00, 1962.00, 3185.00, 88.00, 4.57, 34720.85
112, intelmpi, 400, 144000000, 1.41, 3386.00, 3518.00, 4094.00, 78.54, 2.96, 38878.40
112, intelmpi, 200, 288000000, 1.32, 6507.00, 6592.00, 7101.00, 85.11, 2.19, 41524.85
112, intelmpi, 128, 576000000, 1.64, 12645.00, 12760.00, 13215.00, 102.19, 2.35, 42965.25
112, intelmpi, 128, 1152000000, 3.21, 24887.00, 25053.00, 25626.00, 147.65, 4.18, 43778.33
112, intelmpi, 128, 2304000000, 6.36, 49405.00, 49637.00, 50156.00, 144.99, 7.79, 44241.80
8 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
224, intelmpi, 3200, 18000000, 4.42, 576.00, 1335.00, 600904.00, 10606.40, 66.08, 12421.61
224, intelmpi, 1600, 36000000, 2.02, 752.00, 1414.00, 4594.00, 307.05, 32.59, 27149.38
224, intelmpi, 800, 72000000, 1.29, 1179.00, 1601.00, 2592.00, 330.16, 16.57, 42495.30
224, intelmpi, 400, 144000000, 1.00, 1994.00, 2679.00, 3349.00, 336.83, 8.66, 55151.75
224, intelmpi, 200, 288000000, 0.84, 3579.00, 4402.00, 5099.00, 379.67, 4.71, 65173.53
224, intelmpi, 128, 576000000, 0.93, 6714.00, 7450.00, 8069.00, 360.12, 3.48, 75700.07
224, intelmpi, 128, 1152000000, 1.72, 12825.00, 13635.00, 14178.00, 385.89, 4.40, 81856.38
224, intelmpi, 128, 2304000000, 3.29, 25060.00, 25858.00, 27102.00, 378.56, 6.19, 85505.06
16 Nodes: (Full output)
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
448, intelmpi, 800, 18000000, 2.03, 1007.00, 1643.00, 638062.00, 22486.95, 70.62, 6763.99
448, intelmpi, 400, 36000000, 0.68, 1104.00, 1659.00, 4345.00, 342.27, 33.98, 20115.94
448, intelmpi, 200, 72000000, 0.38, 1324.00, 1874.00, 3141.00, 270.82, 17.05, 35925.76
448, intelmpi, 128, 144000000, 0.28, 1617.00, 2139.00, 4740.00, 364.55, 11.00, 62307.93
448, intelmpi, 128, 288000000, 0.40, 2532.00, 3072.00, 5466.00, 365.13, 11.10, 87572.52
448, intelmpi, 128, 576000000, 0.60, 3905.00, 4553.00, 7642.00, 473.67, 11.33, 117611.10
448, intelmpi, 128, 1152000000, 1.00, 6835.00, 7677.00, 13279.00, 785.81, 11.79, 140227.46
448, intelmpi, 128, 2304000000, 1.79, 12966.00, 13809.00, 20651.00, 753.61, 12.71, 157445.45
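Because the largest array size is the same for every node count, the tables above allow a rough estimate of strong-scaling efficiency. The sketch below uses the `mkeys/s` values of the last row (size 2304000000) of each table in this configuration; the function name `parallel_efficiency` is hypothetical, and efficiency is taken as throughput on n nodes divided by n times the single-node throughput.

```python
# mkeys/s at size 2304000000, copied from the tables above
# (SuperMIC, host 16p x 2t, MICs 2 x 6p x 40t), keyed by node count.
throughput_by_nodes = {
    1: 11184.35,
    2: 22302.89,
    4: 44241.80,
    8: 85505.06,
    16: 157445.45,
}


def parallel_efficiency(throughput):
    """Return throughput(n) / (n * throughput(1)) for every node count n."""
    base = throughput[1]
    return {n: t / (n * base) for n, t in throughput.items()}


if __name__ == "__main__":
    for nodes, eff in parallel_efficiency(throughput_by_nodes).items():
        print(f"{nodes:2d} nodes: {eff:.2f}")
```

For these measurements the efficiency stays above 0.95 up to 8 nodes and drops to roughly 0.88 on 16 nodes.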
Balanced: SuperMIC: Host: 16p x 2t, MICs: 2 x 6p x 40t
1 Node:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
28, intelmpi, 3200, 18000000, 5.38, 1574.00, 1617.00, 81116.00, 1408.07, 6.31, 10213.16
28, intelmpi, 1600, 36000000, 5.00, 3053.00, 3096.00, 4738.00, 84.34, 5.55, 10985.86
28, intelmpi, 800, 72000000, 4.75, 5859.00, 5923.00, 6396.00, 59.87, 5.12, 11565.41
28, intelmpi, 400, 144000000, 4.63, 11493.00, 11560.00, 12083.00, 76.94, 5.00, 11860.26
28, intelmpi, 200, 288000000, 4.57, 22727.00, 22822.00, 23405.00, 87.34, 5.10, 12026.77
28, intelmpi, 128, 576000000, 5.81, 45201.00, 45371.00, 45742.00, 115.39, 6.81, 12102.61
28, intelmpi, 128, 1152000000, 11.58, 90233.00, 90465.00, 91147.00, 144.58, 13.51, 12140.26
28, intelmpi, 128, 2304000000, 23.13, 180253.00, 180664.00, 181519.00, 234.93, 26.96, 12160.96
2 Nodes:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
56, intelmpi, 3200, 18000000, 3.32, 822.00, 967.00, 89915.00, 1574.87, 7.31, 16557.37
56, intelmpi, 1600, 36000000, 2.78, 1600.00, 1690.00, 3058.00, 122.47, 4.91, 19771.57
56, intelmpi, 800, 72000000, 2.50, 3025.00, 3099.00, 5022.00, 104.53, 3.65, 21948.74
56, intelmpi, 400, 144000000, 2.37, 5838.00, 5898.00, 6581.00, 67.82, 3.04, 23225.73
56, intelmpi, 200, 288000000, 2.31, 11382.00, 11519.00, 11901.00, 67.51, 2.83, 23804.83
56, intelmpi, 128, 576000000, 2.92, 22641.00, 22772.00, 23399.00, 128.91, 3.58, 24081.48
56, intelmpi, 128, 1152000000, 5.80, 45092.00, 45247.00, 45883.00, 132.91, 6.94, 24259.46
56, intelmpi, 128, 2304000000, 11.56, 90039.00, 90324.00, 90789.00, 130.59, 13.73, 24327.70
4 Nodes:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
112, intelmpi, 3200, 18000000, 3.19, 671.00, 755.00, 554539.00, 9802.77, 15.34, 17240.91
112, intelmpi, 1600, 36000000, 1.88, 1018.00, 1113.00, 1782.00, 123.08, 7.87, 29165.82
112, intelmpi, 800, 72000000, 1.49, 1704.00, 1813.00, 2869.00, 124.13, 4.55, 36866.03
112, intelmpi, 400, 144000000, 1.31, 3175.00, 3263.00, 3775.00, 76.16, 2.90, 41834.10
112, intelmpi, 200, 288000000, 1.22, 5988.00, 6070.00, 6413.00, 81.05, 2.12, 45072.63
112, intelmpi, 128, 576000000, 1.51, 11619.00, 11714.00, 16103.00, 564.01, 2.26, 46445.81
112, intelmpi, 128, 1152000000, 2.95, 22883.00, 22984.00, 23649.00, 109.95, 3.94, 47729.03
112, intelmpi, 128, 2304000000, 5.83, 45341.00, 45514.00, 46016.00, 125.44, 7.30, 48232.91
8 Nodes:
units, mpi.impl, repeats, size, time.s, time.min.us, time.med.us, time.max.us, time.sdv.us, total.s, mkeys/s
224, intelmpi, 3200, 18000000, 4.54, 637.00, 1331.00, 621427.00, 10968.71, 66.76, 12091.51
224, intelmpi, 1600, 36000000, 2.12, 871.00, 1158.00, 5575.00, 317.87, 32.97, 25877.84
224, intelmpi, 800, 72000000, 1.41, 1253.00, 1689.00, 2872.00, 303.87, 16.78, 38998.22
224, intelmpi, 400, 144000000, 1.06, 2106.00, 2796.00, 3475.00, 317.98, 8.77, 51741.61
224, intelmpi, 200, 288000000, 0.89, 3703.00, 4545.00, 6853.00, 343.09, 4.77, 61429.47
224, intelmpi, 128, 576000000, 0.91, 6135.00, 6927.00, 10256.00, 681.82, 3.49, 77069.83
224, intelmpi, 128, 1152000000, 1.71, 11762.00, 13567.00, 16112.00, 1026.50, 4.40, 82360.67
224, intelmpi, 128, 2304000000, 3.19, 22969.00, 23972.00, 30356.00, 1666.24, 6.14, 88227.13
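The effect of load balancing can be quantified by comparing the balanced and unbalanced runs of the same configuration (host 16p x 2t, MICs 2 x 6p x 40t). The sketch below uses the `mkeys/s` values of the rows with size 2304000000 from the tables above, keyed by number of units; the function name `relative_gain` is hypothetical.

```python
# mkeys/s at size 2304000000, copied from the tables above, keyed by unit count.
unbalanced = {28: 11184.35, 56: 22302.89, 112: 44241.80, 224: 85505.06}
balanced = {28: 12160.96, 56: 24327.70, 112: 48232.91, 224: 88227.13}


def relative_gain(balanced_tp, unbalanced_tp):
    """Return the relative throughput gain of the balanced run per unit count."""
    return {units: balanced_tp[units] / unbalanced_tp[units] - 1.0
            for units in unbalanced_tp}


if __name__ == "__main__":
    for units, gain in relative_gain(balanced, unbalanced).items():
        print(f"{units:3d} units: {gain * 100:+.1f} %")
```

For these rows the balanced runs reach roughly 9 % higher throughput on 1, 2 and 4 nodes, and about 3 % on 8 nodes.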