OK the timings! Both tests using GeForce GTX TITAN GPU, Kepler generation, compute capability 3.5, 14 SMX, 6 GB graphics RAM; installed on an ASUS Z97-based system, Intel i7-4790K 4 GHz, 32 GB system RAM. Windows pagefile is with OS on SSD1 (256 GB). TEMP and the source .map project are both on SSD2 (480 GB). Other (spinning) drives are not used in the tests.
I had Task Manager set up like this (this is during data copying, prior to testing--no GPGPU load yet).
Here is the test query, as written by the Curvature, Mean transform template, with one manual adjustment: the number of threads is reduced from SystemCpuCount() (in this case 8) to 6 [and with the 'FieldCoordSystem.Tile' string truncated to '...'].
CREATE TABLE [USGS_NED_Fused_DEM Curvature, Mean] (
  [mfd_id] INT64,
  [X] INT32,
  [Y] INT32,
  [Tile] TILE,
  INDEX [mfd_id_x] BTREE ([mfd_id]),
  INDEX [X_Y_Tile_x] RTREE ([X], [Y], [Tile] TILESIZE (128, 128) TILETYPE FLOAT32),
  PROPERTY 'FieldCoordSystem.Tile' '...',
  PROPERTY 'FieldTileSize.Tile' '[ 128, 128 ]',
  PROPERTY 'FieldTileType.Tile' 'float32'
);
CREATE IMAGE [USGS_NED_Fused_DEM Curvature, Mean Image] (
  PROPERTY 'Table' '[USGS_NED_Fused_DEM Curvature, Mean]',
  PROPERTY 'FieldTile' 'Tile',
  PROPERTY 'FieldX' 'X',
  PROPERTY 'FieldY' 'Y',
  PROPERTY 'Rect' '[ 0, 0, 327603, 115217 ]'
);
PRAGMA ('progress.percentnext' = '100');
VALUE @scales FLOAT64X3 = ComponentCoordSystemScaleXYZ([USGS_NED_Fused_DEM]);
INSERT INTO [USGS_NED_Fused_DEM Curvature, Mean] (
  [X], [Y], [Tile]
) SELECT
  [X], [Y],
  CASTV ((TileRemoveBorder(TileCurvMean(TileCutBorder([USGS_NED_Fused_DEM], VectorMakeX2([X], [Y]), 3), 3, @scales), 3)) AS FLOAT32)
FROM [USGS_NED_Fused_DEM]
THREADS 6;
TABLE CALL TileUpdatePyramids([USGS_NED_Fused_DEM Curvature, Mean Image]);
(1) The first test is using all 896 64-bit CUDA cores (ratio 1/3), with the GPU set up like this. (See post above for an explanation.)
GPGPU usage went straight to 98-99%. CPU usage went to 75% (meaning that 6 of 8 virtual cores were fully saturated; all 6 allocated threads were fully supplied with work). Those figures both stayed close to constant throughout the test, until the TileUpdatePyramids phase at the end, when GPGPU went to 0% and CPU usage dropped to about 28%.
Graphics RAM usage was constant at 6 GB (that is, all of it). System RAM usage was initially about 6 GB (pinned/shared?), rising to 30 GB during the TileUpdatePyramids phase.
SSD2 was normally about 24% busy, rising to 75% during the TileUpdatePyramids phase. The project used a maximum of 151 GB of TEMP space. There was ~0% activity on SSD1 (so no significant pagefile use).
Here is a typical progress shot.
And during the TileUpdatePyramids phase.
Total processing time using all 896 cores was 2566.518 sec, 42mn 47s.
(2) The second test used 112 64-bit cores (ratio 1/24), otherwise the same. The statements about utilization for the first test also apply with no substantial differences.
Total processing time using 112 cores was 3076.059 sec, 51mn 16s.
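For anyone wondering where 896 and 112 come from, here is a quick Python sketch of the arithmetic. It assumes the usual Kepler layout of 192 single-precision CUDA cores per SMX, which is not stated above; the 14 SMX and the two ratios are taken from the tests.

```python
from fractions import Fraction

SMX_COUNT = 14            # from the card description above
FP32_CORES_PER_SMX = 192  # assumption: standard Kepler SMX layout

def fp64_cores(ratio: Fraction) -> int:
    """64-bit cores available at a given FP64:FP32 ratio."""
    total_fp32 = SMX_COUNT * FP32_CORES_PER_SMX
    return int(total_fp32 * ratio)

print(fp64_cores(Fraction(1, 3)))   # ratio used in test 1
print(fp64_cores(Fraction(1, 24)))  # ratio used in test 2
```

Using Fraction keeps the division exact, so there is no rounding to worry about.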
This compares with Stan's total time of 8839.753 sec, 2h 27mn, using 112 64-bit cores (if both cards were used) or 56 cores (if not).
So my first test was 3.44x faster than Stan's test, the second 2.87x faster.
Of my two tests, the first (896 cores) was 1.20x faster than the second (112 cores).
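The ratios above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the three timings are the ones quoted in this thread, while the dictionary keys and helper names are made up for illustration.

```python
# Total processing times in seconds, as reported in this thread.
timings = {
    "titan_896": 2566.518,  # test 1, ratio 1/3
    "titan_112": 3076.059,  # test 2, ratio 1/24
    "stan": 8839.753,       # Stan's run
}

def speedup(slow: str, fast: str) -> float:
    """How many times faster the 'fast' run was than the 'slow' run."""
    return timings[slow] / timings[fast]

def hms(seconds: float) -> str:
    """Format seconds as hours, minutes, seconds (rounded to whole seconds)."""
    total = round(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m:02d}mn {s:02d}s"

for name, secs in timings.items():
    print(name, hms(secs))

print(round(speedup("stan", "titan_896"), 2))
print(round(speedup("stan", "titan_112"), 2))
print(round(speedup("titan_112", "titan_896"), 2))
```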
Tentative conclusions to come in a separate post.
One more thing: I did set up a test on this laptop. Again GPGPU usage went straight to ~98%, and CPU use was pegged at ~75%. Only, the test turned the laptop into a hairdryer, and I cancelled the test after a few minutes rather than continue to torture it.
I will run a separate test on a GeForce GTX 1060 in the next few days.
GTX TITAN settings.png
Task Manager GPGPU.png