The pointer to NUMA architectures was an eye-opener.
One task for current builds is to run on a NUMA processor without many threads causing a slowdown. I know that doesn't sound particularly ambitious, but it is a good first step, with optimizations for NUMA still ahead.
The video is mainly useful as a technical exploration of using many threads in a mixed-performance environment where some threads have better RAM bandwidth and where disk access is a factor as well. That's a useful model for exploring what happens when you distribute parallel tasks across multiple machines: varying RAM bandwidth in the Threadripper and disk access latency are useful proxies for varying machine performance and varying network latencies in local networks or clouds. There is a lot more work to be done in this area. But at least for now it provides a better setting for throwing many threads at those GIS tasks where, all other things (disk access, etc.) being equal, more threads can help.
You can see the effect of faster disk access by running the same job on a Ryzen 9 3900x. The system in the video uses a pretty old X399 motherboard with a SATA III, 500 GB Samsung EVO SSD that provides a maximum of about 500 MB/s read speed for sequential reads... not anywhere near as fast as a more modern M.2 SSD.
To compare this to a Ryzen 9 3900x with faster data access, I first re-ran the calculation on the Threadripper using the latest build, 170.2, which was published after that video. The 170.2 build has an improved raster shortest path algorithm, which resolves plateaus (there are many in that data set due to the lakes/reservoirs in the Montara region) faster and better. The timings, run three times to average out Windows cache effects:
AMD Threadripper 2970WX 48 threads X399 motherboard
(Samsung EVO SSD SATA III 550 MB/s read)
2020-01-01 09:53:33 -- Transform (Watershed Areas): [Montara] (18.391 sec)
2020-01-01 09:54:58 -- Transform (Watershed Areas): [Montara] (18.516 sec)
2020-01-01 09:55:52 -- Transform (Watershed Areas): [Montara] (18.406 sec)
Compare that to a system that has ten times faster data access (PCIe 4.0 M.2 SSD), using a faster CPU with uniform (non-NUMA) memory access:
AMD Ryzen 9 3900x 24 threads X570 motherboard
(PCIe 4.0, M.2 NVMe SSD, Sabrent Rocket 5000 MB/s read)
2020-01-01 09:57:47 -- Transform (Watershed Areas): [Montara] (11.006 sec)
2020-01-01 09:58:35 -- Transform (Watershed Areas): [Montara] (10.753 sec)
2020-01-01 09:59:09 -- Transform (Watershed Areas): [Montara] (10.596 sec)
The Ryzen 9 test ran at 91% to 99% overall CPU utilization.
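To put a single number on each set of three runs, here is a minimal sketch that pulls the elapsed seconds out of the log lines above and averages them. The log text is copied from the timings in this post; the helper name run_times is just something I made up for illustration.

```python
import re
from statistics import mean

# Log lines copied verbatim from the two sets of runs in this post.
THREADRIPPER_LOG = """\
2020-01-01 09:53:33 -- Transform (Watershed Areas): [Montara] (18.391 sec)
2020-01-01 09:54:58 -- Transform (Watershed Areas): [Montara] (18.516 sec)
2020-01-01 09:55:52 -- Transform (Watershed Areas): [Montara] (18.406 sec)
"""

RYZEN_LOG = """\
2020-01-01 09:57:47 -- Transform (Watershed Areas): [Montara] (11.006 sec)
2020-01-01 09:58:35 -- Transform (Watershed Areas): [Montara] (10.753 sec)
2020-01-01 09:59:09 -- Transform (Watershed Areas): [Montara] (10.596 sec)
"""


def run_times(log):
    """Return the elapsed seconds found in each '(N.NNN sec)' suffix (hypothetical helper)."""
    return [float(s) for s in re.findall(r"\((\d+\.\d+) sec\)", log)]


tr_mean = mean(run_times(THREADRIPPER_LOG))
rz_mean = mean(run_times(RYZEN_LOG))

print(f"Threadripper 2970WX mean: {tr_mean:.3f} sec")
print(f"Ryzen 9 3900x mean:       {rz_mean:.3f} sec")
print(f"Threadripper / Ryzen:     {tr_mean / rz_mean:.2f}x")
```

That works out to roughly 18.4 seconds versus 10.8 seconds, so the Ryzen 9 system finishes the job in a bit under 60% of the time. Three runs each is a small sample, so treat the ratio as a rough indication only.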
Despite having only half as many threads, the system with ten times faster data storage does the job in 11 seconds or less as compared to about 18.5 seconds. Storage is not the only difference, of course: the Ryzen 9 has a base clock about 22% faster than the older Threadripper, it has about 35% faster main memory in the X570 motherboard (DDR4-3600) compared to the older X399 (DDR4-2666), and it has uniform memory architecture, compared to the NUMA architecture of the older Threadripper where half the cores cannot reach memory directly but must go through another chiplet. All of those contribute to the faster time along with the faster SSD.
The data for the above can be downloaded from: http://www.manifoldgis.com/files/Montara_SRTM3.map (288,192 KB)
The 2970WX Threadripper in the video at $900 is a pretty good deal for 48 threads. The equivalent 3rd gen 48 thread Threadripper is out of stock, but typically priced at around $1500 to $1700.
Once they become available and prices come down, a Ryzen 9 3950x with 32 threads will be great, but for now the sweet spot I think is a 24 thread Ryzen 9 3900x for around $490. Add a 1 TB Sabrent Rocket PCIe 4.0 M.2 SSD for around $120 and you get many fast threads with 5000 MB/s "disk" data access.
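For a rough sense of price per thread, here is a small sketch using only the CPU prices and thread counts quoted in this post (prices will drift, and this ignores the cost of the motherboard, RAM and SSD). The dictionary names are mine, just for illustration.

```python
# (price in USD, thread count) as quoted in this post; CPU only.
systems = {
    "Threadripper 2970WX": (900, 48),
    "Ryzen 9 3900x": (490, 24),
}

# Cost per thread is simply price divided by thread count.
cost_per_thread = {
    name: price / threads for name, (price, threads) in systems.items()
}

for name, cost in cost_per_thread.items():
    print(f"{name}: ${cost:.2f} per thread")
```

Per thread the two come out fairly close (a bit under $19 versus a bit over $20), which is why the timings above matter: cost per thread says nothing about how fast each thread gets its data.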
There are obviously many moving parts in all this, but in general I think it's wonderful we're entering 2020 with opportunities to get faster/better pretty much everywhere you look.
Manycore processors are getting cheaper and more available, 1 TB super-fast 4th gen M.2 SSDs are getting absurdly cheap, NVIDIA keeps churning out better and better GPUs at a lower price per core, and there is steady progress within Manifold for using more CPU cores as well as more GPU cores.
There's also synergy, where progress in one area (like the new raster shortest path work) helps in other areas (faster plateau reckoning with watersheds). Connecting up the dots so GPU is used in mixed vector / raster applications like watersheds, visibility zones, etc., will also be helpful. It's nice to see that the original investment into 9 as a platform that facilitates reliable expansion and upgraded function is paying off.