I agree "deep learning" (also called "AI," "neural networks," and other phrases that all mean the same thing these days...) is a sexy thing. You're right that Manifold's mastery of GPU, plus the ability to handle big images, means a lot of the necessary infrastructure is in place.
But the article does not go into the practical reality of what it takes to use deep learning, which is why very few Manifold people would actually use it.
The key takeaway from that article is this:
At the highest level, deep learning, which is a type of machine learning, is a process where the user creates training samples, for example by drawing polygons over rooftops, and the computer model learns from these training samples and scans the rest of the image to identify similar features.
But the above phrase doesn't tell you how many training samples the user has to provide before the capability is useful. A typical model (a model being what you create by providing training samples) requires tens of thousands of samples, and many models require hundreds of thousands.
Say you want to create a model that finds swimming pools in aerial photos. First, you need many thousands of aerial photos of swimming pools and then you need somebody who will sit there marking off in each of those photos what's a swimming pool. It's not like you do only five or ten photos and then the machine takes over. Training is a very costly investment of human labor.
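To put rough numbers on that labor, here's a back-of-the-envelope sketch in Python. The sample count and the seconds-per-polygon figure are illustrative assumptions, not measurements, but they show why labeling doesn't stop at five or ten photos:

```python
# Back-of-the-envelope labeling cost. All figures are illustrative assumptions.
def labeling_effort(num_samples, seconds_per_sample, hours_per_day=8):
    """Return (total_hours, person_days) to hand-label a training set."""
    total_hours = num_samples * seconds_per_sample / 3600
    return total_hours, total_hours / hours_per_day

# Assume 50,000 training polygons at ~30 seconds each to find and digitize.
hours, days = labeling_effort(50_000, 30)
print(f"{hours:.0f} hours, about {days:.0f} eight-hour days")
# 417 hours, about 52 days
```

Even with generous assumptions that's months of one person's full-time work before the model has seen a single useful training set.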
The reality is that in most cases it's quicker and cheaper to simply digitize manually. If you're some very big organization that needs to find swimming pools for tens of thousands of different cities and counties around the world, well, then maybe it would be cost effective for you to train up a model that could then use deep learning to automate some of the work.
As you can see from the above, deep learning is one of those things that works great as a marketing pitch so long as the practical limitations on using it aren't considered. ESRI's pitching it hard because it's an easy way for them to be able to say "Hey, we're not idiots about GPU, we have deep learning!" - even though only about one in a thousand of their customers can make use of it.
From an implementation perspective, deep learning is much easier than general purpose parallelism on GPU. Why? Because deep learning is basically the same thing for all applications: you put together a neural net, in most cases the very same neural net software, and you train it. The difference between looking for swimming pools and looking for roadways or rooftops is just the training material you feed it. Easier still, NVIDIA hands out free libraries that do the neural net for you on the GPUs. There is basically no GPU code to write, which is why an organization with such poor GPU skills as ESRI can do it.
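The point that only the training material differs can be seen in a toy sketch: the very same "net" and the very same training loop, fed two different labeled data sets, learn two different tasks. This is a plain-Python illustration of the idea, not how a real GPU framework is built:

```python
def train(samples, epochs=200, lr=0.1):
    """Train a one-neuron toy 'net' by gradient descent on (features, label)
    pairs. This code is identical for every task; only `samples` changes."""
    w = [0.0] * len(samples[0][0])  # weights
    b = 0.0                          # bias
    for _ in range(epochs):
        for x, y in samples:
            pred = b + sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(model, x):
    w, b = model
    return b + sum(wi * xi for wi, xi in zip(w, x))

# Two "tasks" distinguished purely by their training data:
pools = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]  # feature 0 means pool
roads = [([1.0, 0.0], 0.0), ([0.0, 1.0], 1.0)]  # feature 1 means road

pool_model = train(pools)
road_model = train(roads)
print(round(predict(pool_model, [1.0, 0.0])))  # 1
print(round(predict(road_model, [1.0, 0.0])))  # 0
```

Swap in swimming-pool labels, get a pool finder; swap in roadway labels, get a road finder. The code never changes, which is exactly why off-the-shelf neural net libraries cover so many applications.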
In contrast, every use of GPU for parallel processing is different: it's a different parallel algorithm with different math to process (say, filters as compared to cosines), and you have to write that code in each case. That is much harder to do, which is why ESRI to this day still has only three simple raster functions that are GPU parallel.
It's true Manifold has most of the infrastructure required to support neural networks applied to raster processing. But Manifold doesn't have all of it. The missing parts are a variety of user interfaces to enable training and to manage saving, loading, and use of trained models. There is also some very straightforward work to manage installing and using the machine learning modules that NVIDIA provides for its GPUs, and to do that in a way that doesn't turn Manifold and Viewer installation packages into utter bloatware. All of that is easy - no rocket science involved - but there's a fair amount of it.
But when you do that, what do you have? A collection of features for somebody who has the time and resources to acquire tens or hundreds of thousands of images and then hand-process each of them to teach the model what's desired. That's not something most people in the Manifold user community have the resources to do.
And, even for those who have the resources, it takes a really huge investment in training labor to get results that are worthwhile. A good example is in this topic, which uses a building footprints data set created by Microsoft using a neural network (that is, a deep learning process). Despite Microsoft's effectively infinite resources, the result of its deep learning process is really awful and inaccurate, as examples in that topic show.
There are cases where deep learning can help despite inaccuracies. Finding swimming pools is one of those cases, because it allows tax assessors to hunt for undeclared pools using the output of a well-trained model as a first cut filter of possible pools to check. But getting all the training done is a big deal even for organizations as large as, say, the Los Angeles County tax office.
One shortcut is to provide the capability to load pre-trained models that other organizations have created and sell. I think that's where ESRI is going with this, since it's clear that while few people have the resources to train models, many more would like to take advantage of somebody else's training investment. But that, too, is an expensive, niche interest, not a broad interest like many other things within the Manifold user community.
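Mechanically, shipping a pre-trained model is simple: it's just the learned parameters serialized to a file that the buyer loads and runs, skipping training entirely. A minimal sketch (the file format, field names, and parameter values here are made up for illustration):

```python
import json

def save_model(weights, bias, path):
    """Serialize trained parameters so another user can skip training."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)

def load_model(path):
    """Load parameters someone else paid the training labor to produce."""
    with open(path) as f:
        m = json.load(f)
    return m["weights"], m["bias"]

# Seller trains once and ships the file; buyer only loads and predicts.
save_model([0.8, -0.3], 0.1, "pool_detector.json")
w, b = load_model("pool_detector.json")
x = [1.0, 0.0]  # feature vector for one image tile (illustrative)
score = b + sum(wi * xi for wi, xi in zip(w, x))
print(round(score, 3))  # 0.9
```

The engineering is trivial; the value is entirely in the labeled training data behind the weights, which is why pre-trained models are something organizations can sell.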
I don't doubt this is something Manifold will do eventually, as it's a cool thing and, you're right, given Manifold's ability to handle GPU well and to handle big rasters well, it's an intellectually appealing fit. But first Manifold has to focus on a variety of pending goals of broader interest. :-)