In this thread of Volker's I mentioned in brackets that (BTW in 9 this step [image reclassification] could be done all on GPU, using multiplication and addition to avoid a conditional.)
I thought I should say what I meant, now that I have tried doing this and having got useful results. I am using Volker's task from that thread as the example. That is, to reclassify an image showing aspect into two classes: where aspect is from -67 to 113° (through ±180° or south), class 1; and where aspect is from -67 to 113° (through 0° or north) , class 2. (I treat the boundaries slightly differently from Volker, using >= -68 and < 113 for class 1, and < -68 or >= 113 for class 2, dividing the compass exactly in half.) The data I am using is an 8m DEM of the North Island of New Zealand, 73728 x 106496 pixels, single-precision floating-point (FLOAT32).
(1) First, how would we do this in Manifold 8? We have two language interfaces: Spatial SQL and Surface Tools. In SQL we might do this (remembering that Height (I) in this case encodes aspect): --SQL UPDATE [Surface] SET [Height (I)] = CASE WHEN [Height (I)] >= -67 AND [Height (I)] < 133 THEN 1 WHEN [Height (I)] < -67 OR [Height (I)] >= 133 THEN 2 END; We don't really need the second condition, since if the first condition fails then the second follows necessarily. I include both for the sake of the example. This query will execute entirely on CPU, since Spatial SQL is not GPU-enabled in Manifold 8. Using Surface Tools, we can do it like this: IIf([Surface] >= -67, 1, IIf([Surface] < 133, 1, IIf([Surface] < -67, 2, IIf([Surface] >= 133, 2, Null )))) Many Surface Transform functions in Manifold 8 are GPU-enabled. However, a conditional expression is always evaluated on CPU. That is because, while conditional statements can be evaluated on GPU (using CUDA C), there is no point in doing so: the necessary branch synchronization negates any advantage to be gained from GPU parallelism. CPUs are simply better at branching. (As you'll see, I'm begging a question here. Why not use the approach used below for Manifold 9 in Manifold 8 as well? We'll come back to that at the end.)
(2) Now, what are our options in Manifold 9? In 9, images are made of tiles, and tiles are made of pixels. So one option would be to unpack all the tiles in the image, using the TileToValues function, then perform the classification on the pixels using SQL (much as in the example for Manifold 8), then rebuild all the pixels back into tiles. That would work, but it would be a bad idea. First because the unpacking and repacking of tiles is wasteful. Secondly because none of the three processing stages can benefit from parallelism: the unpacking and repacking are just data transport, and classification using a conditional will be kept on CPU. Ideally in Manifold 9, we would like to operate on tiles per se, without unpacking. OK, but how do we do that, if the objective is classification? We can't use the ordinary <, <=, >, >= or BETWEEN comparison operators on tile data. If we could, what would it mean? Would we mean tiles or pixels here? Well, on the other hand, we can use basic arithmetic operations (+, -, *, /, DIV, MOD, ^) on tile data, plus bitwise operators, and a large family of special Tile* functions, including powers, roots and logs, trigonometry, rounding and other functions. In all of these cases it is clear that we do mean to operate on pixel values--without unpacking their tiles. I think the reason these functions work for tiles but the standard comparision operators do not, is again that comparison operators involve branching, or conditional statements, which are a poor fit for tile-by-tile operations just as they are a poor fit for GPGPU. However, amongst the Tile* functions we also have some special comparision operators, that do not branch, but simply store (or pass on) the result of their comparison. These functions are TileCompare, TileMin, TileMax, and TileSign. They can be nested, and in combination we can use them to build non-branching equivalents to the comparison operators <, <=, >, >= which we need for image classification. OK, how? To cut a long story short, like this. To test whether x < y or x > y (strictly less or greater than), we can take the difference x - y, compare it to zero, and check the sign. iff x < y then -Sign(Min(x - y, 0)) = 1 (else 0) iff x > y then Sign(Max(x - y, 0)) = 1 (else 0) To test whether x <= y or x >= y, we take the opposite difference y - x, again compare to zero, and check the complement of the sign. iff x <= y then 1 - Sign(Min(y - x, 0)) = 1 (else 0) iff x >= y 1 - Sign(Max(y - x, 0)) = 1 (else 0) Since the results of these four tests are binary (either 0 or 1), we can multiply and add them together to produce compound tests, still without using a conditional expression or introducing branching.
So now we can rewrite the tests used in Manifold 8 for Manifold 9, operating on whole tiles rather than pixels. Moreover, because we gave got rid of the conditional statements, we can throw the whole image worth of tiles at the GPU, for efficient massively parallel processing. That's pretty good! Here is a query to do Volker's image classification in Manifold 9. We only need the expressions for < and >= in this case. --SQL9 VALUE @target_table TABLE = [Aspect]; VALUE @target_image TABLE = [Aspect image]; PRAGMA ( 'gpgpu' = 'auto', --'gpgpu' = 'aggressive', --'gpgpu' = 'none', 'gpgpu.fp' = '32' ); UPDATE ( SELECT [mfd_id], [Tile], CASTV( -- when >= -68 and < 113 1 * (1 - TileSign(TileMax(-68 - [Tile], 0))) -- 1 if [Tile] >= -68 else 0 * -TileSign(TileMin([Tile] - 113, 0)) -- 1 if [Tile] < 113 else 0 + -- when < -68 or >= 113 2 * ( -TileSign(TileMin([Tile] - -68, 0)) -- 1 if [Tile] < -68 else 0 + (1 - TileSign(TileMax(113 - [Tile], 0))) -- 1 if [Tile] >= 113 else 0 ) AS FLOAT32 ) AS [Tile'] FROM @target_table THREADS SystemCpuCount() ) SET [Tile] = [Tile'] ; TABLE CALL TileUpdatePyramids(@target_image) ; Note the nested SELECT in the UPDATE statement. We could more simply write --SQL9 UPDATE @target_table SET [Tile] = CASTV(...) However, that would be suboptimal, because the UPDATE phase of the query will only make effective use of a single thread (since it is mainly transport). Putting the main work into a subquery with a THREADS directive allows that part of the query to utilize multiple CPU threads (which are used to feed GPU parallelism). I'll come back soon with some timings (and a note about the question I was begging).
|