Tim laid most things out already. In short, all of this is expected behavior, although we could perhaps make a couple of things better.
A couple of notes:
Images in the MXB have different pixel sizes. The merge dialog has to select a single pixel size, because it produces a single image. Initially the merge dialog uses the same coordinate system as the map it opens on, but that can be changed. The dialog displays the current pixel size as well as the estimated number of pixels in the resulting image above the list, so it makes sense to check both prior to performing the merge.
If we open Map and start the merge dialog on it (View - Merge - Merge Images), the initial coordinate system is that of the map, and the pixel size is 1 x 1 m. That is too coarse, and the dialog shows it: the estimated size of the resulting image is only 40 x 40 pixels. The dialog also shows that both images are going to be projected during the merge, and that the projection is going to be more than just shift / scale. This is a bit of a red herring, because the projection is going to be very close to shift / scale. The culprit is the eccentricity value: the value used by the images is very slightly different from the value used by the map.
You can right-click the first image (2 Image) and select Use Coordinate System. This will set the pixel size to 0.057 m and the estimated number of pixels for the resulting image will become 678 x 681 (much better than 40 x 40). The clicked image will hide the coordinate system icon in the list, indicating that it is not going to be projected at all (just shifted, but that does not alter pixel values, merely moves them around in tiles). The remaining image will show the gray coordinate system icon in the list, indicating that it is going to be reprojected, but that the reprojection will be limited to scale / shift, nothing curvilinear.
You can also right-click the second image (1 Image) and select Use Coordinate System for that. This will set the pixel size to 0.026 m, and make the estimated number of pixels for the resulting image increase to 1,505 x 1,510. This is probably the setting you want to use for the merged image.
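As a rough sanity check on the numbers above: pixel size times pixel count should give about the same ground extent no matter which coordinate system is selected. A minimal sketch follows. The function name is made up for illustration, and the pixel sizes are the rounded values quoted above, so the extents only agree approximately.

```python
def ground_extent_m(pixel_count, pixel_size_m):
    """Approximate ground extent covered by pixel_count pixels of a given size."""
    return pixel_count * pixel_size_m

# (setting, estimated width in pixels, pixel size in meters) as quoted above
settings = [
    ("map", 40, 1.0),
    ("2 Image", 678, 0.057),
    ("1 Image", 1505, 0.026),
]

for label, width_px, size_m in settings:
    extent = ground_extent_m(width_px, size_m)
    print(f"{label}: {width_px} px x {size_m} m = about {extent:.1f} m")
```

All three settings cover roughly the same 39 to 40 m across. The only thing that changes is how finely that extent is sampled, which is why 40 x 40 pixels is a sign that the pixel size is too coarse.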
We think the above works fairly well already. We might add some sort of confirmation for when the resulting image seems way too small.
RGB vs BGR
Our images separate data from interpretation of data. Pixel values stored in tiles are data. Style properties direct how that data gets converted into colors for display. This could be a channel mapping or, say, a palette, where each pixel has a single number and you specify that 1 means red (255, 0, 0), 2 means green (0, 255, 0), etc.
The merge dialog operates on data and ignores styles. If you try merging two images with three channels each, with one image using BGR and the other using RGB, the first channel of the resulting image is going to have a mix of B from the first image and R from the second image, and so on, because merging is done on data.
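To make the mixing concrete, here is a toy model in plain Python. It is not how the merge dialog is implemented: the dict-based image layout and the merge_data and reorder helpers are assumptions made up for this illustration. It shows a red pixel stored as BGR in one image and as RGB in the other, merged as raw data, and then merged again after reordering the channels of the BGR image first.

```python
# Hypothetical toy model: an image is a dict {(x, y): (c0, c1, c2)} of raw
# channel values. How the channels map to colors is kept outside the data.

def merge_data(*images):
    """Merge raw channel values; earlier images win where pixels overlap."""
    merged = {}
    for pixels in images:
        for xy, values in pixels.items():
            merged.setdefault(xy, values)
    return merged

def reorder(pixels, src_order, dst_order):
    """Rearrange channel values from src_order (e.g. "BGR") to dst_order."""
    index = [src_order.index(ch) for ch in dst_order]
    return {xy: tuple(values[i] for i in index) for xy, values in pixels.items()}

# The same pure red pixel, stored in two different channel arrangements.
image_bgr = {(0, 0): (0, 0, 255)}   # channels hold B, G, R
image_rgb = {(1, 0): (255, 0, 0)}   # channels hold R, G, B

# Merging data as-is: channel 0 now holds B for one pixel and R for the other.
naive = merge_data(image_bgr, image_rgb)

# Taking care of the channel arrangement first gives consistent data.
fixed = merge_data(reorder(image_bgr, "BGR", "RGB"), image_rgb)

print(naive)
print(fixed)
```

In the naive result the two pixels carry different raw values even though both were red on screen, so no single style can display them both correctly. In the fixed result both pixels carry the same values.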
We merge data because that's what is mostly needed. If you want to merge images that use different channel arrangements, you are supposed to take care of that first. Most frequently, the merge is done on a series of images that are similar to each other, so this is a non-issue, but more importantly, we do not want to be reshuffling channels based on temporary rules, because that loses data and makes the operation hard to reason about and use.
That said, we can have an option to merge not data, but rather colors. That way, we would render all images to RGBA applying the rules specified in styles the same way we do when they are displayed, and we would produce a single RGBA image. This is going to lose all floating-point values, etc, but the resulting image would be consistent in terms of display - which is useful. We might add such an option in the future.
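A rough sketch of what merging colors rather than data could look like, again as a toy model in plain Python. This option does not exist yet, so everything here (the render_channels and render_palette helpers, the value range parameters) is hypothetical. Each pixel is first rendered to 8-bit RGBA using its own image's style, and only then would the results be combined.

```python
def render_channels(values, order, lo=0.0, hi=1.0):
    """Render three channel values to 8-bit RGBA using a channel order such as "BGR"."""
    def to_byte(v):
        t = (v - lo) / (hi - lo)
        return max(0, min(255, round(t * 255)))
    r, g, b = (to_byte(values[order.index(ch)]) for ch in "RGB")
    return (r, g, b, 255)

def render_palette(value, palette):
    """Render a single palette index to 8-bit RGBA."""
    return palette[value] + (255,)

# A floating-point pixel from an image styled as BGR.
float_pixel = (0.0, 0.2, 1.0)
# A palette pixel, using the palette from the example above: 1 is red, 2 is green.
palette = {1: (255, 0, 0), 2: (0, 255, 0)}

print(render_channels(float_pixel, "BGR"))
print(render_palette(1, palette))
```

After rendering, both pixels are plain RGBA tuples that can be placed into one image regardless of how the source data was stored. The cost is visible too: the original floating-point values and palette indexes are gone, and only the displayed colors remain.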
Hope this helps a bit.