I posed your question to the Montage team and received the answer below; I hope this helps.
There is enough information from the Montage processing to do this if you know where to look, though of course there will be some massaging involved.
The reprojection step in Montage puts all the images on the same projection. For FITS WCS, that means the headers will all be exactly the same except for relative pixel offsets in X and Y, which is exactly what you need to lay the images out relative to each other.
A side note: you can't bypass the reprojection, since the image-to-image differences are almost never a simple shift or rotation.
So after reprojecting and building a metadata table (mImgtbl in Montage), you can place the images relative to each other based only on the relative CRPIX1 (X) and CRPIX2 (Y) offsets from that table (with appropriate attention to the sign of the offset).
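To make the sign question concrete, here is a minimal Python sketch of the arithmetic, not something taken from Montage itself. It assumes you have already parsed the metadata table into a list of dicts with the keys fname, crpix1, crpix2, naxis1 and naxis2 (the function name layout_offsets and that dict layout are my own, purely for illustration). It also assumes the usual FITS convention that Y increases upward, while most raster formats count rows from the top, so the Y offset is flipped.

```python
def layout_offsets(images):
    """Return ((width, height), {fname: (left, top)}) for a shared canvas.

    `images` is a list of dicts with keys fname, crpix1, crpix2, naxis1,
    naxis2 (hypothetical layout; fill it from your own table parser).
    All images must already share one projection and pixel scale.
    """
    # CRPIX is where the common reference point falls in each image, so an
    # image with a LARGER crpix starts further left / lower on the canvas.
    max_x = max(im["crpix1"] for im in images)
    max_y = max(im["crpix2"] for im in images)

    placed = []
    for im in images:
        left = round(max_x - im["crpix1"])    # offset from canvas left edge
        bottom = round(max_y - im["crpix2"])  # offset from canvas bottom (FITS Y is up)
        placed.append((im, left, bottom))

    width = max(left + im["naxis1"] for im, left, _ in placed)
    height = max(bottom + im["naxis2"] for im, _, bottom in placed)

    # Convert bottom-based offsets to top-based ones for raster formats.
    offsets = {
        im["fname"]: (left, height - (bottom + im["naxis2"]))
        for im, left, bottom in placed
    }
    return (width, height), offsets


if __name__ == "__main__":
    demo = [
        {"fname": "a.fits", "crpix1": 100, "crpix2": 100, "naxis1": 50, "naxis2": 50},
        {"fname": "b.fits", "crpix1": 60, "crpix2": 80, "naxis1": 50, "naxis2": 50},
    ]
    print(layout_offsets(demo))
```

The resulting (left, top) pairs are only valid for rasters that have the same pixel dimensions as the reprojected FITS images.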
At this time, Montage only outputs JPEGs, but equivalent PNG or TIFF images could be placed relative to each other using the same CRPIX offsets, so long as you don't crop, shrink, or otherwise transform them.
I can give more details if this seems like the right general direction.
Code-seeking owl at your service