I am trying to understand the different elements of the code. However, I fail to see why you multiply by a matrix called fix when computing the tag pose. My intuition is that it has something to do with whether the Z axis points away from or toward the camera, but why can't that be estimated directly from the homography?
Here is the matrix in question:
https://github.com/AprilRobotics/apriltag/blob/master/apriltag_pose.c#L463
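To make the question concrete, here is a minimal sketch of what I understand the multiplication to do. It assumes fix is the 4x4 diagonal matrix diag(1, -1, -1, 1), which is my reading of the linked code and may be wrong. The helpers `mat4_mul` and `apply_fix` are hypothetical names for illustration, not part of the AprilTag API.

```c
/* Multiply two 4x4 row-major matrices: out = a * b. */
static void mat4_mul(double a[4][4], double b[4][4], double out[4][4]) {
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            double sum = 0.0;
            for (int k = 0; k < 4; k++) {
                sum += a[i][k] * b[k][j];
            }
            out[i][j] = sum;
        }
    }
}

/* Left-multiply a homogeneous pose by the ASSUMED fix = diag(1, -1, -1, 1).
 * This negates rows 1 and 2 of the pose, i.e. the Y and Z components of
 * both the rotation and the translation. */
static void apply_fix(double pose[4][4], double out[4][4]) {
    double fix[4][4] = {
        {1.0,  0.0,  0.0, 0.0},
        {0.0, -1.0,  0.0, 0.0},
        {0.0,  0.0, -1.0, 0.0},
        {0.0,  0.0,  0.0, 1.0},
    };
    mat4_mul(fix, pose, out);
}
```

Under that assumption, calling `apply_fix` on a pose with translation (0.1, 0.2, -0.5) gives translation (0.1, -0.2, 0.5). That is a 180 degree rotation about the X axis, which is why I suspect it is a change of camera frame convention.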
Could one of the authors please elaborate on this or point to an article where they describe why this is necessary?
Thank you