There are a lot of pictures on the internet illustrating the principle of phase detection autofocus (PDAF), such as this one:
The simplest way to understand how PDAF works is to start by thinking about light passing the camera lens at the very extreme edges. When in perfect focus, light from even these extremes of the lens will refract back to meet at an exact point on the camera sensor.
When the light reaches these two sensors, if an object is in focus, light rays from the extreme sides of the lens converge right in the center of each sensor (like they would on an image sensor). Both sensors would have identical images on them, indicating that the object is indeed in perfect focus.
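To make the "identical images" criterion concrete, here is a minimal sketch of how the offset between the two images could be estimated. This is only my own illustration under simplifying assumptions (1-D intensity profiles, integer shifts, a sum-of-absolute-differences match); `estimate_shift` and `max_shift` are hypothetical names, and this is not taken from any real camera's firmware.

```python
def estimate_shift(left, right, max_shift):
    """Return the integer shift s that best aligns `right` with `left`.

    For each candidate s, compare left[i] with right[i + s] over the
    overlapping samples and keep the s with the smallest mean absolute
    difference. s == 0 means the two profiles already coincide.
    """
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        lo = max(0, -s)                      # first index where both exist
        hi = min(len(left), len(right) - s)  # one past the last such index
        if hi <= lo:
            continue                         # no overlap for this shift
        cost = sum(abs(left[i] - right[i + s]) for i in range(lo, hi)) / (hi - lo)
        if cost < best_cost:
            best_cost, best_shift = cost, s
    return best_shift


# Made-up intensity profile of a small bright feature on a dark background.
profile = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0]

# In focus (assumed): both images see the same profile at the same place.
in_focus_shift = estimate_shift(profile, profile, max_shift=4)    # 0

# Out of focus (assumed): the second image is displaced by two samples.
displaced = [0, 0] + profile[:-2]
defocus_shift = estimate_shift(profile, displaced, max_shift=4)   # 2
```

A zero shift corresponds to the "identical images" case described above, while a nonzero shift gives both the direction and a rough magnitude of the defocus.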
In on-sensor PDAF, many special pixels have an opaque mask over one half.
It may look like this:
The right-masked pixels and left-masked pixels are not adjacent.
How can the left image (from the left-masked pixels) and the right image (from the right-masked pixels) be identical when the object is in focus? According to the first figure, each object point should be imaged onto a single pixel location when the object is in focus.