How does Arabic text layout work at a high level?

I have some Arabic text that I have justified according to Western conventions.

I justified it because the early sources I am working from are also justified:

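However, Arabic justification traditionally works by stretching the cursive words (lengthening the connecting strokes between letters, known as kashida or tatweel) rather than by widening the spaces.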
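To make the idea concrete, here is a minimal Python sketch of kashida justification. It assumes a simplified rule: a tatweel (U+0640) may only be inserted between two letters that actually connect, i.e. after a dual-joining letter that is followed by another letter, and never between lam and alef, which would break their ligature. The letter sets and the names `kashida_slots` and `stretch` are illustrative, not from any library. Real engines measure the stretch in font units and prefer particular positions instead of spreading tatweels evenly.

```python
TATWEEL = "\u0640"
# Simplified joining classes for the 28 basic letters (diacritics ignored).
DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهي")  # connect to both neighbours
RIGHT_JOINING = set("ادذرزو")                # connect only to the preceding letter

def kashida_slots(text):
    """Indices i where a tatweel may be inserted between text[i] and text[i + 1]."""
    slots = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a not in DUAL_JOINING or b not in DUAL_JOINING | RIGHT_JOINING:
            continue  # the two letters do not connect, so there is no stroke to stretch
        if a == "ل" and b == "ا":
            continue  # keep the mandatory lam-alef ligature intact
        slots.append(i)
    return slots

def stretch(text, extra):
    """Insert `extra` tatweels, spread round-robin over the allowed slots."""
    slots = kashida_slots(text)
    if not slots or extra <= 0:
        return text  # nothing can be stretched (e.g. a word of non-connecting letters)
    per_slot = dict.fromkeys(slots, 0)
    for k in range(extra):
        per_slot[slots[k % len(slots)]] += 1
    return "".join(ch + TATWEEL * per_slot.get(i, 0) for i, ch in enumerate(text))
```

A word like دار has no slots at all, because none of its letters connect to the letter after them, so justification has to stretch other words on the line.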

So what I want to do is take the font source, convert it to SVG, and extract each glyph. This was straightforward for Hebrew, but I am stuck on Arabic.
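For TrueType-flavoured fonts (.ttf), glyph outlines are stored as closed quadratic contours made of on-curve and off-curve points, where two consecutive off-curve points imply an on-curve point halfway between them. Below is a minimal sketch of turning one such contour into an SVG path string, assuming you have already read each contour as a list of `(x, y, on_curve)` tuples; the function name is hypothetical. If you use fontTools, its `SVGPathPen` (in `fontTools.pens.svgPathPen`) does this conversion for you, and CFF-based .otf fonts use cubic curves instead.

```python
def contour_to_svg_path(points):
    """Convert one closed TrueType contour [(x, y, on_curve), ...] to SVG path data."""
    if not points:
        return ""
    on_idx = [i for i, p in enumerate(points) if p[2]]
    if on_idx:
        # Start at the first on-curve point and walk the rest of the contour once.
        k = on_idx[0]
        sx, sy = points[k][0], points[k][1]
        seq = points[k + 1:] + points[:k]
    else:
        # All points off-curve: start at the implied point between the last and first.
        (ax, ay, _), (bx, by, _) = points[-1], points[0]
        sx, sy = (ax + bx) / 2, (ay + by) / 2
        seq = list(points)
    cmds = [f"M{sx:g} {sy:g}"]
    pending = None  # last off-curve control point not yet emitted
    for x, y, on_curve in seq:
        if on_curve:
            if pending is not None:
                cmds.append(f"Q{pending[0]:g} {pending[1]:g} {x:g} {y:g}")
                pending = None
            else:
                cmds.append(f"L{x:g} {y:g}")
        else:
            if pending is not None:
                # Two off-curve points in a row: emit a curve to their implied midpoint.
                mx, my = (pending[0] + x) / 2, (pending[1] + y) / 2
                cmds.append(f"Q{pending[0]:g} {pending[1]:g} {mx:g} {my:g}")
            pending = (x, y)
    if pending is not None:
        cmds.append(f"Q{pending[0]:g} {pending[1]:g} {sx:g} {sy:g}")
    cmds.append("Z")
    return " ".join(cmds)
```

A glyph with several contours is the space-separated concatenation of their paths. Font coordinates point y upwards while SVG points y downwards, so the path needs a flip such as `transform="scale(1 -1)"` when placed in an SVG document.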

Each Arabic letter can have several forms (i.e., isolated, initial, medial, and final; some letters have all four, some only a few). I am ignoring the diacritics for now, since the early Qur'an manuscripts in the pictures above did not use them.

What I noticed when rendering Arabic fonts (and I do not know much about Arabic) is that the cursive letters change shape significantly depending on where they appear in a word. Maybe that is all determined by the initial/medial/final business; I am not sure.

When I open an Arabic font, I see the individual glyphs in their isolated forms. How do I find their initial, medial, and final forms within the font, especially when converting the font to SVG?
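Unicode has a legacy block, Arabic Presentation Forms-B (U+FE70..U+FEFF), with a separate code point for each positional form, and its character data records which form each code point is (for example, U+FE91 decomposes to `<initial> 0628`). The minimal sketch below uses only Python's standard `unicodedata` module to list the forms each letter has, which also shows directly that some letters have four forms and some only two. The assumption is that your font maps these code points: many fonts do, but they are not required to. Internally, modern fonts reach the positional glyphs through the OpenType `isol`, `init`, `medi` and `fina` substitution features, which you can inspect in a font editor or in a `ttx` dump of the font's GSUB table.

```python
import unicodedata
from collections import defaultdict

def presentation_forms():
    """Map each base Arabic letter to {form name: presentation-form character}."""
    table = defaultdict(dict)
    for cp in range(0xFE70, 0xFF00):
        decomp = unicodedata.decomposition(chr(cp))  # e.g. "<initial> 0628"
        if not decomp.startswith("<"):
            continue  # unassigned, or no positional tag (e.g. U+FEFF)
        tag, *codes = decomp.split()
        if len(codes) != 1:
            continue  # skip lam-alef ligatures and spacing vowel-mark forms
        base = chr(int(codes[0], 16))
        table[base][tag.strip("<>")] = chr(cp)
    return dict(table)

if __name__ == "__main__":
    for letter, forms in presentation_forms().items():
        codes = {form: f"U+{ord(c):04X}" for form, c in forms.items()}
        print(letter, unicodedata.name(letter), codes)
```

Pasting one of these characters into a document set in your font shows that form on its own, which is a quick way to check what you should expect to find among the font's glyphs.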

If all that matters is that each letter has 2 to 4 shapes, and I ignore the diacritics, and the alphabet has 28 letters (even fewer distinct shapes once the dots and other diacritical marks are removed), then I should end up with 2 to 4 SVG glyphs per letter, so at most 4 × 28 = 112 glyphs, right?
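The arithmetic comes out a little lower than 4 × 28, because six letters (alef, dal, dhal, ra, zain, waw) never connect to the letter after them and so have only isolated and final forms. Here is a small Python sketch of the count, assuming the standard 28-letter alphabet and nothing but positional forms: 22 × 4 + 6 × 2 = 100. In practice a font also needs the mandatory lam-alef ligature (isolated and final). In undotted early script several letters (for example ب, ت and ث) share one skeleton, so the number of distinct drawings is smaller still.

```python
# The 28 letters of the Arabic alphabet, grouped by how they connect.
DUAL_JOINING = "بتثجحخسشصضطظعغفقكلمنهي"  # connect on both sides: 4 forms
RIGHT_JOINING = "ادذرزو"                  # never connect to the following letter: 2 forms

FORMS = {
    "dual": ("isolated", "final", "initial", "medial"),
    "right": ("isolated", "final"),
}

def glyph_inventory():
    """Every (letter, form) pair a basic font without ligatures needs to draw."""
    inventory = [(ch, form) for ch in DUAL_JOINING for form in FORMS["dual"]]
    inventory += [(ch, form) for ch in RIGHT_JOINING for form in FORMS["right"]]
    return inventory

if __name__ == "__main__":
    print(len(DUAL_JOINING), "dual-joining +", len(RIGHT_JOINING), "right-joining letters")
    print(len(glyph_inventory()), "positional glyphs")  # 22 * 4 + 6 * 2
```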

Next, the layout rules. If I draw just one SVG path per glyph, do I have to do anything special so that the glyph edges connect and the writing stays cursive throughout? Do I have to test every letter against every other letter in every combination? Or are fonts constructed so that I only need to work on each letter variant, and because of where the strokes are positioned, everything simply "fits together"?
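In many Arabic fonts no pairwise testing is needed. Every joining form is drawn so that its connecting stroke meets the glyph's edge at the same height and thickness on the baseline, so glyphs placed side by side by advance width join up. More calligraphic fonts instead use OpenType cursive attachment (the `curs` feature, GPOS lookup type 3): each glyph has an entry and an exit anchor, and the engine shifts each glyph so that its entry anchor lands on the previous glyph's exit anchor. The flat-baseline case is the special case entry = (advance, 0), exit = (0, 0). Below is a minimal Python sketch of that positioning step. The `Glyph` record and the helper names are hypothetical, and it ignores kerning, mark positioning and the finer rules of the OpenType specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Glyph:
    name: str
    advance: float          # advance width in font units
    entry: Optional[Point]  # where the stroke from the preceding letter arrives (right side)
    exit: Optional[Point]   # where the stroke to the following letter leaves (left side)

def attach(glyphs: List[Glyph], right_edge: float) -> List[Point]:
    """Origins (left edge, baseline) for glyphs in reading order, laid out right to left."""
    origins: List[Point] = []
    for i, g in enumerate(glyphs):
        if i == 0:
            origins.append((right_edge - g.advance, 0.0))
            continue
        prev, (px, py) = glyphs[i - 1], origins[-1]
        if prev.exit is None or g.entry is None:
            # No connection (e.g. after alef or at a space): continue leftwards on the baseline.
            origins.append((px - g.advance, 0.0))
        else:
            # Cursive attachment: shift g so its entry point lands on prev's exit point.
            ex, ey = px + prev.exit[0], py + prev.exit[1]
            origins.append((ex - g.entry[0], ey - g.entry[1]))
    return origins

def to_svg(glyphs: List[Glyph], paths: Dict[str, str], right_edge: float, baseline: float) -> str:
    """Place each glyph's path (in y-up font units) into y-down SVG coordinates."""
    out = []
    for g, (x, y) in zip(glyphs, attach(glyphs, right_edge)):
        out.append(f'<path transform="translate({x:g} {baseline - y:g}) scale(1 -1)" d="{paths[g.name]}"/>')
    return "\n".join(out)
```

With anchors, each glyph still only has to be designed once per form; the entry and exit points are what guarantee the strokes meet, rather than any pairwise adjustment.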

So I would like a brief overview of how a font rendering engine lays out Arabic text. What is going on under the hood that would help me understand how to approach this project? How is the seamless connection between letters achieved? And how does the engine choose the variant? Is it simply: "if the letter is in the middle, use medial; if it is at the end, use final; if it is at the start, use initial; otherwise use isolated"?
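That guess is close, but position in the word is not quite the rule. What matters is whether each neighbour actually connects, and the right-joining letters (alef, dal, dhal, ra, zain, waw) break a word into separately joined pieces. So in دار every letter is isolated, and in باب the second ب is isolated. The Unicode Standard specifies this as the Arabic cursive joining algorithm, with per-character joining types listed in ArabicShaping.txt. A shaping engine such as HarfBuzz determines each letter's form this way and then applies the font's `isol`, `init`, `medi` and `fina` substitution features to swap in the right glyph, followed by features such as `rlig` (for the lam-alef ligature) and `curs` (for positioning). Below is a simplified Python sketch of the first step, assuming only the 28 basic letters plus tatweel and no diacritics (real engines skip over diacritics, which are "transparent").

```python
# Simplified joining classes (the real data is in Unicode's ArabicShaping.txt).
DUAL_JOINING = set("بتثجحخسشصضطظعغفقكلمنهي\u0640")  # includes tatweel
RIGHT_JOINING = set("ادذرزو")

def connects_to_next(ch):
    """Can this letter join to the letter that follows it (on its left)?"""
    return ch in DUAL_JOINING

def connects_to_previous(ch):
    """Can this letter join to the letter that precedes it (on its right)?"""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING

def positional_forms(text):
    """Return 'isolated'/'initial'/'medial'/'final' per character (None for non-letters)."""
    forms = []
    for i, ch in enumerate(text):
        if not connects_to_previous(ch):
            forms.append(None)  # space, digit, punctuation: not part of any joining run
            continue
        joined_before = i > 0 and connects_to_next(text[i - 1])
        joined_after = (i + 1 < len(text) and connects_to_next(ch)
                        and connects_to_previous(text[i + 1]))
        if joined_before and joined_after:
            forms.append("medial")
        elif joined_before:
            forms.append("final")
        elif joined_after:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

Once every letter has a form, picking the glyph is a table lookup (letter, form) → glyph, which is exactly what the font's `init`/`medi`/`fina`/`isol` features encode.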