How does the weakness of the bech32 length extension work?

Peter Wuille's comment gives a nice summary:

Basically, take a bech32 string, x or a 1 in the last character, and press or place as many qs as you want and x or a 1 in the last character to always get a valid new bech32 string

The checksum code of Bitcoin Core is:

uint32_t PolyMod(const data& v)
{
    uint32_t c = 1;
    for (const auto v_i : v) {
        uint8_t c0 = c >> 25;

        c = ((c & 0x1ffffff) << 5) ^ v_i;

        if (c0 & 1)  c ^= 0x3b6a57b2;
        if (c0 & 2)  c ^= 0x26508e6d;
        if (c0 & 4)  c ^= 0x1ea119fa;
        if (c0 & 8)  c ^= 0x3d4233dd;
        if (c0 & 16) c ^= 0x2a1462b3;
    }
    return c;
}

Basically, the checksum function has an internal variable that changes with every 5-bit character, similar to SHA without padding and with different sizes. What you should keep in mind is that both this and SHA are vulnerable to length extension attacks, which means someone does not know this x but know H(x) can calculate H(x || A), Where A is any data sequence and || is the concat operator.

Called by:

data CreateChecksum(const std::string& hrp, const data& values)
{
    data enc = Cat(ExpandHRP(hrp), values);
    enc.resize(enc.size() + 6); // Append 6 zeroes
    uint32_t mod = PolyMod(enc) ^ 1; // Determine what to XOR into those 6 zeroes.
    data ret(6);
    for (size_t i = 0; i < 6; ++i) {
        // Convert the 5-bit groups in mod to checksum values.
        ret(i) = (mod >> (5 * (5 - i))) & 31;
    }
    return ret;
}

In this function, the result is simply serialized by PolyMod, but the least significant bit of the checksum is XORed with 1. If you undo this, you can extend the checksum by typing the PolyMod function (or the Initialize-Update-Finalize version of it) a number of five-bit zeros and, if you recheck the least significant bit with 1, XOR, You will receive a valid checksum.

Why? Because a zero byte does not cause any if Branching to be executed in the PolyMod function loop.

The reason why XOR-1 was part of the specification was (to my knowledge) to prevent adding an extra character at the end so that the checksum is still correct. Otherwise,

  • ii2134hk2xmat79tqq
  • ii2134hk2xmat79tqqq
  • ii2134hk2xmat79tqqqq

Everything would be fine, which is even worse.

In the CreateChecksum function, you see that 6 empty five-bit characters are appended to create a signature (since BCH works this way, the checksum is set to empty before the calculation). Now that PolyMod knows if the zeroes are part of the input or the empty checksum, they are treated the same. That's why you can:

1) Take any Bech32 address as a five-bit byte sequence

2) X or the last byte with 1

3) Add any numbers

4) Add 6 empty bytes

5) Calculate the PolyMod result

6) Xor the last byte with 1

7) Coding

The peculiarity described in the given link is when these "arbitrary numbers" in step 3 are zeros. The PolyMod requests a payload with 6 attached zeroes to form the checksum, and the checksum should come directly after the payload if the checksum is to be correct. You can add zeros to the end so that it stays correct. Only the XOR-1 thing needs to be done.