ipsec – Why does IKE have two phases?

The two phases are a somewhat separate matter from the rest of the question. They are simply two logical parts of the handshake. Phase 1 covers the initial identity validation, authenticated key exchange, and parameter agreement (e.g. modes, features, etc.). Phase 2 covers the derivation and setup of the secure channel used for bulk data transfer.

This is very similar to how TLS in HTTPS works – the handshake initially exchanges information about the identities of the parties and what protocol versions and features they support, agrees on which features should be used, and (usually) does an initial key exchange which is authenticated by the long-term key from the certificate (as in IKE phase 1). It then uses that exchanged key to derive the necessary parameters for the secure channel that the bulk data is transferred over, and initiates that channel (as in IKE phase 2).
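To make the split concrete, here is a minimal Python sketch of the idea, not of the real IKE or TLS wire protocol. Phase 1 is reduced to a stand-in that hands both peers the same secret, and phase 2 derives the channel key from that secret plus fresh nonces. The function names, the `b"channel-key"` label, and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudo-random function; HMAC-SHA256 is an assumption for this sketch."""
    return hmac.new(key, data, hashlib.sha256).digest()


def phase1_shared_secret() -> bytes:
    """Stand-in for phase 1.

    Real IKE authenticates both peers and runs a key exchange here; this
    sketch just returns random bytes that both sides are assumed to hold.
    """
    return secrets.token_bytes(32)


def phase2_channel_key(phase1_secret: bytes, nonce_i: bytes, nonce_r: bytes) -> bytes:
    """Phase 2: derive bulk-data key material from the phase 1 secret and fresh nonces."""
    return prf(phase1_secret, b"channel-key" + nonce_i + nonce_r)


# Phase 1 happens once; both peers end up holding the same secret.
secret = phase1_shared_secret()

# Phase 2: the peers exchange fresh nonces and each derives the channel key locally.
nonce_i, nonce_r = secrets.token_bytes(16), secrets.token_bytes(16)
initiator_key = phase2_channel_key(secret, nonce_i, nonce_r)
responder_key = phase2_channel_key(secret, nonce_i, nonce_r)
```

Both sides compute the same channel key without ever sending it over the wire, and a different pair of nonces yields a different key from the same phase 1 secret.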

The duplication of security associations is because of certain usage requirements, partly due to historical hardware and telecommunications limitations.

The most commonly given example is a distributed cellular network, where one client device might be talking to multiple cell towers. Each cell tower has plenty of processing power, is powered from the grid, and has a (relatively) high bandwidth network connection to the other towers. Compare that with your average late-90s cellular device, which has little processing power, runs on a small battery, has very limited bandwidth to the cell towers, and might be moving between towers quite a lot. Instead of making the phone perform a complete new two-way IKE handshake with every tower it comes across, there are instead two unidirectional channels – one for the client to talk to the tower, and another for the tower to talk back to the client – each with its own security parameters. The idea here is that the cellular device can broadcast data on one security association which is shared between all towers, allowing any tower to decrypt it, whereas data coming back from each tower to the cellular device has its own separate channel.

The differences in supported features, lifetimes, and renegotiations are precisely because of this difference in functionality. The outbound channel from the cellular device to the towers needs to be able to work in a broadcast environment, but the inbound channel does not. The security properties of each channel are tuned so that they are most effective for the specific use-case, providing the most security possible given the requirements of the system.
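The arrangement described above can be modelled roughly as follows. This is only a sketch of the data layout, assuming a hypothetical `SecurityAssociation` record and made-up names and lifetimes. It is not how any real IPsec stack stores its SAs, and the keys are generated randomly here rather than negotiated.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityAssociation:
    """One unidirectional channel with its own parameters (hypothetical model)."""
    spi: int          # identifier the receiver uses to look up this SA
    sender: str
    receiver: str
    key: bytes
    lifetime_s: int   # how long the SA may be used before renegotiation


def new_sa(sender: str, receiver: str, lifetime_s: int) -> SecurityAssociation:
    # Random SPI and key stand in for values a real handshake would negotiate.
    return SecurityAssociation(
        spi=secrets.randbits(32),
        sender=sender,
        receiver=receiver,
        key=secrets.token_bytes(32),
        lifetime_s=lifetime_s,
    )


towers = ("tower-a", "tower-b", "tower-c")

# One outbound SA from the phone, shared by every tower so any of them can decrypt.
uplink = new_sa("phone", "all-towers", lifetime_s=3600)

# One inbound SA per tower, each with its own key and (here) a shorter lifetime.
downlinks = {tower: new_sa(tower, "phone", lifetime_s=600) for tower in towers}
```

The phone holds a single uplink SA no matter how many towers it meets, while each tower's return channel is keyed and tuned independently.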