The many Houses of the Seven Kingdoms of Westeros have a problem. They need to send messages to each other, and in a way that’s both fast and secure.
How do they accomplish such a task? Well, they use messenger ravens.
The way that the houses use these messenger ravens isn’t all that different from how IPsec is used on the Internet today to secure messages between two private networks. Both the ravens and IPsec use a public medium to deliver their message. They’re both susceptible to interception and tampering. They’re both at the whim of the environment – a forest fire is just as likely to be hard on the messenger ravens as packet loss is to an IPsec packet. In fact, both methods deliver their messages in “packets”: the ravens are just more efficient at it.
But the most important way that these two mediums are alike is that it all starts with an agreement. Two parties must meet somewhere, at some time, and agree to terms of how future messages will be exchanged. In Westeros, this might be done by meeting in secret at some point. In IPsec, we call this the “Phase 1” negotiation.
It’s important to recognize that when an IPsec tunnel is established, it simply means that two parties have agreed on how they will exchange packets of information in the future. IPsec is not synchronous. It’s not like a traditional tunnel or most peer-to-peer tunnels, where data is exchanged over a TCP stream. IPsec packets are marked as their own protocol (they are neither UDP nor TCP), and it’s up to the sender of the packet to ensure that it’s sent in such a way that the recipient knows how to decode it.
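To make the “own protocol” point concrete: every IPv4 header carries a one-byte protocol number, and IPsec’s ESP and AH use 50 and 51, where TCP uses 6 and UDP uses 17. The sketch below is illustrative only. The function names and the hand-built sample packet are assumptions made for this example, and it ignores NAT traversal, where ESP is commonly wrapped in UDP.

```python
import struct

# IANA-assigned IP protocol numbers (the one-byte "protocol" field of IPv4).
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 50: "ESP", 51: "AH"}


def classify_ipv4(packet: bytes) -> str:
    """Return the name of the protocol carried by an IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return IP_PROTOCOLS.get(packet[9], f"protocol {packet[9]}")


def esp_spi_and_sequence(packet: bytes) -> tuple[int, int]:
    """Read the SPI and sequence number that open every ESP header."""
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != 50:
        raise ValueError("not an ESP packet")
    return struct.unpack_from("!II", packet, header_len)


def build_sample_esp_packet(spi: int, sequence: int) -> bytes:
    """Hand-build a minimal IPv4+ESP packet for the demo (checksum left at 0)."""
    payload = struct.pack("!II", spi, sequence) + b"\x00" * 16  # stand-in for ciphertext
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version 4 / IHL 5, TOS, total length
        0, 0,                        # identification, flags + fragment offset
        64, 50, 0,                   # TTL, protocol 50 (ESP), checksum
        bytes([192, 0, 2, 1]),       # source (documentation address)
        bytes([198, 51, 100, 1]),    # destination (documentation address)
    )
    return header + payload
```

The SPI read here is the same value that shows up later when checking Phase 2 associations: it is how the recipient knows which agreement a given packet belongs to.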
A lot of products on the market today are a little misleading about how they present the status of IPsec tunnels. As human beings, we want to be able to look at the status of something and know whether it’s working or not right away. Take these examples from three popular firewall products:
All would lead you to believe that the tunnel is up and running (a green indicator is most popular). But IPsec tunnels are not so simple. In fact, these are just indications that the Phase 1 negotiation has succeeded. The gateway is simply saying, “yep, we’ve negotiated an agreement!” It’s not actually giving you an indication of whether that agreement is working or not.
Firewall dashboards like these are great ways to check whether a tunnel was negotiated successfully, but they’re not a good way to check if a tunnel is operating properly.
(post phase 1 negotiation)
- Desynchronization
- Public IP Changes
- Internet Performance
- Rekey Races
Desynchronization is a problem that happens when one member of the IPsec agreement gets out of sync with the other. Maybe one member was expecting the cryptographic keys to change on a schedule, but the other member didn’t actually change them. Once the keys have changed, the member that changed them can’t go back to using the old ones: that would open up a vulnerability (by allowing someone to send a message protected with the old keys much later).
Public IP Changes are easy to detect by verifying the real Internet IP address of both participating gateways. Depending on the software or equipment being used, it may or may not be possible to configure the gateway to use a dynamic hostname instead of a fixed IP address.
Internet Performance ultimately determines the performance of IPsec. This can be verified by troubleshooting the performance of the underlying Internet connection (for example, by pinging the other IPsec member’s gateway address). It’s tempting on many firewall devices to reject all ICMP packets silently, but this is discouraged, since all it does is make troubleshooting issues like this much more difficult.
Rekey Races are a rare issue that happens on some equipment when both members agree that it’s time to re-negotiate the Phase 1 agreement, but also re-negotiate some Phase 2 agreements at the same time. This has caused some IPsec gateways to become “confused” about what’s happening: one gateway re-negotiates a new Phase 1 agreement while leaving some of the tunnels in the old Phase 2, whereas the other gateway has moved those tunnels to the new Phase 2.
- Verify Local Internet Connectivity
- Ping Remote Gateway Internet IP Address
- Check Phase 2 Associations
- Verify Traffic Over Phase 2 Tunnels
- Ping Remote Address via Phase 2 Tunnel
As you proceed through these troubleshooting steps, collect the information as you go, as you may need it to report tunnel trouble to your IPsec partner. Nobody likes to receive a “tunnel down” report with no other information, so having the information available up front will help get the problem resolved faster.
Verify Local Internet Connectivity first, including the actual public Internet IP address that the gateway is using. Many services on the Internet will report this address for you, including http://www.whatismyip.com/, or you can query Google’s name servers with dig:
dig o-o.myaddr.l.google.com @ns1.google.com txt +short
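If you want to script this check, the idea is to compare the address the outside world sees with the address your IPsec partner has configured for you. Here is a minimal sketch, assuming the command’s output has been captured as text and that each answer line is a quoted TXT string; the function names and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_reported_address(dig_output: str) -> Address:
    """Extract the first IP address from captured `dig ... txt +short` output.

    Assumes each answer line is a quoted TXT string, e.g. "203.0.113.10".
    Lines that are not a bare address are skipped.
    """
    for line in dig_output.splitlines():
        candidate = line.strip().strip('"')
        try:
            return ipaddress.ip_address(candidate)
        except ValueError:
            continue
    raise ValueError("no IP address found in output")


def public_ip_matches(dig_output: str, expected: str) -> bool:
    """True when the observed public address equals the one the partner expects."""
    return parse_reported_address(dig_output) == ipaddress.ip_address(expected)
```

A mismatch here is worth recording right away: it means the remote gateway is probably still sending packets to an address you no longer hold.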
Ping Remote Gateway Internet IP Address, which will reveal whether the remote gateway is reachable and whether any packet loss is occurring. Keep the ping running continuously: packet loss can be intermittent, so it may take some time to observe it.
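Why keep the ping running? An average over the whole run can hide a short burst of loss. Here is a rough sketch, assuming you have already recorded each probe as answered or not; the function name and the window size are made up for illustration.

```python
def loss_by_window(replies: list[bool], window: int = 10) -> list[float]:
    """Percentage of lost pings in each consecutive window of `window` probes.

    `replies` holds one entry per probe, in order: True if it was answered.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    percentages = []
    for start in range(0, len(replies), window):
        chunk = replies[start:start + window]
        lost = chunk.count(False)
        percentages.append(100.0 * lost / len(chunk))
    return percentages
```

For thirty probes where only the second group of ten loses five replies, the overall loss is about 17 percent, but the per-window view shows one window at 50 percent and the others at zero, which is a much more useful thing to report.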
Check Phase 2 Associations for desynchronization. This will usually manifest itself on firewall dashboards as multiple SPI associations. A healthy IPsec tunnel will have only one SPI association (with multiple present only for the time it takes to rekey). Long-term, multiple associations for the same network pairs are not normal.
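If your gateway can export its association table, this check is easy to automate. The sketch below is one possible approach and not any vendor’s format: the record fields, the 60-second grace period, and the function name are all assumptions. It groups associations per direction, since each direction of a tunnel carries its own SPI.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityAssociation:
    local_net: str
    remote_net: str
    direction: str      # "in" or "out"
    spi: int
    age_seconds: int    # time since the association was installed


def stale_duplicates(sas, grace_seconds=60):
    """Return network pairs holding more than one settled association per direction.

    Associations younger than `grace_seconds` are ignored, so the brief
    overlap during a normal rekey is not reported.
    """
    groups = defaultdict(list)
    for sa in sas:
        groups[(sa.local_net, sa.remote_net, sa.direction)].append(sa)
    suspects = {}
    for key, members in groups.items():
        settled = [sa for sa in members if sa.age_seconds > grace_seconds]
        if len(settled) > 1:
            suspects[key] = sorted(sa.spi for sa in settled)
    return suspects
```

Anything this returns is worth writing down, SPI values included, for the report to your IPsec partner.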
Verify Traffic Over Phase 2 Tunnels by looking for byte or packet counters incrementing. IPsec will not generate traffic on its own: it needs traffic to be flowing over the tunnel for the traffic counters to increment. If you see traffic incrementing on one side but not the other (for example, a receive counter incrementing but not a transmit counter), then that’s a strong indication that one member of the IPsec association is desynchronized.
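The counter check boils down to taking two readings a little while apart and looking at what moved. Here is a small sketch of that comparison; the snapshot type and the result labels are hypothetical, and how you obtain the counters depends entirely on your equipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    tx_packets: int
    rx_packets: int


def diagnose_counters(before: CounterSnapshot, after: CounterSnapshot) -> str:
    """Classify what changed between two readings of one tunnel's counters."""
    tx_delta = after.tx_packets - before.tx_packets
    rx_delta = after.rx_packets - before.rx_packets
    if tx_delta < 0 or rx_delta < 0:
        return "counters reset"   # e.g. a rekey or reboot zeroed them; sample again
    if tx_delta == 0 and rx_delta == 0:
        return "idle"             # no traffic; generate some and sample again
    if tx_delta == 0 or rx_delta == 0:
        return "one-way"          # strong hint of desynchronization
    return "two-way"
```

An "idle" result says nothing about tunnel health, which is why it is kept separate from "one-way": start a ping across the tunnel first, then take the two readings.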
Ping Remote Address via Phase 2 Tunnel, and try multiple IP addresses. It’s possible that the issue is local to one system on the private network only. If you’re able to ping a system on the remote side, then the tunnel is functioning.
- Ping a Gateway from Gateway
- Tunnel Reset
Generally speaking, you cannot Ping a Gateway from Gateway. This is because the gateway doesn’t know what IP address to originate the traffic from (since the gateway has multiple network interfaces), and because a lot of IPsec implementations are done in ‘user space’ instead of in ‘kernel space’. This means that since the IPsec service is not part of the system’s core networking stack, it can’t originate traffic from itself. Always test traffic through the tunnel, never from the endpoints.
Doing a Tunnel Reset from one side rarely accomplishes much. For example, if one gateway is desynchronized, doing a tunnel reset on the other won’t cause the desynchronization to go away. Some IPsec implementations don’t actually clear all of the SPI associations cleanly on a tunnel reset, in which case only a reboot of the equipment will ensure the old associations are cleared. Once traffic is passing over the tunnel again, fixing the root issue is necessary; otherwise the problem will ultimately occur again.
- Perfect Forward Secrecy
- Rekey Lifetimes
- Time Synchronization
Perfect Forward Secrecy should be enabled, not only because of the security implications (it makes captured encrypted traffic more difficult to break), but also because it forces a renegotiation of the Phase 2 tunnels to happen more often. The more time that passes between renegotiations, the more time you’re allowing an IPsec tunnel to become desynchronized.
Rekey Lifetimes should be as short as possible, and the Phase 2 lifetime should be set to two-thirds of the Phase 1 lifetime. This means a Phase 2 tunnel renegotiation coincides with a Phase 1 renegotiation less often.
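The arithmetic behind the two-thirds rule is worth a quick look. This is an idealized sketch: it assumes both timers start together and fire exactly on schedule, with no jitter or early-rekey margin, which real gateways usually add. The function names and the lifetimes used below are illustrative.

```python
from math import gcd


def phase2_lifetime(phase1_seconds: int) -> int:
    """Two-thirds of the Phase 1 lifetime, rounded down to whole seconds."""
    return phase1_seconds * 2 // 3


def seconds_between_collisions(phase1_seconds: int, phase2_seconds: int) -> int:
    """Interval at which both timers expire together (their least common multiple)."""
    return phase1_seconds * phase2_seconds // gcd(phase1_seconds, phase2_seconds)
```

With an example Phase 1 lifetime of 86400 seconds, the two-thirds rule gives a Phase 2 lifetime of 57600 seconds, and the two timers line up every 172800 seconds, which is every second Phase 1 rekey. Set Phase 2 to exactly half of Phase 1 (43200 seconds) and they line up every 86400 seconds, which is every single Phase 1 rekey.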
Time Synchronization from a reliable time source or NTP server is important since it’s used to calculate rekey times. A clock on a device that drifts (because it has no time source, or an unreliable time source) can cause desynchronization issues. Some equipment ships with hard-coded time sources, so this can’t be helped, but where it’s possible to configure it, reliable NTP servers should be used.
If all else fails, before you contact your IPsec partner, have the information you recorded during the troubleshooting steps ready. Providing as much information as possible will help the partner troubleshoot the issue. Including additional information (such as the physical location of the equipment being used and what networks are being transmitted over the tunnel) will also help.
- Both Gateways’ Public Internet IP Addresses
- Physical Location and Description of Equipment
- Source and Destination Inside IPs
- Steps Taken (reboot, tunnel reset, traffic observed, duplicate SPIs)
- Information/Screenshots from Dashboard
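One way to make sure nothing on this checklist gets forgotten is to keep it as a small record that you fill in as you troubleshoot. The sketch below is only a suggestion; the class, its field names, and the output layout are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TunnelReport:
    local_gateway_ip: str
    remote_gateway_ip: str
    equipment: str                                     # physical location and description
    inside_pairs: list = field(default_factory=list)   # (source, destination) inside IPs
    steps_taken: list = field(default_factory=list)
    attachments: list = field(default_factory=list)    # dashboard exports, screenshots

    def missing_items(self) -> list:
        """Names of checklist items that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Plain-text summary suitable for pasting into an email or ticket."""
        lines = [
            f"Gateways: {self.local_gateway_ip} <-> {self.remote_gateway_ip}",
            f"Equipment: {self.equipment}",
        ]
        lines += [f"Inside IPs: {src} -> {dst}" for src, dst in self.inside_pairs]
        lines += [f"Step taken: {step}" for step in self.steps_taken]
        lines += [f"Attachment: {name}" for name in self.attachments]
        return "\n".join(lines)
```

Calling missing_items() before you send the report is a cheap way to catch an empty section, such as a report with no dashboard screenshots attached.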
IPsec is a powerful and flexible service, but like the messenger ravens from Game of Thrones, taking a little care and attention will yield the best performance.