
Game of Tunnels


The many Houses of the Seven Kingdoms of Westeros have a problem. They need to send messages to each other, and in a way that’s both fast and secure.


How do they accomplish such a task? Well, they use these guys:

[Image: a messenger raven]

The way that the houses use these messenger ravens isn’t all that different from how IPsec is used on the Internet today to secure messages between two private networks. Both the ravens and IPsec use a public medium to deliver their messages. They’re both susceptible to interception and tampering. They’re both at the mercy of the environment: a forest fire can be just as hard on the messenger ravens as packet loss is on IPsec packets. In fact, both methods deliver their messages in “packets”: the ravens are just more efficient at it.

But the most important way that these two mediums are alike is that it all starts with an agreement. Two parties must meet somewhere, at some time, and agree to terms of how future messages will be exchanged. In Westeros, this might be done by meeting in secret at some point. In IPsec, we call this the “Phase 1” negotiation.


It’s important to recognize that when an IPsec tunnel is established, it simply means that two parties have agreed on how they will exchange packets of information in the future. IPsec is not connection-oriented: it’s not like a traditional tunnel or most peer-to-peer tunnels, where data is exchanged over a TCP stream. IPsec data packets carry their own IP protocol number (50 for ESP, 51 for AH), so they are neither UDP nor TCP (unless NAT traversal wraps ESP in UDP on port 4500), and it’s up to the sender of each packet to ensure that it’s sent in such a way that the recipient knows how to decode it.
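To make the protocol distinction concrete, here is a minimal Python sketch that reads the protocol field of a raw IPv4 header. The numbers are IANA assignments (6 for TCP, 17 for UDP, 50 for ESP, 51 for AH) and the UDP ports are the standard IKE ports (500, and 4500 for NAT traversal). The function names and the test header are made up for illustration.

```python
import struct

# IANA IP protocol numbers relevant to this discussion.
PROTOCOLS = {6: "TCP", 17: "UDP", 50: "ESP (IPsec)", 51: "AH (IPsec)"}
# Well-known UDP ports used by IKE; 4500 also carries ESP when NAT traversal is in use.
IKE_PORTS = {500: "IKE", 4500: "IKE/NAT-T"}


def classify_ipv4(packet: bytes) -> str:
    """Return a rough label for a raw IPv4 packet (starting at the IP header)."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    proto = packet[9]                    # protocol field lives at byte offset 9
    if proto == 17 and len(packet) >= header_len + 4:
        src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
        for port in (dst_port, src_port):
            if port in IKE_PORTS:
                return f"UDP carrying {IKE_PORTS[port]}"
    return PROTOCOLS.get(proto, f"other ({proto})")


def make_header(proto: int, payload: bytes = b"") -> bytes:
    """Build a minimal 20-byte IPv4 header for testing (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload), 0, 0, 64, proto, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]),
    ) + payload


print(classify_ipv4(make_header(50)))                                  # ESP (IPsec)
print(classify_ipv4(make_header(17, struct.pack("!HH", 4500, 4500))))  # UDP carrying IKE/NAT-T
```

In practice you would look at captured packets with a packet analyzer; the point here is only that ESP sits beside TCP and UDP rather than inside either of them.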

A lot of products on the market today are a little misleading about how they present the status of IPsec tunnels. As human beings, we want to be able to look at the status of something and know whether it’s working or not right away. Take these examples from three popular firewall products:

[Image: IPsec tunnel status displays from three firewall products]

All of them would lead you to believe that the tunnel is up and running (a green indicator is the most popular). But IPsec tunnels are not so simple: these are just indications that the Phase 1 negotiation has succeeded. The gateway is simply saying, “yep, we’ve negotiated an agreement!” It’s not actually telling you whether that agreement is working.

Firewall dashboards like these are great ways to check whether a tunnel was negotiated successfully, but they’re not a good way to check if a tunnel is operating properly.

Common IPsec Problems (after Phase 1 negotiation):

  • Desynchronization
  • Public IP Changes
  • Internet Performance
  • Rekey Races

Desynchronization is a problem that happens when one member of the IPsec agreement gets out of sync with the other. Maybe one member was expecting the encryption keys to change on a schedule (a rekey), but the other member didn’t actually change them. Once the keys have changed, the member that changed them can’t go back to using the old ones: that would open up a vulnerability (by allowing someone to replay a message protected with the old keys much later).
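Here is a toy Python model of that failure. It assumes a single integer "key generation" can stand in for the real keys and SPIs, and the `Gateway` class is hypothetical, not any product's API. Once one side rekeys and the other doesn't, neither can accept the other's packets, even though the original agreement still looks fine.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    """Toy IPsec peer: the integer stands in for the keys currently in use."""
    name: str
    key_generation: int = 0

    def rekey(self) -> None:
        # Move to new keys and discard the old ones for good; accepting old keys
        # later would let captured traffic be replayed.
        self.key_generation += 1

    def send(self) -> int:
        # A "packet" is just the key generation it was protected with.
        return self.key_generation

    def accept(self, packet: int) -> bool:
        return packet == self.key_generation


a, b = Gateway("A"), Gateway("B")
a.rekey()                                      # A rekeys on schedule; B never does
print(a.accept(b.send()), b.accept(a.send()))  # False False: desynchronized
b.rekey()                                      # once B catches up, traffic flows again
print(a.accept(b.send()), b.accept(a.send()))  # True True
```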

Public IP Changes are easy to detect by verifying the real public Internet IP address of both participating gateways. Depending on the software or equipment being used, it may or may not be possible to configure the gateway to use a dynamic hostname for its peer instead.

Internet Performance ultimately determines the performance of IPsec. This can be verified by troubleshooting the underlying Internet connection (for example, by pinging the other IPsec member’s gateway address). It’s tempting on many firewall devices to silently drop all ICMP packets, but this is discouraged, since all it does is make troubleshooting issues like this much more difficult.

Rekey Races are a rare issue that happens on some equipment when the Phase 1 agreement and some Phase 2 agreements come due for renegotiation at the same time. This has caused some IPsec gateways to become “confused” about what’s happening: one gateway negotiates a new Phase 1 agreement but leaves some of the tunnels on the old Phase 2 agreements, while the other gateway has moved those tunnels to the new ones.

Troubleshooting Steps:

  • Verify Local Internet Connectivity
  • Ping Remote Gateway Internet IP Address
  • Check Phase 2 Associations
  • Verify Traffic Over Phase 2 Tunnels
  • Ping Remote Address via Phase 2 Tunnel

As you work through these troubleshooting steps, record what you find: you may need it to report tunnel trouble to your IPsec partner. Nobody likes to receive a “tunnel down” report with no other information, so having the details available up front will help get the problem resolved faster.

Verify Local Internet Connectivity first, including the actual public Internet IP address that the gateway is using. Many services on the Internet will report this for you, such as http://www.whatismyip.com/, or you can query Google’s DNS servers directly:

dig o-o.myaddr.l.google.com @ns1.google.com txt +short
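If you want to script this check, a minimal Python sketch can wrap the same `dig` command and pull the address out of its output. It assumes `dig` is installed and that the answer comes back as quoted TXT strings, one per line; the function names are hypothetical.

```python
import ipaddress
import subprocess


def parse_myaddr(dig_output: str) -> str:
    """Return the first IP address found in `dig ... txt +short` output."""
    for line in dig_output.splitlines():
        candidate = line.strip().strip('"')
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # skip anything that isn't a bare address
    raise ValueError("no IP address in dig output")


def public_ip() -> str:
    """Run the same query as above (needs network access and `dig` on the PATH)."""
    out = subprocess.run(
        ["dig", "o-o.myaddr.l.google.com", "@ns1.google.com", "txt", "+short"],
        capture_output=True, text=True, check=True, timeout=10,
    ).stdout
    return parse_myaddr(out)


print(parse_myaddr('"203.0.113.7"\n'))  # 203.0.113.7
```

Run `public_ip()` on (or from behind) each gateway and compare the result with the peer address configured on the other side.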

Ping Remote Gateway Internet IP Address, which will reveal whether the remote gateway is reachable and whether any packet loss is occurring. Keep the ping running continuously: packet loss can be intermittent, so it may take some time to observe it.
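When you stop a long-running ping, it prints a summary line. A small Python helper can turn that line into a loss percentage for your notes. It is a sketch that assumes the Linux (iputils) or BSD/macOS summary format (Windows formats its summary differently), and the sample line is illustrative.

```python
import re

# Matches e.g. "60 packets transmitted, 57 received, ..." (Linux) and
# "10 packets transmitted, 10 packets received, ..." (BSD/macOS).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(ping_output: str) -> float:
    """Return the packet loss percentage computed from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if not match:
        raise ValueError("no ping summary found")
    sent, received = map(int, match.groups())
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


sample = "60 packets transmitted, 57 received, 5% packet loss, time 59080ms"
print(packet_loss(sample))  # 5.0
```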

Check Phase 2 Associations for desynchronization. On firewall dashboards, this usually shows up as multiple SPI associations for the same tunnel. A healthy IPsec tunnel will have only one SPI association in each direction (and more only for the time it takes to rekey). Long-term, multiple associations for the same network pair are not normal.
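If your equipment can list its Phase 2 associations, this check can be automated. The Python sketch below assumes you've already turned that listing into records with a local network, remote network, direction, and SPI (the `SA` record and the sample SPIs are hypothetical), and it flags any network pair and direction with more than one SPI. A group that is still flagged on a second look, well after any rekey would have finished, is the long-term duplication that signals trouble.

```python
from collections import defaultdict
from typing import NamedTuple


class SA(NamedTuple):
    """One Phase 2 security association as listed on a dashboard."""
    local_net: str
    remote_net: str
    direction: str  # "in" or "out"
    spi: str


def duplicate_sas(sas: list[SA]) -> dict[tuple[str, str, str], list[str]]:
    """Return every (local, remote, direction) group that has more than one SPI."""
    groups: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for sa in sas:
        groups[(sa.local_net, sa.remote_net, sa.direction)].append(sa.spi)
    return {key: spis for key, spis in groups.items() if len(spis) > 1}


listing = [
    SA("10.1.0.0/16", "10.2.0.0/16", "in", "c1a2b3c4"),
    SA("10.1.0.0/16", "10.2.0.0/16", "out", "d5e6f7a8"),
    SA("10.1.0.0/16", "10.2.0.0/16", "in", "0badf00d"),  # a second inbound SA
]
print(duplicate_sas(listing))
# {('10.1.0.0/16', '10.2.0.0/16', 'in'): ['c1a2b3c4', '0badf00d']}
```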

Verify Traffic Over Phase 2 Tunnels by looking for byte or packet counters incrementing. IPsec will not generate traffic on its own: it needs traffic to be flowing over the tunnel for the traffic counters to increment. If you see traffic incrementing on one side but not the other (for example, a receive counter incrementing but not a transmit counter), then that’s a strong indication that one member of the IPsec association is desynchronized.
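The counter comparison can be written down as a tiny Python helper. It assumes you've noted one side's transmit and receive packet counters for a Phase 2 tunnel twice, a minute or so apart, while test traffic (such as a ping through the tunnel) is running; the names are hypothetical.

```python
from typing import NamedTuple


class Counters(NamedTuple):
    """Packet counters for one Phase 2 tunnel, as read from one gateway."""
    tx_packets: int
    rx_packets: int


def diagnose(before: Counters, after: Counters) -> str:
    """Classify the change between two counter snapshots."""
    sent = after.tx_packets - before.tx_packets
    received = after.rx_packets - before.rx_packets
    if sent and received:
        return "both directions incrementing"
    if sent:
        return "one-way: transmitting, nothing received"
    if received:
        return "one-way: receiving, nothing transmitted"
    return "idle: generate traffic through the tunnel and check again"


print(diagnose(Counters(1200, 1200), Counters(1260, 1200)))
# one-way: transmitting, nothing received
```

Either one-way result on a tunnel that should be carrying replies is the desynchronization hint described above.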

Ping Remote Address via Phase 2 Tunnel, and try multiple IP addresses. It’s possible that the issue is local to one system on the private network only. If you’re able to ping a system on the remote side, then the tunnel is functioning.

IPsec Misconceptions:

  • Ping a Gateway from Gateway
  • Tunnel Reset

Generally speaking, you cannot Ping a Gateway from Gateway. Traffic that a gateway originates itself normally takes the source address of one of its own interfaces (usually the outside one), which doesn’t fall within the private networks the Phase 2 agreement covers, so it isn’t sent through the tunnel. In addition, some IPsec implementations handle the tunnel in ‘user space’ instead of in ‘kernel space’: since the IPsec service is not part of the system’s core networking stack, the gateway can’t originate tunnel traffic from itself at all. Always test traffic through the tunnel, never from the endpoints.
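A simplified Python sketch shows why gateway-originated pings usually miss the tunnel. It assumes the Phase 2 agreement covers one local and one remote network and that the gateway's ping uses its outside address as the source; the addresses are documentation examples, not from any real setup.

```python
import ipaddress


def matches_selector(src: str, dst: str, local_net: str, remote_net: str) -> bool:
    """True if a packet from src to dst falls inside the Phase 2 network pair."""
    return (ipaddress.ip_address(src) in ipaddress.ip_network(local_net)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(remote_net))


LOCAL, REMOTE = "10.1.0.0/16", "10.2.0.0/16"
# Ping sourced from the gateway's outside address: not covered by the tunnel.
print(matches_selector("203.0.113.10", "10.2.0.5", LOCAL, REMOTE))  # False
# Ping from a workstation on the local private network: covered.
print(matches_selector("10.1.5.20", "10.2.0.5", LOCAL, REMOTE))     # True
```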

Doing a Tunnel Reset from one side rarely accomplishes much. For example, if one gateway is desynchronized, resetting the tunnel on the other won’t make the desynchronization go away. Some IPsec implementations don’t clear all of the SPI associations cleanly on a tunnel reset, in which case only a reboot of the equipment will ensure the old associations are cleared. And once traffic is passing over the tunnel again, the root cause still needs to be fixed; otherwise the problem will eventually recur.

IPsec Best Practices:

  • Perfect Forward Secrecy
  • Rekey Lifetimes
  • Time Synchronization

Perfect Forward Secrecy should be enabled. With PFS, each Phase 2 renegotiation includes a fresh Diffie-Hellman exchange instead of deriving its keys solely from the Phase 1 agreement, so a key exposed in one period can’t be used to decrypt traffic captured in another. PFS doesn’t change how often renegotiation happens, though; that’s set by the rekey lifetimes, and the more time that passes between renegotiations, the more time you’re allowing an IPsec tunnel to become desynchronized.

Rekey Lifetimes should be as short as possible, and the Phase 2 lifetime should be set to two-thirds of the Phase 1 lifetime. This keeps Phase 2 renegotiations from landing at the same time as Phase 1 renegotiations so often.
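The two-thirds rule can be checked with a little arithmetic. Under the simplifying assumption that both timers start together and every rekey fires exactly when its lifetime expires (many implementations actually rekey a little early and add random jitter), Phase 1 and Phase 2 rekeys coincide every lcm(T1, T2) seconds, where T1 and T2 are the two lifetimes. In this Python sketch, the 8-hour Phase 1 lifetime is just an example value.

```python
from math import lcm  # Python 3.9+


def coincidence_interval(phase1_s: int, phase2_s: int) -> int:
    """Seconds between moments when Phase 1 and Phase 2 rekeys land together."""
    return lcm(phase1_s, phase2_s)


PHASE1 = 8 * 3600  # example Phase 1 lifetime: 28800 s
for phase2 in (3600, PHASE1 * 2 // 3):  # a 1-hour Phase 2 vs. two-thirds of Phase 1
    hours = coincidence_interval(PHASE1, phase2) / 3600
    print(f"Phase 2 = {phase2} s: rekeys coincide every {hours:g} h")
# Phase 2 = 3600 s: rekeys coincide every 8 h
# Phase 2 = 19200 s: rekeys coincide every 16 h
```

With a 1-hour Phase 2, every Phase 1 rekey collides with a Phase 2 rekey; at two-thirds, only every second one does.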

Time Synchronization from a reliable time source or NTP server is important since it’s used to calculate rekey times. A clock on a device that drifts (because it has no time source, or an unreliable time source) can cause desynchronization issues. Some equipment ships with hard-coded time sources, so this can’t be helped, but where it’s possible to configure it, reliable NTP servers should be used.

If all else fails, before you contact your IPsec partner, have the information you recorded during the troubleshooting steps ready. Providing as much information as possible will help the partner troubleshoot the issue. Including additional information (such as the physical location of the equipment being used and what networks are being transmitted over the tunnel) will also help.

IPsec Issues Checklist:

  • Both Gateways’ Public Internet IP Addresses
  • Physical Location and Description of Equipment
  • Source and Destination Inside IPs
  • Steps Taken (reboot, tunnel reset, traffic observed, duplicate SPIs)
  • Information/Screenshots from Dashboard

IPsec is a powerful and flexible service, but like the messenger ravens of Game of Thrones, it performs best when given a little care and attention.


Disk Jockey

A company called Diskology makes a great product called the Disk Jockey (“DJ”). I personally own two of these (one attached to my server at home, and another attached to my workstation). This is a fantastic product, albeit with a few minor quirks that you should be aware of before using the device.


In its simplest form, the DJ operates like any run-of-the-mill hard drive dock. Even better, it can function as a two-disk dock. On the back are eSATA and USB connectors, although I tend to prefer eSATA for performance reasons.

When the Disk Jockey is not plugged in to a computer, it operates as a stand-alone device that can copy, wipe, and verify disks. For anyone who frequently needs to duplicate disks, wipe disks, or verify that two disks are identical, these standalone functions are invaluable and a great time saver.

However, the greatest power of the DJ comes from its drive combining options. For example, you can connect two disks to the DJ in what it calls a ‘mirror’ or ‘combine’ volume. When you select one of these options, the DJ then presents itself to your workstation as a logical volume. In the case of a ‘mirror’, it’s a bit like RAID1, and in ‘combine’, it’s a little like RAID0. However, it’s important to realize that the DJ’s ‘mirror’ and ‘combine’ modes are different from traditional RAID.

Mirror: In a traditional RAID1 mirror, the controller has a way to verify the consistency of the volume. That is to say, if you connect two drives in a RAID1, write some data, then remove one drive and replace it with another, the controller will notice that a drive is inconsistent and begin rebuilding it to match the other. The DJ’s ‘mirror’ mode works differently: any writes go to both drives, but reads come only from the disk connected to the ‘source’ side of the DJ.

This is an important distinction, because if you connect the wrong disk to the ‘destination’ side of the DJ, it won’t notice the mismatch and will blindly overwrite the data there.

We can test this by connecting two disks to the DJ in ‘mirror’ mode and writing null bytes (0x00) to the entire volume. Examining the disks individually will show that both disks are full of nothing but null bytes. Now, connect just one disk of the pair and overwrite the entire disk with 0x01 bytes. Then reconnect the pair, keeping the overwritten disk on the ‘destination’ side, and write 0xFF to the first 512 bytes of the volume.

Examining the disks individually will show that both drives indeed have 512 bytes worth of 0xFF at the top. But the first disk will have 0x00 for the remainder, while the second will have 0x01. There is no consistency checking on the DJ.
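The same experiment can be modelled in a few lines of Python. This is a toy model of the behaviour described above (writes go to both disks, reads come only from the source, nothing compares them), not Diskology's firmware; the class name and the 4 KiB "disks" are made up for illustration.

```python
class MirrorModel:
    """Toy model of the DJ's 'mirror' mode as described in this post."""

    def __init__(self, source: bytearray, destination: bytearray):
        if len(source) != len(destination):
            raise ValueError("disks must be the same size")
        self.source = source
        self.destination = destination

    def write(self, offset: int, data: bytes) -> None:
        # Every write goes to both disks...
        for disk in (self.source, self.destination):
            disk[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # ...but reads come only from the source disk, and nothing ever
        # compares the two disks.
        return bytes(self.source[offset:offset + length])


SIZE = 4096
source = bytearray(SIZE)                 # all 0x00, as left by the first mirrored write
destination = bytearray(b"\x01" * SIZE)  # overwritten with 0x01 on its own
dj = MirrorModel(source, destination)
dj.write(0, b"\xff" * 512)

print(source[:512] == destination[:512])  # True: both start with 0xFF
print(source[512], destination[512])      # 0 1: the rest is never reconciled
print(dj.read(512, 4))                    # b'\x00\x00\x00\x00': source side only
```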

Combine: This mode of the DJ operates like RAID0, except, again, without consistency checking. It’s easy to accidentally swap the two drives, and the DJ will happily create a stripe without noticing that they are connected backwards. This isn’t as immediately destructive as the ‘mirror’ scenario, but it can be if the user goes on to write to the volume.

So long as you’re aware of these quirks, the DJ is an excellent device of superb quality. The mirrored mode is especially useful as part of an on-site/off-site backup strategy. Its standalone functions are great time savers. The eSATA connectivity ensures fast transfer speeds, too. This device is well worth the money.

Backups

Everyone knows that they should take backups of their digital media. It also seems that everyone knows that everyone else rarely does so. As human beings, we tend to get a little sloppy about things that aren’t strictly necessary or of an immediate need.

Jamie Zawinski has a pretty good article about backups here, and you should read it.

Of course, everyone’s situation is different. I have a large RAID array (24TB), which meant deploying a single external disk for a backup wasn’t possible. I also tend to be a little extra paranoid about my data, so I had the following requirements:

  • Physically Redundant Storage: A copy of the backup must reside in two physical locations, so that if one burns to the ground, all of the data is safe at another.
  • Intensive Integrity Checking: It’s not good enough to just let a backup disk sit spinning and then write the changes to it. There must be a way to frequently check all of the data on the backup disk to ensure that it’s still a good backup when the time comes.
  • Ease of Use: An automated process that begins backups without supervision, and then reports backup success or failure afterward.

Problem #1: Physically Redundant Storage

A company called Diskology makes a great product called the Disk Jockey (“DJ”). The DJ allows you to connect two SATA disks to it to make quick on-the-fly disk mirrors and stripes, and it also serves as a basic SATA disk dock. The version I picked up has USB and eSATA connectors. In the case of backups, I connect two disks of equal size to the DJ, select “mirror” mode, and then the DJ appears to the OS as a single disk. (For example, if there are two 2TB disks connected to the DJ, it shows up as one 2TB disk to the OS in “mirror” mode.)

Whatever writes you make to the DJ in mirrored mode will be written to both disks. Whatever reads you do from the DJ will be read from only one of them. This has some interesting implications that you should be aware of, and I talk about them in greater detail here.

The result of all of this is that I keep one disk off-site at work. I bring home one side of the disk mirror from work every day, then attach it to the DJ along with the other side of the mirror I keep at home. When going to work in the morning, I do the opposite.

Problem #2: Intensive Integrity Checking

The problem with most “set and forget” backup regimes is that you might need some obscure piece of data from the disk down the road, only to find that the section of the disk where that data lives has long since gone bad. You don’t know that it’s gone bad because you’ve never tried to read it (in the case of data that rarely changes). The solution to this is to always read the entirety of your backup disk during every backup cycle, and then report failures immediately.

The default behaviour of rsync is to simply check the modification time and file size, and if there’s a match, it doesn’t read the file on the backup disk at all. Many other backup solutions operate in a similar fashion.

I chose to solve this problem by using rsync’s --checksum (-c) option. This forces rsync to read each and every file on both sides of the backup to compare whether it should be replaced on the backup disk or not. The downside is that this is very slow, so in my case, a backup run will typically take 12 hours or longer.
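The difference between the two checks can be demonstrated with a Python sketch. `quick_check_equal` approximates rsync's default size-and-modification-time test, and `checksum_equal` reads both files in full the way a checksum comparison must; SHA-256 is a stand-in, not the hash rsync actually uses, and the function names are hypothetical. The demo corrupts one byte of a "backup" copy while keeping its size and timestamp.

```python
import hashlib
import os
import tempfile


def quick_check_equal(a: str, b: str) -> bool:
    """Approximate rsync's default: same size and same mtime (whole seconds here)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def file_digest(path: str) -> str:
    """Hash a file's full contents, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_equal(a: str, b: str) -> bool:
    """Compare sizes, then read every byte of both files."""
    return os.path.getsize(a) == os.path.getsize(b) and file_digest(a) == file_digest(b)


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        src, bak = os.path.join(d, "src"), os.path.join(d, "bak")
        for path in (src, bak):
            with open(path, "wb") as f:
                f.write(b"family photos" * 1000)
        with open(bak, "r+b") as f:  # silently corrupt one byte of the backup
            f.seek(100)
            f.write(b"X")
        for path in (src, bak):      # give both files the same timestamp
            os.utime(path, (1_400_000_000, 1_400_000_000))
        return quick_check_equal(src, bak), checksum_equal(src, bak)


print(demo())  # (True, False): only the full read notices the damage
```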

An alternative to this would be to simply blow away the backup volume, and then do a complete backup on every backup run. There’s a big problem with this approach, though: if something happens during the backup run, you have an incomplete backup. The checksumming method ensures that the data on the backup volume is never erased ahead of time.

Problem #3: Ease of Use

So the backup procedure I have is now very simple:

  • After work, I attach the drives to the DJ,
  • A backup script runs automatically overnight,
  • I detach the drives from the DJ, keep one at home, and bring one in to work.

The script does all of the heavy lifting. It splits my array into easily manageable chunks. The backup script has a --info flag that allows me to quickly see the status of all of my backups:

   Last Backup                  Used Free UUID
M  Tue Sep  2 20:00:04 PDT 2014 1.3T  66G 18e61a61-6502-6510-8086-0065d1917f97
S  Wed Sep  3 20:00:05 PDT 2014 1.4T 487G 0cfdeff4-6502-6510-8086-145408f4e658
Tb Sun Sep  7 20:00:06 PDT 2014 2.4T 374G c54d93c9-6502-6510-8086-7395b84d22d7
Z  Mon Sep  8 18:00:04 PDT 2014 627G 291G 57df37aa-6502-6510-8086-a9ac378f85d5
Ta Tue Sep  9 18:00:04 PDT 2014 2.2T 519G 45ff4122-6502-6510-8086-cf0a1f0ea6d7
Y  Tue Sep 16 18:00:14 PDT 2014 1.6T 295G 385afa54-6502-6510-8086-82120fc9d546
G  Wed Sep 17 18:00:03 PDT 2014 1.6T 264G 44d010a8-6502-6510-8086-f1032299ef49
A  Mon Sep 22 18:00:04 PDT 2014 1.5T 393G 08ece419-6502-6510-8086-59d4cb0e617c

The backup runs every day at 6:00pm (formerly 8:00pm; I had to move it earlier because the backups were running too long to finish before I left for work in the morning). Either way, I find that an hour between quitting time and backup start time is enough to connect the drives to the DJ. If I miss a backup day, it’s not a big deal: as you can see in this example, the oldest backup is about three weeks old.
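The status display above is easy to post-process. This Python sketch is hypothetical (the backup script itself isn't shown). It parses the listing as printed, ignores the time zone since every entry is PDT, and reports the stalest set and how far it lags the newest.

```python
from datetime import datetime

INFO = """\
   Last Backup                  Used Free UUID
M  Tue Sep  2 20:00:04 PDT 2014 1.3T  66G 18e61a61-6502-6510-8086-0065d1917f97
S  Wed Sep  3 20:00:05 PDT 2014 1.4T 487G 0cfdeff4-6502-6510-8086-145408f4e658
Tb Sun Sep  7 20:00:06 PDT 2014 2.4T 374G c54d93c9-6502-6510-8086-7395b84d22d7
Z  Mon Sep  8 18:00:04 PDT 2014 627G 291G 57df37aa-6502-6510-8086-a9ac378f85d5
Ta Tue Sep  9 18:00:04 PDT 2014 2.2T 519G 45ff4122-6502-6510-8086-cf0a1f0ea6d7
Y  Tue Sep 16 18:00:14 PDT 2014 1.6T 295G 385afa54-6502-6510-8086-82120fc9d546
G  Wed Sep 17 18:00:03 PDT 2014 1.6T 264G 44d010a8-6502-6510-8086-f1032299ef49
A  Mon Sep 22 18:00:04 PDT 2014 1.5T 393G 08ece419-6502-6510-8086-59d4cb0e617c
"""


def parse_info(text: str) -> dict[str, datetime]:
    """Map each backup set's label to its last backup time (time zone dropped)."""
    last = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 10:
            continue  # header or blank line
        label, weekday, month, day, clock, _tz, year = parts[:7]
        last[label] = datetime.strptime(
            f"{weekday} {month} {day} {clock} {year}", "%a %b %d %H:%M:%S %Y")
    return last


last = parse_info(INFO)
oldest = min(last, key=last.get)
print(oldest, max(last.values()) - last[oldest])  # M 19 days, 22:00:00
```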

Each letter to the far left represents a logical collection of files on the array. The “T” series is so large that it needs to span two 3TB disk pairs. Each disk contains a simple ext4 volume so that if the worst happens, it’s as simple as mounting it on virtually any Linux rescue boot image, and doing a single “rsync” to get the contents back.

If something goes wrong with a backup, it will be flagged in the status display.

The downside to all of this is that it’s possible for data to go a long time without a backup (in this example, eight pairs of backup disks means it takes at least eight working days, roughly two weeks, to cycle back around to the first disk pair). That’s a risk I’m willing to take.

Ultimately it’s up to you how you craft your backup solution, but they should generally all fit the same mold: redundant, stable, and easy to use.

Cassette Tape Preservation

A few months ago, my grandfather passed away. He was 88 years old, and had lived a long, happy, and fulfilling life – there were no regrets or sour feelings about his passing.

While his belongings were being sorted through, they happened upon this:

[Photo: the cassette tape found among his belongings]

I have no doubt that this kind of thing happens all the time when someone passes away. I can’t imagine the number of unknown or blank CDs, VHS tapes, cassette tapes, and all kinds of other media that must be discarded as garbage. Who knows what they contain? At some point, it was probably important to the person who kept it.

So, I decided to preserve this tape and listen to what was on it. The first step was to dig through my box of USB miscellanea and revive some old hardware:

[Photo: the Ion Tape Express]

This is an Ion Tape Express. It’s more or less the size of a Walkman, but connects to your computer via USB. There’s a C-Media analog-to-digital chip within the enclosure, which helps to minimize how far the analog signal must travel before it’s converted to digital. You can pick one of these up at Radio Shack for about $60, and they’re fully Linux compatible.

I have no doubts that a better analog-to-digital conversion could be done with both a high-end tape deck and analog-to-digital converter. But for household amateurs such as myself, the Ion Tape Express is a good intersection of price and space (after all, how many tapes do you convert in a year?) I have no interest in taking up a lot of space with high end audio gear that won’t get used all that often.

The conversion is as simple as pressing “play” on the Ion and then hitting record in your favourite audio editing/recording software. In my case, I decided to use Audacity.


In the end, only the first 30 minutes of the first side of the tape had content. I recorded all 90 minutes of the tape and preserved it as a 44.1 kHz .wav file. Ultimately, the tape contained nothing of real value, but disk storage is so cheap and dense that it doesn’t matter: I’ve now digitally preserved something of my grandfather’s that should last for all time, so long as it’s stored and backed up correctly by those who come after me.
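For a sense of scale, here is the storage arithmetic as a Python sketch, assuming standard 44.1 kHz, 16-bit stereo PCM (the bit depth and channel count are assumptions; the post doesn't state them). The result excludes the few dozen bytes of WAV header.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44_100, channels: int = 2,
              bits_per_sample: int = 16) -> int:
    """Bytes of uncompressed PCM audio data (WAV header not included)."""
    return seconds * sample_rate * channels * bits_per_sample // 8


full_tape = pcm_bytes(90 * 60)
print(f"{full_tape:,} bytes, about {full_tape / 1e9:.2f} GB")
# 952,560,000 bytes, about 0.95 GB
```

Under a gigabyte for the whole 90-minute tape, uncompressed.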