Let’s Talk Tech: How the Internet Works Part 1: The IPv4 Packet
Welcome to the first episode of our new Let’s Talk Tech series: How the Internet Works! Today, our big boss, John Brewer (AKA Doctor Deep Core), will take you on a romp through the wonderland that is IP.
Full transcript is below the video. Enjoy!
Introduction to IPv4
Hello, and welcome to Let’s Talk Tech with Deep Core Data. I’m John Brewer, the owner and founder here at Deep Core, where we’re using education to help businesses make the right tech choices. In this series we’re going to be talking, at a technical level, about how the Internet works.
The Internet is a massive, sprawling network; an unseen machine that carries information across all borders, and allows people who may never even see each other to work together. Nobody directly controls or regulates this virtual machine, so how is it that all of these devices, from smartphones to servers, built by different people with different technologies in different countries, all play nice together?
That’s what we’ll be talking about in this series, “How the Internet Works.”
And to begin with, we’ll be talking about one of the most fundamental pieces of the Internet, the aptly named Internet Protocol, or as it’s more commonly known, IP.
The Internet Protocol is our starting point in this series because it is the baseline that virtually all Internet activity runs on. If you think of something as “the Internet”, it almost certainly runs on IP.
There are currently two major versions of the Internet Protocol, IPv4 and IPv6. We’re going to be talking about IPv4 this episode, since most traffic is still routed with it, and we’ll talk about the newer IPv6 later.
So what does IP do?
At its simplest, IP takes a message from one device and delivers it to another device. It’s like a post office delivering a letter from one address to another. All the massive data centers around the world are basically just delivering these little letters, called packets, from one device to another.
In order to deliver a packet, we need to know what computer it’s going to, like having an address on a letter. Computers on the Internet all have addresses called IP addresses. When a computer wants to send a message to a different computer, it creates an IP packet (the letter, in our analogy), places the address of the destination computer on it, and sends it off into the Internet to be delivered.
The process of delivering is called routing…or rooting, if you live in the UK.
So what does the content of an IP packet actually look like?
Like many low-level protocols, the IP packet has a header and a body. You can think of the header as a form with 14 fields across the top: 13 that every packet has, plus an optional options field.
The first field is the version number. We’re talking about IP Version 4, so that number is a 4. The field is four bits wide, so in future versions of the protocol it could go as high as 15.
The second field is the Internet Header Length (IHL) field, which gives the length of the entire header in 32-bit words: essentially, the length of the header in bytes divided by 4. Its smallest legal value is 5, meaning a 20-byte header with no options.
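To make that arithmetic concrete, here’s a minimal Python sketch. It assumes the raw header is available as a bytes object, and the function name header_length_bytes is our own, not part of any library. The version number sits in the high four bits of the first byte and the IHL in the low four bits, so the header length in bytes is simply the IHL times 4.

```python
def header_length_bytes(header: bytes) -> int:
    """Return the IPv4 header length in bytes, read from the IHL field."""
    version = header[0] >> 4          # high nibble of byte 0: version
    ihl = header[0] & 0x0F            # low nibble: header length in 32-bit words
    if version != 4:
        raise ValueError(f"not an IPv4 header (version {version})")
    if ihl < 5:
        raise ValueError(f"IHL {ihl} is below the 5-word minimum")
    return ihl * 4                    # 32-bit words -> bytes


print(header_length_bytes(bytes([0x45])))  # 5 words  -> 20
print(header_length_bytes(bytes([0x4F])))  # 15 words -> 60
```

Because the IHL is only four bits, the header can never be longer than 15 × 4 = 60 bytes.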
As an aside, we’ll frequently be using the word byte to mean eight bits of data because that’s the term most people are familiar with. However, in many technical documents and specifications, you will find the term octet.
The third field is the Differentiated Services Code Point, or DSCP. This field is used to define different classes of service; for instance, real-time traffic like a Skype call might want to get routed to its destination as quickly as possible, whereas email can afford to pass through the network a bit more slowly. All of the IP header fields have significant implications, but DSCP might be one of the most interesting…and a topic for a future discussion.
The next field is the Explicit Congestion Notification (or ECN). When IP was first developed, a router that was overwhelmed would sometimes simply discard a packet en route, and that packet would be lost altogether. This was called dropping the packet, and was (and still is) considered a Bad Thing. So to combat this Bad Thing, we have ECN.
The ECN field is filled in by the routers, but only on packets whose sender has marked them as ECN-capable by setting the field to 1 or 2. If a router receives such a packet while it is close to having to drop packets, it puts a 3 (“congestion experienced”) in the ECN field instead of dropping it. When the packet is delivered to its destination, the destination sends a message back to the sender, telling it to slow down because the network is becoming congested and is in danger of dropping packets. All of this is an attempt not to lose data while in transit.
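DSCP and ECN share the header’s second byte: DSCP takes the upper six bits and ECN the lower two. The Python sketch below splits that byte and models the marking decision described above. The function names, and the simple true/false congested input, are hypothetical simplifications; real routers decide when to mark based on their own queue management. The example value uses DSCP 46 (Expedited Forwarding), a class commonly used for voice traffic.

```python
from __future__ import annotations

# ECN codepoints: the low two bits of the header's second byte.
NOT_ECT = 0b00   # sender does not support ECN
ECT_1 = 0b01     # ECN-capable transport
ECT_0 = 0b10     # ECN-capable transport
CE = 0b11        # "congestion experienced", the 3 mentioned above


def split_tos_byte(tos: int) -> tuple[int, int]:
    """Split the header's second byte into (DSCP, ECN)."""
    return tos >> 2, tos & 0b11


def mark_if_congested(tos: int, congested: bool) -> int | None:
    """Hypothetical router step: return the byte to forward, or None to drop."""
    dscp, ecn = split_tos_byte(tos)
    if not congested:
        return tos                   # nothing to do
    if ecn in (ECT_0, ECT_1):
        return (dscp << 2) | CE      # mark the packet instead of dropping it
    if ecn == CE:
        return tos                   # already marked by an earlier router
    return None                      # not ECN-capable: fall back to dropping


voice = (46 << 2) | ECT_0            # DSCP 46 with ECN-capable transport
print(split_tos_byte(voice))         # (46, 2)
print(mark_if_congested(voice, True) == (46 << 2) | CE)  # True
```

Note that the DSCP bits pass through untouched; only the two ECN bits change.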
Following the ECN field, we have the Total Length. This is the length of the packet in bytes, including the header, so it’s always at least 20 bytes, and it can be as much as sixty-five thousand, five hundred and thirty-five (65,535) bytes, the largest number its 16 bits can hold. The official specification says that computers on the network are only required to handle packets up to 576 bytes, but almost all systems handle much larger packets. Larger packets can be sent through networks with smaller limits by using a process called fragmentation.
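Since the Total Length counts the header too, the amount of actual data is the Total Length minus the header length. Here’s a small Python sketch of that subtraction, assuming a raw header in a bytes object. The Total Length occupies bytes 2 and 3 of the header in network (big-endian) byte order, which struct’s "!H" format reads. The function name payload_length is our own.

```python
import struct


def payload_length(header: bytes) -> int:
    """Bytes of data after the header, according to the Total Length field."""
    header_len = (header[0] & 0x0F) * 4                   # IHL in bytes
    (total_length,) = struct.unpack("!H", header[2:4])    # bytes 2-3, big-endian
    if total_length < header_len:
        raise ValueError("Total Length is smaller than the header itself")
    return total_length - header_len


print(payload_length(bytes([0x45, 0x00, 0x00, 0x73])))  # 115 - 20 = 95
```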
Speaking of fragmentation, there’s a whole set of fields in the IP header that deal just with fragmentation. We’ll be talking about those fields now, but we’ll save the details of fragmentation for a future episode.
The first of these fields is the Identification number, a 16-bit integer that groups packets together. If several packets share the same identification number (and come from the same sender to the same destination), they are all fragments of the same original packet.
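In practice, the original IPv4 specification (RFC 791) has the receiver match fragments on the identification number together with the source address, destination address, and protocol, since two different senders can happen to pick the same number. Here’s a minimal Python sketch of that grouping step. The Fragment type and group_fragments function are hypothetical, and the sketch assumes each packet has already been parsed; it doesn’t attempt the reassembly itself.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical, already-parsed view of one received packet."""
    src: str
    dst: str
    protocol: int
    identification: int
    payload: bytes


def group_fragments(fragments: list[Fragment]) -> dict[tuple, list[Fragment]]:
    """Bucket fragments that belong to the same original packet."""
    groups: dict[tuple, list[Fragment]] = defaultdict(list)
    for frag in fragments:
        # The ID alone is not enough: two senders can choose the same number.
        key = (frag.src, frag.dst, frag.protocol, frag.identification)
        groups[key].append(frag)
    return dict(groups)
```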
After that, each IP header has 3 “flags”, or yes-no boxes on the packet header.
The first flag is easy; it’s reserved for future expansions to IPv4, so it’s always set to false. The second flag is the “Don’t Fragment” flag. It tells the routers that if they can’t handle a packet this large, they shouldn’t bother fragmenting it; they should just drop it on the floor. The third flag is the “More Fragments” flag. It’s set to yes on every fragment except the last one, telling the destination to be ready for other fragments of the original packet so that it can reconstitute them.
The next field is the Fragment Offset. If this is an unfragmented packet, or the first fragment of a fragmented packet, this number is zero. If not, it says where in the original packet’s data this fragment starts, so that it can be reassembled correctly later. The offset is counted in units of 8 bytes, so the actual byte position is the field’s value times 8.
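The three flags and the Fragment Offset are packed together into one 16-bit word at bytes 6 and 7 of the header: the flags take the top three bits, in the order reserved, Don’t Fragment, More Fragments (the third flag’s official name), and the offset takes the remaining 13 bits. Here’s a minimal Python sketch that unpacks them; the function name and the dictionary keys are our own choices.

```python
import struct


def parse_flags_and_offset(header: bytes) -> dict:
    """Decode the 16-bit flags/fragment-offset word at header bytes 6-7."""
    (word,) = struct.unpack("!H", header[6:8])     # big-endian 16-bit word
    return {
        "reserved": bool(word & 0x8000),           # top bit: always 0
        "dont_fragment": bool(word & 0x4000),      # DF
        "more_fragments": bool(word & 0x2000),     # MF
        "offset_bytes": (word & 0x1FFF) * 8,       # 13-bit offset in 8-byte units
    }


# A middle fragment: MF set, offset field 185 -> starts at byte 1480.
print(parse_flags_and_offset(bytes(6) + b"\x20\xb9"))
```

The 8-byte unit is why every fragment except the last must carry a multiple of 8 bytes of data: 13 bits could only address 8 KB otherwise, but multiplied by 8 they cover the full 65,535-byte range.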
And that’s the end of the fragmentation section of the header.
It’s time to move on to the next set of fields, beginning with the Time to Live, or TTL. Originally, this was intended to refer to an actual amount of time, but in practice, this counts how many times a packet is handed from router to router on its journey from source to destination. Each time a packet is forwarded, the router reduces the TTL field by 1.
If the TTL field reaches zero, the router drops the packet and sends a message (an ICMP “Time Exceeded” message) back to the source, telling it that its packet was dropped. This system prevents packets that may have gotten caught in an infinite loop from accumulating and bogging down the network.
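To see why this stops a routing loop, here’s a toy Python simulation of the rule above. The function names are hypothetical, and a real router would also recompute the header checksum and send the ICMP message at the point where this sketch returns None.

```python
from __future__ import annotations


def next_hop_ttl(ttl: int) -> int | None:
    """Hypothetical router step: the TTL to forward with, or None to drop."""
    new_ttl = ttl - 1
    if new_ttl <= 0:
        # A real router would send an ICMP Time Exceeded message here.
        return None
    return new_ttl


def hops_until_dropped(initial_ttl: int) -> int:
    """Count how many times a looping packet is forwarded before it dies."""
    ttl = initial_ttl
    hops = 0
    while True:
        ttl = next_hop_ttl(ttl)
        if ttl is None:
            return hops
        hops += 1


print(hops_until_dropped(64))  # 63
```

However the loop is shaped, the packet is forwarded a bounded number of times, so it can’t circulate forever.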
Next up, we have the Protocol field, which says what protocol the packet’s data is using. There are well over a hundred different protocols this could be, but in practice most packets carry TCP (the Transmission Control Protocol), UDP (the User Datagram Protocol), or ICMP (the Internet Control Message Protocol).
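The Protocol field is a single byte at position 9 of the header, holding a number from a registry maintained by IANA. The three protocols named above are 1 (ICMP), 6 (TCP), and 17 (UDP). Here’s a small Python lookup sketch; the table deliberately lists only those three, and the function name is our own.

```python
# A few well-known values of the Protocol field (header byte 9).
# The full registry is maintained by IANA; only three are listed here.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def protocol_name(header: bytes) -> str:
    """Name the protocol carried in this packet's data, if it's a common one."""
    number = header[9]
    return PROTOCOL_NAMES.get(number, f"other ({number})")


print(protocol_name(bytes(9) + bytes([6])))   # TCP
```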
So, why do I need another protocol running on top of IP?
Well, IP deals primarily with individual packets, whereas something like TCP handles connections between computers. TCP also does error checking, retransmits lost data, and makes sure that everything arrives in the correct order.
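The checksum for the header is, roughly speaking, all the bytes in the header added together into one number. More precisely, it’s the 16-bit one’s complement of the one’s-complement sum of the header’s 16-bit words, and it covers only the header, not the data. When a router receives the packet, it adds up the header the same way, and if it gets the expected answer, it knows the header made it through all right. If it doesn’t, it knows that at least one bit got flipped in transfer, and it drops the packet. Because the TTL changes at every hop, each router also has to recompute the checksum before forwarding the packet.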
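Here’s a minimal Python sketch of the IPv4 header checksum. The sender computes it with the checksum field (bytes 10 and 11) set to zero; a receiver that runs the same sum over the header with the checksum filled in should get zero. The sample header is just an illustration: a UDP packet from 192.168.0.1 to 192.168.0.199 with a Total Length of 115 bytes.

```python
def ipv4_checksum(header: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        raise ValueError("IPv4 headers are a whole number of 16-bit words")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # big-endian 16-bit word
    while total > 0xFFFF:                           # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sender: compute with the checksum field (bytes 10-11) zeroed, then fill it in.
hdr = bytearray.fromhex("450000730000400040110000c0a80001c0a800c7")
csum = ipv4_checksum(hdr)
hdr[10:12] = csum.to_bytes(2, "big")
print(hex(csum))           # 0xb861
# Receiver: summing the header including the checksum gives 0 if it's intact.
print(ipv4_checksum(hdr))  # 0
```

Folding the carries back into the low 16 bits is what makes this a one’s-complement sum rather than ordinary addition.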
Finally, we’ve got the IP addresses themselves. An IPv4 address is composed of 4 numbers, each from 0 to 255, usually written as number dot number dot number dot number. For instance, a home router might be 192.168.1.1, and Google’s public DNS server is 8.8.8.8; public websites like facebook.com have addresses of the same form.
Each address is stored in the IP packet as those same 4 numbers, one byte apiece, for 32 bits in all: first the source address, then the destination address.
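Here’s a short Python sketch of that storage format. The source address occupies header bytes 12 to 15 and the destination bytes 16 to 19. The helper names are our own; for reading addresses back, the sketch leans on the standard library’s ipaddress.IPv4Address, which accepts a 4-byte bytes object.

```python
from __future__ import annotations

import ipaddress


def dotted_quad_to_bytes(address: str) -> bytes:
    """Turn 'a.b.c.d' into the four bytes stored in the header."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {address!r}")
    numbers = [int(p) for p in parts]
    if any(not 0 <= n <= 255 for n in numbers):
        raise ValueError(f"each number must fit in one byte: {address!r}")
    return bytes(numbers)


def addresses(header: bytes) -> tuple[str, str]:
    """Read (source, destination) from header bytes 12-15 and 16-19."""
    src = ipaddress.IPv4Address(header[12:16])
    dst = ipaddress.IPv4Address(header[16:20])
    return str(src), str(dst)


print(dotted_quad_to_bytes("192.168.1.1").hex())  # c0a80101
```

In real code, ipaddress.IPv4Address("192.168.1.1").packed does the same conversion with full validation.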
After the addresses, there may be a few option fields. We can tell from the Internet Header Length field: if it is more than 5, the header is longer than 20 bytes, and the extra space holds options attached to this packet. Options are relatively rare, so we’re not going to go into them in depth here.
Last, but definitely most important, is the whole point of the IP packet: the data. IP doesn’t care what the data is; it’s just a set of bytes. As long as the header plus the data adds up to the Total Length given in the packet’s header, it just keeps on moving right down the road.
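Putting the whole form together, here’s a minimal parser sketch using Python’s standard struct, dataclasses, and ipaddress modules. The IPv4Packet class and parse_ipv4 function are our own illustrative names, and the sketch deliberately skips checksum verification and option decoding. It shows that the data is simply whatever lies between the end of the header (IHL times 4 bytes) and the Total Length.

```python
import ipaddress
import struct
from dataclasses import dataclass


@dataclass
class IPv4Packet:
    version: int
    ihl: int
    dscp: int
    ecn: int
    total_length: int
    identification: int
    flags: int                  # 3 bits: reserved, DF, MF
    fragment_offset_bytes: int
    ttl: int
    protocol: int
    checksum: int               # read, but not verified here
    source: str
    destination: str
    options: bytes
    data: bytes


def parse_ipv4(packet: bytes) -> IPv4Packet:
    """Decode one IPv4 packet into its header fields, options, and data."""
    if len(packet) < 20:
        raise ValueError("shorter than the 20-byte minimum header")
    (ver_ihl, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F
    header_len = ihl * 4
    if ihl < 5 or total_length < header_len or len(packet) < total_length:
        raise ValueError("inconsistent length fields")
    return IPv4Packet(
        version=ver_ihl >> 4,
        ihl=ihl,
        dscp=tos >> 2,
        ecn=tos & 0b11,
        total_length=total_length,
        identification=identification,
        flags=flags_frag >> 13,
        fragment_offset_bytes=(flags_frag & 0x1FFF) * 8,
        ttl=ttl,
        protocol=protocol,
        checksum=checksum,
        source=str(ipaddress.IPv4Address(src)),
        destination=str(ipaddress.IPv4Address(dst)),
        options=packet[20:header_len],
        data=packet[header_len:total_length],  # ignore any trailing padding
    )
```

For example, parsing the 20-byte UDP header 45000073000040004011b861c0a80001c0a800c7 followed by 95 bytes of data yields a source of 192.168.0.1, a TTL of 64, protocol 17, and exactly 95 bytes in data.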
That’s it. This little form carries the weight of the information age with it.
Thanks for watching our first episode of “How the Internet Works.” If you liked what you saw, please subscribe to our channel. If you have questions, suggestions, or ideas for future videos, please leave a comment below.
Once again, I’m John Brewer for Deep Core Data. Thanks for watching.