Demystifying Internet Infrastructure: A Comprehensive Guide for Software Engineers

What happens when you type google.com in your browser and press enter

This is a classic interview question. A good answer can make employers think highly of you because it shows you understand the systems you work with. The answer varies depending on what area of tech you are focused on. If you are a frontend engineer, you would talk more about DOM rendering. If you are an SRE, you would talk more about how load balancing works.

This article covers how a software engineer can answer this question. If this is your first time hearing about DNS, this is going to be a great ride.

DNS Lookup

First, what is DNS? The Domain Name System (DNS) is a large distributed phonebook that maps domain names to IP addresses. Humans access information online through domain names, such as nytimes.com or espn.com. DNS translates those domain names to IP addresses so browsers can load Internet resources.

When you enter google.com in a browser (take Chrome as an example), the browser first checks its own DNS cache. If there is no entry, it asks the operating system's DNS service to resolve the domain name to an IP address. If the DNS service cannot find the name in its cache or in the hosts file, it sends a query to the DNS server configured in the network settings.
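The lookup order above can be sketched in a few lines of Python. This is only an illustration, not how Chrome or any operating system is actually implemented: the function name `resolve`, the dictionaries standing in for the caches and the hosts file, and the fake DNS server are all assumptions made for the example, and the exact order in which a real OS consults its cache and hosts file can differ.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve(
    name: str,
    browser_cache: Dict[str, str],
    hosts_file: Dict[str, str],
    os_cache: Dict[str, str],
    query_dns_server: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (ip, source), trying the cheapest lookup first."""
    local_sources = (
        ("browser cache", browser_cache),
        ("hosts file", hosts_file),
        ("OS cache", os_cache),
    )
    for source, table in local_sources:
        if name in table:
            return table[name], source

    # Nothing local: ask the DNS server configured for the network.
    ip = query_dns_server(name)
    if ip is None:
        return None, "not found"

    # Remember the answer so the next lookup is served locally.
    os_cache[name] = ip
    browser_cache[name] = ip
    return ip, "DNS server"


# Hypothetical data: 192.0.2.0/24 is an address range reserved for documentation.
fake_dns_server = {"example.com": "192.0.2.10"}.get
browser_cache: Dict[str, str] = {}
os_cache: Dict[str, str] = {}
hosts = {"localhost": "127.0.0.1"}

first = resolve("example.com", browser_cache, hosts, os_cache, fake_dns_server)
second = resolve("example.com", browser_cache, hosts, os_cache, fake_dns_server)
print(first)   # answered by the (fake) DNS server
print(second)  # answered by the browser cache
```

In real Python code you would not write this yourself: `socket.getaddrinfo("example.com", 443)` hands the name to the operating system's resolver, which applies its own version of these steps.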

TCP/IP

Once the browser has the IP address of the domain name, it combines that IP with a port number, which defaults to 80 for an HTTP connection and 443 for an HTTPS connection, and performs a TCP handshake with the web server. This creates a connection between the browser and the web server that carries the HTTP requests that follow. For HTTPS, a TLS handshake then runs on top of that TCP connection.
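As a small illustration of the default ports, the sketch below works out which host and port a client would connect to for a given URL. The helper name `connection_target` and the `DEFAULT_PORTS` table are made up for this example; `urlsplit` is part of Python's standard library.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_target(url: str) -> Tuple[str, int]:
    """Return the (host, port) pair a client would open a TCP connection to."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # An explicit port in the URL wins over the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


print(connection_target("http://google.com"))
print(connection_target("https://google.com"))
print(connection_target("https://google.com:8443/search"))
```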

To make this connection, the browser uses the OS network stack to craft IP packets. Each packet carries the connection message together with the source and destination addresses, which determine how the packet gets to the server and how the server's response gets back to the browser.
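To make the addressing concrete, here is a minimal sketch that opens a TCP connection to a tiny echo server on the local machine. It is not what a browser does internally; the `echo_once` helper and the `seen` dictionary are assumptions for the example. The point it shows is that the server sees the client's source IP and port, which is what lets the reply find its way back.

```python
import socket
import threading

seen = {}  # filled in by the server thread


def echo_once(listener: socket.socket) -> None:
    """Accept a single connection, record who connected, and echo the data."""
    conn, peer = listener.accept()
    with conn:
        seen["peer"] = peer  # (source IP, source port) of the client
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    server_ip, server_port = listener.getsockname()

    worker = threading.Thread(target=echo_once, args=(listener,))
    worker.start()

    # create_connection performs the TCP handshake with the listener.
    with socket.create_connection((server_ip, server_port), timeout=5) as client:
        client_ip, client_port = client.getsockname()
        client.sendall(b"hello")
        reply = client.recv(1024)

    worker.join()

print("client address:", (client_ip, client_port))
print("server saw    :", seen["peer"])
print("reply         :", reply)
```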

Firewall

A firewall is a network security system, implemented in software or hardware, that monitors incoming and outgoing network traffic and decides whether to allow or block specific traffic based on a defined set of security rules.

The network packet sent from the browser needs to pass the server's firewall check before it can reach the web server for a connection handshake. If the packet passes the firewall inspection, the traffic is allowed through; otherwise it is blocked.

Firewalls typically work at the network and transport layers (Layers 3 and 4). However, some are also capable of working as high as the application layer, Layer 7.
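The rule check described above can be sketched as a first-match rule list with a default action. This is a simplified model working on network and transport layer fields only (source IP and destination port), not the behavior of any particular firewall product. The `Rule` class, the `decide` function, and the rule set are all hypothetical, and the IP ranges used are reserved documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "block"
    source: str                 # source network in CIDR notation
    port: Optional[int] = None  # destination port; None matches any port


def decide(rules: List[Rule], src_ip: str, dst_port: int, default: str = "block") -> str:
    """Return the action of the first matching rule, or the default action."""
    for rule in rules:
        ip_matches = ip_address(src_ip) in ip_network(rule.source)
        port_matches = rule.port is None or rule.port == dst_port
        if ip_matches and port_matches:
            return rule.action
    return default


# Hypothetical rule set: block one network entirely, allow web traffic from anyone else.
RULES = [
    Rule("block", "203.0.113.0/24"),
    Rule("allow", "0.0.0.0/0", 443),
    Rule("allow", "0.0.0.0/0", 80),
]

print(decide(RULES, "198.51.100.7", 443))  # web traffic from an ordinary client
print(decide(RULES, "203.0.113.9", 443))   # client from the blocked network
print(decide(RULES, "198.51.100.7", 22))   # no rule matches, so the default applies
```

Blocking by default, as this sketch does, means a forgotten rule leaves traffic blocked rather than silently allowed.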

HTTPS/SSL

If the connection uses HTTPS, the browser connects on port 443 and encrypts the data it sends. Secure Sockets Layer (SSL) certificates, sometimes called digital certificates, are used to establish an encrypted connection between a browser or user's computer and a server or website. Modern connections use Transport Layer Security (TLS), the successor to SSL, although the certificates are still commonly called SSL certificates.

You can use the lock icon in Chrome's address bar to check whether the browser established a secure connection with the server. You can also see the certificate authority that issued the digital certificate to the server.

The encrypted connection helps secure data transmitted over the internet. It helps the browser verify that HTTP responses were sent by the real server it is connected to. It also ensures that no one else on the internet can inspect the data in transit, which helps keep users' login and card information secure.
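A client program can apply the same checks a browser does. The sketch below uses Python's standard `ssl` module: the default context requires a valid certificate and checks that it matches the host name. The helper `peer_certificate` is an assumption for this example; it needs network access, so it is defined here but not called.

```python
import socket
import ssl

# The default client context verifies the server certificate against the
# system's trusted certificate authorities and checks the host name.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)


def peer_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the server certificate as a dict.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match the host name.
    """
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()
```

With network access, `peer_certificate("google.com")["issuer"]` would show the certificate authority, which is the same information the browser displays behind the lock icon.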

Load balancer

Let's say the web server is getting too many HTTP requests due to an increasing user base or user activity. One solution is to increase the overall capacity of the machine hosting the web server; this is known as vertical scaling. But this is often not the best solution, as it is very expensive and requires a heavy migration if the web server's data has to move to a new machine. The solution most web service providers go for is horizontal scaling. Instead of increasing the capability of a single server, they add more servers.

This brings up a new problem: how will the browser know which web server to connect to? That is where a load balancer comes into play. It ensures requests from the internet pass through a single entry point and then distributes them across the servers behind it according to its algorithm.
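One of the simplest distribution algorithms is round robin, where each new request goes to the next server in the list. The sketch below is illustrative only: the class name `RoundRobinBalancer` and the server addresses are invented for the example, and real load balancers also track server health and load.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation: Iterator[str] = cycle(servers)

    def pick(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical private addresses of three identical web servers.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
assignments = [balancer.pick() for _ in range(5)]
print(assignments)  # wraps around to the first server after the third request
```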

Some load balancers can be configured to direct requests to certain servers depending on the content of the HTTP request. Load balancers can operate at different network layers. Layer 7 load balancers route network traffic in a much more sophisticated way than Layer 4 load balancers, which is particularly useful for TCP-based application traffic such as HTTP. A Layer 7 load balancer terminates the network traffic and reads the message within, so it can base its routing decision on that message.
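Because a Layer 7 load balancer reads the HTTP message, it can route on things like the request path. The following sketch parses the first line of a request and picks a server pool by path prefix. The pool names, prefixes, and addresses are hypothetical, and real products offer far richer matching (headers, cookies, host names).

```python
from typing import List, Tuple

# Hypothetical routing table: path prefix -> pool of backend servers.
ROUTES: List[Tuple[str, List[str]]] = [
    ("/api/", ["10.0.1.1", "10.0.1.2"]),
    ("/static/", ["10.0.2.1"]),
]
DEFAULT_POOL = ["10.0.0.1", "10.0.0.2"]


def choose_pool(request_line: str) -> List[str]:
    """Pick a backend pool from an HTTP request line such as 'GET /api/users HTTP/1.1'."""
    _method, path, _version = request_line.split(" ", 2)
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


print(choose_pool("GET /api/users HTTP/1.1"))
print(choose_pool("GET /static/logo.png HTTP/1.1"))
print(choose_pool("GET / HTTP/1.1"))
```

A Layer 4 load balancer could not make this decision, because it only sees IP addresses and ports, not the request path.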

Web server

Web servers deliver static content, like HTML pages, images, videos, and files. Examples of web servers are Apache and Nginx. A web server uses the HTTP request to locate the static content and sends it back as an HTTP response. This response is what is later rendered in your browser: for google.com, the CSS, JS, and images are sent back to your browser and then displayed for you to see and use.
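At its core, serving static content means mapping a URL path to a file and returning its bytes with a suitable content type. The sketch below shows that idea with a hypothetical `serve_static` function. It is not how Apache or Nginx are implemented, and it leaves out caching, range requests, and directory indexes.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Tuple


def serve_static(root: Path, url_path: str) -> Tuple[int, str, bytes]:
    """Return (status code, content type, body) for a requested path."""
    root = root.resolve()
    candidate = (root / url_path.lstrip("/")).resolve()

    # Refuse paths such as /../secret that escape the document root.
    if candidate != root and root not in candidate.parents:
        return 403, "text/plain", b"Forbidden"
    if not candidate.is_file():
        return 404, "text/plain", b"Not Found"

    content_type = mimetypes.guess_type(candidate.name)[0] or "application/octet-stream"
    return 200, content_type, candidate.read_bytes()


# Build a throwaway document root with one page in it.
with tempfile.TemporaryDirectory() as tmp:
    doc_root = Path(tmp)
    (doc_root / "index.html").write_text("<h1>Hello</h1>")

    found = serve_static(doc_root, "/index.html")
    missing = serve_static(doc_root, "/missing.css")
    escaped = serve_static(doc_root, "/../etc/passwd")

print(found)
print(missing[0], escaped[0])
```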

Web servers play a very important role in this whole process: they are the main entity that serves most web content. Static files may also be served through CDNs, which deliver static content faster based on your location.

Application server

An application server is closely related to a web server but serves dynamic content instead. Dynamic content means you might not get the same content as someone in another country or continent. For example, if Google recognizes a popular holiday in your location, the content you get will differ from what a person sees in a country that does not celebrate that holiday.

Application servers are also widely used on the internet today to give users personalized experiences. This is closely related to the idea of sessions, where a user is identified by some piece of data such as an IP address or authentication data.

Application servers execute the programs deployed on them. They pass HTTP requests to these programs, which then decide what HTTP response to send back through the application server. This decision can depend on different data; for example, Google might use your IP or GPS location to serve you something unique.
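In Python, the contract between an application server and the program it runs is commonly WSGI: the server calls a function with the request data, and the function returns the response. The sketch below is a minimal WSGI program that varies its response by country. The `X-Country` header, the `BANNERS` table, and the `call_app` helper are assumptions for the example; a real deployment might derive the location from the client IP instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from wsgiref.util import setup_testing_defaults

# Hypothetical per-country content.
BANNERS: Dict[str, str] = {"US": "Happy Thanksgiving!"}


def application(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI program: decide the response based on data in the request."""
    country = environ.get("HTTP_X_COUNTRY", "")  # the X-Country request header
    body = BANNERS.get(country, "Welcome!").encode("utf-8")
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(country: str) -> Tuple[str, str]:
    """Play the role of the application server for a single fake request."""
    environ: dict = {}
    setup_testing_defaults(environ)  # fills in a minimal valid request
    environ["HTTP_X_COUNTRY"] = country
    captured = {}

    def start_response(status: str, headers: list) -> None:
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")


print(call_app("US"))
print(call_app("FR"))
```

The same request path produces different responses for the two callers, which is what separates dynamic content from the static files a web server returns.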

Database

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. Programs executed by the application server might require a connection to a database to get information related to HTTP requests. This allows for persistent dynamic content.

In the holiday example used earlier, the information about the holidays of several countries and the content to be displayed is stored in a database, making it persistent and easily accessible by the program.
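A small sketch of that holiday lookup, using Python's built-in `sqlite3` module with an in-memory database, might look like this. The table layout, the `banner_for` function, and the rows are all made up for illustration; a production system would typically use a separate database server and real data.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute(
    """
    CREATE TABLE holidays (
        country TEXT NOT NULL,
        day     TEXT NOT NULL,   -- month-day, e.g. '07-04'
        banner  TEXT NOT NULL,
        PRIMARY KEY (country, day)
    )
    """
)
# Hypothetical rows.
conn.executemany(
    "INSERT INTO holidays (country, day, banner) VALUES (?, ?, ?)",
    [
        ("US", "07-04", "Happy Independence Day!"),
        ("FR", "07-14", "Bonne fete nationale!"),
    ],
)
conn.commit()


def banner_for(db: sqlite3.Connection, country: str, day: str) -> Optional[str]:
    """Return the holiday banner for a country and day, or None if there is none."""
    row = db.execute(
        "SELECT banner FROM holidays WHERE country = ? AND day = ?",
        (country, day),  # parameters are bound, never pasted into the SQL string
    ).fetchone()
    return row[0] if row else None


print(banner_for(conn, "US", "07-04"))
print(banner_for(conn, "US", "07-14"))
```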

Conclusion

In conclusion, understanding the intricate workings of the internet's infrastructure, from DNS lookups to load balancing, and the critical components that power it, such as web servers, application servers, and databases, is paramount for any software engineer. This knowledge not only showcases a deep understanding of the technological ecosystem but also demonstrates a level of expertise that can earn respect and trust from employers.

DNS, the backbone of the internet, translates human-readable domain names into IP addresses, allowing us to access online resources effortlessly. The journey from a user's query to the actual web server involves several crucial steps, each with its significance.

TCP/IP establishes the connection between browser and server, and firewalls filter the traffic that is allowed to reach it. HTTPS with SSL/TLS certificates adds encryption and server authentication, safeguarding users' data from potential threats.

In the face of increasing user demands, load balancers come to the rescue by efficiently distributing traffic across multiple servers, ensuring a smooth and responsive user experience. This scaling strategy is not only cost-effective but also minimizes the need for complex server migrations.

Web servers deliver static content, while application servers provide dynamic, personalized experiences for users. These elements, along with databases, form the foundation of modern web applications, enabling the persistent storage and retrieval of information.

In a world where digital interactions have become a fundamental part of our lives, comprehending the intricate web of technology that underlies it all is not only a valuable skill but a testament to a software engineer's prowess in shaping the digital landscape. With this knowledge, engineers can create robust, secure, and efficient systems that drive the Internet forward, making it a safer and more dynamic environment for all its users.