This site's backend

Before this webpage reaches your screen, many processes take place in the background. The setup of this website offers free web hosting through AWS S3 and blazing-fast content delivery and loading using CloudFront and Gatsby. Let us delve into the setup of this website and how its content is delivered to its users.

side project webdev AWS

15 Jan '22

This blog post goes into the hosting setup of this static website. It is based on this very well-written tutorial by John Sobanski. The diagram¹ below describes the final setup of this website. If you would like to build a site like this yourself, you can follow that tutorial; rebuilding it step by step is therefore not the purpose of this blog. Instead, this blog aims to immerse you in the hidden (if everything goes well) background processes that take place in order for this page to reach your screen. There are two reasons why this would be of interest:

Visitors can learn more about the great technological ingenuity hidden in content delivery.
Developers can learn more about their site setup, which can be beneficial when things go wrong or when they consider improving that setup.

This blog will start with the question “What is a static website really?” and then explain each service from user to resource:

The Domain Name System (DNS) through AWS Route 53
HTTPS connection
Content Distribution through AWS CloudFront
AWS Simple Storage Service (S3)

What is a static website really?

This website is static. What does that mean? How does that impact our development, deployment, and delivery? Websites fall into one of two categories: static and dynamic. Whereas the pages of a static website are built once, ahead of time, and served as finished files, dynamic web pages are built by a server upon each request.
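The difference can be sketched in a few lines of Python. This is only a toy model: the page sources, the `render` function, and all other names are made up for illustration, and a real generator such as Gatsby does far more work at build time.

```python
# Hypothetical page sources; a real static site generator reads these from files.
SOURCES = {
    "/": "Welcome",
    "/the-how": "This site's backend",
}


def render(path: str, body: str) -> str:
    """Turn a page source into HTML."""
    return f"<html><body><h1>{body}</h1><p>{path}</p></body></html>"


def build_static_site(sources: dict[str, str]) -> dict[str, str]:
    """Static: render every page once, at build time."""
    return {path: render(path, body) for path, body in sources.items()}


def serve_static(site: dict[str, str], path: str) -> str:
    """Serving a static page is only a lookup of a finished file."""
    return site[path]


def serve_dynamic(sources: dict[str, str], path: str) -> str:
    """Dynamic: the server renders the page again on every request."""
    return render(path, sources[path])


site = build_static_site(SOURCES)          # happens once, on the developer's side
page = serve_static(site, "/the-how")      # happens for every visitor, no rendering
```

Both approaches produce the same HTML; the difference is when the rendering work is done and who has to run the code that does it.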

Benefits of a static website:

Faster: Because pages are pre-built, there is no need to construct pages on the fly, reducing loading times.
Less error-prone: Because pages are pre-built, errors like references to missing files are caught when the developer builds the site, not when the user renders a page.
More secure: With static websites, your pre-built front-end web interface lives independently from your back-end system. Dynamic websites require running a server which is a potential way in for hackers.
Cheaper: Running a shared or dedicated web server can set you back a couple of dollars a month, whereas a static site can be hosted for free or close to it.
Becoming more popular: If you are like me, one reason to build a static site is that all the cool kids are doing it. Static websites are becoming more popular due to advancements in developer tools like Next.js, Hugo, Gatsby, and Jekyll.

Limitations and drawbacks of static sites

Limited functionality: The main drawback is that without running a server, static sites are limited in their functionality compared to dynamic sites. This renders them inappropriate for many applications like user-based log-in systems.
Static websites are harder to set up: Without some development expertise, it can be difficult to set up a static site generator. Contrast that with content management systems (CMS) like WordPress.

How the DNS guides you to the right resources

When you type a domain name into your browser, such as “www.maartenpoirot.com”, your browser first needs to find out the IP address associated with that domain so it can connect to the server hosting the website. This process is where the DNS (Domain Name System) comes into play.

The DNS is like the internet’s phonebook. It translates human-readable domain names into IP addresses that computers understand. Here’s how it works step by step:

Local DNS Cache Check: Your browser first checks its local DNS cache to see if it already has the IP address for the domain name stored. If it finds a match, it can skip the rest of the process and directly connect to the server.
Recursive DNS Servers: If the IP address isn’t found in the local cache, your browser contacts a recursive DNS server, usually provided by your ISP (Internet Service Provider) or a third-party DNS service like Google DNS or OpenDNS.
Root DNS Servers: If the recursive DNS server doesn’t have the IP address cached either, it starts the process of finding the IP address. It first contacts one of the root DNS servers. There are 13 named root servers, each of which is operated as many instances scattered across the globe.
Top-Level Domain (TLD) Servers: The root DNS server then directs the recursive DNS server to the appropriate TLD server based on the domain’s extension (like .com, .org, .net, etc.).
Authoritative DNS Servers: The TLD server then points the recursive DNS server to the authoritative DNS server for the specific domain. This server holds the most up-to-date information about the domain’s IP address.
IP Address Retrieval: Finally, the recursive DNS server retrieves the IP address from the authoritative DNS server and sends it back to your browser.
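The lookup chain above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the server names are invented, the IP address comes from a range reserved for documentation, and the browser cache and the recursive server's cache are merged into a single `cache` dictionary. A real resolver speaks the DNS protocol over the network.

```python
# Toy DNS hierarchy. All server names and the IP address are made up;
# 192.0.2.10 lies in a range reserved for documentation.
ROOT = {"com": "tld-com"}                                   # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"maartenpoirot.com": "auth-1"}}  # TLD server: domain -> authoritative server
AUTHORITATIVE = {"auth-1": {"www.maartenpoirot.com": "192.0.2.10"}}


def resolve(name: str, cache: dict[str, str]) -> tuple[str, list[str]]:
    """Return (ip, steps taken), filling the cache on a miss."""
    if name in cache:
        return cache[name], ["cache"]

    labels = name.split(".")
    tld = labels[-1]                   # e.g. "com"
    domain = ".".join(labels[-2:])     # e.g. "maartenpoirot.com"

    tld_server = ROOT[tld]                          # root points to the TLD server
    auth_server = TLD_SERVERS[tld_server][domain]   # TLD server points to the authoritative server
    ip = AUTHORITATIVE[auth_server][name]           # authoritative server holds the record

    cache[name] = ip                   # remember the answer for next time
    return ip, ["root", "tld", "authoritative"]


cache: dict[str, str] = {}
first = resolve("www.maartenpoirot.com", cache)    # walks the full hierarchy
second = resolve("www.maartenpoirot.com", cache)   # answered from the cache
```

The second call skips the whole chain, which is why repeat visits to a site resolve so quickly.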

Connection to the Website: Armed with the IP address, your browser can now establish a connection to the server hosting the website content. It sends an HTTP request for the specific webpage you requested (like “www.maartenpoirot.com/the-how”) to that IP address.

Content Delivery: The server receives the request, retrieves the requested webpage, and sends it back to your browser, which then renders the content for you to view.

This entire process typically happens in milliseconds, allowing you to access websites quickly and seamlessly. DNS is crucial to the functionality of the internet, as it ensures that users can easily navigate to websites using human-readable domain names rather than having to remember complex IP addresses.

How HTTPS achieves private and secure browsing

In contrast to HTTP, HTTPS connections perform two actions to counter two types of cyber security threats:

Authentication When communicating, you want to be sure that the site you are communicating with is actually who you think it is. In other words, you do not want to be communicating with an impersonator. This threat is called spoofing, and countering it requires authentication of the website.
Encryption By encrypting your communication, you prevent somebody who is eavesdropping on your exchange with a server from interpreting it. This type of threat is called a ‘man-in-the-middle attack’. Encryption is what allows your bank transactions to be safe over the internet. HTTPS encryption is standard nowadays. Not having it will show a crossed-out padlock to the left of the address bar, which looks unprofessional, but more importantly it means your site will be flagged as “not secure” and is penalized by Google Search.²

1. Authentication: How can a certificate ensure a site's identity and why can't it be falsified?

Setting up the HTTPS connection is technically called a handshake. This is usually a one-way Secure Sockets Layer (SSL) handshake, in which only the identity of the server is authenticated, and not the identity of the user. (SSL has since been succeeded by Transport Layer Security, TLS, but the older name is still widely used.) To perform this authentication, the server sends over its SSL certificate. The client has a Trust Store that it can consult to verify the certificate. If the certificate itself is not in the Trust Store, the client can still check whether the issuer of the certificate is trusted. This is called the certificate chain. You can actually take a look at the trust store in your browser quite easily.³
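Walking a certificate chain can be sketched as follows. This is a deliberately simplified model: the `Certificate` class, the CA names, and the `is_trusted` function are all hypothetical, and the sketch only follows issuer names. A real client also verifies each cryptographic signature, the validity dates, and the hostname.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str   # who the certificate is for
    issuer: str    # who signed it


def is_trusted(cert: Certificate, known: dict[str, Certificate],
               trust_store: set[str], max_depth: int = 10) -> bool:
    """Follow issuer links until a certificate in the trust store is reached."""
    current = cert
    for _ in range(max_depth):
        if current.subject in trust_store:
            return True
        parent = known.get(current.issuer)
        if parent is None or parent == current:   # unknown issuer, or self-signed
            return False
        current = parent
    return False


# Hypothetical chain: site certificate -> intermediate CA -> root CA.
leaf = Certificate("www.maartenpoirot.com", "Example Intermediate CA")
intermediate = Certificate("Example Intermediate CA", "Example Root CA")
root = Certificate("Example Root CA", "Example Root CA")

known = {c.subject: c for c in (leaf, intermediate, root)}
trust_store = {"Example Root CA"}

trusted = is_trusted(leaf, known, trust_store)   # True: the chain ends in a trusted root
```

The site's own certificate is not in the trust store, but its chain ends at an issuer that is, so the client accepts it.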

2. Encryption: How can a user and a website encrypt their communication without sharing the key they need to do it?

To me, the magic of HTTPS encryption is that a man in the middle cannot decrypt the communication even if it intercepts all information that the client and server have sent over in the handshake. I will briefly explain how this is achieved. This is where things start to get exciting.

The client and the server are going to need the same session key $K$ to encrypt and decrypt each other’s communication. However, they cannot just send $K$ over, as a man in the middle could simply intercept $K$ and decrypt the communication. Instead, they use something called a Diffie-Hellman (DH) key exchange.⁴

A core concept in DH is the one-way function: a function that is fast to compute but hard to invert. The simplest example of a one-way function involves modular arithmetic, which we will use in this example. Remember that the modulus of a number is what remains after division by another number. For example: $10\:mod\:3=1$ .⁵
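A small Python sketch can make the asymmetry concrete. The numbers below ($g=5$, $p=23$, exponent 6) are toy values chosen for illustration, and `forward` and `brute_force_log` are hypothetical helper names. Real deployments use primes thousands of bits long, for which the brute-force loop would never finish.

```python
def forward(g: int, x: int, p: int) -> int:
    """The easy direction: g^x mod p, fast even for huge numbers."""
    return pow(g, x, p)


def brute_force_log(g: int, target: int, p: int):
    """The hard direction: find x with g^x mod p == target by trying every x."""
    value = 1                      # g^0
    for x in range(p - 1):
        if value == target:
            return x
        value = (value * g) % p    # step from g^x to g^(x+1)
    return None                    # target is not a power of g


public = forward(5, 6, 23)                    # one fast modular exponentiation
recovered = brute_force_log(5, public, 23)    # up to p - 1 multiplications
```

With $p=23$ the search takes at most 22 steps. The work of the forward direction grows with the number of digits of $p$, while the brute-force search grows with $p$ itself.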

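DH key exchange exploits two properties of modular arithmetic. The first is that modular exponentiation is a one-way function: for a base $g$, an exponent $x$, and a prime $p$, computing $X = g^x\:mod\:p$ is fast, but recovering $x$ from $X$, $g$, and $p$ (the discrete logarithm problem) is believed to be infeasible when $p$ is large. The second is that exponents commute: $(g^x)^y\:mod\:p = (g^y)^x\:mod\:p$, so two parties can arrive at the same number along different routes. For this to be secure, $p$ should be a prime and $g$ a primitive root modulo $p$ (or a generator of a large subgroup), so that the powers of $g$ take on many different values.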

The server and client work together in sharing their components of the equation to help the other construct their equation in three steps:

The server and client agree on a generator $g$ (which is usually just the number 2) and a prime modulus $p$ . This is part of what is called the cipher suite.
The server picks a private key $x$ and calculates the public key $X = g^x\:mod\:p$ and shares it with the client.
The client picks a secret random private key $y$ , calculates the public key $Y = g^y\:mod\:p$ , and shares it with the server.

Now, both the server and client can compute session key $K$ . The server computes $K = Y^x\:mod\:p$ , and the client computes $K = X^y\:mod\:p$ . Both results equal $g^{xy}\:mod\:p$ , so the two sides end up with the same key. The session key can now be used to encrypt and decrypt communication between client and server. Note that a man in the middle would have had to acquire one of the local secrets $x$ or $y$ to access session key $K$ and decrypt the communication. All of this happens every time you make an HTTPS connection. Crazy right? If you were not sure if you wanted HTTPS on your site before, I am sure you would want it now.
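The exchange can be sketched in Python with toy numbers. The values $p=23$ and $g=5$ are assumptions chosen so the arithmetic is easy to follow, and `make_keypair` and `session_key` are hypothetical helper names. Real connections use far larger parameters, or the elliptic-curve variant mentioned in the footnote.

```python
import secrets


def make_keypair(g: int, p: int) -> tuple[int, int]:
    """Pick a random private key in [2, p - 2] and derive the public key."""
    private = secrets.randbelow(p - 3) + 2
    public = pow(g, private, p)        # g^private mod p
    return private, public


def session_key(their_public: int, my_private: int, p: int) -> int:
    """Combine the other side's public key with our own private key."""
    return pow(their_public, my_private, p)


p, g = 23, 5                           # toy public parameters

x, X = make_keypair(g, p)              # server: keeps x, sends X
y, Y = make_keypair(g, p)              # client: keeps y, sends Y

server_key = session_key(Y, x, p)      # Y^x mod p
client_key = session_key(X, y, p)      # X^y mod p
assert server_key == client_key        # both equal g^(x*y) mod p
```

Only $g$, $p$, $X$, and $Y$ ever travel over the network. With fixed secrets $x=4$ and $y=3$, for instance, the public keys are $X=4$ and $Y=10$, and both sides arrive at $K=18$.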

CloudFront

This part might come as a surprise. “If we have a domain, and AWS S3 can provide a storage location that serves static websites, why would we need CloudFront?” you might wonder. When distributing your data directly through S3, you would be left with two issues. First, if people visit your naked domain maartenpoirot.com, they are not redirected to www.maartenpoirot.com, and the page is not found. Second, the website endpoint of S3 does not support SSL connections.⁶ However, the Representational State Transfer (REST) endpoints of S3 do. Thus, we need CloudFront to redirect visitors and to access the bucket through the REST API.
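The redirect rule for the naked domain can be sketched as a plain function. This is not CloudFront configuration and not how this site implements it; `redirect_for` is a hypothetical name, and the sketch only illustrates the decision that has to be made somewhere in front of the bucket.

```python
from typing import Optional

CANONICAL_HOST = "www.maartenpoirot.com"
NAKED_HOST = "maartenpoirot.com"


def redirect_for(host: str, path: str) -> Optional[tuple[int, str]]:
    """Return (status, location) if the request should be redirected, else None."""
    if host.lower() == NAKED_HOST:
        # Permanent redirect to the same path on the canonical HTTPS host.
        return 301, f"https://{CANONICAL_HOST}{path}"
    return None                        # already on the canonical host: serve the page


naked = redirect_for("maartenpoirot.com", "/the-how")
canonical = redirect_for("www.maartenpoirot.com", "/the-how")
```

A request to the naked domain yields a 301 pointing at the `www` address, while a request to the canonical host passes through untouched.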

S3 content storage

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.

¹ This diagram was generated using Mermaid. Mermaid is a JavaScript-based parsing engine.
² Google Security Blog: HTTPS as a ranking signal
³ ExpeditedSecurity: Take a look at your Trust Store
⁴ More in-depth information on Diffie-Hellman at Wikipedia
⁵ In reality, SSL encryption uses an alternative called Elliptic-Curve DH because it provides the same security with a smaller key size.
⁶ AWS docs: Key differences between a website endpoint and a REST API endpoint