HTTP and Web Applications
The web is built on a protocol called the HyperText Transfer Protocol (HTTP). To build proper web applications, it is essential to understand how this protocol works. The HTTP specification explains it all, but since it's a specification it can be quite hard to read, so let me give you a quick introduction to HTTP before you read the details in the HTTP specification.
Lecture material
Recommended reading
- Introduction to HTTP by Launch School
- Up to (including) the chapter HTTP --> Processing Responses
- HTTP/1.1 specification
- Don't read and learn everything by heart, but just enough so you get comfortable looking up things (methods, status codes, etc.) in it.
Clients and Servers
The HTTP protocol is built on a client-server architecture. That means that some computers on the web act as servers, and all other computers act as clients. When you browse the web through a web browser, your web browser is a client.
A client can send an HTTP request to a server, asking the server to do something for it. When a server receives the HTTP request, it should carry out the request, and then send back an HTTP response to the client. This is visualized in the figure below. Having a server on its own is useless; servers exist to serve clients (that's why we call them servers).
For example, when you click on a link in your web browser, your web browser sends an HTTP request to a server, requesting the server to send back the webpage the link leads to. When the server receives this request it generates the webpage/loads it from a file and sends it back in an HTTP response, and when your web browser receives the HTTP response it displays the webpage on the screen.
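The request/response round trip described above can be tried out on a single computer. The following is only a minimal sketch using Python's standard library; the handler name `ContactPageHandler`, the `/contact` URI and the page content are assumptions made up for the illustration. A tiny server is started in a background thread, and a client then sends one HTTP request to it and reads the HTTP response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ContactPageHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET request with a tiny webpage."""

    def do_GET(self):
        page = b"<h1>Contact</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

    def log_message(self, format, *args):
        pass  # keep the example quiet (no request log on stderr)


def fetch_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ContactPageHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: send an HTTP request, then wait for the HTTP response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/contact")
        response = connection.getresponse()
        result = response.status, response.read().decode()
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_from_local_server()
print(status, body)
```

Here the same computer plays both roles, which is convenient while developing: the web browser (or a script like this one) is the client, and the web application under development is the server.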
In general, any computer can act as a client or a server. It is also possible for a computer to be both a client and a server at the same time. For example, when you (acting as a client) send an HTTP request to `Server A`, that server might in turn send an HTTP request to `Server B` to handle your request. Then `Server A` acts as both a server and a client at the same time. This is used, for example, when you log in with your Google account on a website that doesn't belong to Google. The website you send your login request to will then send its own requests to Google to check which Google account belongs to you (this example is a bit simplified; in practice it is a bit more complicated than this).
Resources and URIs
Servers in HTTP are expected to contain resources. It is up to each server to decide what type of resources it should contain, but some resources commonly found on websites include:
- Images (`.png` files, `.jpeg` files, etc.)
- Sounds (`.mp3` files, etc.)
- Videos (`.mp4` files, etc.)
- Documents (`.pdf` files, `.docx` files, `.txt` files, etc.)
- Static webpages (`.html` files, `.css` files, `.js` files, etc.)
- General data, such as:
- Accounts
- Blogposts
- Guestbook posts
- Articles
- Private Messages
- Comments
- etc.
Some resources (images, videos, documents, etc.) are simply stored as files on the server, while other resources can be stored in a database, or some other way. For now, we don't really care about how the resources are being stored on the server, as long as they are stored somehow.
When a client wants to work with a resource, it sends an HTTP request to the server, asking the server to do something with the resource. There are primarily four different types of requests clients can send. These are known as CRUD operations, and they are:
- Create (ask the server to create a new resource)
- Retrieve/Read (ask the server to send back a resource)
- Update (ask the server to change a resource)
- Delete (ask the server to delete a resource)
In English, an HTTP request could say something like:
- Send back the contact page to me
- Delete the last comment I wrote
- Change the title of the blog post I wrote yesterday to "Congratulations" (the client sends the new title to the server)
- Create a new friend relationship with the user Alice for me (the client sends a new resource representing the friend relationship to the server)
Example
When you create a new account on a website, your web browser sends an HTTP request to the server asking the server to create a new resource representing your account containing your username and password (and possibly some additional information).
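To make the four CRUD operations concrete, here is a minimal sketch of a server-side store for resources. It is an assumption made for illustration only: the class name `ResourceStore` and its method names are hypothetical, and a plain dictionary stands in for whatever storage (files, a database, etc.) a real server would use.

```python
class ResourceStore:
    """Hypothetical in-memory stand-in for wherever a server keeps its resources."""

    def __init__(self):
        self._resources = {}

    def create(self, key, resource):
        # Create: store a new resource under the given key.
        self._resources[key] = resource

    def read(self, key):
        # Retrieve/Read: return the resource, or None if it doesn't exist.
        return self._resources.get(key)

    def update(self, key, resource):
        # Update: only an existing resource can be changed.
        if key not in self._resources:
            raise KeyError(key)
        self._resources[key] = resource

    def delete(self, key):
        # Delete: remove the resource if it exists.
        self._resources.pop(key, None)


accounts = ResourceStore()
accounts.create("Alice", {"username": "Alice", "password": "secret"})
created = accounts.read("Alice")
accounts.update("Alice", {"username": "Alice", "password": "new-secret"})
updated = accounts.read("Alice")
accounts.delete("Alice")
deleted = accounts.read("Alice")
print(created, updated, deleted)
```

A real web application would of course never store passwords in plain text like this; the point is only that every request boils down to one of these four operations on some resource.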
Each request sent to a server is about doing something with a resource. The client tells the server which resource that is through a Uniform Resource Identifier (URI). Each resource on the server should have a URI that uniquely identifies it, and it is the server that decides which URI each resource should have. Each time you view a webpage (webpage = resource) in a web browser, you can see the URI that uniquely identifies that webpage in the address bar (usually located at the top of the web browser).
For example, Wikipedia's article/webpage about Mathematics has the URI `/wiki/Mathematics`, while their article/webpage about Computer Science has the URI `/wiki/Computer_science`.
The URI specification specifies the format of URIs. Simply put, a URI is a sequence of characters, where the slash character `/` is used to group related resources together. URIs also include the protocol used to access the resource (e.g. HTTP) and an identifier indicating which server stores the resource (a domain name or an IP address), so a complete URI could for example be `https://en.wikipedia.org/wiki/Mathematics`, where `https` indicates that the HTTPS protocol should be used to send the HTTP request, and the computer containing the resource is the one with the IP address the domain name `en.wikipedia.org` maps to.
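The parts of a complete URI can be pulled apart programmatically. As a small sketch, Python's standard `urllib.parse.urlsplit` function splits the Wikipedia URI from the text into its protocol, server and resource parts (the variable name `parts` is just chosen for this example):

```python
from urllib.parse import urlsplit

parts = urlsplit("https://en.wikipedia.org/wiki/Mathematics")

print(parts.scheme)    # the protocol to use
print(parts.hostname)  # the domain name identifying the server
print(parts.path)      # the part identifying the resource on that server
```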
HTTP VS HTTPS
The web is built on the HTTP protocol. The HTTP protocol is not encrypted, so it's a bad idea to send sensitive information (such as passwords, credit card numbers, etc.) using it. Therefore the HTTPS protocol was invented. HTTPS works more or less the same way as HTTP, but with the addition of encryption, so even though you learn HTTP here, you can just as well use HTTPS later.
URIs usually identify a single resource or a collection of multiple resources. It is quite common that the beginning of a URI identifies a collection of resources, and the end of the URI identifies a specific resource within that collection. For example, the URI `/accounts` could identify the collection of all accounts on a website, and the URI `/accounts/Alice` could identify the specific account with the username Alice. We can also see this pattern in the earlier example with URIs on Wikipedia.
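A server following this pattern can look at the URI to tell whether a request is about a whole collection or about one resource in it. The function below is only a sketch under the assumption that URIs have at most two parts (`/collection` or `/collection/item`); the name `split_resource_path` is hypothetical, and real websites are free to structure their URIs differently.

```python
def split_resource_path(path):
    """Split a URI path into (collection, item); item is None for a whole collection."""
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        return None, None
    collection = segments[0]
    item = segments[1] if len(segments) > 1 else None
    return collection, item


print(split_resource_path("/accounts"))
print(split_resource_path("/accounts/Alice"))
print(split_resource_path("/wiki/Mathematics"))
```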
Requests
So, a client can send an HTTP request to a server to ask the server to do something for it. The HTTP specification specifies the structure of these HTTP requests, so it is very important that clients send HTTP requests using exactly that structure. If they don't, servers will not understand the request and will simply ignore it or send back an HTTP response indicating that something was wrong with the request.
The structure of an HTTP request is shown in the figure below.
The first line in an HTTP request is called the request line. It consists of three parts.
The first part on the request line is called the `METHOD`. It is also known as the verb, but the HTTP specification calls it method, so it is better to stick with that. The method indicates what the client wants to do with the resource identified by the `URI` (the second part). For example, the `GET` method indicates that the client wants to retrieve the resource, so the server should send back the resource in the HTTP response. The `DELETE` method indicates that the client wants the server to delete the resource, so the server should delete the resource before it sends back an HTTP response.
The third part on the request line, `VERSION`, indicates which version of HTTP the client is using, so the server can use the same version. A common value used here is `HTTP/1.1`.
Below the request line you find the headers. These are key-value pairs with additional information about the request. For example, the `Host` header is used to indicate the domain name of the server the request is sent to (for example `Host: ju.se`), and the `Accept` header is used to indicate which data format the client would like to get back the requested resource in (for example `Accept: text/html`).
Below the headers you find the body of the request. Here the client can pass a resource to the server in the HTTP request. This is used when the client wants to create a new resource on the server or update an existing resource. If no resource needs to be sent to the server, you omit the body (leave it empty/blank).
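The structure described above (request line, headers, body) can be sketched as a small function that assembles the text of an HTTP/1.1 request. The function name `build_request` is an assumption for this example. Two details come from the HTTP/1.1 format itself: each line ends with a carriage return and a line feed (`\r\n`), and an empty line separates the headers from the body.

```python
def build_request(method, uri, headers, body="", version="HTTP/1.1"):
    """Assemble the text of an HTTP/1.1 request from its parts."""
    lines = [f"{method} {uri} {version}"]                      # the request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # the headers
    # An empty line marks the end of the headers; the body (possibly empty) follows.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request_text = build_request(
    "GET", "/accounts", {"Host": "game-site.com", "Accept": "text/html"}
)
print(request_text)
```

In practice you rarely build requests by hand like this; web browsers and HTTP libraries do it for you. Seeing the raw text is mainly useful for understanding what those tools send.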
An example of an actual HTTP request is shown in the figure below.
The Request Headers table below describes some of the headers you can use in an HTTP request.
Name | Example | Description |
---|---|---|
Host | Host: nintendo.se | Identifies the domain the HTTP request is sent to. |
Accept | Accept: text/html | Identifies the data format the client wants the resource back in. |
Content-Type | Content-Type: application/json | Identifies the data format the body of the HTTP request is written in. |
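On the receiving side, a server has to take the request text apart again to find the method, the URI and the headers. The following is a simplified sketch of that step, not a complete parser; the function name `parse_request` is hypothetical, and it assumes a well-formed request with `\r\n` line endings.

```python
def parse_request(raw):
    """Split the text of an HTTP request into its parts (simplified sketch)."""
    # The first empty line separates the request line and headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}


raw_request = (
    "POST /contact-messages HTTP/1.1\r\n"
    "Host: a-store.com\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"name": "Alice"}'
)
parsed = parse_request(raw_request)
print(parsed["method"], parsed["uri"], parsed["headers"]["Content-Type"])
```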
Methods
There exist a bunch of different HTTP methods, but web developers primarily need to know four of them: `GET`, `POST`, `PUT` and `DELETE`. These map well to the commonly used CRUD operations:
- Create: `POST`
- Retrieve: `GET`
- Update: `PUT`
- Delete: `DELETE`
A client can send a `GET` request to a server to tell the server to send back the resource identified by the URI in the request. A `GET` request contains no body since the request is only about fetching a resource from the server, and not about sending a resource to the server.
Example
GET /accounts HTTP/1.1
Host: game-site.com
Accept: text/html
A client can send a `POST` request to a server to tell the server to create a new resource. The URI in the request should identify what type of resource should be created, and the body of the request should contain the resource.
Example
POST /contact-messages HTTP/1.1
Host: a-store.com
Content-Type: application/json

{"name": "Alice", "message": "Hi, I bought a thing from you, and it's not working. Please get back to me with info on how to proceed.", "email": "alice@wonderland.com"}
A client can send a `PUT` request to a server to tell the server to replace an existing resource on the server with a new one the client sends it. The URI in the request should identify which resource on the server is to be replaced, and the body of the request should contain the new resource.
Example
PUT /diary-entries/2020-02-13 HTTP/1.1
Host: diaries.com
Content-Type: application/json

{"date": "2020-02-13", "message": "Today was a wonderful day, Kim proposed to me, and I said yes :D We will live happily ever after now. (update 2020-04-15: Me and Kim are no longer together)."}
A client can send a `DELETE` request to a server to tell the server to delete all resources on the server identified by the URI in the request.
Example
DELETE /guestbook-posts/123 HTTP/1.1
Host: football-lovers.com
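In a program, these requests are usually created through an HTTP library rather than typed out by hand. As a sketch, the code below builds (but deliberately does not send) the four example requests with `urllib.request.Request` from Python's standard library. The helper `make_request` is hypothetical, the hosts are the example hosts used above, and the diary entry is shortened.

```python
import json
import urllib.request


def make_request(method, url, resource=None):
    """Build (without sending) an HTTP request; the resource, if any, becomes a JSON body."""
    data = None
    headers = {}
    if resource is not None:
        data = json.dumps(resource).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


example_requests = [
    make_request("GET", "http://game-site.com/accounts"),
    make_request("POST", "http://a-store.com/contact-messages", {"name": "Alice"}),
    make_request("PUT", "http://diaries.com/diary-entries/2020-02-13",
                 {"date": "2020-02-13", "message": "Today was a wonderful day."}),
    make_request("DELETE", "http://football-lovers.com/guestbook-posts/123"),
]

for request in example_requests:
    # selector is the part of the URL that ends up on the request line.
    print(request.get_method(), request.selector, request.data)
```

Passing one of these objects to `urllib.request.urlopen` would actually send it, which is left out here since the hosts are only examples.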
PUT and DELETE not in HTML
In HTML, the language used to build web pages, it's only possible to send `GET` and `POST` requests (through links and forms). Therefore, web developers rarely use `PUT` and `DELETE` requests. On websites, `PUT` and `DELETE` requests are often implemented as `POST` requests instead, and the URI is used to indicate whether the request is a create, an update or a delete operation.
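One way a website could tell these `POST` requests apart is sketched below. The URI convention used here (a URI ending in `/update` or `/delete`) is purely an assumption for the illustration, as is the function name `operation_for`; each website decides on its own convention.

```python
def operation_for(method, uri):
    """Map a request to a CRUD operation under a hypothetical URI convention."""
    if method == "GET":
        return "retrieve"
    if method == "POST":
        # Hypothetical convention: the end of the URI tells what the POST is for.
        if uri.endswith("/delete"):
            return "delete"
        if uri.endswith("/update"):
            return "update"
        return "create"
    return "unsupported"


print(operation_for("POST", "/guestbook-posts"))
print(operation_for("POST", "/guestbook-posts/123/update"))
print(operation_for("POST", "/guestbook-posts/123/delete"))
```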
Responses
When a server receives an HTTP request, it should look at the request (i.e. the method and the URI) to figure out what the request is about, then carry out the request and send back an HTTP response. The structure of an HTTP response is shown in the figure below.
The first line in the HTTP response is called the status line. It first contains the HTTP version the server is using (`VERSION`), then a `STATUS_CODE` (a three-digit number) indicating how the server handled the request (e.g. did it carry out the request, or why didn't it carry it out?), and then a `REASON_PHRASE`, which is just a very short human-readable text describing what the status code means. For programmers using HTTP, it's primarily the status code that's interesting to look at.
Just like HTTP requests, HTTP responses can contain headers and a body too.
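Since the status line always has the same three parts, a client can split it apart easily. The function below is a minimal sketch (the name `parse_status_line` is made up for this example); note that the reason phrase may itself contain spaces, so the line is split at the first two spaces only.

```python
def parse_status_line(line):
    """Split a status line into (version, status code, reason phrase)."""
    version, status_code, reason_phrase = line.split(" ", 2)
    return version, int(status_code), reason_phrase


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```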
Request Headers != Response Headers
Although HTTP requests and HTTP responses both contain headers, which headers they can contain depends on whether it's a request or a response. For example, the `Accept` header can only be used in HTTP requests, while the `Content-Type` header can be used in both HTTP requests and HTTP responses.
Status codes
The three-digit status code can start with the digit `1`, `2`, `3`, `4` or `5`. Which digit it is hints at how the server handled the request.
The `1XX` status codes indicate an informational response. These aren't that important to know for programmers using HTTP.
The `2XX` status codes indicate that the server successfully carried out the HTTP request. The most commonly used ones are (reason phrase shown after the status code):
- `200` OK: The server carried out the request and sends back a resource in the body of the response.
- `201` Created: The server carried out the request and a new resource was created while carrying out the request. The `Location` header in the HTTP response contains the URI for the newly created resource.
- `204` No Content: The server carried out the request, but the response contains no body.
The `3XX` status codes indicate that the server encourages the client to send a new HTTP request using another URI (i.e. redirecting the client).
- `302` Found: The server encourages the client to send the same HTTP request again but with the URI specified in the `Location` response header instead. This is useful if you change URIs on a website. For example, maybe `/about-us` was first used to identify the about page on the website, and then you changed that to just `/about`. Then when a client sends a `GET` request to `/about-us` you can send back a `302` response with the header `Location: /about` to indicate that the URI for the page has changed to `/about`.
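The `/about-us` example can be sketched in a few lines of code. Everything here is a simplification made for illustration: the table `MOVED`, the functions `respond` and `fetch`, and the fact that the pretend server answers `200` for every other URI are all assumptions, and no real HTTP traffic is involved.

```python
# Hypothetical table of URIs that have changed: old URI -> new URI.
MOVED = {"/about-us": "/about"}


def respond(uri):
    """Server side: return (status code, headers) for a GET request to the URI."""
    if uri in MOVED:
        return 302, {"Location": MOVED[uri]}
    return 200, {}


def fetch(uri, max_redirects=5):
    """Client side: follow 302 responses until another status code comes back."""
    for _ in range(max_redirects):
        status_code, headers = respond(uri)
        if status_code != 302:
            return uri, status_code
        uri = headers["Location"]  # send the request again, to the new URI
    raise RuntimeError("too many redirects")


print(respond("/about-us"))
print(fetch("/about-us"))
```

Web browsers follow redirects like this automatically, which is why users rarely notice them. The limit on the number of redirects protects the client against two URIs redirecting to each other forever.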
The `4XX` status codes indicate that the server didn't carry out the request because the client has done something wrong with the request. The client needs to fix the problem and try again.
- `400` Bad Request: The server didn't carry out the request because something is wrong with it. The body of the response can contain additional information about what's wrong. It is better to use another `4XX` status code if a more specific one that describes the problem exists.
- `401` Unauthorized: The server didn't carry out the request because the client has not authenticated itself, i.e. it has not provided valid credentials (the client might for example need to log in first).
- `404` Not Found: The server didn't carry out the request because the URI in the request doesn't identify a resource that exists.
The `5XX` status codes indicate that the server couldn't carry out the request because something is wrong on the server side. Maybe the server crashed, or maybe the server needs an external resource (e.g. a database) to carry out the request and doesn't have access to it at the moment, etc.
- `500` Internal Server Error: The server couldn't carry out the request.
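The grouping of status codes by their first digit is easy to express in code. The sketch below uses `HTTPStatus` from Python's standard library to look up the reason phrase; the dictionary `STATUS_CLASSES` and the function `describe` are names made up for this example.

```python
from http import HTTPStatus

# What the first digit of a status code says about how the request was handled.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(status_code):
    """Return the status code, its reason phrase and the class it belongs to."""
    reason_phrase = HTTPStatus(status_code).phrase
    return f"{status_code} {reason_phrase} ({STATUS_CLASSES[status_code // 100]})"


for code in (200, 201, 204, 302, 400, 401, 404, 500):
    print(describe(code))
```

When writing a client, checking the first digit is often enough to decide what to do next: continue on `2XX`, follow the `Location` header on `3XX`, fix the request on `4XX`, and possibly try again later on `5XX`.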
World Wide Web
Terminology
A web application is the application running on the server that receives HTTP requests and sends back HTTP responses.
A webpage is a single page you can view in your web browser. Each HTTP response usually contains a single webpage.
A website is the collection of all webpages on one and the same server (i.e. all webpages on the same domain, for example all webpages at ju.se).
So, why is it called the World Wide Web? Webpages can contain links to other webpages, and if you try to visualize this structure, you end up with something looking like a web, as shown in the figure below (some imagination required!).