HTTP

Application-layer protocols

  • Transport-layer service models: Transport-layer sends data (TCP, UDP)
  • Client-server paradigm (HTTP)
  • Peer-to-peer paradigm (BitTorrent)

HTTP

Web

Web page consists of objects, which can be HTML file, JPEG image, audio file, etc.
Web page consists of base HTML-file which includes several referenced objects, each addressable by a URL.

HTTP overview

HTTP(hypertext transfer protocol) is a web's application-layer protocol.

It is a client/server model:

  • Client: Browser that requests, receives, and displays Web objects
  • Server: Server sends objects in response to requests

Actually client is way harder than server! Browser is complicated...

HTTP uses TCP, and it's stateless. (Server maintains no information about past client requests)
That's why HTTP server is fast!

HTTP connections

  • Non-persistent HTTP: At most one object sent over TCP connection
  • Persistent HTTP: Multiple objects can be sent over single TCP connection

Persistent HTTP is efficient, but it can be a burden on a server, so we use non-persistent HTTP.

Non-persistent HTTP

  • At most one object is sent over TCP connection
  • Downloading multiple objects requires multiple connections

Response time of non-persistent HTTP

Round-trip time (RTT): Time for a small packet to travel from client to server and back.
Does not includes transmission time!!

RTT: From packet sent to first byte received
Transmission time: From first byte received to last byte received

HTTP response time: 2 RTT + file transmission time
(1st RRT initiates TCP connection, 2nd RTT requests HTTP request)

Persistent HTTP (HTTP 1.1)

  • Server leaves connection open after sending response
  • Subsequent HTTP messages between same client/server sent over open connection
  • Client sends requests as soon as it encounters a referenced object
  • Requires little as one RTT per object!

HTTP 2 has same message as HTTP 1.1, but it allows multiplexing - you can send part of the referenced objects!

HTTP message

There are two types of HTTP messages: request, response.
ASCII is used for HTTP messages.

HTTP request message

First line is request line: (method) (URL) (version)\r\n e.g. GET /index.html HTTP/1.1\r\n
Other lines are header line: (header field):(value)\r\n
Connection: keep-alive\r\n means Persistent HTTP
Headers ends with empty line \r\n.
Depending on method type, body can be followed after headers.

  • GET: get object from server
  • POST: send user input to server
  • HEAD: get metadata of object from server
  • PUT: upload new file (and replace old file) to server

HTTP response message

First line is status line: (version) (status code) (reason)\r\n e.g. HTTP/1.1 200 OK\r\n
Other lines are header line: (header field):(value)\r\n
ETag: a5b-52d015789ee9e is a hashed value of the object that is used for cache validation.
Accept-Ranges: bytes means you can query with byte ranges, so you don't have to get the whole object.
Headers ends with empty line \r\n.
Depending on method type, body can be followed after headers.

  • 200 OK: request succeeded
  • 301 Moved Permanently: requested object is moved, new location is specified in header and there is no body
  • 400 Bad Request: request message is not understood by server
  • 404 Not Found: requested object is not found on this server
  • 505 HTTP Version Not Supported: server doesn't support this HTTP version

Cookies: maintaining user/server state

All HTTP requests are independent of each other.
But sometimes we need state e.g. we want to recover from a partially-completed transaction.
Websites and client browser use cookies to maintain some state between transactions.

  1. HTTP response message includes cookie header line. set-cookie: 1678
  2. Client browser stores cookie file.
  3. Subsequent HTTP requests will include cookie header line. cookie: 1678
  4. Server can identify client with cookie and perform cookie-specific action.

Cookies can used for:

  • Authentication: who are you?
  • Authorization: what can you do?
  • Shopping carts
  • Recommendations
  • User session state

Cookies and privacy

  • First party cookie: cookie from website you choose to visit
  • Third party cookie: cookie from website you did not choose to visit (e.g. ads, youtube, ...)

If other site's object is needed, client requests with header Referer: nytimes.com.
Third party site can track user behavior on a given website, or even across multiple websites.

GDPR (EU General Data Protection Regulation) says if cookies can identify an individual, explicit and informed user consent should be obtained before setting any cookies.

Web caches

Forward caches (proxy servers)

  1. Browser send all HTTP requests to a local web cache server.
  2. If object is in cache, it returns object to client.
  3. If object is not in cache, web cache server requests object and stores it.

When bandwidth was very expensive, forward cache was useful because you don't have to pay requests to a local server.
However forward caches is not used these days because it's actually slower than direct requests, and cache ratio is about 30~40%.

Request/response header can include Cache-Control: max-age=<seconds> or Cache-Control: no-cache to tell object's allowable caching.

Reverse caches (reverse proxy)

Forward cache is closer to the client than the origin server, but reverse cache is closer to the origin server than the client.
i.e. Forward cache is a cache for client, while rever cache is a cache for server.

Forward cache has to cache every webpage client is visiting.
Reverse cache only need to cache data from specific server!

It can be also used as CDNs!

  • CDN can protect actual server from DDoS attacks.
  • CDN can distribute request across multiple servers around the world.
  • CDN can find closest (therefore fastest) server from client.

Browser caches

  1. Client(Browser) caches objects and store date it has cached.
  2. Client includes If-modified-since: <date> in HTTP request.
  3. Server responses HTTP/1.0 304 Not Modified if browser-cached copy is up-to-date.
    Server responses HTTP/1.0 200 OK and data if browser-cached copy is outdated.
  4. Server don't send object if browser has up-to-date cached version!

Multi-object HTTP requests

HTTP/1.1

HTTP/1.1 introduced multiple, pipelined GETs over single TCP connection.
However, server should respond in order (FCFS: first-come-first-served scheduling) to GET requests.

HTTP/1.1 requires Host header to support virtual domain hosting.
Virtual domain hosting refers to hosting/serving multiple domain names on a single server/machine.

  • Head-of-line (HOL) blocking: small object may have to wait for transmission until large object is sent.
  • Loss recovery (retransmitting lost TCP segments) stalls object transmission.

HTTP/2

HTTP/2 solves HOL blocking!

  • Methods, status codes, most header fields remain unchanged from HTTP/1.1.
  • Transmission order of requested objects is based on client-specified object priority.
  • Objects are divided into frames (not fixed length), and frame are scheduled to mitigate HOL blocking.
  • Server can push unrequested objects to client. For example, server can push css, js files while responding index.html.
  • Not related to multi-object HTTP requests, but HTTP/2 introduces TLS encryption.

But HTTP/2 still has issues:

  • HTTP/2 uses single TCP connection.
  • Loss recovery problem still exists.
    • Actually to solve HOL blocking problem, browsers using HTTP/1.1 typically opened multiple parallel TCP connections.
  • Most HTTP/2 connection uses TLS encryption, but it's not forced.

HTTP/3

HTTP/3 has TLS by default, uses UDP (specifically QUIC), solves loss recovery problem.