![[Preamble]]
When designing py.wtf we had these (non-functional) goals in mind:
- low *maintenance*: as a hobby project, it's important that it doesn't feel like a chore. We especially don't want to be on call for the service.
- low *cost*: there is zero funding, all costs come out of our own pocket. For any service we rely on, ideally we should fit into their free tier.
- low *latency*: the end user experience should be snappy. Admittedly this is less important than the above two (which are existential goals - if we don't hit them, the project is likely to shut down).
Early on we came to the conclusion that we would have to write our own front-end if we wanted to make the UI simple and pleasant to use. This seemed like an okay tradeoff since we weren't planning on doing anything fancy for the UI anyway. A simple nextjs website would do.
But then we realized that nextjs supports [static site generation](https://nextjs.org/docs/pages/building-your-application/rendering/static-site-generation) as well as [client side rendering](https://nextjs.org/docs/pages/building-your-application/data-fetching/client-side), both of which allow us to completely skip having any backend component. This is great for latency (all content is cached by CDN), maintenance (no moving pieces), and costs (compute is expensive), and allows the following architecture:
![[py.wtf-architecture.excalidraw.svg]]
## Frontend
The front-end is a fully static [SPA](https://en.wikipedia.org/wiki/Single-page_application), implemented using nextjs. The browser only needs to load index.html and the attached JS/CSS code, then the entire app runs in the browser using [client side rendering](https://nextjs.org/docs/pages/building-your-application/data-fetching/client-side). Only static content is fetched (from CDN) based on user actions, which is then rendered by the JS that was initially loaded.
Compared to statically generating all HTML in advance (an approach many other documentation sites use), this significantly reduces storage (and bandwidth) costs: HTML is quite chatty compared to JSON. At the same time, it isn't great for SEO: by default, search engines see a nearly empty `index.html` unless they evaluate JavaScript. Most search engines choose to do this, but the results vary from [pretty good](https://www.google.com/search?q=requests+site%3Apy.wtf) through [meh](https://www.google.com/search?q=django+site%3Apy.wtf) all the way to [downright terrible](https://www.google.com/search?q=libcst+site%3Apy.wtf).
Having a great landing page with fast, ergonomic in-app search capabilities is therefore a high priority.
## Storage
As of 2024-05-29, the storage hosting all the static files required for the website is [GitHub Pages](https://pages.github.com/). This is very convenient: it's free, it was easy to get started with, and it's [really easy](https://github.com/zsol/py.wtf/blob/a34b7802b8ccb13ccd0990c7b6ffc0769977beae/scripts/publish.py#L76-L92) to publish a new release.
As we start publishing more than just the top 500 most popular PyPI projects (and their dependencies), we will probably look at transitioning away from GH Pages and [[Picking a better storage provider]], at least for hosting the Index JSON files, because:
1. these change relatively frequently, and version control doesn't provide too much value here
2. the size of the repo is bound to bump into reasonable limits (it's currently sitting at ~200MB)
## CDN
Cloudflare was a pretty simple choice for our CDN. Its generous free tier will likely fit our use case even after we've begun serving all PyPI-published packages, it was dead simple to set up, and it allows us to configure "Rewrite rules", which is how URLs like `https://py.wtf/requests/requests.api/get` don't end up giving an HTTP 404 with our SPA frontend design. This is essential for SEO and, more importantly, for making deeplinks to a particular API doc work.
Here's all we needed to do on the Cloudflare side:
![[Cloudflare rewrite rule.png]]
This will rewrite any request other than `/` (the landing page), `/_index/` (where the actual content JSON files are stored), and `/_next/` (where nextjs infra is deployed) to go to `/_spa` instead, which will load up the entire frontend code.
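For illustration, here is a rough TypeScript sketch of the decision this rule makes. It only models the prose description: the real rule is a Cloudflare expression (see the screenshot), and `rewritePath`, `PASSTHROUGH_PREFIXES`, and the use of simple prefix matching are assumptions made up for this example.

```typescript
// Prefixes served as-is from the origin.
// Assumption: plain prefix matching, mirroring the description in the text.
const PASSTHROUGH_PREFIXES: string[] = ["/_index/", "/_next/"];

// Hypothetical helper: returns the path the CDN would request from the origin
// for a given incoming request path.
function rewritePath(pathname: string): string {
  // The landing page is served directly.
  if (pathname === "/") {
    return pathname;
  }
  // Content JSON files and nextjs assets are served directly too.
  if (PASSTHROUGH_PREFIXES.some((prefix) => pathname.startsWith(prefix))) {
    return pathname;
  }
  // Everything else gets the SPA shell.
  return "/_spa";
}
```

Under these assumptions, `rewritePath("/requests/requests.api/get")` yields `/_spa`, while anything under `/_index/` or `/_next/` passes through untouched.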
Importantly, this rewrite happens at the CDN layer, so the URL in your browser isn't actually modified (this is not a redirect). The frontend code then goes and inspects the (unchanged) URL and processes the request accordingly - in our case, it loads the corresponding JSON file from under `/_index/`.
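To make the frontend side a bit more concrete, here is a minimal sketch of how a path like `/requests/requests.api/get` could be turned into a JSON fetch. This is not the actual py.wtf code: the `DocRoute` shape, the function names, the project/module/symbol reading of the URL, and especially the `/_index/<project>.json` file layout are all assumptions for the sake of the example.

```typescript
// Hypothetical shape of a parsed documentation URL.
interface DocRoute {
  project: string;
  moduleName?: string;
  symbol?: string;
}

// Parses a browser pathname (e.g. window.location.pathname) into its parts.
// Returns null for the landing page, which has no project segment.
function parseDocRoute(pathname: string): DocRoute | null {
  const segments = pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => decodeURIComponent(segment));
  const project = segments[0];
  if (project === undefined) {
    return null;
  }
  return { project, moduleName: segments[1], symbol: segments[2] };
}

// ASSUMPTION: one JSON file per project directly under /_index/.
// The real layout of the index files may differ.
function indexUrlFor(route: DocRoute): string {
  return `/_index/${encodeURIComponent(route.project)}.json`;
}
```

In the browser this would be used roughly as `fetch(indexUrlFor(route))` after parsing `window.location.pathname`, with the module and symbol parts then selecting what to render from the loaded JSON.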
Cloudflare also supports storing static content in compressed form on the origin and decompressing it on the fly for the user, which will enable us to push down our storage costs even further. More on that [[Picking a better storage provider|later]].