-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid some unnecessary redirects #1953
Conversation
The official `git-scm.com` website is currently built by Hugo with `--baseURL http://git-scm.com/`. This commit allows changing the URL scheme to `https`, in order to avoid unnecessary redirects between HTTPS and unencrypted HTTP. Such redirects occur when one of the following URLs is requested: * URLs in `layouts/alias.html`, e.g. in <https://git-scm.com/docs/> * Image URLs in <https://git-scm.com/application.min.css> * Endpoint URLs in <https://git-scm.com/sitemap.xml> Since these URLs have the `http` scheme, the client is supposed to send the new request unencrypted, only to be told by the server to use HTTPS again. Modern web browsers tend to stick with HTTPS in certain cases, so the downgrade may not always happen, but it's better to eliminate the redirects nevertheless, for security and performance reasons. Instead of trying to use relative URLs everywhere, this commit takes a simpler approach by using the `https` scheme in the base URL. One way to do this is to enable the "Enforce HTTPS" option in the GitHub Pages settings, but it's infeasible because the `git-scm.com` domain points to Cloudflare instead of GitHub. Therefore, this commit introduces a GitHub Actions variable, `EXTERNAL_HTTPS`, which can be set to true if HTTPS is provided by a third party, so that the URL scheme can be safely overridden. This also generalizes the special case of `git-scm.com` for any domain with a similar setup, allowing tests to be run more reliably in a uniform way. See-also: c22a1a5 (deploy(playwright): work around externally-enforced HTTPS, 2024-10-07) See-also: d4f88c1 (deploy: be more careful when auto-upgrading from HTTP -> HTTPS, 2024-10-07)
Since the download pages now live in `/downloads`, update the existing links to save an extra redirect.
I fear that this part is somewhat orthogonal to the purpose of this PR, and is caused by a misconfiguration of the site. When I direct my browser to https://github.com/git/git-scm.com/settings/pages, I see: The most important bit to which I would like to draw your attention, in particular, is this part:
That troubleshooting link has a couple of hints how to diagnose and fix this, but I have no access to the DNS records (neither do I want to, heck, I thought I was done over here in git-scm.com). Unless I am mistaken, this means that people who do control the domain git-scm.com, i.e. @peff and @ttaylorr, have to have a look at (and ideally fix) this part so that HTTPS is enforced on the GitHub side and the base URL is configured correctly to use https:// instead of http:// (currently, you can see that https://git-scm.com/ incorrectly links to http://git-scm.com/images/bg/body.jpg, for example, which is a consequence of this misconfiguration). |
Hmm. I initially thought that this GitHub hadn't rechecked GitHub did re-check those records, but the end result was the same. Looking through the rest of the troubleshooting documentation, it looks like this line is important:
While GitHub makes an exception for the the apex "www" (i.e. that having So unfortunately I don't think that it is currently possible to enforce HTTPS based on my understanding of those troubleshooting docs. 😞 |
Would GitHub's "enforce https" flag do anything anyway? Users visiting git-scm.com are terminating at Cloudflare, which is then talking to GitHub (well, GitHub's CDN) on the backend and proxying/caching. And Cloudflare does issue a redirect from http to https. So from the user's perspective, everything is always going to be https. And any links generated on the site should prefer https. I don't know anything about the Hugo setup, but naively I'd think that means the baseurl should be https (most links will just be relative, of course, but it looks like all of the scss stuff uses |
I also wonder if using Cloudflare is doing much these days. GitHub Pages are served off a CDN, too. So we are just stacking caches in front of caches (whereas when we originally started using Cloudflare, it was sitting in front of a Heroku dyno that was getting sorely hammered). |
I wondered the same when @dscho and I were redirecting the Cloudflare configuration to point at the new GitHub Pages site. TBH I think that we could probably drop it for similar reasons as the ones you point out, but it seemed like an extra change amid the already-large rewrite, so I punted on it then. It might be worth reevaluating, I dunno. |
Yes. It is responsible for As a consequence, as I mentioned before, resources like |
Thank you all for looking into this. While removing the Cloudflare caches is indeed a fix for the official site, I do want to mention that GitHub’s CDN (Fastly) can be less performant than Cloudflare in certain regions (which is also my personal experience), so there is a potential risk of inadvertently slowing down the site for some of the visitors.
I’m aware of this option, and this PR is an intentional workaround for it, in the belief that not everyone is able, or willing, to fix the “misconfiguration” and let GitHub handle their domain. Being able to configure |
Sure, but that is something that can be overridden. I.e., I think that we know something that GitHub Pages does not, which is that there is an extra proxy layer sitting between the user and the hugo build. |
We do want to enforce HTTPS. And we kind of do, but not "kind of" enough for GitHub's UI at https://github.com/git/git-scm.com/settings/pages to allow us to check the box called "Enforce HTTPS". The helpful comment next to the grayed out box says: [ ] **Enforce HTTPS** --- Unavailable for your site because your domain is not properly configured to support HTTPS ([`git-scm.com`](http://git-scm.com/)) --- [Troubleshooting custom domains](https://docs.github.com/articles/troubleshooting-custom-domains/#https-errors) HTTPS provides a layer of encryption that prevents others from snooping on or tampering with traffic to your site. When HTTPS is enforced, your site will only be served over HTTPS. [Learn more about securing your GitHub Pages site with HTTPS](https://docs.github.com/pages/getting-started-with-github-pages/securing-your-github-pages-site-with-https). As discussed in git#1953, it would appear to be outside our capabilities to fix this properly, therefore we are unfortunately stuck with hard-coding a work-around: Manually override the http:// URL produced by `actions/configure-pages` with the equivalent https:// one. Signed-off-by: Johannes Schindelin <[email protected]>
Correct, we could work around that problem. For example like this: #1956 The problem with that is: It is a work-around. And honestly, if we have that problem, so do others. If I was still working embedded in GitHub, i.e. with easy access to GitHub Pages engineers, I would want to help them fix this usability bug. So the best I can do is come up with a work-around. That is not the case for everybody involved in this here thread. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just left a couple of comments, mainly to demonstrate that I care about this issue and that I am grateful that you fixed it!
Sadly, there are errors: http: URLs
These are outside of our control, though, see e.g. the first hit and the second hit. It's a bit disappointing that these slipped through the cracks in the PR build, which passed without flagging anything. |
The new checks for HTTPS enforcement found a couple of problems: git#1953 (comment) Essentially, some (quite stale) translations of the ProGit book, as well as manual pages of older Git versions, use http://git-scm.com links. And those are basically outside of our control to fix properly (in particular the manual pages of older Git versions). So let's just work around this by manually replacing them. Signed-off-by: Johannes Schindelin <[email protected]>
After merging git#1953, as per https://github.com/git/git-scm.com/actions/runs/13539600621/job/37837557415#step:3:135167 the `deploy` workflow failed. This issue should have been found in a PR build, but was not. Let's make sure that similar issues will be found early in the future, before merging the respective PR. Signed-off-by: Johannes Schindelin <[email protected]>
The new checks for HTTPS enforcement found a couple of problems: git#1953 (comment) Essentially, some (quite stale) translations of the ProGit book, as well as manual pages of older Git versions, use http://git-scm.com links. And those are basically outside of our control to fix properly (in particular the manual pages of older Git versions). So let's just work around this by manually replacing them. Signed-off-by: Johannes Schindelin <[email protected]>
@b9a1 I've fixed these issues in #1957, please review. Another riddle (one which I was not able to solve): When I run |
That was my bad. Thanks a lot for fixing this, and my sincere apologies for the inconvenience.
Seems like Cloudflare caches to me. I have the expected 157 hits for |
Oh, I did not mean to criticize you @b9a1! Sorry if I came over like that.
Good point. We do have code to drop Cloudflare's caches, but apparently it does not do what it is supposed to do. |
There is specific code in git-scm.com's deploy workflow to open a ticket whenever broken links were detected, and to close such a ticket when no broken links were detected. However, as per lycheeverse/lychee-action#265 the way we checked for this was incorrect: `env.lychee_exit_code` had was correct, until lycheeverse/lychee-action#245 broke it by way of fixing another bug. This was the reason why the broken link that was found and reported in https://github.com/git/git-scm.com/actions/runs/13544529063/job/37852881418#step:3:135319 never made it into a GitHub issue, even if that had been the intention. For the record, I worked on a fix for this broken link and opened jnavila/git-manpages-l10n#131 to incorporate that fix. What prevented this broken link from being detected before above-mentioned deployment is the fact that before git#1953 was merged, the `git-scm.com` deployment used `http://` in the base URL, and hence the `--remap` option used in the deployment workflow mapped all http://git-scm.com links to local files, whereas now the `https://` base URL is used and https://git-scm.com links are mapped, including the offending one. In any case let's fix opening/closing the "broken link" issues. Signed-off-by: Johannes Schindelin <[email protected]>
There is specific code in git-scm.com's deploy workflow to open a ticket whenever broken links were detected, and to close such a ticket when no broken links were detected. However, as per lycheeverse/lychee-action#265 the way we checked for this was incorrect: `env.lychee_exit_code` had was correct, until lycheeverse/lychee-action#245 broke it by way of fixing another bug. This was the reason why the broken link that was found and reported in https://github.com/git/git-scm.com/actions/runs/13544529063/job/37852881418#step:3:135319 never made it into a GitHub issue, even if that had been the intention. For the record, I worked on a fix for this broken link and opened jnavila/git-manpages-l10n#131 to incorporate that fix. What prevented this broken link from being detected before above-mentioned deployment is the fact that before git#1953 was merged, the `git-scm.com` deployment used `http://` in the base URL, and hence the `--remap` option used in the deployment workflow mapped all http://git-scm.com links to local files, whereas now the `https://` base URL is used and https://git-scm.com links are mapped, including the offending one. In any case let's fix opening/closing the "broken link" issues. Signed-off-by: Johannes Schindelin <[email protected]>
Changes
EXTERNAL_HTTPS
will need to be (manually) set totrue
on this repository, either in the environment scope or in the repository scope. Forks of this repository may do the same based on instructions inARCHITECTURE.md
.--baseURL
option of Hugo for the officialgit-scm.com
site will be changed fromhttp://git-scm.com/
tohttps://git-scm.com/
(with only a different scheme).git-scm.com
.git-scm.com
will be tested with HTTPS, while other domains may be tested with HTTP if they don’t support HTTPS. This differs from the current strategy of using HTTP forgit-scm.com
and HTTPS for others.Context
Currently, clicking the most prominent download button on
https://git-scm.com/
causes three navigations:https://git-scm.com/download/win
, the hyperlink destination (assuming a Windows system), which is an alias to…http://git-scm.com/downloads/win
, with anhttp
scheme, resulting in a301 Moved Permanently
to…https://git-scm.com/downloads/win
, which is the canonical URL.This PR avoids both types of redundant redirects above (HTTPS to HTTP and
download
todownloads
) so the button (and some other links mentioned in the commit message) opens the canonical URL directly. Tests are also updated to accomondate for this change.In addition, this PR might serve as a better fix for #1898, which was supposed to be solved by #1899, by always running Playwright tests with HTTPS. However, that fix requires serveral conditions to work:
git-scm.com
was resolved to the GitHub Pages server in/etc/hosts
, other domains might be tested against a third‐party gateway, which needs to be able to speak HTTPS.https://git-scm.example/book
to redirect tohttps://git-scm.example/book/en/v2
, but this redirection depends on the base URL, so at least one of the following must be true:a. The base URL already has an
https
scheme.b. The tested server enforces HTTPS, so the final URL always says
https
even if the base URL doesn’t.The official domain (
git-scm.com
) didn’t meet condition 2, thus d4f88c1. The fork domain in #1898 (ttaylorr.com
), however, met all of 1, 2 and 3b, so it was fixed, but the issue remains for other domains.Conveniently, this PR lifts all the requirements above (respectively) by:
git-scm.com
domain.EXTERNAL_HTTPS=true
).In fact, the last change alone is sufficient to fix #1898, while the first two are kept for the original purpose of bypassing caches. This also removes the need to special‐case
git-scm.com
in the workflow, thereby simplifying the deployment logic.