Immediately Hitting a Snag: The Case of the SSL Mismatch
Who would've guessed that something as simple as a hello world post could encourage new content? (Not exciting content in any way, but still.)
As I finished that short writeup, I decided to put a few extra things in place for hosting. Before I started this blog, people visiting the domain would get a near-blank error page from nginx, since I had pointed the domain at a server without setting up any content to go with it. Now that I was using it for content of some sort, some minor housekeeping was in order.
Among other things, I wanted to set up my domain with Let's Encrypt so that people could access it over `https` instead of just `http`. It was easy enough: `sudo certbot` ran fine, the prompts were common sense, and everything seemed okay.
I loaded the certificate, transferred the blog contents over to the server, checked that everything was okay, and posted it to Bluesky.
Problems arise
Almost immediately, I got a response saying that I should set up Let's Encrypt on the site. Which, for obvious reasons, confused me.
I'd just done that, it worked properly, I even saw it locally. What was the problem?
You have some sort of domain mixup, the SSL cert i get is set up for wolfygoodness and not theliru.
— n666 (@n666.bsky.social) January 1, 2025 at 7:27 PM
This confused me even more.
A bit of background: at the time of writing, I have a few websites on the same server that hosts this blog. The way I host them all differs slightly from the standard approach. I use nginx and set up a separate directory and server configuration for each site, then include the appropriate `vhost.conf` into the main configuration. The more common method seems to be fiddling with the `sites-enabled` and `sites-available` directories, but I set up the initial version of this server a long time ago, and Archlinux didn't seem to want to play nicely with the `sites-*` setup, so I just went with what made sense at the time. If anyone has a reason for me to switch to that approach, I'll take a look, but I haven't had many problems with this setup.
The main config had content organized like this:
```nginx
include /srv/http/domain1/vhost.conf;
# some sites...
include /srv/http/wolfygoodness/vhost.conf;
# a few other sites...
include /srv/http/theliru.com/vhost.conf;
```
(Side note: I'd recommend not trying to figure out what domain `wolfygoodness` is if you're on a work computer or device. It should be plainly obvious, though.)
The fact that the `wolfy` domain was the one being shown confused me. I use `domain1` for a lot of things, and I was surprised it wasn't the one that showed up, since it's included first. It was even more confusing since `wolfy` was smack dab in the middle of the domain list, with no obvious reason why it would be the domain chosen for the SSL check.
I tried it with a few of my other devices. My phone showed everything working properly, my NAS (which I had to install a browser on) worked as expected, and a small NUC that I bought for some reason also had similar behaviour. I even tried checking on my Steam Deck for the hell of it; that worked okay on my end as well.
Frustrated, I asked a few people I know to help out by visiting the domain and seeing if anything popped up. Some immediately said they were hitting problems; others said the page showed up just fine. I asked them for things like DNS info and browser versions (I initially suspected that "the browser accessing the domain is too old and doesn't support Let's Encrypt for some reason" might be a valid path of investigation, but that was quickly dropped since most of the browsers were modern), along with a few other details, trying to figure out why the error was intermittent.
Breakthrough
A lot of people tried offering suggestions, none of which managed to narrow down the issue, until someone mentioned something offhand as a debugging measure:
<[redacted]> Is it possible you're running multiple versions of the same site? One might have the old configuration.
<[redacted]> I get the following IP: 2600:[...]:a7a0
< Liru> Wait
< Liru> IPv6?
For some reason, this immediately set off alarm bells in my head. My mental model regarding nginx and the setup was all in IPv4. Feeling like I was missing something, I decided to go through the vhost configurations to see if everything matched up.
My findings could be summarized in one line:
< Liru> ...FUCKING CERTBOT FUCKED UP
Smack dab in the middle of my `wolfy` config was this innocuous line:

```nginx
listen [::]:443 ssl ipv6only=on; # Managed by Certbot
```

Since I copy-pasted most of my config options from `domain1`, none of my other `vhost.conf` files had that line. It seemed to be the only IPv6 `listen` rule, which is likely why any connection to the server over IPv6 defaulted to that domain.
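If I'm understanding nginx correctly, it picks a server block in two steps: first by the address and port the connection arrived on, then by the requested name (the SNI name, for HTTPS) among only the blocks listening on that socket, falling back to that socket's default server (the first block defined, unless one is marked `default_server`). Here's a rough Python model of that, under those assumptions. It's a simplification rather than nginx's actual logic, and the `ServerBlock` type, the `pick_server` helper, and the `.example` site names are hypothetical stand-ins for my setup.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServerBlock:
    """A simplified nginx server block: its names and the sockets it listens on."""
    names: list[str]
    listens: list[str]  # e.g. "0.0.0.0:443" or "[::]:443"
    default_server: bool = False


def pick_server(blocks: list[ServerBlock], listen_addr: str, sni: str) -> ServerBlock | None:
    """Filter by listening socket, then match the SNI name, else use the socket's default."""
    # Only blocks listening on the socket the connection arrived on are candidates.
    candidates = [b for b in blocks if listen_addr in b.listens]
    if not candidates:
        return None  # nothing listening there: the connection would be refused
    for block in candidates:
        if sni in block.names:
            return block
    # No name match: the block marked default_server wins, else the first one defined.
    for block in candidates:
        if block.default_server:
            return block
    return candidates[0]


# Hypothetical version of my setup before the fix: only wolfygoodness listens on IPv6.
sites = [
    ServerBlock(["domain1.example"], ["0.0.0.0:443"]),
    ServerBlock(["wolfygoodness.example"], ["0.0.0.0:443", "[::]:443"]),
    ServerBlock(["theliru.com"], ["0.0.0.0:443"]),
]
print(pick_server(sites, "0.0.0.0:443", "theliru.com").names)  # ['theliru.com']
print(pick_server(sites, "[::]:443", "theliru.com").names)     # ['wolfygoodness.example']
```

In this model, the IPv6 socket has exactly one candidate, so the requested name never gets a say: whatever you ask for, you get the `wolfygoodness` block and its certificate.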
Trying to add that line to the blog's configuration as well gave an error saying that the port couldn't be bound. To fix that, I removed the `ipv6only=on` part of the definition. To be honest, I'm not sure that's the best course of action, but doing so made the blog accessible, as confirmed by multiple people.
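To catch this kind of thing earlier, a small script could scan each `vhost.conf` and report which sites have no IPv6 `listen` on port 443, and which ones set `ipv6only`. My understanding (an assumption, not something I verified here) is that nginx accepts options like `ipv6only` only once per address and port, which would explain why copying the line verbatim failed. This is a regex-based sketch that ignores `include`d snippets and unusual formatting; `ipv6_listeners`, `audit`, and `load_vhosts` are hypothetical helpers, and `load_vhosts` assumes the `/srv/http/<site>/vhost.conf` layout from my main config.

```python
import re
from pathlib import Path

# Captures the address and the remaining options of a directive like "listen [::]:443 ssl;"
LISTEN_RE = re.compile(r"^\s*listen\s+([^\s;]+)([^;]*);", re.MULTILINE)


def ipv6_listeners(conf_text: str) -> list[tuple[str, str]]:
    """Return (address, options) for every IPv6 listen directive in a config."""
    return [
        (addr, opts.strip())
        for addr, opts in LISTEN_RE.findall(conf_text)
        if addr.startswith("[")  # IPv6 addresses are written in brackets
    ]


def audit(vhosts: dict[str, str], port: str = "443") -> dict[str, list[str]]:
    """Report sites with no IPv6 listener on `port`, and sites that set ipv6only."""
    missing, sets_ipv6only = [], []
    for site, text in vhosts.items():
        listeners = [entry for entry in ipv6_listeners(text) if entry[0].endswith(f"]:{port}")]
        if not listeners:
            missing.append(site)
        if any("ipv6only" in opts for _, opts in listeners):
            sets_ipv6only.append(site)
    return {"missing_ipv6": missing, "sets_ipv6only": sets_ipv6only}


def load_vhosts(root: str = "/srv/http") -> dict[str, str]:
    """Read every <root>/<site>/vhost.conf into a {site: config text} mapping."""
    return {p.parent.name: p.read_text() for p in Path(root).glob("*/vhost.conf")}
```

Running `audit(load_vhosts())` against a setup like mine before the fix should list every site except `wolfygoodness` under `missing_ipv6`, and only `wolfygoodness` under `sets_ipv6only`.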
But why, though?
After that, I looked into why some people could access the domain before the fix while others couldn't. The differing factor had to be IPv6 somehow, so the answer had to lie there.
I knew for a fact that the Framework laptop I currently use supports IPv6. Not from checking the Linux kernel for support or anything like that; I distinctly remember messing around with IPv6 addresses while helping set up a friend's home server, and seeing IPv6 addresses for several sites on my own laptop.
The DNS records for this site seemed okay, since I added it to my host's DNS records relatively recently. Checking `wolfygoodness`, though...
< Liru> ...wait, wolfygoodness has no ipv6 dns record
< Liru> Was I fucking saved from this because my ISP is incompetent and doesn't support IPv6?
Foreshadowing is a literary device in which a writer gives a subtle hint of what is to come later in the story.
Looking into things a bit more, I found that none of the other domains have an IPv6 address in their DNS records. This could be an artifact of age, as this domain is relatively new and the others go back a long time. It could be that the record generator didn't add IPv6, or that the host I use didn't support it at the time.
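If you'd rather check this without digging through your DNS host's panel, something like the following Python sketch asks the system resolver for IPv4 and IPv6 addresses separately, which roughly corresponds to looking for A and AAAA records. `address_families` is a hypothetical helper, and the `resolver` parameter only exists so it can be tested without a network.

```python
import socket
from typing import Callable


def address_families(host: str, resolver: Callable[..., list] = socket.getaddrinfo) -> set[str]:
    """Return which of "IPv4"/"IPv6" the host resolves to (roughly: A vs AAAA records)."""
    families = set()
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            if resolver(host, 443, family, socket.SOCK_STREAM):
                families.add(label)
        except socket.gaierror:
            pass  # no address of this family (or the lookup itself failed)
    return families


for domain in ("theliru.com",):  # add your other domains here
    print(domain, sorted(address_families(domain)))
```

A domain that only comes back with `['IPv4']` is one that IPv6 visitors will still reach over IPv4, which turns out to matter below.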
I was still kind of confused, especially since I remembered that a lot of the people who helped check my site also visit some of my other domains occasionally. Those domains never showed them the warning. Why would only this one affect them?
Why didn't it affect me, when I had IPv6 as well?
...did you ever have a moment when you wrote something as a joke and it didn't pop into your head as a serious thought until a few minutes later?
My ISP is incompetent and doesn't support IPv6
Immediately, I opened a terminal window.
```
liru@laptop $ curl https://theliru.com -vI6
* Trying 2600:[...]:a7a0:443...
* Immediate connect fail for 2600:[...]:a7a0: Network is unreachable
* Closing connection 0
curl: (7) Couldn't connect to server
```
Checking another server I have...
```
[liru@server ~]$ curl https://theliru.com -I6
HTTP/1.1 200 OK
Server: nginx/1.26.2
Date: [...]
Content-Type: text/html
Connection: keep-alive
X-Clacks-Overhead: GNU Terry Pratchett
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
```
My server has IPv6 connectivity, but my home network and my phone service provider don't.
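For anyone wanting to run the same check from a script, here's a rough Python stand-in for what `curl -6` told me: it tries a plain TCP connection using only one address family. `can_connect` is a hypothetical helper of my own, and it stops at the TCP handshake, so it won't notice certificate problems like the one in this post; it only answers "can this machine reach the host over IPv4/IPv6 at all?"

```python
import socket


def can_connect(host: str, port: int, family: socket.AddressFamily, timeout: float = 3.0) -> bool:
    """Try a TCP connection to host:port using only the given address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name has no address of this family
    for fam, socktype, proto, _, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            continue  # e.g. "Network is unreachable" on a network without IPv6
    return False
```

On my laptop, `can_connect("theliru.com", 443, socket.AF_INET6)` would be expected to return `False` for the same reason curl failed, while the server should get `True`.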
Looking it up online, it seems to be a bit of a theme. While a few ISPs here support it, it turns out that one of the biggest ones in Canada (the one I'm with) doesn't, and they have no plans to support it. Calling and asking about supporting it gives me a standard corporate "thank you for your feedback" response, which unfortunately doesn't bode well.
What I'm assuming happened
When someone on IPv4 requested the site, the flow was relatively smooth. The browser got the IPv4 (and IPv6?) address of the server, chose the IPv4 address, and sent its request; the server checked the IP and the requested domain and served the proper content.
When someone with IPv6 connectivity requested it, though, there was a hitch: the server's IPv4 and IPv6 addresses were both returned, the IPv6 address was chosen to connect, and the server saw that only one domain's configuration listened on IPv6 and handed the request to that configuration. Naturally, that wasn't the domain the browser was supposed to reach, which threw a wrench into the plans.
This also explained why none of my devices showed the issue: the deciding factor was the network connection, and since none of my networks supported IPv6, all of them behaved exactly the same way.
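To sanity-check that explanation, here's a toy Python model of the client side: the visitor uses IPv6 only if both their network and the domain's DNS support it, and (per the setup before the fix) every IPv6 connection lands on the `wolfygoodness` block. Everything here is a hypothetical simplification; real browsers use Happy Eyeballs, which races IPv6 and IPv4 instead of strictly preferring one.

```python
from __future__ import annotations

# Hypothetical snapshot of the setup before the fix: the wolfygoodness block was
# the only one listening on IPv6, so it answered every IPv6 connection.
ONLY_IPV6_SITE = "wolfygoodness"


def chosen_family(dns: set[str], client_has_ipv6: bool) -> str | None:
    """Simplified client: use IPv6 when both the client and DNS have it, else IPv4."""
    if client_has_ipv6 and "IPv6" in dns:
        return "IPv6"
    if "IPv4" in dns:
        return "IPv4"
    return None  # no address the client can use


def cert_seen(requested: str, dns: set[str], client_has_ipv6: bool) -> str | None:
    """Which site's certificate a visitor would get under this model."""
    family = chosen_family(dns, client_has_ipv6)
    if family is None:
        return None
    return ONLY_IPV6_SITE if family == "IPv6" else requested


# Me, on an ISP without IPv6: falls back to IPv4 and sees the right certificate.
print(cert_seen("theliru.com", {"IPv4", "IPv6"}, client_has_ipv6=False))  # theliru.com
# A visitor with IPv6: connects over IPv6 and gets the wolfygoodness certificate.
print(cert_seen("theliru.com", {"IPv4", "IPv6"}, client_has_ipv6=True))   # wolfygoodness
# Older domains with no AAAA record: even IPv6 visitors end up on IPv4.
print(cert_seen("domain1", {"IPv4"}, client_has_ipv6=True))               # domain1
```

Under the same model, removing the AAAA record for this domain would turn the second case into the third one, which matches my guess below, though the model is far too simple to prove it.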
I'm slightly curious if deleting the IPv6 address from this domain's DNS records would have fixed things, to be honest. My assumption is that it would, but I'm not sure.
One of the bigger concerns is that I'm not sure why or when Certbot added that line into the vhost configuration. I wouldn't be surprised if there were other issues online talking about it, but I feel that I duct taped it to a point that I'm okay with it, so I'll leave it alone for now.
In a related manner, though...
It’s not DNS
There’s no way it’s DNS
It was DNS
-- Dave Chappelle, according to ChatGPT
I've got to say, investigating this definitely beat the alternative option for New Year's Day, but that's a story for another post, since "I could have died" is a whole separate thing that people seem to make a big deal out of. Not sure when that may come.