AT&T’s random DSL configuration changes begin!
I’ve been using AT&T (formerly BellSouth) for my DSL service for a few months now. Sunday night, December 23rd, my DSL connection went down at 8:03 PM. Of course, I didn’t notice until a bit after 10 PM. How do I know so precisely when it happened? I run pfSense on my Soekris net4801 (router), and it has a Link Quality graph that I can call up any time I want to see what sort of latency I’m seeing on my next hop, or if I’m experiencing packet loss to said next hop.
I’m a believer in running PPPoE directly from my router (which means running the DSL modem in bridge mode). I feel like I have a better view into what’s happening this way. Perhaps I’m just a control freak.
My first troubleshooting measure was to power-cycle the modem and reboot my router. What could it hurt, as I was already down? Didn’t help.
So, I moved the network cable so that my Mac was plugged straight into the DSL modem, which let me get to its config pages, and reconfigured the modem to perform PPPoE itself. With that, my Internet connection was up and running.
Thinking that perhaps the outage that I was experiencing had just ended, I reversed the process, setting the modem back to bridge mode, hooking my pfSense box back up, and waiting a bit for PPPoE to establish itself. I waited, but it didn’t come back up.
I reversed the process again, going back to PPPoE mode on the DSL modem, and I was up and running again, immediately.
Is this some sort of plot by AT&T to ensure that only AT&T hardware can connect via PPPoE, I wondered… So, I tested that theory… After changing my DSL modem BACK to bridge mode, I left my Mac attached to it, and set the Mac up for PPPoE, ensuring that I copied the username/password directly out of pfSense (which had been working fine until 8:03 PM). Up and running! So, my Mac could do PPPoE direct. I ran a few speed tests and all seemed fine.
I moved the PPPoE back over to pfSense, but there was still no joy.
Stumped at this odd situation, I figured that instead of wasting any more time that night (it was about midnight by this point), I’d post my experience on the pfSense forum to see if anyone else had run across this issue. I ended up running with PPPoE on the DSL modem and pfSense set for DHCP mode.
The next morning, no reply, so I simply added a few lines regarding the fact that my Link Quality graph wasn’t much use, since it would now be monitoring a next hop of my DSL modem. (I don’t expect my patch cords to go bad very frequently.) By that night, the lead developer replied, pointing me to a post that allowed me to set the address that gets monitored by the Link Quality graphs, enabling me to continue running in DHCP mode, but with a useful graph.
Many days go by and I happen across the forum again, only to see that numerous others are now experiencing the very same PPPoE issue that I had experienced. Other forum members start digging into the problem, but things seem slow going, so I investigate their findings and start looking at source code…
It turned out that AT&T had modified their configuration to hand out only a single DNS server. MPD is the PPPoE client in pfSense, and it contains a bug: when the PPPoE server rejects MPD’s request for a secondary DNS server, MPD believes that the IP Address field is what was rejected.
After a bit of back-and-forth on what we should do, the lead developer pipes up again about a hidden configuration option which configures MPD to not request a secondary DNS server.
To see the specifics on the hidden config option, go here.
If MPD doesn’t ask for a secondary DNS server, it can’t get rejected, so the “buggy” code never gets hit. That turned out to be a good short-term fix for this issue, as I’m up and running now on PPPoE again.
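To make the distinction concrete, here is a rough Python sketch of how a PPP client could read an IPCP Configure-Reject and drop only the options the server actually refused. This is not MPD’s code, and the function names and the hand-built packet are made up for illustration. The option numbers (3 for IP-Address, 129 for primary DNS, 131 for secondary DNS) and the Configure-Reject code (4) come from RFC 1332 and RFC 1877.

```python
import struct

# IPCP option type numbers from RFC 1332 and RFC 1877.
IPCP_OPTIONS = {
    3: "ip-address",
    129: "primary-dns",
    131: "secondary-dns",
}
CONFIGURE_REJECT = 4  # PPP control-protocol code for Configure-Reject


def rejected_options(packet: bytes) -> list[str]:
    """Return the names of the options listed in an IPCP Configure-Reject."""
    # Header: code (1 byte), identifier (1 byte), total length (2 bytes).
    code, _ident, length = struct.unpack("!BBH", packet[:4])
    if code != CONFIGURE_REJECT:
        raise ValueError("not a Configure-Reject packet")
    names = []
    offset = 4
    while offset + 2 <= length:
        # Each option: type (1 byte), length (1 byte, includes these two), data.
        opt_type, opt_len = packet[offset], packet[offset + 1]
        if opt_len < 2 or offset + opt_len > length:
            raise ValueError("malformed option")
        names.append(IPCP_OPTIONS.get(opt_type, f"unknown-{opt_type}"))
        offset += opt_len
    return names


def next_request(wanted: list[str], rejected: list[str]) -> list[str]:
    """Drop only the rejected options; everything else is requested again."""
    return [name for name in wanted if name not in rejected]


# Hypothetical reject that refuses only the secondary DNS option (type 131).
reject = struct.pack("!BBH", CONFIGURE_REJECT, 1, 10) + bytes([131, 6, 0, 0, 0, 0])
wanted = ["ip-address", "primary-dns", "secondary-dns"]

print(rejected_options(reject))                        # ['secondary-dns']
print(next_request(wanted, rejected_options(reject)))  # ['ip-address', 'primary-dns']
```

A client that handles the reject this way keeps asking for its IP address and primary DNS server. The hidden option sidesteps the whole path by never putting "secondary-dns" in the wanted list to begin with.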
Of course, no one at AT&T could even admit that some configuration change actually had happened.
Note that it looks as if AT&T is still rolling this change out, as people keep coming forward every few days on the pfSense forum stating that they have also run into this issue.
I’m guessing that this has something to do with their horrific DNS server situation. The DNS servers that AT&T inherited from BellSouth have been the culprits behind numerous issues that my employer has encountered. If your machine is configured with multiple DNS servers, it will query all of them to resolve every DNS request. It takes the first reply and runs with that address. If each customer is handed only one DNS server, requests from each customer will hit a single server rather than a pair, so each server should see less traffic if they split it up properly. This is a short-sighted fix, though, since the one DNS server that a customer is handed suddenly becomes much more important to keep running all the time. In this case, the failure of one DNS server won’t partially affect a large number of customers, but will instead (essentially) take down the internet connection completely for all the customers unlucky enough to talk to that DNS server.
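A back-of-the-envelope sketch of that trade-off, in Python. The customer and server counts are made up, and the round-robin assignment is just an assumption for illustration. I have no idea how AT&T actually distributes customers across its DNS servers.

```python
def outage_impact(customers, servers, per_customer, failed_server=0):
    """Count customers left fully down or merely degraded when one server fails.

    Customers are assigned servers round-robin (an assumption for illustration).
    """
    down = degraded = 0
    for c in range(customers):
        assigned = {(c + k) % servers for k in range(per_customer)}
        if assigned == {failed_server}:
            down += 1        # the only server this customer knows about is gone
        elif failed_server in assigned:
            degraded += 1    # one of several servers is gone; lookups still work
    return down, degraded


# Hypothetical numbers: 1000 customers spread over 10 DNS servers.
print(outage_impact(1000, 10, per_customer=1))  # (100, 0)
print(outage_impact(1000, 10, per_customer=2))  # (0, 200)
```

With one server per customer, a single failure leaves 100 customers with no working DNS at all. With two per customer, 200 customers lose one of their servers but nobody is completely down.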
Me? I use OpenDNS.com, so AT&T can do whatever they want with their crummy DNS servers.