Dealing with evil ads
Posted on January 10, 2021 with tags tech.
Ads might be needed, but they don't need to be revolting
Background
I usually don’t mind ads, as long as they are not very intrusive. I get that the current media model is basically ad-funded, and that unless I want to pay $1/month or so to 50 web sites, I have to accept ads, so I don’t run an ad-blocker.
Sure, sometimes they are annoying (hey YT, mid-roll ads are borderline), but I’ve also seen many good ads, as in interesting or even funny. Well, I don’t think I ever bought anything as a direct result of ads, so I don’t know how useful ads are for the companies, but hey, what do I care.
Except… there are a few ad networks that run what I would say are basically revolting ads. Things I don’t want to ever accidentally see while eating, or things that really make you go WTF? Maybe you know them, maybe you don’t, but I guess there are people who don’t know how to clean their ears, or people for whom a fast 7 day weight loss routine actually works.
Thankfully, most of the time I don’t browse sites which use these networks, but randomly they do “leak” even to sites I do browse. If I’m not very stressed already, I can ignore them, otherwise they really, really annoy me.
Case in point, I was on Slashdot, and because I was logged on and recently had mod points, the right side column had a check-box “disable ads”. That sidebar had some relatively meaningful ads, like a VPN subscription (not that I would use it, but it is a tech thing), or even a book about Kali Linux, etc. So I click “disable ads”, and the right column goes away. I scroll down happily, only to be met, at the bottom, by “best way to clean your ear”, “the 50 most useless planes ever built” (which had a drawing of something that was for sure never ever built outside of movies), “you won’t believe how this child actor looks today”, etc.
Solving the problem
The above really, really pissed me off, so I went to search for how to block such ads.
Method 1: hosts file
The hosts file is reasonable as it is relatively cross-platform (Linux and Windows and Mac, I think), but how the heck do you edit hosts on your phone?
And furthermore, it has some significant downsides.
First, /etc/hosts lists individual hosts, so for an entire ad network, the example I found ran to two screens of host names. This is really unmaintainable, since rotating host names, or having a gazillion of them, is trivial for the network.
Second, it cannot return negative answers. I.e. you have to give each of those hosts a valid IPv4/IPv6 address, and have something at that address either reply with 404 or another 4xx response, or not listen on port 80/443 at all. Too annoying.
And finally, it’s a client-side solution, so one would have to replicate it across all clients in a home, and keep it in sync.
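To make the maintenance problem concrete, here is a minimal Python sketch of what generating such a hosts file involves. The host names and the loopback sink addresses are assumptions for illustration only; the point is that the hosts format has no wildcards, so every single name needs its own lines.

```python
def render_hosts_entries(hostnames, sink_v4="127.0.0.1", sink_v6="::1"):
    """Return hosts-file lines mapping every name to a local sink address.

    The sink addresses are illustrative defaults, not a recommendation.
    """
    lines = []
    for name in sorted(set(hostnames)):
        # One line per address family, since the file needs a concrete address.
        lines.append(f"{sink_v4} {name}")
        lines.append(f"{sink_v6} {name}")
    return lines


# Hypothetical host names; a real ad network would expose far more of them.
AD_HOSTS = ["evilads.com", "www.evilads.com", "cdn1.evilads.com", "cdn2.evilads.com"]

if __name__ == "__main__":
    print("\n".join(render_hosts_entries(AD_HOSTS)))
```

Four names already produce eight lines, and a new subdomain on the ad network's side means regenerating and redistributing the file to every client.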
Method 2: ad-blockers
I dislike ad-blockers on principle, since they need wide permissions on all pages, but it is a recommended solution. However, to my surprise, one finds threads saying ad-blocker foo has whitelisted ad network bar, at which point you’re WTF? Why do I use an ad-blocker if they get paid by the lowest of the ad networks to show the ads?
And again, it’s a client-side solution, and one would have to deploy it across N clients, and keep them in sync, etc.
Method 3: HTTP proxy blocking
To my surprise, I didn’t find this mentioned in a quick internet search. Well, HTTP proxies have long gone the way of the dodo due to “HTTPS everywhere”, and while one can still use them even with HTTPS, it’s not that convenient:
- you need to tunnel all traffic through them, which might result in bottlenecks (especially for media playing/maybe video-conference/etc.).
- or even worse, there might be protocol issues/incompatibilities due to 100% tunneling.
- running a proxy opens up some potential security issues on the internal network, so you need to harden the proxy as well, and maintain it.
- you need to configure all clients to know about the proxy (via DHCP or manually), which might or might not work well, since it’s client-dependent.
- you can only block at CONNECT level (host name), and you have to build up regexes for the host name.
On the good side, the actual blocking configuration is centralised, and the only distributed configuration is pointing the clients through the proxy.
I used to run a proxy back in HTTP times, when the gains were significant (media elements caching, downloads caching, all with a slow pipe, etc.), but today it is not worth it, so I’ve stopped and won’t bring a proxy back just for this.
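For the curious, the host-name regex part of the proxy approach could look roughly like the Python sketch below. It is not tied to any particular proxy software; the domain list and the function name are made up for illustration, and a real proxy would have its own ACL syntax.

```python
import re

# Hypothetical ad domains; re.escape makes the dots match literally.
BLOCKED_DOMAINS = ["evilads.com", "moreads.net"]

# Match the domain itself or any subdomain, but not look-alike names.
BLOCKED_RE = re.compile(
    r"^(?:[a-z0-9-]+\.)*(?:" + "|".join(re.escape(d) for d in BLOCKED_DOMAINS) + r")$",
    re.IGNORECASE,
)


def is_blocked_connect(target):
    """Decide whether a CONNECT target such as 'host:443' should be refused."""
    # Drop the port (if any) and a trailing dot before matching the host name.
    host = target.rsplit(":", 1)[0].rstrip(".")
    return BLOCKED_RE.match(host) is not None
```

Since HTTPS hides everything except the host name in the CONNECT request, this is about as fine-grained as such a filter can get.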
Method 4: DNS resolver filtering
After thinking through all the options, I thought - hey, a caching/recursive DNS resolver is what most people with a local network run, right? How difficult is it to block at resolver level?
… and oh my, it is so trivial, for some resolvers at least. And yes, I didn’t know about this a week ago 😅
Response Policy Zones
Now, of course, there is a standard for this, called Response Policy Zone (RPZ), which is supported across multiple resolvers. There are many tutorials on how to use RPZs to configure things, some of them quite detailed, others simpler and more straightforward.
The upstream BIND documentation also explains things quite well, so you can go that route as well. It looks a bit hairy to me though, but it works, and since it is a standard, it can be more easily deployed.
There are many discussions on the internet about how to configure RPZs, how to not even resolve the names (if you’re going to return something explicitly/statically), etc. so there are docs, but again it seems a bit overdone.
Resolver hooks
There’s another way too, if your resolver allows scripting. For example, the PowerDNS resolver allows Lua scripting, and has a relatively simple API. At least, to me it looks way, way simpler than the RPZ equivalent.
After 20 minutes of reading the docs, I ended up with this, IMO trivial, solution (in a file named e.g. rules.lua):
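-- Suffix-match set holding the domains to block
ads = newDS()
ads:add({'evilads.com', 'evilads.well-known-cdn.com', 'moreads.net'})

-- Called by the resolver before it resolves a query
function preresolve(dq)
    if ads:check(dq.qname) then
        -- Answer NXDOMAIN ourselves; true means the query was handled
        dq.rcode = pdns.NXDOMAIN
        return true
    end
    -- Not an ad domain: let normal resolution proceed
    return false
end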
… and that’s it. Well, enable it/load the file in the configuration, but nothing else. Syntax is pretty straightforward, matching by suffix here, and if you need more complex stuff, you can of course do it; it’s just Lua and a simple API.
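To show what “matching by suffix” means here, the Python sketch below mimics the idea with a toy class. It is only a stand-in written for illustration, not the PowerDNS implementation, and the class and function names are my own invention: a query name matches if it equals a listed domain or ends with it on a label boundary.

```python
def normalize(name):
    """Lower-case a DNS name and split it into labels, ignoring a trailing dot."""
    return name.rstrip(".").lower().split(".")


class SuffixSet:
    """Toy suffix-match set (hypothetical, for illustration only)."""

    def __init__(self, domains=()):
        self._suffixes = set()
        for domain in domains:
            self.add(domain)

    def add(self, domain):
        self._suffixes.add(tuple(normalize(domain)))

    def check(self, qname):
        labels = normalize(qname)
        # Try every label-aligned suffix: a.b.c, then b.c, then c.
        return any(tuple(labels[i:]) in self._suffixes for i in range(len(labels)))


ads = SuffixSet(["evilads.com", "evilads.well-known-cdn.com", "moreads.net"])
```

With this, `ads.check("tracker.evilads.com.")` is true, while `ads.check("notevilads.com")` and `ads.check("well-known-cdn.com")` are false, because matching happens on whole labels and not on raw string endings.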
I don’t see any immediate equivalent in BIND, so there’s that, but if you can use PowerDNS, then the above solution seems simple for simple cases, and could be extended if needed (not sure in which cases).
The only other thing one needs to do is to serve the local/custom resolver to all clients, whether desktop or mobile, and that’s it. The DNS server is bread-and-butter in DHCP, so it has better support than a proxy, and once the host name has been (mis)resolved, nothing else is involved in the communication path. True, your name server might get higher CPU usage, but for a home network, this should not be a problem.
Can this filtering method (either RPZ or hooks) be worked around by ad networks? Sure, like anything. But changing the base domain is not fun. DNSSEC might break it (note BIND RPZ can be configured to ignore DNSSEC), but I’m more worried about DNS-over-HTTPS, which I initially thought was done for the user, but now I’m not so sure anymore. Not being in control even of your own DNS resolver seems… evil 😈, but what do I know.
Combined authoritative + recursive solution
This solution was provided by Guillem Jover, who uses unbound, which is a combined authoritative name server and recursive resolver in one, and dnsmasq (which is even more things, I think):
For my LANs I use unbound, and then block this kind of thing in /etc/unbound/unbound.conf.d/block.conf, with stuff like:
server:
    local-zone: adsite.example.com refuse
But then for things that are mobile, and might get out of the LAN, such as laptops, I also block with dnsmasq in /etc/dnsmasq.d/block.conf, with stuff like:
address=/adsite.example.com/
I still use ublock-origin to block stuff at the browser level, though, for yet an extra layer of noise suppression. :)
Thanks for the info!
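If you go this route on several machines, keeping the two block files in sync by hand gets old quickly. A minimal Python sketch, assuming a single shared list of domains, could render both snippets in the same form as quoted above; the function names and the example domains are hypothetical, and writing the output to the two block.conf paths is left out.

```python
def unbound_block(domains):
    """Render an unbound snippet with one 'refuse' local-zone per domain."""
    lines = ["server:"]
    lines += [f"    local-zone: {d} refuse" for d in sorted(set(domains))]
    return "\n".join(lines) + "\n"


def dnsmasq_block(domains):
    """Render a dnsmasq snippet with one address= line per domain."""
    return "".join(f"address=/{d}/\n" for d in sorted(set(domains)))


# Hypothetical shared block list.
BLOCKED = ["adsite.example.com", "tracker.example.net"]

if __name__ == "__main__":
    print(unbound_block(BLOCKED))
    print(dnsmasq_block(BLOCKED))
```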
Happy browsing!
10 lines of Lua, and now for sure I’m going to get even fatter without the “this natural method will melt your belly fat in 7 days” information. Or I will just throw away banana peels without knowing what I could do with them.
After a few days, I asked myself “but ads are not so bad, why did I…” and then realised that yes, ads are not so bad anymore. And Slashdot actually loads faster 😜
So, happy browsing!