How to Block Annoying Web Advertising for Free:
This information is relevant to Windows-based PCs. Macs and Unix-based machines work on the same principles, but the details may vary.
Contents:
- The hosts file and domain lookups
- Editing the hosts file
- Finding what servers to block
- Making it pretty
- Screwing with the hackers
Every computer connected to the Internet or some other network is identified by an IP address; this is how computers find one another. People are generally not interested in a computer's IP address. They care about names that make sense to them. For example, "rateyourmusic.com" makes more sense to a human than 216.235.149.93 (the IP address of rateyourmusic's main web server).
When you type http://www.rateyourmusic.com into your web browser, your computer needs to know where it can find the rateyourmusic web server. The first place it looks is the "hosts" file. If it does not find the name there, it moves up the chain and asks another computer (usually your ISP's DNS server) whether it knows where rateyourmusic is. Your ISP then checks its own form of a "hosts file" to see if it knows where rateyourmusic is, and if not, it asks further up the chain, and so on.
But because you have your own hosts file on your computer, you get the first say about where you go when the browser asks for a web page. My hosts file is located in the C:\WINNT\system32\drivers\etc\ folder of my Windows 2000 machine. It is a hidden file and pretty well protected, so you must have administrator privileges to edit it. A hosts file is a simple text file of nothing but IP addresses and domains, and looks something like this:
127.0.0.1 localhost
127.0.0.1 1000files.com
127.0.0.1 101order.com
216.235.149.93 rateyourmusic.com
It simply matches an IP address to a domain, one per line. Using the example above, when your computer looks for rateyourmusic.com it finds that the IP address it needs is 216.235.149.93, and the browser goes directly to that address without having your ISP look it up.
Important: The IP address 127.0.0.1 is a reserved IP. This is how every computer "talks to itself". Most hosts files, when you first open them, contain only this line:
127.0.0.1 localhost
This line should never be modified.
So what's the deal with these two lines?
127.0.0.1 1000files.com
127.0.0.1 101order.com
This is how the blocking works. These two suspicious ad servers (1000files.com and 101order.com) are cut off from you because your hosts file maps those names to your own machine. Any file requested from those two domains never arrives, because your computer is asking itself for it. Pretty nifty, eh?
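The lookup order described above (hosts file first, then your ISP's DNS server) and the 127.0.0.1 trick can be modelled in a few lines of Python. This is a simplified sketch, not how Windows actually resolves names: parse_hosts and resolve are hypothetical helpers, and the upstream "DNS server" is just a dictionary standing in for a real query.

```python
from typing import Callable, Dict, Optional

HOSTS = """\
127.0.0.1 localhost
127.0.0.1 1000files.com
127.0.0.1 101order.com
216.235.149.93 rateyourmusic.com
"""

def parse_hosts(text: str) -> Dict[str, str]:
    """Map each host name in hosts-file text to its IP address.

    Text after '#' is a comment, and any run of spaces or tabs
    separates the IP address from the name(s) on the same line.
    """
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # This sketch keeps the first entry it sees for a name.
            table.setdefault(name.lower(), ip)
    return table

def resolve(name: str, hosts: Dict[str, str],
            ask_upstream: Callable[[str], Optional[str]]) -> Optional[str]:
    """Consult the local hosts table first; only ask upstream on a miss."""
    ip = hosts.get(name.lower())
    return ip if ip is not None else ask_upstream(name)

if __name__ == "__main__":
    hosts = parse_hosts(HOSTS)
    upstream = {"other-site.example": "192.0.2.10"}  # stand-in for the ISP's DNS
    print(resolve("101order.com", hosts, upstream.get))           # 127.0.0.1 (blocked)
    print(resolve("other-site.example", hosts, upstream.get))     # 192.0.2.10
    print(resolve("www.rateyourmusic.com", hosts, upstream.get))  # None: exact names only
```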
Now that you know the secret, rip open your text editor and get to it. Make a backup copy before you start editing! The spacing between the IP address and the domain does not matter, as long as there is at least one space or tab between them.
I actually put a shortcut to my text editor and hosts file right on my desktop for easy access. The shortcut's target is:
"C:\Utilities\TextPad 4\TextPad.exe" C:\WINNT\system32\drivers\etc\hosts
There are many sites on the Internet that provide ready-made lists of known ad servers. A really good one is at pgl.yoyo.org. You can paste these lists into your existing hosts file or use one to replace it completely (make a backup first!).
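Merging a downloaded list into your own hosts file, with the backup step recommended above, might look like the Python sketch below. It assumes the list has already been saved locally in hosts format (one IP and name per line); the function names and the ".bak" suffix are illustrative choices, not part of any tool mentioned here. It leaves your existing lines untouched, including 127.0.0.1 localhost, and only appends names you don't already have, each mapped to 127.0.0.1.

```python
import shutil
from pathlib import Path
from typing import List, Set

def known_names(text: str) -> Set[str]:
    """Every host name already listed in hosts-file text (comments ignored)."""
    names: Set[str] = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        names.update(name.lower() for name in fields[1:])
    return names

def new_entries(existing: str, block_list: str) -> List[str]:
    """'127.0.0.1 name' lines for block-list names not already in `existing`."""
    seen = known_names(existing)
    seen.add("localhost")  # never touch the loopback entry
    entries: List[str] = []
    for line in block_list.splitlines():
        for name in line.split("#", 1)[0].split()[1:]:
            if name.lower() not in seen:
                seen.add(name.lower())
                entries.append(f"127.0.0.1 {name}")
    return entries

def update_hosts_file(hosts_path: Path, block_list_path: Path) -> int:
    """Back up the hosts file to hosts.bak, append new entries, return the count."""
    original = hosts_path.read_text()
    entries = new_entries(original, block_list_path.read_text())
    # Back up first, before anything is written.
    shutil.copy2(hosts_path, hosts_path.with_name(hosts_path.name + ".bak"))
    if entries:
        separator = "" if original.endswith("\n") or not original else "\n"
        hosts_path.write_text(original + separator + "\n".join(entries) + "\n")
    return len(entries)
```

On the Windows 2000 machine described above you would call update_hosts_file(Path(r"C:\WINNT\system32\drivers\etc\hosts"), Path("yoyo-list.txt")) from an account with administrator privileges, where "yoyo-list.txt" is a hypothetical name for the saved list. Appending rather than rewriting keeps your hand-made entries exactly as they were.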
So you visit a website and one of those really annoying flashing, jiggling, music-playing ad banners appears, telling you that you're their one millionth visitor and have won a rubber duck. What do you do now?
Start by right-clicking the banner itself and checking the image properties. Find the address (URL) of the image and extract the domain from it. Usually this is everything after "http://" and before the next "/". For example, the domain to block in this URL: http://www.consumersearch.com/www/csgraphics/gntoday468x60.gif is "www.consumersearch.com".
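The "everything after http:// and before the next /" rule is what Python's standard urllib.parse.urlsplit does, and its hostname attribute also drops any port number and lower-cases the result. A small sketch; domain_to_block is an illustrative name.

```python
from urllib.parse import urlsplit

def domain_to_block(url: str) -> str:
    """Return the host part of an ad's URL, lower-cased, without any port."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in {url!r}")
    return host

print(domain_to_block(
    "http://www.consumersearch.com/www/csgraphics/gntoday468x60.gif"))
# www.consumersearch.com
```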
Be careful when adding domains to your hosts file: you may accidentally lock yourself out of a site you actually want to visit when all you wanted to do was block an image. There isn't much you can do if a site you want to visit serves its banner images from the same server.
Sometimes it's not so easy figuring out what domain to put in the hosts file. The host name does not always start with "www", and each entry in the hosts file matches one exact name only. In this URL: http://view.atdmt.com/NYC/view/msnnkcha02400098nyc/direct/01/ you must block "view.atdmt.com"; blocking "atdmt.com" alone will not do it. (Just to be sure, I always block both.) This makes for long lists of blocked servers, as advertisers constantly rename their servers with obscure prefixes.
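The "block both" habit can be scripted. This Python sketch assumes the parent domain is simply the last two labels of the host name, which works for view.atdmt.com but is wrong for names like ads.example.co.uk; block_lines is a hypothetical helper, and the caution above about locking yourself out of a site applies doubly to parent domains.

```python
from typing import List

def block_lines(host: str, ip: str = "127.0.0.1") -> List[str]:
    """Hosts-file lines blocking `host` and (naively) its parent domain.

    Each hosts entry matches one exact name, so view.atdmt.com and
    atdmt.com need separate lines.
    """
    labels = host.lower().strip(".").split(".")
    names = [".".join(labels)]
    if len(labels) > 2:
        names.append(".".join(labels[-2:]))  # naive parent: last two labels
    return [f"{ip} {name}" for name in names]

for line in block_lines("view.atdmt.com"):
    print(line)
# 127.0.0.1 view.atdmt.com
# 127.0.0.1 atdmt.com
```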
If it's a Flash animation banner, you've got some work to do. You have to open the source code of the offending page and search through it for the domain hosting the Shockwave ad. It's not easy, but it's worth it because there ain't many Shockwave servers. (I'm just making that up, I don't really know for sure.)
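Searching a page's source for Flash ads can be partly automated. The Python sketch below assumes you have saved the page's source as text and that the ads are referenced by absolute http:// URLs ending in .swf; ads loaded by JavaScript or through relative paths will slip past it. The function name is illustrative.

```python
import re
from typing import Set
from urllib.parse import urlsplit

# Absolute URLs ending in .swf, optionally followed by a query string.
SWF_URL = re.compile(r"""https?://[^\s"'<>]+?\.swf\b(?:\?[^\s"'<>]*)?""", re.IGNORECASE)

def flash_ad_hosts(html: str) -> Set[str]:
    """Return the host of every .swf file referenced in a page's source."""
    hosts: Set[str] = set()
    for match in SWF_URL.finditer(html):
        host = urlsplit(match.group(0)).hostname
        if host:
            hosts.add(host)
    return hosts
```

Each host it returns is a candidate for the hosts file, subject to the same care as image banners.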
If all goes well, you'll start to see little red 'X's all over your pages. That's fine if it gets rid of all the flashing and shaking, but it leaves pages looking pretty nasty. Luckily for me, I had an extra web server lying about (a Linux web server, that is; this may not work with a Windows-based server, and it must not already be serving web pages, or stuff will break, as you will see later on). In that web server's home directory I placed a file called ".htaccess" (that's "dot htaccess", not a typo). It is another text file, and it looks like this:
ErrorDocument 404 http://192.168.255.255/
RedirectMatch (.*)\.gif$ http://www.example.com/~username/blank.gif
RedirectMatch (.*)\.jpg$ http://www.example.com/~username/webcam.jpg
The first line of this file tells the web server to redirect all "file not found" errors to its own home page (192.168.255.255 is an example of a LAN IP; the web server does not have to be connected to the Internet). For ad requests to reach this server at all, the blocked domains in your hosts file must point to its IP address instead of 127.0.0.1. Since nothing is on the web server except index.html, just about every request gives an error, and the home page is always shown.
The remaining lines of the .htaccess file tell the web server to redirect all requests for images to blank.gif or to my webcam image (or to any other image anywhere I want). Ideally the replacement should be a 1x1-pixel transparent GIF, which conserves bandwidth and makes a blocked ad show up as a plain patch of the page's background colour. The images must be on a different server, or you will put everything into an endless loop: a request for blank.gif on the blocking server would itself match the .gif rule and be redirected again.
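If you have nothing suitable to hand, a 1x1 transparent blank.gif can be written byte for byte. The Python sketch below builds the well-known 43-byte transparent GIF; the file name matches the .htaccess above, and the result belongs on the other server (the one the RedirectMatch lines point at), not on the blocking server itself.

```python
import struct
from pathlib import Path

# A 1x1 GIF89a whose only pixel uses colour index 0, which the Graphic
# Control Extension marks as transparent.
BLANK_GIF = (
    b"GIF89a"                                    # header
    b"\x01\x00\x01\x00\x80\x00\x00"              # logical screen: 1x1, 2-colour global table
    b"\x00\x00\x00\xff\xff\xff"                  # global colour table: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # graphic control ext: index 0 transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor: 1x1 at (0, 0)
    b"\x02\x02\x44\x01\x00"                      # LZW min code size 2, one 2-byte data block
    b"\x3b"                                      # trailer
)

def write_blank_gif(path: Path) -> None:
    """Write the transparent pixel to `path`."""
    path.write_bytes(BLANK_GIF)

if __name__ == "__main__":
    write_blank_gif(Path("blank.gif"))
    width, height = struct.unpack("<HH", BLANK_GIF[6:10])
    print(len(BLANK_GIF), width, height)  # 43 1 1
```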
Note: If you are using an older Red Hat Linux Apache server, you must also edit access.conf (/etc/httpd/conf/access.conf), because Apache ignores .htaccess files wherever AllowOverride is set to None. Find the AllowOverride entry and change it to:
AllowOverride All
(Screenshots, not reproduced here: an unblocked page, and the same page with blocking turned on.)
Update 2004-05-20: I now copy .htaccess to my cgi-bin directory as well. I'm not sure if it will work, but it may clean up some ugliness when an ad asks for its CGI programs.
Update 2004-05-21: My .htaccess file now works much better after some modifications, and looks something like this:
errordocument 400 http://www.fbi.gov/
errordocument 404 http://192.168.255.255/
errordocument 405 http://www.cia.gov/
errordocument 414 http://www.albinoblacksheep.com/flash/you.html/
RedirectMatch 301 \.gif$ http://www.example.com/webcam.jpg
RedirectMatch 301 \.jpg$ http://www.example.com/webcam.jpg
RedirectMatch 301 .*banner.* http://www.example.com/aria_giovanni.jpg
RedirectMatch 301 ^/image http://www.example.com/blank.gif
The breakdown is as follows. See "Screwing with the hackers" for the 400, 405, and 414 additions. The pattern \.gif$ matches any request ending in .gif (\. is needed to match a literal ".", and $ marks the end of the requested path) and redirects it to my webcam image. The same goes for all .jpg requests. .*banner.* matches requests with the word banner anywhere in them (.* matches anything) and redirects those to a JPEG of my favourite girl [sweeeeet...]. ^/image matches requests whose path starts (^) with /image (good for those qksvr images with no file extension) and redirects those to blank.gif. The slash matters: the path Apache matches against always begins with "/", so a plain ^image would never match anything.
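To see which rule catches a given request, the rules can be modelled in Python. This is only an emulation, not Apache. It assumes, as Apache's mod_alias documentation describes, that RedirectMatch patterns are tested against the request's URL path (which always starts with "/" and excludes any query string), in file order, with the first match winning; the last pattern is written path-anchored as ^/image. One consequence: a request for banner.gif is caught by the \.gif$ rule before the banner rule ever sees it.

```python
import re
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# (pattern, target) pairs in the same order as the .htaccess file.
RULES: List[Tuple[str, str]] = [
    (r"\.gif$", "http://www.example.com/webcam.jpg"),
    (r"\.jpg$", "http://www.example.com/webcam.jpg"),
    (r".*banner.*", "http://www.example.com/aria_giovanni.jpg"),
    (r"^/image", "http://www.example.com/blank.gif"),
]

def redirect_target(request_url: str) -> Optional[str]:
    """Return where the first matching rule sends a request, or None."""
    path = urlsplit(request_url).path or "/"
    for pattern, target in RULES:
        if re.search(pattern, path):
            return target
    return None  # no rule matched: Apache would serve the file or use ErrorDocument

for url in ("/ads/top.gif", "/spin/banner.gif", "/spin/banner.swf", "/image?id=42"):
    print(url, "->", redirect_target(url))
```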
Several times a week somebody tries to hack my web server. I can see it in the access logs as requests for strings that exploit Windows vulnerabilities. Many of those requests result in 400 (Bad Request), 404 (Not Found), 405 (Method Not Allowed), or 414 (Request-URI Too Long) errors. The 404 is already redirected, so as an experiment I'm redirecting the other errors back to the index as well, and watching the logs to see what happens... Stay tuned...
The hackers don't really seem too interested in what I've done. I was actually hoping that the redirect would throw their hackbots into a loop: "Can I have this file? What? I CAN have that file? Then let me have it! That's not it! I'll try again!" I'll keep messing with it, but just for shits & giggles I've redirected the hack attempts (in the example above) to some nifty places :)
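Watching the logs for those error codes can be scripted. The Python sketch below assumes Apache's Common Log Format (host, identity, user, [time], "request", status, size); it also accepts the Combined format, since the extra referer and user-agent fields at the end are simply ignored. The file name in the usage comment and the function name are illustrative.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# host ident user [time] "request" status bytes; escaped \" inside the request is allowed
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" (\d{3}) \S+')
WATCHED = {"400", "404", "405", "414"}

def count_errors(lines: Iterable[str]) -> Counter:
    """Tally the watched error statuses found in access-log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(1) in WATCHED:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_errors.py /path/to/access_log
    with open(sys.argv[1], encoding="latin-1") as log:
        for status, n in sorted(count_errors(log).items()):
            print(status, n)
```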
Stuff to figure out and/or post later...
- How to have the log file ignore the redirects?
- A quick way to turn blocking on and off.
- Why do ads get a "Not Found" error when they request something from the cgi directory?
- Figure out how to get a framed banner page to inherit the parent window's background (body style="background-color: transparent")?
Updated 2004-05-21