Filter The Web With squidGuard

by Carla Schroder

If you're facing the daunting task of filtering web content, squidGuard presents a cost-free solution with an open database of blocked sites.

Putting up a fence to keep out the bad parts of the Internet is one of those thankless, heroically crazy jobs required of network administrators. It's not like TV, with well-defined channels to manage. Context is everything, and no one has yet written a filter that can differentiate between a site with health and medical information and a red-hot triple-X porn site that uses similar terminology. Think of how many words are double entendres; a person can hardly say anything anymore. Just naming fried chicken parts can get a page blocked.

Even more difficult are Web filters that try to screen for 'bad attitudes': hate, intolerance, racism. Lotsa luck. No regular expression can differentiate between a news report of a hate crime and a site that promotes one.

Commercial Web filtering software is expensive and secretive. With the exception of Net Nanny, blocked URLs and keywords are hidden under a layer of encryption, so the user is not allowed to review them. You may never know what you're missing. Server-level licensing is generally per-seat and requires a separate proxy server, adding to the cost. squidGuard is free, completely open right down to the source code, and runs on Squid, a free proxy server, on Linux or *BSD, which are also free.

squidGuard has no hidden agenda. The default blacklists are plain text, with a disclaimer right at the top. For example, from /var/squidGuard/blacklists/drugs/urls:

# This list is entirely a product of a dumb robot (squidGuardRobot-2.3.6).
# We strongly recommend that you review the lists before using them!
# Don't blame us if there are mistakes, but please report errors with
# the online tool at http://www.squidguard.org/blacklist/
# This list was compiled in 0:11:02 on 2002.01.26 00:35:41.
# This list was compiled from 48 link sources and 2876 links,
# of which 2473 tested successfully.

If you are truly suspicious, which is an admirable trait, view the associated .db file in a hex editor to see if the text file is telling the truth. See? No secrets.

First you need Squid, the excellent Unix Web proxy cache. Squid comes with all major Linux distributions, or you can get it from squid-cache.org; RPMs are on the Uptime RPM Archive. Squid must be configured and running before squidGuard will work.
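Squid hands each request to squidGuard through its redirector interface, so squid.conf has to be told where squidGuard lives. A minimal sketch for the Squid 2.x releases, assuming the binary is /usr/bin/squidGuard and the configuration file is /etc/squid/squidGuard.conf (check where your RPM or source build actually puts them):

```
# squid.conf: run squidGuard as Squid's redirector (paths are assumptions)
redirect_program /usr/bin/squidGuard -c /etc/squid/squidGuard.conf
# how many squidGuard processes Squid keeps running
redirect_children 5
```

Later Squid releases renamed these directives url_rewrite_program and url_rewrite_children.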

The easy way to get squidGuard up and running is to install from RPM. Two good RPMs exist: one by Oliver Pitzeier on the Uptime RPM Archive, and one from the excellent Eric Harrison of the K-12 Linux Terminal Server project. I've installed and run both of them on Red Hat 7.2 and 7.3. Warning: they have the same name, squidGuard-1.2.0-3.i386.rpm, but there are significant differences between the two RPMs. Of course, building from source lets you run squidGuard on systems neither RPM covers.

squidGuard's installation page says version 2.x of the Berkeley DB library is required. However, the changelog reports that support for version 3.2 was added in March 2001, so it's safe to say the installation page has not been updated in a while. Both RPMs call for version 3.2, which appears on your system as libdb-3.2.so. There's no harm in having both versions installed if you want to cover all the bases; squidGuard will find the correct one at installation.

Home User, Business User
squidGuard is equally useful at home or in the workplace. Rather than installing standalone products on every machine on your home network, head 'em off at the pass. Some of squidGuard's nicer features:

  • blazingly fast
  • fine-grained controls: configure individual users and groups
  • redirects to the URLs of your choice
  • filter on URLs or domain names
  • block banners (redirect to empty .png)
  • define access rules by time of day and date
  • define access rules for different user groups

It does not filter on page content or on embedded scripting languages like JavaScript or VBScript.
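To see the user-group and time-of-day features in action, here is a sketch of squidGuard.conf rules that treat one group of machines differently at certain hours. The source group kids, its IP range, the time block schoolnights and the localhost block pages are all hypothetical; porn/domains and porn/urls assume the default blacklists' porn category under dbhome.

```
# Hypothetical source group: the kids' machines on the home LAN
src kids {
    ip 192.168.1.10-192.168.1.20
}

# Hypothetical time window: Sunday through Thursday nights
# (weekday letters are s m t w h f a, Sunday to Saturday)
time schoolnights {
    weekly smtwh 21:00-23:59
}

# Category built from the default blacklists
dest porn {
    domainlist porn/domains
    urllist    porn/urls
}

# kids get nothing on school nights and no porn otherwise;
# every other machine falls through to default
acl {
    kids within schoolnights {
        pass none
        redirect http://localhost/bedtime.html
    } else {
        pass !porn all
        redirect http://localhost/blocked.html
    }
    default {
        pass !porn all
        redirect http://localhost/blocked.html
    }
}
```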

It is helpful to download both RPMs just to examine their configuration files. In squidGuard.conf, # comments out a line, {} enclose a group declaration, and - defines a range. Don't use reserved words as names; see the documentation for a list. There are some inconsistencies between where the documentation says files should be and where they actually end up on your system. Both RPMs put documentation in /usr/share/doc/squidGuard and the main configuration file in /etc/squid/squidGuard.conf. Building from source puts files where the docs on squidguard.org say they should be.

It is best to explicitly declare even the defaults, for the sake of us feeble humans, as this example of squidGuard.conf shows:

# defines where logfiles are
logdir /var/log/squidGuard
# defines where blacklists are
dbhome /var/squidGuard/blacklists/

One approach is to accept all of the default blacklists and refine them. The other is to start from a clean slate and add restrictions as you think them up. Here is the absolute minimum config file:

logdir /var/log/squidGuard

acl {
   default {
      pass all
   }
}

This restricts nothing; it is like not using squidGuard at all. (acl stands for access control list.)
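The opposite extreme is a whitelist: block everything except the sites you have approved. A sketch, assuming a hypothetical allowed/domains list under dbhome and a hypothetical local page explaining the refusal:

```
logdir /var/log/squidGuard
dbhome /var/squidGuard/blacklists/

# allowed/domains: one approved domain per line (hypothetical list)
dest allowed {
    domainlist allowed/domains
}

acl {
    default {
        pass allowed none
        redirect http://localhost/notallowed.html
    }
}
```

As I understand squidGuard's pass rule, the categories are checked left to right and the first match decides, so approved domains go through and none catches everything else.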

To create a new database file of blocked URLs or domains, or a file of allowed sites, first create a plain text file containing your list. Use the same format as the default squidGuard text lists: one item per line, plain ASCII text. To convert it to a .db file, run this command:

# squidGuard -C filename

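That's what the Berkeley DB library is for. To convert every list named in squidGuard.conf in one pass, use squidGuard -C all. Either way, make sure the finished .db files are readable by the user Squid runs as, or squidGuard won't be able to open them.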

Let's call our blacklist files verboten/domains and verboten/urls.
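For illustration, the two text files might look like this (both shown in one listing). The entries are hypothetical, using the reserved .example names, and the URL format without a leading http:// or www. is my reading of the squidGuard docs, so check it against your version:

```
# --- file: verboten/domains ---
# one domain per line; subdomains of each entry are matched too
casino.example
warez.example

# --- file: verboten/urls ---
# one URL per line, without http:// and without a leading www.
news.example/gossip
forum.example/offtopic/rants
```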

Now edit squidGuard.conf:

logdir /var/log/squidGuard
dbhome /var/squidGuard/blacklists/

dest blockedstuff {
   log verboten
   domainlist verboten/domains
   urllist verboten/urls
   redirect http://www.bratgrrl.com
}

acl {
   default {
      pass !blockedstuff all
      redirect 302:http://www.bratgrrl.com
   }
}

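dest defines a destination category. squidGuard wouldn't care if you lumped everything into one big file, but organizing into categories makes life easier for the overworked admin. log names a file under logdir where squidGuard records requests caught by the category. The ! means don't pass: requests matching the category after it are refused, and all lets everything else through. redirect sends requests for blocked sites or URLs to the page of your choice; the 302: prefix makes Squid answer with a real HTTP redirect, so the browser's address bar shows the page it was sent to. bratgrrl.com is a fine choice, though a custom page may be more suitable. Some businesses like to use scary warnings in large type.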


Use your imagination. For home use, post a picture of your kids grounded for life.
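If you build your own block page, squidGuard can hand it details about the refused request. A sketch, assuming a hypothetical CGI script at http://localhost/cgi-bin/blocked.cgi; %a, %t and %u are squidGuard's redirect substitutions for the client address, the matching destination category and the requested URL:

```
# redirect to a hypothetical local CGI that explains the block
dest blockedstuff {
    domainlist verboten/domains
    urllist    verboten/urls
    redirect   http://localhost/cgi-bin/blocked.cgi?client=%a&category=%t&url=%u
}
```

A page that echoes back the blocked URL and category also makes it easier for users to report sites that were blocked by mistake.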

# killall -HUP squid

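tells Squid to re-read its own configuration file, /etc/squid/squid.conf, and restart its squidGuard helper processes, which load the changed /etc/squid/squidGuard.conf as they start. squid -k reconfigure does the same job.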

Eric Harrison's version automatically updates its blacklists nightly; see the MESD page for details.

As with all things Linux, the more you know about scripting, the more sense things make, and the more power you have at your disposal. squidGuard also supports regular expressions, which come in handy for blocking ads.
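A minimal sketch of a regular-expression rule for ads, assuming a hypothetical ads/expressions file under dbhome and a hypothetical blank image on a local Web server (the empty-image trick from the feature list). The pattern is an illustrative POSIX extended regex, not a tested ad list:

```
# --- file: ads/expressions (one extended regex per line) ---
(^|[-?+=/_.])(ad|ads|adserver|banner|banners)([-?+=/_.]|$)

# --- in squidGuard.conf ---
dest ads {
    expressionlist ads/expressions
    redirect       http://localhost/images/blank.png
}

acl {
    default {
        pass !ads all
        redirect http://localhost/blocked.html
    }
}
```

Because the ads category carries its own redirect, matching banners should get the blank image rather than the default block page. Expect collateral damage, though: a classifieds site with an /ads/ directory gets blanked too, which is the context problem all over again.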


The squidGuard documentation is quite thorough; hopefully this article will get you past the more common pitfalls. If you'd like to see a Squid tutorial, drop me a line. It's a great tool.



This article was originally published on Tuesday Jun 25th 2002