EFF/Finkelstein Censorware White Paper #1
for Natl. Research Council Project on Tools and Strategies for Protecting Kids from Pornography
and Their Applicability to Other Inappropriate Internet Content
Title: "Blacklisting Bytes"
Co-authors: Seth Finkelstein, Consulting Programmer; Lee Tien, Senior Staff Attorney, EFF
EFF's thesis is simple: The quest for a technical solution to the alleged problem of minors' access to "harmful" material on the Internet is both misguided and dangerous to civil liberties. While we don't want to overstate our concerns, we believe that it's impossible to prevent minors from accessing large amounts of material that is accessible to adults on the Internet. Moreover, we believe that attempting to do so will build into the Internet mechanisms that can and will be used for other types of censorship. Although our discussion will focus on censorware, Prof. Lawrence Lessig has persuasively analyzed the architecture of "filtering" in terms that place censorware and ratings-based systems on the same spectrum. 
Criticism of censorware isn't new. Many have argued that censorware is caught on the horns of a dilemma: if it blocks too little, it's ineffective and therefore unconstitutional; if it's effective, it blocks too much and again is unconstitutional. This White Paper will highlight what we believe is a novel aspect of the dilemma. For censorware to effectively block minors' access to "harmful" material, it must block material that isn't itself "harmful." For instance, the censorware product SmartFilter blacklists two broad classes of websites that publish no "harmful" content: privacy/anonymity service sites and language-translation services.
We don't use the term "censorware" instead of "filter" lightly or for partisan reasons. The term "filter" implies that impurities are extracted, leaving a purified result. This frames the issues in terms of content alone. We believe that "filter" euphemistically hides the real, architectural issue, control of people: censorware is about an authority preventing those under its control from reading forbidden information.
That privacy, anonymity, and language-translation sites are blacklisted illustrates our point. Why are they blacklisted? Because they offer capabilities that can be used to escape the control-system. There's nothing obscene or pornographic about language translators or a web relay that shields reader identity. These are useful services. But because they let people read forbidden material, they must be blacklisted.  More generally, the debate over censorware is one example of the larger debate over architecture and social control.
I. The analytical structure of the censorware debate
The inherent difficulty of effective, constitutional censorware is, we believe, both logical and practical. Ordinary or general censorship tries to stop anyone from publishing or receiving certain types of speech. Censorware attempts something harder: preventing minors from getting material that lawfully can be published to and accessed by many American adults. However categorized, government cannot constitutionally eliminate such speech from the Internet, because government cannot reduce the adult population to reading only what children may read. 
A. Toxicity and the hermetic seal
Much of the debate regarding minors, sex, and censorware has taken place between two vastly different and thoroughly incompatible theories. Civil-liberties advocates frequently espouse what might be dubbed the control-rights theory, which is concerned with determining whether, according to some ideology, politics, or philosophy, some person or organization has the legal right to exercise control over another in a certain context. As a paper submitted to the Congressional COPA Commission argued:
"[T]he decision by a third party that a person may not use a computer to access certain content from the Internet demands some sort of justification. The burden should be on the filterer to justify the denial of another person's access. The most plausible justifications for restricting access are that the third party owns the computer or that the third party has a relation of legitimate authority over the user."
The reasoning of many censorship advocates is dramatically different.
The American Family Online page  states:
"CAUTION: This is not to say we want you to go looking for trouble. Pornography is dangerous, and viewing it (even for a moment) can set off a terrible chain of events."
In short, "pornography" is toxic material. Not even a moment's viewing is safe. This idea is reflected in official language, like the Children's Internet Protection Act or material "harmful to minors." This toxic-material theory isn't concerned with an involved determination regarding justification, with who/what/where. It's focused on exposure of anyone, any time, anywhere.
To someone who believes in the toxic-material theory, that pornography is dangerous, telling them that they should accept the constraints of the control-rights theory is nonsensical. The subject can't ever be allowed out of the blinder-box. Not at a school, not at a library, not anywhere. Censorware may not work well, but they'll take it for what it does. Indeed, the ineffectiveness of censorware is an argument for stronger legal regulation, as Morality in Media told the Congressional COPA Commission. 
B. Censorware as control
Censorware isn't a "filter"; it is software designed and optimized for use by an authority to prevent another person from sending or receiving information. The word "filter" implies a model whereby impurities are extracted, yielding a purified result. But a focus on content is both distracting and misleading. The basic aim is control of people. The architectural issue is how an authority can prevent those under its control from reading forbidden information.
There are many arguments about such control, with ideological positions usually depending on whether the authority relationship is parent-child, employer-employee, or government-citizen. But that's a debate of philosophy, not technology. The technical requirements for instituting such control are independent of views of its moral correctness.
What do we mean by "architectural control" or "architecture of censorship"? The general point is that the design and deployment of technical systems promotes and even embodies norms. Both censorware and technologically implemented ratings systems, we believe, promote a norm of censorship.
Law, of course, expresses and enforces norms as well. But we usually think of law as rules that impose duties, backed by sanctions. From the architectural perspective, rules are only part of the picture. Suppose that in a neighborhood, people use coin payphones so that their calls can't be traced to their home phones. The coin phones are then replaced with credit-card payphones. Someone eliminated a resource with the feature of anonymity. That act may have been pursuant to a rule, but no rule was applied to the people who no longer can make anonymous phone calls.
In short, architectural regulation structures behavioral settings, making some acts easier and others harder. Part of the difference between rules and architecture is that architectural norms are often embedded in equipment or its arrangement, and can affect us directly without our being aware; often, architecture's effects are significant precisely because we're oblivious to them. An architected setting may appear as a fait accompli, as conditions complied with by default rather than as rules to be followed or disobeyed.
The concept, we think, operates at two different levels.  The first, more obvious level has two main dimensions: transparency and extensibility. Prof. Lessig criticizes ratings systems on both grounds: "Filtering can occur at any level in the distributional chain -- the user, the company through which the user gains access, the ISP, or even the jurisdiction within which the user lives. . . . Filtering in an architecture like PICS can be invisible, and indeed, in some of its implementations invisibility is part of its design."  Furthermore, "[t]he filtering system can expand as broadly as the users want."  We agree that these aspects of such systems are a grave threat to free speech.
The second level of architecture is less obvious. Our initial premise is that law and public deliberation are integrally related. The meaning and wisdom of legal rules are a matter of public debate. Because architecture operates on the conditions of acts rather than on acts, it affects public discourse differently than do sanction-backed rules. Architecture tends to weaken public discourse and thus collective choice because we don't see what's happening.
This is especially obvious in enforcement. Rules mean little if they're not enforced, but enforcement is a complex human activity. First, violations must be detected; authorities must learn of them; authorities must then do something, ranging from ignoring the violation to taking it to court. These decision-makers generally possess significant discretion, and how they enforce rules is part of what rules mean. Second, enforcement requires resources usually allocated as part of a more-or-less public budgetary process. Third, enforcement often occurs in public. In short, enforcing rules involves many people making decisions, often in plain view. As a result, we get feedback about what our rules really mean, which promotes public debate.
Architectural enforcement arises from being in the architected setting. Once payphones are removed, people simply can't make untraceable calls. They may not even perceive regulation. Architectural regulation eliminates many of the ways that people can modulate the effects of a rule during enforcement and many of our normal feedback loops. If public deliberation is important to law, the surreptitious enactment and enforcement of norms via architecture should give us pause.
These problems are magnified for complex new technologies like the Internet. First, most of us don't understand how the Internet works, so we don't perceive what's being done to us. Perceiving that a setting has been architected to regulate us usually requires knowing what it does, and how it could otherwise have been designed. Perceiving architecture as harmful requires knowing why those lost options mattered. Second, as Prof. Lessig noted, there's a multiple actor problem associated with the many intermediaries involved in Internet activity. We don't know who is doing what to us. Third, there are network effects: as a standard takes hold, compatibility becomes more important. Fourth, technological change often destabilizes norms themselves; we're not sure if our old norms still apply. Caller ID changed the way we think about privacy in home phone numbers. But because telephones are a well-known technology, and the phone companies actively marketed the new feature, we had debate over blocking options.
We believe that architectural censorship is harmful to free speech norms. It's plausible to hypothesize a "critical mass" model in which expectations "depend on how many are behaving a particular way, or how much they are behaving that way." In this model, what game theorists call "common knowledge" is crucial: to reach critical mass, people need to know a lot about what others do. To develop public norms about censorship, we at least need to know a lot about it and how other people feel about it. The invisibility of architectural censorship obstructs the production of common knowledge about its flaws, while its implementation probably cultivates censorship.
Thus, while technological solutions seem to promote individual choice, their architectural implementation weakens collective choice.
C. Transparency and First Amendment principles
EFF's concern about architectural control leads us to enunciate two principles or values that should govern censorware and any proposed technological solution with the same objective.
Any technological solution must operate and be implemented transparently. End-users must know what censorware does, when, and why. For example, censorware shouldn't deny access to sites without expressly stating that access was denied. Silently failing to display a forbidden site, as though it did not exist, or merely displaying a generic error message violates this principle.
Censorware should also make clear the criteria or categories under which information was blocked. It should show where the blocking occurred, whether at the user's own browser or somewhere upstream, like at an OSP, intranet, or proxy-server. More generally, all information relevant to user choice should be publicly available. Such information includes: blacklists of banned sites or words; rules or other criteria for rating banned sites; the formulas or algorithms for applying the rules and ratings. We also think that transparency requires user ability to alter default configurations.
The legal justification for transparency is the right to receive information from willing speakers, which government may not unduly burden.  Transparency is needed for informed user choice. The deeper reason, however, is the need to counteract the weakening of collective choice processes caused by invisible or opaque architectural censorship. We need common knowledge.
Censorware raises enormous privacy and anonymity issues as well. The very process of browsing both generates a list of visited sites at the client end and passes information about the client to visited sites. Censorware exacerbates matters by paying special attention to "bad" sites. When censorware is implemented in a hierarchical environment, it's likely that browsing will be monitored.
Second, proposed alternatives to censorware often rely on some sort of identification architecture, like age verification, Internet IDs, and so on. It's often argued that these alternatives are easily evaded. Our point, however, is that identification architectures enable greater social control. As the sociologist Erving Goffman noted, identity differentiates people, and "[a]round this means of differentiation a single continuous record of social facts can be attached, entangled, like candy floss, becoming then the sticky substance to which still other biographical facts can be attached." The First Amendment, however, clearly protects the right to speak anonymously. If so, then the right to read anonymously should be even more clearly protected. For instance, nearly all states protect the identities of library patrons.
II. The reality of censorware
A. A quick introduction to the fundamentals of censorware: Blacklisting by name, blacklisting by address, blacklisting by word
Despite the mystique of the computer, almost all censorware operates very simplistically: it compares URLs (hosts and paths) against a blacklist. Sometimes the blacklist is local to the machine (client implementation). Sometimes it's stored remotely on a proxy server (server implementation). The client-side programs (e.g., CyberSitter, NetNanny) are typically home-based products, while the server programs (e.g., Bess, SmartFilter, WebSense) are used by large organizations. Some programs have both client and server versions (e.g., CyberPatrol).
When a person types in a URL indicating material they wish to read, the censorware examines various parts of the URL against its internal blacklist to see if the URL is forbidden. Take the following URL as an example:
Typically, censorware first checks the host, here <www.eff.org>, in two different ways:
1. By name - the name of the host is in the blacklist
2. By address - the IP address of the host is in the blacklist
In the case of matching by-name, the program would search the blacklist to find entries matching the string <www.eff.org>. For matching by-address, the program would first determine the IP address associated with <www.eff.org>, which is 184.108.40.206, and see if there are entries matching that address. This is an important distinction, because host names and addresses can be quite different. A host can have many equivalent names, and also multiple IP addresses. Worse, different hosts can all have the same IP address ("virtual hosting").
If the host is found on the blacklist (either by-name or by-address), then the program looks to see how extensively it should be banned. Conceptually, this is just how much of the URL is on the blacklist. Entries can range from:
1. Blacklisting the whole domain - everything on http://www.eff.org
2. Blacklisting a directory on the site - everything below http://www.eff.org/Censorship/
3. Blacklisting a particular file on the site - http://www.eff.org/Censorship/Internet_censorship_bills/2000/20001222_eff_hr4577_statement.html
We emphasize that there is no deep artificial intelligence here. It's merely looking up a host and path, and deciding if they match an entry on a huge blacklist.
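The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions, not any vendor's actual code: the hosts, paths, and the `is_blocked` helper are hypothetical, and the blacklist is modeled as a list of (host, path prefix) pairs, where a prefix of "/" bans the whole site.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries: (host, path prefix).
BLACKLIST = [
    ("www.example.org", "/"),                     # whole site
    ("www.example.net", "/Censorship/"),          # one directory and everything below it
    ("www.example.com", "/docs/statement.html"),  # one particular file
]

def is_blocked(url, blacklist=BLACKLIST):
    """Return True if the URL's host and path match any blacklist entry."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    for banned_host, banned_prefix in blacklist:
        # Plain string comparison: exact host, then a path-prefix test.
        if host == banned_host and path.startswith(banned_prefix):
            return True
    return False
```

Nothing in the sketch inspects what the page says; the decision rests entirely on the text of the URL and the contents of the list.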
Note that blacklists which work by-name typically also contain numeric representations of the IP address of the most popular sites. This is completely unlike blacklisting by-address. The typical numeric representation of an IP address (e.g., 220.127.116.11) is just one way of many of connecting to that IP address. For example, the URLs http://www.eff.org, http://www.eff.net, and http://www.eff.com, will all reach the same host at the same IP address. The URL http://18.104.22.168 is similar. A blacklist that worked by-name, but only banned http://www.eff.org and http://22.214.171.124, would ignore http://www.eff.net or http://www.eff.com, even though all four eventually reach the same IP address. A blacklist that worked by-address would ban all four, but would also ban any other domain that has the misfortune to share that address via virtual hosting.
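The difference between the two approaches can be made concrete with a small sketch. Everything here is an assumption for illustration: the host names are invented, the addresses come from the range reserved for documentation, and a fixed table stands in for a real DNS lookup.

```python
# Hypothetical name-to-address table standing in for a DNS lookup.
RESOLVES_TO = {
    "www.target.example": "192.0.2.10",
    "alias.target.example": "192.0.2.10",  # a second name for the same host
    "bystander.example": "192.0.2.10",     # unrelated site sharing the address (virtual hosting)
    "elsewhere.example": "192.0.2.99",
}

# A by-name list holds strings: one host name plus the numeric form of its address.
NAME_BLACKLIST = {"www.target.example", "192.0.2.10"}
# A by-address list holds addresses only.
ADDRESS_BLACKLIST = {"192.0.2.10"}

def blocked_by_name(host):
    """Compare the text the user typed against the list; no lookup is done."""
    return host in NAME_BLACKLIST

def blocked_by_address(host):
    """Resolve the host first, then compare the resulting address."""
    address = RESOLVES_TO.get(host, host)  # a numeric host resolves to itself
    return address in ADDRESS_BLACKLIST
```

In this sketch the by-name check misses `alias.target.example`, while the by-address check catches it but also bans `bystander.example`, which merely shares the address.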
This simple distinction, by-name or by-address, is sometimes described very confusingly. The X-Stop censorware blacklist has been extensively analyzed, due to its role in the precedent-setting Mainstream Loudoun case. It's entirely by-name. However, the ads for X-Stop trumpet "Direct Address Block (DAB)," which merely means that the typical numeric representations of some IP addresses are on the blacklist, too. The simple checking of blacklist text is described with almost comical marketing spin: "Because this is done with numbers instead of letters (There are only 10 digits as opposed to 255 characters.) the response is nearly instantaneous."
Less commonly, the path in the URL itself may be examined for forbidden words ("keyword filtering"). For example, if "censorship" were on the list of forbidden words, any attempt to read the material at http://www.eff.org/Censorship/ would be rejected.
Blacklists can have multiple categories of banned sites (e.g., one for "Sex," another for "Drugs," perhaps another for "Rock And Roll," and so on), which often leads to discussion of tuning the censorware by selecting categories. But blacklists are almost always secret, so there's no way to know what sites are actually in a category. This secrecy is zealously guarded by almost all censorware manufacturers. In Microsystems v. Scandinavia Online AB, the company that makes CyberPatrol sued two programmers who reverse-engineered and cryptanalyzed the CyberPatrol blacklist and published their results.
The whole list-matching process above may be repeated all over again against exception lists or "whitelists." A few products consist only of whitelists, or can work in whitelist-only mode. For example, CyberPatrol named its blacklist the "CyberNOT" list, and called its whitelist "CyberYES."  It can be set (in both client and server versions) so that everything not prohibited is permitted (blacklist-only), or only that which is explicitly allowed is permitted (whitelist-only). And of course the whitelist can override the blacklist. In general, such blacklist/whitelist settings are standard in server-level programs, along with the ability to create additional organization-specific blacklists or whitelists. These options shouldn't obscure the fact that they are nothing more complex than matching a string against lists of items deemed naughty or nice.
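The modes just described reduce to a few set-membership tests. The following is a minimal sketch under assumed names (`allowed`, the mode strings, and the sample hosts are all hypothetical), not a description of how CyberPatrol or any other product is actually written.

```python
def allowed(item, blacklist, whitelist, mode="blacklist-only"):
    """Decide whether an item may be read under a given list mode.

    mode="blacklist-only": everything not prohibited is permitted.
    mode="whitelist-only": only what is explicitly allowed is permitted.
    In both modes a whitelist entry overrides a blacklist entry.
    """
    if item in whitelist:
        return True
    if mode == "whitelist-only":
        return False
    return item not in blacklist

# Hypothetical lists for illustration.
NAUGHTY = {"banned.example", "disputed.example"}
NICE = {"disputed.example", "approved.example"}
```

For example, `allowed("disputed.example", NAUGHTY, NICE)` is true because the whitelist overrides the blacklist, while `allowed("unlisted.example", NAUGHTY, NICE, mode="whitelist-only")` is false because the site was never explicitly approved.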
Some censorware programs try to implement more exotic approaches to determining bans, based on scanning images or words on a page. These products or features work so poorly that they are barely worth discussing.
B. The mathematics of censorware
It's instructive to consider just how large are some of the numbers involved in censoring the web.
As of February 2001, there were more than 35 million domain names registered in the world. More than 20 million of these were ".com" domains. Web server surveys have shown there were more than 27 million web servers in operation as of January 2001. One study estimated the Web to have approximately 800 million pages in February 1999 (Steve Lawrence and C. Lee Giles, Accessibility of information on the web, 400 Nature 107-109 (1999)).
Many censorware companies claim their blacklists are human-reviewed. How long would it take to evaluate the whole web? How fast can a person evaluate a web page? Does it take 1 second? 10 seconds? Co-author Finkelstein's experience in preparing evidence for Mainstream Loudoun suggests that a reasonable overall estimate is one page per minute. While people can work faster for brief periods, extended viewing is boring and fatiguing. One page per minute is a good order-of-magnitude number (one page per 6 seconds, or one page per hour, are certainly unreasonable).
One page per minute is 60 pages per hour. That's 480 pages per eight-hour workday. Let's call it 500 pages per workday for ease of calculation. At 200 workdays per year, we have 100,000 pages per work-year. So one person doing only censorware evaluation could only do 0.1 million pages in a year. Compare this to the above numbers about the size of the web and number of domains. Evaluating the whole web at a per-page level, 800 million pages at 0.1 million pages per work-year, would take 8,000 work-years.
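The estimate can be reproduced with a few lines of arithmetic. This sketch simply restates the figures above; the function name and parameter names are our own, and the defaults use the rounded 500 pages per workday.

```python
def work_years(pages, pages_per_day=500, workdays_per_year=200):
    """Work-years needed for one reviewer to look at every page once."""
    pages_per_work_year = pages_per_day * workdays_per_year  # 100,000 with the defaults
    return pages / pages_per_work_year

WEB_PAGES_1999 = 800_000_000  # Lawrence and Giles estimate cited above

whole_web = work_years(WEB_PAGES_1999)                       # 8000.0
unrounded = work_years(WEB_PAGES_1999, pages_per_day=480)    # a little over 8,300
```

Using the unrounded 480 pages per day raises the total slightly, so the rounding works in the reviewer's favor. Changing the parameters shows how insensitive the conclusion is to the exact assumptions.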
We don't claim that the above result is precise. The idea is to get a reasonable estimate of the size of the task. Even if we're off by a factor of 10, it's still an enormous number. Thus consider some very generous assumptions. If pages are evaluated twice as fast, the task still takes 4,000 work-years; even with 100 people working full-time and doing nothing else, that is 40 years of elapsed time.
To make matters worse, the web isn't static. It's constantly changing. A simple way to see the impossibility of evaluating the whole web is as follows: For every single day's changes, a significant fraction of that total change would have to pass through the censorware company. Again, every day, including weekends and holidays, as the Internet is international and worldwide in scope.
Further, let's consider roughly how much of the web consists of sex sites. Some well-researched estimates here are "Only 1.5% of sites contain pornographic content"  or 1.9% to 2.3% "Percent of Public Web" is "Adult Sites." 
This small relative number (about 2%) of commercial sex sites answers a key objection to the above estimate of the work needed to examine the web. Certainly, it's not necessary to look at every web page of a website avowedly dedicated to selling sexual content. However, the proportion of those sites is almost a rounding error in the estimation of the overall web size. Approximately 98% of the web is not so easy to dismiss.
These size statistics are relevant from another angle. While 2% of 27 million web servers is tiny in relative terms, it yields more than half a million in absolute numbers (2*10^-2 * 27*10^6 = 54 * 10^4 = 0.54 * 10^6).
It's easy to follow the links where commercial sex sites and commercial sex lists refer to each other. This can easily generate a blacklist of (as an order of magnitude) 100,000 commercial sex sites. Compiling such a blacklist would not represent any significant sampling or coverage of the entire web. Rather, it's a comparatively simple task, given the directories of such sites.
But given the estimate above that a person can evaluate perhaps 100,000 pages per work-year, it would take one work-year for one person to evaluate a 100,000 item blacklist. It is utterly impossible for ordinary people to attempt to validate these blacklists, even if they weren't almost always secret in the first place.
In sum, neither the web, nor large censorware blacklists, can be human-reviewed.
C. Artificial idiocy
Given the inability to human-review the web, a censorware blacklist necessarily must be created largely by a computer program. Often extravagant claims are made regarding such programs, usually involving the buzzwords "Artificial Intelligence." It is important to debunk this myth of intelligent censorware, both empirically and theoretically.
Whenever censorware blacklists have been examined, the so-called intelligence has turned out to be nothing more than looking for simple keywords. The canonical example is the phrase "breast cancer" triggering bans because of the occurrence of the word "breast." Now, this example has become so well-known and much-discussed that it's likely censorware companies make sure that keyword searching programs treat it as a very special case. But it's just one example. Any extensive investigation of an actual blacklist tends to produce many other instances.
From a theoretical point of view, the claimed abilities would require a computer-science breakthrough of Nobel-Prize magnitude. Consider the legal test for "obscenity," which requires that an obscene work "taken as a whole, lack serious literary, artistic, political or scientific value." It is hard to see how anyone can seriously assert that computer programs could make such a judgment when humans endlessly debate these concepts.
Moreover, "obscenity" is typically discussed as if it were an intrinsic property. But the "contemporary community standards" prong of the obscenity test  makes it a geographical variable. Standards vary considerably from Memphis to San Francisco. Whether material is obscene or not depends on location. The concept of "harmful to minors" is doubly variable. It involves dimensions of both location and age. The combination 7-year-old/Memphis will be extremely different from 17-year-old/San Francisco. Such complex determinations are beyond any computer in the foreseeable future.
Of all claims made about censorware's abilities, the most ludicrous is the claim that a program can evaluate images for nudity. First, people often don't realize that it suffers from the same problem that afflicts the "flesh-colored" band-aid. Human beings come in a wide variety of hues (commonly ranging from light pink to yellow to brownish to nearly black). Any program that claims to scan for "skin color" is either being obnoxiously racist in its implicit definition, or will be counting far too much to be meaningful. Second, gray-toned images present an even greater problem. No censorware alleged to have such fantastic image-recognition capability has ever stood up to serious evaluation.
D. The inherent under- and over-inclusiveness of censorware
Given the mathematical constraints addressed above, it should come as no surprise that censorware ends up banning both too little and too much with regard to its supposed main target. Too little, in that many commercial sex sites will not be on the blacklist. Too much, as the attempts to construct such an overarching blacklist are an invitation to everything from deliberate agendas to shoddy work.
Moreover, that the blacklist will likely have a large collection of commercial sex sites leads to a kind of statistical masking. Imagine if a blacklisting company simply took a list of people who had been in prison (getting 100,000 such names would be fairly easy), then added its own personal enemies, and claimed it to be a list of harmful people. As much of that list would undeniably be convicted criminals, the claim would be statistically accurate. But the opportunities for everything from deliberate malice to random error are obvious.
Further, suppose that such a list was kept by either looking up a person's name or home address. If by-name, even a slight variation in a name would result in not matching the list. But if kept by home address, many people could be living at that address. In fact, "virtual hosting," an arrangement in which many different (and usually unrelated) websites share the same IP address, perfectly exemplifies this problem. Blacklisting methods are either too narrow (by-name) or too broad (by-address).
Virtual hosting is only one illustration of a general principle. Any simple rules for banning material have similar problems. Consider the methodology for matching the word "sex" in a URL. Should all directories named "/sexstuff" or "/sexsites" qualify? If not, then simple variations will render the ban useless. But if so, the same rule will also ban directories named "/sextet," "/sextant" or "/sexton." Thus, there's no way to match enough variants to have a high assurance of excluding all "/sexual" material, without blacklisting a sextillion innocuous items. These problems have been repeatedly documented in reports from Peacefire (http://peacefire.org), Censorware Project (http://censorware.net), and Seth Finkelstein's Anticensorware Investigations (http://sethf.com/anticensorware/). But this work is sometimes dismissed as too partisan. It is corroborated, however, by experiments from the much more widely known Consumer Reports and similar results from the UK Consumers' Association.
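The dilemma over matching the word "sex" can be shown directly. The two rules below are hypothetical stand-ins for a narrow and a broad matching policy; neither is taken from any real product.

```python
def exact_rule(path, word="sex"):
    """Narrow rule: ban only a directory whose name is exactly the word."""
    return any(segment.lower() == word for segment in path.split("/"))

def substring_rule(path, word="sex"):
    """Broad rule: ban any path that contains the word anywhere."""
    return word in path.lower()

PATHS = ["/sex", "/sexstuff", "/sexsites", "/sextet", "/sextant", "/sexton"]

missed_by_exact = [p for p in PATHS if not exact_rule(p)]
banned_by_substring = [p for p in PATHS if substring_rule(p)]
```

The narrow rule lets "/sexstuff" and "/sexsites" through, while the broad rule bans all six paths, including the musical, navigational, and ecclesiastical ones.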
Yet the overinclusiveness of censorware is even worse, and in hindsight is quite obvious from the perspective of architectural control. Certain categories of sites that themselves contain no "toxic" material must also be censored.
The following subsections are based on co-author Finkelstein's  empirical investigation of the censorware product SmartFilter, made by Secure Computing. He explains: "Often, blacklists are subdivided into many categories - hypothetically, Sex, Drugs, Rock-and-Roll, etc. Rather than finding some feminist websites banned under Sex, or medical-marijuana political advocacy considered Drugs, I decided to take a different approach for a change. Consider the number of categories under which a site is blacklisted to be a type of evilness-index. I wanted to know which sites did SmartFilter consider to be of greatest evil? Which URL's were ranked as so vile, so corrupting, that they achieved a kind of censorware academy-award sweep, appearing in as many categories as possible?"
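The "evilness-index" idea amounts to counting, for each URL, the number of distinct categories in which it is blacklisted, and then ranking URLs by that count. The sketch below shows only that counting step. It is not the method actually used in the investigation, which the text does not describe in detail, and the URLs and listings are invented placeholders.

```python
def evilness_index(listings):
    """Rank URLs by the number of distinct categories that blacklist them.

    listings: iterable of (url, category) pairs.
    Returns a list of (category_count, url), highest count first.
    """
    categories_by_url = {}
    for url, category in listings:
        categories_by_url.setdefault(url, set()).add(category)
    ranked = [(len(cats), url) for url, cats in categories_by_url.items()]
    return sorted(ranked, reverse=True)

# Hypothetical observations, for illustration only.
SAMPLE = [
    ("relay.example", "Sex Related"),
    ("relay.example", "Gambling"),
    ("relay.example", "Sports"),
    ("casino.example", "Gambling"),
    ("scores.example", "Sports"),
    ("relay.example", "Gambling"),  # duplicates do not inflate the count
]
```

With this sample, `evilness_index(SAMPLE)[0]` is `(3, "relay.example")`: the site that appears in the most categories rises to the top.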
1. Privacy and anonymity sites
Censorware is designed to control access to information. Thus, the subject can never be let out of the blinder-box. It follows that all privacy and anonymity services, all websites that let a user receive material via an encrypted or private form, represent a threat to that control. Since the blacklist could be thwarted by using such privacy/anonymity services, these services must then be on the blacklist in almost every SmartFilter category (the NonEssential category, reserved for user-defined entries, was the only one not found below). And indeed, a who's-who of privacy and anonymity service sites turned out to be blacklisted virtually everywhere. This is a non-exhaustive list:
To be concrete, these are all blacklisted by SmartFilter under all of the following categories:
Sex Related, Illegal Drugs, Hate Speech, Criminal Skills, Worthless, On-line Sales, Gambling, Personal Pages, Job Search, Sports, Games and Fun, Humor, Alternative Journals, Entertainment, Alternative Lifestyle, Extreme/Gross Content, Chat/Web-mail, Investments Information, General News, Politics/Opinion/Religion, Dating and Introduction Services, Art/Culture, Cult/Occult, Usenet News Site, Self Help, Travel.
Simply put, privacy and anonymity are inimical to the goal of control. The point of censorware is to prevent a person from reading banned material. But privacy and anonymity web sites are dedicated to allowing people to escape monitoring and constraints of authorities. Thus, in order not to have a break in the blinder-box, they must be blacklisted. Everywhere.
2. Language translation sites
And the Lord said, Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.
Go to, let us go down, and there confound their language, that they may not understand one another's speech. . . .
Therefore is the name of it called Babel; because the Lord did there confound the language of all the earth: and from thence did the Lord scatter them abroad upon the face of all the earth.
Genesis 11:6, 11:7, 11:9, part of the legend of the Tower of Babel
The second broad group of websites that SmartFilter considered to be of greatest evil was slightly more surprising - translation services, i.e., sites that enable users to read webpages written in a different language. Here's a non-exhaustive list:
What could be so offensive about the poor Babelfish? A moment's reflection revealed that it raised the same issue as the privacy or anonymity services listed above. Even something as prosaic and useful as a web-page translation system permits escape from censorware's information restrictions. Indeed, a language translation site can be used as a way of reading any other content whatsoever. So these websites also must be blacklisted everywhere, again in virtually all categories.
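The escape route described above can be made concrete with a minimal Python sketch. It assumes a blacklist that matches on the host name of the requested URL, which is a simplification and not a description of SmartFilter's actual matching logic. The host names, the `url` query parameter, and the function names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical blacklist of host names (placeholders, not real entries).
BLACKLIST = {"banned.example"}


def is_blocked(url, blacklist):
    """Return True if the URL's host name is on the blacklist."""
    return urlsplit(url).hostname in blacklist


def via_relay(relay_base, target_url):
    """Build a URL asking a relay (translator, anonymizer) to fetch target_url."""
    return relay_base + "?" + urlencode({"url": target_url})


target = "http://banned.example/page"
relayed = via_relay("http://translator.example/translate", target)

# The direct request is blocked, but the relayed request names only the
# relay's host, so a host-based check lets it through.
direct_blocked = is_blocked(target, BLACKLIST)
relayed_blocked = is_blocked(relayed, BLACKLIST)

# The only host-level remedy is to blacklist the relay itself, which blocks
# every use of it, including entirely innocuous ones.
relayed_blocked_after = is_blocked(relayed, BLACKLIST | {"translator.example"})
```

Under these assumptions the relay's own content never matters: it is blacklisted for what it does, not for what it says.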
III. Censorware and its law, in light of reality
Government may not regulate speech based on its substantive content or message, and may not favor one private speaker over another or impose financial burdens on certain speakers based on the content of their expression. Discrimination based on speakers' views on a subject is an egregious form of content discrimination. Even in a non-public forum, government regulation must be viewpoint-neutral when directed against speech otherwise within the forum's limitations. In short, "power in government to channel the expression of views is unacceptable under the First Amendment." Thus, in general, government use of censorware must be subject to strict scrutiny.
A. Censorware and minors
The constitutionality of technological solutions used by government entities depends on what they block. Absent special circumstances, the outer limit of constitutionally permissible blocking is so-called "harmful to minors" (HTM) speech.  HTM speech may be fully protected by the First Amendment as to adults, yet "unprotected" as to minors.
Speech is "harmful to minors" if it (i) is "patently offensive to prevailing standards in the adult community as a whole with respect to what is suitable . . . for minors"; (ii) appeals to the prurient interest of minors; and (iii) is "utterly without redeeming social importance for minors." 
Courts have narrowly tailored HTM-based access restrictions to their constitutional boundaries. This point was made clear in Reno v. ACLU, where the Supreme Court held unconstitutional a general prohibition of indecent speech on the Internet. First, parents may disseminate HTM speech to their children. Second, the HTM concept only applies to commercial transactions. Third, for HTM purposes, minors are those under the age of 17. Fourth, the government may not simply ban minors' exposure to a full category of speech, such as nudity, when only a subset of that category can plausibly be deemed HTM. Fifth, the government interest is not equally strong throughout the HTM age range.
Finally, an entirely separate problem is that HTM speech is not defined monolithically, but rather varies from community to community.  This inherent variation requires that governments tailor any technological solution intended to block HTM speech to local community standards.
B. Applying the analysis to technological solutions
In short, HTM-based regulation is subject to many constitutional limits. Accordingly, the very existence of a government interest in regulating minors' access to the Internet turns upon the extent of censorship. The state's interest does not extend beyond HTM speech, slightly modified for schools. Thus, if censorware or any technological means substantially transgresses this constitutional boundary, it will not survive First Amendment scrutiny under either the substantive strict scrutiny test or formal tests like vagueness and overbreadth. 
It is widely recognized, of course, that existing technologies like censorware block much protected speech that is not HTM. Censorware proponents have argued that technology can be improved, but we have shown that this is highly unlikely. First, there is too much information on the Internet to censor with the constitutionally required precision. Censorware does not scale well. Second, because the definition of HTM speech varies both with age and location, any constitutionally satisfactory HTM-based censorship must be tailored to both variables.
Third, the example of SmartFilter's censorship of sites that are blocked for what they do, not what they say, demonstrates that censorware is necessarily caught between over-inclusiveness and ineffectiveness. Equally important, it demonstrates that the real issue here is control, not content. In simple terms, with censorware, people cannot be permitted to read anonymously. Privacy, anonymity, and even language translation sites constitute a security hole in the control system of censorware. If not banned themselves, they allow people to escape the blacklist, and this is unacceptable as a matter of control.
IV. Conclusion: censorware, ratings, and architectural censorship
The search for a technical solution is fundamentally misguided, because it has been guided by unstated and unsupportable assumptions. For censorware, the hope has been that the problem of overinclusiveness can, over time, be technologically solved. We have shown that there is an inherent tradeoff between effectiveness in controlling minors' access and inclusiveness or fit: to be effective, censorware must block sites that do not offer "offensive" material but do offer "tools," such as privacy, anonymity, and language-translation utilities.
This example also illustrates our more general point about architectural censorship: the evils of invisibility. We are not aware of any public documentation of the blacklisting of privacy, anonymity, and translation sites other than co-author Finkelstein's work. EFF believes that the use of censorware is not only legally unjustifiable, but corrosive of free speech values and public debate over First Amendment rights. As Prof. Lessig put it, "[o]nly when regulation is transparent is a political response possible."  Censorware hides blacklists in black boxes.
1 [Lawrence Lessig, Code and Other Laws of Cyberspace 177 (1998).]
2 [Seth Finkelstein, "SmartFilter - I've Got A Little List" <http://sethf.com/anticensorware/smartfilter/gotalist.php>]
3 [See Butler v. Michigan, 352 U.S. 380, 383-84 (1957).]
4 [Filtering the Internet: A Best Practices Model, by members of the Information Society Project at Yale Law School <http://www.copacommission.org/papers/yale-isp.pdf>]
7 [See Lawrence Lessig and Paul Resnick, Zoning Speech on the Internet: a Legal and Technical Model, 98 Mich. L. Rev. 395, 397-398 (1999) (distinguishing first- and second-order effects of architectures).]
8 [Lessig, Code, at 178 (citing Jonathan Weinberg, Rating the Net, 19 Hastings Comm/Ent L.J. 453, 478 n. 108 (1997)).]
10 [See generally Peter Kriete, Note, Caller ID and the Great Privacy Debate: Whose Phone Call Is It, Anyway?, 97 Dickinson L. Rev. 357 (1993).]
11 [Thomas Schelling, Micromotives and Macrobehavior 94 (1978).]
12 [Michael Chwe, Culture, Circles, and Commercials: Publicity, Common Knowledge, and Social Coordination, 10 Rationality & Soc'y 47, 49-50 (1998) (common knowledge is fact or event that everyone knows, everyone knows that everyone knows it, and so on).]
13 [One role of advocacy groups like EFF is to produce common knowledge about the effects of new technologies on civil liberties.]
14 [Randal Picker, Simple Games in a Complex World: A Generative Approach to the Adoption of Norms, 64 U. Chi. L. Rev. 1225, 1228 (1997) ("seeded norms" may "take root and spread").]
15 [See, e.g., Lamont v. Postmaster Gen., 381 U.S. 301, 307 (1965) (invalidating law requiring willing recipient to request that certain, state-defined materials be sent to him).]
16 [See generally James B. Rule, Private Lives and Public Surveillance (1973) (explaining how social control in large-scale societies depends on identification architectures); Lessig, Code 30-42 (discussing identification and "architectures of control"). ]
17 [Erving Goffman, Stigma 57 (1963). ]
18 [See generally Lee Tien, Who's Afraid of Anonymous Speech? McIntyre and the Internet, 75 Or. L. Rev. 117 (1996)]
19 [See generally Julie E. Cohen, A Right to Read Anonymously: A Closer Look at "Copyright Management" in Cyberspace, 28 Conn. L. Rev. 981 (1996); id. at 1013 (while the right to speak anonymously may be used to defame or harass, "the mere act of reading cannot injure"); id. at 1014 (monitoring of reading choices chills thought).]
20 [Id. at 1031 n. 213 (listing statutes).]
21 [<http://www.securecomputing.com/pdf/SFProdOverview30.pdf>; see SmartFilter's description on page 2-2, whole page 20, "Understanding the SmartFilter Control List."]
22 [Mainstream Loudoun v. Board of Trustees, 24 F. Supp. 2d 552 (E.D. Va. 1998) (holding public library's use of X-Stop unconstitutional).]
23 [<http://www.inet-ads.com/consumer/x-stop/x-sdab.htm>.]
24 [98 F. Supp. 2d 74 (D. Mass. 2000); see also Matthew Skala, "Cyber Patrol break FAQ" <http://www.islandnet.com/~mskala/cpbfaq.html>.]
25 [SurfControl Products Overview <http://www.surfcontrol.com/products/overview/index.html>.]
26 [WorldWide Domain Statistics, NetNames Global Domain Name database <http://www.domainstats.com/>.]
27 [<http://www.netcraft.com/survey/Reports/0101/>. ]
28 [<http://www.wwwmetrics.com/>; see <http://www.archive.org/> ("1 billion pages").]
29 [E.g., <http://www.securecomputing.com/index.cfm?skey=86> ("Candidate sites are then added to the Control List, after being viewed and approved by our Control List technicians.") (SmartFilter). ]
30 [Lawrence and Giles, n. 28 supra.]
31 [Office of Research, OCLC Online Computer Library Center, Inc. <http://wcp.oclc.org/stats.htm> (the web server numbers here are a year behind those cited above).]
32 [<http://dir.yahoo.com/Business_and_Economy/Shopping_and_Services/Sex/> would be a simple starting point, yielding many thousands of items.]
33 [For a case study, see Seth Finkelstein, SmartFilter - I've Got A Little List <http://sethf.com/anticensorware/smartfilter/gotalist.php>.]
34 [Miller v. California, 413 U.S. 15, 24 (1973).]
35 [Ibid ("whether the average person, applying contemporary community standards would find that the work, taken as a whole, appeals to the prurient interest").]
36 [For an experimental evaluation of one image-scanner, see Peacefire's investigation of the program BAIR, conducted by Bennett Haselton, <http://www.peacefire.org/censorware/BAIR/>.]
37 [Consumer Reports: When Online Becomes Off-Limits for Kids, <http://www.consumersunion.org/other/onlineny1198.htm>; Consumer Reports: Digital Chaperones for Kids, <http://www.consumerreports.org/Special/ConsumerInterest/Reports/0103fil0.html>.]
38 [Consumers' Association (UK), Internet filters don't safeguard children against pornography and other net nasties, <http://www.which.net/whatsnew/pr/may00/which/netnannies.html>.]
39 [Mr. Finkelstein formerly was chief programmer for the Censorware Project, which analyzed many censorware products in order to investigate what they actually censored. The work presented here, however, is independent and associated with no organization, including EFF. After EFF learned of his paper SmartFilter's Greatest Evils, EFF asked him to co-author this White Paper. EFF thanks Mr. Finkelstein for doing so; virtually all the non-legal discussion here is his contribution. SmartFilter's Greatest Evils is at <http://sethf.com/anticensorware/smartfilter/greatestevils.php>.]
40 [Police Dept. of Chicago v. Mosley, 408 U.S. 92, 96 (1972). ]
41 [City Council of Los Angeles v. Taxpayers for Vincent, 466 U.S. 789, 804 (1984).]
42 [Simon & Schuster, Inc. v. Members of N.Y. State Crime Victims Bd., 502 U.S. 105, 115 (1991).]
43 [R.A.V. v. St. Paul, 505 U.S. 377, 391 (1992); id. at 430 ("restrictions based on viewpoint are . . . particularly pernicious").]
44 [Perry Ed. Assn. v. Perry Local Educators' Assn., 460 U.S. 37, 46 (1983); International Soc'y for Krishna Consciousness, Inc. v. Lee, 505 U.S. 672, 678-79 (1992).]
45 [First Nat'l Bank of Boston v. Bellotti, 435 U.S. 765, 785 (1978).]
46 [Because we are addressing speech on the Internet, and the Supreme Court has already rejected Internet regulation based on the concept of "indecency," we focus instead on the "harmful to minors" concept. See Reno v. ACLU, 521 U.S. 844, 867-870 (1997).]
47 [Ginsberg v. New York, 390 U.S. 629, 633 (1968) (upholding conviction of magazine vendor for selling adult magazine to 16-year-old).]
48 [Reno, 521 U.S. at 865.]
49 [Ibid.]
50 [Id. at 865-866.]
51 [Erznoznik v. City of Jacksonville, 422 U.S. 205, 212-14 (1975).]
52 [Reno, 521 U.S. at 878. Lower courts have held that "'if a work is found to have serious literary, artistic, political or scientific value for a legitimate minority of normal, older adolescents, then it cannot be said to lack such value for the entire class of juveniles taken as a whole.'" American Booksellers Ass'n v. Webb, 919 F.2d 1493, 1504-05 (11th Cir. 1990) (quoting American Booksellers Ass'n v. Virginia, 882 F.2d 125, 127 (4th Cir. 1989) (other citations omitted)).]
53 [ACLU v. Reno, 217 F.3d 162, 177 (3d Cir. 2000) (HTM category based on concept of discrete, diverse, geographically-defined communities), cert. pet. filed, Feb. 12, 2001 (No. 00-1293).]
54 [We do not address an even stronger argument against censorware -- that it operates as a prior restraint -- because it was well analyzed in Mainstream Loudoun. Our point that effective censorware must block privacy, anonymity, and translation sites also strengthens that argument.]
55 [See NAACP v. Button, 371 U.S. 415, 438 (1963).] Second, because the definition of HTM speech varies both with age and location, any constitutionally satisfactory HTM-based censorship must be tailored to both variables.
56 [Lessig, Code, at 181.]