Research Reveals Serious Deficiencies in Anti-Phishing Toolbars

A new study by CyLab at Carnegie Mellon University, "Phinding Phish: An Evaluation of Anti-Phishing Toolbars", shows that anti-phishing browser toolbars are generally not up to the task. The research, carried out by Lorrie Cranor, Serge Egelman, Jason Hong, and Yue Zhang, examined 10 of the 80-90 free anti-fraud toolbars currently available.

The timing of this study is particularly interesting in light of the recent discussion of whether IE or Firefox offers the better anti-phishing protection, sparked by research carried out by SmartWare and 3sharp respectively. Those studies were commissioned by Mozilla and Microsoft, and conspicuously reached opposite conclusions. The Carnegie Mellon study was supported by the US National Science Foundation and the Army Research Office, and there is no suspicion of any hidden agenda here.

The techniques used in the tests are interesting in themselves: the method was refined over three steps. In the first step, four of the toolbars were manually put through their paces using a list, supplied by a filter vendor, of 50 phishing sites discovered within the previous 36 hours. The Netcraft toolbar correctly identified 48 of the sites, was unable to determine whether two of them were malicious or not, and identified none of the malicious sites as safe. Interestingly, the Cloudmark toolbar, which relies on reputation scores from users, failed to identify any of the malicious sites. Since phishing sites exist for only 4.8 days on average, user ratings appear to be a poor basis for this kind of detection: phishing sites simply do not stay up long enough to accumulate sufficient user feedback before they are taken down.

In the second iteration of their test method the researchers used a phishing site feed from the Anti-Phishing Working Group (APWG). The feed went into an automated system consisting of a series of bots, each testing a particular toolbar, controlled by a task manager that monitored the APWG feed. The task manager first processed the information in the phishing messages to isolate and remove legitimate sites before the remaining URLs were tested.

To accomplish this the researchers built a variant of Phelps and Wilensky's Robust Hyperlinks. The idea is to construct a lexical signature for a web page from terms that occur frequently on that page but rarely on other web pages, feed the signature into a search engine (in this case Google), and check whether the domain name of the page under scrutiny matches any of the top 30 results returned. If it does, the site is considered legitimate. The approach is robust because most phishing attacks lift the content of the legitimate site they are attempting to spoof, changing only the minimum required to perpetrate the attack. Hence most searches based on phishing pages lead not to the phishing sites but to the legitimate sites being spoofed. URLs pointing to images and to pages without text entry fields were also removed, the rationale being that a page with nowhere for a visitor to enter information can't be used for phishing. These filtering methods are interesting heuristics in themselves and could be incorporated into an anti-phishing system.
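The lexical-signature check described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the TF-IDF scoring, the document-frequency table `doc_freq`, the `search` callable standing in for a search engine, and the comparison of full hostnames are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def lexical_signature(page_text, doc_freq, corpus_size, k=5):
    """Pick the k terms that are frequent on this page but rare elsewhere.

    doc_freq maps a term to the number of documents containing it in some
    reference corpus of corpus_size documents (hypothetical data).
    """
    terms = re.findall(r"[a-z]{3,}", page_text.lower())
    tf = Counter(terms)

    def score(term):
        # Plain TF-IDF; the +1 avoids division by zero for unseen terms.
        return tf[term] * math.log(corpus_size / (1 + doc_freq.get(term, 0)))

    # Sort by descending score, breaking ties alphabetically.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


def looks_legitimate(page_url, page_text, search, doc_freq, corpus_size, top_n=30):
    """Return True if the page's own host appears among the top search hits.

    search is a hypothetical callable: query string -> list of result URLs.
    """
    query = " ".join(lexical_signature(page_text, doc_freq, corpus_size))
    page_host = urlparse(page_url).hostname
    result_hosts = {urlparse(url).hostname for url in search(query)[:top_n]}
    return page_host in result_hosts
```

A page copied from a bank's site yields a signature that leads the search engine back to the bank, so the copy hosted elsewhere fails the check while the original passes.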

The result of the processing was that the researchers ended up with URLs to sites which had a high probability of being malicious.
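The other filter mentioned above, discarding pages that offer no text entry fields, could look roughly like the following sketch. It uses Python's standard `html.parser`; the set of input types counted as "text entry" is an assumption, not something taken from the report.

```python
from html.parser import HTMLParser


class TextEntryFinder(HTMLParser):
    """Sets self.found when the page has a field a visitor could type into."""

    # Assumed list of <input> types that accept typed text.
    TEXT_TYPES = {"text", "password", "email", "tel", "number", "search"}

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "textarea":
            self.found = True
        elif tag == "input":
            # An <input> without a type attribute defaults to a text field.
            input_type = (attributes.get("type") or "text").lower()
            if input_type in self.TEXT_TYPES:
                self.found = True


def has_text_entry(html):
    """Return True if the HTML contains at least one text entry field."""
    finder = TextEntryFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Pages for which `has_text_entry` returns False would be dropped from the candidate list, on the reasoning that they cannot collect credentials.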

The bots then sent page requests to the sites thus identified, using an anonymiser to make it less likely that the scammers would notice they were being observed. In each case the bot recorded the reaction of the toolbar under test and returned this information to the task manager. The reaction was read with a simple image comparison method, which made it unnecessary to hack the code of the individual toolbars to extract data.
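The report's image comparison is not described in detail here, so the following is only a sketch of how such a check might work. It assumes the bot can capture the toolbar's indicator as a small grid of RGB pixels and holds one reference image per toolbar state; the function names, the threshold `max_diff` and the state labels are hypothetical.

```python
def mean_abs_diff(image_a, image_b):
    """Average per-channel difference between two equally sized RGB grids."""
    if not image_a or len(image_a) != len(image_b):
        raise ValueError("images must be non-empty and the same height")
    total = 0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            raise ValueError("images must be the same width")
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += sum(abs(x - y) for x, y in zip(pixel_a, pixel_b))
            count += len(pixel_a)
    return total / count


def classify_indicator(capture, references, max_diff=10.0):
    """Return the label of the closest reference image, or 'unknown'.

    references maps a state label (e.g. 'phishing', 'safe') to the image
    the toolbar shows in that state.
    """
    best_label, best_diff = min(
        ((label, mean_abs_diff(capture, ref)) for label, ref in references.items()),
        key=lambda item: item[1],
    )
    return best_label if best_diff <= max_diff else "unknown"
```

Comparing screen captures in this way treats every toolbar as a black box, which is what lets one harness test tools from different vendors.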

In the third step this method was refined, and manually verified phishing sites obtained from phishtank.com were used in the tests. Additionally, the toolbars were tested with 510 verified legitimate URLs in order to check for false positives.

Ten different toolbars were tested:

1. Cloudmark Anti-Fraud Toolbar, a tool running on IE. As mentioned above the detection is based on a reputation system.
2. EarthLink Toolbar, which runs under both IE and Firefox, relies on a combination of user feedback, heuristics and a blacklist.
3. The eBay Toolbar uses a combination of heuristics and a blacklist. It runs under IE.
4. GeoTrust TrustWatch Toolbar works with IE and uses several third-party reputation services and certificate authorities combined with a user-input driven blacklist.
5. Google Safe Browsing runs with both IE and Firefox on most platforms. Its functionality is integrated into the new Firefox 2.0 (but not enabled by default). It uses a combination of heuristics and a blacklist kept by Google.
6. McAfee SiteAdvisor runs on IE under Windows, and with Firefox on several other platforms as well. This tool uses a combination of heuristics and manual site verification.
7. Microsoft Phishing Filter in IE 7 uses a blacklist hosted by Microsoft as well as some heuristics.
8. Netcraft Anti-Phishing Toolbar uses a combination of several layers of heuristics and a blacklist. It runs with IE on Windows, and with Firefox on several platforms.
9. Netscape Browser 8.1 relies on a blacklist maintained by AOL. It runs on most common platforms.
10. SpoofGuard is a heuristic anti-phishing tool developed at Stanford University. It runs with IE.

The test results were as follows:

Tool	At First	2 Hours Later	12 Hours Later	24 Hours Later	False Positives	Uncertain
SpoofGuard	91%	91%	91%	91%	38%	45%
EarthLink	83%	82%	84%	84%	1%	91%
Netcraft	77%	74%	74%	80%	0%	0%
Google*	70%	71%	76%	84%	0%	0%
Cloudmark	68%	69%	67%	67%	1%	96%
IE7	68%	68%	67%	67%	0%	0%
TrustWatch	49%	49%	48%	51%	0%	48%
eBay	28%	27%	26%	26%	0%	0%
Netscape	8%	10%	10%	21%	0%	0%
Sites online	100%	98%	93%	70%	n/a	n/a

* This anti-phishing tool is integrated into Firefox 2.0.

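SpoofGuard has the highest detection rate at 91%, but according to the table it also produced 38% false positives and a further 45% 'unsure' results. EarthLink suffers from a similar problem, with only 1% false positives but 91% 'unsure'.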

Netcraft, Google (and by implication, Firefox 2.0) and IE7 display no false positives and no 'unsure' results, but even the best of these, the Netcraft Anti-Phishing Toolbar, initially identified only 77% of the malicious sites as malicious.

You can glean a bit more information from the table, such as how fast blacklists get updated: compare the "At First" column with "24 Hours Later", by which time some of the malicious sites had been taken down and some blacklists updated.
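As a small worked example of that reading, the sketch below takes the "At First" and "24 Hours Later" figures from the table and computes the change in percentage points for each tool. The figures are copied from the table; the function name `gains` is just an illustrative choice.

```python
# Detection rates in percent, copied from the table above:
# tool -> (at first, 24 hours later)
rates = {
    "SpoofGuard": (91, 91),
    "EarthLink": (83, 84),
    "Netcraft": (77, 80),
    "Google": (70, 84),
    "Cloudmark": (68, 67),
    "IE7": (68, 67),
    "TrustWatch": (49, 51),
    "eBay": (28, 26),
    "Netscape": (8, 21),
}


def gains(rates):
    """Return (tool, change in percentage points) pairs, largest gain first."""
    changes = {tool: later - first for tool, (first, later) in rates.items()}
    return sorted(changes.items(), key=lambda item: -item[1])


for tool, change in gains(rates):
    print(f"{tool:<11}{change:+d} points")
```

The two largest gains, Google (+14) and Netscape (+13), belong to tools that the list above describes as relying on a blacklist, while the purely heuristic SpoofGuard does not move at all.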

As mentioned above, McAfee SiteAdvisor was also included in the tests. It scored 0%. It has been left out here because it is not clear from studying McAfee's information whether this tool is actually intended to be an anti-phishing tool.

Cloudmark has informed heise Security that "the Cloudmark Toolbar that was tested by Carnegie Mellon University was a beta product that had not been updated, or 'fed' with our live data, in more than a year."

There are a couple of remarks to make about the methods used. Only two computers were used to run the bots, so there is a time factor which is not discussed. Judging from the tables in the report it is not significant. Furthermore, it is not explicitly discussed whether the responses of the toolbars under test in any way depend on the browser under which they are run. Since this point has been left out one can perhaps conclude that this is not the case.

The report also includes a section discussing a few attacks against the anti-phishing toolbars themselves, e.g. accessing malicious sites through a content distribution network. This actually foils some of the toolbars. It is a fairly basic error, which can be remedied simply by looking for blacklisted URLs as substrings of the URL given to the toolbar.
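The substring remedy is easy to sketch. The Python below is an illustration under stated assumptions, not code from the report or from any toolbar: the blacklist holds plain hostnames, the example URLs are hypothetical, and the way a content distribution network rewrites a URL is assumed to leave the original hostname visible somewhere in the result.

```python
def _without_scheme(url):
    """Drop a leading 'http://' or similar and any trailing slash."""
    return url.split("://", 1)[-1].rstrip("/")


def is_blacklisted(url, blacklist):
    """Return True if any blacklist entry occurs anywhere inside the URL.

    An exact-match lookup misses a blacklisted site that is reached via a
    rewritten URL; a substring search still finds the original address.
    """
    target = _without_scheme(url)
    entries = (_without_scheme(entry) for entry in blacklist)
    # Skip empty entries, which would otherwise match every URL.
    return any(entry and entry in target for entry in entries)
```

For a blacklist containing `phish.example.net`, both `http://cdn.example.com/phish.example.net/login` and `http://phish.example.net.cdn.example.com/login` are caught. The trade-off is that substring matching is cruder than exact matching, so a real tool would need to guard against short or overly generic entries.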

It is safe to conclude, based on this research, that the performance of anti-fraud toolbars in general leaves a lot to be desired. None of them is very good, so defeating these 'semantic' attacks still depends on their human targets, who must spot them through safe browsing habits and suspicion of any information seemingly sent to them by trusted correspondents such as banks. These toolbars provide no firm defence against fraud and should at best be regarded as alarm bells whose silence provides no guarantee that all is well, rather like smoke detectors with untested batteries... (Niels Bjergstrom)