Opera's MAMA analyses the web
Opera has built its own search engine to measure how many badly coded web pages are out there. Its Metadata Analysis and Mining Application (MAMA) web crawler analyses the structure of web sites and the technologies they use, rather than their content. The crawler has been running for some time, and the first set of results has just been published; it should be of great interest to web developers and designers.
MAMA can detect which versions of HTML, XHTML and CSS and which scripting languages a page uses, whether it contains Flash, and whether its code is W3C-compliant. MAMA also retrieves any metadata that is present, such as the editor used to create the page. According to Opera, the indexed data can be evaluated to identify weaknesses in the code generated by many popular technologies.
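The kind of structural inspection described above can be illustrated with a minimal Python sketch. It is not Opera's implementation, whose internals are not described here; the class name `PageProbe`, the helper `probe` and the Flash heuristics are assumptions made purely for illustration, and the sketch does not attempt W3C validation at all.

```python
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Hypothetical probe: records structural facts about a page, not its text."""

    def __init__(self):
        super().__init__()
        self.doctype = None      # raw DOCTYPE declaration, if the page has one
        self.generator = None    # content of <meta name="generator">, if present
        self.uses_flash = False  # True if an <object>/<embed> looks like Flash
        self.script_count = 0    # number of <script> elements seen

    def handle_decl(self, decl):
        # Called for <!...> declarations; the DOCTYPE hints at the HTML/XHTML version.
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "generator":
            # Many editors and CMSs identify themselves in this meta element.
            self.generator = attrs.get("content")
        elif tag == "script":
            self.script_count += 1
        elif tag in ("object", "embed"):
            # Assumed heuristic: Flash MIME type or a .swf file reference.
            is_flash_type = "shockwave-flash" in attrs.get("type", "").lower()
            source = (attrs.get("src") or attrs.get("data") or "").lower()
            if is_flash_type or source.endswith(".swf"):
                self.uses_flash = True


def probe(html):
    """Parse one HTML document and return the filled-in probe."""
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return parser
```

Calling `probe(page_source)` on a downloaded page returns an object whose `doctype`, `generator`, `uses_flash` and `script_count` attributes could then be tallied across many URLs. Checking standards compliance would additionally require passing each page to a validator, which this sketch leaves out.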
To emphasise this, the browser vendor has released a report interpreting the results collected by MAMA so far. The search engine examined 3,509,180 URLs in 3,011,668 domains, and Opera describes in detail how these URLs were selected. Only 4.1 per cent of all the web pages passed the W3C validator without problems. Opera also evaluated the conformity of the code generated dynamically by various content management systems. The best results were achieved by Typo3 (12.7 per cent), followed by WordPress (9 per cent) and Joomla (6.5 per cent).
MAMA showed that most popular web editors produce poor-quality HTML. Only Apple's iWeb made a convincing showing, with 81.9 per cent of its pages being W3C-compliant. Only 3.4 per cent of the pages generated with Adobe Dreamweaver validated correctly; NetObjects Fusion and Adobe GoLive achieved similar results. Microsoft products scored very poorly: only 0.6 per cent of all the HTML pages generated with FrontPage or Word complied with the W3C standard.
According to Opera, the idea for MAMA originated in the quality control department. To test Opera "in the wild", its staff had to laboriously search the web for pages that used a particular technology correctly or incorrectly. As a result, the decision was made in 2004 to create a separate application for this purpose. The vendor plans to make the tool available to web developers by invitation before the end of the year.
(lghp)














