Treacherous metadata in company documents
by Daniel Bachfeld
Office documents can contain metadata such as names, storage locations and version information about the software used to create them. An attacker can exploit this information for targeted attacks. The free tool Foca shows how talkative a company's downloadable documents are.
In recent weeks, reports of hacking attacks on companies have been mounting up. HBGary, RSA, Epsilon and Barracuda Networks are among the companies from which hackers have stolen highly sensitive data. The attack on RSA was highly focused and targeted individual employees. The hackers appear to have collected information on target personnel on the web; social networks such as Facebook and Xing offer ideal forums for doing so.
Files that a company offers for download can also be a rich, and often unintended, source of interesting information: Office documents, presentations, images and other files contain metadata such as the author, date and software used, which can provide useful pointers for carrying out targeted technical or social engineering attacks.
A company can check just how much the files available to download from its web site give away by manually loading them using the relevant applications and going through the file properties. An easier and faster option is to use the free version of metadata extraction tool Foca. To download the tool, users are merely required to enter an email address.
Downloading and analysing files using Foca does not even require the user to know or enter the paths to individual files; the user simply specifies the domain and file types to be analysed. Foca then feeds these to the search engines of Google, Bing and Exalead and uses the results to download all of the files found. The tool supports a range of file formats, including doc, pdf, ppt, odt, xls and jpg. For example, entering site:heise.de filetype:pdf into the search field will return all pdf files found on servers on the heise.de domain.
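How Foca assembles its search requests internally is not documented here, but the principle can be sketched in a few lines of Python. The function name build_queries is our own invention; only the site: and filetype: operators come from the example above.

```python
def build_queries(domain, file_types):
    """Return one 'site:... filetype:...' query string per file type."""
    queries = []
    for file_type in file_types:
        # Normalise spellings such as ".PDF" or " pdf " to "pdf".
        ext = file_type.strip().lstrip(".").lower()
        if ext:
            queries.append(f"site:{domain} filetype:{ext}")
    return queries


if __name__ == "__main__":
    for query in build_queries("heise.de", ["pdf", ".doc", "ppt"]):
        print(query)
```

Each resulting string can be pasted into a search engine by hand; the sketch deliberately does not send any requests itself.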
If the list of links to files found by the search engines is too long or the search takes too long, the user can interrupt the search and manually download files by selecting the file, bringing up the context menu with a right-click and selecting "Download". Once downloaded, metadata can be extracted by bringing up the context menu on a selected file and clicking "Extract Metadata". Under "Metadata Summary", found in the tree in the left-hand panel of the Foca window, Foca sorts the extracted data into Users, Folders, Printers, Software, Emails and Operating Systems. This data can also be viewed for each document individually. The data available varies with the document type; presentations, for example, can also yield EXIF data from embedded jpg images.
Users shows complete user names or user codes, Folders shows the full local path on the author's computer and Operating Systems indicates the Windows version or whether Mac OS X or Linux was used. Software lists which software was used to create the document, such as Adobe Distiller, Microsoft Office or OpenOffice. Where documents are recent, attackers could use this information to draw conclusions about the vulnerability of a system, targeting a specific user with an email containing an exploit for their specific office software version. Users do, however, have to interpret and analyse the data provided by Foca themselves – it is not a vulnerability scanner or attack tool.
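To give an idea of where such fields live, the following Python sketch reads the author, last editor and application version from an Office Open XML file (docx, pptx, xlsx), which is a ZIP archive containing XML parts. It is not how Foca works internally, which we have not examined. It assumes the property parts sit at their conventional locations, docProps/core.xml and docProps/app.xml, and it does not handle the older binary doc, ppt and xls formats. The name read_properties and the dictionary keys are our own.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Package part -> {our key: element to look up}. Locations are the
# conventional ones, not resolved via the package relationships.
PARTS = {
    "docProps/core.xml": {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
    },
    "docProps/app.xml": {
        "application": "ep:Application",
        "app_version": "ep:AppVersion",
        "company": "ep:Company",
    },
}


def read_properties(source):
    """Return the properties found in an OOXML file (path or binary file object)."""
    result = {}
    with zipfile.ZipFile(source) as archive:
        for part, fields in PARTS.items():
            try:
                root = ET.fromstring(archive.read(part))
            except KeyError:
                continue  # this package has no such part
            for key, tag in fields.items():
                element = root.find(tag, NS)
                if element is not None and element.text:
                    result[key] = element.text
    return result
```

Calling read_properties("slides.pptx") on a hypothetical file returns a dictionary containing only those fields that are actually present.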
Some file formats are more revealing than others. Tests show that PowerPoint files reveal more information than pdf files. This is in part due to the fact that some metadata is discarded when office files are converted to pdf format. PowerPoint files are also of particular interest because Foca is able to extract extra metadata from any images embedded in them – for example the camera used to take the picture. EXIF data also usually contains a thumbnail of the original photograph, which often fails to reflect any changes made to the image. Deliberately obscured areas of a photo may be clearly recognisable in the thumbnail.
Office files often contain the path to the folder in which the file is located, which can in turn provide information on the user's Windows logon name, project names, etc. In some cases, Foca can even extract information on printers used, such as internally used domain names, though our brief in-house testing failed to find a single document containing this information.
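The step from a stored path to a logon name is easy to illustrate. The Python sketch below assumes the two usual Windows profile roots, "Documents and Settings" (Windows XP and earlier) and "Users" (Vista and later); the function name, the sample paths and the list of shared profile names to skip are our own assumptions.

```python
import re

# Drive letter, profile root, then the profile folder name.
_PROFILE_RE = re.compile(
    r"^[A-Za-z]:\\(?:Documents and Settings|Users)\\([^\\]+)\\",
    re.IGNORECASE,
)

# Shared profiles that do not identify an individual user.
_SHARED = {"public", "all users", "default", "default user"}


def logon_name_from_path(path):
    """Return the profile folder name from a local Windows path, or None."""
    match = _PROFILE_RE.match(path)
    if not match:
        return None
    name = match.group(1)
    return None if name.lower() in _SHARED else name


if __name__ == "__main__":
    sample = r"C:\Documents and Settings\jsmith\My Documents\ProjectX\plan.ppt"
    print(logon_name_from_path(sample))
```

The profile folder usually, but not always, matches the logon name, so the result is a lead to follow up rather than a certainty.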
Foca can also be used to analyse the collected metadata to correlate network information. Foca collects references to other systems, such as servers, within documents. It uses this to query network information service Robtex for more information on correlated systems. Robtex provides information such as DNS data on domains and the servers on them. Although other dedicated tools for such research exist, having Foca take care of this for you is certainly practical. The tool can also use reverse DNS lookup on any IP addresses found to determine the associated name, where available.
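The reverse lookup step can be approximated with Python's standard library. The sketch below pulls IPv4 addresses out of extracted text and asks the local resolver for the associated names. It does not query Robtex, whose interface we have not looked at, and the function names are our own. Note that dotted version numbers such as 1.2.3.4 will also be picked up as addresses.

```python
import ipaddress
import re
import socket

# Four dot-separated groups of one to three digits; validated below.
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_ipv4_addresses(text):
    """Return the distinct valid IPv4 addresses in text, in order of appearance."""
    found = []
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            address = str(ipaddress.IPv4Address(candidate))
        except ipaddress.AddressValueError:
            continue  # looks like an address but is not one
        if address not in found:
            found.append(address)
    return found


def reverse_lookup(addresses, resolver=socket.gethostbyaddr):
    """Map each address to its reverse DNS name, or None if there is none."""
    names = {}
    for address in addresses:
        try:
            names[address] = resolver(address)[0]
        except OSError:  # includes socket.herror and socket.gaierror
            names[address] = None
    return names
```

The resolver parameter exists only so that the lookup can be replaced in tests or pointed at a different source; by default the system's own resolver is used.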
Individual items of data within documents may not be sufficient for targeted attacks, but, as ever, volume matters. With enough pieces of the puzzle, it is possible to put together an overall picture from which experienced attackers can identify points worth pursuing. The best protection from this kind of information gathering is to remove metadata from files or to fill the metadata with dummy data prior to publication. Microsoft has published instructions for doing so manually.
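Filling fields with dummy data can also be scripted. The following Python sketch is not Microsoft's documented procedure; it is a minimal illustration for Office Open XML files only, assuming the conventional part names and the element spellings that Office normally writes (dc:creator, cp:lastModifiedBy, Company, Manager). It leaves comments, tracked changes, custom properties and EXIF data in embedded images untouched, so the result should be checked before publication. All names in the code are our own.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Elements to overwrite, per package part (conventional spellings assumed).
SCRUB = {
    "docProps/core.xml": [b"dc:creator", b"cp:lastModifiedBy"],
    "docProps/app.xml": [b"Company", b"Manager"],
}


def _overwrite(xml_bytes, tags, dummy):
    """Replace the text content of each listed element with dummy."""
    for tag in tags:
        name = re.escape(tag)
        pattern = re.compile(b"(<" + name + b">)[^<]*(</" + name + b">)")
        xml_bytes = pattern.sub(
            lambda m: m.group(1) + dummy + m.group(2), xml_bytes
        )
    return xml_bytes


def scrub_package(src, dst, dummy="anonymous"):
    """Copy the OOXML package src to dst, overwriting the listed properties."""
    dummy_bytes = escape(dummy).encode("utf-8")
    with zipfile.ZipFile(src) as zin:
        with zipfile.ZipFile(dst, "w") as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename in SCRUB:
                    data = _overwrite(data, SCRUB[item.filename], dummy_bytes)
                # Passing the ZipInfo keeps name, timestamp and compression.
                zout.writestr(item, data)
```

The byte-level substitution is a deliberate choice: it leaves the rest of each XML part exactly as the office application wrote it, at the price of missing files that use different namespace prefixes.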
From Office 2010 onwards, Microsoft's Office package has also included a function for removing this revealing data from documents prior to publication. We have not yet tested how effective this is. An online test is available to see how much information Foca can still extract from individual documents uploaded using the user's browser. Although the vendor promises that it will not save uploaded documents and will use the data obtained for statistical purposes only, users may prefer to restrict their use of the test to non-confidential files. The company behind Foca also offers a product by the name of MetaShield for Internet Information Server and SharePoint. This promises to remove metadata from MS Office, OpenOffice and pdf files on-the-fly before they are served.