Detecting insecure elements in your HTTPS site


This article expresses the author's opinion at the time of writing. There are no guarantees of correctness, originality or current relevance. Copying the whole article is forbidden. Transcription of selected parts is allowed provided that author and this source are mentioned.

Some days ago I embarked on yet another fad and converted my personal site to HTTPS. It is no longer available over HTTP; access to any non-secure URL is redirected to the SSL-enabled resource.

In terms of privacy, I am not sure there are major gains in serving static content over SSL. One gain is that the exact page URL is kept secret: Chuck can only see that the user is connecting to a computer that happens to serve my site. And if everybody uses SSL, even when not strictly necessary, we make things a little more difficult for the NSA, because they don't know where to begin sniffing.

Using SSL opened a small Pandora's box: insecure content (http://, not https://) loaded from my own site or from third parties (Disqus, Google, YouTube, etc.). It is better to avoid insecure content because it reveals (directly or indirectly) which URL the user is visiting, and a lot of personal information may be leaked for Chuck to see in cleartext.

On top of that, there is the pragmatic reason: insecure content in HTTPS pages makes some browsers show warnings or refuse to load that content.

Fixing insecure links to my own site was easy enough with grep and sed. Links to other sites were not so easy; we are never sure that the regular expression is catching 100% of offenders. But the biggest problem is dynamically-loaded content; some content is embedded as JavaScript, and God knows what the script loads next.
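To illustrate why a static search is not enough, here is a minimal Python sketch of the grep-like approach. It is not the command I actually ran; the pattern, the function name `find_insecure_refs` and the sample markup are assumptions made for the example. Note that it also flags plain `<a href>` links, which is stricter than what browsers treat as insecure content.

```python
import re

# Matches quoted src/href attribute values that start with plain http://.
# Assumption: attributes are quoted. URLs assembled by scripts at run time
# are invisible to this kind of static search.
INSECURE_ATTR = re.compile(
    r"""\b(?:src|href)\s*=\s*["'](http://[^"']+)["']""",
    re.IGNORECASE,
)

def find_insecure_refs(html):
    """Return the http:// URLs found in src/href attributes of an HTML string."""
    return INSECURE_ATTR.findall(html)

sample = (
    '<img src="http://example.com/a.png">'
    '<a href="https://example.com/">ok</a>'
    '<script>load("http:" + "//example.com/b.js")</script>'
)

# Only the image is reported; the URL built inside the script is missed.
print(find_insecure_refs(sample))
```

The third element of the sample is the interesting one: the script concatenates the URL, so no regular expression over the page source will find it.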

The only way to get all content loaded by a page, and then search for insecure URLs, is to actually open the page in some browser. The initial suggestion of Osvaldo Santana was PhantomJS, a "headless" browser, one of the nicest tools ever made. I already use it a lot for code coverage tests on some Javascript software.

The other half of the problem is to crawl the site, finding all pages, testing them and generating a report. For that, I already had a script that runs the W3C validator over my site. It works well, and caches data and metadata so it can be run frequently on a site with minimal impact. (Revalidating only the pages that have been updated is particularly important for the W3C validator because the actual validator is a public service, and it blocks your IP address if you are perceived as an abuser of the service.)
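The idea of rechecking only updated pages can be sketched as follows. This is an assumption about one possible caching scheme (a digest of the page body kept per URL), not a description of how my script stores its cache; the names `content_digest` and `needs_check` are made up for the example.

```python
import hashlib

def content_digest(body):
    """Return a hex digest that changes whenever the page body changes."""
    return hashlib.sha256(body).hexdigest()

def needs_check(cache, url, body):
    """Return True if the page is new or changed since the last run.

    The cache (a dict mapping URL to digest) is updated as a side effect,
    so a second call with the same body returns False.
    """
    digest = content_digest(body)
    if cache.get(url) == digest:
        return False
    cache[url] = digest
    return True

cache = {}
page = "https://example.com/index.html"  # hypothetical URL
print(needs_check(cache, page, b"<html>v1</html>"))  # True: first visit
print(needs_check(cache, page, b"<html>v1</html>"))  # False: unchanged
print(needs_check(cache, page, b"<html>v2</html>"))  # True: page was edited
```

One caveat with any scheme like this: an unchanged page can still start loading different third-party content, so an occasional full rescan remains worthwhile.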

So I put together a modified version of that Python script, to "validate" pages using PhantomJS. Its name is 'scare' and it can be downloaded here. You will need to install PhantomJS. Modify the script to point to the right path if necessary. Within the script, modify the base_url variable to your site.

A "driver script" for PhantomJS, named 'phantom.js', downloadable here, is necessary as well. The JavaScript code intercepts URL requests, which are later analyzed by the master script. PhantomJS must not quit until all content scripts have had their chance to run and load subordinate content, so we can detect all resources loaded in real usage.
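The analysis step on the master side can be approximated like this. It is a sketch under the assumption that the driver hands back the requested URLs as a plain list of strings; the real exchange format between 'phantom.js' and 'scare' is not shown here, and `insecure_requests` is a hypothetical name.

```python
from urllib.parse import urlsplit

def insecure_requests(requested_urls):
    """Return the requested URLs whose scheme is plain http.

    Other schemes (https, data, about, ...) are ignored because they do
    not travel over the network in cleartext.
    """
    return [u for u in requested_urls if urlsplit(u).scheme == "http"]

# Hypothetical list of requests captured while one page was loading.
captured = [
    "https://example.com/index.html",
    "https://example.com/style.css",
    "http://example.com/ticker.js",
    "data:image/png;base64,AAAA",
]

print(insecure_requests(captured))  # ['http://example.com/ticker.js']
```

Comparing the parsed scheme instead of testing the string prefix avoids surprises with unusual capitalization, since `urlsplit` returns the scheme in lowercase.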

The first run of 'scare' can take many hours, since PhantomJS will scare every page of the site, including images, AdSense blocks, Disqus comments... everything. The screen output has a lot of information; it is wise to redirect the output to a log file.

In any case, the most important deliverable is invallist.txt, a list of site pages that loaded insecure content. The log file is then useful to determine which URLs are the offenders. The file errorlist.txt contains broken links to your own site.
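Producing a list like invallist.txt from per-page results could look like the sketch below. The one-page-per-line layout, the shape of the `results` dictionary and the function name `write_invalid_list` are assumptions for illustration; the actual file format of the script may differ.

```python
import os
import tempfile

def write_invalid_list(results, path):
    """Write the pages that loaded insecure content, one URL per line.

    `results` maps each page URL to the list of insecure URLs it loaded.
    Returns the sorted list of offending pages.
    """
    offenders = sorted(page for page, urls in results.items() if urls)
    with open(path, "w", encoding="utf-8") as report:
        for page in offenders:
            report.write(page + "\n")
    return offenders

# Hypothetical scan results for three pages.
results = {
    "https://example.com/clean.html": [],
    "https://example.com/ticker.html": ["http://example.com/ticker.js"],
    "https://example.com/album.html": ["http://example.com/photo.jpg"],
}

# A temporary directory keeps the demonstration from touching real files.
with tempfile.TemporaryDirectory() as workdir:
    report_path = os.path.join(workdir, "invallist.txt")
    print(write_invalid_list(results, report_path))
```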

If you plan to use the scripts above to scan your own site for insecure content, keep the following points in mind:

Now my site is 100% guaranteed to load secure content only, except for this page, which loads an HTTP-only Bitcoin ticker. That page will be kept as a "canary" to make sure that the script keeps detecting the insecure content.
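The canary idea boils down to a sanity check after each scan: if the known-bad page is missing from the list of offenders, the detector itself is broken. A minimal sketch, with a hypothetical canary URL and a hypothetical function name `canary_detected`:

```python
def canary_detected(offending_pages, canary_page):
    """Return True if the deliberately insecure page was flagged by the scan."""
    return canary_page in set(offending_pages)

CANARY = "https://example.com/ticker.html"  # hypothetical canary page

# A healthy scan reports the canary; an empty report means detection failed.
print(canary_detected(["https://example.com/ticker.html"], CANARY))  # True
print(canary_detected([], CANARY))                                   # False
```

With a check like this, an empty report stops being ambiguous: it means either "everything is secure" (canary present, nothing else) or "the scan is no longer working" (canary absent).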
