This article expresses the author's opinion at the time of writing. There are no guarantees of correctness, originality or current relevance. Copying the whole article is forbidden. Transcription of selected parts is allowed provided that author and this source are mentioned.
Some days ago I embarked on yet another fad and converted my personal site to HTTPS. It is no longer available over HTTP; any request to a non-secure URL is redirected to the SSL-enabled resource.
In terms of privacy, I am not sure there are major gains in serving static content over SSL. One gain is that the exact page URL is kept secret: Chuck can only see that the user is connecting to a computer that happens to serve my site. And if everybody uses SSL, even when it is not strictly necessary, we make things a little more difficult for the NSA, because they don't know where to begin sniffing.
Using SSL opened a small Pandora's box: insecure content (http://, not https://) loaded from my own site or from third parties (Disqus, Google, YouTube, etc.). Insecure content is best avoided because it reveals (directly or indirectly) which URL the user is visiting, and a lot of personal information may be leaked in cleartext for Chuck to see.
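A first pass at spotting this kind of mixed content can be done statically, before involving a full browser. The sketch below, which is not part of 'scare' itself, scans a page's HTML for http:// URLs in attributes that the browser actually fetches (anchors are mere links, so they are skipped). It will, of course, miss content loaded dynamically by JavaScript, which is exactly why a real rendering engine like PhantomJS is needed.

```python
from html.parser import HTMLParser

class InsecureContentParser(HTMLParser):
    """Collects http:// URLs found in attributes that load sub-resources."""

    # Attributes whose URLs a browser actually fetches as page content
    URL_ATTRS = {"src", "href", "data"}

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value and value.startswith("http://"):
                # <a href> is just a link, not loaded content; skip it
                if tag != "a":
                    self.insecure.append(value)

def find_insecure(html):
    """Return the insecure sub-resource URLs referenced by an HTML page."""
    parser = InsecureContentParser()
    parser.feed(html)
    return parser.insecure
```

For example, an `<img src="http://...">` would be flagged, while an `<a href="http://...">` or an https:// script would not.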
On top of that, there is a pragmatic reason: insecure content in HTTPS pages makes some browsers show warnings or refuse to load that content altogether.
The other half of the problem is crawling the site: finding all pages, testing them and generating a report. For that, I already had a script, the one I use with the W3C validator. It works well and caches data and metadata, so it can be run frequently against a site with minimal impact. (Revalidating only the pages that have been updated is particularly important in the W3C case because the actual validator is a public service, and it blocks your IP address if it perceives you as abusing the service.)
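The revalidate-only-what-changed idea can be sketched with a simple content-hash cache. This is not the actual script's code; the cache file name and functions here are hypothetical, but the mechanism is the same: a page is retested only when its body's digest differs from the one recorded on the previous run.

```python
import hashlib
import json
import os

CACHE_FILE = "scare_cache.json"  # hypothetical cache path


def load_cache():
    """Load the URL -> digest map from the previous run, if any."""
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}


def needs_revalidation(url, body, cache):
    """True if the page body changed since the digest stored in cache.

    Updates the cache in place so the next run sees the new digest.
    """
    digest = hashlib.sha256(body).hexdigest()
    if cache.get(url) == digest:
        return False
    cache[url] = digest
    return True
```

On the first run everything is revalidated; afterwards only pages whose content changed are hit again, which keeps the load on any external service (like the W3C validator) minimal.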
So I put together a modified version of that Python script that "validates" pages using PhantomJS. It is named 'scare' and can be downloaded here. You will need to install PhantomJS; modify the script to point to the right path if necessary. Within the script, set the base_url variable to your site.
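The general shape of driving PhantomJS from Python looks like the sketch below. The helper script name (sniff.js) is an assumption, standing in for a small PhantomJS script that renders a page and prints each sub-resource URL it requested, one per line; 'scare' may do this differently.

```python
import subprocess

PHANTOMJS = "phantomjs"  # adjust to the full path if phantomjs is not on PATH
SNIFF_JS = "sniff.js"    # hypothetical helper: prints one requested URL per line


def requested_urls(page_url):
    """Render page_url in PhantomJS and return every sub-resource URL requested."""
    out = subprocess.run(
        [PHANTOMJS, SNIFF_JS, page_url],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def insecure_urls(urls):
    """Filter the requested URLs down to the cleartext ones."""
    return [u for u in urls if u.startswith("http://")]
```

Because PhantomJS actually executes the page's JavaScript, this catches insecure content that a static HTML scan would miss, such as resources injected by ad or comment widgets.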
The first run of 'scare' can take many hours, since PhantomJS will "scare" every page of the site, fetching images, AdSense blocks, Disqus comments... everything. The screen output contains a lot of information, so it is wise to redirect it to a log file.
In any case, the most important deliverable is invallist.txt, the list of site pages that loaded insecure content. The log file is then useful to determine which URLs are the offenders. The file errorlist.txt contains broken links to your own site.
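Cross-referencing invallist.txt with the log amounts to grepping the log for the offending page. The log format below ("INSECURE page resource" lines) is purely illustrative, not the actual format 'scare' emits, but the idea carries over: filter the log down to the insecure resources reported for one page.

```python
def offending_urls(log_text, page):
    """Return the insecure resource URLs reported for one page.

    Assumes a hypothetical log format of 'INSECURE <page> <resource>' lines;
    adapt the parsing to whatever your crawler actually prints.
    """
    hits = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "INSECURE" and parts[1] == page:
            hits.append(parts[2])
    return hits
```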
If you plan to use the scripts above to scan your own site for insecure content, keep the following points in mind:
Now my site is 100% guaranteed to load secure content only, except for this page, which loads an HTTP-only Bitcoin ticker from realtimebitcoin.info. That page will be kept as a "canary" to make sure the script keeps detecting insecure content.