Reverse engineering Plausible's less than 1kb JS script
⚠️ oops, I published this blog post by mistake before completing it 😅 LOL - I am not sure how to unpublish this now :D so I am going to leave this as it is and remove this warning line once the post is complete. ⚠️
Plausible is a “Simple and privacy-friendly Google Analytics alternative”. It is open-source software. While trying to self-host it, I arrived at a page which talks about their less than 1kb analytics script. It got me curious about what would be inside it 🤔
So, here we go. Let us start to understand this from scratch.
Adding the script
When you want to enable Plausible analytics for your website, it seems like you would start by adding this little snippet to your website.
<script defer data-domain="domain.com"
The script tag is used to load JavaScript on the webpage. src is the web address of the JavaScript file.
The bigger question here is, “what do defer and data-domain do?”
Reading through the MDN docs, I learnt that the browser would defer the execution of the JS script to a time when all of the HTML is loaded and parsed. That means all the HTML tags from <html>....</html> would be present when the script gets executed.
More specifically, it seems like the script would get executed before the DOMContentLoaded event fires. This event waits only for the loading and parsing of HTML and does not wait for other things like stylesheets, images, etc. The more common load event seems to be responsible for catching the point when the HTML document and all its dependencies like the stylesheets, images, etc. have finished loading.
Okay, so this way the script doesn’t immediately try to load and block things, and it waits for at least a skeleton of HTML to load.
Regarding the data-domain part, I can’t seem to find references to it in the MDN docs for the script tag. So it is not a built-in attribute of the script tag. I think it might be a way of passing data from HTML to the JS script. Let us continue and see where this leads us.
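As a rough sketch of that guess: attributes starting with data- are custom data attributes, and a script can read them back with getAttribute. The fakeScriptTag object below is a hypothetical stand-in for document.currentScript, used only so the snippet also runs outside a browser.

```javascript
// Hypothetical stand-in for the <script> element; in a browser this
// would be document.currentScript.
const fakeScriptTag = {
  attributes: { defer: '', 'data-domain': 'domain.com' },
  getAttribute: function (name) {
    // Like the DOM method, return null when the attribute is absent.
    return Object.prototype.hasOwnProperty.call(this.attributes, name)
      ? this.attributes[name]
      : null;
  },
};

// Read back the value that was written in the HTML.
const trackedDomain = fakeScriptTag.getAttribute('data-domain');
// An attribute that was never set comes back as null.
const customApi = fakeScriptTag.getAttribute('data-api');

console.log(trackedDomain); // domain.com
console.log(customApi);     // null
```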
The script
This is the entire script.
!function(){"use strict";var a=window.location,r=window.document,o=r.currentScript,s=o.getAttribute("data-api")||new URL(o.src).origin+"/api/event";function l(t){console.warn("Ignoring Event: "+t)}function t(t,e){if(/^localhost$|^127(\.[0-9]+){0,2}\.[0-9]+$|^\[::1?\]$/.test(a.hostname)||"file:"===a.protocol)return l("localhost");if(!(window._phantom||window.__nightmare||window.navigator.webdriver||window.Cypress)){try{if("true"===window.localStorage.plausible_ignore)return l("localStorage flag")}catch(t){}var i={};i.n=t,i.u=a.href,i.d=o.getAttribute("data-domain"),i.r=r.referrer||null,i.w=window.innerWidth,e&&e.meta&&(i.m=JSON.stringify(e.meta)),e&&e.props&&(i.p=e.props);var n=new XMLHttpRequest;n.open("POST",s,!0),n.setRequestHeader("Content-Type","text/plain"),n.send(JSON.stringify(i)),n.onreadystatechange=function(){4===n.readyState&&e&&e.callback&&e.callback()}}}var e=window.plausible&&window.plausible.q||[];window.plausible=t;for(var i,n=0;n<e.length;n++)t.apply(this,e[n]);function p(){i!==a.pathname&&(i=a.pathname,t("pageview"))}var w,d=window.history;d.pushState&&(w=d.pushState,d.pushState=function(){w.apply(this,arguments),p()},window.addEventListener("popstate",p)),"prerender"===r.visibilityState?r.addEventListener("visibilitychange",function(){i||"visible"!==r.visibilityState||p()}):p()}();
Feels small. Let me try to add some line breaks to make it more readable.
Oh wait! Plausible is open-source software. That means I can try to get the un-minified source of the above code.
Going to the Plausible GitHub repo, pressing “t” and typing “plausible.js” landed me on the file I needed: https://github.com/plausible/analytics/blob/1772ddff17f5c2880400f7f7c42d7c1aa772feef/tracker/src/plausible.js
Let us start reading the code now!
The code
The script starts with a good old immediately invoked anonymous function and ‘use strict’ notation.
(function(){
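A minimal sketch of that pattern, with made-up names that are not from the Plausible source: everything declared inside the function stays out of the global scope, and 'use strict' turns a silent mistake such as assigning to an undeclared variable into an error.

```javascript
// The wrapping function runs immediately; nothing inside leaks out.
const outcome = (function () {
  'use strict';
  var counter = 1; // local to this function only

  try {
    undeclaredName = 2; // strict mode: throws instead of creating a global
  } catch (err) {
    return { counter: counter, errorName: err.name };
  }
  return { counter: counter, errorName: null };
})();

console.log(outcome.errorName); // ReferenceError
console.log(typeof counter);    // undefined
```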
After that some variables and functions are getting defined.
var location = window.location
location and document are okay - probably defined to avoid typing window repeatedly. {{#if compat}} looks like template language notation (the syntax resembles Handlebars). They are using it to get a reference to the script element which is executing the Plausible script. After that, the endpoint variable is set by picking the data-api attribute if it is present on the script tag, or by calling a function called defaultEndpoint.
Since the script tag doesn’t have the data-api attribute, let us look at what the defaultEndpoint function does.
function defaultEndpoint(el) {
At this point, I am guessing compat is for compatibility with old browsers. For now, my browser is comfortable using the else block. So, let us zoom in on that. It returns new URL(el.src).origin + '/api/event'. That means it takes the script element’s src attribute, forms a new URL object from it and gets the origin property.
> new URL("https://plausible.io/js/script.js").origin
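Putting the two pieces together, here is a small sketch of how the endpoint could be resolved. defaultEndpoint follows the else branch described above; resolveEndpoint, fakeScript and the example.com URL are names I made up so the snippet can run outside a browser.

```javascript
// Mirrors the else branch described above: script origin + '/api/event'.
function defaultEndpoint(el) {
  return new URL(el.src).origin + '/api/event';
}

// data-api wins when present; otherwise fall back to the default.
function resolveEndpoint(el) {
  return el.getAttribute('data-api') || defaultEndpoint(el);
}

// Hypothetical helper that fakes a script element outside the browser.
function fakeScript(src, attrs) {
  return {
    src: src,
    getAttribute: function (name) {
      return Object.prototype.hasOwnProperty.call(attrs, name) ? attrs[name] : null;
    },
  };
}

const plain = fakeScript('https://plausible.io/js/script.js', {});
const custom = fakeScript('https://plausible.io/js/script.js', {
  'data-api': 'https://example.com/collect', // made-up URL
});

console.log(resolveEndpoint(plain));  // https://plausible.io/api/event
console.log(resolveEndpoint(custom)); // https://example.com/collect
```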
Moving on. There is this little warn function that is bugging me to paste it here.
function warn(reason) {
I cleaned up all the template blocks to make code folding work for the script :D With that, we are entering the real action.
var queue = (window.plausible && window.plausible.q) || []
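So, we are creating a queue. It picks up window.plausible.q if something has already created it, and falls back to an empty array otherwise. Hopefully we will see where window.plausible.q gets filled further down the line.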
window.plausible = trigger
trigger is a big function and is getting assigned to window.plausible. After that, we call the trigger function for every element in the queue. Initially, the queue will be empty, so I am going to see what is happening when that is the case.
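To see why a queue would exist at all, here is a minimal sketch, assuming the page defines a small stub before the real script loads. The stub, the fakeWindow object and the simplified trigger are illustrative assumptions and not Plausible's actual code; only the last few statements mirror the lines discussed above.

```javascript
// Stand-in for the browser's window object (assumption for this sketch).
const fakeWindow = {};

// Hypothetical stub a page could define early: it only records calls.
fakeWindow.plausible = function () {
  (fakeWindow.plausible.q = fakeWindow.plausible.q || []).push(arguments);
};
fakeWindow.plausible('signup', { props: { plan: 'free' } });

// Simplified trigger: the real one sends a request, this one just records.
const sent = [];
function trigger(eventName, options) {
  sent.push({ name: eventName, options: options || null });
}

// The part from the script: grab the queue, swap in trigger, replay.
var queue = (fakeWindow.plausible && fakeWindow.plausible.q) || [];
fakeWindow.plausible = trigger;
for (var i = 0; i < queue.length; i++) {
  trigger.apply(null, queue[i]);
}

console.log(sent.length);  // 1
console.log(sent[0].name); // signup
```

With no stub on the page, window.plausible is undefined, the queue is an empty array and the loop simply does nothing.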
Now there is a divide happening.
{{#if hash}}
  window.addEventListener('hashchange', page)
{{else}}
  var his = window.history
  if (his.pushState) {
    var originalPushState = his['pushState']
    his.pushState = function() {
      originalPushState.apply(this, arguments)
      page();
    }
    window.addEventListener('popstate', page)
  }
{{/if}}
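At first I read this as a check on whether the URL contains #some-id at the end. But hash sits inside the same template notation as compat, so it looks like a flag that decides which block ends up in the generated script, rather than something that is checked in the browser. The minified script above only contains the pushState code, which suggests it was generated with the else block.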
TIL that there is a DOM event called hashchange.
The hashchange event is fired when the fragment identifier of the URL has changed (the part of the URL beginning with and following the # symbol).
So, when the hash block is included, this makes sure that the page function is executed every time the fragment identifier of the URL changes.
In the other case, we seem to access window.history.
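Here is a small sketch of what the else block appears to do: it wraps history.pushState so that page runs after every client-side navigation. fakeLocation and fakeHistory are hypothetical stand-ins so the snippet runs outside a browser, and the page function is my reading of the minified p() above, not a copy of the original source.

```javascript
// Stand-ins for window.location and window.history (assumptions).
const fakeLocation = { pathname: '/' };
const fakeHistory = {
  pushState: function (state, title, url) {
    fakeLocation.pathname = url; // a real browser updates the URL here
  },
};

const pageviews = [];
let lastPage;

// Same idea as the minified p(): only record when the pathname changed.
function page() {
  if (lastPage === fakeLocation.pathname) return;
  lastPage = fakeLocation.pathname;
  pageviews.push(lastPage);
}

// Wrap pushState: call the original first, then report the navigation.
const originalPushState = fakeHistory.pushState;
fakeHistory.pushState = function () {
  originalPushState.apply(this, arguments);
  page();
};

page();                                    // initial load
fakeHistory.pushState({}, '', '/pricing'); // client-side navigation
fakeHistory.pushState({}, '', '/pricing'); // same path, ignored

console.log(pageviews); // [ '/', '/pricing' ]
```

pushState itself does not fire any event, which would explain why the script has to wrap it, while back and forward navigation can be caught with the popstate listener.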
~ ~ ~ ~
To be continued.