Wednesday, May 6, 2015

Three effective solutions for Google Analytics Referral spam

http://www.blackmoreops.com/2015/05/06/effective-solutions-for-google-analytics-referral-spam

I published the post “darodar.com referrer spam and should you be worried?” back in December, and I am still seeing a constant influx of frustrated website owners and concerned netizens worried about similar spam. I happened to be one of the first to detect this spam and post about it. I didn’t pay much attention to it, as referral spam and web analytics are not my primary concern when it comes to computing. Having worked in IT for over a decade, specifically in IT security, I have a different view on spam and how it can be stopped.

I opened my Analytics account yesterday because I saw a 25% traffic increase from Facebook, Twitter and many random sources, and an 83% increase on the root (“/”) of the server. Well, 25% is nothing; it can happen when a post goes viral. But that wasn’t the case this time, as the 83% increase was specific to the root (“/”) of the server.

It seems our ‘beloved’ ‘Vitaly Popov’ has started a new stream of referral spam. He’s become craftier, as I predicted in my original post. He’s now actually using Facebook and Twitter as referrals, along with some new domains. In this post I will show three effective solutions for Google Analytics Referral spam.

Some facts about Google Analytics Referral spam:

  1. By now you know that Ghost Google Analytics Referral spam cannot be blocked by .htaccess or web server configuration.
  2. Ghost Google Analytics Referral spam bots don’t actually visit your website, so no trace of their IP addresses can be found in your server logs.
  3. Ghost Google Analytics Referral spam only abuses Google Analytics.
  4. Google Analytics hasn’t done anything about it yet (officially).
  5. Google implemented encryption for all of their AdSense traffic.
  6. Ghost Google Analytics Referral spam only affects Google Analytics.
  7. *** Ghost Referral spam is also affecting Yandex and a few other search engines.
  8. As these bots don’t visit your website, they have no idea what your page title is. So Analytics will show the root (“/”) as the page, with no real page title.
  9. These Ghost Google Analytics Referral spam bots only target your primary Tracking ID, i.e. ‘UA-XXXX-1’.

List of known Google Analytics Referral spam domains

Click to open list containing known Google Analytics Referral spam domains:

List of 194 new Google Analytics Referral spam domains

I now have a list of another 194 spammer domains that started yesterday.

Click to open list of 194 new Google Analytics Referral spam domains

I mean, seriously? users.skynet.be? It’s good to see they have some sense of humour.
So it seems that very soon filters won’t be enough. Actually, they are already not enough. Despite what the Analytics experts say, you can’t go around every day filtering hundreds of domains. Yes, you could filter out all .be (i.e. Belgium) domains, but that’s a whole country we are talking about. So what is the best fix?

Solution 1: Create a new Tracking ID for your website


The simplest solution is often a good place to start
– William of Ockham’s Occam’s Razor
When I started looking around for a good solution, I was surprised by the amount of information that had become available since my last post about Referral spam in December. Some of it was well written, some was just rubbish.
Some spammers, like Semalt, actually visit your website, so you can block them using the usual .htaccess or web server configuration. They are an easy fix:
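# Flag any request whose Referer header contains semalt.com (case-insensitive)
SetEnvIfNoCase Referer semalt\.com spambot=yes
# Apache 2.2-style access control: allow everyone except flagged requests
Order allow,deny
Allow from all
Deny from env=spambot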
But Ghost referral spam is a Google Analytics problem. So I found a solution that uses Google Analytics itself, rather than wasting time on adding filters.

Using Google Analytics to solve its own problem:

Google Analytics is very limited, but its help documentation is very clear on how to use the Analytics code. According to Advanced Configuration – Web Tracking (analytics.js), you can use multiple trackers on the same website (old news!). But here’s the loophole in their coding that I found:
All the spammy bots are using only the first Tracking ID, i.e. 'UA-XXXX-1'. So subsequent properties under your Analytics account, i.e. 'UA-XXXX-2', 'UA-XXXX-3' and so on, are unaffected.
I just created another property in my Analytics account, configured it the same as my primary one and added it to my website.

Instruction on how to setup a property in Google Analytics

In general, you pretty much just copy and enable whatever configuration you had in your primary one. Creating a second property for the same website/URL doesn’t hurt or affect anything. It’s just another container where data is stored.

My sample original Google Analytics tracking ID

My new sample Google Analytics tracking ID

Create new combined Google Analytics Tracking ID

Google Analytics Advanced Configuration, under Working with Multiple Tracking Objects, shows how to create a new combined Google Analytics tracking code and put it on your website.
In some cases you might want to send data to multiple web properties from a single page. This is useful for sites that have multiple owners overseeing sections of a site; each owner could view their own web property.
To solve this, you must create a tracking object for each web property to which you want to send data:
ga('create', 'UA-XXXX-Y', 'auto');
ga('create', 'UA-12345-6', 'auto', {'name': 'newTracker'});  // New tracker.
Once run, two tracker objects will be created. The first tracker will be the default tracking object, and not have a name. The second tracker will have the name of newTracker.
To send a pageview using both trackers, you prepend the name of the tracker to the beginning of the command, followed by a dot. So for example:
ga('send', 'pageview');
ga('newTracker.send', 'pageview'); // Send page view for new tracker.
Would send a pageview to both default and new trackers.
This explanation might be slightly convoluted for many users. Here’s mine:

My sample combined new Google Analytics Tracking ID
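
As a rough sketch of what such a combined snippet could look like (not my exact code): the two IDs are placeholders, the tracker name `cleanTracker` is made up for illustration, and the small `ga` stub only records commands so the example runs on its own. On a real page the Google-provided analytics.js loader defines `ga` for you.

```javascript
// Command-queue stub in the style of the standard analytics.js snippet.
// On a real page the Google loader defines ga(); this stub only records
// the commands so the sketch can run outside a browser.
var ga = function () {
  (ga.q = ga.q || []).push(Array.prototype.slice.call(arguments));
};

// Placeholder IDs (assumptions): the original, spammed property and the
// new property created for the same website.
var ORIGINAL_ID = 'UA-XXXX-1';
var NEW_ID = 'UA-XXXX-2';

// Default (unnamed) tracker for the original property.
ga('create', ORIGINAL_ID, 'auto');
// Named tracker for the new property; 'cleanTracker' is a hypothetical name.
ga('create', NEW_ID, 'auto', {'name': 'cleanTracker'});

// Send the pageview to both properties.
ga('send', 'pageview');
ga('cleanTracker.send', 'pageview');
```

With both trackers in place, the old property keeps collecting (spam included) while the new one should only receive hits from real page loads, at least for as long as the spammers keep targeting the first ID.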

I’ve also forced SSL on my Google Analytics tracking code. This won’t do any good against this particular spam, but having some encryption is always good in the long run.

Click here to open Google's Instruction on Forcing SSL (HTTPS) on GA
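
For reference, forcing SSL in analytics.js comes down to setting the `forceSSL` field on the tracker. The sketch below assumes a placeholder property ID and again uses a stub `ga` that only records commands so it can run outside a browser.

```javascript
// Stub that records commands; the real ga() comes from the analytics.js loader.
var ga = function () {
  (ga.q = ga.q || []).push(Array.prototype.slice.call(arguments));
};

// 'UA-XXXX-2' is a placeholder property ID.
ga('create', 'UA-XXXX-2', 'auto');
// Ask analytics.js to send every beacon over HTTPS, even from http pages.
ga('set', 'forceSSL', true);
ga('send', 'pageview');
```

If you use a named tracker as well, the same field would need to be set on it separately by prefixing the command with the tracker name.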

This fixed everything for me. This is the best solution out there, and it will continue to work until the spammers change their code to include subsequent GA Tracking IDs.

Solution 2: Create a filter for NULL Page Title

If you’re lazy and don’t want to create a new Analytics code, then Solution 2 is the next best option. Actually, I think this might be even better, as it will also get rid of any similar referral spam in the future.
If you look closely at your Google Analytics reports, you will see that all of this Ghost Google Analytics Referral spam shows its Page Title as (not set).
Actually, it is not really (not set); it’s a NULL value. These fake or Ghost Google Analytics Referral spam bots are sending fake data using your tracking ID. But how are they going to set your Page Title?
To get the Page Title, a bot actually has to visit your website. Without visiting, it is very tough to include the correct value (they could always send bogus data instead). So they’ve left that field empty or NULL, and when Google Analytics receives this fake data, it records the Page Title as NULL and reports it as (not set).
To create a filter for your view, select Admin > Account > Property > View > Filters.
Fill in the filter with the following information:
  1. Filter Name: Page Title (not set)
  2. Filter Type: Select Custom
  3. Select Exclude
  4. Filter Field: Select Page Title
  5. Filter Pattern: Put ^$ in this field. ^$ means empty or missing or NULL value.
  6. Filter Verification: Click “Verify this filter”.
    • It will show you how your filter would affect the current view’s data, based on traffic from the previous seven days.
    • Note: Verify will only work on an existing view that has at least 7 days’ worth of data.
    • Verify will not work on a new Tracking ID created in Solution 1, because it doesn’t have enough data yet.
  7. Click Save
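You will see new Ghost Google Analytics referral spam stop appearing in your reports within a few minutes, and within 4 hours your newly collected data should be all clear. Filters only apply to hits processed after they are created, so spam that was already recorded stays in your historical data.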
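
To make the logic of this filter concrete, here is a small sketch that imitates it outside Google Analytics. The hit objects and their field names are made up for illustration, and treating a missing title as an empty string is my assumption about how the ^$ pattern ends up matching NULL values.

```javascript
// Imitates an Exclude filter on Page Title with the pattern ^$.
const EMPTY_TITLE = /^$/;

// A hit counts as ghost spam when its page title is missing or empty.
// Assumption: a missing (null/undefined) title is treated as ''.
function isGhostHit(hit) {
  const title = hit.pageTitle == null ? '' : String(hit.pageTitle);
  return EMPTY_TITLE.test(title);
}

// Keep only the hits that the Exclude filter would let through.
function excludeGhostHits(hits) {
  return hits.filter(function (hit) { return !isGhostHit(hit); });
}

// Made-up sample hits, for illustration only.
const sampleHits = [
  { referrer: 'spam.example', page: '/', pageTitle: undefined },
  { referrer: 'twitter.com', page: '/2015/05/06/some-post', pageTitle: 'Some post' },
  { referrer: 'spam.example', page: '/', pageTitle: '' }
];

console.log(excludeGhostHits(sampleHits).length); // 1
```

Only the hit that carries a real page title survives, which is the whole idea: a bot that never loaded the page has no title to send.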

Solution 3: Create a filter for valid Hostnames

To implement this solution, STEP CAREFULLY or you will exclude valid traffic! You MUST identify ALL valid hostnames that may use your website’s tracking ID. This could include other websites that you track as part of your web ecosystem: your own domain, PayPal, your ecommerce shopping cart, and any reserved domains (in case you decide to use them).
Start with a multi-year report showing just hostnames (Audience > Technology > Network > hostname), then identify the valid ones, i.e. the servers where you have real pages being tracked.
Then create a filter with an expression that captures all of the domains that you consider valid. The backslashes make each dot match only a literal dot. For example:
www\.blackmoreops\.com
OR
.*blackmoreops\.com|.*youtube\.com|.*amazon\.com|.*googleusercontent\.com
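
Before saving a filter like this, it can help to test the expression against a few hostnames. The sketch below does that in plain JavaScript; the helper name is hypothetical, and the pattern is deliberately stricter than a loose `.*domain.com` expression (dots escaped, match anchored to the end of the hostname), which is my choice rather than anything Google Analytics requires.

```javascript
// Hypothetical allowlist built from the domains in the example pattern above.
// Dots are escaped and the match is anchored so that only these domains and
// their subdomains pass.
const VALID_HOSTNAMES = /(^|\.)(blackmoreops\.com|youtube\.com|amazon\.com|googleusercontent\.com)$/;

// Returns true when the hostname is one of the valid domains or a subdomain
// of one. Missing hostnames count as invalid.
function isValidHostname(hostname) {
  return VALID_HOSTNAMES.test(String(hostname || '').toLowerCase());
}

console.log(isValidHostname('www.blackmoreops.com')); // true
console.log(isValidHostname('darodar.com'));          // false
console.log(isValidHostname('(not set)'));            // false
```

Running your real hostname list through a check like this first makes it much less likely that you exclude valid traffic by accident.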
This can be used as a supplement to Solution 2. It is only a supplement mainly because you never know where all of your traffic comes from, and it’s a lot of work to keep this filter updated. Also, as time goes on, your filter will grow and the chance of making a mistake will increase. But it’s a good solution nevertheless.
Read more details on hostname filter here.

Conclusion

This is entirely Google’s problem, and it is entirely theirs to resolve. I wouldn’t waste a single moment creating filters for Ghost Google Analytics referral spammers. If you want, you can block the spam bots that actually visit your website using .htaccess or your web server configuration.
The above solution works 100% right now, but it’s very easy for the spammers to modify their code to include subsequent Google Analytics Tracking IDs. If that happens, keep an eye on this page; I will come back with another solution. Share and Retweet this guide for those stressed webmasters.

1 comment:

  1. They are spamming different parts of Google Analytics using their Measurement Protocol, which allows servers to communicate directly with Analytics. So far we’ve seen referral spam, pages as you mentioned, search terms under organic traffic, and even event tracking spam.


    Steven@Analyticsspamsolution
