One of the most common ways analytics data becomes skewed is ghost spam. These are fake visits to your site that exist only to entice you to go to the spammer’s URL. To see ghost spam, open the Network report under Audience > Technology and choose Hostname as the primary dimension. Anything listed that is not a valid hostname for your site is ghost spam. You can remove this traffic by writing a regex that matches all of your valid hostnames and then creating a custom view filter that includes only hostnames matching it. Because view filters only apply to data collected after they are created, you can remove this spam from historical data by creating a custom segment that uses the same regex.
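As a rough sketch of what the hostname regex could look like, the Python snippet below builds an anchored pattern from a list of hostnames and checks a few sample values against it. The hostnames and the function names `build_hostname_pattern` and `is_valid_hostname` are made up for illustration; substitute the hostnames your site actually uses, and test the resulting pattern in Google Analytics before relying on it.

```python
import re

# Hypothetical hostnames for illustration; replace with your own.
VALID_HOSTNAMES = ["example.com", "www.example.com", "shop.example.com"]


def build_hostname_pattern(hostnames):
    """Build an anchored regex that matches only the listed hostnames."""
    # Escape the dots so "." matches a literal dot, not any character.
    escaped = [h.replace(".", r"\.") for h in hostnames]
    # ^ and $ stop a spam hostname from matching just because it
    # contains a valid hostname somewhere inside it.
    return "^(" + "|".join(escaped) + ")$"


def is_valid_hostname(hostname, pattern):
    """Return True if the hostname would pass the include filter."""
    return re.match(pattern, hostname.lower()) is not None


pattern = build_hostname_pattern(VALID_HOSTNAMES)
print(pattern)

for host in ["www.example.com", "spam-site.example.net", "(not set)"]:
    print(host, is_valid_hostname(host, pattern))
```

The printed pattern is the string you would paste into the include filter. Anchoring and escaping are a design choice here: an unanchored, unescaped pattern is shorter but can let through spam hostnames that merely resemble yours.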
Along with ghost spam, most sites will have issues with crawler spam, also known as referral spam. This happens when crawler programs, or bots, show up as referral traffic in your data. While most bots are trying to increase the search engine ranking of the website they are promoting, not all referral spam is malicious. For example, “googlebot” may show up when Google crawls and indexes the pages on your site. Even though this isn’t a bad thing, it still isn’t relevant data for your analytics.
Because crawler spam uses a valid hostname, it is harder to detect and requires a different filter from the one used for ghost spam. To remove it, you can create a filter that excludes Campaign Sources matching the crawler spam names. You can find these names by looking at the Referrals report under Acquisition. It can be fairly time-consuming to create a regex of all of these names; however, you can typically search online for an updated list of the most common crawler spam. Like ghost spam, crawler spam can also be excluded from historical data with a segment.
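The exclude filter can be sketched the same way. The Python snippet below joins a list of spam referrer names into one pattern and shows which sample sessions it would drop. The referrer names, the session data, and the function names are all hypothetical; build your real list from the Referrals report or a maintained public list.

```python
import re

# Hypothetical spam referrer names for illustration only.
SPAM_SOURCES = ["spam-referrer.example", "free-seo-buttons.example"]


def build_exclude_pattern(sources):
    """Join spam source names into a single alternation pattern."""
    # Escape the dots so they match literally.
    escaped = [s.replace(".", r"\.") for s in sources]
    return "|".join(escaped)


def drop_spam_sessions(sessions, pattern):
    """Keep only sessions whose source does not match the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    # search() matches anywhere in the source, so subdomains of a
    # listed spam domain are excluded as well.
    return [s for s in sessions if not rx.search(s["source"])]


# Made-up sample data standing in for rows of the Referrals report.
sessions = [
    {"source": "google", "bounced": False},
    {"source": "spam-referrer.example", "bounced": True},
    {"source": "blog.free-seo-buttons.example", "bounced": True},
    {"source": "partner-site.example", "bounced": False},
]

pattern = build_exclude_pattern(SPAM_SOURCES)
clean = drop_spam_sessions(sessions, pattern)
print(pattern)
print([s["source"] for s in clean])
```

Unlike the hostname pattern, this one is deliberately left unanchored so that one entry covers a spam domain and its subdomains. The trade-off is that a very short entry could match legitimate sources, so keep the names specific.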
Google Analytics also provides a quick and simple way to reduce traffic from bots under View Settings. Towards the bottom of the screen there is a check box under Bot Filtering that states, “Exclude all hits from known bots and spiders”. Check this box and click Save.
Applying all of these filters will likely reduce your total number of sessions and users, but you’ll probably also see improvements in bounce rate and session duration, since a lot of spam traffic has a 100% bounce rate and a 0:00 session duration. Even if filtering out spam has a less-than-desirable effect on your numbers, it’s still important to remove it. Google Analytics can only help you make the best decisions for your website if the information is accurate and relevant.
If you need help with cleaning up your analytics, or any part of your SEO, fill out our Get Started form today!