Google Analytics Code Scraper

For SEO purposes, I knocked together a quick script to scrape clients' websites and check that they had Google Analytics set up.
It searches for the string "UA-", the standard prefix of a Google Analytics tracking ID. Go and try it out! (On sites that you're allowed to scrape, that is! :P)
Yes, I am well aware there are many caveats, such as enterprise tracking setups or non-Google analytics, but for a low-tier website hosting company this was plenty.

#!/bin/bash
# Usage: ./analytics-scrape.sh www.slowb.ro analytics-output.txt
# Real-world usage:
# for item in $(cat domains-that-we-scrape.txt); do echo "$item"; ./analytics-scrape.sh "www.$item" analytics-output.txt; done
# I suggest testing each domain without www. first. If the bare domain throws an error, your SEO practices are already off to a bad start.

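site=$1
output=$2
# Tracking IDs have the form UA-<digits>-<digits>; the digits can include 0.
regex='UA-[0-9]+-[0-9]+'
# Fetch the page quietly to stdout with a browser-like user agent, a 30 s timeout, and up to 3 tries.
atauaid=$(wget "$site" -qO- -U 'Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2' -T 30 --tries=3)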
if [ -n "$atauaid" ]; then
    # Take the first line mentioning UA- (skipping verification and X-UA-Compatible tags)
    # and split it on single quotes to isolate the ID.
    uaid=$(echo -n "$atauaid" | grep UA- | grep -v verification | grep -v X-UA | head -n 1 | awk -F"'" '{ for (i=1;i<=NF;i++) if ($i ~ /UA-/) print $i }')
    # If the result does not start with UA-, the ID is probably double-quoted, so split again.
    if [[ "$uaid" != UA-* ]]; then
        uaid=$(echo -n "$uaid" | awk -F'"' '{ for (i=1;i<=NF;i++) if ($i ~ /UA-/) print $i }')
    fi
    if [[ "$uaid" =~ $regex ]]; then
        echo "$site , $uaid ,  match ," >> "$output"
    else
        # Fallback: repeat the extraction over every matching line, not just the first.
        _uaid=$(echo -n "$atauaid" | grep UA- | grep -v verification | grep -v X-UA | awk -F"'" '{ for (i=1;i<=NF;i++) if ($i ~ /UA-/) print $i }')
        if [[ "$_uaid" != UA-* ]]; then
            _uaid=$(echo -n "$_uaid" | awk -F'"' '{ for (i=1;i<=NF;i++) if ($i ~ /UA-/) print $i }')
        fi
        if [[ -z "$_uaid" ]]; then
            echo "$site , $_uaid , missing" >> "$output"
        else
            echo "$site , $_uaid , backupmatch" >> "$output"
        fi
    fi
else
    # wget returned nothing: the site is down, timed out, or refused the request.
    echo "Scraping $site FAILED" >> "$output"
fi
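
If you only need the ID itself, the quote-splitting in the script can be approximated with a single grep. This is a minimal sketch rather than what the script does: extract_ua_id is a hypothetical helper name, the sample markup and its UA-12345-1 ID are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# reads HTML on stdin and prints the first UA-<digits>-<digits> ID it finds.
extract_ua_id() {
    # -o prints only the matching text, -E enables extended regex syntax.
    grep -oE 'UA-[0-9]+-[0-9]+' | head -n 1
}

# Made-up sample markup, used here instead of a live wget call.
printf '%s\n' "<script>ga('create', 'UA-12345-1', 'auto');</script>" | extract_ua_id
# prints UA-12345-1
```

To try it against a real page, pipe the wget output into it, for example wget "$site" -qO- | extract_ua_id. Because the pattern requires digits after "UA-", tags such as X-UA-Compatible are skipped without the extra grep -v filters.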

Tim Coombs

Administrator of Slowb.ro and world leader of my own mind, the only place our ideas and thoughts are our own in a world gone mad

In a terminal https://slowb.ro
