Scraping Websites for Likes, Follows, and Subscribers

I'm working on some code to scrape multiple websites for Likes, Follows, and Subscribers without having to open multiple windows. I've decided to use Python (this is Python 2), and this is what I have so far. It's currently sketchy; I'll clean it up later, I just needed it to do the job. Feel free to use it, but please email me with whatever improvements you make! For some reason it doesn't work with all Facebook pages: for some of them, the request returns the HTML of Facebook's front page instead of the page I asked for. More updates on this later; feel free to email me with questions. I haven't had time to figure out how to post the code so that the indentation reliably survives, so check it before you run anything, sorry. I'll fix it later.
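One way to investigate the Facebook problem is to check where a request actually ended up. The sketch below is written for Python 3 rather than the Python 2 used in the scripts in this post, and `landed_elsewhere` and `fetch` are hypothetical helper names. It assumes, without this having been confirmed, that the front page appears because the request was redirected; if the cause is something else, the check will simply report nothing unusual.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def landed_elsewhere(requested_url, final_url):
    """Return True when the final URL's host or path differs from the request."""
    want, got = urlsplit(requested_url), urlsplit(final_url)
    # Compare host and path only; query strings and trailing slashes are ignored.
    return (want.netloc.lower(), want.path.rstrip("/")) != (
        got.netloc.lower(),
        got.path.rstrip("/"),
    )


def fetch(url):
    """Download a page and return (html, redirected)."""
    response = urlopen(url)
    html = response.read().decode("utf-8", errors="replace")
    # geturl() reports the URL after any redirects were followed.
    return html, landed_elsewhere(url, response.geturl())
```

A caller could skip the regex step, or log the URL, whenever `redirected` comes back true.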

# getSubscribers.py

import urllib
import re

def youtube():
    L = ["http://www.youtube.com/user/geekandsundry/about",
         "http://www.youtube.com/user/geekandsundryvlogs/about",
         "http://www.youtube.com/user/caitlindotcodotza/about",
         "http://www.youtube.com/user/lemonkissu/about",
         "http://www.youtube.com/user/allhailskippy/about",
         "http://www.youtube.com/user/2brokegeeks/about",
         "http://www.youtube.com/user/cheesypiratevideos/about",
         "http://www.youtube.com/user/nikaharper/about",
         "http://www.youtube.com/user/geekingoutneil/about",
         "http://www.youtube.com/user/hystericbynature/about",
         "http://www.youtube.com/user/beccacanote/about",
         "http://www.youtube.com/user/KiriCallaghan/about",
         "http://www.youtube.com/user/atomichole/about",
         "http://www.youtube.com/user/thatterigirl/about",
         "http://www.youtube.com/user/beautyarmory/about",
         "http://www.youtube.com/user/amydallen/about",
         "http://www.youtube.com/user/MonarchsFactory/about",
         "http://www.youtube.com/user/sachieTV/about",
         "https://www.youtube.com/user/Scottplaysbadgames/about",
         "http://www.youtube.com/user/MitchHutts/about",
         "http://www.youtube.com/user/TheDrunkenMoogle/about",
         "http://www.youtube.com/user/jamesbrentisaacs/about",
         "http://www.youtube.com/user/telltaleheartxo/about",
         "http://www.youtube.com/user/feliciaday/about"]
    for url in L:
        URL = url

        response = urllib.urlopen(URL)
        html = response.read()

        #print html
        m = re.search('<span class="about-stat-value">.*?</span> subscribers', html, flags=re.DOTALL)
        if m is None:
            # Pattern not found: print a placeholder instead of crashing.
            print 'x'
            continue
        # Channel name taken from the URL (not printed yet).
        URL = URL[28:]
        URL = URL[:-6]
        # Cut off the opening <span ...> tag and the trailing '</span> subscribers'.
        result = m.group(0)
        result = result[31:]
        result = result[:-19]
        print result

# Run the scraper when this file is executed as a script.
if __name__ == '__main__':
    youtube()
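The slicing by fixed offsets (`[31:]`, `[:-19]`) can also be expressed with a regex capture group, which keeps working if the surrounding tag text changes length. This is a minimal Python 3 sketch, not the script's own method; `SUBSCRIBERS`, `subscriber_count`, and the sample HTML string are made up for illustration.

```python
import re

# Hypothetical snippet shaped like the markup the pattern expects.
SAMPLE_HTML = '<span class="about-stat-value">1,234</span> subscribers'

# The parentheses capture only the text between the tags.
SUBSCRIBERS = re.compile(
    r'<span class="about-stat-value">(.*?)</span> subscribers', re.DOTALL
)


def subscriber_count(html):
    """Return the subscriber text, or None when the pattern is absent."""
    match = SUBSCRIBERS.search(html)
    return match.group(1) if match else None
```

Returning `None` instead of raising lets the caller decide what to print for pages where the pattern is missing.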

# getFollows.py

import urllib
import re

def twitter():
    L = ['https://twitter.com/GeekandSundry',
         'https://twitter.com/enthusiamy',
         'https://twitter.com/nikaharper',
         'https://twitter.com/dailydael',
         'https://twitter.com/neil_mcneil',
         'https://twitter.com/beccacanote',
         'https://twitter.com/sachiecos',
         'https://twitter.com/electricandlive',
         'https://twitter.com/thatterigirl',
         'https://twitter.com/el_pinata',
         'https://twitter.com/kiricallaghan',
         'https://twitter.com/radioryebread',
         'https://twitter.com/allhailskippy',
         'https://twitter.com/cristinaviseu',
         'https://twitter.com/mitchhutts',
         'https://twitter.com/beautyarmory',
         'https://twitter.com/thumbwartitan',
         'https://twitter.com/2brokegeeks',
         'https://twitter.com/hollandfarkas',
         'https://twitter.com/jeffylew',
         'https://twitter.com/feliciaday']
    for url in L:
        URL = url

        response = urllib.urlopen(URL)
        html = response.read()

        #print html
        # \s* accepts any whitespace between the tags; the group captures the count.
        m = re.search(r"followers'>\s*<strong>(.*?)</strong>", html, flags=re.DOTALL)
        if m is None:
            # Pattern not found: print a placeholder instead of crashing.
            print 'x'
            continue
        # Account name taken from the URL (not printed yet).
        URL = URL[20:]
        result = m.group(1)
        print result

# Run the scraper when this file is executed as a script.
if __name__ == '__main__':
    twitter()
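Fixed offsets such as `URL[20:]` or `URL[28:]` only work when every URL has exactly the same prefix. The one `https://` address in the YouTube list, for instance, is one character longer than the rest, so its name would come out clipped. A possible alternative, sketched here in Python 3 with a hypothetical helper called `account_name`, is to parse the URL and take the last path segment. It is meant for the Twitter and YouTube URL shapes used in this post.

```python
from urllib.parse import urlsplit


def account_name(url):
    """Return the account part of a profile URL, ignoring scheme and '/about'."""
    # Split the path into its non-empty segments.
    parts = [p for p in urlsplit(url).path.split("/") if p]
    # YouTube URLs in this post end in '/about'; drop that segment.
    if parts and parts[-1] == "about":
        parts.pop()
    return parts[-1] if parts else ""
```

Because the scheme and host are parsed rather than counted, `http://` and `https://` addresses give the same result.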

# getLikes.py

import urllib
import re

def facebook():
    L = ["https://www.facebook.com/pages/Geek-Sundry/473663412653314?fref=ts&rf=515429875150388",
         "https://www.facebook.com/pages/Jeff-Lewis/139836749369525?nr",
         "https://www.facebook.com/ThisIsNikaHarper",
         "https://www.facebook.com/amydallen",
         "https://www.facebook.com/sachiecosplay",
         "https://www.facebook.com/neilmcneilmcneil",
         "https://www.facebook.com/BCanote",
         "https://www.facebook.com/monarchsfactory",
         "https://www.facebook.com/enterthecastle",
         "https://www.facebook.com/warmgamerteri",
         "https://www.facebook.com/Scottplaysbadgames",
         "https://www.facebook.com/KiriCallaghanWrites",
         "https://www.facebook.com/RyeBreadRadio",
         "https://www.facebook.com/thatsps",
         "https://www.facebook.com/hystericbynature",
         "https://www.facebook.com/iammitchhutts",
         "https://www.facebook.com/pages/Beauty-Armory/131830093618548",
         "https://www.facebook.com/2BrokeGeeks",
         "https://www.facebook.com/hollandjeanfarkas",
         "https://www.facebook.com/ThisIsNikasaur",
         "https://www.facebook.com/enthusiamy",
         "https://www.facebook.com/FeliciaDay",
         "https://www.facebook.com/pages/Felicia-Day/108279339196066"]
    for url in L:
        URL = url

        response = urllib.urlopen(URL)
        html = response.read()
        # First layout: "... people like this topic".
        m = re.search('rel="dialog" role="button">.*?people like this topic', html, flags=re.DOTALL)

        if m is not None:
            result = m.group(0)
            result = result[27:]
            result = result[:-23]
            URL = URL[31:]
            print result

        else:
            # Second layout: "... likes".
            m = re.search('<div class="fsm fwn fcg"><div class="fsm fwn fcg">.*?likes', html, flags=re.DOTALL)
            if m is not None:
                result = m.group(0)
                result = result[50:]
                result = result[:-6]
                URL = URL[25:]
                print result
            else:
                # Neither layout matched.
                print 'x'

# Run the scraper when this file is executed as a script.
if __name__ == '__main__':
    facebook()
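The nested if/else above tries one page layout, then another, then gives up with 'x'. The same idea can be sketched as a list of patterns tried in order, which makes it easier to add a third layout later. This is Python 3 and only an illustration: `LIKE_PATTERNS` and `like_count` are hypothetical names, and the patterns assume a single space before "people" and "likes", as the original slicing does.

```python
import re

# The two patterns from facebook(), each with a capture group for the count.
LIKE_PATTERNS = [
    re.compile(r'rel="dialog" role="button">(.*?) people like this topic', re.DOTALL),
    re.compile(r'<div class="fsm fwn fcg"><div class="fsm fwn fcg">(.*?) likes', re.DOTALL),
]


def like_count(html, patterns=LIKE_PATTERNS):
    """Return the first captured count, or 'x' when no pattern matches."""
    for pattern in patterns:
        match = pattern.search(html)
        if match:
            return match.group(1)
    return "x"
```

Supporting another layout would then mean appending one more compiled pattern to the list rather than adding another level of nesting.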

About A Lewis

Please check out @alicenlewis

Posted on September 23, 2013, in Geek, Geek and Sundry, US.