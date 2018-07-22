





Monitor website contents modification with Python

The idea is to write a script that monitors websites for content changes and sends a mail whenever a site's content changes. The small script below shows the beauty of Python programming: it does the whole job and can save the serious money a company spends on monitoring its competitors' websites. There are many ways to achieve this, but the usual ones have issues: the last-modified time is not available on some sites, or the whole page has to be downloaded, stored and compared on every run. This script stores only a hash of each page and compares hashes instead.
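To make the hash comparison concrete, here is a minimal sketch of the idea. The helper names fingerprint and has_changed and the two sample pages are made up for illustration; they are not part of the script further down.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-224 hex digest of a page body."""
    return hashlib.sha224(content).hexdigest()


def has_changed(stored_hash: str, content: bytes) -> bool:
    """True when the freshly downloaded body no longer matches the stored hash."""
    return fingerprint(content) != stored_hash


# Illustrative data only: two versions of a made-up page.
old_page = b"<html><body>Price: 10</body></html>"
new_page = b"<html><body>Price: 12</body></html>"

stored = fingerprint(old_page)
print(has_changed(stored, old_page))  # False
print(has_changed(stored, new_page))  # True
```

Keep in mind that pages which embed timestamps, ads or session tokens will hash differently on every request, so this approach works best on mostly static pages.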

The input file input.csv lists every site to be monitored together with its hash value. When a site is modified, the script simply updates the stored hash, so that the next run compares against the new value. You can add this script to a cron job to run every 30 minutes or as your requirements dictate. I doubt most sites change within 30 minutes, so I think it is better to run it once a day unless it is being used for financial data. You could also create a class and rewrite the code below to make it cleaner and easier to extend. Any suggestions for a new script are gladly received for this Linux blog.
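As a rough illustration of the file format, the sketch below parses a small in-memory sample with one "url,hash" pair per line. The function name read_sites and the placeholder hashes are assumptions for the example, and treating a row without a hash as "not checked yet" is my own choice, not something the script below does.

```python
import csv
import io

# Made-up sample of input.csv: one "url,hash" pair per line.
SAMPLE = (
    "http://yahoo.com,hash1\n"
    "http://linuxcursor.com,hash2\n"
)


def read_sites(handle):
    """Return a list of (url, stored_hash) tuples, skipping blank lines."""
    sites = []
    for row in csv.reader(handle):
        if not row:
            continue
        url = row[0].strip()
        # Assumption: a row without a hash means "not checked yet".
        stored_hash = row[1].strip() if len(row) > 1 else ""
        sites.append((url, stored_hash))
    return sites


sites = read_sites(io.StringIO(SAMPLE))
for url, stored_hash in sites:
    print(url, stored_hash)
```

The same function accepts a real file handle, for example one returned by open("/opt/input.csv").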

The script below is divided into three main sections to monitor website content modification with Python:

Read each site and its stored hash value from input.csv, and write the file back with updated hashes.

Generate a new hash value for each site and compare it with the old hash value.

Send a mail to the concerned person if the site has been modified.
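One way to picture how these three sections fit together is a small class, along the lines of the rewrite suggested earlier. This is only a sketch under assumptions: the class name SiteMonitor and the injected fetch and notify callables are hypothetical, and the demonstration uses made-up pages instead of real HTTP requests or SMTP.

```python
import hashlib


class SiteMonitor:
    """Tracks one hash per site and reports which sites changed."""

    def __init__(self, fetch, notify):
        # fetch(url) -> bytes and notify(url) are injected (assumptions),
        # so the class can be exercised without network or SMTP access.
        self.fetch = fetch
        self.notify = notify

    @staticmethod
    def hash_content(content):
        return hashlib.sha224(content).hexdigest()

    def check(self, sites):
        """Take {url: old_hash}; return {url: current_hash}, notifying on change."""
        updated = {}
        for url, old_hash in sites.items():
            new_hash = self.hash_content(self.fetch(url))
            if new_hash != old_hash:
                self.notify(url)
            updated[url] = new_hash
        return updated


# Offline demonstration with made-up pages instead of real HTTP requests.
pages = {"http://a.example": b"same", "http://b.example": b"new text"}
stored = {
    "http://a.example": SiteMonitor.hash_content(b"same"),
    "http://b.example": SiteMonitor.hash_content(b"old text"),
}
changed = []
monitor = SiteMonitor(fetch=pages.__getitem__, notify=changed.append)
stored = monitor.check(stored)
print(changed)  # ['http://b.example']
```

Passing the download and mail functions in from outside keeps the comparison logic easy to test; in real use they would be replaced by functions that perform the HTTP request and send the mail.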

#!/usr/bin/env python3
import csv
import hashlib
import os
import smtplib

import urllib3


# Procedure to get hash value
def getHash(url):
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    http = urllib3.PoolManager()
    r = http.request('GET', url)
    the_page = r.data
    return hashlib.sha224(the_page).hexdigest()


# Definition to send mail, if content changes.
def send_email(user, pwd, recipient, subject, body):
    FROM = user
    TO = recipient if isinstance(recipient, list) else [recipient]
    SUBJECT = subject
    TEXT = body
    # Prepare actual message
    message = """From: %s\nTo: %s\nSubject: %s\n\n%s\n""" % (FROM, ", ".join(TO), SUBJECT, TEXT)
    try:
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.ehlo()
        server.starttls()
        server.login(user, pwd)
        server.sendmail(FROM, TO, message)
        server.close()
        print("successfully sent the mail")
    # StandardError does not exist in Python 3, so catch Exception instead
    except Exception as e:
        print("failed to send mail", e)


input_file = "/opt/input.csv"
# This is input file
# example
# http://yahoo.com,hash
# http://linuxcursor.com,hash

# Temporary create outfile to store hash values
output_file = "/opt/input.csv_"

# Open readonly input_file
with open(input_file, "r") as infile:
    data = [row for row in csv.reader(infile)]

# Open temporary file in write mode.
with open(output_file, "w") as out:
    # Read line by line site name and its hash values.
    for i in data:
        _site = i[0]
        _existing_hash = i[1]
        print("INFO Site name", _site, "Existing hash ", _existing_hash)
        _new_hash = getHash(_site)
        print("INFO new hash", _new_hash)
        if _existing_hash == _new_hash:
            print("INFO Nothing to do")
            out.write(_site + "," + _existing_hash)
            out.write("\n")
        else:
            print("INFO Update csv file")
            out.write(_site + "," + _new_hash)
            out.write("\n")
            print("INFO Site is updated Send mail")
            send_email("FROM@gmail.com", "PASSWORD", "To_EMAIL_ID",
                       _site + " is updated ",
                       " Site contents have been changed. " + _site)

# Move temporary output file to input.csv for next run
# (os.replace overwrites the old file, so no separate remove is needed)
os.replace(output_file, input_file)