5 Python scripts for automating SEO tasks


Python is a powerful programming language that has gained popularity in the SEO industry over the past few years. 

With its relatively simple syntax, efficient performance and abundance of libraries and frameworks, Python has revolutionized how many SEOs approach their work. 

Python offers a versatile toolset that can help make the optimization process faster, more accurate and more effective. 

This article explores five Python scripts to help boost your SEO efforts.

The easiest way to get started with Python

If you’re looking to dip your toes into Python programming, Google Colab is worth considering. 

It’s a free, web-based platform that provides a convenient playground for writing and running Python code without needing a complex local setup. 

Essentially, it gives you access to Jupyter Notebooks within your browser and comes with a number of pre-installed libraries for data science and machine learning. 

Plus, it’s built on top of Google Drive, so you can easily save and share your work with others.

To get started, follow these steps:

Enable file uploads

Once you open Google Colab, you’ll first need to enable the ability to create a temporary file repository. It’s as simple as clicking the folder icon. 

This lets you upload temporary files and then download any results files.

Upload source data

Many of our Python scripts require a source file to work. To upload a file, simply click the upload button.


Once you finish the setup, you can start testing the following Python scripts.

Script 1: Automate a redirect map

Creating redirect maps for large sites can be incredibly time-consuming. Finding ways to automate the process can help us save time and focus on other tasks.

How this script works

This script analyzes the web content of each page to find closely matching articles. 

  • First, it imports two TXT files of URLs: one for the redirected website (source_urls.txt) and the other for the site absorbing the redirected website (target_urls.txt).
  • Then, it uses the Python library Beautiful Soup to build a web scraper that collects the main body content on each page. Because it only keeps text inside <p> tags, header and footer content is largely ignored.
  • After it has crawled the content on all pages, it uses the Python library PolyFuzz to match content between URLs and assign each pair a similarity score between 0 and 1.
  • Finally, it writes the results, including the similarity score, to a CSV file. 
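To make the matching step less of a black box, here is a simplified, standard-library-only sketch of the idea: turn each page's text into a TF-IDF vector and pair every source page with the target page whose vector has the highest cosine similarity. This is a word-level approximation, not PolyFuzz's implementation (PolyFuzz's TF-IDF model works on character n-grams by default and has its own settings), and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: how many documents contain each term at least once
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF, so a term found in every document still gets a small weight
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(source_texts, target_texts):
    """For each source text, return (index of the closest target text, similarity)."""
    # Compute IDF over both lists so source and target vectors share one vocabulary
    vectors = tfidf_vectors(source_texts + target_texts)
    source_vecs = vectors[:len(source_texts)]
    target_vecs = vectors[len(source_texts):]
    matches = []
    for vec in source_vecs:
        scores = [cosine(vec, target) for target in target_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


sources = ["how to bake sourdough bread at home", "best running shoes for beginners"]
targets = ["beginner guide to running shoes", "sourdough bread baking guide"]
print(best_matches(sources, targets))
```

With these sample lists, the bread page pairs with the sourdough guide (index 1) and the shoe page with the running-shoe guide (index 0), because those pairs share the most distinctive words. Words that appear everywhere get a lower IDF weight, which is why TF-IDF is a reasonable choice for comparing full pages.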

From here, you can manually review any URLs with a low similarity score to find the next closest match.
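If you would rather not scan the whole spreadsheet, a short script can pull out only the rows that need a human look. This sketch assumes redirect_map.csv has a header row with the column names the script uses ("From URL", "To URL", "% Identical"), that the similarity is stored as a score between 0 and 1, and that an empty "To URL" means no match was found; the function name and the 0.5 cut-off are hypothetical choices to tune for your site.

```python
import csv


def rows_for_review(path, threshold=0.5):
    """Return (score, from_url, to_url) for rows with a weak or missing match."""
    flagged = []
    with open(path, newline="") as file:
        for row in csv.DictReader(file):
            score_text = (row.get("% Identical") or "").strip()
            score = float(score_text) if score_text else 0.0
            to_url = (row.get("To URL") or "").strip()
            # Flag rows with no matched target URL or a similarity below the cut-off
            if not to_url or score < threshold:
                flagged.append((score, row["From URL"], to_url))
    # Lowest scores first, since those most likely need a different target
    flagged.sort(key=lambda item: item[0])
    return flagged
```

For example, `for score, source, target in rows_for_review("redirect_map.csv"): print(score, source, target)` lists the weakest matches first.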

Get the script

#import libraries
from bs4 import BeautifulSoup, SoupStrainer
from polyfuzz import PolyFuzz
import concurrent.futures
import csv
import pandas as pd
import requests
#import urls (one URL per line, blank lines skipped)
with open("source_urls.txt", "r") as file:
    url_list_a = [line.strip() for line in file if line.strip()]
with open("target_urls.txt", "r") as file:
    url_list_b = [line.strip() for line in file if line.strip()]
#create a content scraper via bs4 that keeps only the text inside <p> tags
def get_content(url_argument):
    page_source = requests.get(url_argument, timeout=30).text
    strainer = SoupStrainer('p')
    soup = BeautifulSoup(page_source, 'lxml', parse_only=strainer)
    paragraph_list = [element.text for element in soup.find_all(strainer)]
    content = " ".join(paragraph_list)
    return content
#scrape the urls for content
with concurrent.futures.ThreadPoolExecutor() as executor:
    content_list_a = list(executor.map(get_content, url_list_a))
    content_list_b = list(executor.map(get_content, url_list_b))
content_dictionary = dict(zip(url_list_b, content_list_b))
#get content similarities via polyfuzz
model = PolyFuzz("TF-IDF")
model.match(content_list_a, content_list_b)
data = model.get_matches()
#map similarity data back to urls
def get_key(argument):
    for key, value in content_dictionary.items():
        if argument == value:
            return key
    # no target page has this content (e.g. PolyFuzz found no match), so leave it blank
    return None
with concurrent.futures.ThreadPoolExecutor() as executor:
    result = list(executor.map(get_key, data["To"]))
#create a dataframe for the final results
to_zip = list(zip(url_list_a, result, data["Similarity"]))
df = pd.DataFrame(to_zip)
df.columns = ["From URL", "To URL", "% Identical"]
#export to a spreadsheet
with open("redirect_map.csv", "w", newline="") as file:
    columns = ["From URL", "To URL", "% Identical"]
    writer = csv.writer(file)
    writer.writerow(columns)
    for row in to_zip:
        writer.writerow(row)

Script 2: Write meta descriptions in bulk

While meta descriptions are not a direct ranking factor, they help us improve our organic click-through rates. Leaving meta descriptions blank increases the chances that Google will create its own.

If your SEO audit shows a large number of URLs missing a meta description, it may be difficult to make time to write all of those by hand, especially for ecommerce websites. 

This script aims to save you time by automating that process.

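How the script works

  • First, it imports a list of URLs from a TXT file (urls.txt).
  • Then, it parses the content on each URL with the sumy library.
  • Next, it summarizes that content into three sentences with sumy’s LSA summarizer and trims the result to 155 characters.
  • Finally, it exports the results to a CSV file (results.csv).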
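The script shortens anything over 155 characters by cutting at character 152 and appending "...", which can stop in the middle of a word. If you prefer cleaner snippets, a small helper like the hypothetical `shorten_description` below trims back to the last full word instead, keeping the same 155-character limit the script uses.

```python
def shorten_description(text, limit=155, suffix="..."):
    """Trim text to at most `limit` characters, breaking at a word boundary."""
    text = " ".join(text.split())  # collapse runs of spaces and newlines
    if len(text) <= limit:
        return text
    # Leave room for the suffix, then back up to the last space
    cut = text[: limit - len(suffix)]
    if " " in cut:
        cut = cut[: cut.rindex(" ")]
    # Drop trailing punctuation so the result doesn't end with ",..." or "...."
    return cut.rstrip(" ,;:.") + suffix


print(shorten_description("alpha beta gamma", limit=12))  # alpha...
```

In the script, you could call `description = shorten_description(description)` in place of the two lines that check the length and slice the string.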

Get the script

!pip install sumy
from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
from sumy.summarizers.lsa import LsaSummarizer
import csv
# Note: sumy's English tokenizer relies on NLTK tokenizer data; if sumy reports it
# missing, download it with nltk.download() as the error message suggests
#1) imports a list of URLs from a txt file
with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]
results = []
# 2) analyzes the content on each URL
for url in urls:
    parser = HtmlParser.from_url(url, Tokenizer("english"))
    stemmer = Stemmer("english")
    summarizer = LsaSummarizer(stemmer)
    summarizer.stop_words = get_stop_words("english")
    # 3) summarizes the page into 3 sentences and caps the result at 155 characters
    description = summarizer(parser.document, 3)
    description = " ".join([sentence._text for sentence in description])
    if len(description) > 155:
        description = description[:152] + '...'
    results.append({
        'url': url,
        'description': description
    })
# 4) exports the results to a csv file
with open('results.csv', 'w', newline="") as f:
    writer = csv.DictWriter(f, fieldnames=['url','description'])
    writer.writeheader()
    writer.writerows(results)

Script 3: Analyze keywords with N-grams

N-grams (sequences of N consecutive words) are not a new concept but are still useful for SEO. They can help us understand themes across large sets of keyword data.


How this script works

This script reads keywords from a TXT file (keywords.txt) and writes the 10 most common unigrams, bigrams, and trigrams to another TXT file (results.txt). 
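The script below reads the whole file as one stream of words, so a bigram or trigram can join the last word of one keyword to the first word of the next. If your file has one keyword per line (an assumption), this compact sketch counts n-grams within each keyword only, using `collections.Counter`; the function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words, joined with spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_ngrams(keywords, n):
    """Count n-grams inside each keyword, never across two keywords."""
    counts = Counter()
    for keyword in keywords:
        # Same cleanup as the script (strip non-letters), plus lowercasing
        words = [re.sub(r"[^a-z]", "", word) for word in keyword.lower().split()]
        counts.update(ngrams([word for word in words if word], n))
    return counts


keywords = ["running shoes", "best running shoes", "trail running shoes"]
print(count_ngrams(keywords, 2).most_common(1))  # [('running shoes', 3)]
```

Calling `count_ngrams(keywords, n)` with n set to 1, 2 and 3 gives the same three breakdowns as the script, and `most_common(10)` returns the top 10 of each.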

Get this script

#Import necessary libraries
import re
from collections import Counter
#Open the text file and read its contents into a list of words
with open('keywords.txt', 'r') as f:
    words = f.read().split()
#Use a regular expression to remove any non-alphabetic characters from the words
words = [re.sub(r'[^a-zA-Z]', '', word) for word in words]
#Drop any "words" that were made up entirely of non-alphabetic characters
words = [word for word in words if word]
#Initialize empty dictionaries for storing the unigrams, bigrams, and trigrams
unigrams = {}
bigrams = {}
trigrams = {}
#Iterate through the list of words and count the number of occurrences of each unigram, bigram, and trigram
for i in range(len(words)):
    # Unigrams
    if words[i] in unigrams:
        unigrams[words[i]] += 1
    else:
        unigrams[words[i]] = 1
    # Bigrams
    if i < len(words)-1:
        bigram = words[i] + ' ' + words[i+1]
        if bigram in bigrams:
            bigrams[bigram] += 1
        else:
            bigrams[bigram] = 1
    # Trigrams
    if i < len(words)-2:
        trigram = words[i] + ' ' + words[i+1] + ' ' + words[i+2]
        if trigram in trigrams:
            trigrams[trigram] += 1
        else:
            trigrams[trigram] = 1
# Sort the dictionaries by the number of occurrences
sorted_unigrams = sorted(unigrams.items(), key=lambda x: x[1], reverse=True)
sorted_bigrams = sorted(bigrams.items(), key=lambda x: x[1], reverse=True)
sorted_trigrams = sorted(trigrams.items(), key=lambda x: x[1], reverse=True)
# Write the results to a text file
with open('results.txt', 'w') as f:
    f.write("Most common unigrams:n")
    for unigram, count in sorted_unigrams[:10]:
        f.write(unigram + ": " + str(count) + "n")
    f.write("nMost common bigrams:n")
    for bigram, count in sorted_bigrams[:10]:
        f.write(bigram + ": " + str(count) + "n")
    f.write("nMost common trigrams:n")
    for trigram, count in sorted_trigrams[:10]:
        f.write(trigram + ": " + str(count) + "n")

Script 4: Group keywords into topic clusters

With new SEO projects, keyword research usually comes early. Sometimes we deal with thousands of keywords in a dataset, which makes grouping them by hand challenging. 

Python allows us to automatically cluster keywords into similar groups to identify trends and complete our keyword mapping. 

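How this script works

  • First, it imports a list of keywords from a TXT file (keywords.txt), one keyword per line.
  • Then, it converts each keyword into a TF-IDF vector with scikit-learn’s TfidfVectorizer.
  • Next, it groups similar keywords with scikit-learn’s Affinity Propagation algorithm, which decides the number of clusters on its own.
  • Finally, it exports the clusters to a CSV file (clusters.csv), with a blank row after each cluster.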

Get this script

import csv
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
# Read keywords from text file
with open("keywords.txt", "r") as f:
    keywords = f.read().splitlines()
# Create a Tf-idf representation of the keywords
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(keywords)
# Perform Affinity Propagation clustering
af = AffinityPropagation().fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_
# Get the number of clusters found
n_clusters = len(cluster_centers_indices)
# Write the clusters to a csv file
with open("clusters.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Cluster", "Keyword"])
    for i in range(n_clusters):
        cluster_keywords = [keywords[j] for j in range(len(labels)) if labels[j] == i]
        if cluster_keywords:
            for keyword in cluster_keywords:
                writer.writerow([i, keyword])
            writer.writerow([i, ""])
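Because Affinity Propagation picks the number of clusters itself, a quick size check helps you decide which clusters to map first. This sketch reads the clusters.csv file the script writes (a "Cluster,Keyword" header, one keyword per row, and a blank-keyword row closing each cluster) and ranks clusters by size; the function names are hypothetical.

```python
import csv
from collections import defaultdict


def read_clusters(path):
    """Group keywords from clusters.csv by cluster id, skipping separator rows."""
    clusters = defaultdict(list)
    with open(path, newline="") as file:
        for row in csv.DictReader(file):
            keyword = (row.get("Keyword") or "").strip()
            if keyword:  # blank keywords are the separator rows between clusters
                clusters[int(row["Cluster"])].append(keyword)
    return dict(clusters)


def largest_clusters(clusters, top=10):
    """Return (cluster id, size, up to 3 sample keywords) for the biggest clusters."""
    ranked = sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)
    return [(cluster_id, len(words), words[:3]) for cluster_id, words in ranked[:top]]
```

For example, `largest_clusters(read_clusters("clusters.csv"))` returns the 10 largest clusters with a few sample keywords from each, which is often enough to name a topic.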

Script 5: Match keyword list to a list of predefined topics

This is similar to the previous script, except it matches a list of keywords to a predefined set of topics instead of discovering the groups on its own. 

This is great for large sets of keywords because it processes them in batches of 1,000 to prevent system crashes.

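How this script works

  • First, it loads two TXT files, keywords.txt and topics.txt (one entry per line), into pandas DataFrames.
  • Then, it uses spaCy to tokenize each keyword and each topic, dropping stop words and punctuation.
  • Next, it assigns each keyword to the topic that shares the most tokens with it, or to “Other” if no topic shares any.
  • Finally, it processes the keywords in batches of 1,000 and exports the results to a CSV file (results.csv).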
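The matching rule is simple enough to sketch without spaCy. In this standard-library version, a regex tokenizer and a tiny hand-picked stop-word set stand in for spaCy’s tokenizer and `is_stop` check (both are simplifications, so results will differ somewhat), and the function names are hypothetical. As in the script, ties go to the earlier topic and keywords with no shared tokens fall back to “Other”.

```python
import re

# A tiny illustrative stop-word set; spaCy's real list is much longer
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "of", "on", "the", "to", "with"}


def content_tokens(text):
    """Lowercase, split into words, and drop stop words."""
    return {word for word in re.findall(r"[a-z0-9]+", text.lower()) if word not in STOP_WORDS}


def categorize(keyword, topic_tokens):
    """Pick the topic sharing the most tokens with the keyword; ties keep the earlier topic."""
    tokens = content_tokens(keyword)
    best_topic, max_overlap = "Other", 0
    for topic, tokens_for_topic in topic_tokens.items():
        overlap = len(tokens & tokens_for_topic)
        if overlap > max_overlap:
            best_topic, max_overlap = topic, overlap
    return best_topic


def categorize_in_batches(keywords, topics, batch_size=1000):
    """Yield (keyword, topic) pairs, working through the keywords batch_size at a time."""
    # Tokenize each topic once instead of once per keyword
    topic_tokens = {topic: content_tokens(topic) for topic in topics}
    for start in range(0, len(keywords), batch_size):
        for keyword in keywords[start:start + batch_size]:
            yield keyword, categorize(keyword, topic_tokens)


topics = ["running shoes", "yoga equipment"]
keywords = ["best shoes for running", "cheap yoga mat", "gift ideas"]
print(list(categorize_in_batches(keywords, topics, batch_size=2)))
# [('best shoes for running', 'running shoes'), ('cheap yoga mat', 'yoga equipment'), ('gift ideas', 'Other')]
```

Pre-tokenizing the topics matters more than the batch size here: the script’s version re-parses every topic for every keyword, which grows quickly with thousands of keywords.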

Get this script

import pandas as pd
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
# Load the Spacy English language model
nlp = spacy.load("en_core_web_sm")
# Define the batch size for keyword analysis
BATCH_SIZE = 1000
# Load the keywords and topics files as Pandas dataframes
keywords_df = pd.read_csv("keywords.txt", header=None, names=["keyword"])
topics_df = pd.read_csv("topics.txt", header=None, names=["topic"])
# Tokenize each topic once, removing stop words and punctuation
topic_token_sets = {
    topic: {token.text for token in nlp(topic.lower()) if not token.is_stop and not token.is_punct}
    for topic in topics_df["topic"]
}
# Define a function to categorize a keyword based on the closest related topic
def categorize_keyword(keyword):
    # Tokenize the keyword
    tokens = nlp(keyword.lower())
    # Remove stop words and punctuation
    tokens = {token.text for token in tokens if not token.is_stop and not token.is_punct}
    # Find the topic that has the most token overlaps with the keyword
    max_overlap = 0
    best_topic = "Other"
    for topic, topic_tokens in topic_token_sets.items():
        overlap = len(tokens & topic_tokens)
        if overlap > max_overlap:
            max_overlap = overlap
            best_topic = topic
    return best_topic
# Define a function to process a batch of keywords and return the results as a dataframe
def process_keyword_batch(keyword_batch):
    results = []
    for keyword in keyword_batch:
        category = categorize_keyword(keyword)
        results.append({"keyword": keyword, "category": category})
    return pd.DataFrame(results)
# Process the keywords in batches, collecting each batch's results
batch_results = []
for i in range(0, len(keywords_df), BATCH_SIZE):
    keyword_batch = keywords_df.iloc[i:i+BATCH_SIZE]["keyword"].tolist()
    batch_results.append(process_keyword_batch(keyword_batch))
# Combine the batches into a single dataframe (empty if there were no keywords)
if batch_results:
    results_df = pd.concat(batch_results, ignore_index=True)
else:
    results_df = pd.DataFrame(columns=["keyword", "category"])
# Export the results to a CSV file
results_df.to_csv("results.csv", index=False)

Working with Python for SEO

Python is an incredibly powerful and versatile tool for SEO professionals. 

Whether you’re a beginner or a seasoned practitioner, the free scripts I’ve shared in this article offer a great starting point for exploring the possibilities of Python in SEO. 

With its intuitive syntax and vast array of libraries, Python can help you automate tedious tasks, analyze complex data, and gain new insights into your website’s performance. So why not give it a try?

Good luck, and happy coding!

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.
