Hook
I shared this on the old forum, so I thought I may as well post it on this one too.
Replace "file-here.txt" with the path to your .txt droplist file. The script then creates a file called domains.csv with 3 columns. Use the filters in Excel, or sort column 3 A-Z, and you'll get all the .co.uk domains together. It has a bug with .uk names which I have yet to fix, as it causes no problems for my use.
You need Python with nltk installed; csv and re ship with Python. The script calls nltk.download('words') each time you run it.
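
For anyone who does care about the .uk bug: it appears to come from the script taking the last two dot-separated parts as the suffix, so example.uk ends up with "example.uk" in column 3 instead of "uk". Below is a rough sketch of one way around it, not a tested fix for the script. The names `get_suffix` and `UK_SECOND_LEVEL` are made up for illustration, and the suffix list is an assumption that is far from complete.

```python
# Hypothetical helper, not part of the original script.
# Assumption: this short list of UK second-level suffixes covers the droplist;
# it is not exhaustive and can be extended as needed.
UK_SECOND_LEVEL = ("co.uk", "org.uk", "me.uk", "ltd.uk", "plc.uk", "net.uk")


def get_suffix(domain):
    """Return 'co.uk' for example.co.uk and 'uk' for example.uk."""
    labels = domain.lower().strip().split(".")
    last_two = ".".join(labels[-2:])
    # Only treat the last two labels as the suffix when they form a known
    # second-level suffix and there is still a name in front of them.
    if len(labels) >= 3 and last_two in UK_SECOND_LEVEL:
        return last_two
    return labels[-1]


for name in ["example.co.uk", "example.uk", "example.org.uk"]:
    print(name, "->", get_suffix(name))
```

If this suits your list, calling `get_suffix(domain)` in place of the two-part join on the suffix line should keep .uk and .co.uk names in separate groups.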
Python:
import re
import csv

import nltk
from nltk.corpus import words

# Ensure you have the words corpus
nltk.download('words')

# Set of valid English words (a set makes lookups fast)
english_words = set(words.words())


# Function to split domain into words
def split_into_words(domain):
    # Return the runs of letters in the name. Names with digits or hyphens
    # are skipped before this is called, so this is usually the whole name.
    return re.findall(r'[a-zA-Z]+', domain)


# Function to check if a word is valid English
def is_valid_word(word):
    return word.lower() in english_words


# Function to process domains
def process_domains(domains):
    processed_domains = []
    for domain in domains:
        domain = domain.strip()
        # Skip blank lines and domains containing numbers or hyphens
        if not domain or re.search(r'[\d-]', domain):
            continue
        # Extract words from the part before the first dot
        parts = split_into_words(domain.split('.')[0])
        # Filter out non-English words
        valid_words = [word for word in parts if is_valid_word(word)]
        if valid_words:
            # Get last two parts as suffix. Known bug: for a .uk name such as
            # example.uk this gives "example.uk" rather than "uk".
            suffix = '.'.join(domain.split('.')[-2:])
            processed_domains.append((domain, ' '.join(valid_words), suffix))
    return processed_domains


# Read domains from file
file_path = "file-here.txt"  # Adjust the file path as necessary
with open(file_path, mode='r') as file:
    domains = file.read().splitlines()

# Process domains
processed_domains = process_domains(domains)

# Write to CSV
with open('domains.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['url', 'word(s)', 'suffix'])  # Header with three columns
    writer.writerows(processed_domains)

print("CSV file 'domains.csv' created successfully.")
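
If you'd rather not open Excel, the A-Z sort on column 3 can probably be done in Python as well. This is only a minimal sketch using the standard csv module; the function name `sort_by_suffix` and the output file name domains-sorted.csv are my own placeholders, not something the script above produces.

```python
import csv


def sort_by_suffix(in_path="domains.csv", out_path="domains-sorted.csv"):
    """Sort the rows of the CSV by column 3 (suffix), then by url."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # First row is the header
        rows = [row for row in reader if row]  # Ignore empty rows

    # Column 3 is the suffix, column 1 is the url
    rows.sort(key=lambda row: (row[2], row[0]))

    with open(out_path, mode="w", newline="") as dst:
        writer = csv.writer(dst)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return rows
```

Run `sort_by_suffix()` after the main script has written domains.csv and all rows with the same suffix end up next to each other in the new file.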