I want to use Scrapebox to extract all domain names mentioned on the pages of a list of just under 4,000 website URLs.
The domain names appear on the pages in the following format:
The domain names are plain text, not hyperlinks.
If it helps, they are always in between
I already have the list of just under 4,000 URLs that I want to scan.
I use five private proxies that I have tested and saved.
I believe the Custom Data Grabber uses them, but frankly I'm struggling with Scrapebox.
I created inbound and outbound rules for Scrapebox in Windows Firewall.
Other Scrapebox features work fine, for example scraping the internal links of the domain from which I got the URLs.
I created a Custom Data Grabber module and, under it, a module mask:
I tried several regex examples and settled on the following:
I tested it with the tool at https://regex101.com/, and as far as I can tell it matches on 3 sample URLs:
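To rule out the regex itself, a pattern can also be checked offline before it goes into a module. This sketch uses a hypothetical, deliberately loose domain pattern (dot-separated labels ending in a 2 to 63 letter TLD), not the one from my module. Note that regex101 lets you choose a regex flavor, so a match there does not guarantee identical behavior in a different engine.

```python
import re

# Hypothetical stand-in pattern: one or more "label." parts followed by an
# alphabetic TLD. Non-capturing groups only, so findall returns whole matches.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)


def find_domains(text: str) -> list:
    """Return every domain-like token in text, in order of appearance."""
    return DOMAIN_RE.findall(text)


if __name__ == "__main__":
    sample = "Visit example.com or Shop.Example.co.uk today."
    print(find_domains(sample))
```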
However, when I run my module, I only get the following:
Each time the module runs, its data folder contains a CSV file with just two odd characters in the first cell:
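One unconfirmed guess about that CSV: if the two characters look like `ÿþ`, they would be the bytes `FF FE`, the UTF-16 little-endian byte-order mark (BOM), displayed as Windows-1252. A file holding nothing but a BOM would simply be an empty result, meaning the module saved no matches. This sketch reports which BOM, if any, a file starts with and how many bytes follow it; the function names are my own.

```python
import codecs
import sys
from pathlib import Path
from typing import Optional, Tuple

# Known byte-order marks. UTF-32 LE (FF FE 00 00) is checked before
# UTF-16 LE (FF FE) so the longer mark is not mistaken for the shorter one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding name, BOM length) if data starts with a known BOM."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


def inspect_csv(path: Path) -> str:
    """Describe the file's BOM and how much data follows it."""
    data = path.read_bytes()
    found = detect_bom(data)
    if found is None:
        return f"no BOM, {len(data)} bytes"
    name, size = found
    return f"{name} BOM, {len(data) - size} bytes of data after it"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "->", inspect_csv(Path(arg)))
```

Running it on one of the module's CSV files and seeing something like `utf-16-le BOM, 0 bytes of data after it` would support the "empty result" reading.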
I opened several of the URLs in browseo.net, and according to that tool the domain names on those pages are readable.
Does anyone know where I'm going wrong?
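A possible gap between browseo.net and the grabber (an assumption on my part): a tool like browseo shows the page's text, while a regex in a data grabber is typically run against the raw HTML source. In the source, a domain can be split by inline tags or written with entities (for example `example<span>.</span>com` or `example&#46;com`), so a pattern that matches the visible text may never match the source. This sketch uses Python's standard `html.parser` to reduce HTML to visible text; which tags count as block-level is my own choice.

```python
from html.parser import HTMLParser

# Tags around which a line break is inserted so separate blocks don't run together.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleText(HTMLParser):
    """Collect text nodes, decoding entities and skipping <script>/<style>."""

    def __init__(self) -> None:
        # convert_charrefs=True makes handle_data receive "." instead of "&#46;".
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def visible_text(page_html: str) -> str:
    """Return the text a reader would see, with inline tags joined seamlessly."""
    parser = VisibleText()
    parser.feed(page_html)
    parser.close()  # flush any buffered text at the end of the document
    return "".join(parser.parts)


if __name__ == "__main__":
    raw = "Contact: <span>example</span>&#46;com"
    print("example.com" in raw)        # the raw source does not contain it
    print(visible_text(raw))           # the visible text does
```

If the domain names only match after this kind of conversion, the module's regex would need to account for the markup in the source.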
Or is there a better way to extract the mentioned domain names from a list of URLs?
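As for a better way: outside Scrapebox, a short Python script can fetch each URL through the proxies in rotation and collect domain-like text. This is a sketch under several assumptions: the regex is a hypothetical loose matcher, proxies are listed one per line as `http://user:pass@host:port`, the file names are placeholders, and tags are stripped with a simple heuristic (inline tags removed, other tags replaced by a space) rather than a full HTML parser.

```python
import csv
import html
import itertools
import re
import sys
import urllib.request
from typing import Iterable, List

# Hypothetical loose domain pattern; adjust to the markup around the mentions.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}\b", re.IGNORECASE
)
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
# Inline tags are removed outright so "example<b>.</b>com" stays joined.
INLINE_TAG_RE = re.compile(r"</?(?:a|b|i|u|em|strong|span|font)\b[^>]*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def extract_mentions(page_html: str) -> List[str]:
    """Strip scripts/styles and tags, decode entities, return unique lowercase domains."""
    text = SCRIPT_STYLE_RE.sub(" ", page_html)
    text = INLINE_TAG_RE.sub("", text)
    text = TAG_RE.sub(" ", text)  # block-level tags become separators
    text = html.unescape(text)
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return list(dict.fromkeys(d.lower() for d in DOMAIN_RE.findall(text)))


def fetch(url: str, proxy: str, timeout: float = 20.0) -> str:
    """Fetch url through one HTTP proxy and decode the body as text."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    with opener.open(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def run(urls: Iterable[str], proxies: List[str], out_path: str) -> None:
    """Visit each URL with the next proxy in turn and write url,domain rows."""
    proxy_cycle = itertools.cycle(proxies)
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url", "domain"])
        for url in urls:
            try:
                page = fetch(url, next(proxy_cycle))
            except Exception as exc:  # HTTP errors, timeouts, proxy failures
                print(f"skip {url}: {exc}", file=sys.stderr)
                continue
            for domain in extract_mentions(page):
                writer.writerow([url, domain])


if __name__ == "__main__":
    # Usage: python grab_mentions.py urls.txt proxies.txt mentions.csv
    urls_file, proxies_file, out_file = sys.argv[1:4]
    with open(urls_file, encoding="utf-8") as fh:
        urls = [line.strip() for line in fh if line.strip()]
    with open(proxies_file, encoding="utf-8") as fh:
        proxies = [line.strip() for line in fh if line.strip()]
    run(urls, proxies, out_file)
```

The pattern is loose on purpose, so it would also pick up things like `photo.jpg` if they appear as plain text; restricting it to the characters that surround the mentions on these pages would make it much more precise.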
Thank you in advance!