archive.org – Is it possible to request removal of page snapshots of a personal social media profile from the Internet Archive’s Wayback Machine?

My understanding is that if it is your own domain and you add a robots.txt, you can exclude it from the web archive.
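For reference, the rule the Wayback Machine has historically honored is a robots.txt disallow for its crawler's user agent, ia_archiver. A minimal sketch of what that looks like, noting that the Internet Archive has relaxed its robots.txt policy over the years, so this is not guaranteed to apply retroactively:

    # Hypothetical robots.txt for a domain you control; ia_archiver is the
    # user agent the Internet Archive has historically respected.
    User-agent: ia_archiver
    Disallow: /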

And this unofficial page suggests that if you forgot to add a robots.txt but can retroactively prove that you owned the domain at a given date, they might also remove it: https://www.joshualowcock.com/tips-tricks/how-to-delete-your-site-from-the-internet-archive-wayback-machine-archive-org/. The same is suggested at https://help.archive.org/hc/en-us/articles/360004651732-Using-The-Wayback-Machine:

How can I exclude or remove my site’s pages from the Wayback Machine?

You can send an email request for us to review to info@archive.org with the URL (web address) in the text of your message.

But what if you have a personal profile on a social media website such as Twitter, Facebook, or Stack Exchange, whose robots.txt allows archiving?

Do they remove archives of your profile upon request?

Why can't I get to every saved copy of a page on archive.org? The page is listed as having 3 snapshots, but they redirect to other days

I am analyzing the history of the COVID-19 data on the Polish government website, which only shows the current figures and keeps no history of its own, so I use web.archive.org to view saved copies. For this purpose it is important to have a separately saved copy for each day, ideally even at different times of day: the numbers on Sunday evening differ from those on Sunday morning.

If I go to web.archive.org and enter the URL, I get a list of the dates on which archive.org supposedly saved the page. For example, for Sunday, April 19, 2020, it shows:

3 snapshots

I want to view and compare each of these saved copies. However, when I click on one of them, I am redirected to a snapshot from a different day.

Why does that happen? Did archive.org save the page and then lose it? Or does this have something to do with server overload? If the archived pages are not permanently lost, when will they be available again? (Note: I know the page actually changed and the numbers differ between the times I am redirected to, so this is not simply a case of identical copies being collapsed into one.)
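As a workaround for inspecting every capture directly, the Wayback Machine exposes a public CDX API that lists each capture with its exact timestamp, so you can build direct snapshot links yourself instead of going through the calendar view. A minimal Python sketch; the government URL below is only a placeholder for whatever page you are tracking:

    # List all captures of a page for one day via the Wayback CDX API and
    # print a direct link to each capture.
    import requests

    TARGET = "https://www.gov.pl/web/koronawirus"  # placeholder; use the actual page URL

    rows = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={
            "url": TARGET,
            "from": "20200419",  # captures made on 2020-04-19
            "to": "20200419",
            "output": "json",
        },
    ).json()

    # rows[0] is the header: urlkey, timestamp, original, mimetype, statuscode, digest, length
    for urlkey, timestamp, original, *_ in rows[1:]:
        print(f"https://web.archive.org/web/{timestamp}/{original}")

The status code and digest columns in the CDX output can also reveal whether a listed capture was actually a redirect or a duplicate of an earlier capture, which is one common reason the calendar forwards you to another day.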

archive.org – Can a URL excluded from the Wayback Machine be archived again by the Internet Archive in the future?

I copyrighted the content of my website and had it removed from the Wayback Machine. It now says "this URL has been excluded from the wayback machine". My question is whether archiving can resume in the future if I publish new content on this domain or sell it.
If so, would the archive also include the previous content that I wanted removed?

Is the source code that archive.org runs on open source, and if so, where is it?

While thinking about how much I should trust https://archive.org, I started looking for the source code.

Given the organization's nonprofit, open culture, I expected it to be open source, but I couldn't easily find where the source code lives.

Is the website's source code open, and if so, where can it be found?

The best I could find was the source code for their crawler: http://crawler.archive.org/index.html, but I suspect this is only part of the project.

Archive.org downloader. Website downloader. CMS Converter


  1. Archive.org downloader. Website downloader. CMS Converter

    https://en.archivarix.com/ is an online Wayback Machine downloader,
    website downloader and CMS converter. It is very easy to use: enter the
    website URL, choose the download options, enter your email and wait a
    little. The content is delivered as a ZIP file that you can install on
    your server. For content management we have developed a free open-source
    CMS; it is a single small PHP file and requires no installation or
    database. More details about the CMS can be found here:
    https://de.archivarix.com/cms/
    What is it for? First, to build your PBN with the unique content found
    in the web archive. When parsing a site you can set the parameters
    needed to use the content as a source of traffic and links, e.g. delete
    all external links, remove clickable contact details, remove counters,
    ads and analytics, and optimize the HTML code and images. With the
    Archivarix CMS it is easy to manage the site, search and replace with
    the WYSIWYG editor, and insert your own TDS scripts. It can also run
    alongside any other CMS, for example WordPress, on the same domain.
    Second, the system can be used to convert websites built with another
    CMS, or in static HTML, to the Archivarix CMS. It can also strip all
    external scripts, counters and ads (for example, if the site was on
    free hosting).
    The first 200 downloaded files are free of charge, and the number of
    free downloads is not limited. That means you can recover or download
    any number of websites for free, as long as each stays within 200
    files. Beyond that, the price depends on the number of downloaded
    files; more details here: https://de.archivarix.com/#show-prices-wbm


Archive.org

Sometimes I crawl around to find places to start with. I found one that was mentioned left and right in the last few days and thought I would share it for the laughs.

http://web.archive.org/web/20070128021711/forum.freeforums.org/

What have you got?

scrape archive.org

Hello
How do I use Scrapebox to scrape a site on archive.org?

Thank you

Hello friends
I need your help to achieve this.

I mean, what do you want to scrape? The expired domain finder has an archive.org downloader but no scraper.

Can you specify exactly what you want to scrape? Examples would be helpful.

Hello, I want to scrape a content site

Thank you

Thanks, I will buy the expired domain finder

Hello
With this plugin, can I download all articles from a website on archive.org?

No, it does not download articles. Nothing in Scrapebox downloads individual articles from archive.org.

When you buy an expired domain, the archive.org downloader in the expired domain finder downloads the entire site. You can re-upload it and then use the site, but you cannot download just the articles.

And while we're at it: I downloaded a site yesterday with the archive.org grabber, but not everything that is available on web.archive.org for this domain was downloaded.
I am trying to get the rest, so I thought I would rename the download folder and run it again with a different date in the snapshot date fields.
Now only the homepage gets downloaded, not the other pages. What can I do to make it fetch a specific missing page?

There is nothing you can do to point it at a specific page. You could drop a line to Scrapebox support with the link to the missing page, along with other helpful details and possibly screenshots, to see whether they can fix it.

I've found a few cases where it just is not perfect. There are too many options and too many things people can do wrong with a site, which sometimes results in the downloader not being able to download some part of the site.

Thanks for your reply, @loopline. It is not so tragic; I will download those few pages manually and integrate them again. It is good to know that it is not just me 😉
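For anyone else who ends up fetching a handful of missing pages by hand, the Wayback Machine can also be queried directly, outside Scrapebox. A rough Python sketch, assuming you know the URL of the missing page; the domain and output filename below are made up:

    # Fetch one specific missing page directly from the Wayback Machine.
    import requests

    PAGE = "http://example-expired-domain.com/missing-article.html"  # hypothetical URL

    # Ask the CDX API for all captures of this exact page.
    rows = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={"url": PAGE, "output": "json"},
    ).json()

    if len(rows) > 1:
        timestamp, original = rows[-1][1], rows[-1][2]  # newest capture
        # The "id_" flag after the timestamp returns the raw archived bytes
        # without the Wayback toolbar or rewritten links, which is what you
        # want when re-integrating the page into a restored site.
        raw = requests.get(f"https://web.archive.org/web/{timestamp}id_/{original}")
        with open("missing-article.html", "wb") as f:
            f.write(raw.content)
    else:
        print("No capture found for that URL.")

The id_ modifier matters here: without it you would save the Wayback-wrapped page with rewritten links rather than the original HTML.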

