ArchiveTeam has been archiving Reddit posts for a while now, but we are running out of time. So far, we have archived 10.81 billion links, with 150 million to go.
The recently announced Reddit API cost changes will force many of the top third-party Reddit apps to shut down. This will not only affect how people use Reddit, but also break many subreddit moderation bots that rely on the API to function. Many subreddits have agreed to shut down for 48 hours on June 12th, while others will be gone indefinitely unless this issue is resolved. We are archiving Reddit posts so that, if the API cost change is never addressed, we can still access posts from those closed subreddits.
- In VirtualBox, click File > Import Appliance and open the file.
- Start the virtual machine. It will fetch the latest updates and will eventually tell you to start your web browser.
Once you’ve started your warrior:
- Go to http://localhost:8001/ and check the Settings page.
- Choose a username — we’ll show your progress on the leaderboard.
- Go to the “All projects” tab and select ArchiveTeam’s Choice to let your warrior work on the most urgent project. (This will be Reddit).
Alternative Method: Docker
When you set up the project container, you will be asked to enter this command:
docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped [image address] --concurrent 1 [username]
Make sure to replace the [image address] with the Reddit project address (removing brackets):
Also change [username] to whatever you’d like; there is no need to register for anything.
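To make the placeholder substitution concrete, here is a minimal sketch that fills in the command above and prints it for review instead of running it. The function name `build_warrior_cmd` and the image value `example.invalid/project-image` are made up for illustration; the real project image address is not filled in here, so you still need to supply it yourself.

```shell
#!/bin/sh
# Sketch only: assemble the docker command from the post with the
# placeholders filled in. build_warrior_cmd is a hypothetical helper.

build_warrior_cmd() {
    image="$1"            # the Reddit project image address (you must supply it)
    username="$2"         # any name you like; it appears on the leaderboard
    concurrent="${3:-1}"  # defaults to 1, as in the command from the post

    if [ -z "$image" ] || [ -z "$username" ]; then
        echo "usage: build_warrior_cmd IMAGE USERNAME [CONCURRENT]" >&2
        return 1
    fi

    printf 'docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped %s --concurrent %s %s\n' \
        "$image" "$concurrent" "$username"
}

# Dry run: print the command so it can be checked before pasting it into a terminal.
# "example.invalid/project-image" is a placeholder, not the real address.
build_warrior_cmd "example.invalid/project-image" "myusername"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere, including on machines without Docker installed.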
More information about running this project:
There are a lot more items waiting to be queued into the tracker (approximately 758 million), so 150 million is not an accurate number. This is due to Redis limitations – the tracker is a Ruby and Redis monolith that serves multiple projects with hundreds of millions of items. You can see all the Reddit items here.
The maximum concurrency that you can run is 10 per IP (this is stated in the IRC channel topic). A concurrency of 5 works better for datacenter IPs.
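If you script your setup, a small guard can keep the value passed to --concurrent within that limit. This is only a sketch: `clamp_concurrent` and `MAX_CONCURRENT` are hypothetical names, and it assumes a single container per IP. Since the limit is per IP, several containers behind the same address would have to share the 10.

```shell
#!/bin/sh
# Sketch only: keep a requested --concurrent value within the per-IP limit.
MAX_CONCURRENT=10   # the limit stated in the post

clamp_concurrent() {
    requested="$1"
    # Reject empty or non-numeric input.
    case "$requested" in
        ''|*[!0-9]*)
            echo "concurrency must be a whole number" >&2
            return 1
            ;;
    esac
    if [ "$requested" -lt 1 ]; then
        echo 1
    elif [ "$requested" -gt "$MAX_CONCURRENT" ]; then
        echo "$MAX_CONCURRENT"
    else
        echo "$requested"
    fi
}

clamp_concurrent 5    # prints 5
clamp_concurrent 25   # prints 10
```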
IMPORTANT: Do NOT modify scripts or the Warrior client!
Edit: Added more project info given by u/signalhunter.
Edit 2: If you want to check how much you’ve contributed, go to the project tracker website, press “show all”, press Ctrl/Cmd + F (or use “find in page” on mobile), and search for your username. It should show you the number of items and the size of the data that you’ve archived.