Why Now Is the Moment to Back Up the Web

We must become undisciplined. The work we do requires new modes and methods of research and teaching; new ways of entering and leaving the archives of slavery.

Christina Sharpe, In the Wake

Over the past year, two intersecting crises have laid bare a hard truth: the Internet doesn’t always remember. First, grassroots archivists and volunteers scrambled to rescue tens of terabytes of U.S. government climate, health, and societal data slated for removal from official websites. Then, information preservation advocates sounded the alarm that swaths of scientific research, built on taxpayer-funded studies, could vanish on a single administrative whim.

In an age when even government data can vanish overnight, there’s never been a more urgent moment to support information-preservation efforts and to start backing up the web ourselves. Open access to reliable knowledge is the lifeblood of a healthy society, yet political pressures today threaten to lock down everything from climate records to classroom resources.

You’ll often hear “back up the web” in prepper circles, among people who build emergency plans and stockpile supplies in case society faces sudden collapse, but a recent video from EposVox made me realize this isn’t just a survivalist hobby. It’s a professional imperative for anyone who teaches, researches, or learns online.

(Prepper: someone who prepares for a wide range of crises, such as natural disasters, economic breakdowns, and even geopolitical upheaval, by planning and stockpiling resources in advance.)

As researchers and educators, we’re more than passive consumers of digital content. We’re active contributors and stewards of knowledge. If we want tomorrow’s learners, policymakers, and historians to find, trust, and build upon our work, we must act today to ensure our content survives and to model best practices for our communities.

That urgency drives us to safeguard the web’s archives and rethink how we share knowledge.

The Fragility of Online Knowledge

Shifting Priorities, Shifting Content
Governments change administrations; sites are redesigned; “irrelevant” pages are pruned. What was official today may be offline tomorrow, along with critical datasets on climate trends, public health, or educational resources.
Platform Ephemerality
Social media posts, course-management forums, and even personal websites can disappear or have their reach throttled without warning. A tweet thread on your latest pedagogical insight? Gone when the platform policy changes.
The Myth of “Perpetual Access”
We used to joke that “the Internet never forgets.” Yet countless research articles, open-access journals, and educational videos have been lost to dead links and unavailable archives.
We Cannot Rely on Any Single Platform for Preservation
Many of us instinctively post our papers, slide decks, and datasets to Academia.edu, ResearchGate, or our professional-society web pages. But remember: terms change and sites shutter. Commercial or membership-based platforms can alter their business model or go offline, taking your content with them or selling it to others.

Why Researchers & Educators Must Lead the Charge

Intellectual Legacy
Your scholarship and classroom materials represent years of inquiry and collaboration. Losing them isn’t just an inconvenience; it’s a brake on collective progress.
Equity & Accessibility
Offline or pay-walled resources deepen the digital divide. Hosting mirrors or providing archived snapshots ensures learners everywhere, from well-resourced universities to remote communities, can access your work.
Reproducibility & Transparency
Science and pedagogy stand on the shoulders of data. When data disappears, so does our ability to reproduce experiments, verify claims, or adapt lessons to new contexts.

Rethinking “How, Where, When, and Why” You Publish

Questions to ask yourself, by aspect:
How
- Am I providing machine-readable metadata alongside my work?
- Have I version-controlled my drafts and final outputs?
Where
- Is this hosted on a single proprietary platform (Academia.edu, ResearchGate)?
- Do I trust professional organizations and journals to keep my materials freely available online?
- Do I have mirrored copies (institutional repository, GitHub, personal server)?
When
- Do I update archives when I revise or retract content?
- Have I timestamped my releases with a clear version history?
Why
- Am I prioritizing broad access over platform convenience?
- How does this posting serve the public good and future scholarship?

Actionable Mindsets, Steps, Tools & Communities

Mindsets to Adopt

Redundancy Is Resilience. Treat every important file as if it will be lost tomorrow.
Openness by Default. Share not just the polished article, but also raw data, code, and annotations.
Community-Centric Preservation. View your work as part of a larger digital commons—help build and defend it.
Collaboration Over Secrecy. It’s natural to feel your ideas are “too good” to share before they’re polished, but hoarding them often slows progress. Early, thoughtful sharing of drafts, sketches, and even half-baked questions invites feedback that strengthens your work and multiplies its impact.

Practical Steps

Keep Track of Every Draft
Think of your work like a photo album of versions, so you can always flip back in time. If you treat every save as a way-back snapshot, you’ll never lose an important idea, and you can always trace how your work evolved.
- Google Docs/Drive: Every time you edit, Google automatically saves a “Version History.” Go to File → Version history → See version history, then click the three dots beside any entry to Name this version. Later, you can restore or download any snapshot.
- Microsoft OneDrive/Office 365: Similar story—open your document in Word or PowerPoint online, then File → Info → Version History to view and restore past edits.
- Dropbox: Right-click a file in your Dropbox folder, choose Version history, and you’ll see every saved change.
- Simple File Naming: If you prefer your desktop, create a folder called MyPaper/ and save files like MyPaper_v1.docx, MyPaper_v2.docx, etc. It’s manual but foolproof.
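The manual naming scheme in the last bullet can also be automated. What follows is only a minimal sketch, assuming a single working file that you copy into a versions folder each time you want a snapshot; the function names `next_version_path` and `save_snapshot` are made up for this illustration and do not come from any tool mentioned above.

```python
import re
import shutil
from pathlib import Path


def next_version_path(folder: Path, stem: str, suffix: str) -> Path:
    """Return the next unused path of the form <stem>_v<N><suffix> in folder."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = [
        int(match.group(1))
        for entry in folder.iterdir()
        if (match := pattern.match(entry.name))
    ]
    # Start at v1 when the folder holds no earlier versions.
    return folder / f"{stem}_v{max(versions, default=0) + 1}{suffix}"


def save_snapshot(working_file: Path, folder: Path) -> Path:
    """Copy working_file into folder under the next free version number."""
    folder.mkdir(parents=True, exist_ok=True)
    target = next_version_path(folder, working_file.stem, working_file.suffix)
    shutil.copy2(working_file, target)  # copy2 also keeps the modification time
    return target
```

Calling `save_snapshot(Path("MyPaper.docx"), Path("MyPaper"))` after each writing session would add MyPaper_v1.docx, MyPaper_v2.docx, and so on to the MyPaper/ folder without overwriting earlier copies.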
Capture the “Behind the Scenes” Too
The scholarly journey is more than the published PDF. Sharing the process behind it not only bolsters reproducibility but also lets future learners see the real work behind the finished article.
- Drafts & notes. Upload annotated manuscripts, early outlines, and meeting notes to your repo.
- Raw data & code. Link to datasets (e.g., CSV, JSON) and analysis scripts in the same versioned project. Unsure if your data is “sensitive”? Check out Purdue University Library’s guide on sensitive data, which can help you identify sensitive data and the laws and regulations that apply to it.
- Idea logs. Keep a simple markdown diary of “aha!” moments or serendipitous finds. You’ll thank yourself later when combing back through your research arc.
Archive on Publication
When you hit “publish,” immediately deposit a copy in:
- Your institution’s repository (or an open archive like Zenodo, or the Open Science Framework)
- A personal backup (e.g., an encrypted NAS or cloud bucket)
- The Internet Archive’s Wayback Machine (via browser extension)
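If you would rather script the Wayback Machine step than rely on the browser extension, something like the following might work. It is a sketch under assumptions: it uses the Internet Archive's public "Save Page Now" address prefix and its availability lookup as I understand them, the helper names are invented for this post, and the response layout should be checked against the Internet Archive's current documentation before you depend on it.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed public endpoints; verify against the Internet Archive's docs.
SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_url(page_url: str) -> str:
    """Address that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def availability_url(page_url: str) -> str:
    """Address that asks whether a snapshot of page_url already exists."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the snapshot address out of an availability response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Download url and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For a hypothetical page such as https://example.org/my-paper, `closest_snapshot(fetch_json(availability_url("https://example.org/my-paper")))` would return the address of an existing snapshot or `None`, and opening `save_url(...)` for the same page requests a fresh capture.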
Embed Rich Metadata
Include clear titles, authors, dates, and licenses in your file headers or HTML <meta> tags—so future archivists can catalog your work accurately.
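As an illustration of the HTML route, here is a small sketch that turns a handful of fields into <meta> tags. The "DC." names follow the Dublin Core convention for HTML metadata, which is one common choice rather than a requirement, and the helper name `meta_tags` and the sample values are hypothetical.

```python
from html import escape
from typing import Mapping


def meta_tags(fields: Mapping[str, str]) -> str:
    """Render each name/value pair as an HTML <meta> tag, one per line."""
    return "\n".join(
        # Escape both parts so quotes or angle brackets cannot break the tag.
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in fields.items()
    )


if __name__ == "__main__":
    # Sample values are placeholders, not a real publication.
    print(meta_tags({
        "DC.title": "Why Now Is the Moment to Back Up the Web",
        "DC.creator": "A. Author",
        "DC.date": "2025-01-01",
        "DC.rights": "https://creativecommons.org/licenses/by/4.0/",
    }))
```

Paste the resulting lines into the <head> of the page that hosts your work, and adjust the field names to whatever your repository or discipline expects.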

Essential Tools

Kiwix: Offline reader for Wikipedia, Stack Exchange, TED Talks, and more.
Wayback Machine Browser Extension: Can capture snapshots of the pages you visit. It is built by the Internet Archive, and the snapshots it saves go into the Internet Archive.
GovDiff: A tool created by Jerome Paulos that shows side-by-side changes on government websites.
GitHub + GitHub Pages: Free hosting for code, data, and static sites—complete with version history.

Communities to Join

ArchiveTeam: Distributed collective of rogue archivists, programmers, writers, and loudmouths dedicated to saving digital heritage. In practice, they attempt to rescue data from websites about to disappear.
- You can help by downloading the ArchiveTeam Warrior and running it on your computer as a virtual machine or a Docker container. It fetches data from endangered websites and uploads it to an archive.
Public Environmental Data Partners: Committed to preserving and providing public access to federal environmental data.
Safeguarding Research & Culture (SRC): Creating an alternative infrastructure for archiving and disseminating cultural heritage and scientific knowledge.
Data Rescue Project: A grassroots effort to gather, catalog, and provide sustained access to resources. Keep an eye on its tracker.
Scholarly Publishing and Academic Resources Coalition (SPARC): Advocacy and best practices for open scholarship.
Digital Preservation Coalition (DPC): Training, guidelines, and a global network focused on preserving digital heritage. Their handbook identifies best practices in creating, managing, and preserving digital materials.

Addressing the Elephant in the Room: AI Scraping Your Content

Large language models (LLMs) and data-harvesting bots are already mining open repositories, preprint servers, and even academic slides for training data. That creates both risks and opportunities:

A risk is the loss of control. Once your text and figures are absorbed into an AI’s training set, you can’t untrain them. Your writing may be regurgitated, accurately or not, without attribution.
An opportunity is greater reach. AI tools that index open repositories can surface your work more broadly, potentially leading to new collaborations.

How to navigate this landscape:

Choose an explicit license. Use a Creative Commons license (e.g., CC BY 4.0) so readers and AI developers know precisely how your work may (and may not) be reused.
Embed machine-readable metadata. Add metadata and tags to your content so crawlers pick up the correct authorship, date, and license info.
Monitor and engage. Search for your own work in AI-powered discovery tools (Semantic Scholar, Elicit) and verify that it’s represented accurately. If you find misattributions or hallucinations, contact the platform or self-host a corrected version.
Advocate for ethical AI. Join communities like the Data & Society AI Ethics Lab or SPARC’s open-access advocacy to push for transparent, respectful model-training practices.
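The license and metadata advice above can be combined in a single machine-readable record. The sketch below assumes the schema.org vocabulary expressed as JSON-LD, one widely used option that many crawlers can read; the function names and the sample author are hypothetical, and this is not a guarantee that any particular AI crawler will honor the record.

```python
import json
from typing import Sequence

CC_BY_4_0 = "https://creativecommons.org/licenses/by/4.0/"


def scholarly_jsonld(
    title: str,
    authors: Sequence[str],
    date_published: str,
    license_url: str = CC_BY_4_0,
) -> str:
    """Describe one work as a schema.org ScholarlyArticle in JSON-LD."""
    record = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": date_published,  # ISO 8601 date, e.g. 2025-01-01
        "license": license_url,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


def as_script_tag(jsonld: str) -> str:
    """Wrap the record in the script element used to embed JSON-LD in HTML."""
    # Escaping "</" keeps the JSON valid and prevents an early tag close.
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'
```

Placing the output of `as_script_tag(scholarly_jsonld("My Paper", ["A. Author"], "2025-01-01"))` in a page's <head> states the authorship, date, and license in a form that software can read without guessing.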

“We’re not just writing for today’s clicks, we’re preserving tomorrow’s discoveries.”

By embracing a culture of redundancy, openness, and community engagement, we can ensure that the web remains a reliable, enduring home for research and teaching. Let’s start today, because the history we save now will be tomorrow’s foundation.