Think and Save the World

How Open-Source Knowledge Platforms Democratize Human Understanding


The platforms, in more detail

Wikipedia (2001– )

The English Wikipedia has over 6.8 million articles. Across all 300+ language editions, over 62 million articles. Roughly 120,000 active editors worldwide contribute in any given month, with a much smaller core of about 4,000 highly active editors doing the bulk of quality control. The Wikimedia Foundation operates on roughly $180 million per year in donations — a rounding error compared to the global value created.

Quality: the Nature 2005 study by Jim Giles compared 42 science articles across Wikipedia and Encyclopedia Britannica, finding roughly 4 errors per Wikipedia article and 3 per Britannica article — comparable in a way that sent shockwaves through the reference-book industry. Follow-up studies (including a 2014 PLOS ONE study on medical articles, various IBM analyses) have found the quality gap is smaller on well-maintained topics and wider on obscure or politically contested ones. Wikipedia's coverage of current science is often more up-to-date than any print encyclopedia can be, by definition.
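The "roughly 4 versus 3" figures can be recovered from the totals in the published Nature comparison. The sketch below assumes those totals (162 issues flagged in Wikipedia and 123 in Britannica across the 42 reviewed entries); they come from the Nature article rather than from this essay, and the function and variable names are illustrative only.

```python
# Totals as reported in Giles (2005), Nature; not stated in the essay itself.
ARTICLES_REVIEWED = 42
WIKIPEDIA_ERRORS = 162
BRITANNICA_ERRORS = 123


def errors_per_article(total_errors: int, articles: int) -> float:
    """Average number of flagged errors per reviewed article."""
    return total_errors / articles


wikipedia_rate = errors_per_article(WIKIPEDIA_ERRORS, ARTICLES_REVIEWED)
britannica_rate = errors_per_article(BRITANNICA_ERRORS, ARTICLES_REVIEWED)

# How many more errors per article Wikipedia had, relative to Britannica.
relative_gap = wikipedia_rate / britannica_rate - 1

print(f"Wikipedia:  {wikipedia_rate:.2f} errors per article")
print(f"Britannica: {britannica_rate:.2f} errors per article")
print(f"Relative gap: {relative_gap:.1%}")
```

On these numbers Wikipedia carried about a third more errors per article: a real gap, but far smaller than the reference-book industry had assumed.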

Systemic bias: well-documented. English Wikipedia skews white, male, American, and toward topics of interest to tech-adjacent men. Coverage of women, African history, Indigenous knowledge, and non-Western science is thinner. Edit-a-thons and targeted initiatives (Art+Feminism, AfroCROWD, Whose Knowledge?) have been working on this for over a decade. Progress is real but the problem is structural.

Khan Academy (2008– )

Salman Khan started tutoring his cousin in math remotely in 2004 and began posting the tutorials to YouTube in 2006. The nonprofit Khan Academy was incorporated in 2008. It now serves over 140 million registered learners annually. Core offerings: K–12 math (highly regarded, used in formal schooling worldwide), sciences, humanities, economics, computing, test prep. Translated into over 50 languages.

The pedagogical model: short videos (6–12 minutes typically), mastery-based practice with immediate feedback, knowledge maps showing prerequisite chains. This model — break learning into small units, practice to mastery before moving on — is backed by research going back to Benjamin Bloom's 1984 "2 Sigma Problem" paper, which showed that one-on-one mastery-based tutoring produced outcomes two standard deviations above classroom instruction. Khan Academy is, in effect, an attempt to democratize that tutoring effect.
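To make "two standard deviations" concrete, it helps to convert the effect size into a percentile. The sketch below assumes test scores in the comparison classroom are roughly normally distributed, which is a simplifying assumption and not something the essay states; the function name is illustrative.

```python
from statistics import NormalDist


def percentile_for_effect_size(sigmas: float) -> float:
    """Percentile rank, within the comparison group, of a score that sits
    `sigmas` standard deviations above that group's mean.

    Assumes the comparison group's scores follow a normal distribution.
    """
    return NormalDist().cdf(sigmas) * 100


two_sigma = percentile_for_effect_size(2.0)
print(f"A +2 sigma student outscores about {two_sigma:.1f}% of the classroom group")
```

Under that assumption, the average tutored student performs at roughly the 98th percentile of the conventionally taught class, which is why the result became a benchmark for everything that followed.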

Limitations: video-based learning doesn't work equally well for all students. The platform is better for self-motivated, reasonably well-prepared learners than for students with severe literacy gaps. And the content is American in assumption.

MIT OpenCourseWare (2001– )

In 2001, MIT announced it would publish the materials for all its courses online, for free. This was at a time when elite universities were exploring how to sell online content for revenue. MIT chose the opposite direction and reasoned, essentially: our mission is education, the materials cost us almost nothing to put online, and restricting them is not in our interest.

As of the most recent count, over 2,500 MIT courses have materials published, covering nearly the entire MIT curriculum. Over 500 million learners have accessed OCW content since launch.

Open Education Global (formerly the Open Education Consortium) now includes hundreds of universities publishing similar materials. Stanford Engineering Everywhere, Yale Open Courses, Harvard's extension school materials, UC Berkeley's webcast archive: a significant chunk of what was once available only to matriculated students is now available to anyone.

arXiv (1991– )

Paul Ginsparg, a physicist at Los Alamos, started arXiv in 1991 as an email server for sharing physics preprints. It moved to the web, then to Cornell. Now hosts over 2.3 million preprints across physics, math, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, and economics.

The impact on physics is difficult to overstate. In many subfields, publishing to arXiv is the primary form of dissemination; the journal publication that follows is secondary, almost ceremonial. The consequence is that physics research moves at a pace roughly six months ahead of the journal-based publication cycle that dominates other fields. Researchers in fields without strong preprint cultures (biomedicine, much of social science) have since built arXiv-like platforms of their own: bioRxiv (2013) for biology and medRxiv (2019) for the health sciences.

Sci-Hub (2011– )

Alexandra Elbakyan, a neuroscience graduate student in Kazakhstan, built Sci-Hub to retrieve papers from behind paywalls using donated institutional credentials and make them available for free. As of the most recent estimates, Sci-Hub hosts roughly 85 million papers and serves tens of millions of downloads per month.

The legal picture: Sci-Hub has lost multiple lawsuits (Elsevier v. Sci-Hub, 2017, awarded Elsevier $15 million; ACS v. Sci-Hub, 2017, awarded $4.8 million). Injunctions in the US, UK, France, Sweden, India, and elsewhere have attempted to block access. The site moves domains and mirrors when blocked. Millions of researchers, including at well-funded institutions, continue to use it.

The moral-empirical picture: a 2016 analysis in Science by John Bohannon found heavy Sci-Hub use in wealthy countries as well as poor ones, including in cities that host well-funded research universities. Many of those users turned to Sci-Hub because it was faster than their own libraries, not because they couldn't afford access. When publisher-provided access is worse than the free pirate version, that is a devastating indictment of the industry.

The business model implications

Academic publishing as an industry makes roughly $28 billion per year. Elsevier, Springer Nature, Wiley, Taylor & Francis, and SAGE are the big five. Elsevier alone reports operating margins above 35%, higher than Google or Apple in most years.

The inputs to this industry are almost all free. Research is funded by national science agencies, foundations, and universities. Writing is done by researchers without payment from publishers. Peer review is done for free. Editorial boards are largely unpaid. Publishers provide typesetting, hosting, and a prestige signal. They charge libraries $5,000–$40,000 per journal per year for bundled subscriptions and individual readers $30–$60 per article.

This is not a natural market outcome. It's the legacy of a 20th-century system in which publishers controlled print distribution and research could not circulate without them. That bottleneck is now purely artificial — all the actual work is digital — but the contracts, the prestige hierarchies, and the tenure committees that reward publication in these journals keep the system alive.

Open access (OA) publishing was supposed to solve this. In its dominant "gold" form, authors or their funders pay a fee to make the article free to read. In practice, that model has largely been captured by the same publishers, who now charge authors $2,000–$11,000 per article to publish OA. The money has moved from libraries to grants, but the extraction continues.

Plan S, a 2018 initiative by a coalition of European funders, requires research funded by its members to be published OA. The Nelson Memo, a 2022 US Office of Science and Technology Policy directive, requires federally funded research to be freely available. These are structural reforms. Whether they actually break the publishers' pricing power, or merely move the payment point, remains to be seen.

The gatekeeping is an ethical scandal

I used that phrase in the distilled version. Let me defend it in detail.

The argument goes like this:

1. Most scientific research is funded by public money: taxes, public-university endowments, government grants.

2. The researchers who produce it are paid by public or philanthropic funds to produce it.

3. The peer reviewers who validate it are compensated by their universities, not the publisher.

4. The resulting knowledge is a public good by any reasonable definition: non-rivalrous (my reading doesn't diminish your reading), non-excludable in principle (digital copies are free to make).

5. Publishers insert themselves as a tollbooth and extract rents that prevent much of the world, and much of the funding public, from accessing the very research they paid for.

When a researcher in Nigeria cannot access a paper on tropical disease that was written by a researcher in Nigeria, published in a Western journal, and based on fieldwork in Nigeria, funded by a Western donor interested in development — what is the ethical description of that arrangement?

The polite word is "market failure." The accurate word is "scandal."

What a fully open commons would unlock

Consider a thought experiment. Every scientific paper is free. Every textbook at university level is free. Every educational course is free at the audit tier. Every reference work is free. Translation layers are mature enough that the best material in any language is available in every major language within days.

What happens?

- Research acceleration: the average lag between publication and citation drops. Fields without strong preprint cultures (biomedicine, social science) catch up to physics. Replication improves because the papers being replicated are actually readable by people trying to replicate them.

- Developing-world research capacity: the gap between a researcher at Harvard and a researcher at a West African university narrows. Not closes — equipment, networks, and time still matter — but narrows. The composition of who produces knowledge shifts.

- Educational redistribution: the ceiling on what a motivated, unresourced student can learn moves up sharply. The amount of untapped talent is enormous, and a significant fraction of that talent currently exits the pipeline because it can't access the inputs.

- Public understanding of science improves: journalists, policymakers, and citizens can actually read the evidence behind claims they are asked to trust. This is especially important for climate, health, and technology policy.

- Cross-disciplinary synthesis: more people can read outside their specialty, which is where most interesting ideas come from. The current paywall system punishes people who want to read broadly.

The economic value of this is almost impossible to estimate but is certainly in the hundreds of billions annually, probably trillions. The cost of maintaining the paywalls is the $28 billion/year revenue of the academic publishing industry. The ratio is obscene.
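The ratio can be put in rough numbers. The sketch below takes the $28 billion revenue figure from the text; the low and high value estimates (200 billion and 1 trillion dollars a year) are illustrative stand-ins for "hundreds of billions, probably trillions" and are assumptions, not measurements.

```python
# Annual revenue of academic publishing, as cited in the text.
PUBLISHING_REVENUE_USD = 28e9

# Illustrative bounds for the value of a fully open commons.
# These are assumptions chosen to match the essay's rough range.
ASSUMED_VALUE_LOW_USD = 200e9
ASSUMED_VALUE_HIGH_USD = 1e12


def value_to_cost_ratio(value_usd: float, cost_usd: float) -> float:
    """Dollars of estimated value per dollar of paywall revenue."""
    return value_usd / cost_usd


low_ratio = value_to_cost_ratio(ASSUMED_VALUE_LOW_USD, PUBLISHING_REVENUE_USD)
high_ratio = value_to_cost_ratio(ASSUMED_VALUE_HIGH_USD, PUBLISHING_REVENUE_USD)

print(f"Value per dollar of publisher revenue: {low_ratio:.1f}x to {high_ratio:.1f}x")
```

Even the low assumption implies several dollars of forgone value for each dollar the paywalls collect; the exact multiple depends entirely on which value estimate one accepts.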

The ongoing battles

Plan S and the OA mandate: European funders (cOAlition S) require OA publication. Publishers have adapted by creating "transformative agreements" that shift costs from subscription to article-processing charges (APCs). Whether this is real reform or financial repainting is contested.

Sci-Hub legal attacks: ongoing. Elsevier sued in India in 2021, and in a case that drew global attention, the Delhi High Court has been weighing arguments about whether Sci-Hub access serves a fundamental right to education and research.

Wikipedia funding and neutrality fights: occasional political pressure on the Wikimedia Foundation to shape content; sustained harassment of editors working on politically sensitive topics. The WMF has resisted pressure well, but the threat surface is growing.

Open textbook movements: OpenStax (Rice University), Open Textbook Library (University of Minnesota), and others are creating free alternatives to the $200 college textbook. Adoption is increasing but publisher lobbying keeps it slow.

AI training data and the knowledge commons: the newest front. LLMs trained on freely available text can now generate high-quality synthesis of that text. This threatens the business model of some knowledge producers but also expands access dramatically. The legal and ethical questions are unsettled. Whether AI systems will end up enclosed or commons-based is one of the defining fights of this decade.

Exercise: test your access

1. Pick a scientific topic you care about. Try to find three peer-reviewed papers on it. How many can you actually read without a paywall?

2. Check your country's Wikipedia coverage of a topic you know well. How does it compare to the English version?

3. Find an MIT OCW course in a subject you've always wanted to learn. Spend thirty minutes in it. What surprises you about the quality?

4. If you're in the US, check whether your local public library provides free access to academic journal databases. (Many do. Almost nobody knows.)

These four actions will reorient your relationship to knowledge more than any essay can.
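For readers who would rather script the first exercise, one option is the Unpaywall API, which reports whether a legal free copy of a paper exists for a given DOI. The essay does not mention Unpaywall; it is used here as an assumption, and the sketch relies only on the `is_oa` and `best_oa_location` fields of its v2 response. The sample records and DOIs below are made up for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Unpaywall v2 endpoint; the service asks callers to identify themselves by email.
UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def fetch_record(doi: str, email: str) -> dict:
    """Fetch the Unpaywall record for one DOI (requires network access)."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi), email=urllib.parse.quote(email)
    )
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def free_copy_url(record: dict) -> Optional[str]:
    """Return a link to a free copy, or None if the record lists none."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url")


def count_readable(records: list) -> int:
    """Count how many records have a free copy available."""
    return sum(1 for record in records if free_copy_url(record) is not None)


# Hypothetical records shaped like Unpaywall responses, so the logic can be
# tried offline. Replace with fetch_record(...) results for real DOIs.
sample_records = [
    {"doi": "10.0000/example.1", "is_oa": False, "best_oa_location": None},
    {
        "doi": "10.0000/example.2",
        "is_oa": True,
        "best_oa_location": {"url": "https://example.org/paper.pdf"},
    },
]

readable = count_readable(sample_records)
print(f"{readable} of {len(sample_records)} papers have a free copy")
```

To run it against real papers, pass each DOI and your own email address to `fetch_record` and feed the results to `count_readable`. A paper counted as unreadable here may still be reachable through a library login, so treat the tally as a lower bound on your access.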

Further reading

- Peter Suber, Open Access (MIT Press, 2012): the foundational explainer, itself free online.

- John Willinsky, The Access Principle (MIT Press, 2006): the ethical argument for OA.

- Jonathan Cohn, The Burden of Academic Publishing: journalism on Sci-Hub and the battles.

- Joseph Reagle, Good Faith Collaboration: the cultural history of Wikipedia.

- Aaron Swartz's writings: the hacker-activist died in 2013 while facing federal prosecution over bulk downloads from JSTOR; his work is essential context.

What this means for Law 1

If Law 1 — we are human — is the foundational ethical commitment of this manual, then access to human knowledge is one of its concrete operationalizations. A commitment to shared humanity that doesn't include access to what humans have learned is thin.

The open-source knowledge commons is proof that the infrastructure for Law 1 can be built. Wikipedia exists. arXiv works. Khan Academy teaches. Sci-Hub serves researchers the system won't. These are working examples of "the yes" — what happens when enough people say knowledge is for everyone.

The incomplete build is political. The existing platforms can be expanded, funded, translated, integrated into schools, and protected from legal attack. The scandal of paywalls can be closed. None of this requires new technology. All of it requires political will.

You don't have to wait. Donate to Wikipedia. Use Khan Academy with a child in your life. Read a paper on arXiv. Cite open-access sources when you can. When you see a paywalled paper and a preprint version both exist, link the preprint. Small acts of commons-building add up.

This is what saying yes looks like in knowledge form.
