‘Internet archiving’ tag
- See Also
-
Gwern
- “Design Graveyard”, Gwern 2010
- “Research Bounties On Fulltexts”, Gwern 2018
- “Internet Search Tips”, Gwern 2018
- “Internet Search Case Studies”, Gwern 2019
- “Design Of This Website”, Gwern 2010
- “Archiving URLs”, Gwern 2011
-
“The
sort –key
Trick”, Gwern 2014 - “Darknet Market Archives (2013–2015)”, Gwern 2013
- “Predicting Google Closures”, Gwern 2013
- “Easy Cryptographic Timestamping of Files”, Gwern 2015
- “Writing a Wikipedia Link Archive Bot”, Gwern 2008
- “Archiving GitHub”, Gwern 2011
- “Writing a Wikipedia RSS Link Archive Bot”, Gwern 2009
- “Resilient Haskell Software”, Gwern 2008
-
Links
- “How Do Archivists Package Things? The Battle of the Boxes”
- “HUGE Google Search Document Leak Reveals Inner Workings of Ranking Algorithm: The Documents Reveal How Google Search Is Using, or Has Used, Clicks, Links, Content, Entities, Chrome Data and More for Ranking.”, Goodwin 2024
- “Insights from a Laboratory Fire”, Jones et al 2023
- “Introducing A Dark Web Archival Framework”, Brunelle et al 2021
- “Gscan2pdf: A GUI to Produce PDFs from Scanned Documents”, Ratcliffe 2019
- “When Nothing Ever Goes Out of Print: Maintaining Backlist Ebooks”, Elsey 2016
- “Memory and the Construction of Scientific Meaning: Michael Faraday’s Use of Notebooks and Records”, Tweney & Ayala 2015
- “Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot”, Klein et al 2014
- “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations”, Zittrain & Albert 2013
- “The Prevalence and Inaccessibility of Internet References in the Biomedical Literature at the Time of Publication”, Aronsky et al 2007
- “More Product, Less Process: Revamping Traditional Archival Processing”, Greene & Meissner 2005
- “How Large Is the World Wide Web?”, Dobra & Fienberg 2004
- “The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines”, Bradlow & Schmittlein 2000
- Unforgotten Dreams: Poems by the Zen Monk Shōtetsu, Shōtetsu & Carter 1997
- “Space Jam Homepage”
- “Faraday’s Notebooks: the Active Organization of Creative Science”, Tweney 1991
- “The Other Pínakes and Reference Works of Callimachus”, Witty 1973
- “The Pínakes of Callimachus”, Witty 1958
- “How Archives Can Make—Or Break—A Philosopher’s Reputation”
- “The Backrooms of the Internet Archive”
- “The Original WWW Proposal Is a Word for Macintosh 4.0 File from 1990, Can We Open It?”
- “SingleFile”, Lormeau 2024
- “A Lunar Library”
- “2024 Guide on Removing DRM from Kobo & Kindle Ebooks”
- “Internet Archive Hacked, Data Breach Impacts 31 Million Users”
- “The Forgotten Pixel Art Masterpieces of the PlayStation 1 Era by Richmond Lee”
- “To Preserve Their Work—And Drafts of History—Journalists Take Archiving into Their Own Hands”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography
See Also
Gwern
“Design Graveyard”, Gwern 2010
“Research Bounties On Fulltexts”, Gwern 2018
“Internet Search Tips”, Gwern 2018
“Internet Search Case Studies”, Gwern 2019
“Design Of This Website”, Gwern 2010
“Archiving URLs”, Gwern 2011
“The sort –key
Trick”, Gwern 2014
“Darknet Market Archives (2013–2015)”, Gwern 2013
“Predicting Google Closures”, Gwern 2013
“Easy Cryptographic Timestamping of Files”, Gwern 2015
“Writing a Wikipedia Link Archive Bot”, Gwern 2008
“Archiving GitHub”, Gwern 2011
“Writing a Wikipedia RSS Link Archive Bot”, Gwern 2009
“Resilient Haskell Software”, Gwern 2008
Links
“How Do Archivists Package Things? The Battle of the Boxes”
“HUGE Google Search Document Leak Reveals Inner Workings of Ranking Algorithm: The Documents Reveal How Google Search Is Using, or Has Used, Clicks, Links, Content, Entities, Chrome Data and More for Ranking.”, Goodwin 2024
“Insights from a Laboratory Fire”, Jones et al 2023
“Introducing A Dark Web Archival Framework”, Brunelle et al 2021
“Gscan2pdf: A GUI to Produce PDFs from Scanned Documents”, Ratcliffe 2019
“When Nothing Ever Goes Out of Print: Maintaining Backlist Ebooks”, Elsey 2016
When Nothing Ever Goes Out of Print: Maintaining Backlist Ebooks
“Memory and the Construction of Scientific Meaning: Michael Faraday’s Use of Notebooks and Records”, Tweney & Ayala 2015
Memory and the construction of scientific meaning: Michael Faraday’s use of notebooks and records
“Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot”, Klein et al 2014
Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot
“Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations”, Zittrain & Albert 2013
Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations
“The Prevalence and Inaccessibility of Internet References in the Biomedical Literature at the Time of Publication”, Aronsky et al 2007
“More Product, Less Process: Revamping Traditional Archival Processing”, Greene & Meissner 2005
More Product, Less Process: Revamping Traditional Archival Processing
“How Large Is the World Wide Web?”, Dobra & Fienberg 2004
“The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines”, Bradlow & Schmittlein 2000
The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines
Unforgotten Dreams: Poems by the Zen Monk Shōtetsu, Shōtetsu & Carter 1997
“Space Jam Homepage”
“Faraday’s Notebooks: the Active Organization of Creative Science”, Tweney 1991
Faraday’s notebooks: the active organization of creative science
“The Other Pínakes and Reference Works of Callimachus”, Witty 1973
“The Pínakes of Callimachus”, Witty 1958
“How Archives Can Make—Or Break—A Philosopher’s Reputation”
“The Backrooms of the Internet Archive”
The Backrooms of the Internet Archive:
View External Link:
https://blog.archive.org/2024/06/01/the-backrooms-of-the-internet-archive/
“The Original WWW Proposal Is a Word for Macintosh 4.0 File from 1990, Can We Open It?”
The original WWW proposal is a Word for Macintosh 4.0 file from 1990, can we open it?:
“SingleFile”, Lormeau 2024
“A Lunar Library”
“2024 Guide on Removing DRM from Kobo & Kindle Ebooks”
“Internet Archive Hacked, Data Breach Impacts 31 Million Users”
Internet Archive hacked, data breach impacts 31 million users
“The Forgotten Pixel Art Masterpieces of the PlayStation 1 Era by Richmond Lee”
The Forgotten Pixel Art Masterpieces of the PlayStation 1 Era by Richmond Lee
“To Preserve Their Work—And Drafts of History—Journalists Take Archiving into Their Own Hands”
To preserve their work—and drafts of history—journalists take archiving into their own hands:
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
reference-rot
web-archiving
cataloging
Wikipedia
Miscellaneous
-
/doc/cs/linkrot/archiving/2020-03-03-meganwarnock-picardfacepalmcartoon.jpg
: -
/doc/cs/linkrot/archiving/2019-gwern-internetarchive-domainsearch-screenshot.png
: -
/doc/cs/linkrot/archiving/2006-alperin-webcitetechnicalguide.pdf
: -
/doc/cs/linkrot/archiving/gwern-googlescholar-search-highlightfulltextlink-thumbnail.jpg
: -
/doc/cs/linkrot/archiving/gildaslormeau-singlefile-archivingtutorialanimation.mp4
: -
https://annamancini.substack.com/p/how-the-apple-archive-ended-up-at
-
https://blog.gingerbeardman.com/2023/05/24/ordering-photocopies-from-japans-national-library/
-
https://github.com/Kneesnap/onstream-data-recovery/blob/main/info/INTRO.MD
-
https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
: -
https://placesjournal.org/article/the-filing-cabinet-and-20th-century-information-infrastructure/
-
https://www.atlasobscura.com/articles/bbc-missing-horror-show
-
https://www.historytoday.com/archive/missing-pieces/lost-movies
-
https://www.johndcook.com/blog/2024/03/03/archiving-data-on-paper/
:View External Link:
https://www.johndcook.com/blog/2024/03/03/archiving-data-on-paper/
Bibliography
-
https://github.com/gildas-lormeau/SingleFile/
: “SingleFile”,