A Decade Later: What Happened to That Deep Learning System at the National Archives

In 2014, I walked into the National Archives with a laptop and an idea that most people thought was premature at best, crazy at worst: use neural networks to automatically process America’s vast backlog of historical records.

TensorFlow didn’t exist. Neither did GPT. “Deep learning” was barely a buzzword outside academic circles. Most people still thought AI meant rule-based expert systems or chatbots that could barely understand basic commands.

But I built a working prototype anyway. A system that could ingest scanned documents, extract text through OCR, analyze content using neural networks to identify entities and topics, and make previously buried records searchable. It also used object recognition on NARA’s vast photographic holdings, automatically identifying people, objects, scenes, and activities in images that would otherwise require manual description. I demonstrated it to the Archivist of the United States and the executive leadership team.
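
For readers who want a concrete picture, here is a rough sketch of that kind of pipeline using today’s off-the-shelf libraries (pytesseract for OCR, spaCy for entity extraction). This is not the original prototype, which relied on custom neural networks and different tooling; it just illustrates the shape of the system.

```python
# Illustrative sketch of the document pipeline described above, using
# modern off-the-shelf libraries. NOT the original 2014 prototype, which
# used custom CNN/RNN models rather than these packages.
from PIL import Image
import pytesseract
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model for named-entity recognition

def process_scanned_page(image_path: str) -> dict:
    """OCR a scanned page and pull out named entities for indexing."""
    text = pytesseract.image_to_string(Image.open(image_path))
    doc = nlp(text)
    # People, places, organizations, dates, etc., for a searchable index
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"source": image_path, "text": text, "entities": entities}

# Example: process a small batch of scans into index records
# records = [process_scanned_page(p) for p in ["scan_001.png", "scan_002.png"]]
```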

Then I moved on.

Ten years later, we’re living in a world where AI can write poetry, generate photorealistic images, and hold nuanced conversations that pass for human. Large language models process and understand text with capabilities that would have seemed like science fiction in 2014. And I find myself thinking about the prototype gathering dust somewhere on a government server, wondering: what if?

What We Got Right

Looking back at my 2014-2015 work with the clarity of hindsight, several things stand out.

The core thesis was correct. Neural networks can absolutely extract meaningful information from historical documents at scale. This wasn’t obvious in 2014. Handwriting recognition was still unreliable, OCR struggled with degraded documents, and training data was scarce. But the fundamental bet that machine learning would transform archival processing has been vindicated.

Augmentation over replacement was the right frame. We never proposed replacing archivists. The goal was always to handle the tedious initial processing so humans could focus on contextual interpretation and quality review. This human-in-the-loop approach is now standard practice in AI deployment. NARA’s current AI initiatives use exactly this model.

The backlog problem hasn’t solved itself. In 2014, NARA held about 12 billion pages of textual records with only single-digit percentages digitized. Today? They hold 13.5 billion pages. Roughly 455 million are digitized, still only about 3% of holdings. At current rates, complete digitization would take over 100 years. The math hasn’t gotten better.
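
To make the arithmetic concrete, here is the back-of-the-envelope version. The annual digitization rate below is an illustrative assumption, not a published NARA figure.

```python
# Back-of-the-envelope check on the backlog math.
total_pages = 13_500_000_000      # ~13.5 billion pages of textual records
digitized = 455_000_000           # ~455 million pages digitized

pct_digitized = digitized / total_pages * 100
print(f"Digitized: {pct_digitized:.1f}%")            # ~3.4%

assumed_rate_per_year = 100_000_000  # hypothetical ~100M pages/year, for illustration only
years_remaining = (total_pages - digitized) / assumed_rate_per_year
print(f"Years to finish at that rate: {years_remaining:.0f}")  # ~130 years
```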

What Actually Happened at NARA

The organization didn’t adopt my prototype. That’s not surprising. Government technology adoption is slow, budgets were constrained, and a one-year fellowship doesn’t provide the sustained effort needed to move from proof-of-concept to production.

But the vision eventually arrived through other channels.

In April 2022, NARA released the 1950 Census with something unprecedented: an AI-generated name index available on day one. Using Amazon Textract, they extracted approximately 130 million handwritten names from 6.6 million population schedules. For comparison, the 1940 Census had to be indexed manually by Ancestry.com, an effort that took over nine months. The 2022 AI system completed the main indexing in nine days.
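
For the curious, a basic Textract call for pulling handwritten text out of a scanned schedule looks roughly like the sketch below. NARA’s actual census pipeline was far larger and ran asynchronously at scale; the bucket and file names here are placeholders.

```python
# Minimal sketch of an Amazon Textract call for handwritten-text extraction.
# Bucket and object names are placeholders, not NARA's actual resources.
import boto3

textract = boto3.client("textract", region_name="us-east-1")

response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "example-census-scans",
                           "Name": "schedules/page-0001.png"}}
)

# Keep line-level results with their confidence scores for human review
lines = [
    (block["Text"], round(block["Confidence"], 1))
    for block in response["Blocks"]
    if block["BlockType"] == "LINE"
]
for text, conf in lines:
    print(f"{conf:5.1f}%  {text}")
```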

NARA now has a Chief AI Officer (Gulam Shakir, who’s also the CTO). They published an AI Strategic Framework in October 2024 with 11 active or planned AI use cases. They’re deploying AI for FOIA request processing, PII detection and redaction, semantic search, and automated metadata generation.

In November 2024, FamilySearch announced they’d completed AI extraction of all 2.3 million pages of Revolutionary War Pension Files, trained on 30,000 pages of human transcriptions from NARA’s Citizen Archivist program. The human-AI collaboration model I envisioned is now reality.

What We Missed

If there’s a tragedy in this story, it’s timing and training data.

When I built that prototype in 2014, large language models were years away. We were working with convolutional neural networks for image recognition and recurrent neural networks for sequence modeling. Powerful, but limited. The transformer architecture that powers GPT wouldn’t be published until 2017. BERT came in 2018. GPT-3 in 2020.

But here’s what haunts me: NARA has what every AI company desperately wants. Unique, high-quality training data at massive scale. Billions of pages of historical documents spanning centuries. 41 million photographs with human-written descriptions. Millions of human-generated transcriptions. Structured metadata created by professional archivists over decades.

If NARA had started systematically digitizing and processing records with machine learning in 2014, even at lower quality initially, they could have built training datasets that would be invaluable today. Every document scanned, every transcription contributed by citizen archivists, every correction made by a professional could have trained increasingly sophisticated models.

Instead, that data largely sat in boxes. The backlog grew. And when the AI revolution arrived, archives were playing catch-up rather than leading.

This is where companies like OpenAI, Anthropic, and Google should be paying attention. We’re at a point where training data has become so scarce that these companies are generating synthetic data to train their models. They’re running out of internet to scrape. Meanwhile, 13 billion pages of real, historical, human-generated text sit in boxes at the National Archives, never digitized, never processed, never available for training.

The incentives actually align here. AI companies need novel, high-quality training data. NARA needs resources to digitize and process their backlog. A partnership where frontier AI labs fund or provide technology for large-scale digitization, in exchange for access to the resulting datasets, could accelerate both missions. The archives get processed. The models get trained on genuine historical documents instead of synthetic approximations. The public gets access to their own history.

This isn’t charity. It’s mutual interest. And someone should be making this case loudly.

The Opportunity That Still Exists

The National Archives holds records that exist nowhere else on Earth. Cabinet meeting notes. Military service records. Immigration files. Presidential papers. Court documents spanning the entire history of the federal government. This is the primary source material that historians, genealogists, journalists, and citizens depend on to understand America.

With modern AI capabilities, we could:

Make every document searchable. Not just by archival description, but by actual content. Find every mention of a person, place, or topic across the entire holdings.

Generate draft descriptions automatically. Let AI create initial metadata that archivists review and refine, dramatically accelerating the processing pipeline.

Connect related records across collections. Use embeddings and semantic search to surface connections that no human would have time to identify manually (a minimal sketch follows this list).

Enable conversational research. Imagine asking a question in plain English and having an AI assistant search across millions of documents to synthesize an answer, with citations to primary sources.

Preserve at-risk materials. Prioritize digitization of deteriorating records using AI to assess condition and historical significance.
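
As a rough illustration of the “connect related records” idea, here is a minimal semantic-search sketch over archival descriptions. The embedding model is my assumption (any embedding model would do), and the descriptions are made up; the same kind of index could feed a retrieval-augmented research assistant with citations.

```python
# Sketch of embedding-based semantic search over archival descriptions.
# Model choice and sample descriptions are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

descriptions = [
    "Letter from the War Department regarding pension claims, 1832",
    "Photograph of shipyard workers, Richmond, California, 1943",
    "Memorandum on Cabinet meeting agenda, January 1964",
]

doc_vecs = model.encode(descriptions)
doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def search(query: str, top_k: int = 2):
    """Return the most semantically similar descriptions to the query."""
    q = model.encode([query])[0]
    q = q / np.linalg.norm(q)
    scores = doc_vecs @ q                      # cosine similarity
    best = np.argsort(scores)[::-1][:top_k]
    return [(descriptions[i], float(scores[i])) for i in best]

print(search("World War II home front labor"))
```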

The technology exists today. What’s needed is investment, institutional will, and sustained execution.

What I’d Do Differently

If I could go back to 2014, I’d focus less on the technology and more on the ecosystem.

I’d spend more time building relationships with archivists, understanding their workflows deeply, and designing solutions that felt like tools rather than threats. I’d push harder for even a small production deployment, something that created real value and demonstrated the approach in practice, not just in demos.

I’d advocate loudly for NARA to start building training datasets systematically, even before the models existed to use them. The data is the moat. The algorithms can come later.

And I’d stay longer. Twelve months wasn’t enough to drive institutional change. Real transformation requires years of patient work, coalition building, and sustained advocacy. I moved on to other projects too quickly.

Ten years ago, I showed what was possible with crude neural networks and limited tools. Today, with large language models and cloud computing at scale, the possibilities are almost unlimited. The question isn’t whether AI will transform archives. It’s whether America’s National Archives will lead that transformation or follow it.


I served as a Presidential Innovation Fellow at the National Archives from September 2014 to September 2015. The prototype I built demonstrated automated document processing using deep learning techniques, a proof of concept that anticipated capabilities now being deployed at scale.

Read the original post: Deep Learning at the National Archives


