Mini-contest: Archives, history & the Web–all your favorite things!

This comes out of a conversation on Twitter this morning in which Mark Matienzo (@anarchivist) said that as far as he knows the Oregon State Archives was the first U.S. archives on the Web. This is demonstrated by this article from the Spring of 1994.

So, here’s your challenge–was there a U.S. archives on the Web before the Oregon State Archives?

And, of course, we’d love to hear from our readers about the first non-US archives on the Web. Earlier than 1994?

And, just out of curiosity, do you know when your archives got its first web page? Do you have any evidence of what it looked like?

How did the Web change archives?

Here’s a question you might have some fun with–how did the Web change archives? I’m thinking about this as I write a chapter for a book, and I thought it would be interesting to see what the cumulative wisdom of the smartest blog readers on the planet might add. (And the best looking.)

I’m thinking specifically about how the traditional web (Web 1.0) changed how archives function and how archivists do their jobs. My initial thoughts are that there were two main areas of change:

  • Changes in user expectations
  • New types of users
This should also probably be considered in the context of the overall change in the way the Web affected how research is conducted. What about standardization of the way finding aids are created–and our friend, EAD–can that be attributed to the Web?

    Anyone have any thoughts about this? I haven’t yet done a literature search, so anyone who wants to suggest some citations is welcome to. I’d be surprised if this topic hasn’t attracted some scholarly attention.

    “This is not something society grows out of. It’s something society grows into.”

    I’ve had something filed away to write about it, and this seems like an appropriate time. I haven’t read Clay Shirky’s book Here Comes Everybody: The Power of Organizing Without Organizations, but I think I may just pick it up. Because I recently watched a video of the talk he gave at the 2008 Web 2.0 Expo, and I found it pretty, well, inspirational. Go and watch it here–it’s 16 minutes long. A senior blog adviser rolled his eyes when I told him how long it was, but after watching it he admitted that the time doesn’t drag. So, go, get yourself a cup of your beverage of choice and watch it.

    You don’t need to be told that I’m on board with Shirky’s message: that people increasingly want to (in fact, expect to) be able to interact with, share, and manipulate the information products in the world around them. I think we’ve seen evidence within our own profession of the potential for harnessing the “cognitive surplus” in the responsiveness of the Flickr community to supplying information for a small subset of the Library of Congress’s photo collections. Our collections are ideal raw material for the energy of people who choose to “do something” rather than just sit on their couches and passively absorb traditional media. If you’re reading this blog, Shirky and I are probably preaching to the choir. But you might want to pass this link along to any colleagues who you’ve ever heard say, “Where do you find the time?”

    But I think Shirky’s observations deserve to be considered in other contexts as well–let’s take professional organizations, for example. Over the last two weeks we had a very animated discussion of the issues related to SAA dues and the process for voting for dues increases. Your comments and votes on the poll clearly showed that just because you don’t go to all the meetings doesn’t mean you want to be passive consumers of SAA decisions. You, the blog-reading public, voted overwhelmingly against the SAA status quo: 94% of you voted in support of either mail-in or electronic voting (or both) for dues increases.

    (I think it’s only appropriate here to voice my sincere appreciation to Peter Hirtle for participating in this dialogue. He might not have expected his remarks to be catapulted to Internet fame, but when they were, he considered all your comments and revised his opinion, giving support to the proposed Constitutional changes. I am not aware of many members of our profession of Peter’s status who are participating here on the blogs and other online forums. Where are the other senior members of our profession, and why are they not contributing to the online discussion? A diversity of voices enriches the conversation. I wish more archivists with Peter’s depth of experience realized that blogs like this one don’t represent a passing fad. This is the future of professional discourse.)

    People want to participate; they want to be able to engage and have their voices heard–whether in their professional societies or their political process. If an organization (or a whole profession) doesn’t “have a mouse” (to use Shirky’s analogy), then it will not flourish in this century.

    This week’s poll question comes down to earth a bit after all that lofty talk. Building on Clay Shirky’s discussion of Wikipedia, this week’s poll asks how much you’ve used and participated in Wikipedia. So, answer the poll question, and use some of your own cognitive surplus to absorb Shirky’s talk and think about how it might apply to your own organization, your profession, and if you want to think big, this coming election.

    NARA posts background information on web harvest decision

    The ArchivesNext blog was honored this morning by a comment from Paul Wester, Director of NARA’s Modern Records Program (the office responsible for the web harvest decision). Paul wrote:

    We read with interest your postings on this topic.

    The National Archives and Records Administration (NARA) has posted background information regarding our web harvest decision at This background document includes links to our guidance products related to web records and the decisionmaking process we went through to arrive at our decision.

    My thanks to Paul for sharing this information with me and my readers. Their reasoning seems consistent with the rationale I discussed in my last post. What do you think of NARA’s response?

    NARA and the web harvest: a discussion of the issues

    As I wrote last Thursday, the National Archives has decided not to conduct a harvest of Federal web sites at the end of this Presidential administration. In my post, I referred to this as a “public relations error.” It looks like I was right. Take a look at some of these links if you want to see how this story is being portrayed on the web:

    After my post went up, I was encouraged to look into this situation more carefully. Many of the issues at stake in this controversy have their roots in key archival principles, and I think it’s our duty as archivists to bring understanding of those issues to the public debate. I’ll provide some basic background first, then discuss some of the appraisal and resource issues.

    Continue reading “NARA and the web harvest: a discussion of the issues”

    NARA decides to leave Federal web records to Internet Archive

    The diligent bloggers at Free Government Information tipped me off to another NARA topic that requires publicizing. FGI quotes a post at .govwatch, citing a NARA memo (NWM 13.2008) which states, in part:

    The National Archives and Records Administration (NARA) preserved a one-time snapshot of agency public web sites as they existed on or before January 20, 2001, as an archival record in the National Archives of the United States. NARA also conducted a harvest (i.e., capture) of Federal Agency public web sites in 2004 and of Congressional web sites in 2006. See

    After considering our other records management program priorities for FY 2008, availability of harvested web content at other “archiving” sites (e.g.,, and the resources required for conducting and preserving a government-wide web snapshot, NARA has determined that we will not conduct a web harvest or snapshot at the end of the current Administration.

    This seems, at the very least, a public relations error. Saying that NARA doesn’t have to capture records because the Internet Archive is doing it is a flimsy excuse. Rather, as John Wonderlich pointed out on the Sunlight Foundation blog:

    The fact that digital preservation is done by others outside NARA isn’t an excuse for NARA to abdicate their responsibility, but an argument that they should be capable of fulfilling it.

    Perhaps one of our knowledgeable NARA readers can clarify a few points. If my memory serves me correctly, Federal agencies are required to schedule their web records along with all other Federal records. Therefore NARA might consider that the web harvests are not the “official” records of government web sites, but rather are captured as supplementary records. I think it is likely that NARA might also believe (or know) that most agencies, even if they have correctly scheduled their web records (which is a big if) might not actually be effectively capturing them or transferring them to NARA custody in accordance with their schedules. Anyone want to place bets on that? Therefore these web harvests may, in some cases, be the only record NARA ever receives of some agency websites for this time period.

    Clearly, in the past NARA decided it was worth the resources to capture these records. What is different now? What are the “other records management program priorities for FY 2008” that are more important? I don’t doubt there are many other important priorities, but this harvest was planned and budgeted for long ago, I’m sure. What has changed now? I think we deserve a more complete explanation, and the excuse that the Internet Archive is doing it is not good enough. They are not an authoritative source for reliable, authentic Federal records. What’s the real story here?

    NARA latest digitization agreement: One archivist’s perspective

    As regular readers know, I used to work at the National Archives. Since starting this blog, I’ve avoided writing much about NARA because I worried that if I was too critical people would think I was trying to get back at someone for something, or that if I was too supportive people would think I was just a shill for my old employer. But I read a post a few days ago called “The NARA/TGN contract as a bad precedent” on a blog I admire, Free Government Information, and I felt I needed to write a response.

    The authors at FGI are advocates for, clearly, free government information. As an archivist and a former employee of NARA, I am an advocate for the broadest possible public access to NARA’s holdings, as well as for the general welfare of NARA as an institution. I am also keenly aware of the challenges NARA’s mission presents and the limited resources it is being given to carry out its mission. I am also a pragmatist. I am not, however, an expert in digitization. If you’re looking for a technical discussion, you won’t find it here.

    Many of you may remember an excellent article published last March in the New York Times, “History, Digitized (and Abridged),” which described the challenges NARA and other repositories face in digitizing their enormous collections. The article estimated that, given the expected annual rate of digitization, it would take 1800 years to digitize all of NARA’s textual holdings and 576 years to digitize all its non-textual holdings. Clearly they have to seek ways to speed up their digitization process. Without a large increase in their budget or a drastic shift in their institutional priorities, they cannot digitize their materials any faster. To make more materials available online more quickly, NARA, like many other national archives across the world, has chosen to pursue corporate partnerships. In such partnerships, the corporate partner has to get enough out of the deal to make it worth their while. This leads to agreements, such as the one with The Generations Network (TGN) (available here), that limit free public access to the digitized materials for a given period–in this case five years. It’s a trade-off, and it’s one I can live with. I think most archivists, based on their own experience with the challenges of finding money for digitization, would agree.

    I will not address all of the concerns raised in the FGI post, but I think I will cover most of the substantial ones. Now, let’s get to the specifics of this agreement and FGI’s concerns with it, and I’ll share what I think people should really be worried about.

    Continue reading “NARA latest digitization agreement: One archivist’s perspective”

    Implications for archives from IMLS report on in-person and online uses of museums and libraries

    The IMLS recently published the findings from their study “InterConnections: A National Study of Users and Potential Users of Online Information.” The study examined the roles public libraries and museums play as sources of information in relation to sources of information found on the web. The conclusions should be heartening to the library and museum communities:

    • Libraries and museums evoke consistent, extraordinary public trust among diverse adult users.
    • An explosion of available information inspires the search for more information.
    • The public benefits significantly from the presence of museums and libraries on the Internet.
    • Internet use is positively related to in-person visits to museums and libraries.
    • Museums and public libraries serve important and complementary roles in supporting a wide variety of information needs.

    I haven’t combed the report in detail, but here are a few thoughts on what implications we might find here for archives.

    First, you might note that in the “Average Ratings of Trustworthiness of Sources of Information” chart, libraries come in first with a score of 4.58 (on a scale of 1 to 5). Museums are second with 4.33 and Archives/Historical Associations are third with 4.21. Genealogical societies are next with 3.71, followed by government websites (3.00), commercial websites (2.54), and private individual websites (2.14). I think claiming that we “evoke consistent, extraordinary public trust” is a little inflated, but I think we can claim that we are a highly trusted source of information.

    The study also found that: “Number of visits to museums and public libraries and public trust are positively correlated. Users have gained trust through greater use and/or they use museums and public libraries more because of this trust.” The same would probably be true of our users, although, of course, we have far fewer users.

    Interestingly, when discussing whether users placed more trust in information received from libraries and museums in person or online (I am assuming “remotely” means primarily online), the report chose to highlight that users trusted information received in person more. What I found significant was how close the results were:

    • In-person from public libraries: 4.62
    • Remote from public libraries: 4.48
    • In-person from museums: 4.62
    • Remote from museums: 4.54

    I take away from this that there is a high level of trust for information received both in-person and “remotely.” The results were similar when people were asked about the “quality” of the information they received:

    • In-person from public libraries: 4.38
    • Remote from public libraries: 4.2
    • In-person from museums: 4.4
    • Remote from museums: 3.96

    Again, what I find significant here is how close the ratings are–particularly for public libraries.

    The report also confirmed something I’ve heard many people say–that increased web usage does not diminish in-person visits, rather, it increases them. “In 2006 remote online access increased adult visits to museums by 75% and to public libraries by 73% (while in-person visits have increased overtime).” I’m not sure if they are claiming that online access caused the increase, but certainly over that time period there was an increase in in-person visits while there was an increase in online visits. There is a lot of data in the report on this subject. It would be interesting to know these kinds of statistics for archives, wouldn’t it? Some data was collected on users’ visits to “other libraries” as opposed to public libraries, but I’m not confident we can really read most archives into those “other libraries.”

    The report claims that: “Internet users are about 91% more likely to visit museums and 50% more likely to visit public libraries than non-Internet users.” You can’t draw any conclusions here, can you? It’s a chicken and egg situation, I think. Are Internet users just the kind of people who visit libraries and museums or do they visit libraries and museums because they use the Internet?

    In support of its final conclusion, the report draws a correlation between the amount of time and money users spend traveling to libraries and museums and the value they place on them:

    “Museums visitors average spending nearly 5 hours of their time travelling to and visiting inside museums in addition to $41 in travel costs and fees. This contrasts with an average of 46 minutes for remote visits. Users’ willingness to pay for both kinds of visits is testimony to the value they place on museums.

    Public library visitors spend an average of 73 minutes travelling to and using public libraries and about $2.50 per visit, indicating the high value of public libraries to them.”

    I suspect the amount of time and money our users spend traveling to archives would rank much higher than public libraries, and close to museums. This is an interesting way of calculating our “value.”

    It’s a shame we don’t have this kind of study for archives. I suspect the results would do a lot of good to support some of our advocacy efforts and bolster the case for increasing our presence on the web.

    We’ve still got work to do on Web 1.0 too . . .

    There’s a lot of stuff out there that’s . . . not great. A while back someone sent a message to the archives listserv suggesting that people take a look at So, I did. It’s an interesting site in many ways. They have compiled “thousands” of digital images of old photographs “in large part from Government archives and our personal collections.” No credits are given for the original sources of any of the digitized images, so they are presented with no context. If you want to obtain a reproduction of the images you must contact the site owners.

    The photos are presented with titles and narrative caption information, which sometimes includes the date of the photograph or the name of the photographer, and provides information about the subject or the circumstances under which the photo was taken. But again, with no context information, the information they provide cannot be verified. For example, there is a photo with the title “Jewish factories,” but no indication in the caption information about why we know this is a Jewish factory or why someone took the picture. The photos are organized into “picture collections” such as “Mathew Brady Studio,” “Daguerreotypes,” and “American Adventure.” There are also “themed collections” which seem to be organized primarily by subject, such as individual Native American tribes, baseball, “Dignitaries and Statesmen of the 1800s,” and “People Working.” You can also conduct keyword searching on the site, but the search mechanism also searches the text of the Google ads that appear on the site and the titles of the “you may also like” images along the bottom, so many of the hits are not relevant.

    So, what is this site good for? Topical browsing of common subjects, I’d say–if you’re not concerned with authenticity of the images or the information about them. If you want access to a bunch of old photographs of cowboys, it’s easy for you to find them here. It’s like finding an old shoebox of pictures at a flea market–someone has organized them into groups and written some information on the back. You might not agree with how they grouped them, and you don’t know if the information provided is reliable. But, they’re interesting pictures. (Of course, you can’t keyword search that shoebox . . .)

    So, just for fun, I went over to a system that provides a great deal of context and maximum authenticity: the National Archives’ Archival Research Catalog (ARC) and looked for digital images of photographs of cowboys (one of the groupings in the other website). I first searched on the keyword “cowboy” (with filters for only digital images of photographs). I got 41 hits. Of these:

    • 7 were photographs of Native Americans with no cowboys in them,
    • 2 were pictures of prisoners at Leavenworth (not looking very much like cowboys),
    • 1 was a picture of Japanese-Americans being “re-located” in 1942,
    • 26 were from the series, “DOCUMERICA: The Environmental Protection Agency’s Program to Photographically Document Subjects of Environmental Concern, 1972-1977”; they do show people in cowboy dress from the 1970s,
    • 1 is a “Photograph of President Dwight D. Eisenhower being lassoed by a cowboy while reviewing the Inaugural parade in Washington, as Vice President Richard M. Nixon and other dignitaries look on,”
    • 1 is a picture of Ronald Reagan in a cowboy hat, and
    • 3 are the kinds of pictures of cowboys I was looking for.

    I then tried “cowboys” as the keyword and got 18 hits. Of these:

    • 4 were pictures of the Badlands (in only one could I see any cowboys)
    • 4 were the kinds of pictures of cowboys I was looking for
    • 4 were Hollywood movie stills showing actors dressed as cowboys
    • 4 were from the same EPA series, DOCUMERICA: The Environmental Protection Agency’s Program to Photographically Document Subjects of Environmental Concern, 1972-1977
    • 1 was a 1923 picture of “Gerald R. Ford, Jr., holds the reins of a pioneer wagon prior to participating in a neighborhood parade, while three unidentified cowboys stand nearby,” and
    • 1 was of Richard Nixon, “Meeting with 1972 Poster Child of the National Association for Retarded Children, 03/28/1972” (with a member of the Dallas Cowboys).

    (This is what you get with keyword searching, of course–all of these descriptions did have the word “cowboy” in them.) I then searched to see if the term “cowboys” was used in the catalog as a subject heading–it is, but for photographs it has only been applied once, to an album which has not been digitized.

    If is like a shoebox full of photos, I don’t know what to compare ARC to. As I’ve said elsewhere, I used to work at the National Archives. I understand, for the most part, why ARC is the way it is. They’ve got over 50% of their immense holdings described at some level in ARC, and that is an accomplishment. In order to do so, they’ve made trade-offs, and I respectfully suggest that it’s time to go back and consider how to make the information they already have in the catalog more accessible. I would like to see a separate interface that allows users to search or browse through only the digitized photographs (yes, I know you can do this through ARC, but it’s not a tremendously intuitive interface)–but I think this would only be truly useful if the images were accessible through accurate subject cataloging. You could navigate back from the image to more complete contextual information in ARC, of course. I think we should be able to have it both ways–all the user-friendliness and ease combined with the rich context and authenticity. Anyone else have any suggestions for our friends at the National Archives? (And if you have a couple of million dollars to give them along with your suggestion, I’m sure they’d appreciate that too. I’m not quite there yet, myself.)