UAlbany Web Archiving Program

preserving albany.edu and more since 2012

Web Archives are preserved websites that can be "replayed" even if the original changes or goes offline!

Purpose


An important part of the mission of the M. E. Grenander Department of Special Collections and Archives is to collect, preserve, and provide access to the official records of the University at Albany. The University website is one of the primary sources of information about campus policies, administrative activities, curricular changes, news, and events. Much of this information can no longer be found in print, so preserving these records, both for posterity and to meet our legal requirements, requires collecting parts of the web.

The goal of the web archiving program is to preserve records that merit long-term retention in accordance with our collection development policy. Primarily, this means collecting the website of the University at Albany, SUNY to ensure that all permanent public records remain accessible in the future. We also typically collect the websites of organizations whose paper records we hold, as these groups increasingly document their activities online.

We are committed to preserving websites in their original form so that their context and structure are retained. We expect future researchers to get more value from web archives than from the same content printed to paper or flattened into PDF documents.

Websites are inherently transitory. As pages are edited and updated, older information is lost. Website archiving therefore cannot be a one-time procedure; it must be done regularly to accurately capture the changing nature of web-based information.

How Websites are Collected

Websites that have been selected for the archives are harvested periodically using the Archive-It service from the Internet Archive. In a given web crawl, the Heritrix crawler begins from a seed (a high-level domain, such as www.albany.edu) or set of seeds, and then automatically harvests successive layers of the website by following links from those seeds. This process continues for a specified duration or until a specified number of documents have been harvested.
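Heritrix is a full-featured archival crawler, but the core seed-and-link-following logic it applies can be sketched in a few lines of Python. The sketch below is illustrative only, with hypothetical limits; a real crawl writes standard WARC files rather than holding pages in memory.

    # A minimal sketch of seed-based crawling, for illustration only;
    # Heritrix itself is far more configurable.
    from collections import deque
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed, max_depth=2, max_docs=100):
        """Harvest pages reachable from a seed URL, stopping at a depth or document limit."""
        seen = {seed}
        queue = deque([(seed, 0)])  # (url, depth) pairs, visited breadth-first
        harvested = []

        while queue and len(harvested) < max_docs:
            url, depth = queue.popleft()
            try:
                response = requests.get(url, timeout=10)
            except requests.RequestException:
                continue  # skip pages that fail to load
            harvested.append((url, response.text))  # a real crawler writes WARC files instead

            if depth < max_depth:
                # Follow links into the next "layer" of the site
                for link in BeautifulSoup(response.text, "html.parser").find_all("a", href=True):
                    target = urljoin(url, link["href"])
                    # Stay within the seed's domain
                    if urlparse(target).netloc == urlparse(seed).netloc and target not in seen:
                        seen.add(target)
                        queue.append((target, depth + 1))
        return harvested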

The frequency and depth of harvesting vary from website to website. For example, Archive-It performs a daily shallow crawl of the top-level University domain (www.albany.edu) and a more comprehensive monthly crawl that attempts to capture the entire UAlbany web presence, including a longer list of University domains.
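For illustration, those two schedules might be represented as follows. This is a hypothetical sketch; actual Archive-It crawls are configured through its own web interface.

    # Hypothetical representation of the two crawl schedules described above.
    CRAWL_SCHEDULE = [
        {"seed": "https://www.albany.edu/", "frequency": "daily",
         "max_depth": 1, "note": "shallow crawl of the top-level domain"},
        {"seed": "https://www.albany.edu/", "frequency": "monthly",
         "max_depth": None, "note": "comprehensive crawl of the full UAlbany web presence"},
    ]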


Limitations of Web Archiving

Like photographs or other historical documents, archived websites represent a static snapshot of information. They contain only the information that was available on the website at the time of capture. As a result, archived websites may contain outdated information, broken links, non-functioning email addresses, or errors. Additionally, although the goal of web archiving is to create a complete snapshot, it is often impossible or infeasible to capture 100% of the content and functionality of a complex website. Content may therefore be missing from a given capture due to issues such as JavaScript rendering, streaming media, dynamic forms and database-driven content, and robots.txt exclusions.
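To make the robots.txt point concrete: a polite crawler checks a site's robots.txt before fetching each page, and any page disallowed for its user agent is skipped and therefore absent from the resulting archive. The sketch below uses Python's standard-library parser; the page URL and user-agent string are illustrative.

    # Checking robots.txt exclusions with Python's standard library.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.albany.edu/robots.txt")
    rp.read()

    # Pages disallowed for the crawler's user agent are never captured.
    page = "https://www.albany.edu/some/page"  # illustrative URL
    if rp.can_fetch("archive.org_bot", page):
        print("allowed to capture")
    else:
        print("excluded by robots.txt")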

Columbia University Libraries have compiled a useful list of website design best practices that can facilitate web archiving by mitigating some of the problems listed above. Website owners are encouraged to take these practices into account and to consider the value of long-term preservation when making web design decisions. A link to these best practices can be found below.

Guidelines for Preservable Websites

Privacy Issues

We are eager to hear from website owners who have concerns about content that has been included in our web archives. If you wish to discuss the removal of your website, please contact us.