UAlbany Web Archiving Program
preserving albany.edu and more since 2012
An important part of the mission of The M. E. Grenander Department of Special Collections and Archives is to collect, preserve, and provide access to the official records of the University at Albany. The University website is one of the primary sources of information about campus policies, administrative activities, curricular changes, news, and events. Much of this information can no longer be found in print, so in order to preserve these records for posterity or to meet our legal requirements, it is necessary to collect parts of the web.
The goal of the web archiving program is to preserve records that deserve to be retained for the long term in accordance with our collection development policy. Primarily, this includes collecting the website of the University at Albany, SUNY to ensure all permanent public records can be accessed in the future. Typically, we also collect the websites of organizations whose paper records we hold, as these groups continue to provide more and more documentary evidence online.
We are committed to preserving websites in their original form so that their original context and structure are retained. We expect future researchers to get more value out of web archives than out of the same content printed to paper or flat PDF documents.
Websites are inherently transitory. As pages are edited and updates are made, older information is lost. Therefore, website archiving cannot be a one-time procedure; it must be done regularly in order to accurately capture the changing nature of web-based information.
How Websites are Collected
Websites that have been selected for the archives are harvested periodically using the Archive-It service from the Internet Archive. In a given web crawl, the Heritrix crawler begins from a seed (a high-level domain, such as www.albany.edu) or a set of seeds, and then automatically harvests successive layers of the website by following links from that seed. This process continues for a specified duration or until a specified number of documents have been harvested.
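The layer-by-layer harvesting described above can be illustrated with a small breadth-first sketch. This is not Heritrix or Archive-It code; it is a simplified illustration that walks a hypothetical in-memory link graph instead of fetching real pages. The `LINKS` table, the page paths under www.albany.edu, the `crawl` function, and the `max_documents` and `max_seconds` parameters are all assumptions made for the example.

```python
import time
from collections import deque

# Hypothetical link graph standing in for real web pages (illustration only).
LINKS = {
    "www.albany.edu": ["www.albany.edu/news", "www.albany.edu/policies"],
    "www.albany.edu/news": ["www.albany.edu/news/story-1", "www.albany.edu"],
    "www.albany.edu/policies": ["www.albany.edu/policies/records"],
    "www.albany.edu/news/story-1": [],
    "www.albany.edu/policies/records": [],
}


def crawl(seeds, links, max_documents=100, max_seconds=60.0):
    """Harvest pages breadth-first from the seeds, one layer of links at a time.

    Stops when the document budget or the time budget is used up,
    or when no unvisited pages remain.
    """
    deadline = time.monotonic() + max_seconds
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    harvested = []

    while queue and len(harvested) < max_documents:
        if time.monotonic() > deadline:
            break  # specified duration reached
        url = queue.popleft()
        harvested.append(url)  # a real crawler would fetch and store the page here
        for target in links.get(url, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return harvested


if __name__ == "__main__":
    for page in crawl(["www.albany.edu"], LINKS, max_documents=3):
        print(page)
```

With a budget of three documents, the sketch harvests the seed and then the two pages linked directly from it, leaving the deeper pages for a larger crawl. A production crawler also handles concerns this sketch ignores, such as scoping rules, politeness delays, and robots.txt.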
The frequency and depth of harvesting vary depending on the website. For example, Archive-It performs a daily shallow crawl of the top-level University domain (www.albany.edu) and a more comprehensive monthly crawl that attempts to capture the entire UAlbany web presence, including a longer list of University domains.
How to Access Web Archives
There are several ways to access the University's web archives. If a given collection includes archives of an organization's website, the web archives will be included within that organization's collection page.
Web Archives Collections
- Website of the University at Albany, SUNY
- University Senate
- UAlbany Sports
- Albany Student Press
- Business Council of New York State
- Environmental Advocates of New York State
- New York Civil Liberties Union
- Parks & Trails New York
- Pride Center of the Capital Region
All Web Archives Collections
UAlbany Archive-It page
Limitations of Web Archiving
Columbia University Libraries have compiled a useful list of website design best practices that can facilitate web archiving by mitigating some of the problems listed above. Website owners are encouraged to take these practices into account and consider the value of long-term preservation when making web design decisions. A link to these best practices can be found below.
We are eager to hear from website owners who have concerns about content that has been included in our web archives. If you wish to discuss the removal of your website from our web archives, please contact us.