WARC file for Albany Student Press – UAlbany's independent student-run newspaper, 2017 May 25
- Acquisition information:
- crawl: 303101
Crawl Rules
Limit host twitter.com to 500 documents
Limit host twimg.com to 100 documents
Block host accounts.google.com
Crawl Times
start_date: 2017-05-25T13:56:39Z
original_start_date: 2017-05-25T13:56:39Z
last_resumption: None
processing_end_date: 2017-05-27T22:17:52Z
end_date: 2017-05-27T21:59:43Z
elapsed_ms: 201775035
Crawl Types
type: WEEKLY
recurrence_type: WEEKLY
pdfs_only: False
test: False
Crawl Limits
time_limit: 259200
document_limit: None
byte_limit: None
crawl_stop_requested: None
Crawl Results
status: FINISHED
discovered_count: 132114
novel_count: 75397
duplicate_count: 56717
resumption_count: 0
queued_count: 0
downloaded_count: 132114
download_failures: 1273
warc_revisit_count: 56716
warc_url_count: 132087
total_data_in_kbs: 4735276
duplicate_bytes: 2972204490
warc_compressed_bytes: 856693606
Crawl Technical Details
doc_rate: 0.65
kb_rate: 23.0
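The crawl figures above can be cross-checked against one another. The following is a minimal sketch, not part of the original record: it assumes that doc_rate and kb_rate are per-second averages over elapsed_ms, and that novel_count and duplicate_count together make up discovered_count. The dictionary layout and the helper names `parse_utc` and `summarize` are hypothetical.

```python
from datetime import datetime

# Values copied from the crawl record above.
crawl = {
    "start_date": "2017-05-25T13:56:39Z",
    "end_date": "2017-05-27T21:59:43Z",
    "elapsed_ms": 201775035,
    "discovered_count": 132114,
    "novel_count": 75397,
    "duplicate_count": 56717,
    "total_data_in_kbs": 4735276,
}


def parse_utc(stamp):
    """Parse a timestamp in the record's YYYY-MM-DDTHH:MM:SSZ form."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def summarize(record):
    """Derive consistency checks and average rates from a crawl record."""
    # Assumption: the reported rates are averages over elapsed_ms.
    elapsed_s = record["elapsed_ms"] / 1000
    wall_clock = parse_utc(record["end_date"]) - parse_utc(record["start_date"])
    return {
        # Assumption: novel and duplicate documents partition the discovered set.
        "counts_add_up": record["novel_count"] + record["duplicate_count"]
        == record["discovered_count"],
        "wall_clock_s": wall_clock.total_seconds(),
        "docs_per_s": record["discovered_count"] / elapsed_s,
        "kb_per_s": record["total_data_in_kbs"] / elapsed_s,
    }


summary = summarize(crawl)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the document rate rounds to the reported 0.65, while the data rate comes out near 23.5 rather than exactly 23.0, which suggests the reported value may be truncated. The wall-clock span between start_date and end_date is also a few seconds longer than elapsed_ms.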
- Physical / technical requirements:
- Researchers interested in data analysis with web archives may request a WARC file. WARC files are very large and difficult to work with. Your request may take time to process, and we may be unable to deliver your request remotely. Please consult an archivist if you are interested in advanced research with web archives.
Using these materials
- Access:
- The archives are open to the public and anyone is welcome to visit and view the collections.
- Collection restrictions:
- Access to this record group is unrestricted.
- Collection terms of access:
- The researcher assumes full responsibility for conforming with the laws of copyright. Whenever possible, the M.E. Grenander Department of Special Collections and Archives will provide information about copyright owners and other restrictions, but the legal determination ultimately rests with the researcher. Requests for permission to publish material from this collection should be discussed with the Head of Special Collections and Archives.