WARC file for New York Civil Liberties Union, 2017 July 28
- Acquisition information:
- crawl: 317870
Crawl Rules
Applied scope rule: IGNORE_ROBOTS for host nyclu.org (created 2016-08-25, last updated 2016-08-25)
Applied scope rule: ACCEPT_URL nyclu.org/(STRING_MATCH) for host None (created 2016-05-31, last updated 2016-05-31)
Applied scope rule: DOC_LIMIT 50 for host wikimedia.org (created 2017-06-02, last updated 2017-06-02)
Applied scope rule: DOC_LIMIT 50 for host facebook.com (created 2017-06-02, last updated 2017-06-02)
Applied scope rule: DOC_LIMIT 50 for host wikipedia.org (created 2017-06-02, last updated 2017-06-02)
Applied scope rule: DOC_LIMIT 50 for host twitter.com (created 2017-06-02, last updated 2017-06-02)
Applied scope rule: DOC_LIMIT 100 for host reddit.com (created 2017-06-02, last updated 2017-06-02)
Crawl Times
start_date: 2017-07-28T19:45:53Z
original_start_date: 2017-07-28T19:45:53Z
last_resumption: None
processing_end_date: 2017-08-01T23:07:31Z
end_date: 2017-08-01T22:37:23Z
elapsed_ms: 259242816
Crawl Types
type: MONTHLY
recurrence_type: MONTHLY
pdfs_only: False
test: False
Crawl Limits
time_limit: 259200
document_limit: None
byte_limit: None
crawl_stop_requested: None
Crawl Results
status: FINISHED_TIME_LIMIT
discovered_count: 118252
novel_count: 22347
duplicate_count: 20295
resumption_count: 0
queued_count: 75610
downloaded_count: 42642
download_failures: 13
warc_revisit_count: 20288
warc_url_count: 42629
total_data_in_kbs: 3374896
duplicate_bytes: 2432668106
warc_compressed_bytes: 317645
Crawl Technical Details
doc_rate: 0.16
kb_rate: 13.0
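The counters in the crawl report appear to be related by simple arithmetic. The sketch below copies the reported values and checks a few relationships that happen to hold for this crawl. These relationships are inferred from the numbers alone and are not documented in this record; the assumption that time_limit is expressed in seconds, and the helper names derived_rates and consistency_checks, are illustrative only.

```python
# Values copied from the crawl report above.
crawl = {
    "elapsed_ms": 259242816,
    "time_limit": 259200,  # assumption: expressed in seconds
    "discovered_count": 118252,
    "novel_count": 22347,
    "duplicate_count": 20295,
    "queued_count": 75610,
    "downloaded_count": 42642,
    "download_failures": 13,
    "warc_url_count": 42629,
    "total_data_in_kbs": 3374896,
    "doc_rate": 0.16,
    "kb_rate": 13.0,
}


def derived_rates(c):
    """Recompute documents/second and KB/second from the raw counters."""
    seconds = c["elapsed_ms"] / 1000
    doc_rate = round(c["downloaded_count"] / seconds, 2)
    kb_rate = round(c["total_data_in_kbs"] / seconds, 1)
    return doc_rate, kb_rate


def consistency_checks(c):
    """Return a dict of inferred (not documented) relationships and whether each holds."""
    doc_rate, kb_rate = derived_rates(c)
    return {
        "novel + duplicate == downloaded":
            c["novel_count"] + c["duplicate_count"] == c["downloaded_count"],
        "queued + downloaded == discovered":
            c["queued_count"] + c["downloaded_count"] == c["discovered_count"],
        "downloaded - failures == warc_url_count":
            c["downloaded_count"] - c["download_failures"] == c["warc_url_count"],
        "elapsed time reached the time limit":
            c["elapsed_ms"] / 1000 >= c["time_limit"],
        "doc_rate matches":
            doc_rate == c["doc_rate"],
        "kb_rate matches":
            kb_rate == c["kb_rate"],
    }


if __name__ == "__main__":
    for name, ok in consistency_checks(crawl).items():
        print(f"{name}: {ok}")
```

The same checks could be run against the report for another crawl by replacing the values in the dictionary; a failed check would only indicate that the inferred relationship does not hold for that crawl, not that the report is wrong.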
- Physical / technical requirements:
- Researchers interested in data analysis with web archives may request a WARC file. WARC files are very large and difficult to work with. Your request may take time to process, and we may be unable to deliver your request remotely. Please consult an archivist if you are interested in advanced research with web archives.
Using these materials
- Access:
- The archives are open to the public and anyone is welcome to visit and view the collections.
- Collection restrictions:
- Access to this collection is restricted because it is unprocessed. Portions of the collection may contain recent administrative records and/or personally identifiable information. Please contact an archivist for more information. Certain restrictions may apply.
- Collection terms of access:
- The University Archives are eager to hear from any copyright owners who are not properly identified so that appropriate information may be provided in the future.