web_monitoring.db.Client.get_pages

Client.get_pages(*, chunk=None, chunk_size=None, sort=None, tags=None, maintainers=None, url=None, title=None, include_versions=None, include_earliest=None, include_latest=None, source_type=None, hash=None, start_date=None, end_date=None, active=None, include_total=False)

Get an iterable of all pages, optionally filtered by search criteria.

Any metadata about each paginated chunk of results is available on the “_list_meta” field of each page, e.g.:

>>> pages = client.get_pages(include_total=True)
>>> next(pages)['_list_meta']
{'total_results': 123456}
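Since the total is attached to each page rather than returned separately, one way to read it is to pull a single page and inspect its metadata. The helper below is only a sketch: `count_pages` is a hypothetical name, not part of the library, and it assumes that `_list_meta` carries `total_results` whenever `include_total=True`, as in the example above.

```python
def count_pages(client, **filters):
    """Return the total number of matching pages, or None if there are none.

    `client` is assumed to be a web_monitoring.db.Client (or anything with
    a compatible get_pages method). `filters` are passed through unchanged.
    """
    pages = client.get_pages(include_total=True, **filters)
    # Only the first page is needed; the rest of the iterable is never consumed.
    first = next(iter(pages), None)
    if first is None:
        # No page was yielded, so there is no metadata to read.
        return None
    return first['_list_meta']['total_results']
```

Because only the first item is consumed, this should avoid iterating the whole result set, although the `include_total` query itself is still the expensive part.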

Parameters:

chunk (integer, optional): Pagination chunk to start iterating from. If unset, iteration starts at the beginning of the result set. (Under the hood, results are retrieved in “chunks”; using this to skip partway into the results is more efficient than skipping over the first few items in the iterable.)
chunk_size (integer, optional): Number of items per chunk. (Under the hood, results are retrieved in “chunks”; this specifies how big those chunks are.)
sort (list of str, optional): Fields to sort by in {field}:{order} format, e.g. title:asc.
tags (list of str, optional)
maintainers (list of str, optional)
url (str, optional)
title (str, optional)
include_versions (bool, optional)
include_earliest (bool, optional)
include_latest (bool, optional)
source_type (str, optional): Only include pages that have versions from a given source, e.g. ‘versionista’ or ‘internet_archive’.
hash (str, optional): Only include pages that have versions whose response body has a given SHA-256 hash.
start_date (datetime, optional)
end_date (datetime, optional)
active (bool, optional)
include_total (bool, optional): Whether to include a meta.total_results field in the response. If not set, links.last will usually be empty unless you are on the last chunk. Setting this option runs an expensive query, so use it sparingly. (Default: False)
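The `sort` parameter takes a list of `{field}:{order}` strings, which can be assembled from pairs instead of being written by hand. This is a minimal sketch: `sort_spec` is a hypothetical helper, and `title:asc` is the only value confirmed by the documentation above, so any other field or order name is an assumption to check against the API.

```python
def sort_spec(*pairs):
    """Build a value for the `sort` parameter from (field, order) pairs."""
    return [f"{field}:{order}" for field, order in pairs]

# "title" and "asc" come from the documented example.
sort = sort_spec(("title", "asc"))

# Usage with a real client (not executed here):
# pages = client.get_pages(sort=sort, chunk_size=100, active=True)
```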

Yields:

page (dict): Data about a page.
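Because pages are yielded one at a time, a small sample can be taken with `itertools.islice` instead of materializing the full iterable. In this sketch `take_pages` is a hypothetical helper, and it assumes the iterable is lazy, so that stopping early means later chunks are not requested.

```python
from itertools import islice

def take_pages(client, limit, **filters):
    """Return at most `limit` pages from client.get_pages(**filters) as a list."""
    return list(islice(client.get_pages(**filters), limit))

# Usage with a real client (not executed here):
# sample = take_pages(client, 10, source_type='internet_archive')
```

To start far into the results, passing `chunk` to `get_pages` is the documented way to skip ahead, rather than discarding items from the front of the iterable.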