web_monitoring.db.Client.get_pages
- Client.get_pages(*, chunk=None, chunk_size=None, sort=None, tags=None, maintainers=None, url=None, title=None, include_versions=None, include_earliest=None, include_latest=None, source_type=None, hash=None, start_date=None, end_date=None, active=None, include_total=False)
Get an iterable of all pages, optionally filtered by search criteria.
Any metadata about each paginated chunk of results is available on the “_list_meta” field of each page, e.g.:
>>> pages = client.get_pages(include_total=True)
>>> next(pages)['_list_meta']
{'total_results': 123456}
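As a minimal sketch of reading that metadata without walking the whole result set, the helper below requests the total and looks only at the first page. `count_pages` is a hypothetical name, not part of web_monitoring; it assumes only the `get_pages` signature and the “_list_meta” field documented in this section, and that `client` is an already constructed `Client`.

```python
def count_pages(client, **filters):
    """Return the total number of pages matching `filters`.

    Hypothetical helper: relies only on get_pages() and the '_list_meta'
    field described in this section.
    """
    pages = client.get_pages(include_total=True, **filters)
    first = next(pages, None)  # pull only the first item from the iterable
    if first is None:
        # Nothing was yielded, so no pages matched the filters.
        return 0
    return first['_list_meta'].get('total_results')
```

Since `include_total` is described as expensive, a helper like this is probably best called once per query rather than inside a loop.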
- Parameters:
- chunk
integer, optional Pagination chunk to start iterating from. If unset, starts at the beginning of the result set. (Under the hood, results are retrieved in “chunks”; using this to skip partway into the results is more efficient than skipping over the first few items in the iterable.)
- chunk_size
integer, optional Number of items per chunk. (Under the hood, results are retrieved in “chunks”; this specifies how big those chunks are.)
- sort
list of str, optional Fields to sort by in {field}:{order} format, e.g. title:asc.
- tags
list of str, optional
- maintainers
list of str, optional
- url
str, optional
- title
str, optional
- include_versions
bool, optional
- include_earliest
bool, optional
- include_latest
bool, optional
- source_type
str, optional Only include pages that have versions from a given source, e.g. ‘versionista’ or ‘internet_archive’.
- hash
str, optional Only include pages that have versions whose response body has a given SHA-256 hash.
- start_date
datetime, optional
- end_date
datetime, optional
- active
bool, optional
- include_total
bool, optional
Whether to include a meta.total_results field in the response. If not set, links.last will usually be empty unless you are on the last chunk. Setting this option runs a pretty expensive query, so use it sparingly. (Default: False)
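The `sort` parameter takes strings in {field}:{order} format. The sketch below builds such a list from keyword arguments. `sort_spec` is a hypothetical helper, not part of web_monitoring, and it assumes that `desc` is accepted alongside the `asc` order shown in the parameter description.

```python
def sort_spec(**orders):
    """Build a list of '{field}:{order}' strings for the `sort` parameter.

    Hypothetical helper. 'asc' appears in the documentation above; 'desc'
    is an assumption about what the API accepts.
    """
    specs = []
    for field, order in orders.items():
        if order not in ('asc', 'desc'):
            raise ValueError(f'Unknown sort order for {field!r}: {order!r}')
        specs.append(f'{field}:{order}')
    return specs
```

A call might then look like `client.get_pages(sort=sort_spec(title='asc'), source_type='versionista', chunk_size=50)`, using only parameters listed above.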
- Yields:
- page
dict Data about a page.
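Because the method yields page dicts one at a time and retrieves results in chunks under the hood, a caller can stop early instead of consuming every page. A minimal sketch, assuming `client` is an already constructed `Client`; `first_pages` is a hypothetical name, not part of web_monitoring:

```python
from itertools import islice


def first_pages(client, limit, **filters):
    """Return at most `limit` page dicts matching `filters`.

    Hypothetical helper: islice stops pulling from the iterable once
    `limit` items have been taken.
    """
    return list(islice(client.get_pages(**filters), limit))
```

For example, `first_pages(client, 10, source_type='internet_archive')` would collect up to ten pages that have versions from that source.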