
Pagination | spectrumx.ops.pagination

spectrumx.ops.pagination

Pagination for SDS constructs.

Classes:

Name        Description
Paginator   Manages the state for paginating through files in SDS.

Functions:

Name        Description
main        Usage example for paginator.

Classes

Paginator

Paginator(
    *,
    Entry: type[SDSModel],
    gateway: GatewayClient,
    list_method: Callable[..., bytes],
    list_kwargs: dict[str, Any],
    dry_run: bool = False,
    page_size: int = 30,
    start_page: int = 1,
    total_matches: int | None = None,
    verbose: bool = False,
)

Bases: Generic[T]


Manages the state for paginating through files in SDS.

A Paginator instance can be iterated over to fetch and return parsed entries automatically. Network requests happen behind the scenes as iteration advances, with one request per page. Iterating over a paginator also consumes it, so store any yielded entries you will need later.
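The lazy, page-at-a-time behaviour described above can be illustrated with a plain Python generator. This is a minimal sketch, not the Paginator implementation: `paginate` and `fake_fetch` are hypothetical names, and stopping at the first empty page is an assumption made for brevity (the real class may rely on the reported total instead). It shows that the first entry only triggers one request, and that a second pass over a drained generator yields nothing.

```python
from collections.abc import Callable, Iterator


def paginate(fetch_page: Callable[[int], list[str]], start_page: int = 1) -> Iterator[str]:
    """Lazily yield entries, requesting one page at a time (hypothetical helper)."""
    page = start_page
    while True:
        entries = fetch_page(page)  # one request per page
        if not entries:  # assumption: an empty page marks the end
            return
        yield from entries
        page += 1


# Fake backend: pages 1 and 2 hold data; any later page is empty.
PAGES = {1: ["a.h5", "b.h5"], 2: ["c.h5"]}
requested: list[int] = []


def fake_fetch(page: int) -> list[str]:
    requested.append(page)
    return PAGES.get(page, [])


entries = paginate(fake_fetch)
first = next(entries)             # only page 1 has been requested so far
pages_after_first = list(requested)
rest = list(entries)              # drains page 2, then sees the empty page 3
again = list(entries)             # the generator is exhausted: yields nothing
```

Because the generator keeps no copy of what it yielded, calling `list(...)` on it once is the simplest way to keep the entries for later use.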

Usage example
# For file listings
# (assumes `gateway` is a configured GatewayClient and `File` is the SDS file model)
file_paginator = Paginator[File](
    Entry=File,
    gateway=gateway,
    list_method=gateway.list_files,
    list_kwargs={"sds_path": "/path/to/files"},
    dry_run=False,
    verbose=True,
)

# For dataset files
dataset_paginator = Paginator[File](
    Entry=File,
    gateway=gateway,
    list_method=gateway.get_dataset_files,
    list_kwargs={"dataset_uuid": "123e4567-e89b-12d3-a456-426614174000"},
    dry_run=False,
    verbose=True,
)

# len() will fetch the first page
print(f"Total files matched: {len(file_paginator)}")

for my_file in file_paginator:
    # new pages are fetched automatically as iteration advances
    print(f"Processing file: {my_file.name}")
    process_file(my_file)  # your own processing function

# the paginator is now consumed, so this second loop body never runs
for _my_file in file_paginator:
    msg = "This will not run, as the paginator was consumed."
    raise AssertionError(msg)

Parameters:

Name           Type                  Default   Description
Entry          type[SDSModel]        required  The SDSModel subclass to use when parsing the entries.
gateway        GatewayClient         required  The gateway client to use for fetching pages.
list_method    Callable[..., bytes]  required  The method to call for fetching pages (e.g., gateway.list_files).
list_kwargs    dict[str, Any]        required  Keyword arguments to pass to the list_method (deep-copied, so the caller's dict is not modified).
dry_run        bool                  False     If True, generates synthetic pages instead of fetching.
page_size      int                   30        The number of entries to fetch per page; must be positive.
start_page     int                   1         The page number to start fetching from (1-indexed).
total_matches  int | None            None      The total number of entries across all pages.
verbose        bool                  False     If True, logs more information about the pagination.
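To get a feel for how `page_size`, `start_page`, and the total number of matches interact, the page count can be worked out with ceiling division. The helpers here are a hypothetical sketch, not part of spectrumx: they assume pages are 1-indexed and contiguous (consistent with the `start_page >= 1` check in the constructor) and that every page except the last is full. For instance, 25 matches at `page_size=10` need three requests, the last one holding 5 entries.

```python
def pages_needed(total_matches: int, page_size: int, start_page: int = 1) -> int:
    """Count the page requests needed to read every match from start_page onward."""
    if page_size <= 0:
        raise ValueError("Page size must be a positive integer.")
    if start_page < 1:
        raise ValueError("Start page must be a positive integer.")
    total_pages = -(-total_matches // page_size)  # ceiling division on integers
    return max(total_pages - (start_page - 1), 0)


def last_page_size(total_matches: int, page_size: int) -> int:
    """Count the entries on the final, possibly partial, page."""
    if total_matches <= 0:
        return 0
    return total_matches % page_size or page_size


print(pages_needed(25, 10))                # 3
print(pages_needed(25, 10, start_page=3))  # 1
print(last_page_size(25, 10))              # 5
```

Integer ceiling division (`-(-a // b)`) is used instead of `math.ceil(a / b)` to avoid going through floating point.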
Source code in spectrumx/ops/pagination.py
def __init__(
    self,
    *,
    Entry: type[SDSModel],  # noqa: N803
    gateway: GatewayClient,
    list_method: Callable[..., bytes],
    list_kwargs: dict[str, Any],
    dry_run: bool = False,
    page_size: int = 30,
    start_page: int = 1,
    total_matches: int | None = None,
    verbose: bool = False,
) -> None:
    """Initializes the paginator with the required parameters.

    Args:
        Entry:          The SDSModel subclass to use when parsing the entries.
        gateway:        The gateway client to use for fetching pages.
        list_method:    The method to call for fetching pages
            (e.g., gateway.list_files).
        list_kwargs:    Keyword arguments to pass to the list_method.
        dry_run:        If True, will generate synthetic pages instead of fetching.
        page_size:      The number of entries to fetch per page.
        start_page:     The page number to start fetching from.
        total_matches:  The total number of entries across all pages.
        verbose:        If True, will log more information about the pagination.
    """

    # TODO: generalize this to any SDSModel subclass (too coupled to File now)

    if page_size <= 0:  # pragma: no cover
        msg = "Page size must be a positive integer."
        raise ValueError(msg)
    if not isinstance(start_page, int) or start_page < 1:  # pragma: no cover
        msg = "Start page must be a positive integer."
        raise ValueError(msg)
    if (
        not isinstance(total_matches, int) and total_matches is not None
    ):  # pragma: no cover
        msg = "Total matches must be an integer."
        raise ValueError(msg)
    if not callable(list_method):  # pragma: no cover
        msg = "List method must be callable."
        raise TypeError(msg)
    if not isinstance(list_kwargs, dict):  # pragma: no cover
        msg = "List kwargs must be a dictionary."
        raise TypeError(msg)
    if not isinstance(gateway, GatewayClient):  # pragma: no cover
        msg = "Gateway client must be provided."
        raise TypeError(msg)
    if not issubclass(Entry, SDSModel):  # pragma: no cover
        msg = "Entry must be a subclass of SDSModel."
        raise TypeError(msg)
    self.dry_run = dry_run
    self._Entry = Entry
    self._gateway = gateway
    self._list_method = list_method
    # Make a deep copy to avoid modifying the caller's dictionary
    self._list_kwargs = copy.deepcopy(list_kwargs)
    self._next_page = start_page
    self._page_size = page_size
    self._total_matches = total_matches if total_matches else 1
    self._verbose: bool = verbose

    # internal state
    self._has_fetched = False
    self._is_fetching: bool = False
    self._current_page_data: dict[str, Any] | None = None
    self._current_page_entries: Generator[T] = iter(())
    self._next_element: T | Unset = Unset()
    self._yielded_count: int = 0
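The constructor stores `copy.deepcopy(list_kwargs)` rather than the caller's dict or a shallow copy. The following sketch shows why the distinction matters when the kwargs contain nested objects. The nested `"filters"` key and the idea of writing per-page state into the kwargs are hypothetical illustrations; the source only states that the copy is made to avoid modifying the original.

```python
import copy

# Hypothetical list_kwargs with a nested "filters" dict (not a documented SDS key).
original = {"sds_path": "/path/to/files", "filters": {"ext": ".h5"}}

shallow = copy.copy(original)    # top-level copy only: shares the nested dict
deep = copy.deepcopy(original)   # independent copy of every nested object

# Simulate a paginator writing per-page state into its private kwargs.
shallow["filters"]["page"] = 2   # leaks into original["filters"]
deep["filters"]["page"] = 3      # original is untouched by this write
```

After these writes, `original["filters"]` has picked up the shallow copy's change, while the deep copy's change stays private.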

Functions

main

main() -> None

Usage example for paginator.

Source code in spectrumx/ops/pagination.py
def main() -> None:  # pragma: no cover
    """Usage example for paginator."""
    log.info("Running the main script.")
    file_paginator = Paginator[files.File](
        Entry=files.File,
        gateway=GatewayClient(
            host="localhost",
            api_key="does-not-matter-in-dry-run",
        ),
        list_method=lambda **kwargs: b'{"count": 25, "results": []}',  # Mock response
        list_kwargs={"sds_path": "/path/to/files"},
        page_size=10,
        dry_run=True,  # in dry-run this should always generate 2.5 pages (25 entries)
        verbose=True,
    )
    log.info(f"Total files matched: {len(file_paginator)}")
    processed_count: int = 0
    for my_file in file_paginator:
        log.info(f"Processing file: {my_file.name}")
        _process_file_fake(my_file)
        processed_count += 1
        # new pages are fetched automatically

    log.info(f"Processed {processed_count} / {len(file_paginator)} files.")

    log.info("Trying another loop:")
    for _my_file in file_paginator:
        msg = "This will not run, as the paginator was consumed."
        raise AssertionError(msg)
    log.info("No more files to process.")

    log.info("Paginator demo finished.")
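`main()` stubs `list_method` with a lambda that returns a fixed JSON body, `{"count": 25, "results": []}`, which satisfies the `Callable[..., bytes]` signature. When testing code that consumes a paginator, a stub that actually serves pages can be handy. This is a minimal sketch under stated assumptions: `make_fake_list_method` is hypothetical, the `page` and `page_size` keyword names are guesses rather than the documented GatewayClient API, and the `count`/`results` shape is borrowed from the lambda in `main()`.

```python
import json
from collections.abc import Callable
from typing import Any


def make_fake_list_method(total: int) -> Callable[..., bytes]:
    """Build a hypothetical list_method that serves `total` synthetic entries."""

    def list_method(*, page: int = 1, page_size: int = 30, **_: Any) -> bytes:
        start = (page - 1) * page_size          # pages are 1-indexed
        stop = min(start + page_size, total)    # last page may be partial
        results = [{"name": f"file_{i:03d}.h5"} for i in range(start, stop)]
        # Return raw bytes, as the real list methods are typed to do.
        return json.dumps({"count": total, "results": results}).encode()

    return list_method


list_method = make_fake_list_method(total=25)

# Walk the pages until an empty one, recording how many entries each held.
page_lengths = []
page = 1
while True:
    payload = json.loads(list_method(sds_path="/path/to/files", page=page, page_size=10))
    if not payload["results"]:
        break
    page_lengths.append(len(payload["results"]))
    page += 1
```

With 25 entries and a page size of 10, the loop records pages of 10, 10, and 5 entries, which matches the "2.5 pages" expected in the dry-run demo.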