2 - Choose a Pagination Strategy and Implementation
Date: 2019-12-31
Status
Accepted
Context
The SpatioTemporal Asset Catalog (STAC) API specification allows for the use of next
links in the responses for the the
/search
and /collections/<id>/items
endpoints. The specific implementation for creating a token and scheme for constructing
the parameters to get the next page of results is left open to different implementations.
According to the OGC API - Features specification for links in responses the next
relation signifies:
the link’s context is a part of a series, and that the next in the series is the link target
Since the parameters of the href
itself is not specified to follow a certain pattern or any additional requirements
implementation itself is left to franklin
. There are a few common options that have been documented by a number of
existing resources.
limit & offset
Using offset
and limit
parameters is a common patten for paginating results. With this approach the limit
parameter specifies the number of results in a response and the offset
parameter specifies which particular set of
results out of the total to return. This maps closely to how limit
and offset
work in SQL
already.
Benefits
Easy to implement: maps closely to SQL
and therefore easy to implement
Stateless: requires no additional context
Disadvantages
Performance: can be slow for large datasets
Inconsistent: depending on sort ordering and the inclusion/deletion of items the same results can be returned multiple times
keyset/cursor/continuation token
The keyset
, cursor
, or continuation
token approach uses an opaque-ish token to control the page of results
returned and can also be combined with a limit
parameter to control the number of results. Often the token used will
represent a unique identifier for the last result of the current page returned to the client. This token is then used in
subsequent requests use the token in a query parameter (e.g. /?next=<token>
).
Benefits
Performance: on large datasets with indices performs better than scanning previous results
Consistent Ordering: even when data is added/removed results are stable
Can page forward and backwards: tokens can be used to page before and after search results
Implementation logic hidden from users: while there may be an interpretation of a continuation token, it is not guaranteed to be consistent or long-lived, which frees up back-end optimizations and changes without breaking the API
Disadvantages
Implementation more complex: needs to manage sort ordering to ensure that the continuation token equality check is consistent even if data is added, removed, or sorted by additional fields
Tokens become invalid: continuation tokens can become invalid, users should not rely on them for long periods of time
Decision
We will use continuation tokens for managing pagination. Keeping the implementation flexible/hidden
from end-users so that it can be optimized or changed later without breaking functionality for existing users outweighs
disadvantages around increased complexity for implementation. Additionally, since franklin
APIs are expected to hold
large amounts of data (tens of millions of rows) it is important that the pagination approach scales accordingly.
Lastly, since eventually franklin
may move beyond relying solely on postgresl
for managing state it may actually be
better that we do not use offset/limit
which relies heavily on SQL
.
Consequences
An upcoming release for the STAC specification
changes how pagination will be communicated in the API and simplifies implementation. In order to implement this we will
need to update API responses for /search
and /collections/<id>/items/
to include a next
link for
paginating forward. Initial implementation should base64
encode the primary key for the last result in the current
page.
One thing that is unclear is how we should handle primary keys/identifiers when implementing continuation
pagination.
This approach to pagination relies on sorting by an identifier that signals the last value in the current page. If that
value is a primary key then then including the PK in the sort by
for SQL
queries with a limit gives the next set of
results. For a SERIAL
column in postgresql
this is performant with an index. However, we use client supplied
identifiers right now for IDs since they are likely meaningful to the client and don't create our own PK. In
implementing a continuation
token based approach we should keep the user supplied identifier, but also add a column
for a real PK that is a SERIAL
column.
The implementation should follow the approach laid out here every query ends up being something like:
SELECT * FROM table
WHERE
id > {continuation-id}
ORDER BY <any-user-supplied-sort-options>, id asc
LIMIT <page-size>