| Component | query service |
| Stakeholders | Jeb Bearer, Abdul Basit |
A successful response is always returned before an HTTP client with a reasonable timeout configuration gives up. In practice, this means response times should rarely exceed 1s.
A malicious client cannot exhaust the server's computational resources through a pattern of rapid or expensive requests. In some cases this may require extra defenses, such as rate limiting or limiting the range of data that can be requested in a single request.
Acceptance: Run an Espresso network and configure a node to prune after a certain amount of disk usage – say 1 GB. Run the network until storage on the node reaches 1 GB. Check that blocks are being pruned (e.g. block 0 is no longer available). Continue running the network for some time and check that disk and memory usage on the subject node remain constant.
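A minimal sketch of this check, assuming a local query node reachable at `BASE` with a known data directory; the port, the data path, and the `/availability/block/{height}` route are assumptions to adjust for the deployment under test:

```python
from pathlib import Path

import requests

BASE = "http://localhost:21000"        # assumed query-service address
DATA_DIR = Path("/var/lib/espresso")   # assumed data directory of the node under test

def data_dir_bytes() -> int:
    """Total size of the node's on-disk data."""
    return sum(f.stat().st_size for f in DATA_DIR.rglob("*") if f.is_file())

def block_available(height: int) -> bool:
    resp = requests.get(f"{BASE}/availability/block/{height}", timeout=5)
    return resp.status_code == 200

# Once the node has accumulated ~1 GB of data, old blocks should start to go away...
assert not block_available(0), "block 0 should have been pruned"
# ...and sampling data_dir_bytes() periodically should show usage holding near 1 GB.
```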
Acceptance: Run an Espresso network under light load and configure a node to prune after a certain retention period (say 1 hour) but with a very large permissible disk usage (say 1 TB). Check that after 1 hour, data older than 1 hour is being pruned even though the disk usage is below the threshold. Check that data younger than 1 hour is still accessible.
Acceptance: Run an Espresso network under heavy load (e.g. running the load
generator to create random transactions). Also simulate adversarial query
service requests (e.g. using the nasty client). Monitor CPU utilization and
HTTP response times on query nodes. CPU utilization should remain below
acceptable limits and response times should not exceed a few seconds, with a
95th percentile under 1 second.
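One way to measure the latency requirement under load, sketched in Python; the address, the `/availability/header/{height}` route, and the sample count are assumptions:

```python
import statistics
import time

import requests

BASE = "http://localhost:21000"  # assumed query node address

def p95_latency(n: int = 200) -> float:
    """Issue n representative queries and return the 95th-percentile latency in seconds."""
    samples = []
    for height in range(n):
        start = time.monotonic()
        requests.get(f"{BASE}/availability/header/{height}", timeout=10)
        samples.append(time.monotonic() - start)
    # quantiles(..., n=20) returns 19 cut points; index 18 is the 95th percentile.
    return statistics.quantiles(samples, n=20)[18]

assert p95_latency() < 1.0, "95th-percentile response time should stay under 1s"
```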
An alert can always be triggered when the node is unable to participate in consensus.
An alert is rarely triggered when the node is successfully participating in consensus.
Alerts can optionally be configured to trigger (or not) when consensus is failing due to a network-wide issue rather than an issue with the specific node.
Advisory warnings can be triggered and attributed to specific problems with the node that may or may not lead to an alert. These warnings should help trace a subsequent alert to the specific area of the problem, such as the network, the database, etc.
Acceptance: Using the query service metrics, configure alerts such that no
incidents are missed and the team doesn’t get grumpy from false alarms.
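A sketch of the kind of liveness check an alert could be built on, assuming the node exposes Prometheus-format metrics at `/status/metrics`; the metric name `consensus_last_decided_view` is hypothetical, so substitute whatever liveness metric the node actually exports:

```python
import time

import requests

METRICS = "http://localhost:21000/status/metrics"  # assumed metrics endpoint

def metric_value(name: str) -> float:
    """Pull one counter/gauge out of the Prometheus text exposition."""
    for line in requests.get(METRICS, timeout=5).text.splitlines():
        if line.startswith(name):
            return float(line.split()[-1])
    raise KeyError(name)

# Hypothetical metric name: a view counter that advances while consensus is live.
before = metric_value("consensus_last_decided_view")
time.sleep(60)
after = metric_value("consensus_last_decided_view")
if after <= before:
    print("ALERT: node has not decided a new view in the last 60s")
```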
Acceptance: It should be possible to design a client side library with an
interface mirroring that of the query service itself, but whose functions
return errors in the case of any malicious response from the server.
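A sketch of what such a wrapper could look like; the routes and the `payload_hash` field name are assumptions, and SHA-256 stands in for the production payload commitment scheme:

```python
import hashlib

import requests

BASE = "http://localhost:21000"  # assumed query-service address

class MaliciousResponse(Exception):
    """Raised when a server response fails local verification."""

class VerifyingClient:
    """Mirrors the query-service interface, but every method verifies the
    response locally and raises MaliciousResponse instead of trusting the server."""

    def payload(self, height: int) -> bytes:
        leaf = requests.get(f"{BASE}/availability/leaf/{height}", timeout=5).json()
        raw = requests.get(f"{BASE}/availability/payload/{height}", timeout=5).content
        # SHA-256 is illustrative; the real check recomputes the payload
        # commitment used by consensus and compares it to the leaf.
        if hashlib.sha256(raw).hexdigest() != leaf["payload_hash"]:
            raise MaliciousResponse(f"payload for block {height} does not match its leaf")
        return raw
```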
Acceptance: Use the header, leaf, block, payload, and VID streams to check
that new objects are confirmed every few seconds. Check that the different
objects corresponding to each block are all consistent (e.g. the payload hash in
the leaf matches the actual payload). Check that the single object queries for a
given block are successful immediately after receiving that block on a stream.
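A sketch of the stream half of this check, assuming WebSocket stream routes under `/availability/stream/`; the route, port, and response field names are assumptions:

```python
import asyncio
import json

import requests
import websockets

HOST = "localhost:21000"  # assumed query node address

async def watch_blocks(start: int = 0, count: int = 10) -> None:
    # Stream route is an assumption modeled on the availability API.
    async with websockets.connect(f"ws://{HOST}/availability/stream/blocks/{start}") as ws:
        for _ in range(count):
            block = json.loads(await ws.recv())
            height = block["header"]["height"]  # field names are assumptions
            # The single-object query should succeed immediately after the
            # stream delivers the corresponding block.
            single = requests.get(f"http://{HOST}/availability/block/{height}", timeout=5)
            assert single.status_code == 200

asyncio.run(watch_blocks())
```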
Acceptance: Submit transactions such that a block is created with a small
transaction in namespace A and a very large transaction in namespace B. Fetch
the data for each namespace separately, and check that it matches the
submitted transactions. Inspect the size in bytes of the HTTP response for
namespace A and require that it is within a factor of 2 of the submitted
transaction size.
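A sketch of the size check, assuming a per-namespace availability route; the route, port, and example values are assumptions:

```python
import requests

BASE = "http://localhost:21000"   # assumed query node address
HEIGHT = 42                       # block containing the two test transactions (example)
NS_A, NS_A_TX_BYTES = 1, 100      # namespace A and its submitted transaction size (example)

# Per-namespace route is an assumption modeled on the availability API.
resp = requests.get(f"{BASE}/availability/block/{HEIGHT}/namespace/{NS_A}", timeout=5)
assert resp.status_code == 200
# Proof overhead is allowed, but the response must not drag in namespace B's
# large payload: require it to be within a factor of 2 of what was submitted.
assert len(resp.content) <= 2 * NS_A_TX_BYTES
```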
Acceptance: Run an Espresso network with a fee-paying builder. For randomly selected block heights h (a Merkle-path verification sketch follows this list):
Query the block state for arbitrary block numbers n < h. For each, check that the Merkle path is valid and the leaf element of the path is the hash of header n, obtained by querying header n directly from the availability API.
Query the fee state for the builder account, check Merkle proofs, and check that the leaf element of the Merkle path is a reasonable fee amount which is decreasing over time (as the builder pays fees).
Query the reward state for a delegator account, check Merkle proofs, and check that the leaf element of the Merkle path is a reasonable reward amount which is increasing over time.
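The Merkle-path checks in all three items share the same shape; a minimal sketch, using SHA-256 over a binary tree as a stand-in for the production hash function and tree arity:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_path(leaf: bytes, path: list[tuple[bytes, str]], root: bytes) -> bool:
    """Fold a leaf up a binary Merkle path and compare the result to the root.

    `path` lists (sibling_hash, side) pairs from leaf to root, where `side`
    says whether the sibling sits to the "left" or "right" of the running hash.
    """
    node = sha256(leaf)
    for sibling, side in path:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node == root
```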
Acceptance: Run an Espresso network until it has changed epochs several times. Fetch the stake table by epoch number for each epoch, and check that it matches the configuration of the L1 Stake Table Smart Contract at the time corresponding to that epoch. Fetch the current stake table and check that the result is the same as fetching the stake table by epoch number for the current epoch number.
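A sketch of the query half of this check; the stake-table route names are assumptions to replace with the node API's actual routes:

```python
import requests

BASE = "http://localhost:21000"  # assumed query node address
EPOCHS = range(1, 5)             # epochs the network has passed through (example)

# Route names are assumptions modeled on the node API.
tables = {e: requests.get(f"{BASE}/node/stake-table/{e}", timeout=5).json() for e in EPOCHS}
current = requests.get(f"{BASE}/node/stake-table/current", timeout=5).json()
assert current == tables[max(EPOCHS)], "current table must equal the latest epoch's table"
# Each tables[e] should additionally be compared against the L1 Stake Table
# Smart Contract state at the time corresponding to epoch e (not shown here).
```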
Acceptance: Make edge case queries like fetching future epochs and very old
epochs. They should quickly respond with an error.
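A sketch of the edge-case check, reusing the assumed route from the sketch above; the epoch values are examples:

```python
import requests

BASE = "http://localhost:21000"  # assumed query node address
for epoch in (0, 10**9):  # far in the past / far in the future (example values)
    resp = requests.get(f"{BASE}/node/stake-table/{epoch}", timeout=5)
    assert resp.status_code >= 400, "edge-case epochs should be rejected with an error"
    assert resp.elapsed.total_seconds() < 1.0, "and rejected quickly"
```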
This feature is especially useful for indexer-type applications, such as block explorers.
Tech Debt:
Currently, transaction iteration is not supported. Clients must iterate over blocks, manually expanding to transactions and filtering by namespace.
Acceptance: Run an Espresso network until it reaches a certain block height h. Iterate over all of the aforementioned object types, from 0 to h and from h down to 0. At each step, check that the object has the expected height (i.e. the iteration proceeds in order without skipping any objects) and matches the result of querying for that object individually.
Acceptance: Repeat with various randomly selected page sizes.
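A sketch for blocks (the other object types are analogous); the ranged route, single-object route, and field names are assumptions, and varying the page size amounts to varying the range bounds:

```python
import requests

BASE = "http://localhost:21000"  # assumed query node address
h = int(requests.get(f"{BASE}/node/block-height", timeout=5).text)

# Ranged route is an assumption modeled on the availability API
# (here taken to have an exclusive upper bound).
batch = requests.get(f"{BASE}/availability/block/0/{h}", timeout=60).json()
for expected, block in enumerate(batch):
    assert block["header"]["height"] == expected, "iteration skipped or reordered an object"
    single = requests.get(f"{BASE}/availability/block/{expected}", timeout=5).json()
    assert single == block, "ranged and single-object queries must agree"

# Reverse iteration and other page sizes: repeat with reversed order / other bounds.
```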
Acceptance: Submit transactions to many namespaces. Iterate over transactions filtered by a particular namespace, and check that all transactions from that namespace, and only those, are returned, in the appropriate order.
For a given range of blocks or the entire history of the network
For a given namespace or all namespaces in aggregate
Tech Debt:
Currently, aggregate statistics are not indexed by namespace and can only be queried for all namespaces in aggregate.
Acceptance: Submit blocks of known sizes with various namespaces populated.
Check that the total transaction count, total payload size, and per-namespace
counts and sizes reflect the submitted data. Using ranged queries, ensure it is
possible to exclude certain block ranges from the counts and sizes in the query
results.
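A sketch of the aggregate queries; the route names are assumptions modeled on the node API, to be replaced with the actual aggregate-statistics routes:

```python
import requests

BASE = "http://localhost:21000"  # assumed query node address

# Route names are assumptions.
total_txns = int(requests.get(f"{BASE}/node/transactions/count", timeout=5).text)
total_size = int(requests.get(f"{BASE}/node/payloads/total-size", timeout=5).text)

# Ranged variant: statistics restricted to blocks up to height 100 (example bound).
ranged_txns = int(requests.get(f"{BASE}/node/transactions/count/100", timeout=5).text)
assert ranged_txns <= total_txns
# Both figures should then be compared against the known counts and sizes of
# the submitted test blocks (not shown here).
```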
Assumption: The block is younger than the retention period of honest nodes that have enabled pruning.
Acceptance: In a running Espresso network, use the availability API to
download VID common data for a certain block, and use the node API to
download a corresponding VID share from each individual node. Run the VID
recovery algorithm from Jellyfish to reconstruct a complete payload. Fetch the
corresponding payload from the availability API and check that it is the same
as the recovered payload.
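A sketch of the harness side of this test; the routes and node addresses are assumptions, and `run_jellyfish_recovery` is a hypothetical binding to Jellyfish's VID recovery (implemented in Rust), stubbed here:

```python
import requests

QUERY = "http://localhost:21000"                                  # assumed query node
NODES = [f"http://localhost:{p}" for p in (22001, 22002, 22003)]  # example node addresses
HEIGHT = 42                                                       # block under test (example)

def run_jellyfish_recovery(common: dict, shares: list) -> bytes:
    """Hypothetical binding to Jellyfish's VID recovery algorithm."""
    raise NotImplementedError

# Routes are assumptions modeled on the availability and node APIs.
common = requests.get(f"{QUERY}/availability/vid/common/{HEIGHT}", timeout=5).json()
shares = []
for node in NODES:
    resp = requests.get(f"{node}/node/vid/share/{HEIGHT}", timeout=5)
    if resp.status_code == 200:
        shares.append(resp.json())

recovered = run_jellyfish_recovery(common, shares)
expected = requests.get(f"{QUERY}/availability/payload/{HEIGHT}", timeout=5).content
assert recovered == expected, "recovered payload must match the one served directly"
```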