Note: RPC 2.0 has launched; the most up-to-date version of its docs can be found on the Sui Docs.
Motivation
We’re excited to share a proposal re-imagining Sui’s RPCs (front- and back-end), optimizing for:
- Query Expressiveness. The API faithfully represents Sui’s highly composable and inter-connected object model.
- Stable Releases. The API offers a stable platform for developers to work against: RPC releases come with a commitment to not introduce breaking interface changes. This does not come at the cost of pace of development: Features continue releasing early and often for the community to try, according to a predictable roadmap.
- Extensibility. Advanced apps and RPC providers can add additional post-processing pipelines, and RPC endpoints serving data from those pipelines. Custom deployments can serve a subset of the default endpoints, and only post-process and store a subset of the backing data.
- Performance and Reliability. The RPC service should continue serving reads in times of high transaction activity on the network, and the network should continue processing transactions in times of high read load.
Summary of Changes
The biggest user-facing change is that RPC 2.0 will offer a GraphQL interface instead of JSON-RPC. GraphQL is a better fit for Sui’s Object Model, comes with established standards for extensions (federation, schema stitching) and pagination (cursor connections), and has a more mature tooling ecosystem, including an interactive development environment.
On the back-end, the RPC service and its data-store will be decoupled from fullnodes. Fullnodes’ APIs will be limited to transaction execution and data ingestion for indexers, with all read requests served by a new, stateless RPC service, reading from its own data store. Indexers will consume transaction data from fullnodes in bulk, post-process them and write them to the store.
This redesign also offers an opportunity to address many known pain points with the existing RPCs such as deprecating the unsafe transaction serialization APIs, and providing more efficient query patterns for dynamic fields, among other usability issues reported by users of the current RPC.
Timeline
By end of October 2023, we will release an interactive demo supporting most queries in the schema linked in the GraphQL section to follow. This service is offered as beta software, without an SLA, and is primarily intended for SDK builders to target ahead of a production release. It will not support transaction execution or dry runs, and will operate on a static snapshot of the data, which will be periodically updated as new features in RPC 2.0 are implemented.
By end of December 2023, the first version of the new RPC will be released as 2024-01.0. This version of the RPC will support all MVP features in the proposed schema, and it will be deployed by Mysten Labs and shared with third-party RPC providers for integration into their services. RPC providers will have an opportunity to give feedback on the service architecture, and their customers will be able to give more feedback on how easy the API is to use, which we will aim to incorporate into future versions of the RPC (released quarterly).
Support for the existing RPC will continue at least until end of Q1 2024, to give time to migrate. Until that time, changes to the existing RPC will be kept to a minimum (barring bug-fixes), to avoid disruption. We will assess whether there is sufficient support for GraphQL in the ecosystem, before we sunset the existing RPC.
Versioning
RPCs will adopt a quarterly release schedule and a versioning scheme of [year]-[month].[patch] (e.g. 2023-10.0, 2024-01.3, etc.). Breaking version changes are reserved for new [year]-[month] versions, while patch versions maintain interface backwards compatibility.
This replaces the current scheme, which ties the RPC version to the fullnode/validator version (which can update weekly). Decoupling node and RPC versions allows the RPC to evolve at its own pace, and differentiates breaking RPC changes from breaking node changes, and even from changes to the indexer that processes data for the RPC to read (which will be versioned separately).
Setting Versions
API versions can be supplied as a header, not including the patch version (e.g. X-Sui-RPC-Version: 2023-10). If a header is not supplied, the latest released version is assumed. The response header will include the full version used to respond to the request, including the patch version (e.g. X-Sui-RPC-Version: 2023-10.2).
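For example (a sketch; the host is a placeholder), a client pinning itself to the 2023-10 release would exchange headers like:
POST /graphql HTTP/1.1
Host: rpc.example.com
X-Sui-RPC-Version: 2023-10

HTTP/1.1 200 OK
X-Sui-RPC-Version: 2023-10.2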
Deprecation
Each RPC major version will receive 6 months of support for bugfixes. Publicly available Mysten-operated RPC instances will also continue to provide access to an RPC version for 6 months after its initial release. Clients that continue to use versions older than 6 months will be automatically routed to the oldest supported version by the public Mysten RPC instances. E.g. clients who continue to use 2023-10.x past the release of 2024-04.y will automatically be served responses by 2024-01.z, to limit the number of versions an RPC provider needs to support.
When deprecating an individual feature, care will be taken to initially make changes in a schema-preserving way, and reserve breaking changes for a time when usage of the initial schema has dropped. When deprecations remove fields, subsequent interface changes will avoid re-adding the field with new semantics, to reduce the chances of an unexpected breaking change for a client that is late to update.
GraphQL
A draft of part of the schema follows, giving a flavor of what the new interface will look like. The design leverages GraphQL’s ability to nest entities (e.g. when querying a transaction block, it will be possible to query for the contents of the gas coin). Fields will be nullable by default, to leave flexibility for field deprecations without breaking backwards compatibility. Pagination will be implemented using Cursor Connections with opaque cursor types:
type Query {
# Find a transaction block either by its transaction digest or its
# effects digest.
transactionBlock(filter: TransactionBlockID!): TransactionBlock
}
# String containing 32B hex-encoded address
scalar SuiAddress
# String representation of an arbitrary width, possibly signed integer
scalar BigInt
# String containing Base64-encoded binary data.
scalar Base64
# ISO-8601 Date and Time
scalar DateTime
# Find a transaction block either by its transaction digest, or its
# effects digest (can't be both, can't be neither).
input TransactionBlockID {
transactionDigest: String
effectsDigest: String
}
input EventFilter {
# ... snip ...
}
type TransactionBlock {
id: ID!
digest: String!
sender: Address
gasInput: GasInput
kind: TransactionBlockKind
signatures: [TransactionSignature]
effects: TransactionBlockEffects
expiration: Epoch
bcs: Base64
}
type TransactionBlockEffects {
digest: String!
status: ExecutionStatus!
errors: String
transactionBlock: TransactionBlock
dependencies: [TransactionBlock]
lamportVersion: BigInt
gasEffects: GasEffects
objectReads: [Object]
objectChanges: [ObjectChange]
balanceChanges: [BalanceChange]
epoch: Epoch
checkpoint: Checkpoint
eventConnection(
first: Int,
after: String,
last: Int,
before: String,
filter: EventFilter,
): EventConnection
bcs: Base64
}
enum ExecutionStatus {
SUCCESS
FAILURE
}
type GasInput {
gasSponsor: Address
gasPayment: [Object!]
gasPrice: BigInt
gasBudget: BigInt
}
type GasEffects {
gasObject: Coin
gasSummary: GasCostSummary
}
type GasCostSummary {
computationCost: BigInt
storageCost: BigInt
storageRebate: BigInt
nonRefundableStorageFee: BigInt
}
type ObjectChange {
inputState: Object
outputState: Object
idCreated: Boolean
idDeleted: Boolean
}
type BalanceChange {
owner: Owner
coinType: MoveType
amount: BigInt
}
type Coin {
id: ID!
balance: BigInt
asMoveObject: MoveObject!
}
type EventConnection {
edges: [EventEdge!]!
pageInfo: PageInfo!
}
type EventEdge {
cursor: String
node: Event!
}
type PageInfo {
hasNextPage: Boolean!
hasPreviousPage: Boolean!
startCursor: String
endCursor: String
}
type Address {
# ... snip ...
}
type Object {
# ... snip ...
}
type Epoch {
# ... snip ...
}
type Event {
# ... snip ...
}
union TransactionBlockKind =
ConsensusCommitPrologueTransaction
| GenesisTransaction
| ChangeEpochTransaction
| ProgrammableTransactionBlock
type ConsensusCommitPrologueTransaction {
# ... snip ...
}
type GenesisTransaction {
# ... snip ...
}
type ChangeEpochTransaction {
# ... snip ...
}
type ProgrammableTransactionBlock {
# ... snip ...
}
type TransactionSignature {
# ... snip ...
}
type MoveObject {
# ... snip ...
}
type MoveType {
# ... snip ...
}
For a more detailed look at the proposed schema, and to follow its development, consult draft_schema.graphql or the snapshot of the schema currently supported by the implementation.
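As an example of the nesting mentioned above, the following sketch (with a placeholder digest) fetches a transaction block together with the contents of its gas coin and a first page of its events, in a single request:
query {
  transactionBlock(filter: { transactionDigest: "..." }) {
    digest
    gasInput { gasPrice gasBudget }
    effects {
      status
      gasEffects {
        # Contents of the gas coin, post-execution
        gasObject { balance }
        gasSummary { computationCost storageCost }
      }
      eventConnection(first: 10) {
        pageInfo { hasNextPage endCursor }
      }
    }
  }
}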
Extensions
The ability to add extensions to the RPC is a common request. Apps may require secondary indices, and RPC providers often provide their own data to augment what the chain provides.
GraphQL offers multiple standards for schema extensions (e.g. Federation, Schema Stitching) and even multiple implementations of those standards (e.g. Apollo Federation, Conductor) that offer the ability to seamlessly serve an extended schema over multiple services (which could be implemented on different stacks).
Mysten Labs will offer a base RPC service implementation that supports the same functionality as the existing indexer. Functionality will be split into logical groups (see below) which can be turned off for a given deployment to reduce CPU and storage requirements. This implementation will be compatible with a schema extension standard (but will not require one to work out-of-the-box).
Logical Functional Groups
- Core: Reading objects, checkpoints, transactions and events, executing, inspecting and dry-running transactions.
- Coins: Accessing coin metadata, and per-address coin and balance information.
- Dynamic Fields: Querying an object’s dynamic fields
- Subscriptions (Transactions and Events)
- Packages: Accessing struct and function signatures, tracking package versions and UpgradeCaps, tracking popular packages.
- System State: Reading information about the current epoch (protocol config, committee, reference gas price).
- Name Server: SuiNS name lookup and reverse lookup.
- Analytics: Statistics about how the network was running (TPS, top packages, APY, etc).
Interface Changes
Unsafe APIs
The unsafe_ APIs, responsible for serializing transactions to BCS (e.g. unsafe_moveCall, unsafe_paySui, unsafe_publish etc) will be removed in RPC 2.0. SDKs that depend on these APIs for transaction serialization will be offered a native library with a C interface to convert transaction-related data structures between JSON and BCS to replace this functionality.
See Issue #13483 for a detailed proposal for this new library.
Dynamic Fields
The 1 + N query pattern related to dynamic fields was a common complaint with the existing RPC interface: Clients that wanted to access the contents of all dynamic fields for an object needed to issue a query to list all the dynamic fields, followed by N queries to get the contents of each object.
This will be addressed through RPC 2.0’s use of GraphQL, which allows a single query to access an object’s dynamic fields and their contents:
type Object {
dynamicFieldConnection(
first: Int,
after: String,
last: Int,
before: String,
): DynamicFieldConnection
}
type DynamicFieldConnection {
edges: [DynamicFieldEdge!]!
pageInfo: PageInfo!
}
type DynamicFieldEdge {
cursor: String
node: DynamicField!
}
type DynamicField {
id: ID!
name: MoveValue
value: DynamicFieldValue
}
union DynamicFieldValue = MoveObject | MoveValue
Using this API, the following query pages through the names and values of all of a given object’s dynamic fields:
query {
object(address: objectId) {
dynamicFieldConnection {
pageInfo { hasNextPage, endCursor }
edges {
node {
name,
value {
... on MoveObject { contents { type, data } }
... on MoveValue { type, data }
}
}
}
}
}
}
Dry Run and Dev Inspect
The existence of both a dryRun and a devInspect API has been a source of confusion, as they offer overlapping functionality. RPC 2.0 will combine the two into a single API to provide the behavior of both without overlap:
type Query {
dryRunTransactionBlock(
txBytes: Base64!,
txMeta: TransactionMetadata,
skipChecks: Boolean,
): DryRunResult
}
input TransactionMetadata {
sender: SuiAddress
gasPrice: BigInt
gasObjects: [SuiAddress!]
}
type DryRunResult {
transaction: TransactionBlock
errors: String
events: [Event!]
results: [DryRunEffect!]
}
type DryRunEffect {
mutatedReferences: [DryRunMutation!]
returnValues: [DryRunReturn!]
}
type DryRunMutation {
input: TransactionInput
type: MoveType
bcs: Base64
}
type DryRunReturn {
type: MoveType
bcs: Base64
}
The combined API can be used to replicate the current functionality of dryRun and devInspect as follows:
query {
# To dry-run a transaction, pass its `TransactionData` BCS and Base64 encoded
# as `txBytes`.
dryRun: dryRunTransactionBlock(txBytes: "...") {
transaction
errors
events
}
# To replicate dev inspect, pass a `TransactionKind` BCS and Base64 encoded as
# `txBytes` and supply the information to turn it into `TransactionData` in
# `txMeta`. Pass `skipChecks: true` to bypass the usual consistency checks.
#
# All fields in `txMeta` are optional, and will be replaced by sensible defaults
# if omitted, but `txMeta` itself must be passed in order to treat `txBytes` as
# a `TransactionKind` rather than `TransactionData`.
devInspect: dryRunTransactionBlock(
txBytes: "...",
txMeta: {
sender: senderAddress,
gasPrice: "1000",
},
skipChecks: true,
) {
transaction
errors
events
results
}
# Gas estimation is currently done using `dryRun` and requires:
#
# - Making a call to get the reference gas price,
# - Creating `TransactionData` with a sentinel `gasObjects` value (an empty
# list), the reference gas price, and the `TransactionKind` containing the
# transaction body.
# - Calling `dryRun` with this `TransactionData` to get an estimate of the gas
# cost.
# - Creating a new `TransactionData` after having selected the appropriate
# coins for the transaction.
#
# The new API can also perform gas estimation by dry-running such a
# `TransactionData`, but also offers a more convenient API that accepts a
# `TransactionKind`. With this new API, the gas estimation flow is as follows:
#
# - Call `dryRunTransactionBlock` with the `TransactionKind`, to simultaneously
# get the reference gas price and the gas cost estimate.
# - Create a `TransactionData` from the `TransactionKind` after having selected
# the appropriate coins for the transaction.
gasEstimation: dryRunTransactionBlock(
txBytes: "...",
txMeta: { sender: senderAddress }) {
transaction {
gasInput { gasPrice }
effects { gasEffects { gasSummary } }
}
}
}
Data Formats
The number of input and output formats will be limited to maintain consistency across API surfaces:
- Digests will use Base58 encoding everywhere.
- Other binary blobs will use Base64 encoding.
- IDs (Objects and Addresses) will be normalized to their maximum width with a leading 0x on output (but will be accepted in truncated form on input).
- 64-bit and wider integers will be represented as strings, to avoid loss-of-precision bugs when converting to and from double-precision floating point (Doubles will not be accepted directly, and numbers will not be returned as Doubles, even if the particular value fits in a Double).
Clients that depend on truncated package IDs in outputs and numbers represented as Doubles will need to migrate to the new data formats while adopting the new API.
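As an illustration (hypothetical response values), an object requested with the truncated ID 0x2 comes back with its ID fully normalized, and a u64 such as a gas price comes back as a string:
location: "0x0000000000000000000000000000000000000000000000000000000000000002"
gasPrice: "1000"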
Types that are represented using BCS on-chain (such as TransactionBlocks, TransactionEffects and Objects) will offer a consistent API for querying as BCS, to cater to clients that relied on the BCS or Raw output functionality in the existing JSON-RPC.
Data Consistency
Currently, clients that need read-after-write consistency use the WaitForLocalExecution execution type. This guarantees that reads served by the fullnode that executed the transaction will be consistent with that transaction’s writes. However:
- If a service deploys multiple fullnodes behind a load balancer, clients must be pinned to the same fullnode to take advantage of this consistency guarantee.
- The guarantee does not extend to secondary indices, which may still lag behind.
The new interface will do away with WaitForLocalExecution and provide a blanket consistency guarantee for all its data sources (i.e. all data returned from a single RPC request will be from a consistent snapshot).
To enable this, the RPC’s indexer will need to commit the writes resulting from checkpoint post-processing atomically, which increases latency (transactions that are final may take longer to show up in RPC responses). A Core API will be provided to query the range of checkpoints for which the service has complete information:
type Query {
# Range of checkpoints that the RPC has data available for (for data
# that can be tied to a particular checkpoint).
availableRange: AvailableRange!
}
type AvailableRange {
first: Checkpoint
last: Checkpoint
}
type Checkpoint {
id: ID!
digest: String!
sequenceNumber: BigInt!
# ... snip ...
}
Typically, last will be the latest checkpoint in the current epoch (modulo some post-processing latency), and first will be determined by RPC store pruning. All APIs are guaranteed to work at or above first; some may continue to work when serving data based on checkpoints below it.
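For example, a client can discover the range of data the service currently holds with a query like:
query {
  availableRange {
    first { sequenceNumber }
    last { sequenceNumber digest }
  }
}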
Consistency and Cursors
Paginated queries (using the Relay Cursor Connection spec) will also support consistency. For example, suppose Alice has an account, 0xA11CE, with many objects, and we query their objects’ IDs:
query {
address("0xA11CE") {
objectConnection {
edges { node { location } }
pageInfo { endCursor }
}
}
}
After issuing this query, Alice transfers an object, O, to Bob (at 0xB0B), so in the latest state of the network, Alice no longer owns an object that she previously owned. However, paginating the original query by successively querying:
query {
address("0xA11CE") {
objectConnection(after: endCursor) {
edges { node { location } }
pageInfo { endCursor }
}
}
}
will iterate through a set of objects that is guaranteed to include O, regardless of whether the object was transferred before the page containing it was fetched, or after. This ensures that queries that run over multiple requests still represent a consistent snapshot of the network.
This feature depends on the RPC service having access to data from historical checkpoints, which may be pruned (e.g. if the historical checkpoint has a sequence number lower than availableRange.first). If the checkpoint is pruned, cursors pointing at data in that checkpoint will be invalidated, causing subsequent queries using those cursors to fail.
The history that is retained in the RPC’s data store is configurable. Publicly available, free RPC services will aim to retain enough history to support queries on cursors that are a couple of hours old, whereas paid services can support older historical queries.
This kind of consistency only applies on a cursor-by-cursor basis, so if we run a similar query for Bob, in a separate request, after the transfer from Alice:
query {
address("0xB0B") {
objectConnection {
edges { node { location } }
pageInfo { bobsEndCursor: endCursor }
}
}
}
And later paginate both sets of cursors:
query {
alice: address("0xA11CE") {
objectConnection(after: alicesEndCursor) {
edges { node { location } }
pageInfo { alicesEndCursor: endCursor }
}
}
bob: address("0xB0B") {
objectConnection(after: bobsEndCursor) {
edges { node { location } }
pageInfo { bobsEndCursor: endCursor }
}
}
}
Then object O will appear in both Alice’s object set (paginating through cursors that were initially created before the transfer) and Bob’s object set (paginating through cursors that were created after the transfer). So although both sets are self-consistent, the overall query may not represent a consistent snapshot.
Limits and Metering
The flexibility that GraphQL offers comes with the risk of more complex, nested queries that could consume too many resources and result in a bad experience for other RPC users. This will be addressed through limits that are configurable per RPC instance:
- Limits on GraphQL query nesting depth and structure.
- Limits on response sizes and DB query complexity (the amount of compute used or pages read for all database queries tied to a particular request).
- Timeouts on queries
Mysten-operated, publicly available RPC endpoints will be configured with conservative limits (and transaction rate-limiting) to ensure fair use, but other RPC providers are free to adapt the limits they offer.
Service Architecture
This new service architecture is intended to remove the fullnode from the data-serving path, and to provide a solution that lends itself better to scaling.
Data will be ingested from a fullnode by a customizable indexer, which performs any required processing before sending the data on to various storage solutions. For example, an indexer could store blob data (raw object contents or raw transactions) in a key/value store while sending relational data to a relational database. From there, any number of stateless RPC services can run to process client requests, fetching data from the requisite data store to service each request.
Some of the motivations for removing the fullnode from the data serving path are as follows:
- We found that fullnodes under heavy read load can fall behind, leading to stale data being served to clients.
- If a fullnode has an issue and needs to be taken out of service, it could take an unknown amount of time to spin up a replacement fullnode and for it to catch up sufficiently to properly service requests. With the above architecture, RPC requests can still be served even while fullnodes are having issues.
- Due to storage limitations, a single fullnode cannot store the full history of the chain; leveraging scalable storage solutions lets us service historical RPC requests.
- GraphQL Federation makes it possible to add a custom data source and extend the stateless RPC service to also serve that data.
Fullnodes may still expose a limited API (e.g. to submit transactions, query the live object set, etc.) for debugging purposes, but the bulk of traffic is expected to be served by instances of the RPC service.
Data Ingestion
To facilitate third-party custom indexers and data processing pipelines, we’re designing and implementing an Indexer Framework with a more efficient data API between a fullnode (FN) and an indexer.
The framework is built to allow for third-parties to build their own Handlers which contain custom logic to process and index the chain data that they care about. At the time of writing, the trait is as follows:
#[async_trait::async_trait]
trait Handler: Send {
fn name(&self) -> &str;
async fn process_checkpoint(&mut self, checkpoint_data: &CheckpointData) -> anyhow::Result<()>;
}
pub struct CheckpointData {
pub checkpoint_summary: CertifiedCheckpointSummary,
pub checkpoint_contents: CheckpointContents,
pub transactions: Vec<CheckpointTransaction>,
}
pub struct CheckpointTransaction {
/// The input Transaction
pub transaction: Transaction,
/// The effects produced by executing this transaction
pub effects: TransactionEffects,
/// The events, if any, emitted by this transaction during execution
pub events: Option<TransactionEvents>,
/// The state of all inputs to this transaction as they were prior to execution.
pub input_objects: Vec<Object>,
/// The state of all output objects created or mutated by this transaction.
pub output_objects: Vec<Object>,
}
The latest version of this trait can be found here. Running a custom indexing pipeline involves:
- Running a fullnode, with the enable_experimental_rest_api config set to true in its fullnode.yaml file.
- Implementing your custom logic by implementing the Handler trait (above); a sketch follows this list.
- Starting an indexer using the provided Indexer Framework, by providing:
  - The URL of the FN to query data from
  - Your custom Handler or Handlers
  - The checkpoint to start from
Further Work
There are some additional known improvements that we want to add to the RPC, but have been reserved for future releases:
Filtering Dynamic Fields by Key Type
On Sui, one package can extend another package’s objects using dynamic fields. Objects that are designed to be extended this way can accumulate dynamic fields of completely unrelated types, and an application that extended an object with one set of types may only be interested in querying for dynamic fields with those types. Augmenting the dynamic field query API with filters on dynamic field types will allow applications to achieve this without over-fetching dynamic fields and filtering on the client side:
type Object {
dynamicFieldConnection(
first: Int,
after: String,
last: Int,
before: String,
filter: DynamicFieldFilter,
): DynamicFieldConnection
}
input DynamicFieldFilter {
# Cascading (type requires module requires package)
namePackage: SuiAddress
nameModule: String
nameType: String
# Cascading (type requires module requires package)
valuePackage: SuiAddress
valueModule: String
valueType: String
}
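Reusing the placeholder style of the earlier examples (with a hypothetical name type 0xA::m::T), a filtered query might look like:
query {
  object(address: objectId) {
    dynamicFieldConnection(filter: {
      namePackage: "0xA",
      nameModule: "m",
      nameType: "T",
    }) {
      edges {
        node {
          name
          value {
            ... on MoveObject { contents { type, data } }
            ... on MoveValue { type, data }
          }
        }
      }
    }
  }
}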
Wrapped Object Indexing
Wrapped objects (objects that are held as fields or dynamic fields of other objects in the store) present similarly to deleted objects in RPC output, and consequently in Explorer too. This causes confusion when an object is available, but not by querying its on-chain address.
Wrapped object indexing tracks the location of objects that are embedded within other objects so that RPC can “find” an object’s contents even when it is wrapped. Similarly, it can be used to detect Bags and Tables to improve their representation in Explorer as well.
Reinterpreting Package IDs
If an upgraded package includes a new type, that type’s package ID will match the upgraded package’s, but types in the same package that were introduced in previous versions will retain the package IDs of the versions that introduced them.
This complicates reads with filters on type: Constructing such filters requires clients to keep track of the package version that introduced each type. This can be simplified using an implicit cast: A type can be supplied as 0xA::m::T and will be cast to 0xD::m::T, where 0xA and 0xD are versions of the same package, and 0xD is the latest version, at or before 0xA, that introduced the type m::T.
This saves significant book-keeping for clients, who can now refer to all the types in a particular package by that package’s ID, rather than by the IDs of the versions that introduced them.
Improvements to Dry Run
- Being able to view a stacktrace from Move when a transaction fails during dry-run would improve transaction debuggability.
- Provide a detailed breakdown of where gas was spent (per-transaction breakdown of computation cost for gas, per-object breakdown of storage costs and rebates, effects of bucketing).
- Offer Display output on objects in Dry Run.
Package Upgrade History
Currently, it can be difficult to track all the versions of a package, as each version has its own Object ID. This situation can be improved with dedicated APIs for fetching all versions of a specific package.
Verifiable Reads from Fullnode
Although the proposed architecture decouples the RPC service, indexer and storage layer from fullnodes, an RPC provider is still currently required to ingest data only from a fullnode that they trust (which often means RPC providers run their own fullnode).
A trustless interface between indexers and fullnodes (where the indexer can verify the integrity of data it reads from any given fullnode) will remove this requirement, as it eliminates the risk that a node run by an adversary could “lie” to an indexer for its own benefit.
API for Exposing RPC Limits
Feature request from @FrankC01 for pysui.
Some of the validation steps that the RPC performs on transactions can be replicated on the client, by SDKs, to avoid sending requests that are guaranteed to fail. Not all validation can be moved to the client (for example, it’s difficult to predict timeouts, or estimated query complexity), but this does not diminish the value of avoiding hitting other limits ahead of time.
Facilitating this feature in SDKs requires exposing information about these limits through the API itself. The core RPC implementation will include the following parameters:
type Query {
serviceConfig: ServiceConfig
}
type ServiceConfig {
# Maximum level of nesting in a valid GraphQL query
maxQueryDepth: Int
# Maximum GraphQL requests that can be made per second from this service
maxRequestPerSecond: Int
# Maximum number of responses per page in paginated queries
maxPageSize: Int
}
Node providers are free to extend this with their own limits, to help SDKs avoid hitting their domain-specific limits.
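For example, an SDK could fetch these limits up front with a query like:
query {
  serviceConfig {
    maxQueryDepth
    maxRequestPerSecond
    maxPageSize
  }
}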