Support tiered data storage#692
Conversation
|
👋 Thanks for assigning @TheBlueMatt as a reviewer! |
|
🔔 1st Reminder Hey @tnull! This PR has been waiting for your review. |
29f47f3 to
264aa7f
Compare
|
🔔 2nd Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 3rd Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 4th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 5th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 6th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 7th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 8th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 9th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 10th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 11th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 12th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 13th Reminder Hey @tnull! This PR has been waiting for your review. |
264aa7f to
493dd9a
Compare
|
🔔 14th Reminder Hey @tnull! This PR has been waiting for your review. |
a30cbfb to
1e7bdbc
Compare
|
🔔 15th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 16th Reminder Hey @tnull! This PR has been waiting for your review. |
There was a problem hiding this comment.
Thanks for looking into this and excuse the delay here!
I did a first pass, generally this looks already pretty good, but it is a huge PR and in some areas could be simplified. For instance, we should drop the generic Retry logic as the concrete implementations would already implement that if they need it. Likewise, we shouldn't fallback to the backup for now as it's really only meant as disaster recovery (KVStore caching is out-of-scope for this PR now, even though we might want to explore that soon, too). There is also no need to replicate the write-ordering locks in TierStore, but we'll need them in ForeignKVStoreAdapter (which might be mergeable with DynStore?).
Generally, if you find opportunities to reduce the size of the changeset here it would be appreciated. It would also be cool if you could try to break up the PR into more feature commits, but feel free to leave as is if not.
|
|
||
| /// Configuration for exponential backoff retry behavior. | ||
| #[derive(Debug, Copy, Clone)] | ||
| pub struct RetryConfig { |
There was a problem hiding this comment.
Retrying is pretty specific to the particular KVStore implementation. I don't think we should expose retrying params on top of what we already do for implementations internally.
There was a problem hiding this comment.
Sure thing. This has been dropped.
I considered it because users can choose their own KVStore implementation without considering retrying. I thought to add some level of control but the added complexity and size of the changeset might not be worth it.
| } | ||
|
|
||
| pub struct TierStoreInner { | ||
| /// For remote data. |
| [Throws=BuildError] | ||
| Node build_with_vss_store_and_header_provider(NodeEntropy node_entropy, string vss_url, string store_id, VssHeaderProvider header_provider); | ||
| [Throws=BuildError] | ||
| Node build_with_tier_store(NodeEntropy node_entropy, DynStore primary_store); |
There was a problem hiding this comment.
Can we now also expose Builder::build_with_store?
There was a problem hiding this comment.
Yes we can.
This is exposed here 451a392
| Self { inner: Arc::new(adapter) } | ||
| } | ||
|
|
||
| pub fn from_ldk_store(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Arc<Self> { |
There was a problem hiding this comment.
If we rather make this an impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc .., we can drop the wrap_storemacro and just use.into()` instead.
| } | ||
|
|
||
| let store = wrap_store!(Arc::new(tier_store)); | ||
| self.build_with_store(node_entropy, store) |
There was a problem hiding this comment.
I think we need to use build_with_store_internal here to make sure we're using the same Runtime etc.
| Arc::clone(&outer_lock.entry(locking_key).or_default()) | ||
| } | ||
|
|
||
| async fn execute_locked_write< |
There was a problem hiding this comment.
I'm confused, why are we replicating all the write-ordering logic here? Given the implementations are required to fullfill LDK's requirements already, can't we just call the inner write and be done with it?
There was a problem hiding this comment.
Again, the reasoning here is that users might not implement the KVStore trait for their own stores as required and thus, the onus of correctness will fall to us. If we can't get it from their inner stores, the wrapper TierStore should provide some guarantees. I do agree that if the inner store already satisfies LDK's requirements, the wrapper shouldn't need to duplicate that logic. The write-ordering has been removed.
There was a problem hiding this comment.
I think I have to eat my words here: it seems we have to implement key-level versioning for TierStore too as the returned write/remove futures are not immediately polled.
Codex:
- P1 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:100: TierStore::write/remove delays delegation until the returned future is polled, so it
loses the existing stores’ “latest call wins” ordering. Example: create old = write(old), then new = write(new), await new, then await old; TierStore will
write old last. Existing SqliteStore assigns the version at call time and skips stale writes (/home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/
mod.rs:79, /home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/mod.rs:603). This can corrupt async channel monitor persistence by letting an older
pending update overwrite or remove newer state. TierStore needs its own call-time per-key versioning before returning the future.
| } | ||
| } | ||
|
|
||
| #[async_trait] |
There was a problem hiding this comment.
Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.
There was a problem hiding this comment.
Probably not worth taking a dependency just to desugar
impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.
Well, it's 'the official'/supported way to do async traits with Uniffi: https://mozilla.github.io/uniffi-rs/latest/futures.html#exporting-async-trait-methods
There was a problem hiding this comment.
Does uniffi require that in some way? Its just a comically-overkill way to sugar impl Future which is kinda nuts...
There was a problem hiding this comment.
Oh, worse, it looks like async_trait desugars to the Pin<Box<dyn ...>> version...which is dumb but at least for uniffi it shouldn't matter cause we'd have to do that anyway.
There was a problem hiding this comment.
Unfortunately this will need a substantial rebase now that #696 landed, sorry for that!
Besides some tedious rebase work, we now have DynStoreWrapper which can be used on the ffi and the non-ffi side. I do wonder if we can actually just merge that wrapper with the TierStore, as well as the ForeignKVStoreAdapter/DynStore. Seems all these wrapper structs may not be needed and we could just get away with one trait and one wrapper struct, mostly?
|
🔔 3rd Reminder Hey @tnull! This PR has been waiting for your review. |
8dbb312 to
0ae61b7
Compare
|
🔔 4th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 5th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 6th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 7th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 8th Reminder Hey @tnull! This PR has been waiting for your review. |
|
🔔 9th Reminder Hey @tnull! This PR has been waiting for your review. |
tnull
left a comment
There was a problem hiding this comment.
Excuse the delay here! Mostly looks good, just a few comments.
| // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license <LICENSE-MIT or | ||
| // http://opensource.org/licenses/MIT>, at your option. You may not use this file except in | ||
| // accordance with one or both of these licenses. | ||
| #![allow(dead_code)] // TODO: Temporal warning silencer. Will be removed in later commit. |
There was a problem hiding this comment.
The fixup for this seems to be in the wrong commit.
deb81d0 to
e53deb6
Compare
|
Hi @tnull |
4e221ed to
cb7b044
Compare
cb7b044 to
6d8fa02
Compare
|
🔔 1st Reminder Hey @tnull @TheBlueMatt! This PR has been waiting for your review. |
1 similar comment
|
🔔 1st Reminder Hey @tnull @TheBlueMatt! This PR has been waiting for your review. |
cadf81f to
1f2cd4d
Compare
|
🔔 2nd Reminder Hey @tnull @TheBlueMatt! This PR has been waiting for your review. |
1 similar comment
|
🔔 2nd Reminder Hey @tnull @TheBlueMatt! This PR has been waiting for your review. |
tnull
left a comment
There was a problem hiding this comment.
Feel free to squash fixups!
Unfortunately, cargo test --features uniffi currently fails to compile currently. I think we might also need to revert to implement the same key-level versioning pattern as we have in other stores afterall. Excuse the back-and-forth.
| primary_namespace, | ||
| secondary_namespace, | ||
| key, | ||
| buf.clone(), |
There was a problem hiding this comment.
Hmm, can we avoid cloning if no backup store is set?
| Arc::clone(&outer_lock.entry(locking_key).or_default()) | ||
| } | ||
|
|
||
| async fn execute_locked_write< |
There was a problem hiding this comment.
I think I have to eat my words here: it seems we have to implement key-level versioning for TierStore too as the returned write/remove futures are not immediately polled.
Codex:
- P1 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:100: TierStore::write/remove delays delegation until the returned future is polled, so it
loses the existing stores’ “latest call wins” ordering. Example: create old = write(old), then new = write(new), await new, then await old; TierStore will
write old last. Existing SqliteStore assigns the version at call time and skips stale writes (/home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/
mod.rs:79, /home/tnull/workspace/ldk-node-pr-692/src/io/sqlite_store/mod.rs:603). This can corrupt async channel monitor persistence by letting an older
pending update overwrite or remove newer state. TierStore needs its own call-time per-key versioning before returning the future.
| { | ||
| Ok(data) => Ok(data), | ||
| Err(e) => { | ||
| log_error!( |
There was a problem hiding this comment.
Codex:
- P3 /home/tnull/workspace/ldk-node-pr-692/src/io/tier_store.rs:197: read_primary logs every read error as ERROR, including normal NotFound startup paths.
Since the builder now wraps every store in TierStore (/home/tnull/workspace/ldk-node-pr-692/src/builder.rs:897), fresh nodes will log expected missing
node_metrics, graph, scorer, event queue, peers, etc. as errors before callers handle them as defaults. The wrapper should avoid error-level logging for
NotFound and generally let callers decide.
This commit adds `TierStore`, a tiered `KVStore` implementation that routes node persistence across three storage roles: - a primary store for durable, authoritative data - an optional backup store for a second durable copy of primary-backed data - an optional ephemeral store for rebuildable cached data such as the network graph and scorer TierStore routes ephemeral cache data to the ephemeral store when configured, while durable data remains primary+backup. Reads and lists do not consult the backup store during normal operation. For primary+backup writes and removals, this implementation treats the backup store as part of the persistence success path rather than as a best-effort background mirror. Earlier designs used asynchronous backup queueing to avoid blocking the primary path, but that weakens the durability contract by allowing primary success to be reported before backup persistence has completed. TierStore now issues primary and backup operations together and only returns success once both complete. This gives callers a clearer persistence guarantee when a backup store is configured: acknowledged primary+backup mutations have been attempted against both durable stores. The tradeoff is that dual-store operations are not atomic across stores, so an error may still be returned after one store has already been updated. Additionally, adds unit coverage for the current contract, including: - basic read/write/remove/list persistence - routing of ephemeral data away from the primary store - backup participation in the foreground success path for writes and removals
Add native builder support for tiered storage by introducing `TierStoreConfig` and builder methods for configuring ephemeral storage and a local SQLite backup mirror. During node construction, wrap the configured primary store in `TierStore` and attach secondary tiers for cache-like ephemeral data and mirrored durable backup writes. The builder constructs the backup and ephemeral stores internally using a dedicated SQLite database file. Add test coverage for full-cycle backup mirroring, same-path rejection, and UniFFI-backed builder configuration. Update `setup_builder!` so FFI-backed builder tests can use mutable configuration helpers.
…ightningdevkit#871 - Added temporary #[allow(dead_code)] annotations to NodeBuilder::set_backup_storage_dir_path and NodeBuilder::set_ephemeral_storage_dir_path - Keeps these tiered storage builder setters compilable while they remain unused before dropping in lightningdevkit#871
1f2cd4d to
d424891
Compare
|
🔔 3rd Reminder Hey @TheBlueMatt! This PR has been waiting for your review. |
avoid buffer clone if backup isn't configured
What this PR does
In this PR we introduce
TierStore, a three-tiered (KVStore+KVStoreSync) implementation that manages data across three distinct storage layers based on criticality.Background
As we have moved towards supporting remote storage with
VssStore, we need to recognize that not all data has the same storage requirements. Currently, all data goes to a single store which creates some problems:This PR proposes tiered storage that provides granular control over where different data types are stored. The tiers include:
Additionally, we also permit the configuration of
Nodewith tiered storage allowing callers to:Nodewith a primary store.These configuration options also extend to our foreign interface, allowing bindings target to build the
Nodewith their own (KVStore+KVStoreSync) implementations. A sample Python implementation is provided and tested.Concerns
VssStorehas built-in retry logic. Wrapping it inTierStorecreates nested retries.KVStoreto the FFIRelated Issues