Skip to content

Conversation

@peterbroadhurst
Copy link
Contributor

Proposed fix for #512

By implementing the UpsertOptimization pattern for Groups, it lets us state UpsertOptimizationNew everywhere we're currently just trying to do a query-then-insert approach. UpsertOptimizationNew results in an attempt-insert, then attempt-update approach - which should handle+swallow a duplicate key exception (rather than rolling back the TX).

@peterbroadhurst peterbroadhurst marked this pull request as draft February 11, 2022 22:31
@peterbroadhurst
Copy link
Contributor Author

Moving back to draft until we have a good perf run

@dechdev
Copy link
Contributor

dechdev commented Feb 11, 2022

Seeing multiple errors are startup, no messages being confirmed:

{"log":"[2022-02-11T22:28:34.787Z] ERROR SQL insert failed: pq: duplicate key value violates unique constraint \"groups_hash\" sql=[ INSERT INTO groups (message_id,namespace,name,ledger,hash,created) VALUES ($1,$2,$3,$4,$5,$6)  RETURNING seq ]: pq: duplicate key value violates unique constraint \"groups_hash\" dbtx=XyK9KdgB pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-11T22:28:34.787352257Z"}
{"log":"[2022-02-11T22:28:34.787Z] ERROR SQL query failed: pq: current transaction is aborted, commands ignored until end of transaction block sql=[ SELECT hash FROM groups WHERE hash = $1 ] dbtx=XyK9KdgB pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-11T22:28:34.787966625Z"}
{"log":"[2022-02-11T22:28:34.788Z] ERROR process events attempt 1: FF10115: Database query failed: pq: current transaction is aborted, commands ignored until end of transaction block pid=1 role=ep[ff_system:ff_aggregator]\n","stream":"stderr","time":"2022-02-11T22:28:34.78855805Z"}
{"log":"[2022-02-11T22:28:34.896Z] ERROR SQL insert failed: pq: duplicate key value violates unique constraint \"groups_hash\" sql=[ INSERT INTO groups (message_id,namespace,name,ledger,hash,created) VALUES ($1,$2,$3,$4,$5,$6)  RETURNING seq ]: pq: duplicate key value violates unique constraint \"groups_hash\" dbtx=HxE1q4ED pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-11T22:28:34.896982147Z"}

image

ff-perf.log
log_ethconnect_0.log
log_firefly_core_0.log

@peterbroadhurst
Copy link
Contributor Author

That's a shame, so it looks like PSQL was aborting the transaction on our behalf. So this approach does not allow concurrent insert.

@peterbroadhurst
Copy link
Contributor Author

Checking version of postgres in the test, due to this fixed in 9.5: https://www.symmetricds.org/issues/view.php?id=4823#bugnotes

@peterbroadhurst
Copy link
Contributor Author

peterbroadhurst commented Feb 11, 2022

Ah - need to explicitly add ON CONFLICT DO NOTHING

@peterbroadhurst
Copy link
Contributor Author

peterbroadhurst commented Feb 12, 2022

Got a little complicate, but I believe I'm there 🤞 - I've only changed Group as I wanted to limit the scope of the change. However, this could apply to other upsert optimization cases too.

@codecov-commenter
Copy link

codecov-commenter commented Feb 12, 2022

Codecov Report

Merging #513 (55fa3e2) into main (9a38438) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #513   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          275       275           
  Lines        15788     15812   +24     
=========================================
+ Hits         15788     15812   +24     
Impacted Files Coverage Δ
internal/database/sqlcommon/provider.go 100.00% <ø> (ø)
internal/database/postgres/postgres.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/group_sql.go 100.00% <100.00%> (ø)
internal/database/sqlcommon/sqlcommon.go 100.00% <100.00%> (ø)
internal/database/sqlite3/sqlite3.go 100.00% <100.00%> (ø)
internal/privatemessaging/groupmanager.go 100.00% <100.00%> (ø)
internal/dataexchange/ffdx/ffdx.go 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9a38438...55fa3e2. Read the comment docs.

@dechdev
Copy link
Contributor

dechdev commented Feb 12, 2022

Now only seeing one error on startup. Appears to be an edge case where there is an empty table:

ERROR SQL insert failed: sql: no rows in result set sql=[ INSERT INTO groups (message_id,namespace,name,ledger,hash,created) VALUES ($1,$2,$3,$4,$5,$6)  ON CONFLICT DO NOTHING RETURNING seq ]: sql: no rows in result set dbtx=p0vfsbRI pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-12T00:36:49.275111472Z"}

ff-perf.log
log_ethconnect_0.log
log_firefly_core_0.log.zip

@peterbroadhurst
Copy link
Contributor Author

Thanks @eberger727 - this is actually exactly what I hoped for. It proves the fix.

This error didn't abort the TX - we can see p0vfsbRI succeeds later in the log:

{"log":"[2022-02-12T00:36:49.324Z] DEBUG SQL-\u003e commit dbtx=p0vfsbRI pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-12T00:36:49.324439055Z"}
{"log":"[2022-02-12T00:36:49.325Z] DEBUG SQL\u003c- commit dbtx=p0vfsbRI pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-12T00:36:49.325446455Z"}

After it gets that INSERT failure it falls back to SELECT:

{"log":"[2022-02-12T00:36:49.275Z] DEBUG SQL-\u003e query: SELECT hash FROM groups WHERE hash = $1 dbtx=p0vfsbRI pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-12T00:36:49.27539061Z"}

... and then update:

{"log":"[2022-02-12T00:36:49.277Z] DEBUG SQL\u003c- update affected=1 dbtx=p0vfsbRI pid=1 role=aggregator\n","stream":"stderr","time":"2022-02-12T00:36:49.277715375Z"}

@peterbroadhurst
Copy link
Contributor Author

Opening back up for @nguyer or @awrichar to review

@peterbroadhurst peterbroadhurst marked this pull request as ready for review February 12, 2022 00:51
Signed-off-by: Peter Broadhurst <[email protected]>
@peterbroadhurst peterbroadhurst merged commit 7b0ff81 into hyperledger:main Feb 12, 2022
@peterbroadhurst peterbroadhurst deleted the concurrent-groups branch February 12, 2022 20:09
@peterbroadhurst peterbroadhurst restored the concurrent-groups branch February 12, 2022 20:09
@peterbroadhurst peterbroadhurst deleted the concurrent-groups branch February 12, 2022 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants