SOLR-18290: Support configurable fusion candidate pool in Combined Query#4546
SOLR-18290: Support configurable fusion candidate pool in Combined Query#4546ercsonusharma wants to merge 2 commits into
Conversation
| public void testMultipleLexicalQuery() throws Exception { | ||
| prepareIndexDocs(); | ||
| String jsonQuery = | ||
| "{\"queries\":" |
There was a problem hiding this comment.
converted many test queries in this class as json string.
|
Hi David, when you get a chance, could you please take a look at this PR? Thank you, @dsmiley |
dsmiley
left a comment
There was a problem hiding this comment.
Interesting conundrum.
Can't we use the existing org.apache.solr.common.params.ShardParams#SHARDS_ROWS for this use-case?
I like to seek re-use/expansion of existing params instead of adding yet another bespoke param.
I can't tell but would a hypothetical queryResultWindowSize of say 20 mean that a page (rows) of 10 on first & second pages (start=0 & start=10) should get consistent results, even without this PR?
| if (depthParam > maxQueryDepth) { | ||
| throw new SolrException( | ||
| SolrException.ErrorCode.BAD_REQUEST, | ||
| CombinerParams.COMBINER_QUERY_DEPTH | ||
| + "=" | ||
| + depthParam | ||
| + " exceeds configured maxQueryDepth=" | ||
| + maxQueryDepth); | ||
| } |
There was a problem hiding this comment.
why bother enforce a max query depth? Solr doesn't stop a user from requesting a bajillion rows, after all. And the value will come from a ~colleague programmer... not some random internet user.
I thought about this but since it was specific to rrf and have to put some limit, I chose this. But since, we don't need the limit so we can re-use that param with a minor fix at mergeIds which is the actual bottle-neck.
The inconsistency in combined query paging happens at the coordinator's RRF step, not the shard query step. For above example, |
https://issues.apache.org/jira/browse/SOLR-18290
Description
Add combiner.queryDepth request parameter to the combined-query / RRF flow. It controls how many candidate documents each subquery fetches from each shard for fusion, decoupled from start + rows.
Holding queryDepth constant while paging keeps the underlying candidate pool and therefore the fused ranking stable across pages.
Solution
The combined-query coordinator already issues a single shard request per shard carrying every combiner.query=... key. Each shard runs all subqueries locally with the request's rows value. So per-subquery depth is governed by what the outer ResponseBuilder.shards_rows carries to createMainQuery.
Tests
Updated
DistributedCombinedQueryComponentTest#testHybridQueryWithPaginationto exercise the new param: same multi-subquery JSON request issued with and without combiner.queryDepth, asserting (a) returned doc count matches limit, (b) ordering matches RRF expectations for the configured depth.Validation paths (combiner.queryDepth=0, combiner.queryDepth > maxQueryDepth) covered by negative-path assertions.
Existing CombinedQueryComponent and RRF tests run green. no behavior change when combiner.queryDepth is absent.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.