Skip to content

SOLR-18290: Support configurable fusion candidate pool in Combined Query#4546

Open
ercsonusharma wants to merge 2 commits into
apache:mainfrom
ercsonusharma:feat_combined_query_depth
Open

SOLR-18290: Support configurable fusion candidate pool in Combined Query#4546
ercsonusharma wants to merge 2 commits into
apache:mainfrom
ercsonusharma:feat_combined_query_depth

Conversation

@ercsonusharma

Copy link
Copy Markdown
Contributor

https://issues.apache.org/jira/browse/SOLR-18290

Description

Add combiner.queryDepth request parameter to the combined-query / RRF flow. It controls how many candidate documents each subquery fetches from each shard for fusion, decoupled from start + rows.
Holding queryDepth constant while paging keeps the underlying candidate pool and therefore the fused ranking stable across pages.

Solution

The combined-query coordinator already issues a single shard request per shard carrying every combiner.query=... key. Each shard runs all subqueries locally with the request's rows value. So per-subquery depth is governed by what the outer ResponseBuilder.shards_rows carries to createMainQuery.

Tests

  • Updated DistributedCombinedQueryComponentTest#testHybridQueryWithPagination to exercise the new param: same multi-subquery JSON request issued with and without combiner.queryDepth, asserting (a) returned doc count matches limit, (b) ordering matches RRF expectations for the configured depth.

  • Validation paths (combiner.queryDepth=0, combiner.queryDepth > maxQueryDepth) covered by negative-path assertions.

  • Existing CombinedQueryComponent and RRF tests run green. no behavior change when combiner.queryDepth is absent.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

@github-actions github-actions Bot added documentation Improvements or additions to documentation client:solrj tests cat:search labels Jun 23, 2026
Comment thread solr/solrj/src/java/org/apache/solr/common/params/CombinerParams.java Outdated
public void testMultipleLexicalQuery() throws Exception {
prepareIndexDocs();
String jsonQuery =
"{\"queries\":"

@ercsonusharma ercsonusharma Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converted many test queries in this class as json string.

@ercsonusharma

Copy link
Copy Markdown
Contributor Author

Hi David, when you get a chance, could you please take a look at this PR? Thank you, @dsmiley

@dsmiley dsmiley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting conundrum.

Can't we use the existing org.apache.solr.common.params.ShardParams#SHARDS_ROWS for this use-case?

I like to seek re-use/expansion of existing params instead of adding yet another bespoke param.

I can't tell but would a hypothetical queryResultWindowSize of say 20 mean that a page (rows) of 10 on first & second pages (start=0 & start=10) should get consistent results, even without this PR?

Comment thread solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java Outdated
Comment on lines +186 to +194
if (depthParam > maxQueryDepth) {
throw new SolrException(
SolrException.ErrorCode.BAD_REQUEST,
CombinerParams.COMBINER_QUERY_DEPTH
+ "="
+ depthParam
+ " exceeds configured maxQueryDepth="
+ maxQueryDepth);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother enforce a max query depth? Solr doesn't stop a user from requesting a bajillion rows, after all. And the value will come from a ~colleague programmer... not some random internet user.

@ercsonusharma

ercsonusharma commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Can't we use the existing org.apache.solr.common.params.ShardParams#SHARDS_ROWS for this use-case?

I thought about this but since it was specific to rrf and have to put some limit, I chose this. But since, we don't need the limit so we can re-use that param with a minor fix at mergeIds which is the actual bottle-neck.

I can't tell but would a hypothetical queryResultWindowSize of say 20 mean that a page (rows) of 10 on first & second pages (start=0 & start=10) should get consistent results, even without this PR?

The inconsistency in combined query paging happens at the coordinator's RRF step, not the shard query step. For above example, queryResultWindowSize=20 would just mean each shard's queryResultCache happens to already hold 20 docs, so the page-2 shard request is a cache hit instead of a re-execution. But the coordinator still receives 10 candidates on page 1 and 20 on page 2, and RRF over a 10-doc pool for page 1 produces different scores/ordering than RRF over a 20-doc pool.

@github-actions github-actions Bot removed documentation Improvements or additions to documentation client:solrj labels Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants