[Fix] Change Cohere parsing approach (#38) #100

Merged
shinae1023 merged 1 commit into main from feat/#38-mock-jobposting-with-data
Jun 18, 2026
@shinae1023 shinae1023 commented Jun 18, 2026

Member

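✨ Why are you opening this PR?

  • Feature merge
  • Bug fix (please leave the issue # below)
  • Code improvement
  • Code modification
  • Deployment
  • Other (please describe in detail below)

📋 Details - Please explain in detail why this PR is needed and what work was done

📸 Screenshots of the work

⚠️ Please check before opening the PR

  • Did you run local tests?
  • Did you confirm the branch you are merging into?
  • Did you select the relevant labels?

🚨 Related issue number [#38]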

Summary by CodeRabbit

  • Refactor
    • Improved internal data processing and embedding response handling for better performance and reliability.

@shinae1023 shinae1023 self-assigned this Jun 18, 2026
coderabbitai Bot commented Jun 18, 2026

📝 Walkthrough

Two independent fixes: a single-line SQL correction in the Python corpus import script changes the selected column from dcm.detail_classification_id to ccm.detail_classification_id in the mapping lookup query; and CohereCorpusEmbeddingClient replaces record-based deserialization with a parseEmbeddings(String) method using ObjectMapper/JsonNode to extract float[] vectors from the raw Cohere response body.

Changes

Classification ID SQL Column Fix

| Layer / File(s) | Summary |
| --- | --- |
| SQL column alias correction: scripts/import_corpus.py | The mapping lookup query now selects ccm.detail_classification_id (aliased as id) instead of dcm.detail_classification_id. |

Cohere Embedding Response Parsing Refactor

| Layer / File(s) | Summary |
| --- | --- |
| Jackson-based embedding response parsing: src/main/java/.../CohereCorpusEmbeddingClient.java | Adds Jackson imports and an injected ObjectMapper field; updates embed(...) to read the HTTP response as a raw String and pass it to a new parseEmbeddings(String) method; parseEmbeddings validates the body, traverses embeddings.float via JsonNode, and constructs float[] vectors, throwing IllegalStateException on malformed input; removes the former EmbedResponse/Embeddings record types. |
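
The parsing flow described for parseEmbeddings(String) can be sketched roughly as follows. This is a Python analogue written for illustration, not the Java code from this PR. The function name parse_embeddings, the use of ValueError in place of IllegalStateException, and the exact validation rules are assumptions. The response shape {"embeddings": {"float": [[...]]}} is inferred from the embeddings.float path mentioned in the summary.

```python
import json
from array import array


def parse_embeddings(body):
    """Extract 32-bit float vectors from a raw embed response body.

    Hypothetical analogue of the summarized parseEmbeddings(String):
    validate the body, walk embeddings -> float, build one vector per entry.
    """
    if body is None or not body.strip():
        raise ValueError("embedding response body is empty")

    try:
        root = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("embedding response is not valid JSON") from exc

    # Walk the tree defensively instead of binding to fixed record types.
    embeddings = root.get("embeddings") if isinstance(root, dict) else None
    vectors = embeddings.get("float") if isinstance(embeddings, dict) else None
    if not isinstance(vectors, list) or not vectors:
        raise ValueError("embeddings.float is missing or empty")

    parsed = []
    for vector in vectors:
        is_numeric_list = isinstance(vector, list) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in vector
        )
        if not is_numeric_list:
            raise ValueError("each embedding must be an array of numbers")
        # array("f", ...) stores C floats, comparable to Java's float[].
        parsed.append(array("f", vector))
    return parsed
```

Walking the tree by hand keeps the parser tolerant of extra or reordered fields in the response, at the cost of writing the shape checks explicitly.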

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 A column switched from dcm to ccm,
And records vanished, replaced by a JSON tree!
ObjectMapper hops through each float array node,
Parsing embeddings down the rabbit road.
Two tidy fixes, small and clean —
The tidiest diff a bunny's seen! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description uses the template structure but lacks critical details: all checkboxes remain unchecked, the PR reason/type is not selected, and the '세부 내용' (Details) section is empty. Only the related issue number (#38) is filled in. | Select the appropriate PR type checkbox (appears to be 'bugfix'), provide a detailed explanation of the Cohere parsing changes and their rationale in the '세부 내용' (Details) section, and complete the pre-merge checklist. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly indicates a fix to the Cohere parsing approach, which aligns with the main changes to the CohereCorpusEmbeddingClient parsing logic in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java (2)

68-74: ⚡ Quick win

Remove unused toFloatArray() method.

This method is now dead code. The new parseEmbeddings() extracts floats directly from JsonNode (lines 94-97) without using this helper.

🧹 Proposed fix to remove dead code
-    private float[] toFloatArray(List<Double> values) {
-        float[] array = new float[values.size()];
-        for (int i = 0; i < values.size(); i++) {
-            array[i] = values.get(i).floatValue();
-        }
-        return array;
-    }
-
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`
around lines 68 - 74, The toFloatArray() method in CohereCorpusEmbeddingClient
is no longer used since the parseEmbeddings() method now extracts floats
directly from JsonNode without relying on this helper. Remove the entire
toFloatArray() method definition (the private method that takes a List<Double>
and returns a float array) as it is dead code that is no longer called anywhere
in the class.

101-103: 💤 Low value

Consider catching IOException instead of Exception.

Catching generic Exception can mask programming errors (e.g., NullPointerException, ArrayIndexOutOfBoundsException) that indicate bugs rather than malformed API responses. objectMapper.readTree() throws JsonProcessingException (extends IOException).

♻️ Proposed fix
-        } catch (Exception e) {
+        } catch (IOException e) {
             throw new IllegalStateException("Cohere 임베딩 응답 파싱에 실패했습니다.", e);
         }

This requires adding the import:

import java.io.IOException;
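
The masking effect described in this comment can be illustrated with a small Python analogue. This is a sketch, not the PR's Java code: the function names, the "values" field, and the deliberate off-by-one bug are all hypothetical. The broad handler reports a programming error as a parsing failure, while the narrow handler lets it surface under its own type.

```python
import json


def last_value_broad(body):
    """Broad handler: every failure is reported as a parsing problem."""
    try:
        values = json.loads(body)["values"]
        return values[len(values)]  # deliberate off-by-one bug
    except Exception as exc:
        # The IndexError from the bug above is wrapped and looks like bad input.
        raise RuntimeError("response parsing failed") from exc


def last_value_narrow(body):
    """Narrow handler: only malformed JSON is reported as a parsing problem."""
    try:
        root = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError("response parsing failed") from exc
    values = root["values"]
    return values[len(values)]  # same bug, now surfaces as IndexError
```

With the well-formed body '{"values": [1.0, 2.0]}', last_value_broad raises RuntimeError (with the IndexError only visible as its cause), whereas last_value_narrow raises IndexError directly, which points at the bug rather than at the response.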
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java`
around lines 101 - 103, In the CohereCorpusEmbeddingClient class, replace the
generic `Exception` catch type with `IOException` in the catch block that
handles the objectMapper.readTree() call. This is more specific since
JsonProcessingException (thrown by objectMapper.readTree()) extends IOException,
and catching the more specific exception type prevents masking programming
errors like NullPointerException. Add the necessary import statement for
java.io.IOException at the top of the file.
scripts/import_corpus.py (1)

213-213: ⚡ Quick win

Consider removing the unnecessary join for clarity and performance.

The join to detail_classifications dcm appears redundant since:

  • You only select from ccm.detail_classification_id (line 211)
  • The WHERE clause filters only on ccm columns (lines 214-216)
  • The FK constraint on corpus_classification_mappings.detail_classification_id already ensures referential integrity

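Removing the join would simplify the query and avoid an extra lookup against detail_classifications. The results stay the same as long as ccm.detail_classification_id is never NULL, since an inner join would otherwise drop those rows.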

♻️ Proposed simplification
     row = fetch_one(
         cur,
         """
         select ccm.detail_classification_id as id
         from corpus_classification_mappings ccm
-        join detail_classifications dcm on dcm.id = ccm.detail_classification_id
         where ccm.source_job_group_l1 = %s
           and ccm.source_job_family_l2 = %s
           and ccm.source_role_l3 = %s
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/import_corpus.py` at line 213, Remove the redundant join to the
detail_classifications table (the join clause that references dcm.id =
ccm.detail_classification_id) from the SQL query in the import_corpus.py script.
Since no columns from the detail_classifications table are being selected in the
query and the WHERE clause only filters on ccm columns, this join is unnecessary
and can be safely deleted to simplify the query and improve performance.
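
The claim that the join is redundant can be checked with a small, self-contained experiment. This sketch uses an in-memory SQLite database rather than the project's real database. The column types, the NOT NULL constraint, and the sample row are assumptions made for illustration. Placeholders are written as ? for sqlite3, where the original script uses %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only when enabled
conn.executescript("""
    create table detail_classifications (id integer primary key);
    create table corpus_classification_mappings (
        source_job_group_l1 text not null,
        source_job_family_l2 text not null,
        source_role_l3 text not null,
        detail_classification_id integer not null
            references detail_classifications (id)
    );
    insert into detail_classifications (id) values (7);
    insert into corpus_classification_mappings
        values ('dev', 'backend', 'java', 7);
""")

WITH_JOIN = """
    select ccm.detail_classification_id as id
    from corpus_classification_mappings ccm
    join detail_classifications dcm on dcm.id = ccm.detail_classification_id
    where ccm.source_job_group_l1 = ?
      and ccm.source_job_family_l2 = ?
      and ccm.source_role_l3 = ?
"""

WITHOUT_JOIN = """
    select ccm.detail_classification_id as id
    from corpus_classification_mappings ccm
    where ccm.source_job_group_l1 = ?
      and ccm.source_job_family_l2 = ?
      and ccm.source_role_l3 = ?
"""

params = ("dev", "backend", "java")
with_join = conn.execute(WITH_JOIN, params).fetchall()
without_join = conn.execute(WITHOUT_JOIN, params).fetchall()
print(with_join == without_join)
```

The two queries agree here because every mapping row is guaranteed a matching parent row. If the foreign key were not enforced, or the column allowed NULL, the joined query could return fewer rows, so the simplification depends on those schema guarantees.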
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 38031f2a-b32e-4221-8494-0d95b6b9bcfd

📥 Commits

Reviewing files that changed from the base of the PR and between 371b777 and 476b985.

📒 Files selected for processing (2)
  • scripts/import_corpus.py
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java

@shinae1023 shinae1023 merged commit 3e9fb48 into main Jun 18, 2026
3 checks passed