perf: read each source file once, restore per-file parallelism, use Foundation .strings parser (~4× faster) #4
Open
memoto wants to merge 1 commit into
…oundation .strings parser (~4× faster)
Four independent optimizations, each profiled separately:
1. Read each source file once
SourceFileChecker.start() previously called fastCheck(), which loaded
and decoded the file just to scan for the literal marker, then re-loaded
the same file for SwiftParser. Now the file is read once and the string
is reused for both the early-exit check and parse.
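A minimal sketch of the read-once pattern described above, not the project's actual code. `checkSource` and the `parse` closure are hypothetical stand-ins for SourceFileChecker.start() and the SwiftParser call, and the `.localized` default marker is an assumption about what the early-exit check looks for.

```swift
import Foundation

/// Reads the file once, then reuses the same String for the cheap
/// early-exit check and for the expensive parse.
/// `parse` is a hypothetical stand-in for the SwiftParser call.
func checkSource<T>(at url: URL,
                    marker: String = ".localized",
                    parse: (String) throws -> T) throws -> T? {
    // Single read and UTF-8 decode.
    let source = try String(contentsOf: url, encoding: .utf8)
    // Early exit: files without the marker are never parsed.
    guard source.contains(marker) else { return nil }
    // The already-loaded string is handed to the parser; no second read.
    return try parse(source)
}
```

The saving comes from the common case where a file does contain the marker: the old flow paid for two reads and two decodes, this one pays for one.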
2. Replace the hand-rolled .strings parser
The old tokenizer split on ';', which mis-parsed values containing
semicolons and was slow due to repeated String allocations. Replaced
with Foundation's NSDictionary(contentsOf:error:) — faster and correct.
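As a rough illustration of the Foundation-based approach, the sketch below loads a .strings file through NSDictionary(contentsOf:error:). The `loadStrings` helper name is hypothetical, and the sketch assumes an Apple platform, where Foundation's property-list reader accepts the .strings format.

```swift
import Foundation

/// Loads a .strings file into [key: value] using Foundation's
/// property-list reader instead of a hand-written tokenizer.
/// `loadStrings` is a hypothetical helper name used for illustration.
func loadStrings(at url: URL) throws -> [String: String] {
    // Throws if the file is missing or is not a valid property list.
    let dict = try NSDictionary(contentsOf: url, error: ())
    // Entries that are not String-to-String make the cast fail;
    // this sketch falls back to an empty table in that case.
    return dict as? [String: String] ?? [:]
}
```

Because quoting and escaping are handled by the property-list reader, an entry such as "greeting" = "Hello; world"; is treated as one key-value pair rather than being split at the inner semicolon.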
3. Restore real parallelism in SourceFileBatchChecker
The type was an actor in 0.1.13. Every group.addTask closure hopped back
onto the actor's serial executor, so chunking added overhead without
delivering any parallelism. Reverted to final class & Sendable and
switched from fixed-size chunks to one task per file — Swift Concurrency's
cooperative pool then load-balances SwiftParser work across cores.
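The one-task-per-file shape can be sketched as follows. This is an illustration under stated assumptions, not the actual SourceFileBatchChecker: `BatchChecker` and `checkFile` are hypothetical names, and the per-file work is a trivial placeholder for the SwiftParser pass.

```swift
// Hypothetical placeholder for the CPU-bound per-file check.
func checkFile(_ path: String) -> Int {
    path.utf8.count
}

/// A non-actor, Sendable type: child tasks call a free function and
/// never hop back onto a serial executor owned by this type.
final class BatchChecker: Sendable {
    func run(paths: [String]) async -> [String: Int] {
        await withTaskGroup(of: (String, Int).self) { group in
            // One child task per file; the cooperative pool schedules
            // them across the available cores.
            for path in paths {
                group.addTask { (path, checkFile(path)) }
            }
            // Results are collected in the parent task, so no shared
            // mutable state is touched from the child tasks.
            var results: [String: Int] = [:]
            for await (path, count) in group {
                results[path] = count
            }
            return results
        }
    }
}
```

Had `run` been an actor method whose child tasks awaited another isolated method on `self`, each of those calls would have been serialized on the actor, which is the behavior the commit message describes for 0.1.13.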
4. Smaller cleanups
- Cache the nextToken(viewMode:) walk in LocalizeParser.visit (the
original walked twice in a row per node).
- Use a Set for unused-key detection (was Array.contains(where:), O(N·M)).
- Drop debug print() calls from LocalizeBundle.init.
- Stream SourceFilesTraversalTrait.parseSourceDirectory with a for-in
loop instead of chaining lazy filters that allocate intermediate arrays.
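The Set-based unused-key check from the list above might look roughly like this. The function and parameter names are hypothetical; only the technique (hash lookup instead of a linear scan per key) is taken from the commit message.

```swift
/// Returns the bundle keys that are never referenced in source.
/// Names are illustrative, not the project's API.
func unusedKeys(bundleKeys: [String], usedKeys: [String]) -> [String] {
    // Build the lookup set once: O(M).
    let used = Set(usedKeys)
    // Each membership test is O(1) on average, so the pass is O(N)
    // instead of O(N·M) with Array.contains(where:).
    return bundleKeys.filter { !used.contains($0) }
}
```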
Benchmarked on Apple Silicon, corpus: 400 Swift files (~1.85 MB), 600 bundle keys.
Baseline avg: 8.020 s → Optimized avg: 1.930 s (≈4.2× speed-up).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Why
LocalizeChecker runs as a build-phase script on every Debug build of large iOS apps. Profiling showed several real, fixable inefficiencies in the implementation.
Benchmarked on Apple Silicon, corpus: 400 Swift files (~1.85 MB), 600 bundle keys, ×10 XCTest.measure runs.
What changed
1. Read each source file once
SourceFileChecker.start() previously called fastCheck(), which loaded and UTF-8-decoded the file just to substring-check for .localized, then re-loaded the same file for SwiftParser. Now the file is read once and the string is reused for both the early-exit check and the parse.
2. Replace the hand-rolled .strings parser
The old tokenizer split on ; and used per-entry String.init plus two trimmingCharacters calls, which was slow and mis-parsed values that contain ;. Replaced with Foundation's NSDictionary(contentsOf:error:), which is faster and correct. The bug is covered by a new correctness test in the companion PR to the consuming repo.
3. Restore real parallelism in SourceFileBatchChecker
0.1.13 made this type an actor. Every group.addTask { try await self.processBatch(...) } closure hopped back onto the actor's serial executor, so chunking added overhead without delivering any parallelism. Reverted to final class & Sendable and switched from fixed-size chunks to one task per file; Swift Concurrency's cooperative pool then load-balances CPU-bound SwiftParser work across cores.
4. Smaller cleanups
- Cache the nextToken(viewMode:) walk in LocalizeParser.visit (the original walked the syntax tree twice per node).
- Use a Set for unused-key detection (was Array.contains(where:), O(N·M) on large bundles).
- Drop debug print() calls from LocalizeBundle.init that fired on every build.
- Stream SourceFilesTraversalTrait.parseSourceDirectory with a for-in loop instead of chaining lazy filters that allocate intermediate arrays.
Correctness
All existing public API shapes are preserved. The only breaking change is that SourceFileBatchChecker is no longer an actor; callers that await its methods still compile, and the async qualifier is retained where the method itself spawns tasks.
🤖 Generated with Claude Code