perf: compute CFG in Rust native engine for all languages #342

carlos-alm wants to merge 2 commits into main
Add walk_ast_nodes_with_config() to helpers.rs with per-language configs for node type mappings (new, throw, await, string, regex). Each non-JS extractor now calls this during extract(), producing astNodes for Python, Go, Rust, Java, C#, Ruby, and PHP. On the JS side, buildAstNodes() now checks symbols.astNodes first (all languages) before falling back to the WASM tree walk (JS/TS/TSX only). This eliminates the WASM dependency for AST extraction when using the native engine.

Expected impact: astMs drops from ~651ms to ~50ms (DB inserts only) for native builds once the binary is compiled.

Impact: 14 functions changed, 11 affected
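As a rough illustration of the config-driven walk described above, the sketch below uses one traversal shared by every language, with only a kind-to-category table varying per language. `Node`, `AstNodeConfig`, `walk_nodes_with_config`, and the tree-sitter-python kind names are hypothetical stand-ins; this is not the actual signature of `walk_ast_nodes_with_config()` in `helpers.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal syntax-tree node; the real extractors walk tree-sitter nodes.
struct Node {
    kind: &'static str,
    line: u32,
    children: Vec<Node>,
}

/// Hypothetical per-language config: grammar node kind -> AST-node category
/// (categories as named in the commit: new, throw, await, string, regex).
struct AstNodeConfig {
    kinds: HashMap<&'static str, &'static str>,
}

#[derive(Debug, PartialEq)]
struct AstNode {
    category: &'static str,
    line: u32,
}

/// Depth-first walk that records every node whose kind the config maps to a category.
/// The traversal is language-agnostic; only the config differs per language.
fn walk_nodes_with_config(node: &Node, config: &AstNodeConfig, out: &mut Vec<AstNode>) {
    if let Some(&category) = config.kinds.get(node.kind) {
        out.push(AstNode { category, line: node.line });
    }
    for child in &node.children {
        walk_nodes_with_config(child, config, out);
    }
}

/// Example config for Python; the kind names are assumptions about tree-sitter-python.
fn python_config() -> AstNodeConfig {
    AstNodeConfig {
        kinds: HashMap::from([
            ("raise_statement", "throw"),
            ("await", "await"),
            ("string", "string"),
        ]),
    }
}
```

Adding a language then means adding one table rather than another traversal.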
Port buildFunctionCFG algorithm from JS (cfg.js) to Rust (cfg.rs) with per-language CfgRules for all 8 supported languages. Each extractor now calls build_function_cfg() on function/method AST nodes during extraction, storing the CFG directly on the Definition struct. JS pipeline updated to use native CFG data when available (def.cfg), falling back to WASM tree walk only when native data is absent. This eliminates the need for WASM re-parsing in the CFG phase for native engine builds.

Rust changes:
- New cfg.rs module with CfgRules struct and 8 language configs
- CfgBlock, CfgEdge, CfgData types in types.rs
- All extractors call build_function_cfg for function/method defs

JS changes:
- parser.js normalizeNativeSymbols maps def.cfg through
- cfg.js buildCFGData checks def.cfg before WASM fallback
- Skips WASM parser init when all defs have native CFG

Tests: 1437 pass, new cfg-all-langs.test.js with JS-side mock tests and native parity tests (block/edge count + type matching).

Impact: 38 functions changed, 56 affected
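To make the block/edge split concrete, here is a toy builder that handles only straight-line code and if/else over a made-up `Stmt` type. The struct fields and the block/edge kind strings are assumptions loosely modeled on `CfgBlock` / `CfgEdge` / `CfgData`, not the PR's actual types. The real `CfgBuilder` also covers loops, switch/match, try/catch/finally, and labeled jumps.

```rust
/// Hypothetical shapes loosely modeled on the PR's CfgBlock / CfgEdge / CfgData.
#[derive(Debug, Clone, PartialEq)]
struct CfgBlock {
    index: usize,
    kind: &'static str, // "entry", "exit", "body", "branch_true", "branch_false" (assumed)
}

#[derive(Debug, Clone, PartialEq)]
struct CfgEdge {
    source: usize,
    target: usize,
    kind: &'static str, // "fallthrough", "branch_true", "branch_false" (assumed)
}

#[derive(Debug, Default)]
struct CfgData {
    blocks: Vec<CfgBlock>,
    edges: Vec<CfgEdge>,
}

/// Toy statement language standing in for per-language AST nodes.
enum Stmt {
    Simple,
    If { then_branch: Vec<Stmt>, else_branch: Option<Vec<Stmt>> },
}

struct Builder {
    cfg: CfgData,
}

impl Builder {
    fn new_block(&mut self, kind: &'static str) -> usize {
        let index = self.cfg.blocks.len();
        self.cfg.blocks.push(CfgBlock { index, kind });
        index
    }

    fn edge(&mut self, source: usize, target: usize, kind: &'static str) {
        self.cfg.edges.push(CfgEdge { source, target, kind });
    }

    /// Processes `stmts` starting in block `current`; returns the block control ends in.
    fn stmts(&mut self, stmts: &[Stmt], mut current: usize) -> usize {
        for stmt in stmts {
            match stmt {
                // A plain statement stays in the current basic block.
                Stmt::Simple => {}
                Stmt::If { then_branch, else_branch } => {
                    let then_start = self.new_block("branch_true");
                    self.edge(current, then_start, "branch_true");
                    let then_end = self.stmts(then_branch, then_start);
                    let else_end = match else_branch {
                        Some(body) => {
                            let else_start = self.new_block("branch_false");
                            self.edge(current, else_start, "branch_false");
                            Some(self.stmts(body, else_start))
                        }
                        None => None,
                    };
                    // Both arms (or the missing else) rejoin in a fresh block.
                    let join = self.new_block("body");
                    if else_end.is_none() {
                        self.edge(current, join, "branch_false");
                    }
                    self.edge(then_end, join, "fallthrough");
                    if let Some(end) = else_end {
                        self.edge(end, join, "fallthrough");
                    }
                    current = join;
                }
            }
        }
        current
    }
}

/// entry -> body ... -> exit, the overall shape of a per-function CFG.
fn build_cfg(body: &[Stmt]) -> CfgData {
    let mut b = Builder { cfg: CfgData::default() };
    let entry = b.new_block("entry");
    let exit = b.new_block("exit");
    let first = b.new_block("body");
    b.edge(entry, first, "fallthrough");
    let last = b.stmts(body, first);
    b.edge(last, exit, "fallthrough");
    b.cfg
}
```

An `else` arm that holds another `If` covers elif chains without extra logic. A per-language table of node kinds could then map each grammar's constructs onto this one builder.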
Claude finished @carlos-alm's task

Reviewing PR: perf: compute CFG in Rust native engine for all languages

Conducted a comprehensive PR review following the 4-phase analysis process:
Phase 1: Root Cause Analysis ✅

Problem Being Solved: Performance optimization in the CFG (Control Flow Graph) building phase. The current JS implementation re-parses with WASM (~169ms) even when native AST data is already available from the Rust engine, which creates unnecessary double-parsing overhead.

Approach Justification: This is a legitimate performance optimization that eliminates redundant WASM tree-walking by computing CFG data during the initial native extraction phase. The approach maintains backward compatibility through a WASM fallback when native data is unavailable. It addresses a real bottleneck rather than masking symptoms.

Phase 2: Configuration Change Detection ✅

No High-Risk Configuration Changes Detected:
Phase 3: Test Integrity ✅

Test Coverage Assessment:

Phase 4: Code Quality ✅

Rust Implementation Analysis:
Critical Issue Identified:
Minor Issues:
Root Cause Analysis

Problem Being Solved: A legitimate performance bottleneck, namely redundant WASM re-parsing when native data is already available.

Critical Concerns

Language ID Inconsistency: A C# language ID mismatch will cause silent CFG extraction failures in production.

Final Recommendation
Required Changes:
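The C# language ID concern can be illustrated with a minimal sketch. It assumes rules are looked up by exact string match on the language id. Under that assumption, a mismatched spelling falls into the catch-all arm and yields no CFG, with no error raised. The ids `csharp`, `c_sharp`, and `cs`, the reduced `CfgRules` fields, and `normalize_lang_id` are illustrative and are not taken from the PR.

```rust
/// Hypothetical, heavily reduced rule set; the PR's CfgRules carries many more node kinds.
struct CfgRules {
    if_node: &'static str,
    loop_nodes: &'static [&'static str],
}

static CSHARP_RULES: CfgRules = CfgRules {
    if_node: "if_statement",
    loop_nodes: &["for_statement", "foreach_statement", "while_statement", "do_statement"],
};

static GO_RULES: CfgRules = CfgRules {
    if_node: "if_statement",
    loop_nodes: &["for_statement"],
};

/// Exact-match lookup: any id not listed falls through to None, and a caller that
/// turns None into "no CFG" fails silently. This is the failure mode the review describes.
fn rules_for(lang_id: &str) -> Option<&'static CfgRules> {
    match lang_id {
        "csharp" => Some(&CSHARP_RULES),
        "go" => Some(&GO_RULES),
        _ => None,
    }
}

/// One possible guard: map known alternate spellings to a canonical id before lookup.
fn normalize_lang_id(lang_id: &str) -> &str {
    match lang_id {
        "c_sharp" | "cs" => "csharp",
        other => other,
    }
}
```

A test that asserts every language id the extractors actually pass resolves to `Some(_)` would catch this kind of mismatch in CI, whichever spelling the codebase settles on.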
Greptile Summary

This PR successfully ports CFG (Control Flow Graph) building from JavaScript to Rust for all 8 supported languages, eliminating WASM re-parsing in the CFG phase for native builds. The implementation adds ~1045 lines in

Key changes:
Expected impact: cfgMs reduction from ~169ms to ~20ms (DB inserts only) for native builds

Test coverage: JS-side mock tests pass locally; native parity tests (21 tests) will run in CI

Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Parser as parser.js
    participant Native as Rust Native Engine
    participant Extractor as Language Extractor
    participant CFG as cfg.rs
    participant JS_CFG as cfg.js (buildCFGData)
    participant WASM as WASM Parser (fallback)
    Note over Parser,WASM: Native Path (New)
    Parser->>Native: parseFilesAuto(files)
    Native->>Extractor: extract(tree, source)
    loop For each function/method
        Extractor->>CFG: build_function_cfg(node, lang_id)
        CFG-->>Extractor: Option<CfgData> (blocks + edges)
    end
    Native-->>Parser: FileSymbols with def.cfg
    Parser->>Parser: normalizeNativeSymbols(result)
    Parser->>JS_CFG: buildCFGData(db, fileSymbols)
    JS_CFG->>JS_CFG: Check def.cfg?.blocks?.length
    alt Has native CFG
        JS_CFG->>JS_CFG: Use def.cfg directly
        JS_CFG->>JS_CFG: Insert blocks/edges to DB (~20ms)
    else No native CFG (fallback)
        JS_CFG->>WASM: Initialize parser + buildFunctionCFG
        WASM-->>JS_CFG: CFG from WASM tree walk
        JS_CFG->>JS_CFG: Insert blocks/edges to DB (~169ms)
    end
```
Last reviewed commit: ac418b2
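The native-versus-fallback branch in the diagram reduces to a small predicate. It is written here in Rust for consistency with the other sketches, although the real check lives in `cfg.js`. `DefKind`, `has_body`, and the filter that ignores non-callable defs are assumptions about how "all defs have native CFG" might be interpreted.

```rust
/// Hypothetical, reduced definition record with only the fields this check needs.
#[derive(Clone, Copy)]
enum DefKind {
    Function,
    Method,
    Class,
}

struct CfgData {
    blocks: Vec<&'static str>,
}

struct Definition {
    kind: DefKind,
    has_body: bool, // assumption: false for interface/abstract signatures
    cfg: Option<CfgData>,
}

/// Only callables with a body are expected to carry a CFG.
fn expects_cfg(def: &Definition) -> bool {
    matches!(def.kind, DefKind::Function | DefKind::Method) && def.has_body
}

/// Rust analogue of the JS `def.cfg?.blocks?.length` truthiness check.
fn has_native_cfg(def: &Definition) -> bool {
    def.cfg.as_ref().map_or(false, |c| !c.blocks.is_empty())
}

/// The WASM parser is only needed if some def that should have a CFG lacks one.
fn needs_wasm_fallback(defs: &[Definition]) -> bool {
    defs.iter().filter(|d| expects_cfg(d)).any(|d| !has_native_cfg(d))
}
```

Suppose body-less signatures end up with `cfg: None`, as the review comment on the Go extractor suggests. A check of this kind then has to exclude them. Otherwise a single interface method would push the whole file onto the WASM path.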
```rust
end_line: Some(end_line(&member)),
decorators: None,
complexity: None,
cfg: build_function_cfg(&member, "go"),
```
Interface methods (`method_elem` in `interface_type`) don't have bodies; they're just signatures. `build_function_cfg` will return a trivial entry->exit CFG for them. Consider using `cfg: None` for interface methods, matching how `complexity: None` is set.
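A minimal sketch of this suggestion follows. It assumes the extractor can tell whether a member has a body. With the `tree-sitter` crate, that check would typically be `node.child_by_field_name("body")`. `Member`, `build_cfg_for_body`, and `cfg_for_member` are hypothetical names. The placeholder builder only counts blocks and edges for straight-line bodies.

```rust
/// Hypothetical stand-in for a member node: a `None` body means a bare signature,
/// such as a Go `method_elem` inside an `interface_type`.
struct Member {
    body: Option<Vec<&'static str>>,
}

#[derive(Debug, PartialEq)]
struct CfgData {
    block_count: usize,
    edge_count: usize,
}

/// Placeholder for the real builder: entry -> exit for an empty body,
/// entry -> body -> exit otherwise (straight-line code only).
fn build_cfg_for_body(stmts: &[&str]) -> CfgData {
    if stmts.is_empty() {
        CfgData { block_count: 2, edge_count: 1 }
    } else {
        CfgData { block_count: 3, edge_count: 2 }
    }
}

/// Emit a CFG only when there is a body to analyze, so signatures get
/// `cfg: None` just as they already get `complexity: None`.
fn cfg_for_member(member: &Member) -> Option<CfgData> {
    member.body.as_deref().map(build_cfg_for_body)
}
```

With this approach, an empty function (`func f() {}`) still gets its trivial CFG, while a signature gets none. Downstream code can therefore tell "no body" apart from "empty body".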
Summary
- Port the `buildFunctionCFG` algorithm from JS (`src/cfg.js`) to Rust (`crates/codegraph-core/src/cfg.rs`) with per-language `CfgRules` for all 8 supported languages (JS/TS/TSX, Python, Go, Rust, Java, C#, Ruby, PHP)
- Each extractor calls `build_function_cfg()` on function/method AST nodes during extraction, storing CFG data directly on the `Definition` struct
- JS pipeline (`cfg.js`, `parser.js`) updated to use native CFG data when available (`def.cfg`), falling back to the WASM tree walk only when native data is absent

Changes
Rust (~750 lines):
- `cfg.rs`: `CfgRules` struct, 8 language configs, `CfgBuilder` state machine porting the full algorithm (if/elif/else in 3 patterns, for/while/do-while/infinite loops, switch/match, try/catch/finally, break/continue with labels, loop stack)
- `types.rs`: `CfgBlock`, `CfgEdge`, `CfgData` structs with napi bindings
- All extractors call `build_function_cfg(node, lang_id)` for function/method definitions

JS (~30 lines):
- `parser.js`: `normalizeNativeSymbols` maps `def.cfg` through to JS
- `cfg.js`: `buildCFGData` checks `def.cfg` before WASM fallback, skips WASM parser init when all defs have native CFG

Test plan
- `cfg-all-langs.test.js`: JS-side mock tests (2 pass locally)
- Native parity tests (block/edge count + type matching) against `buildFunctionCFG`

PR 2 of 4 in the WASM double-parse elimination plan.
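The loop stack listed under the `cfg.rs` changes can be sketched as follows. The sketch assumes each frame records the block `continue` jumps to and the block `break` jumps to. `LoopFrame` and `LoopStack` are hypothetical names. The sketch also ignores `switch` as an unlabeled `break` target, which a JS-compatible builder would need to handle too.

```rust
/// One frame per enclosing loop: optional label, the block `continue` jumps to
/// (loop header), and the block `break` jumps to (loop exit).
struct LoopFrame {
    label: Option<String>,
    header: usize,
    exit: usize,
}

/// Hypothetical loop stack for resolving break/continue targets while building a CFG.
struct LoopStack {
    frames: Vec<LoopFrame>,
}

impl LoopStack {
    fn new() -> Self {
        LoopStack { frames: Vec::new() }
    }

    /// Called when the builder enters a loop body.
    fn push(&mut self, label: Option<&str>, header: usize, exit: usize) {
        self.frames.push(LoopFrame { label: label.map(String::from), header, exit });
    }

    /// Called when the builder leaves the loop.
    fn pop(&mut self) {
        self.frames.pop();
    }

    /// Unlabeled: innermost loop. Labeled: nearest enclosing loop with that label.
    fn find(&self, label: Option<&str>) -> Option<&LoopFrame> {
        match label {
            None => self.frames.last(),
            Some(name) => self.frames.iter().rev().find(|f| f.label.as_deref() == Some(name)),
        }
    }

    fn break_target(&self, label: Option<&str>) -> Option<usize> {
        self.find(label).map(|f| f.exit)
    }

    fn continue_target(&self, label: Option<&str>) -> Option<usize> {
        self.find(label).map(|f| f.header)
    }
}
```

When the builder meets a `break` or `continue`, it adds an edge from the current block to the resolved target. It then starts a fresh, unreachable block for any statements that follow. Returning `None` for an unknown label leaves the policy for malformed input to the caller.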