A comprehensive, interactive syntax analysis detection system demonstrating compiler design principles
- Overview
- Features
- Live Demo
- Architecture
- Grammar Definition
- Parser Implementations
- Error Detection
- Project Structure
- Getting Started
- Test Cases
- Applications
- Technologies Used
- Screenshots
- Contributing
- License
SyntaxScope is an interactive web-based project that demonstrates the design and implementation of a syntax analysis detection system — a core component of compiler design. It takes user-provided arithmetic expressions or code snippets and analyzes them to verify whether they follow the rules of a defined context-free grammar (CFG).
Syntax analysis (parsing) is the second phase of a compiler. It takes a stream of tokens from the lexical analyzer and determines whether the token sequence can be generated by the grammar of the source language. The parser constructs a parse tree that represents the hierarchical syntactic structure of the program.
Source Code → [Lexer] → Tokens → [Parser] → Parse Tree → [Semantic Analyzer] → AST → [Code Gen] → Output
↑
You are here!
| Aspect | Lexical Analysis | Syntax Analysis |
|---|---|---|
| Phase | First phase | Second phase |
| Input | Character stream | Token stream |
| Output | Tokens | Parse tree / AST |
| Grammar | Regular expressions | Context-Free Grammar |
| Errors | Invalid characters | Structural errors |
| Tool | Finite Automaton | Pushdown Automaton |
- 🔄 Three Parser Implementations: Recursive Descent, LL(1), and LR(0)
- 📝 Interactive Tokenizer: Real-time lexical analysis with token visualization
- 🌳 Parse Tree Visualization: Canvas-based tree rendering with color-coded nodes
- ❌ Syntax Error Detection: Meaningful error messages with position indicators
- 📊 Step-by-Step Tracing: Watch the parsing process unfold step by step
- 🧪 Automated Testing: 15+ built-in test cases (valid and invalid)
- 📱 Responsive Design: Works on desktop, tablet, and mobile
- 🎨 Premium Dark Theme: Modern glassmorphism UI with animations
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Input │────▶│ Lexer │────▶│ Parser │
│ Expression │ │ (Tokenizer) │ │ (RD/LL1/LR) │
└──────────────┘ └──────────────┘ └──────┬───────┘
│
┌──────────────┐ ┌────────▼───────┐
│ Tree │◀────│ AST │
│ Renderer │ │ (Parse Tree) │
└──────────────┘ └────────────────┘
A formal grammar G = (V, Σ, P, S) where:
- V (Variables): {Expr, Term, Factor, Primary}
- Σ (Terminals): {NUMBER, ID, +, -, *, /, ^, (, )}
- S (Start symbol): Expr
1. Expr → Expr + Term (Addition)
2. Expr → Expr − Term (Subtraction)
3. Expr → Term (Base case)
4. Term → Term * Factor (Multiplication)
5. Term → Term / Factor (Division)
6. Term → Factor (Base case)
7. Factor → Primary ^ Factor (Exponentiation, right-assoc)
8. Factor → Primary (Base case)
9. Primary → NUMBER (Number literal)
10. Primary → IDENTIFIER (Variable)
11. Primary → ( Expr ) (Grouping)
12. Primary → − Primary (Unary negation)
| Level | Operators | Associativity |
|---|---|---|
| 1 | +, - |
Left |
| 2 | *, / |
Left |
| 3 | ^ |
Right |
| 4 | Unary - |
Right (prefix) |
A hand-written parser where each non-terminal in the grammar has a corresponding function. The parser processes input from left to right and constructs a leftmost derivation.
function parseExpr() {
let node = parseTerm();
while (token === '+' || token === '-') {
let op = token;
advance();
node = { type: 'BinaryExpr', op, left: node, right: parseTerm() };
}
return node;
}Pros: Easy to implement, excellent error messages, mirrors grammar structure
Cons: Cannot handle left-recursive grammars directly
Uses a parsing table built from FIRST and FOLLOW sets. Reads input Left-to-right, produces Leftmost derivation, uses 1 token of lookahead.
Pros: No recursion overhead, predictable O(n) performance
Cons: Requires grammar transformation (left-recursion elimination)
A bottom-up parser that shifts tokens onto a stack and reduces them using grammar rules. Uses operator precedence to resolve shift/reduce conflicts.
Pros: Handles wider class of grammars, very powerful
Cons: Complex to implement, harder error messages
The system provides detailed, meaningful error messages:
Input: 3 + * 5
3 + * 5
^ error here
ParseError: Unexpected token STAR('*') — expected NUMBER, IDENTIFIER, '(', or unary '-'
Input: (3 + 5
(3 + 5
^ error here
ParseError: Expected RPAREN but found EOF('EOF')
- Position tracking: Each token stores its character position for precise error pointing
- Contextual messages: Errors indicate what was expected vs. what was found
- Visual indicators: Source code with caret pointing to the error location
syntax-analysis-detection-system/
├── index.html # Main HTML page
├── styles.css # Complete CSS design system
├── js/
│ ├── lexer.js # Tokenizer — converts input to tokens
│ ├── recursive-descent-parser.js # Recursive Descent parser implementation
│ ├── ll1-parser.js # LL(1) table-driven parser
│ ├── lr-parser.js # LR(0) shift-reduce parser
│ ├── tree-renderer.js # Canvas-based parse tree visualizer
│ └── app.js # Main application controller
├── README.md # Project documentation
└── LICENSE # MIT License
- A modern web browser (Chrome, Firefox, Safari, Edge)
- No build tools or dependencies required!
-
Clone the repository:
git clone https://github.com/YOUR_USERNAME/syntax-analysis-detection-system.git cd syntax-analysis-detection-system -
Open in browser:
# Option 1: Simply open the file open index.html # Option 2: Use a local server npx serve . # Option 3: Python server python3 -m http.server 8080
-
Try it out:
- Enter an expression like
(3 + 5) * 2 - Select a parser type
- Click "Parse" to see tokens, steps, and the parse tree
- Enter an expression like
| # | Input | Description |
|---|---|---|
| 1 | 3 + 5 |
Simple addition |
| 2 | 3 + 5 * 2 |
Precedence: multiply before add |
| 3 | (3 + 5) * 2 |
Parenthesized expression |
| 4 | 2 ^ 3 ^ 2 |
Right-associative exponentiation |
| 5 | -x + 3 |
Unary minus with identifier |
| 6 | a * b + c / d |
Multiple operators |
| 7 | ((1 + 2)) |
Nested parentheses |
| 8 | 3.14 * r ^ 2 |
Decimal with exponent |
| 9 | -(-x) |
Double unary negation |
| 10 | 42 |
Single number |
| # | Input | Expected Error |
|---|---|---|
| 11 | 3 + * 5 |
Missing operand between operators |
| 12 | (3 + 5 |
Unmatched left parenthesis |
| 13 | 3 + 5) |
Unmatched right parenthesis |
| 14 | 3 + |
Trailing operator |
| 15 | (empty) | Empty input |
Syntax analysis is used extensively across computing:
- 🛠️ Compilers & Interpreters — GCC, Clang, V8 Engine
- 📝 Code Editors & IDEs — VS Code, IntelliJ, Tree-sitter
- 🔎 Linters & Formatters — ESLint, Prettier, Black
- 🗄️ Database Query Engines — PostgreSQL, MySQL, SQLite
- 🌐 Web Browsers — HTML/CSS/JS parsing (Blink, Gecko, WebKit)
- 🤖 Natural Language Processing — spaCy, NLTK, Stanford NLP
| Technology | Purpose |
|---|---|
| JavaScript (ES6+) | Parser logic, lexer, application controller |
| HTML5 Canvas | Parse tree visualization |
| CSS3 | Dark theme, animations, glassmorphism, responsive design |
| Inter Font | UI typography |
| JetBrains Mono | Code/grammar typography |
- Grammar design matters — A well-structured grammar directly enables clean parser implementation
- Top-down vs Bottom-up — Each approach has tradeoffs; recursive descent is intuitive, LR is powerful
- Error recovery is crucial — Real-world parsers need robust error recovery
- Visualization aids understanding — Parse trees make abstract structure tangible
- Extend grammar to support assignment, conditionals, loops
- Implement error recovery strategies (panic mode, phrase-level)
- Add semantic analysis phase (type checking)
- Support LALR(1) and SLR(1) parsing algorithms
- Generate intermediate code from the AST
- Integrate with Lex/Yacc or ANTLR for comparison
This project is licensed under the MIT License — see the LICENSE file for details.
Built with ❤️ for Compiler Design