
Overview

The ExpresionesLexer class is automatically generated by ANTLR 4.13.1 from the grammar file Expresiones.g. It performs lexical analysis by converting raw source code into a stream of tokens that the parser can understand.
This class is generated code. Do not modify ExpresionesLexer.py directly. Instead, update the lexical rules in Expresiones.g and regenerate using ANTLR.

Class Definition

class ExpresionesLexer(Lexer):
    atn = ATNDeserializer().deserialize(serializedATN())
    decisionsToDFA = [ DFA(ds, i) for i, ds in enumerate(atn.decisionToState) ]
Location: ~/workspace/source/ExpresionesLexer.py:75

Token Type Constants

The lexer defines 27 token type constants used throughout the compilation process:

Keywords

PROGRAMA = 1   # 'program'
SI = 2         # 'if'
SINO = 3       # 'else'
TIPO = 4       # 'int' | 'float' | 'bool'
From Expresiones.g:34-37:
PROGRAMA : 'program' ;
SI       : 'if' ;
SINO     : 'else' ;
TIPO     : 'int' | 'float' | 'bool' ;

Delimiters

LLAVE_IZQ = 5   # '{'
LLAVE_DER = 6   # '}'
PAR_IZQ = 7     # '('
PAR_DER = 8     # ')'
PUNTO_COMA = 9  # ';'
From Expresiones.g:39-43:
LLAVE_IZQ : '{' ; 
LLAVE_DER : '}' ;
PAR_IZQ   : '(' ;
PAR_DER   : ')' ;
PUNTO_COMA: ';' ;

Operators

Arithmetic Operators

ASIGNACION = 10  # '='
SUMA = 11        # '+'
RESTA = 12       # '-'
MULT = 13        # '*'
DIV = 14         # '/'
From Expresiones.g:44-49:
ASIGNACION: '=' ; 
SUMA  : '+' ; 
RESTA : '-' ;
MULT  : '*' ;
DIV   : '/' ;

Relational Operators

MAYOR = 15        # '>'
MENOR = 16        # '<'
IGUAL = 17        # '=='
DIFERENTE = 18    # '!=' | '<>'
MAYOR_IGUAL = 19  # '>='
MENOR_IGUAL = 20  # '<='
From Expresiones.g:51-56:
MAYOR       : '>' ;
MENOR       : '<' ;
IGUAL       : '==' ;
DIFERENTE   : '!=' | '<>' ;
MAYOR_IGUAL : '>=' ;
MENOR_IGUAL : '<=' ;
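These rules interact with ANTLR's maximal-munch behavior: the lexer always prefers the longest match, so `==` becomes a single IGUAL token rather than two ASIGNACION tokens, and both spellings of DIFERENTE are accepted. A minimal pure-Python sketch of the same longest-first idea (ordering longer literals before their prefixes in a regex alternation; this snippet is illustrative and not part of the generated lexer):

```python
import re

# Longer operators are listed before their one-character prefixes so the
# alternation mimics maximal munch: '==' wins over '=', '<>' over '<'.
OPERATOR_RE = re.compile(r"==|!=|<>|>=|<=|=|\+|-|\*|/|>|<")

print(OPERATOR_RE.findall("x == y = z != w <> v"))
# ['==', '=', '!=', '<>']
```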

Logical Operators

Y_LOGICO = 21   # '&&'
O_LOGICO = 22   # '||'
NO_LOGICO = 23  # '!'
From Expresiones.g:58-60:
Y_LOGICO  : '&&' ;
O_LOGICO  : '||' ;
NO_LOGICO : '!' ;

Identifiers and Literals

ID = 24      # [a-zA-Z][a-zA-Z0-9]*
NUMERO = 25  # [0-9]+ ('.' [0-9]+)?
WS = 26      # [ \t\r\n]+ -> skip
COMENTARIO = 27  # '//' ~[\n\r]* -> skip
From Expresiones.g:62-65:
ID     : [a-zA-Z][a-zA-Z0-9]* ;
NUMERO : [0-9]+ ('.' [0-9]+)? ;
WS     : [ \t\r\n]+ -> skip ;
COMENTARIO : '//' ~[\n\r]* -> skip ;
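Taken together, the rules above define the full token set. A hypothetical pure-Python approximation can be useful for experimenting without the ANTLR runtime; note that `Token`, `TOKEN_SPEC`, and `tokenize` below do not exist in ExpresionesLexer.py. Alternatives are ordered so keywords win over ID and longer operators win over their prefixes, mirroring ANTLR's rule-order and maximal-munch behavior:

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type: str
    text: str

TOKEN_SPEC = [
    ("COMENTARIO",  r"//[^\n\r]*"),           # must precede DIV
    ("WS",          r"[ \t\r\n]+"),
    ("PROGRAMA",    r"program\b"),
    ("SI",          r"if\b"),
    ("SINO",        r"else\b"),
    ("TIPO",        r"(?:int|float|bool)\b"),
    ("NUMERO",      r"[0-9]+(?:\.[0-9]+)?"),
    ("ID",          r"[a-zA-Z][a-zA-Z0-9]*"),
    ("IGUAL",       r"=="),                   # before ASIGNACION
    ("DIFERENTE",   r"!=|<>"),                # before NO_LOGICO and MENOR
    ("MAYOR_IGUAL", r">="),
    ("MENOR_IGUAL", r"<="),
    ("Y_LOGICO",    r"&&"),
    ("O_LOGICO",    r"\|\|"),
    ("ASIGNACION",  r"="),
    ("MAYOR",       r">"),
    ("MENOR",       r"<"),
    ("NO_LOGICO",   r"!"),
    ("SUMA",        r"\+"),
    ("RESTA",       r"-"),
    ("MULT",        r"\*"),
    ("DIV",         r"/"),
    ("LLAVE_IZQ",   r"\{"),
    ("LLAVE_DER",   r"\}"),
    ("PAR_IZQ",     r"\("),
    ("PAR_DER",     r"\)"),
    ("PUNTO_COMA",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(src: str) -> list[Token]:
    # Unlike the real lexer, this sketch silently skips unrecognized
    # characters instead of reporting a lexical error.
    return [Token(m.lastgroup, m.group())
            for m in MASTER.finditer(src)
            if m.lastgroup not in ("WS", "COMENTARIO")]  # -> skip

print(tokenize("int x = 10;"))
# [Token(type='TIPO', text='int'), Token(type='ID', text='x'),
#  Token(type='ASIGNACION', text='='), Token(type='NUMERO', text='10'),
#  Token(type='PUNTO_COMA', text=';')]
```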

Lexer Configuration

The lexer maintains several configuration arrays defined in ExpresionesLexer.py:109-131:

Channel Names

channelNames = [ u"DEFAULT_TOKEN_CHANNEL", u"HIDDEN" ]
Tokens are emitted on the default channel for parser consumption. The HIDDEN channel is declared in every generated lexer, but this grammar routes nothing to it: whitespace and comments use -> skip and are discarded outright.

Mode Names

modeNames = [ "DEFAULT_MODE" ]
The lexer operates in a single mode. More complex languages may define multiple lexer modes.

Literal Names

literalNames = [ "<INVALID>",
        "'program'", "'if'", "'else'", "'{'", "'}'", "'('", "')'", "';'", 
        "'='", "'+'", "'-'", "'*'", "'/'", "'>'", "'<'", "'=='", "'>='", 
        "'<='", "'&&'", "'||'", "'!'" ]
Literal representations of fixed-string tokens, used in error messages and debugging output. Token types defined by multiple alternatives, such as TIPO and DIFERENTE, have no single literal form.

Symbolic Names

symbolicNames = [ "<INVALID>",
        "PROGRAMA", "SI", "SINO", "TIPO", "LLAVE_IZQ", "LLAVE_DER", 
        "PAR_IZQ", "PAR_DER", "PUNTO_COMA", "ASIGNACION", "SUMA", "RESTA", 
        "MULT", "DIV", "MAYOR", "MENOR", "IGUAL", "DIFERENTE", "MAYOR_IGUAL", 
        "MENOR_IGUAL", "Y_LOGICO", "O_LOGICO", "NO_LOGICO", "ID", "NUMERO", 
        "WS", "COMENTARIO" ]
Symbolic names used in the grammar and referenced by the parser.

Rule Names

ruleNames = [ "PROGRAMA", "SI", "SINO", "TIPO", "LLAVE_IZQ", "LLAVE_DER", 
              "PAR_IZQ", "PAR_DER", "PUNTO_COMA", "ASIGNACION", "SUMA", 
              "RESTA", "MULT", "DIV", "MAYOR", "MENOR", "IGUAL", "DIFERENTE", 
              "MAYOR_IGUAL", "MENOR_IGUAL", "Y_LOGICO", "O_LOGICO", 
              "NO_LOGICO", "ID", "NUMERO", "WS", "COMENTARIO" ]

Tokenization Process

1. Input Reception: the lexer receives raw source code as a character stream.

2. Pattern Matching: using the serialized ATN (augmented transition network), the lexer matches character sequences against the lexical rules.

3. Token Generation: when a pattern matches, a token object is created with:
  • Token type (one of the 27 constants)
  • Text content
  • Line and column position
  • Channel (DEFAULT or HIDDEN)

4. Filtering: rules marked -> skip (whitespace and comments in this grammar) discard their matches, so those characters never reach the parser.

5. Stream Output: the remaining tokens are emitted to the token stream for parser consumption.
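The information attached in step 3 can be sketched with a small dataclass. This is an illustration of the concepts only; the field names mirror the ideas, not the exact antlr4 runtime API:

```python
from dataclasses import dataclass

@dataclass
class SketchToken:
    type: int         # one of the 27 token-type constants
    text: str         # the matched characters
    line: int         # 1-based line of the first character
    column: int       # 0-based column of the first character
    channel: int = 0  # 0 = DEFAULT_TOKEN_CHANNEL, 1 = HIDDEN

# Compute line/column for the 'y' on the second line, the way a lexer
# tracks position while scanning.
src = "int x = 10;\ny = x;"
idx = src.index("y")
line = src.count("\n", 0, idx) + 1
column = idx - (src.rfind("\n", 0, idx) + 1)
tok = SketchToken(type=24, text="y", line=line, column=column)  # 24 = ID
print(tok.line, tok.column)  # 2 0
```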

Tokenization Examples

Example 1: Variable Declaration

Input:
int x = 10;
Token Stream:
TIPO('int')  ID('x')  ASIGNACION('=')  NUMERO('10')  PUNTO_COMA(';')

Example 2: Arithmetic Expression

Input:
y = (a + b) * 2;
Token Stream:
ID('y')  ASIGNACION('=')  PAR_IZQ('(')  ID('a')  SUMA('+')  ID('b')  
PAR_DER(')')  MULT('*')  NUMERO('2')  PUNTO_COMA(';')

Example 3: Conditional Statement

Input:
if (x > 5) { y = x * 2; }
Token Stream:
SI('if')  PAR_IZQ('(')  ID('x')  MAYOR('>')  NUMERO('5')  PAR_DER(')')  
LLAVE_IZQ('{')  ID('y')  ASIGNACION('=')  ID('x')  MULT('*')  NUMERO('2')  
PUNTO_COMA(';')  LLAVE_DER('}')

Example 4: Comments and Whitespace

Input:
// This is a comment
int x = 10;  // Another comment
Token Stream:
TIPO('int')  ID('x')  ASIGNACION('=')  NUMERO('10')  PUNTO_COMA(';')
Comments and whitespace are automatically filtered out. They match rules with -> skip directives in the grammar.

Special Token Patterns

Identifier Pattern

From Expresiones.g:62:
ID : [a-zA-Z][a-zA-Z0-9]* ;
  • Must start with a letter (uppercase or lowercase)
  • Can contain letters and digits
  • Examples: x, variable1, myVar, count2
  • Invalid: 1var (starts with digit), my-var (contains hyphen)
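The rule translates directly to a regular expression, which makes the valid/invalid cases easy to check. Note that, unlike many languages, this grammar does not allow underscores in identifiers:

```python
import re

# Regex equivalent of the ID rule.
ID_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

for name in ["x", "variable1", "myVar", "1var", "my-var"]:
    verdict = "valid" if ID_RE.fullmatch(name) else "invalid"
    print(name, verdict)
# x valid, variable1 valid, myVar valid, 1var invalid, my-var invalid
```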

Number Pattern

From Expresiones.g:63:
NUMERO : [0-9]+ ('.' [0-9]+)? ;
  • Integer: one or more digits
  • Float: digits, followed by decimal point, followed by one or more digits
  • Examples: 42, 3.14, 0.5, 100
  • Invalid: .5 (no leading digit), 5. (no trailing digits)
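The same check works for NUMERO. One caveat: inside the running lexer, maximal munch means 5. is not rejected as a whole; it lexes as NUMERO('5') and the stray '.' then fails to match any rule:

```python
import re

# Regex equivalent of the NUMERO rule.
NUMERO_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?")

for text in ["42", "3.14", "0.5", ".5", "5."]:
    print(text, bool(NUMERO_RE.fullmatch(text)))
# 42 True, 3.14 True, 0.5 True, .5 False, 5. False
```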

Comment Pattern

From Expresiones.g:65:
COMENTARIO : '//' ~[\n\r]* -> skip ;
  • Single-line comments only
  • Start with //
  • Continue until end of line
  • Automatically skipped by lexer
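The skip behavior can be approximated outside the lexer by deleting everything a COMENTARIO match would cover; the pattern stops at the line break, so the rest of the line survives:

```python
import re

# Regex equivalent of the COMENTARIO rule; sub() deletes each match,
# mimicking the -> skip directive.
COMENTARIO_RE = re.compile(r"//[^\n\r]*")

src = "int x = 10;  // Another comment\ny = x;"
print(COMENTARIO_RE.sub("", src))
```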

Lexer Constructor

From ExpresionesLexer.py:133-138:
def __init__(self, input=None, output:TextIO = sys.stdout):
    super().__init__(input, output)
    self.checkVersion("4.13.1")
    self._interp = LexerATNSimulator(self, self.atn, self.decisionsToDFA, PredictionContextCache())
    self._actions = None
    self._predicates = None
  • input: Character stream to tokenize
  • output: Output stream for lexer messages
  • checkVersion: Ensures ANTLR runtime version matches generator version
  • _interp: ATN simulator that executes the lexical rules

Error Handling

The lexer automatically handles lexical errors when it encounters:
  • Invalid characters not matching any rule
  • Malformed number literals
  • Unexpected EOF
Errors are reported with line and column information through the ANTLR error listener mechanism.
For custom error handling, implement a custom error listener and attach it to the lexer instance.
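A minimal sketch of such a listener, assuming the standard callback signature of ANTLR's Python runtime. In a real setup the class would subclass ErrorListener from antlr4.error.ErrorListener and be attached with lexer.removeErrorListeners() followed by lexer.addErrorListener(listener); the plain class below just demonstrates the callback shape:

```python
class CollectingErrorListener:
    """Collects lexical errors instead of printing them to stderr."""

    def __init__(self):
        self.errors = []

    # Signature matches the syntaxError callback of ANTLR's Python runtime.
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.errors.append(f"line {line}:{column} {msg}")

listener = CollectingErrorListener()
# Simulate the callback the runtime would make on an invalid character.
listener.syntaxError(None, None, 1, 4, "token recognition error at: '@'", None)
print(listener.errors)  # ["line 1:4 token recognition error at: '@'"]
```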
