
Overview

The ExpresionesLexer class is automatically generated by ANTLR 4.13.1 from the grammar file Expresiones.g. It performs lexical analysis by converting raw source code into a stream of tokens that the parser can understand.
This class is generated code. Do not modify ExpresionesLexer.py directly. Instead, update the lexical rules in Expresiones.g and regenerate using ANTLR.

Class Definition

class ExpresionesLexer(Lexer):
    atn = ATNDeserializer().deserialize(serializedATN())
    decisionsToDFA = [ DFA(ds, i) for i, ds in enumerate(atn.decisionToState) ]
Location: ~/workspace/source/ExpresionesLexer.py:75

Token Type Constants

The lexer defines 27 token type constants used throughout the compilation process:

Keywords

PROGRAMA = 1   # 'program'
SI = 2         # 'if'
SINO = 3       # 'else'
TIPO = 4       # 'int' | 'float' | 'bool'
From Expresiones.g:34-37:
PROGRAMA : 'program' ;
SI       : 'if' ;
SINO     : 'else' ;
TIPO     : 'int' | 'float' | 'bool' ;

Delimiters

LLAVE_IZQ = 5   # '{'
LLAVE_DER = 6   # '}'
PAR_IZQ = 7     # '('
PAR_DER = 8     # ')'
PUNTO_COMA = 9  # ';'
From Expresiones.g:39-43:
LLAVE_IZQ : '{' ; 
LLAVE_DER : '}' ;
PAR_IZQ   : '(' ;
PAR_DER   : ')' ;
PUNTO_COMA: ';' ;

Operators

Arithmetic Operators

ASIGNACION = 10  # '='
SUMA = 11        # '+'
RESTA = 12       # '-'
MULT = 13        # '*'
DIV = 14         # '/'
From Expresiones.g:44-49:
ASIGNACION: '=' ; 
SUMA  : '+' ; 
RESTA : '-' ;
MULT  : '*' ;
DIV   : '/' ;

Relational Operators

MAYOR = 15        # '>'
MENOR = 16        # '<'
IGUAL = 17        # '=='
DIFERENTE = 18    # '!=' | '<>'
MAYOR_IGUAL = 19  # '>='
MENOR_IGUAL = 20  # '<='
From Expresiones.g:51-56:
MAYOR       : '>' ;
MENOR       : '<' ;
IGUAL       : '==' ;
DIFERENTE   : '!=' | '<>' ;
MAYOR_IGUAL : '>=' ;
MENOR_IGUAL : '<=' ;
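These rules interact with ANTLR's maximal-munch behavior: the lexer always prefers the longest match, so `==` becomes a single IGUAL token rather than two ASIGNACION tokens, and both spellings of DIFERENTE are accepted. A minimal pure-Python sketch of the same longest-first idea (ordering longer literals before their prefixes in a regex alternation; this snippet is illustrative and not part of the generated lexer):

```python
import re

# Longer operators are listed before their one-character prefixes so the
# alternation mimics maximal munch: '==' wins over '=', '<>' over '<'.
OPERATOR_RE = re.compile(r"==|!=|<>|>=|<=|=|\+|-|\*|/|>|<")

print(OPERATOR_RE.findall("x == y = z != w <> v"))
# ['==', '=', '!=', '<>']
```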

Logical Operators

Y_LOGICO = 21   # '&&'
O_LOGICO = 22   # '||'
NO_LOGICO = 23  # '!'
From Expresiones.g:58-60:
Y_LOGICO  : '&&' ;
O_LOGICO  : '||' ;
NO_LOGICO : '!' ;

Identifiers and Literals

ID = 24      # [a-zA-Z][a-zA-Z0-9]*
NUMERO = 25  # [0-9]+ ('.' [0-9]+)?
WS = 26      # [ \t\r\n]+ -> skip
COMENTARIO = 27  # '//' ~[\n\r]* -> skip
From Expresiones.g:62-65:
ID     : [a-zA-Z][a-zA-Z0-9]* ;
NUMERO : [0-9]+ ('.' [0-9]+)? ;
WS     : [ \t\r\n]+ -> skip ;
COMENTARIO : '//' ~[\n\r]* -> skip ;
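Taken together, the rules above define the full token set. A hypothetical pure-Python approximation can be useful for experimenting without the ANTLR runtime; note that `Token`, `TOKEN_SPEC`, and `tokenize` below do not exist in ExpresionesLexer.py. Alternatives are ordered so keywords win over ID and longer operators win over their prefixes, mirroring ANTLR's rule-order and maximal-munch behavior:

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    type: str
    text: str

TOKEN_SPEC = [
    ("COMENTARIO",  r"//[^\n\r]*"),           # must precede DIV
    ("WS",          r"[ \t\r\n]+"),
    ("PROGRAMA",    r"program\b"),
    ("SI",          r"if\b"),
    ("SINO",        r"else\b"),
    ("TIPO",        r"(?:int|float|bool)\b"),
    ("NUMERO",      r"[0-9]+(?:\.[0-9]+)?"),
    ("ID",          r"[a-zA-Z][a-zA-Z0-9]*"),
    ("IGUAL",       r"=="),                   # before ASIGNACION
    ("DIFERENTE",   r"!=|<>"),                # before NO_LOGICO and MENOR
    ("MAYOR_IGUAL", r">="),
    ("MENOR_IGUAL", r"<="),
    ("Y_LOGICO",    r"&&"),
    ("O_LOGICO",    r"\|\|"),
    ("ASIGNACION",  r"="),
    ("MAYOR",       r">"),
    ("MENOR",       r"<"),
    ("NO_LOGICO",   r"!"),
    ("SUMA",        r"\+"),
    ("RESTA",       r"-"),
    ("MULT",        r"\*"),
    ("DIV",         r"/"),
    ("LLAVE_IZQ",   r"\{"),
    ("LLAVE_DER",   r"\}"),
    ("PAR_IZQ",     r"\("),
    ("PAR_DER",     r"\)"),
    ("PUNTO_COMA",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(src: str) -> list[Token]:
    # Unlike the real lexer, this sketch silently skips unrecognized
    # characters instead of reporting a lexical error.
    return [Token(m.lastgroup, m.group())
            for m in MASTER.finditer(src)
            if m.lastgroup not in ("WS", "COMENTARIO")]  # -> skip

print(tokenize("int x = 10;"))
# [Token(type='TIPO', text='int'), Token(type='ID', text='x'),
#  Token(type='ASIGNACION', text='='), Token(type='NUMERO', text='10'),
#  Token(type='PUNTO_COMA', text=';')]
```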

Lexer Configuration

The lexer maintains several configuration arrays defined in ExpresionesLexer.py:109-131:

Channel Names

channelNames = [ u"DEFAULT_TOKEN_CHANNEL", u"HIDDEN" ]
Tokens are emitted on the default channel for parser consumption. The HIDDEN channel is declared in every generated lexer, but this grammar routes nothing to it: whitespace and comments use -> skip and are discarded outright.

Mode Names

modeNames = [ "DEFAULT_MODE" ]
The lexer operates in a single mode. More complex languages may define multiple lexer modes.

Literal Names

literalNames = [ "<INVALID>",
        "'program'", "'if'", "'else'", "'{'", "'}'", "'('", "')'", "';'", 
        "'='", "'+'", "'-'", "'*'", "'/'", "'>'", "'<'", "'=='", "'>='", 
        "'<='", "'&&'", "'||'", "'!'" ]
Literal representations of fixed-string tokens, used in error messages and debugging output. Token types defined by multiple alternatives, such as TIPO and DIFERENTE, have no single literal form.

Symbolic Names

symbolicNames = [ "<INVALID>",
        "PROGRAMA", "SI", "SINO", "TIPO", "LLAVE_IZQ", "LLAVE_DER", 
        "PAR_IZQ", "PAR_DER", "PUNTO_COMA", "ASIGNACION", "SUMA", "RESTA", 
        "MULT", "DIV", "MAYOR", "MENOR", "IGUAL", "DIFERENTE", "MAYOR_IGUAL", 
        "MENOR_IGUAL", "Y_LOGICO", "O_LOGICO", "NO_LOGICO", "ID", "NUMERO", 
        "WS", "COMENTARIO" ]
Symbolic names used in the grammar and referenced by the parser.

Rule Names

ruleNames = [ "PROGRAMA", "SI", "SINO", "TIPO", "LLAVE_IZQ", "LLAVE_DER", 
              "PAR_IZQ", "PAR_DER", "PUNTO_COMA", "ASIGNACION", "SUMA", 
              "RESTA", "MULT", "DIV", "MAYOR", "MENOR", "IGUAL", "DIFERENTE", 
              "MAYOR_IGUAL", "MENOR_IGUAL", "Y_LOGICO", "O_LOGICO", 
              "NO_LOGICO", "ID", "NUMERO", "WS", "COMENTARIO" ]

Tokenization Process

1. Input Reception: the lexer receives raw source code as a character stream.

2. Pattern Matching: using the serialized ATN (augmented transition network), the lexer matches character sequences against the lexical rules.

3. Token Generation: when a pattern matches, a token object is created with:
  • Token type (one of the 27 constants)
  • Text content
  • Line and column position
  • Channel (DEFAULT or HIDDEN)

4. Filtering: rules marked -> skip (whitespace and comments in this grammar) discard their matches, so those characters never reach the parser.

5. Stream Output: the remaining tokens are emitted to the token stream for parser consumption.
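The information attached in step 3 can be sketched with a small dataclass. This is an illustration of the concepts only; the field names mirror the ideas, not the exact antlr4 runtime API:

```python
from dataclasses import dataclass

@dataclass
class SketchToken:
    type: int         # one of the 27 token-type constants
    text: str         # the matched characters
    line: int         # 1-based line of the first character
    column: int       # 0-based column of the first character
    channel: int = 0  # 0 = DEFAULT_TOKEN_CHANNEL, 1 = HIDDEN

# Compute line/column for the 'y' on the second line, the way a lexer
# tracks position while scanning.
src = "int x = 10;\ny = x;"
idx = src.index("y")
line = src.count("\n", 0, idx) + 1
column = idx - (src.rfind("\n", 0, idx) + 1)
tok = SketchToken(type=24, text="y", line=line, column=column)  # 24 = ID
print(tok.line, tok.column)  # 2 0
```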

Tokenization Examples

Example 1: Variable Declaration

Input:
int x = 10;
Token Stream:
TIPO('int')  ID('x')  ASIGNACION('=')  NUMERO('10')  PUNTO_COMA(';')

Example 2: Arithmetic Expression

Input:
y = (a + b) * 2;
Token Stream:
ID('y')  ASIGNACION('=')  PAR_IZQ('(')  ID('a')  SUMA('+')  ID('b')  
PAR_DER(')')  MULT('*')  NUMERO('2')  PUNTO_COMA(';')

Example 3: Conditional Statement

Input:
if (x > 5) { y = x * 2; }
Token Stream:
SI('if')  PAR_IZQ('(')  ID('x')  MAYOR('>')  NUMERO('5')  PAR_DER(')')  
LLAVE_IZQ('{')  ID('y')  ASIGNACION('=')  ID('x')  MULT('*')  NUMERO('2')  
PUNTO_COMA(';')  LLAVE_DER('}')

Example 4: Comments and Whitespace

Input:
// This is a comment
int x = 10;  // Another comment
Token Stream:
TIPO('int')  ID('x')  ASIGNACION('=')  NUMERO('10')  PUNTO_COMA(';')
Comments and whitespace are automatically filtered out. They match rules with -> skip directives in the grammar.

Special Token Patterns

Identifier Pattern

From Expresiones.g:62:
ID : [a-zA-Z][a-zA-Z0-9]* ;
  • Must start with a letter (uppercase or lowercase)
  • Can contain letters and digits
  • Examples: x, variable1, myVar, count2
  • Invalid: 1var (starts with digit), my-var (contains hyphen)
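The rule translates directly to a regular expression, which makes the valid/invalid cases easy to check. Note that, unlike many languages, this grammar does not allow underscores in identifiers:

```python
import re

# Regex equivalent of the ID rule.
ID_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

for name in ["x", "variable1", "myVar", "1var", "my-var"]:
    verdict = "valid" if ID_RE.fullmatch(name) else "invalid"
    print(name, verdict)
# x valid, variable1 valid, myVar valid, 1var invalid, my-var invalid
```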

Number Pattern

From Expresiones.g:63:
NUMERO : [0-9]+ ('.' [0-9]+)? ;
  • Integer: one or more digits
  • Float: digits, followed by decimal point, followed by one or more digits
  • Examples: 42, 3.14, 0.5, 100
  • Invalid: .5 (no leading digit), 5. (no trailing digits)
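The same check works for NUMERO. One caveat: inside the running lexer, maximal munch means 5. is not rejected as a whole; it lexes as NUMERO('5') and the stray '.' then fails to match any rule:

```python
import re

# Regex equivalent of the NUMERO rule.
NUMERO_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?")

for text in ["42", "3.14", "0.5", ".5", "5."]:
    print(text, bool(NUMERO_RE.fullmatch(text)))
# 42 True, 3.14 True, 0.5 True, .5 False, 5. False
```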

Comment Pattern

From Expresiones.g:65:
COMENTARIO : '//' ~[\n\r]* -> skip ;
  • Single-line comments only
  • Start with //
  • Continue until end of line
  • Automatically skipped by lexer
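The skip behavior can be approximated outside the lexer by deleting everything a COMENTARIO match would cover; the pattern stops at the line break, so the rest of the line survives:

```python
import re

# Regex equivalent of the COMENTARIO rule; sub() deletes each match,
# mimicking the -> skip directive.
COMENTARIO_RE = re.compile(r"//[^\n\r]*")

src = "int x = 10;  // Another comment\ny = x;"
print(COMENTARIO_RE.sub("", src))
```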

Lexer Constructor

From ExpresionesLexer.py:133-138:
def __init__(self, input=None, output:TextIO = sys.stdout):
    super().__init__(input, output)
    self.checkVersion("4.13.1")
    self._interp = LexerATNSimulator(self, self.atn, self.decisionsToDFA, PredictionContextCache())
    self._actions = None
    self._predicates = None
  • input: Character stream to tokenize
  • output: Output stream for lexer messages
  • checkVersion: Ensures ANTLR runtime version matches generator version
  • _interp: ATN simulator that executes the lexical rules

Error Handling

The lexer automatically handles lexical errors when it encounters:
  • Invalid characters not matching any rule
  • Malformed number literals
  • Unexpected EOF
Errors are reported with line and column information through the ANTLR error listener mechanism.
For custom error handling, implement a custom error listener and attach it to the lexer instance.
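A minimal sketch of such a listener, assuming the standard callback signature of ANTLR's Python runtime. In a real setup the class would subclass ErrorListener from antlr4.error.ErrorListener and be attached with lexer.removeErrorListeners() followed by lexer.addErrorListener(listener); the plain class below just demonstrates the callback shape:

```python
class CollectingErrorListener:
    """Collects lexical errors instead of printing them to stderr."""

    def __init__(self):
        self.errors = []

    # Signature matches the syntaxError callback of ANTLR's Python runtime.
    def syntaxError(self, recognizer, offendingSymbol, line, column, msg, e):
        self.errors.append(f"line {line}:{column} {msg}")

listener = CollectingErrorListener()
# Simulate the callback the runtime would make on an invalid character.
listener.syntaxError(None, None, 1, 4, "token recognition error at: '@'", None)
print(listener.errors)  # ["line 1:4 token recognition error at: '@'"]
```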
