Module lexer

Performs lexing of Scintilla documents.

Usage

Writing a Dynamic Lexer (somewhat brief tutorial):
  This may seem like a daunting task, judging by the length of this document,
  but the process is actually fairly straightforward. I just need to
  include every little detail in order for you to be able to utilize
  everything I have provided for your lexer development.

  In order to set up a dynamic lexer, create a Lua script with your lexer's
  name as the filename followed by '.lua' in the /lexers/ directory. Then at
  the top of your lexer, the following must appear:
    module(..., package.seeall)
  Lexers are meant to be modules, not to be loaded in the global namespace.
  The ... parameter means this module assumes the name it is being 'require'd
  with. So doing
    require 'ruby'
  means the lexer will be the table 'ruby' in the global namespace. (Useful
  for a 'require'd lexer to check if another particular lexer has been
  loaded.)

  Now you'll need a way to style patterns of text. This is accomplished
  through tokens.

  Tokens:
    Each lexer is composed of a series of tokens. Each token contains a state
    identifier and an associated LPeg pattern. Generally the identifier
    should be prefixed or otherwise individualized in some way so as not to
    create conflicts with other lexer states if either your lexer is to be
    embedded in another, or another lexer is to be embedded in yours. You can
    create a token with a specified pattern by calling the 'token' function.
      e.g. local comment = token('comment', comment_pattern)
           local variable = token('my_variable', var_pattern)
    Note that 'comment' is part of the default Types and Styles, so it will
    be colored with the same style as default comments. If you wish for your
    comments to be different, you should create a token with a unique id and
    add_style() your style in your 'LoadStyles' function (discussed later).

    What are the default Types and Styles? You can look in
    /lexers/lexer.lua's DefaultTypesAndStyles function. Each lexer has a
    Types and Styles table. They initially contain types and styles that are
    common to nearly every lexer, saving you the trouble of creating the
    same states and styles for every lexer you write. You can of course
    redefine them in your own lexer if you wish, but they must be redefined
    in the LoadStyles function described later.
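
  To make the token idea concrete, here is a minimal sketch that runs outside
  SciTE with only the LPeg library installed. The local 'token' below is a
  hypothetical stand-in, not the implementation in /lexers/lexer.lua; it
  assumes a token captures its identifier and the position where the match
  ends. Both patterns are made up for illustration.

  ```lua
  local lpeg = require('lpeg')
  local P, R, Ct, Cc, Cp = lpeg.P, lpeg.R, lpeg.Ct, lpeg.Cc, lpeg.Cp

  -- Hypothetical stand-in for lexer.lua's 'token' (an assumption): capture
  -- a table holding the identifier and the position just past the match.
  local function token(id, patt)
    return Ct(Cc(id) * patt * Cp())
  end

  -- Illustrative patterns only: '--' line comments and '$name' variables.
  local comment_pattern = P('--') * (1 - P('\n'))^0
  local var_pattern = P('$') * (R('az', 'AZ', '09') + '_')^1

  local comment = token('comment', comment_pattern)
  local variable = token('my_variable', var_pattern)
  local any_char = token('default', P(1))

  -- Try each token in turn, repeatedly, and collect the captures.
  local tokens = Ct((comment + variable + any_char)^0)
  local result = tokens:match('$x -- hi')

  for _, capture in ipairs(result) do
    print(capture[1], capture[2])
  end
  ```

  Each printed pair is an identifier and the position to style up to, which
  is the kind of result the RunLexer description further down talks about.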

  So now you can create patterns and give them identifiers in a token. Next
  you need the simple patterns that appear in nearly every lexer, plus styles
  and colors. Rather than leaving this tedious work to you, these have
  already been defined and are available globally (from lexer.lua):
    Patterns:
      any - matches any single character
      ascii - matches any ASCII character (0..127)
      extend - matches any ASCII extended character (0..255)
      alpha - matches any alphabetic character (A-Z, a-z)
      digit - matches any digit (0-9)
      alnum - matches any alphanumeric character (A-Z, a-z, 0-9)
      lower - matches any lowercase character (a-z)
      upper - matches any uppercase character (A-Z)
      xdigit - matches any hexadecimal digit (0-9, A-F, a-f)
      cntrl - matches any control character (0..31)
      graph - matches any graphical character (! to ~)
      print - matches any printable character (space to ~)
      punct - matches any punctuation character not alphanumeric (! to /,
        : to @, [ to `, { to ~)
      space - matches any whitespace character (\t, \v, \f, \n, \r, space)
      newline - matches any newline characters
      nonnewline - matches any non-newline character
      nonnewline_esc - matches any non-newline character, or a newline
        escaped with '\\'
      dec_num - matches a decimal number
      hex_num - matches a hexadecimal number
      oct_num - matches an octal number
      integer - matches a decimal, hexadecimal, or octal number
      float - matches a floating point number
      word - matches a typical word starting with a letter or underscore and
        then any alphanumeric or underscore characters
      any_char - token defined as token('default', any)
    Colors via the 'colors' table:
      red, yellow, green, blue, teal, white, black
    Styles:
      style_nothing, style_char, style_comment,
      style_definition, style_error, style_keyword,
      style_number, style_operator, style_string,
      style_preproc, style_tag, style_identifier
    Note: colors and styles are identical to those defined in
      SciTEGlobal.properties.

  Okay, so at this time you're probably thinking about keywords and keyword
  lists that were provided in SciTE properties files because you surely will
  want to style those! Unfortunately there is no way to read those keywords,
  but there are a couple of functions that will make your life easier. Rather
  than creating a lpeg.P('keyword1') + lpeg.P('keyword2') + ... pattern for
  keywords, you can use a combination of the 'word_list' and 'word_match'
  functions.
    word_list(words)
      Creates a word hash from a given table of [string] words.
        e.g. local keywords = word_list{ 'foo', 'bar', 'baz' }
    word_match(word_list[, chars, case_insensitive])
      Creates an LPeg pattern that checks to see if the current word is in
      word_list.
        e.g. local keyword = word_match(keywords)
      where keywords is defined in the previous example.
      Optional second parameter chars is a string of characters that count as
      word characters. Default word characters are alpha-numeric or an
      underscore (_). In HTML and CSS for example, the hyphen (-) is
      considered a word character, so '-' would be the value of the second
      argument.
      Optional third [boolean] parameter is whether or not words are matched
      case insensitively.
  These functions make sense to have because the maximum pattern size for a
  lexer is SHRT_MAX - 10, or generally 32757 elements. If an lpeg.P was
  created for each keyword in a language, this limit would probably be
  reached -- especially for embedded languages. Also, it would be SLOW to
  have a pattern for every keyword. 'word_match' gets the identifier once and
  checks if it exists in word_list using a hash, which is very fast.
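
  The hash lookup described above can be sketched in plain Lua. Both
  functions below are hypothetical stand-ins written for this illustration
  (the real 'word_match' returns an LPeg pattern, which this sketch does
  not). The sketch also assumes that a case-insensitive list is stored in
  lowercase.

  ```lua
  -- Hypothetical stand-in for 'word_list' (an assumption): turn an array
  -- of words into a set keyed by word for constant-time lookup.
  local function word_list(words)
    local list = {}
    for _, word in ipairs(words) do list[word] = true end
    return list
  end

  -- Hypothetical helper showing the lookup 'word_match' is described as
  -- doing: read one word starting at init, then check the hash once.
  -- Returns the index just past the word, or nil if it is not in the list.
  local function match_word(list, input, init, chars, case_insensitive)
    -- Escape punctuation so extra characters are safe in a character class.
    local extra = (chars or ''):gsub('%p', '%%%0')
    local word = input:match('^[%w_' .. extra .. ']+', init)
    if not word then return nil end
    if case_insensitive then word = word:lower() end
    return list[word] and init + #word or nil
  end

  local keywords = word_list{ 'foo', 'bar', 'baz' }
  print(match_word(keywords, 'foo = 1', 1))
  print(match_word(word_list{ 'font-size' }, 'font-size: 1', 1, '-'))
  ```

  The word is read once and looked up once, so the cost does not grow with
  the number of keywords in the list.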

  When you were creating your tokens, you gave them identifier states. For
  the identifier states that aren't part of the default Types and Styles,
  styles will need to be defined for them. For this, a 'style' function is
  available. Its only parameter is a table which can contain the following
  fields:
    font - font name (string)
    size - font size (integer)
    bold - bold font (boolean)
    italic - italic font (boolean)
    underline - underline text (boolean)
    fore - text foreground color (integer)*
    back - text background color (integer)*
    eolfilled - use background color for entire line, not stopping at a
      newline character (boolean)
    characterset - ?
    case - the default text case; 0 for normal case, 1 for uppercase, 2 for
      lowercase (integer)
    visible - text is visible or not (boolean)
    changeable - text is changeable or not (boolean)
    hotspot - text is hotspot or not (boolean)
    ---
    * Use the 'color' function to create appropriate integer values from hex
      colors (#RRGGBB). Arguments are red, green, blue hexadecimal values as
      STRINGS. e.g. red = color('FF', '00', '00')
  Styles can be simple, like:
    style_bold = style { bold = true }
  or they can be composed of existing styles with added
    style_bold_italic = style_bold..{ italic = true }
  or modified fields
    style_normal = style_bold..{ bold = false }
  Note in both cases that style_bold is left unchanged.
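
  One way the '..' composition could work is a __concat metamethod that
  copies fields into a new table. The 'style' and 'color' functions below
  are hypothetical stand-ins, not the code in /lexers/lexer.lua, and 'color'
  assumes the integer follows Scintilla's 0xBBGGRR colour layout.

  ```lua
  -- Hypothetical stand-ins (assumptions), not the code in lexer.lua.
  local style_mt = {}

  local function style(fields)
    return setmetatable(fields, style_mt)
  end

  -- 'a .. b' builds a new style: a's fields, overridden by b's fields.
  -- Neither operand is modified.
  style_mt.__concat = function(base, overrides)
    local new = {}
    for k, v in pairs(base) do new[k] = v end
    for k, v in pairs(overrides) do new[k] = v end
    return style(new)
  end

  -- Scintilla stores colours as 0xBBGGRR, so the hex pairs are reversed.
  local function color(r, g, b)
    return tonumber(b .. g .. r, 16)
  end

  local style_bold = style { bold = true }
  local style_bold_italic = style_bold..{ italic = true }
  local style_normal = style_bold..{ bold = false }
  local style_error = style_bold..{ fore = color('FF', '00', '00') }
  ```

  Because every composition returns a fresh table, style_bold still has only
  its bold field after the three derived styles are created.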

  Now that you have styles defined for your identifiers, it's time to add
  them to Scintilla. This is done in a global LoadStyles function. LoadStyles
  is called when the lexer has been initialized and Scintilla is ready to set
  up the lexer's styles. The 'add_style' function provides a way to easily
  define your styles. The first parameter is your token identifier, and the
  second is the style you created for it. For example:
    function LoadStyles()
      add_style('variable', style_variable)
      add_style('function', style_function)
    end
  'add_style' returns the style number of the identifier added. This is
  useful for associating a particular style with the number returned by the
  function GetStyleAt (see below) or SciTE's editor.StyleAt.

  Finally! All your tokens have been created. All that is left to do is add
  them to your lexer. This is done in a global LoadTokens function.
  LoadTokens is called when the lexer has been initialized and the lexer
  library is ready to create the LPeg table capture that will lex any input
  given. The 'add_token' function provides a way to easily define your
  tokens. The first parameter is your lexer, the second is your token
  identifier, and the third is the pattern returned by the 'token' function.
  For example:
      function LoadTokens()
        add_token(mylexer, 'comment', comment)
        add_token(mylexer, 'variable', variable)
      end
    where comment and variable have been defined in an above example as the
    returns of calls to 'token'.
  Keep in mind order matters. If the match to the first token added fails,
  the next token is tried, then the next, etc. If you want one token to match
  before another, move its declaration before the latter's. Not having
  tokens in proper order can be tricky to debug if something goes wrong.
  Ah, you have all your tokens added, but what if some input does not match?
  This is where a global 'any_char' variable comes in. It is defined as
    any_char = token('default', lpeg.P(1))
  so that any pattern you hadn't accounted for is styled (one character
  only). You can of course override any_char to display something you can
  recognize if you are debugging your lexer or you count unmatched patterns
  as syntax errors. Now:
    add_token(mylexer, 'any_char', any_char)
  'add_token' adds your identifier and pattern to a TokenPatterns table. This
  table is available to any other lexer as a means of accessing or modifying
  your lexer's tokens. This is especially useful for embedded lexer
  functionality. See the supplemental section Writing a lexer that will embed
  in another lexer for more details.
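
  Why order matters can be shown with a small LPeg sketch. It assumes that
  the tokens end up combined as an ordered choice, which is what "if the
  first fails, the next is tried" amounts to. The 'token' and 'lex' functions
  are hypothetical stand-ins for this illustration only.

  ```lua
  local lpeg = require('lpeg')
  local P, R, Ct, Cc, Cp = lpeg.P, lpeg.R, lpeg.Ct, lpeg.Cc, lpeg.Cp

  -- Hypothetical stand-in for 'token' (an assumption, as is 'lex' below).
  local function token(id, patt)
    return Ct(Cc(id) * patt * Cp())
  end

  local word_char = R('az', 'AZ', '09') + '_'
  -- The keyword 'end', not followed by another word character.
  local keyword = token('keyword', P('end') * -word_char)
  local identifier = token('identifier', (R('az', 'AZ') + '_') * word_char^0)
  local any_char = token('default', P(1))

  -- Combine tokens as an ordered choice in the order given, then lex input.
  local function lex(order, input)
    local choice = order[1]
    for i = 2, #order do choice = choice + order[i] end
    return Ct(choice^0):match(input)
  end

  local good = lex({ keyword, identifier, any_char }, 'end')
  local bad = lex({ identifier, keyword, any_char }, 'end')
  print(good[1][1], bad[1][1])
  ```

  With the identifier token first, 'end' is consumed as an identifier and the
  keyword token never gets a chance to match it.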

  The only thing left to do at this point is to lex the document with your
  LPeg tokens. If your approach is to lex the entire document (not line-by-
  line), you're done! /lexers/lexer.lua realizes this is what you intend and
  does it automatically for you. If you wanted to have a line-by-line lexer
  instead of one that lexes the entire document at once, set a global
  LexByLine variable to true and you're finished. You can lex your own way if
  you'd like by creating a global Lex function that returns a table whose
  indices contain style numbers and positions to style to. The LPeg table
  capture for a lexer is defined as Tokens and the pattern to match a single
  token is defined as Token.

  Because you have your styles and colors defined in the lexer itself, you
  may be wondering if your SciTE properties files can still be used. The
  answer is absolutely! All styling information is ignored though.

  Optional -- Code Folding:
    It is sometimes convenient to "fold", or not show blocks of code when
    editing, whether they be functions, classes, comments, etc. The basic
    idea behind implementing a folder is to iterate line by line through the
    document, assigning a fold level to each line. Lines to be "hidden" have
    a higher fold level than lines that are the 'fold headers'. This means
    that when you click the 'fold header', it folds all lines below that have
    a higher fold level than it.
    In order to implement a folder, define the following global function in
    your lexer:
      Fold(input, start_pos, start_line, start_level)
        Fold is called when Scintilla is ready to fold your document.
        Parameters are: input, which is the text to fold; start_pos, the
        current position in the buffer of the text (used for obtaining style
        information from the document); start_line, the line number the text
        starts at; start_level, the fold level of the text at start_line.

    The following Scintilla fold constants are also available (see
    Scintilla's documentation for more detail on what these flags mean):
      SC_FOLDLEVELBASE
      SC_FOLDLEVELWHITEFLAG
      SC_FOLDLEVELHEADERFLAG
      SC_FOLDLEVELBOXHEADERFLAG
      SC_FOLDLEVELBOXFOOTERFLAG
      SC_FOLDLEVELCONTRACTED
      SC_FOLDLEVELUNINDENT
      SC_FOLDLEVELNUMBERMASK
    An important one to remember is SC_FOLDLEVELBASE which is the value
    you'll add your fold levels to if you aren't using the previous line's
    fold level at all (e.g. folding by indent level).

    Now you'll want to iterate over each line, setting fold levels as well as
    keeping track of the line number you're on, the current position at the
    end of each line, and the fold level of the previous line. As an example:
      local current_pos, current_line = start_pos, start_line
      local prev_level = start_level
      for line, data in input:gmatch('((.-)\r?\n)') do
        local current_level = prev_level
        if #data > 0 then -- not an empty line
          local header
          -- code to determine if this will be a header level
          if header then
            -- header level flag
            current_level = bit.bor(prev_level, SC_FOLDLEVELHEADERFLAG)
          else
            -- code to determine fold level, and add (+) it to
            -- current_level
            current_level = current_level + ...
          end
        else
          -- empty line flag
          current_level = bit.bor(prev_level, SC_FOLDLEVELWHITEFLAG)
        end
        SetFoldLevel(current_line, current_level)

        -- keep track of necessary buffer information
        prev_level = current_level
        current_line = current_line + 1
        current_pos = current_pos + #line
      end
      -- important: keep current flags on next line
      local flags_next = bit.band(GetFoldLevel(current_line),
        bit.bnot(SC_FOLDLEVELNUMBERMASK))
      SetFoldLevel(current_line, bit.bor(prev_level, flags_next))

    Just copy and paste that last 'important' section to the end of your
    Fold function.

    In order to get or set fold levels for a specific line, the following
    functions are provided:
      GetFoldLevel(line)
        Returns the fold level + SC_FOLDLEVELBASE of line.

      SetFoldLevel(line, level)
        Sets the fold level of line to level (remember to add
        SC_FOLDLEVELBASE to it if you haven't already).

    What is the 'bit.band' and 'bit.bor' stuff about? Well that's where
    bitlib comes in. 'bit' is a global table that contains binary operations.
    Briefly:
      bit.band(b1, b2) performs bitwise AND (&) between b1 and b2
      bit.bor(b1, b2) performs bitwise OR (|) between b1 and b2
      bit.bnot(b1) performs bitwise NOT of b1
      ...
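
  To see what the masking in the 'important' section computes, here is a
  sketch that splits a fold level into its number part and its flag part
  using plain arithmetic, so it runs without bitlib. The constant values are
  those found in Scintilla.h; 'split_level' is a hypothetical helper and
  assumes a non-negative level.

  ```lua
  -- Values as defined in Scintilla.h.
  local SC_FOLDLEVELBASE = 0x400
  local SC_FOLDLEVELHEADERFLAG = 0x2000
  local SC_FOLDLEVELNUMBERMASK = 0x0FFF

  -- Hypothetical helper: the number part equals
  -- bit.band(level, SC_FOLDLEVELNUMBERMASK) and the flag part equals
  -- bit.band(level, bit.bnot(SC_FOLDLEVELNUMBERMASK)).
  local function split_level(level)
    local number = level % (SC_FOLDLEVELNUMBERMASK + 1)
    return number, level - number
  end

  local header_level = SC_FOLDLEVELBASE + 2 + SC_FOLDLEVELHEADERFLAG
  local number, flags = split_level(header_level)
  print(number, flags)
  ```

  The flag part is what the end of the Fold function preserves on the next
  line while replacing the number part.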

    There are additional Lua functions provided to help you fold your
    document:
      GetStyleAt(position)
        Returns the integer style at position.

      GetIndentAmount(line_number)
        Returns the indent amount of line_number (taking into account
        tabsize, tabs or spaces, etc.)

    Note: do not use GetProperty for getting fold options from a .properties
    file because SciTE needs to be compiled to forward those specific
    properties to Scintilla. Instead, provide options that can be set at the
    top of the lexer.

    There is a new 'fold.by.indentation' property where if the 'fold'
    property is set for a lexer, but there is no Fold function available, the
    document is folded by indentation. This is done in /lexers/lexer.lua and
    should serve as an example of folding in this manner.
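
  As a rough idea of what folding by indentation involves, here is a
  self-contained sketch. It is not the implementation in /lexers/lexer.lua.
  'fold_by_indent' is a hypothetical helper that returns a table of levels
  instead of calling SetFoldLevel, counts leading spaces only (it does not
  handle tabs the way GetIndentAmount is described to), and ignores a final
  line that has no newline.

  ```lua
  -- Values as defined in Scintilla.h.
  local SC_FOLDLEVELBASE = 0x400
  local SC_FOLDLEVELWHITEFLAG = 0x1000
  local SC_FOLDLEVELHEADERFLAG = 0x2000

  -- Hypothetical helper: returns a table mapping line index to fold level.
  local function fold_by_indent(input)
    local lines, levels = {}, {}
    for data in input:gmatch('(.-)\r?\n') do lines[#lines + 1] = data end

    -- Pass 1: level from indentation; blank lines inherit the previous
    -- level and get the white flag. Adding a flag is safe here because
    -- that bit is not set yet.
    local prev_level = SC_FOLDLEVELBASE
    for i, data in ipairs(lines) do
      if data:find('%S') then
        levels[i] = SC_FOLDLEVELBASE + #data:match('^ *')
        prev_level = levels[i]
      else
        levels[i] = prev_level + SC_FOLDLEVELWHITEFLAG
      end
    end

    -- Pass 2: a line is a header if the next non-blank line is deeper.
    for i = 1, #lines do
      if levels[i] < SC_FOLDLEVELWHITEFLAG then
        local j = i + 1
        while levels[j] and levels[j] >= SC_FOLDLEVELWHITEFLAG do j = j + 1 end
        if levels[j] and levels[j] > levels[i] then
          levels[i] = levels[i] + SC_FOLDLEVELHEADERFLAG
        end
      end
    end
    return levels
  end

  local levels = fold_by_indent('a\n  b\n\n  c\nd\n')
  for i, level in ipairs(levels) do print(i, level) end
  ```

  In this input only the first line becomes a fold header, since it is the
  only line followed by a more deeply indented one.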

  Congratulations! You have finished writing a dynamic lexer. Now you can
  either create a properties file for it (don't forget to 'import' it in your
  Global or User properties file), or elsewhere define the necessary
    file.patterns.[lexer_name]=[file_patterns]
    lexer.$(file.patterns.[lexer_name])=[lexer_name]
  in order for the lexer to be loaded automatically when a specific file type
  is opened.

  Supplementals:
    Writing a lexer that will have languages embedded in it:
      This is pretty easy. Nothing. That's right. If you've followed the
      rules for creating lexers, no further modifications are necessary.
      If you want to embed languages in the lexer by default:
        1: Load the child lexer module by doing something like:
          local child = require('child_lexer')
        2: Load the child lexer's styles in the LoadStyles function.
          e.g. child.LoadStyles()
        3: Load the child lexer's tokens in the LoadTokens function.
          e.g. child.LoadTokens()
        4: In the parent's LoadTokens function, use 'embed_language' as
          described below.
      The html.lua lexer is a good example.

    Writing a lexer that will embed in another lexer:
      1: Load the parent lexer module that you will embed your child lexer
        into by doing something like:
          local parent = require('parent_lexer')
      2: In the LoadTokens function, create start and end tokens for your
        child lexer. They are tokens that define the start and end of your
        embedded lexer respectively. For example, PHP requires a '<?php' to
        start and a '?>' to end. Then modify your lexer's 'any_char' token
        (or equivalent, via the TokenPatterns table) to a character that does
        not match the end_token. Finally, call the 'make_embeddable'
        function. It accepts 4 parameters: the language to embed, the parent
        language to embed in, the start_token, and the end_token. Here's an
        example:
          local start_token = foo
          local end_token = bar
          child.TokenPatterns.any_char = token('default', 1 - end_token)
          make_embeddable(child, parent, start_token, end_token)
      3: Use the 'embed_language' function:
        embed_language(parent, child[, preproc=false])
          parent is the parent lexer module, child is your lexer module, and
          preproc is an optional [boolean] argument that indicates whether
          this embedded language is a preprocessor language. A preprocessor
          language will have its tokens embedded in each of the parent
          language's embedded languages. (Note the SHRT_MAX limitation may
          come into effect.)
      4: Load the parent lexer's styles in the LoadStyles function.
        e.g. parent.LoadStyles()
      5: Load the parent lexer's tokens in the LoadTokens function.
        e.g. parent.LoadTokens()
      6: If your embedded lexer is a preprocessor language, you may want to
        modify some of parent's tokens to embed your lexer in (e.g. strings).
        You can access them through the parent's TokenPatterns table. Then
        you must rebuild the parent's token patterns by calling
        'rebuild_token' and 'rebuild_tokens' one after the other passing the
        parent lexer as the only parameter. For example:
          parent.TokenPatterns.string = string_with_embedded
          rebuild_token(parent)
          rebuild_tokens(parent)
      7: If your child lexer, not the parent lexer, is being loaded, specify
        that you want the parent's tokens to be used for lexing instead of
        child's. Set a global UseOtherTokens variable to be parent's tokens.
          e.g. UseOtherTokens = parent.Tokens
      The php.lua lexer is a good example.

    Optimization:
      Lexers can usually be optimized for speed by re-arranging tokens so
      that the most common tokens are recognized first. Be careful that by
      putting some tokens in front of others, the latter tokens may not be
      recognized because the former tokens may 'eat' them because they match
      first.

  Effects on SciTE-tools and SciTE-st Lua modules:
    Because most custom styles aren't fixed numbers, both scope-specific
    snippets and key commands need to be tweaked a bit. SCE_* scope constants
    are no longer available. Instead, named keys are scopes in that lexer.
    See /lexers/lexer.lua for default named scopes. Each individual lexer
    uses the 'add_style' function to add additional styles/scopes to it, so
    use the string argument passed as the scope's name.

  Additional Examples:
    See the lexers contained in /lexers/.
    Be sure to see /lexers/lexer.lua for more information too.

  When things aren't working:
    Lexers can be tricky to debug if you do not write them carefully. Errors,
    as well as the output of any print() statements in the lexer itself, are
    printed to STDOUT.

  Limitations:
    Patterns can only be composed of SHRT_MAX - 10, or generally 32757,
    elements. This should be sufficient for most language lexers, however.

Performance:
  The lexer is not quite as efficient as Scintilla's built-in lexers because
  Scintilla uses a variable endStyled to keep track of the last position in
  the document where the syntax is most likely styled correctly so the entire
  document does not need to be lexed each time styling is needed. This lexer
  does away with that and lexes all of the text, but only styles the text
  Scintilla asks it to. This has to be done because of the nature of the
  pattern-based styling.

  Although the entire document must be lexed each time, the operation is
  done in a single pass over the text. A lot of regex-based syntax
  highlighting editors apply each rule to their documents one at a time,
  coloring pieces of text in chunks. That takes more work because generally
  the entire document must be searched through again and again for each
  pattern. I personally think
  the slight sacrifice in performance is worth the phenomenal amount of power
  the dynamic lexer gives to the user, not to mention how easy it is to write
  LPeg lexers.

Disclaimer:
  Because of its dynamic nature, crashes could potentially occur because of
  malformed lexers. In the event that this happens, I CANNOT be liable for
  any damages such as loss of data. You are encouraged, however, to report
  the crash with any information that can produce it, or submit a patch to me
  that fixes the error.

Acknowledgements:
  When Peter Odding posted his original Lua lexer to the Lua mailing list, it
  was just what I was looking for to start making the LPeg lexer I had been
  dreaming of since Roberto announced the library. Until I saw his code, I
  wasn't sure what the best way to go about implementing a lexer was -- at
  least one that Scintilla could utilize. I liked the way he tokenized
  patterns, because it was really easy for me to assign styles to them. I
  also learned much more about LPeg through his amazingly simple, but
  effective script.

Functions

DefaultTypesAndStyles () - Returns default Types and Styles common to most every lexer.
InitLexer (name) - Initializes the lexer language.
RunFolder (text, start_pos, start_line, start_level) - Performs the folding of the document.
RunLexer (text) - Performs the lexing of the document.
add_style (id, style) - Adds a new Scintilla style to Scintilla.
add_token (lexer, id, token_patt, exclude, pos) - Adds a token to a lexer's current ordered list of tokens.
color (r, g, b) - Creates a Scintilla color.
delimited_range (chars, escape, end_optional, balanced, forbidden) - Creates an LPeg pattern that matches a range of characters delimited by a specific character(s).
delimited_range_with_embedded (chars, escape, id, patt, forbidden) - Creates an LPeg pattern that matches a range of characters delimited by a specific character(s) with an embedded pattern.
embed_language (parent, child, preproc) - Embeds a child lexer language in a parent one.
make_embeddable (child, parent, start_token, end_token) - Allows a child lexer to be embedded in a parent one.
nested_pair (start_chars, end_chars, end_optional) - Creates an LPeg pattern that matches a range of characters delimited by a set of nested delimiters.
rebuild_token (parent) - (Re)constructs parent.Token.
rebuild_tokens (parent) - (Re)constructs parent.Tokens.
starts_line (patt) - Creates an LPeg pattern from a given pattern that matches the beginning of a line and returns it.
style (style_table) - Creates a Scintilla style from a table of style properties.
token (id, patt) - Creates an LPeg capture table index with the id and position of the capture.
word_list (word_table) - Creates a table of given words for hash lookup.
word_match (word_list, word_chars, case_insensitive) - Creates an LPeg pattern function that checks to see if the current word is in word_list, returning the index of the end of the word.

Tables

TokenOrder - Ordered list of token identifiers for a specific lexer.
TokenPatterns - List of token identifiers with associated LPeg patterns for a specific lexer.
colors - Dark theme initial colors.
styles - [Local table] Default (initial) Styles.
types - [Local table] Default (initial) Types.


Functions

DefaultTypesAndStyles ()
Returns default Types and Styles common to most every lexer. Note this does not need to be called by the lexer. It is called for the lexer automatically when it is initialized.

Return value:

Types and Styles tables.

InitLexer (name)
Initializes the lexer language. Called by LexLPeg.cxx to initialize the lexer.

Parameters

  • name: The name of the lexing language.

RunFolder (text, start_pos, start_line, start_level)
Performs the folding of the document. Called by the C++ lexer to fold the document. If the current lexer has no Fold function, folding by indentation is performed unless forbidden by the 'fold.by.indentation' property.

Parameters

  • text: The document text to fold.
  • start_pos: The position in the document at which text starts.
  • start_line: The line number text starts on.
  • start_level: The fold level text starts on.

RunLexer (text)
Performs the lexing of the document. Called by the C++ lexer to lex the document. If the lexer has a LexByLine flag set, the document is lexed one line at a time. If a Lex function is defined, it is used for lexing. If the lexer specifies neither the flag nor a Lex function, the entire document is lexed at once. The result of lexing is a table of LPeg captures. Each item in the table is another table that contains the string identifier of a token and a position in the document the identifier applies to. That table is iterated over, styling up to each position with a style number determined from the identifier.

Parameters

  • text: Document text to lex.

add_style (id, style)
Adds a new Scintilla style to Scintilla.

Parameters

  • id: An identifier passed when creating a token.
  • style: A Scintilla style created from style().
See also: token, style

add_token (lexer, id, token_patt, exclude, pos)
Adds a token to a lexer's current ordered list of tokens.

Parameters

  • lexer: The lexer to add the token to.
  • id: The string identifier of token_patt.
  • token_patt: The LPeg pattern (returned by the 'token' function) associated with the identifier.
  • exclude: Optional flag indicating whether or not to exclude this token from lexer.Token when rebuilding. This flag would be set to true when tokens are only meant to be accessible to other lexers in the lexer.TokenPatterns table.
  • pos: Optional index to insert this token in TokenOrder.

color (r, g, b)
Creates a Scintilla color.

Parameters

  • r: The red component of the color as a hexadecimal [string].
  • g: The green component of the color as a hexadecimal [string].
  • b: The blue component of the color as a hexadecimal [string].

delimited_range (chars, escape, end_optional, balanced, forbidden)
Creates an LPeg pattern that matches a range of characters delimited by a specific character(s). This can be used to match a string, parentheses, etc.

Parameters

  • chars: The character(s) that bound the matched range.
  • escape: Optional escape character. This parameter may be omitted, nil, or the empty string.
  • end_optional: Optional flag indicating whether an ending delimiter is optional. If true, the range begun by the start delimiter matches until an end delimiter or the end of the input is reached. This is useful for finding unmatched delimiters.
  • balanced: Optional flag indicating whether a balanced range is matched. This flag only applies if 'chars' consists of two different characters, like parentheses for example. Any character indicating the start of a range requires its end complement. When the complement of the first range-start character is found, the match ends.
  • forbidden: Optional string of characters forbidden in a delimited range. Each character is part of the set.
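
The general idea behind a delimited range, and behind the nested variant that nested_pair describes, can be sketched directly in LPeg. The two helpers below are hypothetical and much simpler than the functions documented here: they take no end_optional, balanced, or forbidden arguments, and they are not the code in /lexers/lexer.lua.

```lua
local lpeg = require('lpeg')
local P, V = lpeg.P, lpeg.V

-- Hypothetical helper: a range bounded by delim, where escape followed by
-- any character is skipped over (so an escaped delimiter does not end it).
local function simple_range(delim, escape)
  local d, e = P(delim), P(escape)
  return d * (e * 1 + (1 - (d + e)))^0 * d
end

-- Hypothetical helper: a range whose multi-character delimiters may nest,
-- written as a one-rule recursive grammar.
local function simple_nested(start_chars, end_chars)
  local s, e = P(start_chars), P(end_chars)
  return P{ s * (V(1) + (1 - (s + e)))^0 * e }
end

local dq_string = simple_range('"', '\\')
local block_comment = simple_nested('/*', '*/')

print(dq_string:match('"a\\"b" rest'))
print(block_comment:match('/* a /* b */ c */ x'))
```

Each match returns the position just past the closing delimiter, or nil when the range is never closed.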
delimited_range_with_embedded (chars, escape, id, patt, forbidden)
Creates an LPeg pattern that matches a range of characters delimited by a specific character(s) with an embedded pattern. This is useful for embedding additional lexers inside strings for example.

Parameters

  • chars: The character(s) that bound the matched range.
  • escape: Escape character. If there isn't one, nil or the empty string should be passed.
  • id: Specifies the identifier used to create tokens that match everything but patt.
  • patt: Pattern embedded in the range.
  • forbidden: Optional string of characters forbidden in a delimited range. Each character is part of the set.

embed_language (parent, child, preproc)
Embeds a child lexer language in a parent one. The 'make_embeddable' function must be called first to prepare the child lexer for embedding in the parent. The child's tokens are placed before the parent's and maybe inside other embedded lexers depending on the preproc argument.

Parameters

  • parent: The parent lexer language.
  • child: The child lexer language.
  • preproc: Boolean flag specifying if the child lexer is a preprocessor language. If so, its tokens are placed before all embedded lexers' tokens.
See also: make_embeddable

make_embeddable (child, parent, start_token, end_token)
Allows a child lexer to be embedded in a parent one. An appropriate entry in child.EmbeddedIn is created; then the 'embed_language' function can be called to embed the child lexer in the parent.

Parameters

  • child: The child lexer language.
  • parent: The parent lexer language.
  • start_token: The token that signals the beginning of the embedded lexer.
  • end_token: The token that signals the end of the embedded lexer.

nested_pair (start_chars, end_chars, end_optional)
Creates an LPeg pattern that matches a range of characters delimited by a set of nested delimiters. Use this function for multi-character delimiters; otherwise use delimited_range with balanced set to true. This is useful for languages with tokens such as nested block comments.

Parameters

  • start_chars: The string starting delimiter character sequence.
  • end_chars: The string ending delimiter character sequence.
  • end_optional: Optional flag indicating whether an ending delimiter is optional. If true, the range begun by the start delimiter matches until an end delimiter or the end of the input is reached. This is useful for finding unmatched delimiters.

rebuild_token (parent)
(Re)constructs parent.Token. Creates the token pattern from parent.TokenOrder, an ordered list of tokens. Rebuilding is useful for modifying parent's tokens for embedded lexers. Generally, calling 'rebuild_tokens' is also necessary after this.

Parameters

  • parent: The parent lexer language.

Return value:

The token pattern (for convenience). parent.Token is still modified, so setting it manually is not necessary.
See also: rebuild_tokens

rebuild_tokens (parent)
(Re)constructs parent.Tokens. This is generally called after 'rebuild_token' in order to create the pattern used to lex input.

Parameters

  • parent: The parent lexer language.
See also: rebuild_token

starts_line (patt)
Creates an LPeg pattern from a given pattern that matches the beginning of a line and returns it.

Parameters

  • patt: The LPeg pattern to match at the beginning of a line.

style (style_table)
Creates a Scintilla style from a table of style properties.

Parameters

  • style_table: A table of style properties. Style properties available:
      font = [string]
      size = [integer]
      bold = [boolean]
      italic = [boolean]
      underline = [boolean]
      fore = [integer]*
      back = [integer]*
      eolfilled = [boolean]
      characterset = ?
      case = [integer]
      visible = [boolean]
      changeable = [boolean]
      hotspot = [boolean]
      * Use the value returned by the color function.
See also: color

token (id, patt)
Creates an LPeg capture table index with the id and position of the capture.

Parameters

  • id: The string identifier of patt. It is recommended to prefix id with something unique to your lexer if your lexer will be embedded in another one.
  • patt: The LPeg pattern associated with the identifier.
See also: add_style

word_list (word_table)
Creates a table of given words for hash lookup. This is usually used in conjunction with word_match.

Parameters

  • word_table: A table of words.
See also: word_match

word_match (word_list, word_chars, case_insensitive)
Creates an LPeg pattern function that checks to see if the current word is in word_list, returning the index of the end of the word. (Thus the pattern succeeds.)

Parameters

  • word_list: A word list constructed from word_list.
  • word_chars: Optional string of additional characters considered to be part of a word.
  • case_insensitive: Optional boolean flag indicating whether the word match is case-insensitive.
See also: word_list

Tables

TokenOrder
Ordered list of token identifiers for a specific lexer. Contains an ordered list (by numerical index) of token identifier strings. This is used in conjunction with TokenPatterns for building the Token and Tokens lexer variables. This table doesn't need to be modified manually, as calls to the 'add_token' function update this list appropriately.

TokenPatterns
List of token identifiers with associated LPeg patterns for a specific lexer. It provides a public interface to this lexer's tokens for other lexers. This list is used in conjunction with TokenOrder and also doesn't need to be modified manually.

colors
Dark theme initial colors.
Fields
  • green: The color green.
  • blue: The color blue.
  • red: The color red.
  • yellow: The color yellow.
  • teal: The color teal.
  • white: The color white.
  • black: The color black.
  • grey: The color grey.
  • purple: The color purple.
  • orange: The color orange.

styles
[Local table] Default (initial) Styles. Contains style numbers and associated styles.

types
[Local table] Default (initial) Types. Contains token identifiers and associated style numbers.
Fields
  • whitespace: The whitespace type (0).
  • default: The default type (1).
  • comment: The comment type (2).
  • string: The string type (3).
  • number: The number type (4).
  • keyword: The keyword type (5).
  • identifier: The identifier type (6).
  • operator: The operator type (7).
  • error: The error type (8).
  • preprocessor: The preprocessor type (9).
  • constant: The constant type (10).
  • function: The function type (11).
  • class: The class type (12).
  • type: The type type (13).
