
Parsing

Parse a code file into meaningful chunks.

It first tries to use tree-sitter to generate chunks, and falls back to a more basic text splitter if tree-sitter parsing is not available or fails.
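The try-then-fall-back flow described above can be sketched as follows. This is a minimal illustration, not this module's implementation. `parse_file_sketch`, `_parse_with_treesitter`, `_split_text`, and `SUPPORTED_GRAMMARS` are hypothetical names. The two result classes are stand-ins whose real fields may differ. The tree-sitter step is faked with a blank-line split so the example runs without the tree-sitter package installed.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class CodeFileChunks:
    """Hypothetical stand-in for the tree-sitter result type."""

    chunks: list[str] = field(default_factory=list)


@dataclass
class GeneralTextChunks:
    """Hypothetical stand-in for the text-splitter result type."""

    chunks: list[str] = field(default_factory=list)


# Hypothetical: languages for which a tree-sitter grammar is installed.
SUPPORTED_GRAMMARS = {"python", "rust"}


def _parse_with_treesitter(text: str, lang: str) -> CodeFileChunks:
    """Hypothetical tree-sitter step; raises when no grammar is available."""
    if lang not in SUPPORTED_GRAMMARS:
        raise LookupError(f"no tree-sitter grammar for {lang!r}")
    # Placeholder for real syntax-tree walking: split on blank lines instead.
    return CodeFileChunks([c for c in text.split("\n\n") if c.strip()])


def _split_text(text: str, max_lines: int = 2) -> GeneralTextChunks:
    """Basic fallback: fixed-size windows of consecutive lines."""
    lines = text.splitlines()
    return GeneralTextChunks(
        ["\n".join(lines[i : i + max_lines]) for i in range(0, len(lines), max_lines)]
    )


def parse_file_sketch(text: str, lang: str) -> CodeFileChunks | GeneralTextChunks:
    """Try tree-sitter first; on any failure, fall back to the text splitter."""
    try:
        return _parse_with_treesitter(text, lang)
    except Exception:
        logger.debug("tree-sitter failed for %r; using text splitter", lang, exc_info=True)
        return _split_text(text)
```

Catching a broad `Exception` matches the "not available or fails" wording. A stricter variant could catch only the specific errors the real parser raises, so that genuine bugs are not silently routed to the fallback.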

Note (2024-11-25): This is generally CPU-bound work that does not benefit from threading.

logger = logging.getLogger(__name__) (module attribute)

parse_file(file_contents, lang_or_ext)

Parse a code file into chunks.

PARAMETER DESCRIPTION
file_contents

Contents of the file to parse.

TYPE: str

lang_or_ext

Language or file extension of the file. Note: FileLanguage is a Union of enums.

TYPE: FileLanguage | str
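Because `lang_or_ext` accepts either an enum member or a plain string, a normalization step might look like the sketch below. Everything here is hypothetical: the enums `ProgrammingLanguage` and `MarkupLanguage`, the extension table, and `resolve_language` are illustrative names. Only the idea that FileLanguage is a Union of enums comes from the documentation above.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional, Union, get_args


class ProgrammingLanguage(Enum):  # hypothetical
    PYTHON = "python"
    RUST = "rust"


class MarkupLanguage(Enum):  # hypothetical
    MARKDOWN = "markdown"


# Hypothetical analogue of FileLanguage: a Union of enums.
FileLanguage = Union[ProgrammingLanguage, MarkupLanguage]

# Hypothetical mapping from file extension to enum member.
_EXTENSIONS = {
    "py": ProgrammingLanguage.PYTHON,
    "rs": ProgrammingLanguage.RUST,
    "md": MarkupLanguage.MARKDOWN,
}


def resolve_language(lang_or_ext: FileLanguage | str) -> Optional[FileLanguage]:
    """Map an enum member, an extension (".py" or "py"), or a language name to an enum member."""
    enum_classes = get_args(FileLanguage)  # (ProgrammingLanguage, MarkupLanguage)
    if isinstance(lang_or_ext, enum_classes):
        return lang_or_ext  # already a FileLanguage member
    key = lang_or_ext.lower().lstrip(".")
    if key in _EXTENSIONS:
        return _EXTENSIONS[key]
    for enum_cls in enum_classes:
        try:
            return enum_cls(key)  # look up by enum value, e.g. "python"
        except ValueError:
            continue
    return None  # unknown language or extension
```

Using `typing.get_args` keeps the `isinstance` check in sync with the Union, so adding another enum to the Union would not require touching this function.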

RETURNS DESCRIPTION
CodeFileChunks | GeneralTextChunks

Either CodeFileChunks from tree-sitter parsing or GeneralTextChunks from the fallback text splitter.
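Since the return type is a union, callers typically branch on the concrete type. This is a minimal sketch. It hypothetically assumes that both result types expose a `chunks` list. The dataclasses below are stand-ins, not the real classes, and `describe` is an illustrative helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeFileChunks:  # hypothetical stand-in; the real class may differ
    chunks: list[str] = field(default_factory=list)


@dataclass
class GeneralTextChunks:  # hypothetical stand-in; the real class may differ
    chunks: list[str] = field(default_factory=list)


def describe(result: CodeFileChunks | GeneralTextChunks) -> str:
    """Report which parsing path produced the result and how many chunks it has."""
    if isinstance(result, CodeFileChunks):
        return f"tree-sitter: {len(result.chunks)} chunk(s)"
    if isinstance(result, GeneralTextChunks):
        return f"text splitter: {len(result.chunks)} chunk(s)"
    raise TypeError(f"unexpected result type: {type(result).__name__}")
```

Raising on an unexpected type, rather than silently returning a default, surfaces mismatches early if the set of result types ever grows.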