
init

Parsing (chunking) of individual code files.

Either use treesitter to parse information from code files in a standard format, or fall back to a more general text splitting method.

The information generated is limited to the contents of an individual file (i.e. it won't include file paths etc.). When parsing, treesitter treats the file as the root node, and information is only extracted from within that node.

If a file cannot be parsed with treesitter, a simple code text splitter is used instead to generate generic chunks.
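The parse-then-fall-back flow described above can be sketched as follows. This is a minimal illustration, not this package's implementation: Python's standard `ast` module stands in for treesitter, and `Chunk`, `structured_chunks`, `fallback_chunks`, and `chunk_source` are hypothetical names invented for the example.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical chunk record; not a type from this package."""
    kind: str            # "function", "class", or "text"
    name: Optional[str]  # definition name, or None for generic text chunks
    text: str


def structured_chunks(source: str) -> List[Chunk]:
    """Stand-in for the treesitter path, using Python's ast module instead."""
    tree = ast.parse(source)  # raises SyntaxError if the source cannot be parsed
    chunks = []
    for node in tree.body:  # the file is the root; only look inside it
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        chunks.append(Chunk(kind, node.name, ast.get_source_segment(source, node)))
    return chunks


def fallback_chunks(source: str, max_lines: int = 20) -> List[Chunk]:
    """Generic splitter: fixed-size groups of lines, no structural information."""
    lines = source.splitlines()
    return [
        Chunk("text", None, "\n".join(lines[i:i + max_lines]))
        for i in range(0, len(lines), max_lines)
    ]


def chunk_source(source: str) -> List[Chunk]:
    """Try the structured parser first; fall back to generic text chunks."""
    try:
        return structured_chunks(source)
    except SyntaxError:
        return fallback_chunks(source)
```

Note that neither path records a file path or anything else from outside the file contents, mirroring the per-file scope described above.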

Note: These functions produce general information about the contents of the file. More specific use cases (e.g. a code searching service) should add a layer to extract or add any specific information they need.

__all__ = ['get_language_from_file_ext', 'get_language_from_str', 'parse_file', 'parse_text'] module-attribute

get_language_from_file_ext(file_ext)

get_language_from_str(lang)

parse_file(file_contents, lang_or_ext)

Parse a code file into chunks.

PARAMETER DESCRIPTION

file_contents: Contents of the file to parse. TYPE: str

lang_or_ext: Language or file extension of the file (note: FileLanguage is a Union of enums). TYPE: FileLanguage | str

RETURNS DESCRIPTION

CodeFileChunks | GeneralTextChunks: Either FileInfo from treesitter parsing or TextSplitChunks from text splitter parsing.
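Because the return value is a union of two types, callers will usually need to branch on which one they received. The sketch below shows one way to do that with `isinstance`. The two stub classes and their fields (`definitions`, `chunks`) are assumptions made up for this example; the real fields of `CodeFileChunks` and `GeneralTextChunks` are not documented in this section.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class StubCodeFileChunks:
    """Hypothetical stand-in for CodeFileChunks; the real fields may differ."""
    definitions: List[str] = field(default_factory=list)


@dataclass
class StubGeneralTextChunks:
    """Hypothetical stand-in for GeneralTextChunks; the real fields may differ."""
    chunks: List[str] = field(default_factory=list)


def describe(result: Union[StubCodeFileChunks, StubGeneralTextChunks]) -> str:
    """Branch on the concrete type of a union-typed parse result."""
    if isinstance(result, StubCodeFileChunks):
        return f"structured: {len(result.definitions)} definitions"
    return f"generic: {len(result.chunks)} chunks"
```

A downstream layer, such as the code searching service mentioned earlier, could use the same kind of branch to decide how much structural information it can rely on.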

parse_text(contents, lang, splitter_args=None)