init
Parsing (chunking) of individual code files.
Uses treesitter to parse information from code files into a standard format, or falls back to a more general text-splitting method when that is not possible.
The information generated is limited to the individual file (i.e. it won't include file paths etc.). When parsing, treesitter treats the file as the root node, and information is only extracted from within that node.
If unable to parse with treesitter, a simple code textsplitter is used instead to generate generic chunks.
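The parse-then-fall-back behaviour described above can be sketched as follows. This is only an illustration of the pattern, not this module's implementation: the standard-library `ast` module stands in for treesitter (so the sketch only understands Python source), and `Chunk`, `structured_chunks`, `split_text`, and `chunk_source` are hypothetical names.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; not one of this module's real types."""
    kind: str          # "function", "class", or "text"
    name: str | None   # symbol name, or None for generic text chunks
    text: str


def structured_chunks(source: str) -> list[Chunk]:
    """Stand-in for the treesitter path: one chunk per top-level def/class."""
    tree = ast.parse(source)  # raises SyntaxError if the source cannot be parsed
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            text = ast.get_source_segment(source, node) or ""
            chunks.append(Chunk(kind, node.name, text))
    return chunks


def split_text(source: str, max_lines: int = 20) -> list[Chunk]:
    """Stand-in for the generic text splitter: fixed-size windows of lines."""
    lines = source.splitlines()
    return [
        Chunk("text", None, "\n".join(lines[i:i + max_lines]))
        for i in range(0, len(lines), max_lines)
    ]


def chunk_source(source: str) -> list[Chunk]:
    """Try structured parsing first; fall back to text splitting on failure."""
    try:
        return structured_chunks(source)
    except SyntaxError:
        return split_text(source)
```

Callers of `chunk_source` get a result either way; only the `kind` of the chunks tells them which path produced it.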
Note: These produce general information about the contents of the file. More specific use cases (e.g. a code search service) should add a layer to extract or add any specific information they need.
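As an illustration of such a layer, a code search service might attach the file path to each chunk, since the parser only sees file contents. The sketch below assumes the chunks are already available as plain strings; `SearchRecord` and `to_search_records` are hypothetical names and not part of this module.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class SearchRecord:
    """Hypothetical index record combining path metadata with one chunk."""
    path: str
    extension: str
    chunk_index: int
    text: str


def to_search_records(path: str, chunk_texts: list[str]) -> list[SearchRecord]:
    """Wrap per-file chunk texts with the path information the parser omits."""
    extension = PurePosixPath(path).suffix.lstrip(".")
    return [
        SearchRecord(path, extension, index, text)
        for index, text in enumerate(chunk_texts)
    ]
```

Keeping this step outside the parser means the same chunks can be reused by services that need different metadata.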
__all__ = ['get_language_from_file_ext', 'get_language_from_str', 'parse_file', 'parse_text']
module-attribute
get_language_from_file_ext(file_ext)
get_language_from_str(lang)
parse_file(file_contents, lang_or_ext)
Parse a code file into chunks.
| PARAMETER | DESCRIPTION |
|---|---|
| `file_contents` | Contents of the file to parse. |
| `lang_or_ext` | Language or file extension of the file. (Note: `FileLanguage` is a Union of enums.) |
| RETURNS | DESCRIPTION |
|---|---|
| `CodeFileChunks \| GeneralTextChunks` | Either FileInfo from treesitter parsing or TextSplitChunks from text splitter parsing. |
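Because the result can be either of two types, a caller will usually branch on which one came back. The sketch below shows one way to do that with `isinstance`. The two classes are local stand-ins that only borrow the names from the return annotation; their fields (`symbols`, `chunks`) are assumptions made for this example, since the real classes' attributes are not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CodeFileChunks:
    """Stand-in only; the `symbols` field is assumed for illustration."""
    symbols: list[str] = field(default_factory=list)


@dataclass
class GeneralTextChunks:
    """Stand-in only; the `chunks` field is assumed for illustration."""
    chunks: list[str] = field(default_factory=list)


def describe(result: CodeFileChunks | GeneralTextChunks) -> str:
    """Report which parsing path produced the result."""
    if isinstance(result, CodeFileChunks):
        return f"structured: {len(result.symbols)} symbols"
    return f"text: {len(result.chunks)} chunks"
```

In real code the stand-ins would be replaced by the types this module returns, and each branch would read whatever attributes those types actually expose.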