`__init__`

`GITHUB_SEARCH_COLLECTION_NAME = 'github_search'` `module-attribute`

`LiteralQueryType = Literal['text', 'code']` `module-attribute`

`__all__ = ['GITHUB_SEARCH_COLLECTION_NAME', 'CodeChunkType', 'CodeFileQdrantMetadata', 'GithubCollectionInitializer', 'GithubFilterArgs', 'GithubScrollResult', 'GithubSearchResult', 'GithubSearchResults', 'GithubSearchUOW', 'LiteralQueryType', 'TextFileQdrantMetadata', 'add_parsed_github_files', 'delete_repository_points', 'get_num_points', 'make_github_filter', 'scroll_all_points', 'scroll_points', 'search', 'search_file_results', 'update_repository_points', 'vector_stats_for_repo']` `module-attribute`

`CodeChunkType`

Bases: StrEnum

`CLASS = 'class'` `class-attribute` `instance-attribute`

`DOCSTRING = 'docstring'` `class-attribute` `instance-attribute`

`FUNCTION = 'function'` `class-attribute` `instance-attribute`

`IMPORT = 'import'` `class-attribute` `instance-attribute`

`OTHER = 'other'` `class-attribute` `instance-attribute`
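Because `CodeChunkType` is a `StrEnum`, its members compare equal to their plain string values. This makes round-tripping a chunk type through a qdrant payload (stored as a string) straightforward. The following is a minimal sketch that re-declares the enum locally for illustration only. It assumes Python 3.11+ for `enum.StrEnum` and falls back to a `str, Enum` mixin on older versions.

```python
import enum

try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # fallback for older Python versions
    class StrEnum(str, enum.Enum):
        def __str__(self) -> str:
            return str(self.value)


# Local re-declaration mirroring the documented members (illustration only).
class CodeChunkType(StrEnum):
    CLASS = "class"
    DOCSTRING = "docstring"
    FUNCTION = "function"
    IMPORT = "import"
    OTHER = "other"


# A payload value stored as a plain string converts back to the enum member.
payload_value = "function"
chunk_type = CodeChunkType(payload_value)
print(chunk_type is CodeChunkType.FUNCTION)  # True
print(chunk_type == "function")  # True: members are also str instances
print(str(chunk_type))  # function
```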

`CodeFileQdrantMetadata`

Bases: AbstractQdrantMetadata

`code_chunk_type` `instance-attribute`

`full_chunk_info = Field(default=None, description='Additional information about the chunk optionally loaded from the database.')` `class-attribute` `instance-attribute`

`github_metadata` `instance-attribute`

`language` `instance-attribute`

`name` `instance-attribute`

`parent_classes = Field(default_factory=list)` `class-attribute` `instance-attribute`

`pk_id = Field(description='The primary key id of the related chunk in the database.')` `class-attribute` `instance-attribute`

`user_clerk_id` `instance-attribute`

`as_artifact()`

`as_content()`

`from_chunk_and_gh_metadata(chunk, gh_metadata, user_clerk_id, chunk_type, language, parent_classes=None)` `classmethod`

Create the metadata object that will be stored in qdrant.

Note: The full_chunk_info is not stored in qdrant, but can be loaded from the database when needed.
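The split between qdrant-stored fields and the db-only `full_chunk_info` can be sketched with a plain dataclass. `ChunkMetadataSketch` and `to_qdrant_payload` are hypothetical names covering a subset of the documented fields, not the module's real serialization path. With a pydantic v2 model, the same effect is typically achieved with `model.model_dump(exclude={"full_chunk_info"})`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ChunkMetadataSketch:
    """Hypothetical stand-in for CodeFileQdrantMetadata (subset of fields)."""

    pk_id: int
    name: str
    language: str
    code_chunk_type: str
    user_clerk_id: str
    parent_classes: list = field(default_factory=list)
    # Loaded from the database on demand; never written to qdrant.
    full_chunk_info: Optional[dict] = None

    def to_qdrant_payload(self) -> dict:
        """Serialize every field except the db-only full_chunk_info."""
        payload: dict[str, Any] = asdict(self)
        payload.pop("full_chunk_info")
        return payload


meta = ChunkMetadataSketch(
    pk_id=42,
    name="parse",
    language="python",
    code_chunk_type="function",
    user_clerk_id="user_123",
    full_chunk_info={"docstring": "Parse the input."},
)
payload = meta.to_qdrant_payload()
print(sorted(payload))
```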

`GithubCollectionInitializer`

Bases: CollectionInitializer

`collection_name = COLLECTION_NAME` `class-attribute` `instance-attribute`

`default_indexes` `property`

`GithubFilterArgs`

Bases: BaseModel

`chunk_type = Field(default=None, description='The code chunk type to filter by.')` `class-attribute` `instance-attribute`

`file_path_end = Field(default=None, description="The end of the file path to filter by. (e.g. '.py', 'some_file.py', 'sub_dir/file.py')")` `class-attribute` `instance-attribute`

`file_path_start = Field(default=None, description="The start of the file path to filter by. (e.g. 'src/', 'src/main/')")` `class-attribute` `instance-attribute`

`language = Field(default=None, description='The language to filter by.')` `class-attribute` `instance-attribute`

`repo_id = Field(description='The repository to search in.')` `class-attribute` `instance-attribute`

`user_id = Field(description='The user to search as.')` `class-attribute` `instance-attribute`

`GithubScrollResult` `dataclass`

`last_point_id` `instance-attribute`

`results` `instance-attribute`

`__init__(results, last_point_id)`

`as_artifact()`

`as_content()`

`GithubSearchResult`

Bases: BaseModel

`content` `instance-attribute`

`metadata` `instance-attribute`

`qdrant_id` `instance-attribute`

`score` `instance-attribute`

`as_artifact()`

`as_content()`

`GithubSearchResults` `dataclass`

`results` `instance-attribute`

`__init__(results)`

`__post_init__()`

`as_artifact()`

`as_content()`

`GithubSearchUOW`

`collection_name = COLLECTION_NAME` `class-attribute`

`qdrant = qdrant` `instance-attribute`

`read_only_uow` `property`

`__init__(qdrant, read_only_uow=None)`

`TextFileQdrantMetadata`

Bases: AbstractQdrantMetadata

`chunk_index` `instance-attribute`

`github_metadata` `instance-attribute`

`language` `instance-attribute`

`pk_id` `instance-attribute`

`user_clerk_id` `instance-attribute`

`as_artifact()`

`as_content()`

`add_parsed_github_files(github_search_uow, parsed_github_files, user_id, wait=False)` `async`

Add parsed github files to the qdrant database.

`delete_repository_points(github_search_uow, repo_id, user_id, wait=False)` `async`

Delete all points for a specific repository id.

Note: For a public repository, `user_id` does not have to be "public"; passing a regular user id is also allowed.

TODO 2024-12-18: Should I allow this, or make it require explicit "public"?

`get_num_points(uow, filter_args)` `async`

Get the number of points that match the given filter.

`make_github_filter(filter_args)`

Create a qdrant filter for searching for code chunks (embedded as text or code).

Note: The `file_path` filter is a basic qdrant text filter and can return unexpected results, so the returned data should be filtered further where exact path matches matter. Examples of search terms and the unexpected matches they can produce:

- "some/inner/folder" -> "some/folder/inner" (the order of words is probably not preserved)
- "file.txt" -> "some_file.txt", "another_file.txt" (substring matching; as of 2025-01-02, this may be fixed by WORD indexing)
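The recommended post-filtering can be sketched as exact prefix and suffix checks applied to the returned paths. The function name is hypothetical, and it works on plain path strings, so in practice you would first pull the path out of each result's metadata (the attribute name is not documented here). One design choice is an assumption: a suffix that starts with "." is treated as an extension, while any other suffix must match whole path components. This is what rejects "some_file.txt" for "file.txt".

```python
from typing import Optional


def post_filter_paths(
    paths: list,
    file_path_start: Optional[str] = None,
    file_path_end: Optional[str] = None,
) -> list:
    """Re-apply exact prefix/suffix checks that the text filter may over-match."""

    def matches_end(path: str, end: str) -> bool:
        if end.startswith("."):
            return path.endswith(end)  # extension such as ".py"
        # Whole path components only: "file.txt" must not match "some_file.txt".
        return path == end or path.endswith("/" + end)

    kept = []
    for path in paths:
        if file_path_start is not None and not path.startswith(file_path_start):
            continue
        if file_path_end is not None and not matches_end(path, file_path_end):
            continue
        kept.append(path)
    return kept


paths = [
    "src/file.txt",
    "src/some_file.txt",
    "docs/another_file.txt",
    "some/folder/inner/x.py",
    "some/inner/folder/y.py",
]
print(post_filter_paths(paths, file_path_end="file.txt"))  # ['src/file.txt']
print(post_filter_paths(paths, file_path_start="some/inner/folder"))  # ['some/inner/folder/y.py']
```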

| PARAMETER | DESCRIPTION |
| --- | --- |
| `filter_args` | The filter arguments to use for creating the filter. TYPE: `GithubFilterArgs` |

| RETURNS | DESCRIPTION |
| --- | --- |
| `Filter` | A qdrant filter object that can be used for searching, scrolling, deleting, etc. |
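How the filter arguments could map onto a qdrant filter is sketched below using qdrant's JSON/REST filter form. All conditions sit under `"must"`, exact values use `{"match": {"value": ...}}`, and path fragments use full-text `{"match": {"text": ...}}`. The Python client's typed equivalents are `models.Filter`, `models.FieldCondition`, `models.MatchValue` and `models.MatchText`. The payload key names (e.g. `github_metadata.repo_id`, `github_metadata.file_path`) and the `FilterArgsSketch` mirror are assumptions, not the module's real keys or classes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FilterArgsSketch:
    """Hypothetical mirror of GithubFilterArgs (plain dataclass, not pydantic)."""

    repo_id: int
    user_id: str
    language: Optional[str] = None
    chunk_type: Optional[str] = None
    file_path_start: Optional[str] = None
    file_path_end: Optional[str] = None


def make_filter_sketch(args: FilterArgsSketch) -> dict:
    """Build a qdrant filter in JSON form; payload keys are assumed names."""
    must: list[dict[str, Any]] = [
        {"key": "github_metadata.repo_id", "match": {"value": args.repo_id}},
        {"key": "user_clerk_id", "match": {"value": args.user_id}},
    ]
    if args.language is not None:
        must.append({"key": "language", "match": {"value": args.language}})
    if args.chunk_type is not None:
        must.append({"key": "code_chunk_type", "match": {"value": args.chunk_type}})
    # Path fragments use full-text matching, which is why results can over-match.
    for fragment in (args.file_path_start, args.file_path_end):
        if fragment is not None:
            must.append({"key": "github_metadata.file_path", "match": {"text": fragment}})
    return {"must": must}


example = make_filter_sketch(FilterArgsSketch(repo_id=1, user_id="user_123", file_path_end=".py"))
print(example)
```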

`scroll_all_points(uow, filter_args=None, num_per_iteration=100, load_additional_chunk_info=False)` `async`

Scroll through all points that match the given filter.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `uow` | The unit of work to use for the search. TYPE: `GithubSearchUOW` |
| `filter_args` | The filter to use for the scroll. TYPE: `GithubFilterArgs \| None` DEFAULT: `None` |
| `num_per_iteration` | The number of results to return in each iteration. TYPE: `int` DEFAULT: `100` |
| `load_additional_chunk_info` | Whether to load additional metadata from the db for the search results. TYPE: `bool` DEFAULT: `False` |
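The "scroll everything, `num_per_iteration` at a time" pattern can be sketched as an async generator that repeatedly fetches a page and resumes after that page's last point id. `fetch_page`, `PageSketch` and `scroll_all_sketch` are hypothetical names. It is also an assumption that `scroll_all_points` yields page by page; the loop stops when a page comes back short or empty.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Optional


@dataclass
class PageSketch:
    """Hypothetical stand-in for GithubScrollResult."""

    results: list
    last_point_id: Optional[str]


FetchPage = Callable[[int, Optional[str]], Awaitable[PageSketch]]


async def scroll_all_sketch(fetch_page: FetchPage, num_per_iteration: int = 100) -> AsyncIterator[list]:
    """Yield one page of results at a time until the scroll is exhausted."""
    from_point_id: Optional[str] = None
    while True:
        page = await fetch_page(num_per_iteration, from_point_id)
        if not page.results:
            break
        yield page.results
        # A short page (or no cursor) means there is nothing left to fetch.
        if len(page.results) < num_per_iteration or page.last_point_id is None:
            break
        from_point_id = page.last_point_id


POINTS = [f"p{i}" for i in range(7)]


async def fake_fetch(limit: int, from_point_id: Optional[str]) -> PageSketch:
    """In-memory fake of a single scroll call (exclusive start point)."""
    start = 0 if from_point_id is None else POINTS.index(from_point_id) + 1
    chunk = POINTS[start:start + limit]
    return PageSketch(results=chunk, last_point_id=chunk[-1] if chunk else None)


async def main() -> list:
    return [page async for page in scroll_all_sketch(fake_fetch, num_per_iteration=3)]


pages = asyncio.run(main())
print(pages)  # [['p0', 'p1', 'p2'], ['p3', 'p4', 'p5'], ['p6']]
```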

`scroll_points(uow, filter_args=None, limit=100, from_point_id=None, load_additional_chunk_info=False)` `async`

Scroll through a single page of up to `limit` points that match the given filter, starting after `from_point_id`.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `uow` | The unit of work to use for the search. TYPE: `GithubSearchUOW` |
| `filter_args` | The filter to use for the scroll. TYPE: `GithubFilterArgs \| None` DEFAULT: `None` |
| `limit` | The maximum number of results to return. TYPE: `int` DEFAULT: `100` |
| `from_point_id` | The id of the last point from the previous scroll; scrolling resumes after it, so that point itself is not included. TYPE: `str \| None` DEFAULT: `None` |
| `load_additional_chunk_info` | Whether to load additional metadata from the db for the search results. TYPE: `bool` DEFAULT: `False` |
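The exclusive `from_point_id` cursor can be illustrated with an in-memory model. Feeding one page's `last_point_id` back in as the next call's `from_point_id` returns the following page without repeating the boundary point. `scroll_page_sketch` and `ScrollPageSketch` are hypothetical names, and the sketch assumes points are ordered by id, as qdrant's scroll is by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrollPageSketch:
    """Hypothetical stand-in for GithubScrollResult."""

    results: list
    last_point_id: Optional[str]


def scroll_page_sketch(point_ids: list, limit: int = 100, from_point_id: Optional[str] = None) -> ScrollPageSketch:
    """Return up to `limit` ids that come strictly after `from_point_id`."""
    ordered = sorted(point_ids)
    if from_point_id is None:
        remaining = ordered
    else:
        remaining = [pid for pid in ordered if pid > from_point_id]  # exclusive cursor
    page = remaining[:limit]
    return ScrollPageSketch(results=page, last_point_id=page[-1] if page else None)


ids = ["a", "b", "c", "d", "e"]
first = scroll_page_sketch(ids, limit=2)
second = scroll_page_sketch(ids, limit=2, from_point_id=first.last_point_id)
print(first.results, second.results)  # ['a', 'b'] ['c', 'd']
```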

`search_file_results(uow, query_text, query_type, filter_args, max_results=None, load_full_chunk_info=False)` `async`

Return the whole files that contain any of the search results.

Note: This does not return the additional metadata that is stored only in the db for the search results; only the basic metadata stored in qdrant is returned.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `uow` | The unit of work to use for the search. TYPE: `GithubSearchUOW` |
| `query_text` | The text to search for. TYPE: `str` |
| `query_type` | The type of query to perform (either "text" or "code"). Determines which vectors to search against. TYPE: `LiteralQueryType` |
| `filter_args` | The filter arguments to use for the search. TYPE: `GithubFilterArgs` |
| `max_results` | The maximum number of results to return. TYPE: `int \| None` DEFAULT: `None` |
| `load_full_chunk_info` | Whether to load additional metadata from the db for the search results. TYPE: `bool` DEFAULT: `False` |
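Going from chunk-level hits to whole files involves collapsing hits by file, ranking each file by its best chunk score, and then capping the list. The sketch below shows that step in isolation. `ChunkHitSketch` and `files_for_hits` are hypothetical, and two points are assumptions rather than documented behaviour: that files are ranked by their best chunk, and that `max_results` counts files rather than chunks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChunkHitSketch:
    """Hypothetical chunk-level hit: the file it belongs to and its score."""

    file_path: str
    score: float


def files_for_hits(hits: list, max_results: Optional[int] = None) -> list:
    """Collapse chunk hits into unique file paths, best-scoring file first."""
    best: dict = {}
    for hit in hits:
        if hit.score > best.get(hit.file_path, float("-inf")):
            best[hit.file_path] = hit.score
    ranked = sorted(best, key=best.__getitem__, reverse=True)
    return ranked if max_results is None else ranked[:max_results]


hits = [
    ChunkHitSketch("src/a.py", 0.9),
    ChunkHitSketch("src/b.py", 0.8),
    ChunkHitSketch("src/a.py", 0.7),
    ChunkHitSketch("src/c.py", 0.95),
]
print(files_for_hits(hits, max_results=2))  # ['src/c.py', 'src/a.py']
```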

`update_repository_points(uow, repo_id, user_id, update_info, wait=False)` `async`

Update the repository in the qdrant database.

Note: user_id must be explicitly provided as "public" if it's a public repository.

Note: This handles changed files by deleting them and adding the new version back in.
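The "delete, then add the new version back" handling of changed files can be sketched against an in-memory `{file_path: chunks}` index. The sketch shows why it is safe: no stale chunk from the old version of a file survives, even if the new version has fewer chunks. `UpdateInfoSketch` and `apply_update` are hypothetical, and the real shape of `update_info` is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateInfoSketch:
    """Hypothetical shape of `update_info`; the real structure is not documented here."""

    added: dict = field(default_factory=dict)     # path -> chunks of a new file
    modified: dict = field(default_factory=dict)  # path -> chunks of the new version
    removed: list = field(default_factory=list)   # paths to drop entirely


def apply_update(index: dict, update: UpdateInfoSketch) -> dict:
    """Apply an update to an in-memory {file_path: chunks} index."""
    for path in update.removed:
        index.pop(path, None)
    for path, chunks in update.modified.items():
        index.pop(path, None)       # delete every point of the old version
        index[path] = list(chunks)  # add the new version back in
    for path, chunks in update.added.items():
        index[path] = list(chunks)
    return index


index = {"a.py": ["old_a1", "old_a2"], "b.py": ["b1"]}
update = UpdateInfoSketch(added={"c.py": ["c1"]}, modified={"a.py": ["new_a1"]}, removed=["b.py"])
result = apply_update(index, update)
print(result)  # {'a.py': ['new_a1'], 'c.py': ['c1']}
```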

`vector_stats_for_repo(uow, repo_id, user_id, iter_size=100)` `async`

Get some basic stats about what is vectorized for the given repository.

Note: This streams back the results as they are found, updating the same result object in place.

I.e., it is useful for showing updates as they are found, but not for collecting all the results, since every yielded value is the same object.

| PARAMETER | DESCRIPTION |
| --- | --- |
| `uow` | The unit of work to use for the search. TYPE: `GithubSearchUOW` |
| `repo_id` | The repository to get the stats for. TYPE: `RepoID` |
| `user_id` | The user (for permissions to access repo info). TYPE: `UserID \| Literal['public']` |
| `iter_size` | The number of results to scroll through at a time. TYPE: `int` DEFAULT: `100` |
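The in-place streaming behaviour explains why collecting the yielded values is not useful. In the sketch below, one stats object is mutated per batch of `iter_size` points and yielded every time, so values must be read at each step (e.g. for a progress display), and a list of the yielded objects ends up holding the same final object repeatedly. `RepoStatsSketch` and `stats_stream_sketch` are hypothetical; the real stats fields are not documented here.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass, field
from typing import AsyncIterator


@dataclass
class RepoStatsSketch:
    """Hypothetical stats object; the real fields are not documented here."""

    num_points: int = 0
    by_language: Counter = field(default_factory=Counter)


async def stats_stream_sketch(languages: list, iter_size: int = 100) -> AsyncIterator[RepoStatsSketch]:
    """Update ONE stats object per batch and yield that same object each time."""
    stats = RepoStatsSketch()
    for start in range(0, len(languages), iter_size):
        batch = languages[start:start + iter_size]
        stats.num_points += len(batch)
        stats.by_language.update(batch)
        yield stats  # same object every time, mutated in place


async def main() -> tuple:
    progress = []
    snapshots = []
    async for stats in stats_stream_sketch(["python"] * 5 + ["go"] * 2, iter_size=3):
        progress.append(stats.num_points)  # read values now, e.g. for a progress bar
        snapshots.append(stats)            # storing the object itself is misleading
    return progress, snapshots


progress, snapshots = asyncio.run(main())
print(progress)  # [3, 6, 7]
print(all(s is snapshots[-1] for s in snapshots))  # True
```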
