init

__all__ = ['AbstractGithubParsingUnitOfWork', 'ParsedCodeFileID', 'ParsedFileID', 'ParsedTextFileID', 'SQLGithubParsingUnitOfWork', 'delete_repository', 'get_repository_files', 'initialize_new_repository', 'parse_github_tree', 'parsed_repo_stats', 'update_repository'] module-attribute

AbstractGithubParsingUnitOfWork

Bases: AbstractDatabaseUnitOfWork, ABC

github_adapter abstractmethod property

repository instance-attribute

ParsedCodeFileID

Bases: ParsedFileID

ParsedFileID

Bases: UniqueIDBase[ParsedGithubFile], ABC

branch instance-attribute

node_id instance-attribute

owner instance-attribute

path instance-attribute

repo instance-attribute

equal_excluding_version_and_deletion(other)

from_schema(schema, exact_version=False) classmethod
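The method name equal_excluding_version_and_deletion suggests an equality check that ignores versioning and deletion state. A minimal sketch of that idea follows. FileIDSketch is a hypothetical stand-in, not the library's class, and the version and deleted fields are assumptions, since this reference does not list them.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FileIDSketch:
    """Hypothetical stand-in for ParsedFileID; not the library's class."""

    owner: str
    repo: str
    branch: str
    path: str
    node_id: str
    version: int = 0        # assumed field, not listed in this reference
    deleted: bool = False   # assumed field, not listed in this reference

    def equal_excluding_version_and_deletion(self, other: "FileIDSketch") -> bool:
        # Compare every field except the two assumed bookkeeping fields.
        ignored = {"version", "deleted"}
        return all(
            getattr(self, f.name) == getattr(other, f.name)
            for f in fields(self)
            if f.name not in ignored
        )


old = FileIDSketch("octo", "demo", "main", "src/app.py", "abc123", version=1)
new = FileIDSketch("octo", "demo", "main", "src/app.py", "abc123", version=2, deleted=True)
moved = FileIDSketch("octo", "demo", "main", "src/other.py", "abc123", version=2)
```

Here old and new differ under ordinary equality but match under the relaxed comparison, while moved does not match because its path differs.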

ParsedTextFileID

Bases: ParsedFileID

SQLGithubParsingUnitOfWork

Bases: SessionDatabaseUnitOfWork, AbstractGithubParsingUnitOfWork

github_adapter property

repository_class property

__init__(session_maker, github_adapter)
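The relationship between the abstract unit of work (with its abstract github_adapter property) and a concrete one built from session_maker and github_adapter might look roughly like the sketch below. Both class names are hypothetical stand-ins, the base classes from the real package are omitted, and the stored attributes are assumptions.

```python
from abc import ABC, abstractmethod


class AbstractUnitOfWorkSketch(ABC):
    """Hypothetical stand-in for AbstractGithubParsingUnitOfWork."""

    @property
    @abstractmethod
    def github_adapter(self):
        """Return the object used to talk to GitHub."""


class InMemoryUnitOfWorkSketch(AbstractUnitOfWorkSketch):
    """Hypothetical concrete unit of work; mirrors the documented __init__ arguments."""

    def __init__(self, session_maker, github_adapter):
        # Assumed private attributes; the real class may store these differently.
        self._session_maker = session_maker
        self._github_adapter = github_adapter

    @property
    def github_adapter(self):
        return self._github_adapter


# A plain string stands in for a real adapter object.
uow = InMemoryUnitOfWorkSketch(session_maker=dict, github_adapter="fake-adapter")
```

Because github_adapter is abstract on the base class, the base cannot be instantiated directly, so every concrete unit of work has to supply an adapter.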

delete_repository(uow, repo_id, user_id) async

Delete all parsed files for a specific repository id.

PARAMETER DESCRIPTION
uow

The unit of work to use for the database operations.

TYPE: AbstractGithubParsingUnitOfWork

repo_id

The owner/repo/branch whose parsed files should be deleted.

TYPE: RepoID

user_id

ID of user for this repository (or "public").

TYPE: UserID | Literal['public']

get_repository_files(uow, repo_id, user_id, fully_load=False) async

Get all parsed files for a specific repository id.

PARAMETER DESCRIPTION
uow

The unit of work to use for the database operations.

TYPE: AbstractGithubParsingUnitOfWork

repo_id

The owner/repo/branch whose parsed files should be returned.

TYPE: RepoID

user_id

ID of user for this repository (or "public").

TYPE: UserID | Literal['public']

fully_load

Whether to fully load the parsed files (i.e. load all nested objects).

TYPE: bool DEFAULT: False
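Since get_repository_files is a coroutine, it has to be awaited from async code. The sketch below shows the calling shape only. fake_get_repository_files is an invented stand-in with the documented parameter list, a dict stands in for the unit of work, and a plain string stands in for RepoID; none of this reflects the real implementation.

```python
import asyncio


async def fake_get_repository_files(uow, repo_id, user_id, fully_load=False):
    """Stand-in with the documented parameters; the body is invented for this sketch."""
    files = uow["files"].get((repo_id, user_id), [])
    # Pretend that "fully loading" tags each record; the real behaviour may differ.
    return [dict(f, loaded=fully_load) for f in files]


async def main():
    # A dict stands in for the unit of work, and a string for RepoID (assumptions).
    uow = {"files": {("octo/demo/main", "public"): [{"path": "README.md"}]}}
    return await fake_get_repository_files(
        uow, "octo/demo/main", "public", fully_load=True
    )


result = asyncio.run(main())
```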

initialize_new_repository(uow, repo_id, user_id, max_size_fill=10000, as_repo_id=None) async

Combine fetching the repository from GitHub, parsing it, and saving it to the database.

This applies only to a new repository (one that is not already stored in the database).

PARAMETER DESCRIPTION
uow

The unit of work to use for the database operations.

TYPE: AbstractGithubParsingUnitOfWork

repo_id

The owner/repo/branch to get the tree for.

TYPE: RepoID

user_id

ID of user for this repository (or "public" if it's a public repo).

TYPE: UserID | Literal['public']

max_size_fill

If a blob is larger than this size (in bytes), its text is not filled in.

TYPE: int DEFAULT: 10000

as_repo_id

The RepoID to store the updated files as. If None, will use the repo_id.

TYPE: RepoID | None DEFAULT: None
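The max_size_fill threshold can be illustrated with a small sketch. fill_text_if_small is a hypothetical helper, and the dict keys (size, raw, text) are assumptions chosen for the example; the library's own blob representation is not shown in this reference.

```python
def fill_text_if_small(blob: dict, max_size_fill: int = 10000) -> dict:
    """Hypothetical helper: attach text only when the blob is not larger than the limit."""
    filled = dict(blob)
    if blob["size"] > max_size_fill:
        filled["text"] = None  # too large: leave the text unfilled
    else:
        filled["text"] = blob["raw"].decode("utf-8", errors="replace")
    return filled


small = fill_text_if_small({"path": "a.py", "size": 11, "raw": b"print('hi')"})
large = fill_text_if_small({"path": "big.bin", "size": 20000, "raw": b""})
edge = fill_text_if_small({"path": "x.txt", "size": 10000, "raw": b"x"})
```

A blob whose size equals the limit is still filled in this sketch, matching the "greater than" wording of the parameter description.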

parse_github_tree(git_tree, repo_id, clerk_owner)

Walk through all blobs in a git tree parsing them into objects.

The resulting objects contain data suitable for storing in the database/Qdrant (specifically for code search).

PARAMETER DESCRIPTION
git_tree

The tree to walk through.

TYPE: Tree

repo_id

The repository ID.

TYPE: RepoID

clerk_owner

The owner of the repository (either clerk_id or "public").

TYPE: str
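Walking every blob in a tree and turning each into a record can be sketched as follows. This is an illustration under stated assumptions, not the library's parser: nested dicts stand in for the Tree type, ParsedBlobSketch and its fields are hypothetical, and the suffix-based split between code and text files is an invented rule.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ParsedBlobSketch:
    """Hypothetical parsed record; field names are assumptions."""

    repo_id: str
    clerk_owner: str
    path: str
    kind: str  # "code" or "text"


def walk_blobs(tree: dict, prefix: str = "") -> Iterator[str]:
    # Nested dicts stand in for subtrees; None marks a blob.
    for name, child in tree.items():
        path = f"{prefix}{name}"
        if child is None:
            yield path
        else:
            yield from walk_blobs(child, prefix=f"{path}/")


def parse_tree_sketch(tree: dict, repo_id: str, clerk_owner: str) -> list[ParsedBlobSketch]:
    code_suffixes = (".py", ".ts", ".go")  # assumed classification rule
    return [
        ParsedBlobSketch(
            repo_id,
            clerk_owner,
            path,
            "code" if path.endswith(code_suffixes) else "text",
        )
        for path in walk_blobs(tree)
    ]


tree = {"README.md": None, "src": {"app.py": None, "util": {"io.py": None}}}
parsed = parse_tree_sketch(tree, "octo/demo/main", "public")
```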

parsed_repo_stats(uow, repo_id, user_id) async

update_repository(uow, repo_id, user_id, max_size_fill=10000, from_repo_id=None, modify_from_repo=False, as_repo_id=None) async

Coordinate the GitHub service and the database store to bring the stored contents up to date with the latest version of repo_id.

That is, if the repo was previously parsed but there have been new commits since then, this makes any necessary updates to the database and returns information about the changes made.

PARAMETER DESCRIPTION
uow

The unit of work to use for the database operations.

TYPE: AbstractGithubParsingUnitOfWork

repo_id

The owner/repo/branch to get the tree for.

TYPE: RepoID

user_id

ID of user for this repository (or "public" if it's a public repo).

TYPE: UserID | Literal['public']

max_size_fill

Ignore any blobs that are larger than this size (in bytes).

TYPE: int DEFAULT: 10000

from_repo_id

The RepoID to compare against, for example a different branch of the same repository, or a specific commit hash.

TYPE: RepoID | None DEFAULT: None

modify_from_repo

Whether to change the from_repo_id entries in the database. If from_repo_id references a specific commit on the same branch, this should generally be True (only the latest version of a branch should be kept). If it references a different branch, False may make more sense (e.g., use the main branch as a starting point for creating updates for a new branch, but keep the main branch entries).

TYPE: bool DEFAULT: False

as_repo_id

The RepoID to store the updated files as. If None, will use the repo_id.

TYPE: RepoID | None DEFAULT: None
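The kind of change information an update could report can be illustrated by comparing a stored listing with the latest one. The sketch below is only an assumption about the general approach: diff_listings is a hypothetical helper, and keying files by path with node_id as the content marker is a choice made for this example rather than the documented behaviour.

```python
def diff_listings(stored: dict[str, str], latest: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {path: node_id} mappings and report what changed."""
    added = sorted(p for p in latest if p not in stored)
    deleted = sorted(p for p in stored if p not in latest)
    changed = sorted(p for p in latest if p in stored and stored[p] != latest[p])
    return {"added": added, "changed": changed, "deleted": deleted}


# stored plays the role of the from_repo_id entries, latest the role of repo_id.
stored = {"README.md": "n1", "src/app.py": "n2", "old.txt": "n3"}
latest = {"README.md": "n1", "src/app.py": "n9", "src/new.py": "n4"}
changes = diff_listings(stored, latest)
```

Unchanged files (here README.md) appear in none of the three lists, so only the added, changed, and deleted paths would need database writes.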