`__init__`
`__all__ = ['AbstractGithubParsingUnitOfWork', 'ParsedCodeFileID', 'ParsedFileID', 'ParsedTextFileID', 'SQLGithubParsingUnitOfWork', 'delete_repository', 'get_repository_files', 'initialize_new_repository', 'parse_github_tree', 'parsed_repo_stats', 'update_repository']` (module-attribute)
AbstractGithubParsingUnitOfWork
Bases: AbstractDatabaseUnitOfWork, ABC
github_adapter (abstractmethod, property)
repository (instance-attribute)
ParsedCodeFileID
Bases: ParsedFileID
ParsedFileID
Bases: UniqueIDBase[ParsedGithubFile], ABC
branch (instance-attribute)
node_id (instance-attribute)
owner (instance-attribute)
path (instance-attribute)
repo (instance-attribute)
equal_excluding_version_and_deletion(other)
from_schema(schema, exact_version=False) (classmethod)
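
ParsedFileID identifies a parsed file by owner, repo, branch, path, and node_id. The name equal_excluding_version_and_deletion suggests a comparison that ignores version and deletion state. The sketch below illustrates that idea under stated assumptions: the `version` and `deleted` fields and the ParsedFileIDSketch class are hypothetical, and the real class (built on UniqueIDBase[ParsedGithubFile]) may store these differently.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ParsedFileIDSketch:
    """Hypothetical stand-in for ParsedFileID; `version` and `deleted` are assumed fields."""

    owner: str
    repo: str
    branch: str
    path: str
    node_id: str
    version: int = 0  # assumption: a version counter bumped on each re-parse
    deleted: bool = False  # assumption: a soft-deletion flag

    def equal_excluding_version_and_deletion(self, other: "ParsedFileIDSketch") -> bool:
        # Compare every field except the assumed version/deletion markers.
        ignored = {"version", "deleted"}
        return all(
            getattr(self, f.name) == getattr(other, f.name)
            for f in fields(self)
            if f.name not in ignored
        )


a = ParsedFileIDSketch("octo", "demo", "main", "src/app.py", "n1", version=1)
b = ParsedFileIDSketch("octo", "demo", "main", "src/app.py", "n1", version=2, deleted=True)
print(a == b)  # False: plain dataclass equality includes version and deleted
print(a.equal_excluding_version_and_deletion(b))  # True
```

A method like this lets an update step recognize "the same file, just a newer or soft-deleted row" without relying on full equality.
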
ParsedTextFileID
Bases: ParsedFileID
SQLGithubParsingUnitOfWork
Bases: SessionDatabaseUnitOfWork, AbstractGithubParsingUnitOfWork
github_adapter (property)
repository_class (property)
`__init__(session_maker, github_adapter)`
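
SQLGithubParsingUnitOfWork implements the abstract github_adapter property of AbstractGithubParsingUnitOfWork and is constructed from a session_maker and a github_adapter. A minimal, standard-library sketch of that shape follows. Everything in it is an assumption for illustration: the session maker is taken to be any zero-argument callable returning an object with commit() and rollback(), the async-context-manager commit/rollback policy is the generic unit-of-work pattern rather than the documented behaviour of SessionDatabaseUnitOfWork, and FakeSession and the *Sketch classes are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, Optional


class AbstractGithubParsingUoWSketch(ABC):
    """Hypothetical abstract unit of work: subclasses must expose a GitHub adapter."""

    @property
    @abstractmethod
    def github_adapter(self) -> Any: ...

    @abstractmethod
    async def commit(self) -> None: ...

    @abstractmethod
    async def rollback(self) -> None: ...

    async def __aenter__(self) -> "AbstractGithubParsingUoWSketch":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Assumed policy: commit if the block succeeded, roll back if it raised.
        if exc_type is None:
            await self.commit()
        else:
            await self.rollback()


class FakeSession:
    """Stand-in for a database session that records what happened to it."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def commit(self) -> None:
        self.log.append("commit")

    def rollback(self) -> None:
        self.log.append("rollback")


class SQLGithubParsingUoWSketch(AbstractGithubParsingUoWSketch):
    """Hypothetical concrete unit of work built from a session maker and an adapter."""

    def __init__(self, session_maker: Callable[[], FakeSession], github_adapter: Any) -> None:
        self._session_maker = session_maker
        self._github_adapter = github_adapter
        self.session: Optional[FakeSession] = None

    @property
    def github_adapter(self) -> Any:
        return self._github_adapter

    async def __aenter__(self) -> "SQLGithubParsingUoWSketch":
        self.session = self._session_maker()  # fresh session for each unit of work
        return self

    async def commit(self) -> None:
        if self.session is not None:
            self.session.commit()

    async def rollback(self) -> None:
        if self.session is not None:
            self.session.rollback()


async def demo() -> list[str]:
    session = FakeSession()
    uow = SQLGithubParsingUoWSketch(lambda: session, github_adapter="adapter")
    async with uow:
        pass  # successful block -> commit
    try:
        async with uow:
            raise RuntimeError("boom")  # failing block -> rollback
    except RuntimeError:
        pass
    return session.log


print(asyncio.run(demo()))  # ['commit', 'rollback']
```

Keeping github_adapter abstract on the base class means any concrete unit of work (SQL-backed or an in-memory test double) must supply an adapter before it can be instantiated.
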
delete_repository(uow, repo_id, user_id) (async)
get_repository_files(uow, repo_id, user_id, fully_load=False) (async)
Get all parsed files for a specific repository ID.

| PARAMETER | DESCRIPTION |
|---|---|
| uow | The unit of work to use for the database operations. |
| repo_id | The owner/repo/branch to get the tree for. |
| user_id | ID of the user for this repository (or "public"). |
| fully_load | Whether to fully load the parsed files (i.e., load all nested objects). |

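
get_repository_files is a coroutine, so callers await it inside an event loop, and lookups are scoped by both repository and owner ("public" for public repositories). The sketch below illustrates that scoping with a hypothetical in-memory store; RepoIDSketch, STORE, get_repository_files_sketch, and the way fully_load is modelled (full records versus paths only) are assumptions, and the uow argument is dropped for brevity. It is not the real database query.

```python
import asyncio
from typing import NamedTuple


class RepoIDSketch(NamedTuple):
    """Hypothetical owner/repo/branch identifier."""

    owner: str
    repo: str
    branch: str


# Hypothetical in-memory store keyed by (user_id, repo_id); public repos use "public".
STORE = {
    ("public", RepoIDSketch("octo", "demo", "main")): [
        {"path": "README.md", "text": "# demo"},
        {"path": "src/app.py", "text": "print('hi')"},
    ],
}


async def get_repository_files_sketch(
    repo_id: RepoIDSketch, user_id: str, fully_load: bool = False
) -> list[dict]:
    files = STORE.get((user_id, repo_id), [])
    if fully_load:
        return [dict(f) for f in files]  # copy every field, including the nested text
    return [{"path": f["path"]} for f in files]  # lightweight view: paths only


async def main():
    repo = RepoIDSketch("octo", "demo", "main")
    light = await get_repository_files_sketch(repo, "public")
    full = await get_repository_files_sketch(repo, "public", fully_load=True)
    private = await get_repository_files_sketch(repo, "user_123")
    return light, full, private


light, full, private = asyncio.run(main())
print([f["path"] for f in light])  # ['README.md', 'src/app.py']
print(private)  # [] because the same repo under another owner is a separate entry
```
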
initialize_new_repository(uow, repo_id, user_id, max_size_fill=10000, as_repo_id=None) (async)
Combine getting the repository from GitHub, parsing it, and saving it to the database.
This only applies to a new repository (one that is not already stored in the database).

| PARAMETER | DESCRIPTION |
|---|---|
| uow | The unit of work to use for the database operations. |
| repo_id | The owner/repo/branch to get the tree for. |
| user_id | ID of the user for this repository (or "public" if it's a public repo). |
| max_size_fill | If the size of the blob is greater than this (in bytes), don't fill the text. |
| as_repo_id | The RepoID to store the updated files as. If None, repo_id is used. |

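
The max_size_fill and as_repo_id parameters can be illustrated in isolation. A minimal sketch, assuming blobs arrive as (path, size, text) records and repository IDs are plain "owner/repo/branch" strings: text is kept only when the blob is not larger than max_size_fill bytes (matching the description above), and the storage key falls back to repo_id when as_repo_id is None. BlobSketch, StoredFileSketch, and plan_new_repository are hypothetical names, not part of this module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlobSketch:
    path: str
    size: int  # size in bytes
    text: str


@dataclass
class StoredFileSketch:
    repo_key: str
    path: str
    text: Optional[str]  # None when the blob was too large to fill


def plan_new_repository(
    blobs: list[BlobSketch],
    repo_id: str,
    max_size_fill: int = 10000,
    as_repo_id: Optional[str] = None,
) -> list[StoredFileSketch]:
    # Store under as_repo_id when given, otherwise under repo_id itself.
    key = as_repo_id if as_repo_id is not None else repo_id
    return [
        StoredFileSketch(
            repo_key=key,
            path=b.path,
            # Blobs larger than max_size_fill keep their metadata but no text.
            text=b.text if b.size <= max_size_fill else None,
        )
        for b in blobs
    ]


blobs = [BlobSketch("small.py", 120, "x = 1"), BlobSketch("data.json", 50_000, "{...}")]
for f in plan_new_repository(blobs, "octo/demo/main", as_repo_id="octo/demo/main-copy"):
    print(f.repo_key, f.path, f.text)
# octo/demo/main-copy small.py x = 1
# octo/demo/main-copy data.json None
```
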
parse_github_tree(git_tree, repo_id, clerk_owner)
Walk through all blobs in a git tree, parsing them into objects.
Each object contains data suitable for storing in the database/Qdrant (specifically for code search).

| PARAMETER | DESCRIPTION |
|---|---|
| git_tree | The tree to walk through. |
| repo_id | The repository ID. |
| clerk_owner | The owner of the repository (either clerk_id or "public"). |

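
Walking "all blobs in a git tree" is a recursive traversal that flattens nested directories into one record per file. A minimal sketch, assuming each node is either a TreeNode with named children or a BlobNode holding text; the node types, ParsedRecord, walk_blobs, and parse_tree_sketch are hypothetical, and the real function operates on whatever tree type the GitHub adapter returns and produces richer objects for code search.

```python
from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class BlobNode:
    text: str


@dataclass
class TreeNode:
    children: dict[str, Union["TreeNode", BlobNode]] = field(default_factory=dict)


@dataclass
class ParsedRecord:
    repo_id: str
    owner: str  # clerk_id or "public"
    path: str
    text: str


def walk_blobs(tree: TreeNode, prefix: str = "") -> Iterator[tuple[str, BlobNode]]:
    # Depth-first walk; sorted() keeps the output order deterministic.
    for name, node in sorted(tree.children.items()):
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(node, TreeNode):
            yield from walk_blobs(node, path)
        else:
            yield path, node


def parse_tree_sketch(git_tree: TreeNode, repo_id: str, clerk_owner: str) -> list[ParsedRecord]:
    return [ParsedRecord(repo_id, clerk_owner, path, blob.text) for path, blob in walk_blobs(git_tree)]


tree = TreeNode({
    "README.md": BlobNode("# demo"),
    "src": TreeNode({"app.py": BlobNode("print('hi')"), "util": TreeNode({"io.py": BlobNode("")})}),
})
for rec in parse_tree_sketch(tree, "octo/demo/main", "public"):
    print(rec.path)
# README.md
# src/app.py
# src/util/io.py
```

Using a generator for the walk avoids building intermediate lists for deep trees; the caller decides when to materialize the records.
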
parsed_repo_stats(uow, repo_id, user_id) (async)
update_repository(uow, repo_id, user_id, max_size_fill=10000, from_repo_id=None, modify_from_repo=False, as_repo_id=None) (async)
Coordinate the GitHub service and the database store of a repository to update its contents to the latest version of repo_id.
That is, if the repo was previously parsed but there have been new commits since then, this makes any necessary updates to the database and returns information about the changes made.

| PARAMETER | DESCRIPTION |
|---|---|
| uow | The unit of work to use for the database operations. |
| repo_id | The owner/repo/branch to get the tree for. |
| user_id | ID of the user for this repository (or "public" if it's a public repo). |
| max_size_fill | Ignore any blobs that are larger than this size (in bytes). |
| from_repo_id | The RepoID to compare against, e.g., a different branch of the same repository or a specific commit hash. |
| modify_from_repo | Whether to change the from_repo_id entries in the database. If from_repo_id references a specific commit on the same branch, this should generally be True (only the latest version of a branch should be kept), but if it references a different branch, it may make sense for this to be False (e.g., use the …) |
| as_repo_id | The RepoID to store the updated files as. If None, repo_id is used. |

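
Working out the "necessary updates" between a previously parsed version and the latest one amounts to a diff keyed by file path. A minimal sketch, assuming each version is available as a mapping from path to a content hash (for example a git blob SHA): paths only in the new version are added, paths only in the old version are deleted, and shared paths whose hash changed are modified. ChangeSummary and diff_versions are hypothetical; the real function also performs the database writes and honours max_size_fill, from_repo_id, modify_from_repo, and as_repo_id.

```python
from dataclasses import dataclass


@dataclass
class ChangeSummary:
    added: list[str]
    modified: list[str]
    deleted: list[str]


def diff_versions(old: dict[str, str], new: dict[str, str]) -> ChangeSummary:
    """Compare two {path: content_hash} snapshots of a repository."""
    old_paths, new_paths = old.keys(), new.keys()  # dict views support set operations
    return ChangeSummary(
        added=sorted(new_paths - old_paths),
        modified=sorted(p for p in old_paths & new_paths if old[p] != new[p]),
        deleted=sorted(old_paths - new_paths),
    )


previous = {"README.md": "a1", "src/app.py": "b2", "old.txt": "c3"}
latest = {"README.md": "a1", "src/app.py": "b9", "src/new.py": "d4"}
print(diff_versions(previous, latest))
# ChangeSummary(added=['src/new.py'], modified=['src/app.py'], deleted=['old.txt'])
```

Comparing hashes rather than file contents keeps the check cheap: unchanged files (here README.md) can be skipped without fetching or re-parsing their blobs.
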