GitHub file parsing

ChangedInfo dataclass

new instance-attribute

old instance-attribute

__init__(old, new)

FullBlob

Bases: BaseModel

A MiniBlob with additional information extracted from its TreeEntry.

This is easier to work with for tasks such as embedding, since it contains both the blob contents and metadata derived from the blob's location within a tree.

is_binary instance-attribute

name instance-attribute

node_id instance-attribute

path instance-attribute

size instance-attribute

text instance-attribute

from_miniblob(mini_blob, path, name) classmethod

from_tree_entry(tree_entry) classmethod
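
The relationship between a MiniBlob and a FullBlob can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the `MiniBlob` fields shown here are hypothetical (its real definition is not documented on this page), and plain dataclasses stand in for the `BaseModel` classes so the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class MiniBlob:
    """Hypothetical stand-in; the real MiniBlob fields are an assumption."""
    node_id: str
    size: int
    text: str
    is_binary: bool


@dataclass
class FullBlob:
    """Stand-in for the documented model (a dataclass, not a BaseModel)."""
    node_id: str
    path: str
    name: str
    size: int
    text: str
    is_binary: bool

    @classmethod
    def from_miniblob(cls, mini_blob: MiniBlob, path: str, name: str) -> "FullBlob":
        # Copy the blob contents and attach the location metadata
        # (path and name) that would come from the tree entry.
        return cls(
            node_id=mini_blob.node_id,
            path=path,
            name=name,
            size=mini_blob.size,
            text=mini_blob.text,
            is_binary=mini_blob.is_binary,
        )


mini = MiniBlob(node_id="B_123", size=12, text="print('hi')\n", is_binary=False)
full = FullBlob.from_miniblob(mini, path="src/hello.py", name="hello.py")
print(full.path, full.name)
```

Keeping the contents and the path together in one object means downstream steps such as chunking or embedding do not need to look the tree entry up again.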

GithubMetadata

Bases: DatabaseSchemaMixin, SchemaBase

branch instance-attribute

node_id instance-attribute

owner instance-attribute

path instance-attribute

repo instance-attribute

size_bytes instance-attribute

as_content()

from_repo_and_blob(repo_id, blob) classmethod

update_from_repo_id(repo_id)

ParsedCodeFile

Bases: ParsedGithubFile

clerk_owner instance-attribute

code_file_chunks instance-attribute

full_content instance-attribute

github_metadata instance-attribute

iterate_nested_schemas = True class-attribute

new_instance_on_change = False class-attribute

from_chunks_and_metadata(chunks, metadata, clerk_owner) classmethod

ParsedGithubFile

Bases: DatabaseSchemaMixin, SchemaBase, ABC

clerk_owner = Field(description="The owner of the repository (either clerk_id or 'public')") class-attribute instance-attribute

full_content = Field(description='The full content of the file.') class-attribute instance-attribute

github_metadata instance-attribute

ParsedRepositoryInfo

Bases: BaseModel

Summarizes information about the parsed files stored in the database.

num_code_files instance-attribute

num_text_files instance-attribute

ParsedTextFile

Bases: ParsedGithubFile

clerk_owner instance-attribute

full_content instance-attribute

general_text_chunks instance-attribute

github_metadata instance-attribute

iterate_nested_schemas = True class-attribute

new_instance_on_change = False class-attribute

from_chunks_and_metadata(chunks, metadata, clerk_owner, full_content) classmethod

RenamedInfo dataclass

new instance-attribute

old instance-attribute

__init__(old, new)

RepoID

Bases: BaseModel

branch = 'main' class-attribute instance-attribute

owner = Field(description='The (Case Sensitive) owner/org of the repository.') class-attribute instance-attribute

repo instance-attribute

__str__()
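
As an illustration of the fields above, a RepoID might be constructed as shown below. This is a sketch under stated assumptions: a frozen dataclass stands in for the `BaseModel`, and the `owner/repo@branch` string format is hypothetical, since the actual output of `__str__()` is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoID:
    """Stand-in for the documented model (a dataclass, not a BaseModel)."""
    owner: str  # case-sensitive owner/org of the repository
    repo: str
    branch: str = "main"  # default taken from the documented attribute

    def __str__(self) -> str:
        # Assumed format; the real __str__ may render differently.
        return f"{self.owner}/{self.repo}@{self.branch}"


default_branch = RepoID(owner="octocat", repo="Hello-World")
feature_branch = RepoID(owner="octocat", repo="Hello-World", branch="dev")
print(default_branch)
print(feature_branch)
```

Because `owner` is case sensitive, `RepoID(owner="Octocat", ...)` and `RepoID(owner="octocat", ...)` should be treated as different repositories.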

UpdatedRepositoryInfo dataclass

added = field(default_factory=list) class-attribute instance-attribute

changed = field(default_factory=list) class-attribute instance-attribute

deleted = field(default_factory=list) class-attribute instance-attribute

renamed = field(default_factory=list) class-attribute instance-attribute

__add__(other)

__init__(added=list(), changed=list(), renamed=list(), deleted=list())

as_artifact()

as_content()
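
The following sketch shows one plausible way the four change lists and `__add__` fit together. It assumes that `__add__` concatenates each list field by field, that `added` and `deleted` hold file paths, and that `changed` and `renamed` hold `ChangedInfo` and `RenamedInfo` entries; none of these details are confirmed by this page, and `as_artifact()` and `as_content()` are omitted.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChangedInfo:
    old: Any
    new: Any


@dataclass
class RenamedInfo:
    old: Any
    new: Any


@dataclass
class UpdatedRepositoryInfo:
    # default_factory gives every instance its own empty lists.
    added: list = field(default_factory=list)
    changed: list = field(default_factory=list)
    renamed: list = field(default_factory=list)
    deleted: list = field(default_factory=list)

    def __add__(self, other: "UpdatedRepositoryInfo") -> "UpdatedRepositoryInfo":
        # Assumed semantics: merge two update summaries by concatenating
        # each list, leaving both operands unmodified.
        if not isinstance(other, UpdatedRepositoryInfo):
            return NotImplemented
        return UpdatedRepositoryInfo(
            added=self.added + other.added,
            changed=self.changed + other.changed,
            renamed=self.renamed + other.renamed,
            deleted=self.deleted + other.deleted,
        )


first = UpdatedRepositoryInfo(
    added=["a.py"],
    changed=[ChangedInfo(old="v1", new="v2")],
)
second = UpdatedRepositoryInfo(
    added=["b.py"],
    renamed=[RenamedInfo(old="old_name.py", new="new_name.py")],
    deleted=["c.py"],
)
merged = first + second
print(merged.added, merged.deleted)
```

Under these assumptions, summaries from several update passes can be combined with `+` into a single report.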