source: tracdarcs/INTERNALS.rst @ 190

Revision 190, 5.4 KB checked in by lele@…, 2 years ago

Split out the internals note from the README.rst, now used as the long_description of the package

Internals

The entire darcs change history is imported into the database, using the output of darcs changes --xml-output --summary --reverse.

A check for newer patches is performed every time the DarcsRepository object is created, and any new patches are immediately imported into the database.

After that, the darcs repository is used only to fetch the contents of a file. With darcs 2.x we use darcs query contents; with darcs 1.x we have to resort to ugly tricks: in the worst case, the output of darcs annotate is massaged by ann2ascii.py to recover the contents of a file at any given point in time.

Each changeset is assigned a revision number according to its position in the output of darcs changes --xml-output --summary --reverse: the first patch gets revision number 1, the second gets revision number 2, and so on.
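The numbering rule can be sketched in a few lines of Python. This is only an illustration, not the code used by the backend: the XML below is a simplified stand-in for real darcs output (which carries more attributes and a summary per patch), and the hashes, authors and the number_patches helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the output of
# `darcs changes --xml-output --summary --reverse`.
SAMPLE_XML = """\
<changelog>
  <patch author="alice@example.org" hash="hash-aaa">
    <name>Initial import</name>
  </patch>
  <patch author="bob@example.org" hash="hash-bbb">
    <name>Fix typo in README</name>
  </patch>
</changelog>
"""


def number_patches(xml_text):
    """Return (rev, hash, name) tuples, numbering patches from 1 in document order."""
    root = ET.fromstring(xml_text)
    return [
        (rev, patch.get("hash"), patch.findtext("name", default=""))
        for rev, patch in enumerate(root.iter("patch"), start=1)
    ]


changesets = number_patches(SAMPLE_XML)
for row in changesets:
    print(row)
```

Because --reverse lists the oldest patch first, plain enumeration from 1 is enough to give every patch a stable number, as long as the order never changes.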

This assumes that the patches in a darcs repository NEVER get reordered or deleted. This condition is satisfied as long as commands such as darcs unpull or darcs optimize are not performed.
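A sketch of how this assumption could be checked when looking for newer patches: the hashes already stored must be an exact prefix of the hashes darcs currently reports, and whatever follows is new. The new_patches function is hypothetical and is not claimed to be what the backend does.

```python
def new_patches(stored_hashes, current_hashes):
    """Return the hashes that still need to be imported.

    stored_hashes:  hashes already in the database, ordered by revision number.
    current_hashes: hashes currently reported by darcs, oldest first.

    Raises ValueError if the stored history is not a prefix of the current
    one, i.e. if patches were reordered or removed after being imported.
    """
    n = len(stored_hashes)
    if current_hashes[:n] != stored_hashes:
        raise ValueError("repository history was reordered or truncated")
    return current_hashes[n:]


# Two patches already imported, one new patch in the repository.
print(new_patches(["hash-aaa", "hash-bbb"], ["hash-aaa", "hash-bbb", "hash-ccc"]))
```

The new patches would then receive the next revision numbers, continuing from the highest one already stored.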

Cache

For performance reasons, the backend creates and maintains a few additional tables, where it keeps darcs-specific information. The following tables are created automatically at upgrade time and populated by sync (see components.py).

darcs_changesets

Each row represents a darcs changeset:

create table darcs_changesets (
    repo_id text,
    rev integer,
    hash text,
    name text,
    primary key (repo_id, rev));

repo_id
    repository containing this changeset
rev
    the revision number assigned
hash
    the unique patch identifier assigned by darcs
name
    the name of the darcs patch

darcs_nodes

Each row represents a single node: a node is either a file or a directory which has its history stored in the repository.

Note

A node has no fixed name or content of its own; its name and content are well defined only for a given revision.

create table darcs_nodes (
    repo_id text,
    node_id integer,
    node_type text,
    add_rev integer,
    remove_rev integer,
    primary key (repo_id, node_id) );

node_type
    is one of (dbutil.NODE_FILE_TYPE, dbutil.NODE_DIR_TYPE)
add_rev
    is the revision that added this node
remove_rev
    is the revision that removed this node (possibly NULL)

darcs_node_changes

Each row represents a change to a node in a particular revision. At most one entry can exist for a node in each revision, and a revision that does not change the node has no entry for it.

create table darcs_node_changes (
    repo_id text,
    node_id integer,
    rev integer,
    path text,
    parent_id integer,
    the_change text,
    primary key (repo_id, node_id, rev) );

the_change
    one of the following (defined in dbutil.py): CHANGE_ADDED, CHANGE_REMOVED, CHANGE_MOVED, CHANGE_EDITED, CHANGE_MOVED_EDITED
parent_id
    the node id of the node's parent directory
path
    the path of the node at the end of revision rev; when the change is CHANGE_REMOVED, path is the node's previous path
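To illustrate how a node keeps its identity while its path changes, here is a small sketch using an in-memory SQLite database. The constant values, the NULL parent for a top-level node and the path_at helper are assumptions made for the example; the real constants are defined in dbutil.py.

```python
import sqlite3

# Hypothetical stand-ins: the real values are defined in dbutil.py.
NODE_FILE_TYPE = "file"
CHANGE_ADDED = "added"
CHANGE_MOVED = "moved"

db = sqlite3.connect(":memory:")
db.executescript("""
create table darcs_nodes (
    repo_id text, node_id integer, node_type text,
    add_rev integer, remove_rev integer,
    primary key (repo_id, node_id));
create table darcs_node_changes (
    repo_id text, node_id integer, rev integer, path text,
    parent_id integer, the_change text,
    primary key (repo_id, node_id, rev));
""")

# One file, added as README in revision 1 and renamed in revision 3.
# Using NULL as the parent of a top-level node is an assumption.
db.execute("insert into darcs_nodes values (?, ?, ?, ?, ?)",
           ("somerepo", 1, NODE_FILE_TYPE, 1, None))
db.executemany(
    "insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)",
    [("somerepo", 1, 1, "README", None, CHANGE_ADDED),
     ("somerepo", 1, 3, "README.txt", None, CHANGE_MOVED)])


def path_at(node_id, rev):
    """Path of a node as of `rev`, taken from its latest change at or before `rev`."""
    row = db.execute(
        "select path from darcs_node_changes"
        " where repo_id = 'somerepo' and node_id = ? and rev <= ?"
        " order by rev desc limit 1",
        (node_id, rev)).fetchone()
    return row[0] if row else None


for rev in (1, 2, 3):
    print(rev, path_at(1, rev))
```

Revision 2 does not touch the file, so it has no row in darcs_node_changes, and the path is taken from the change recorded in revision 1.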

darcs_cache

A cache of file contents: the first time the content of a file at a particular revision is requested, it is computed and stored here, so subsequent requests do not need to execute darcs at all.

Warning

This table may grow quickly! On the other hand, all its rows can be deleted at any time: the content is simply recomputed the next time it is requested.

create table darcs_cache (
    repo_id text,
    node_id integer,
    rev integer,
    content blob,
    size integer,
    primary key (repo_id, node_id, rev) );
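The caching behaviour amounts to a look-up followed by a compute-and-store on a miss. The sketch below shows that pattern against the table above; fetch_from_darcs is a hypothetical placeholder for the expensive darcs invocation, and get_content is not the backend's actual function.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
create table darcs_cache (
    repo_id text, node_id integer, rev integer,
    content blob, size integer,
    primary key (repo_id, node_id, rev))
""")

calls = []  # records every cache miss, for demonstration only


def fetch_from_darcs(repo_id, node_id, rev):
    """Hypothetical placeholder for the expensive call to darcs."""
    calls.append((repo_id, node_id, rev))
    return b"file content at rev %d" % rev


def get_content(repo_id, node_id, rev):
    """Return the file content, computing and storing it on the first request."""
    key = (repo_id, node_id, rev)
    row = db.execute(
        "select content from darcs_cache"
        " where repo_id = ? and node_id = ? and rev = ?", key).fetchone()
    if row is not None:
        return bytes(row[0])
    content = fetch_from_darcs(*key)
    db.execute("insert into darcs_cache values (?, ?, ?, ?, ?)",
               key + (content, len(content)))
    return content


first = get_content("somerepo", 1, 2)   # miss: calls fetch_from_darcs
second = get_content("somerepo", 1, 2)  # hit: served from darcs_cache
db.execute("delete from darcs_cache")   # emptying the cache is always safe
third = get_content("somerepo", 1, 2)   # miss again: recomputed
```

Deleting the rows only costs time on the next request, since nothing in this table is needed to reconstruct the history.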

Some sample queries

Get all existing nodes as of revision r

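-- referred to as node_rev(r) in the queries that follow; repo_id is
-- selected because those queries join on nr.repo_id
select dnc.repo_id as repo_id, dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= r
  and dnc.repo_id = dn.repo_id and dnc.repo_id = 'somerepo'
  and (dn.remove_rev is null or dn.remove_rev > r)
group by dnc.repo_id, dnc.node_id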
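As a rough illustration, the query can be run from Python with r and the repository name bound as named parameters. The sketch selects repo_id alongside node_id and rev so that the result can be joined on all three columns; the sample rows, the change names and the nodes_as_of helper are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table darcs_nodes (
    repo_id text, node_id integer, node_type text,
    add_rev integer, remove_rev integer,
    primary key (repo_id, node_id));
create table darcs_node_changes (
    repo_id text, node_id integer, rev integer, path text,
    parent_id integer, the_change text,
    primary key (repo_id, node_id, rev));

-- Node 1 lives from revision 1 until it is removed in revision 4;
-- node 2 is added in revision 3 and never removed.
insert into darcs_nodes values
    ('somerepo', 1, 'file', 1, 4),
    ('somerepo', 2, 'file', 3, null);
insert into darcs_node_changes values
    ('somerepo', 1, 1, 'a.txt', null, 'added'),
    ('somerepo', 1, 2, 'a.txt', null, 'edited'),
    ('somerepo', 1, 4, 'a.txt', null, 'removed'),
    ('somerepo', 2, 3, 'b.txt', null, 'added');
""")

NODES_AS_OF = """
select dnc.repo_id as repo_id, dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= :r
  and dnc.repo_id = dn.repo_id and dnc.repo_id = :repo
  and (dn.remove_rev is null or dn.remove_rev > :r)
group by dnc.repo_id, dnc.node_id
"""


def nodes_as_of(r, repo="somerepo"):
    """Existing nodes at revision r, each with the revision of its latest change."""
    return sorted(db.execute(NODES_AS_OF, {"r": r, "repo": repo}).fetchall())


for r in (1, 3, 4):
    print(r, nodes_as_of(r))
```

At revision 4 the first node is gone, because its remove_rev is not greater than r, while the second node is still reported with the revision of its last change.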

Get all latest nodes

select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dn.remove_rev is null
  and dnc.repo_id = dn.repo_id and dnc.repo_id = 'somerepo'
group by dnc.node_id

Get node_id of /some/path p, as of revision r

select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id and dnc.repo_id = 'somerepo'
  and dnc.path = p

Get history of node_id nid, till revision r

select * from darcs_node_changes as dnc
where dnc.node_id = nid and dnc.rev <= r
  and dnc.repo_id = 'somerepo'

Get children of node_id nid, as of revision r

select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.parent_id = nid
  and dnc.repo_id = nr.repo_id and dnc.repo_id = 'somerepo'
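The path and children queries both join against node_rev(r), which is shorthand rather than valid SQL. One way to make them runnable, assuming node_rev(r) stands for the "existing nodes as of revision r" query extended with a repo_id column, is to inline it as a subquery. The sample data, the NULL parent for top-level nodes, and the node_id_of and children_of helpers are made up for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table darcs_nodes (
    repo_id text, node_id integer, node_type text,
    add_rev integer, remove_rev integer,
    primary key (repo_id, node_id));
create table darcs_node_changes (
    repo_id text, node_id integer, rev integer, path text,
    parent_id integer, the_change text,
    primary key (repo_id, node_id, rev));

-- A directory with one file that is renamed in revision 2,
-- and a second file added in revision 3.
insert into darcs_nodes values
    ('somerepo', 1, 'dir',  1, null),
    ('somerepo', 2, 'file', 1, null),
    ('somerepo', 3, 'file', 3, null);
insert into darcs_node_changes values
    ('somerepo', 1, 1, 'src',         null, 'added'),
    ('somerepo', 2, 1, 'src/main.py', 1,    'added'),
    ('somerepo', 2, 2, 'src/app.py',  1,    'moved'),
    ('somerepo', 3, 3, 'src/util.py', 1,    'added');
""")

# Assumed expansion of node_rev(r): latest change of every node existing at r.
NODE_REV = """
select dnc.repo_id as repo_id, dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= :r
  and dnc.repo_id = dn.repo_id and dnc.repo_id = :repo
  and (dn.remove_rev is null or dn.remove_rev > :r)
group by dnc.repo_id, dnc.node_id
"""

# Shared join: each node's change row at the revision picked by node_rev(r).
JOIN = f"""
select dnc.node_id as node_id
from darcs_node_changes as dnc, ({NODE_REV}) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id and dnc.repo_id = :repo
"""


def node_id_of(path, r, repo="somerepo"):
    """node_id of the node found at `path` as of revision r, or None."""
    row = db.execute(JOIN + " and dnc.path = :p",
                     {"r": r, "repo": repo, "p": path}).fetchone()
    return row[0] if row else None


def children_of(nid, r, repo="somerepo"):
    """node_ids of the children of directory `nid` as of revision r."""
    rows = db.execute(JOIN + " and dnc.parent_id = :nid",
                      {"r": r, "repo": repo, "nid": nid}).fetchall()
    return sorted(row[0] for row in rows)


print(node_id_of("src/main.py", 1), node_id_of("src/app.py", 2))
print(children_of(1, 1), children_of(1, 3))
```

Because the join keeps only each node's latest change at or before r, a path that was valid in an earlier revision (src/main.py here) no longer matches once the node has been moved.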