Revision 190, 5.4 KB, checked in by lele@…, 2 years ago
Internals
The entire darcs change history is imported into the database, using the output of darcs changes --xml-output --summary --reverse.
A check for newer patches is performed every time the DarcsRepository object is created, and any new patches are immediately imported into the database.
After that, the darcs repository is used only to fetch the contents of a file: with darcs 2.x we use darcs query contents, while with darcs 1.x we have to resort to ugly tricks; in the extreme case, the output of darcs annotate is massaged by ann2ascii.py to reconstruct the contents of a file at any given point in time.
Each changeset is assigned a revision number according to its position in the output of darcs changes --xml-output --summary --reverse. The first patch gets revision number 1, the second gets revision number 2, and so on.
This assumes that the patches in a darcs repository NEVER get reordered or deleted. The condition holds as long as commands such as darcs unpull or darcs optimize are not run on the repository.
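The numbering scheme described above can be sketched as follows. This is only an illustration, not the backend's actual import code: the sample XML is hand-written to resemble what darcs changes --xml-output --reverse emits (a changelog element containing patch elements with a hash attribute and a name child), and the hashes, authors and the helper name number_patches are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `darcs changes --xml-output --reverse`
# output (oldest patch first); hashes, authors and names are invented.
SAMPLE_XML = """\
<changelog>
<patch author='alice@example.org' date='20080101120000' inverted='False' hash='20080101120000-aaaa-0001.gz'>
    <name>Initial import</name>
</patch>
<patch author='bob@example.org' date='20080102090000' inverted='False' hash='20080102090000-bbbb-0002.gz'>
    <name>Fix typo in README</name>
</patch>
</changelog>
"""


def number_patches(xml_text):
    """Return (rev, hash, name) tuples, numbering patches from 1 in document order."""
    root = ET.fromstring(xml_text)
    numbered = []
    for rev, patch in enumerate(root.findall("patch"), start=1):
        name = patch.findtext("name", default="")
        numbered.append((rev, patch.get("hash"), name))
    return numbered


changesets = number_patches(SAMPLE_XML)
for rev, patch_hash, name in changesets:
    print(rev, patch_hash, name)
```

Because the revision number is nothing more than the position in that list, any operation that reorders or removes patches would silently shift the numbers, which is why the assumption above matters.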
Cache
For performance reasons, the backend creates and maintains a few other tables, where it keeps darcs-specific information. The following tables are automatically created at upgrade time and populated by sync (see components.py).
darcs_changesets
Each row represents a darcs changeset:
create table darcs_changesets (
    repo_id text,
    rev integer,
    hash text,
    name text,
    primary key (repo_id, rev));
- repo_id: repository containing this changeset
- rev: the revision number assigned
- hash: the unique patch identifier assigned by darcs
- name: the name of the darcs patch
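As a rough sketch of how new patches could be appended to this table, the snippet below uses an in-memory SQLite database. It is not the sync code from components.py: the function import_new_patches and the sample hashes are hypothetical, and the approach relies on the assumption stated earlier that patches are never reordered or deleted, so that everything past the highest stored revision is new.

```python
import sqlite3


def import_new_patches(conn, repo_id, patches):
    """Append patches not yet stored.

    `patches` is the full, oldest-first list of (hash, name) pairs, as it
    would come from the darcs changes output. Returns the number imported.
    """
    cur = conn.execute(
        "select coalesce(max(rev), 0) from darcs_changesets where repo_id = ?",
        (repo_id,),
    )
    last_rev = cur.fetchone()[0]
    # Revision N corresponds to list index N - 1, so the slice is what is new.
    new = patches[last_rev:]
    conn.executemany(
        "insert into darcs_changesets (repo_id, rev, hash, name) "
        "values (?, ?, ?, ?)",
        [(repo_id, last_rev + i, h, n) for i, (h, n) in enumerate(new, start=1)],
    )
    conn.commit()
    return len(new)


conn = sqlite3.connect(":memory:")
conn.execute(
    """create table darcs_changesets (
        repo_id text, rev integer, hash text, name text,
        primary key (repo_id, rev))"""
)

history = [("hash-1", "Initial import"), ("hash-2", "Add README")]
first = import_new_patches(conn, "somerepo", history)

history.append(("hash-3", "Fix typo"))
second = import_new_patches(conn, "somerepo", history)

rows = conn.execute(
    "select rev, hash from darcs_changesets order by rev"
).fetchall()
print(first, second, rows)
```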
darcs_nodes
Each row represents a single node: a node is either a file or a directory which has its history stored in the repository.
Note
A node doesn't have a fixed name or content of its own but, for any given revision, its name and content are well defined.
create table darcs_nodes (
    repo_id text,
    node_id integer,
    node_type text,
    add_rev integer,
    remove_rev integer,
    primary key (repo_id, node_id));
- node_type: one of (dbutil.NODE_FILE_TYPE, dbutil.NODE_DIR_TYPE)
- add_rev: the revision that added this node
- remove_rev: the revision that removed this node (NULL if the node has not been removed)
darcs_node_changes
Each row represents a node change for a particular revision. Only one entry can exist for a node in each revision. Of course, if there are no changes to the node then no entries will be present! :)
create table darcs_node_changes (
    repo_id text,
    node_id integer,
    rev integer,
    path text,
    parent_id integer,
    the_change text,
    primary key (repo_id, node_id, rev));
- the_change: one of the following (defined in dbutil.py): CHANGE_ADDED, CHANGE_REMOVED, CHANGE_MOVED, CHANGE_EDITED, CHANGE_MOVED_EDITED
- parent_id: the node id of the node's parent directory
- path: the path of the node at the end of revision 'rev'; when the change is CHANGE_REMOVED, 'path' is the path the node had before removal
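To make the role of this table concrete, here is a small sketch that resolves the path of a node at a given revision by taking its most recent change at or before that revision. The helper path_at, the sample rows, the string values used for the change constants, and the use of NULL as parent_id for a top-level directory are all assumptions made for illustration; the real constants live in dbutil.py.

```python
import sqlite3

# Placeholder values; the real ones are defined in dbutil.py.
CHANGE_ADDED, CHANGE_MOVED, CHANGE_EDITED = "added", "moved", "edited"

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table darcs_node_changes (
        repo_id text, node_id integer, rev integer, path text,
        parent_id integer, the_change text,
        primary key (repo_id, node_id, rev))"""
)
conn.executemany(
    "insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)",
    [
        ("somerepo", 1, 1, "docs", None, CHANGE_ADDED),
        ("somerepo", 2, 1, "docs/README", 1, CHANGE_ADDED),
        ("somerepo", 2, 3, "docs/README", 1, CHANGE_EDITED),
        ("somerepo", 2, 5, "docs/README.txt", 1, CHANGE_MOVED),
    ],
)


def path_at(conn, repo_id, node_id, rev):
    """Path of the node at the end of `rev`, or None if it has no change yet.

    Revisions without a row for the node leave its path untouched, so the
    latest row with rev <= `rev` is the one that counts.
    """
    row = conn.execute(
        "select path from darcs_node_changes "
        "where repo_id = ? and node_id = ? and rev <= ? "
        "order by rev desc limit 1",
        (repo_id, node_id, rev),
    ).fetchone()
    return row[0] if row else None


print(path_at(conn, "somerepo", 2, 4))
print(path_at(conn, "somerepo", 2, 5))
```

Note that this sketch does not check whether the node has been removed by that revision; that information is in darcs_nodes.remove_rev.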
darcs_cache
A cache of file contents: the first time the content of a file at a particular revision is requested, it is computed and stored here, so subsequent requests don't need to execute darcs at all.
Warning
This table may quickly grow in size! On the other hand, you can delete all of its rows at any time: the content will be recomputed when it is requested again.
create table darcs_cache (
    repo_id text,
    node_id integer,
    rev integer,
    content blob,
    size integer,
    primary key (repo_id, node_id, rev));
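The compute-once behaviour of this table can be sketched as a small get-or-compute helper. The function cached_content and the stand-in fake_darcs callback are hypothetical; in the real backend the content would come from running darcs, which is not attempted here.

```python
import sqlite3


def cached_content(conn, repo_id, node_id, rev, compute):
    """Return the cached content, calling `compute()` only on a cache miss."""
    row = conn.execute(
        "select content from darcs_cache "
        "where repo_id = ? and node_id = ? and rev = ?",
        (repo_id, node_id, rev),
    ).fetchone()
    if row is not None:
        return bytes(row[0])
    content = compute()
    conn.execute(
        "insert into darcs_cache (repo_id, node_id, rev, content, size) "
        "values (?, ?, ?, ?, ?)",
        (repo_id, node_id, rev, content, len(content)),
    )
    conn.commit()
    return content


conn = sqlite3.connect(":memory:")
conn.execute(
    """create table darcs_cache (
        repo_id text, node_id integer, rev integer,
        content blob, size integer,
        primary key (repo_id, node_id, rev))"""
)

calls = []


def fake_darcs():
    # Stand-in for the expensive step of asking darcs for the file content.
    calls.append(1)
    return b"hello\n"


a = cached_content(conn, "somerepo", 2, 5, fake_darcs)
b = cached_content(conn, "somerepo", 2, 5, fake_darcs)  # served from the table
calls_before_purge = len(calls)

conn.execute("delete from darcs_cache")  # purging the cache is always safe
c = cached_content(conn, "somerepo", 2, 5, fake_darcs)  # recomputed
print(calls_before_purge, len(calls))
```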
Some sample queries
Get all existing nodes as of revision r
select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= r
  and dnc.repo_id = dn.repo_id
  and dnc.repo_id = 'somerepo'
  and (dn.remove_rev is null or dn.remove_rev > r)
group by dnc.node_id
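For experimentation, the query above can be run against an in-memory SQLite database. In this sketch the sample rows are invented, the node_type and the_change strings are placeholders for the dbutil.py constants, r and the repository name are passed as named parameters, and an order by clause is added only to make the output deterministic.

```python
import sqlite3

EXISTING_NODES = """
select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= :r
  and dnc.repo_id = dn.repo_id
  and dnc.repo_id = :repo
  and (dn.remove_rev is null or dn.remove_rev > :r)
group by dnc.node_id
order by dnc.node_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table darcs_nodes (
        repo_id text, node_id integer, node_type text,
        add_rev integer, remove_rev integer,
        primary key (repo_id, node_id));
    create table darcs_node_changes (
        repo_id text, node_id integer, rev integer, path text,
        parent_id integer, the_change text,
        primary key (repo_id, node_id, rev));
    """
)
# Invented history: a directory, a file removed in rev 4, a file added in rev 3.
conn.executemany(
    "insert into darcs_nodes values (?, ?, ?, ?, ?)",
    [
        ("somerepo", 1, "dir", 1, None),
        ("somerepo", 2, "file", 1, 4),
        ("somerepo", 3, "file", 3, None),
    ],
)
conn.executemany(
    "insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)",
    [
        ("somerepo", 1, 1, "docs", None, "added"),
        ("somerepo", 2, 1, "docs/a.txt", 1, "added"),
        ("somerepo", 2, 2, "docs/a.txt", 1, "edited"),
        ("somerepo", 2, 4, "docs/a.txt", 1, "removed"),
        ("somerepo", 3, 3, "docs/b.txt", 1, "added"),
    ],
)


def existing_nodes(conn, repo, r):
    """(node_id, rev of latest change) for every node alive at revision r."""
    return conn.execute(EXISTING_NODES, {"repo": repo, "r": r}).fetchall()


print(existing_nodes(conn, "somerepo", 2))
print(existing_nodes(conn, "somerepo", 4))
```

At revision 2 the sample yields nodes 1 and 2; at revision 4 node 2 has been removed and node 3 has appeared.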
Get all latest nodes
select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dn.remove_rev is null
  and dnc.repo_id = dn.repo_id
  and dnc.repo_id = 'somerepo'
group by dnc.node_id
Get node_id of /some/path p, as of revision r
select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id
  and dnc.repo_id = 'somerepo'
  and dnc.path = p
Get history of node_id nid, till revision r
select *
from darcs_node_changes as dnc
where dnc.node_id = nid
  and dnc.rev <= r
  and dnc.repo_id = 'somerepo'
Get children of node_id nid, as of revision r
select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.parent_id = nid
  and dnc.repo_id = nr.repo_id
  and dnc.repo_id = 'somerepo'
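The path and children queries can be tried out the same way. The sketch below reads node_rev(r) as shorthand for a subquery returning, for each node alive at revision r, the revision of its latest change; that reading is an assumption, and the subquery here also selects repo_id so that the join on nr.repo_id resolves. Sample rows, the node_type and the_change strings, and the helper names node_at_path and children are invented for illustration.

```python
import sqlite3

# Assumed expansion of node_rev(r): latest change per node alive at :r.
NODE_REV = """
select dnc.repo_id as repo_id, dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id and dnc.repo_id = dn.repo_id
  and dnc.rev <= :r
  and (dn.remove_rev is null or dn.remove_rev > :r)
group by dnc.repo_id, dnc.node_id
"""

NODE_AT_PATH = f"""
select dnc.node_id as node_id
from darcs_node_changes as dnc, ({NODE_REV}) as nr
where dnc.node_id = nr.node_id and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id and dnc.repo_id = :repo
  and dnc.path = :p
"""

CHILDREN = f"""
select dnc.node_id as node_id
from darcs_node_changes as dnc, ({NODE_REV}) as nr
where dnc.node_id = nr.node_id and dnc.rev = nr.rev
  and dnc.parent_id = :nid
  and dnc.repo_id = nr.repo_id and dnc.repo_id = :repo
order by dnc.node_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table darcs_nodes (
        repo_id text, node_id integer, node_type text,
        add_rev integer, remove_rev integer,
        primary key (repo_id, node_id));
    create table darcs_node_changes (
        repo_id text, node_id integer, rev integer, path text,
        parent_id integer, the_change text,
        primary key (repo_id, node_id, rev));
    """
)
conn.executemany(
    "insert into darcs_nodes values (?, ?, ?, ?, ?)",
    [
        ("somerepo", 1, "dir", 1, None),
        ("somerepo", 2, "file", 1, None),
        ("somerepo", 3, "file", 2, 4),
    ],
)
conn.executemany(
    "insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)",
    [
        ("somerepo", 1, 1, "docs", None, "added"),
        ("somerepo", 2, 1, "docs/a.txt", 1, "added"),
        ("somerepo", 2, 3, "docs/c.txt", 1, "moved"),
        ("somerepo", 3, 2, "docs/b.txt", 1, "added"),
        ("somerepo", 3, 4, "docs/b.txt", 1, "removed"),
    ],
)


def node_at_path(conn, repo, path, r):
    """node_id living at `path` as of revision r, or None."""
    row = conn.execute(NODE_AT_PATH, {"repo": repo, "p": path, "r": r}).fetchone()
    return row[0] if row else None


def children(conn, repo, nid, r):
    """node_ids whose parent directory is `nid` as of revision r."""
    rows = conn.execute(CHILDREN, {"repo": repo, "nid": nid, "r": r})
    return [row[0] for row in rows]


print(node_at_path(conn, "somerepo", "docs/a.txt", 2))
print(children(conn, "somerepo", 1, 3))
```

Joining on both node_id and rev is what restricts each node to its latest change, so a file that was moved away from a path no longer matches the old path at later revisions.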