
Revision 189, 8.9 KB checked in by lele@…, 3 years ago

Cosmetic and little fixes to the documentation

Darcs backend for Trac

This package implements a darcs backend for Trac supporting the new multirepository feature, which was merged into Trac 0.12.

It used to work with Trac 0.11, and even with 0.10, at least up to version 0.8 of the backend. There has been no time to retest those combinations, so no guarantee, sorry!

To use the module you can either install it or build an egg and copy it into the right place.

Installation

You can install the module the usual way:

$ python setup.py install [--prefix /usr/local]

Otherwise you can make an egg:

$ python setup.py bdist_egg

and either install it globally with:

$ easy_install dist/TracDarcs-someversion.egg

or manually copy the egg from the "dist" subdirectory into the environment's "plugins" subdirectory.

In general, follow the directions in TracPlugins.

Specific configuration options

Some features can be altered by using the following trac+darcs specific options in the [darcs] section of the configuration:

command : string
    The darcs executable to run. The default is darcs, but you could set it to /usr/local/bin/darcs to use a newer version.
dont_escape_8bit : boolean
    False by default; maps to the darcs $DARCS_DONT_ESCAPE_8BIT behaviour.
possible_encodings : string
    A comma-separated list of string encodings, 'utf-8,iso8859-1' by default, tried one after the other should a decode error occur while parsing darcs changesets.
max_concurrent_darcses : integer
    The maximum number of darcs processes allowed to run concurrently. The default, 0, means no limit.
eager_annotations : boolean
    False by default. When true, the content and the annotation cache of each modified file are computed immediately after a changeset is added. This moves the heavy computation to pull time rather than first-visit time. Of course, it also enlarges the Trac database.
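
The fallback behaviour described for possible_encodings can be pictured with a short sketch. This is not the backend's actual code: decode_with_fallback is a hypothetical helper that only illustrates trying each listed encoding in order.

```python
def decode_with_fallback(raw, possible_encodings="utf-8,iso8859-1"):
    """Decode bytes by trying each comma-separated encoding in turn.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of tracdarcs.
    """
    encodings = [e.strip() for e in possible_encodings.split(",") if e.strip()]
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError("none of %r could decode the input" % (encodings,))


# The same text stored in two different encodings decodes to the same string.
as_utf8 = decode_with_fallback("caffè".encode("utf-8"))
as_latin1 = decode_with_fallback("caffè".encode("iso8859-1"))
```

Since iso8859-1 maps every byte to a character, keeping it last in the list means decoding never fails outright, at the price of possibly wrong characters for input in some other encoding.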

Using a postcommit hook

The recommended way to trigger the sync between the repository and the Trac instance is a darcs posthook on apply: this way the database is updated as soon as darcs finishes applying any new changeset.

This can be accomplished by putting something like the following settings into the repository's _darcs/prefs/defaults file:

apply posthook trac-admin TRAC_ENV changeset added $(pwd) $(python -m tracdarcs.changesparser)
apply run-posthook

where of course you should replace TRAC_ENV with the full path of the related trac instance.

Note

python -m tracdarcs.changesparser is just a quick way of extracting the list of changeset hashes from the darcs changes --xml format: it accepts the input either from the $DARCS_PATCHES_XML environment variable or from standard input:

$ darcs changes --xml | python -m tracdarcs.changesparser | head -3
20100611081300-97f81-bc5c1f7acf0c168bbfa9fb911e3cc2a4e71d5eef
20100610150339-97f81-cd1b73f2ba1b1d98c28542ecbd1d5e2bd9052056
20100512164420-97f81-de3fbc73d7c401fb92503ef1b25e19e0f48d2ad1
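
As a rough idea of what such an extraction involves, here is a minimal sketch. It is not the real tracdarcs.changesparser: it assumes that each patch element of the XML carries its identifier in a hash attribute, and the sample XML is a trimmed, made-up fragment (the author address is invented, the hashes are taken from the output above).

```python
import io
import os
import sys
import xml.etree.ElementTree as ET

# Trimmed, illustrative stand-in for "darcs changes --xml" output.
SAMPLE_XML = """<changelog>
<patch author='someone@example.com' date='20100611081300'
       hash='20100611081300-97f81-bc5c1f7acf0c168bbfa9fb911e3cc2a4e71d5eef'>
  <name>a sample patch</name>
</patch>
<patch author='someone@example.com' date='20100610150339'
       hash='20100610150339-97f81-cd1b73f2ba1b1d98c28542ecbd1d5e2bd9052056'>
  <name>another sample patch</name>
</patch>
</changelog>"""


def read_changes_xml(environ=None, stdin=None):
    """Return the XML from $DARCS_PATCHES_XML if set, else from stdin."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    xml_text = environ.get("DARCS_PATCHES_XML")
    return stdin.read() if xml_text is None else xml_text


def patch_hashes(xml_text):
    """Collect the hash attribute of every patch element, in document order."""
    root = ET.fromstring(xml_text)
    return [p.get("hash") for p in root.iter("patch") if p.get("hash")]


# Simulate the "piped on standard input" case with an in-memory stream.
hashes = patch_hashes(read_changes_xml(environ={}, stdin=io.StringIO(SAMPLE_XML)))
```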

At that point, you could deactivate the per-request sync that Trac still does by default, by setting repository_sync_per_request to an empty value in the [trac] section of the configuration.

Internals

The entire darcs change history is imported into the database, using the output of darcs changes --xml-output --summary --reverse.

A check for newer patches is performed every time the DarcsRepository object is created, and any new patches are immediately imported into the database.

After that, the darcs repository is used only for fetching the contents of a file: with darcs 2.x we use darcs query contents, while with darcs 1.x we have to do ugly tricks; in the extreme case, darcs annotate output is massaged by ann2ascii.py to fetch the contents of a file at any given point in time.

Each changeset is assigned a revision number according to its position in the output of darcs changes --xml-output --summary --reverse: the first patch gets revision number 1, the second revision number 2, and so on.

This assumes that the patches in a darcs repository NEVER get reordered or deleted. This condition is satisfied as long as commands such as darcs unpull or darcs optimize are not performed.
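
The numbering rule, and the reason reordering is a problem, can be pictured with a small sketch. assign_revisions is a hypothetical helper written for this illustration, not the backend's import code, and the short strings stand in for real darcs patch hashes.

```python
def assign_revisions(hashes_oldest_first, known=None):
    """Map patch hashes to revision numbers, starting at 1.

    `known` holds the numbers handed out by earlier imports; only hashes
    not seen before get a new number. Hypothetical helper for illustration.
    """
    revisions = dict(known or {})
    next_rev = len(revisions) + 1
    for patch_hash in hashes_oldest_first:
        if patch_hash not in revisions:
            revisions[patch_hash] = next_rev
            next_rev += 1
    return revisions


# First import: two patches, numbered in the order darcs lists them.
first_sync = assign_revisions(["hash-a", "hash-b"])
# Later import: one new patch arrived, the old numbers stay untouched.
second_sync = assign_revisions(["hash-a", "hash-b", "hash-c"], known=first_sync)
# If darcs later listed the patches as b, a, c, the stored numbers would no
# longer match the repository order, which is why reordering must not happen.
```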

Cache

For performance reasons, the backend creates and maintains a few other tables, where it keeps darcs-specific information. The following tables are automatically created at upgrade time and populated by sync (see components.py).

darcs_changesets

Each row represents a darcs changeset:

create table darcs_changesets (
    repo_id text,
    rev integer,
    hash text,
    name text,
    primary key (repo_id, rev));

repo_id
    repository containing this changeset
rev
    the revision number assigned
hash
    the unique patch identifier assigned by darcs
name
    the name of the darcs patch

darcs_nodes

Each row represents a single node: a node is either a file or a directory which has its history stored in the repository.

Note

A node doesn't have a particular name or content of its own but, for a given revision, its name and content are well defined.

create table darcs_nodes (
    repo_id text,
    node_id integer,
    node_type text,
    add_rev integer,
    remove_rev integer,
    primary key (repo_id, node_id) );

node_type
    is one of (dbutil.NODE_FILE_TYPE, dbutil.NODE_DIR_TYPE)
add_rev
    is the revision that added this node
remove_rev
    is the revision that removed this node (possibly NULL)

darcs_node_changes

Each row represents a node change for a particular revision. Only one entry can exist for a node in each revision. Of course, if there are no changes to the node then no entries will be present! :)

create table darcs_node_changes (
    repo_id text,
    node_id integer,
    rev integer,
    path text,
    parent_id integer,
    the_change text,
    primary key (repo_id, node_id,rev) );

the_change
    one of the following (defined in dbutil.py): CHANGE_ADDED, CHANGE_REMOVED, CHANGE_MOVED, CHANGE_EDITED, CHANGE_MOVED_EDITED
parent_id
    the node id of the node's parent directory
path
    the path of the node at the end of revision 'rev'; when the change is CHANGE_REMOVED, 'path' is the previous path.

darcs_cache

A cache of file contents: as soon as the content of any file at any particular revision is requested for the first time, it is computed and stored here, so subsequent requests won't require executing darcs at all.

Warning

This may quickly grow in size! On the other hand, you can delete all the rows at any time: the content will be recomputed when requested again.

create table darcs_cache (
    repo_id text,
    node_id integer,
    rev integer,
    content blob,
    size integer,
    primary key (repo_id, node_id,rev) );
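
The compute-once behaviour of this table can be sketched against an in-memory SQLite database. cached_content and fake_darcs are hypothetical names used only here; the real backend goes through Trac's database layer and runs darcs to obtain the content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table darcs_cache (
    repo_id text,
    node_id integer,
    rev integer,
    content blob,
    size integer,
    primary key (repo_id, node_id, rev))""")


def cached_content(conn, repo_id, node_id, rev, compute):
    """Return the cached content, computing and storing it on a miss."""
    row = conn.execute(
        "select content from darcs_cache"
        " where repo_id = ? and node_id = ? and rev = ?",
        (repo_id, node_id, rev)).fetchone()
    if row is not None:
        return bytes(row[0])
    content = compute(node_id, rev)  # the expensive step (darcs in real life)
    conn.execute(
        "insert into darcs_cache (repo_id, node_id, rev, content, size)"
        " values (?, ?, ?, ?, ?)",
        (repo_id, node_id, rev, content, len(content)))
    conn.commit()
    return content


calls = []  # records every time the expensive step actually runs


def fake_darcs(node_id, rev):
    """Stand-in for fetching a file's content by running darcs."""
    calls.append((node_id, rev))
    return b"hello\n"


first = cached_content(conn, "somerepo", 7, 3, fake_darcs)   # computed
second = cached_content(conn, "somerepo", 7, 3, fake_darcs)  # served from cache
```

Emptying the table only costs time: the next request for each file runs the expensive step again and refills the row.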

Some sample queries

Get all existing nodes as of revision r

select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= r
  and dnc.repo_id = dn.repo_id and dnc.repo_id = 'somerepo'
  and (dn.remove_rev is null or dn.remove_rev > r)
group by dnc.node_id
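
To try the query above, here is a sketch that loads a tiny made-up history into an in-memory SQLite database. The node_type and the_change strings are placeholders, since the real values are the constants defined in dbutil.py, and an order by is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table darcs_nodes (
    repo_id text, node_id integer, node_type text,
    add_rev integer, remove_rev integer,
    primary key (repo_id, node_id));
create table darcs_node_changes (
    repo_id text, node_id integer, rev integer, path text,
    parent_id integer, the_change text,
    primary key (repo_id, node_id, rev));
""")

# Made-up history: a directory, a file removed in rev 3, a file added in rev 2.
conn.executemany("insert into darcs_nodes values (?, ?, ?, ?, ?)", [
    ("somerepo", 1, "dir", 1, None),
    ("somerepo", 2, "file", 1, 3),
    ("somerepo", 3, "file", 2, None),
])
conn.executemany("insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)", [
    ("somerepo", 1, 1, "src", None, "added"),
    ("somerepo", 2, 1, "src/a.txt", 1, "added"),
    ("somerepo", 2, 2, "src/a.txt", 1, "edited"),
    ("somerepo", 2, 3, "src/a.txt", 1, "removed"),
    ("somerepo", 3, 2, "src/b.txt", 1, "added"),
])

NODES_AS_OF = """
select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dnc.rev <= :r
  and dnc.repo_id = dn.repo_id and dnc.repo_id = :repo
  and (dn.remove_rev is null or dn.remove_rev > :r)
group by dnc.node_id
order by dnc.node_id
"""


def nodes_as_of(conn, repo_id, r):
    """Return (node_id, latest change rev) pairs for nodes existing at r."""
    return conn.execute(NODES_AS_OF, {"repo": repo_id, "r": r}).fetchall()


for r in (1, 2, 3):
    print(r, nodes_as_of(conn, "somerepo", r))
```

With this data, node 2 disappears from the result at revision 3 because its remove_rev is no longer greater than r.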

Get all latest nodes

select dnc.node_id as node_id, max(dnc.rev) as rev
from darcs_node_changes as dnc, darcs_nodes as dn
where dnc.node_id = dn.node_id
  and dn.remove_rev is null
  and dnc.repo_id = dn.repo_id and dnc.repo_id = 'somerepo'
group by dnc.node_id

Get node_id of /some/path p, as of revision r

select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id and dnc.repo_id = 'somerepo'
  and dnc.path = p
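
In the query above, node_rev(r) is shorthand rather than valid SQL. The sketch below assumes it stands for the "existing nodes as of revision r" query, extended to also return repo_id so that the join on nr.repo_id works, and inlines it as a subquery. The data is made up (one file renamed in revision 2) and the the_change strings are placeholders for the dbutil.py constants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table darcs_nodes (
    repo_id text, node_id integer, node_type text,
    add_rev integer, remove_rev integer,
    primary key (repo_id, node_id));
create table darcs_node_changes (
    repo_id text, node_id integer, rev integer, path text,
    parent_id integer, the_change text,
    primary key (repo_id, node_id, rev));
""")

# One file, added as old.txt in rev 1 and renamed to new.txt in rev 2.
conn.execute("insert into darcs_nodes values ('somerepo', 1, 'file', 1, null)")
conn.executemany("insert into darcs_node_changes values (?, ?, ?, ?, ?, ?)", [
    ("somerepo", 1, 1, "old.txt", None, "added"),
    ("somerepo", 1, 2, "new.txt", None, "moved"),
])

NODE_ID_AT = """
select dnc.node_id as node_id
from darcs_node_changes as dnc,
     (select c.repo_id as repo_id, c.node_id as node_id, max(c.rev) as rev
      from darcs_node_changes as c, darcs_nodes as dn
      where c.node_id = dn.node_id
        and c.rev <= :r
        and c.repo_id = dn.repo_id and c.repo_id = :repo
        and (dn.remove_rev is null or dn.remove_rev > :r)
      group by c.repo_id, c.node_id) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.repo_id = nr.repo_id and dnc.repo_id = :repo
  and dnc.path = :p
"""


def node_id_at(conn, repo_id, path, r):
    """Return the node_id known as `path` at revision r, or None."""
    row = conn.execute(NODE_ID_AT, {"repo": repo_id, "p": path, "r": r}).fetchone()
    return None if row is None else row[0]


print(node_id_at(conn, "somerepo", "old.txt", 1))
print(node_id_at(conn, "somerepo", "old.txt", 2))
print(node_id_at(conn, "somerepo", "new.txt", 2))
```

Joining on the latest change at or before r is what keeps a stale path such as old.txt from matching once the node has been renamed.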

Get history of node_id nid, till revision r

select * from darcs_node_changes as dnc
where dnc.node_id = nid and dnc.rev <= r
  and dnc.repo_id = 'somerepo'

Get children of node_id nid, as of revision r

select dnc.node_id as node_id
from darcs_node_changes as dnc, (node_rev(r)) as nr
where dnc.node_id = nr.node_id
  and dnc.rev = nr.rev
  and dnc.parent_id = nid
  and dnc.repo_id = nr.repo_id and dnc.repo_id = 'somerepo'