Ticket #171 (new defect)

Opened 21 months ago

Last modified 21 months ago

converting darcs' darcs to git results in a corrupted repo

Reported by: vmiklos
Owned by: lele
Priority: major
Milestone: VersionOne
Component: tailor
Version: 0.9
Keywords:
Cc:

Description (last modified by lele)

Hi,

Here is the config I used:

$ cat config

[DEFAULT]
encoding-errors-policy = replace

[sandbox]
source = darcs:sandbox
target = git:sandbox

[darcs:sandbox]
subdir = darcs
repository = /path/to/sandbox

[git:sandbox]
subdir = git
repository = /path/to/sandbox.git

Where sandbox is http://code.haskell.org/darcs/big-zoo/darcs-repo-2008-10-31.tar.bz2

I started the conversion in non-verbose mode. It converted 6422 of 6548 changesets and exited without any error. As I suspected, the result does not match the original repo.

Given that the repo is public, I hope you can reproduce the error.

Sadly, I'm not sure where the error occurs, and each attempt is slow: the conversion took 373 minutes on my machine.

$ darcs --version
2.1.2 (+ 266 patches)

$ git --version
git version 1.6.0.1

I am using darcs from the darcs repo (and not the latest release) as I had other problems and the suggested fix on the mailing list was to use the version from the repo.

If I missed any important info, please let me know.

Thanks.

Change History

Changed 21 months ago by lele

  • description modified

I tried this out, and I clearly see something went wrong, although I can't say in which way...

First of all, it took more than 20 hours here to complete the migration, on a dual-core AMD64 with 2 GB of RAM...

Tailor completed its task without errors, effectively producing 6422 git changesets out of 6548 darcs changesets. I do not have time right now to investigate further, but I bet that many of the "missing" changes are related to darcs operations that have no impact on the source tree, such as setprefs for example.

What makes me sad is seeing the huge difference in the resulting trees: looking at the root directory alone, the darcs side contains just 20 entries while there are 45 on the git side. Quickly inspecting one entry, "/darcs-createrepo.lhs", I see that

$ darcs cha --count --match 'touch darcs-createrepo.lhs'
27

while

$ git log darcs-createrepo.lhs |grep '^commit ' | wc -l
23

and apparently the last patch that touched it, effectively removing the file, did not have the right effect on the git side:

2008-11-30 11:20:42     INFO: Upstream revision "resolve conflicts" by Tommy Pettersson <ptp@lysator.liu.se>, 2006-02-19 21:32:18+00:00
2008-11-30 11:20:42     INFO: /tmp/t171/darcs $ darcs pull --all --quiet --match "hash 20060219213218-145ad-ce338c2967cf6eef00b0907cfa0f35d52005e547.gz" 2>&1
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/darcs $ darcs changes --match "hash 20060219213218-145ad-ce338c2967cf6eef00b0907cfa0f35d52005e547.gz" --xml-output --summ
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: $ rsync --archive --exclude _darcs --exclude .git /tmp/t171/darcs/ /tmp/t171/git
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git update-index Add.lhs Apply.lhs Depends.lhs PatchMatch.lhs RepoFormat.lhs best_practices.tex
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git update-index --remove darcs-createrepo.lhs
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git update-index darcs.lhs
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git status
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git add -u
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git write-tree
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git rev-parse HEAD 2>&1
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git commit-tree 27bda32db3e19eaa4c9fe05a06b63c8861bc8505 -p c84c434ef261a767924e9eccf39d31e3b1842bc0
2008-11-30 11:20:57     INFO: [Ok]
2008-11-30 11:20:57     INFO: /tmp/t171/git $ git update-ref HEAD 6f324cea4291855525d71cd978fb186454177d6a c84c434ef261a767924e9eccf39d31e3b1842bc0
2008-11-30 11:20:57     INFO: [Ok]
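The tree comparison described above (20 root entries on the darcs side against 45 on the git side) could be automated with something like the following. This is only a rough sketch, not part of tailor: the function names are made up for illustration, it compares file names only (not contents), and it skips the same `_darcs` and `.git` directories that the rsync call in the log excludes.

```python
import os

# VCS metadata directories to skip, matching the rsync --exclude options in the log.
IGNORED = {"_darcs", ".git"}


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(darcs_dir, git_dir):
    """Return (files only on the darcs side, files only on the git side)."""
    darcs_files = list_files(darcs_dir)
    git_files = list_files(git_dir)
    return sorted(darcs_files - git_files), sorted(git_files - darcs_files)


if __name__ == "__main__":
    # Working directories as they appear in the log above.
    only_darcs, only_git = compare_trees("/tmp/t171/darcs", "/tmp/t171/git")
    print("only in darcs:", only_darcs)
    print("only in git:", only_git)
```

Files that show up only on the git side (darcs-createrepo.lhs would be one of them here) are candidates for removals that were not carried over.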

Changed 21 months ago by vmiklos

OK, here is a way I think I can quickly reproduce a similar problem:

git clone git://vmiklos.hu/darcs-fast-export

cd darcs-fast-export/t

sh test2-git.sh

This will create a darcs2 repo under 'test2'.

When I convert this to git with the same config as above, the result differs as well.

Hope this helps. :-)

Changed 21 months ago by vmiklos

OK, I have good news and bad news.

The good news is that I figured out what the problem is, using a small test case.

The bad news is that I really have no idea how to solve it.

Create the following repo:

dr init
echo a > a
dr add a
dr rec -a -m i
dr mv a b
rm b
dr rec
echo a > a
echo b > b
dr add a b
dr rec -a -m "add a b"
rm b
dr rec
dr mv a b
dr amend-rec

If you now run dr chan --xml -s, you get the same output for two totally different cases:

1) rename A B, and remove B

2) remove B, rename A B

There is no way to tell which of the two was intended. In other words, as long as darcs puts those "move" lines at the top of the XML output and tailor relies only on the XML output for its information, tailor can't properly convert this repo.
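To make the ambiguity concrete, here is a small Python model of the two patches from the script above. It is a sketch under stated assumptions, not tailor's or darcs' actual code: `apply_ops` and `moves_first` are hypothetical helpers, and `moves_first` only imitates the "moves listed first" ordering described here. Following the script, the second patch removes b and then moves a onto the freed name.

```python
def apply_ops(tree, ops):
    """Apply operations in order to a {path: content} mapping; return the new mapping."""
    result = dict(tree)
    for op in ops:
        if op[0] == "mv":
            _, src, dst = op
            result[dst] = result.pop(src)
        elif op[0] == "rm":
            del result[op[1]]
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return result


def moves_first(ops):
    """Hypothetical model of the summary: moves are listed before everything else."""
    moves = [op for op in ops if op[0] == "mv"]
    others = [op for op in ops if op[0] != "mv"]
    return moves + others


# First patch: only "a" exists; it is moved to "b", then "b" is removed.
CASE1_TREE = {"a": "a\n"}
CASE1_OPS = [("mv", "a", "b"), ("rm", "b")]

# Second patch: "a" and "b" exist; "b" is removed, then "a" is moved to "b".
CASE2_TREE = {"a": "a\n", "b": "b\n"}
CASE2_OPS = [("rm", "b"), ("mv", "a", "b")]

if __name__ == "__main__":
    # Both patches collapse to the same summary.
    print(moves_first(CASE1_OPS) == moves_first(CASE2_OPS))  # True
    # Real effect of the second patch: "b" survives with a's content.
    print(apply_ops(CASE2_TREE, CASE2_OPS))                  # {'b': 'a\n'}
    # Replaying the summary order instead loses the file.
    print(apply_ops(CASE2_TREE, moves_first(CASE2_OPS)))     # {}
```

Replaying the summary order is right for the first patch and wrong for the second, and nothing in the summary itself says which one applies.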

Feel free to prove me wrong. :-S

Thanks.

Changed 21 months ago by vmiklos

Before I forget, let me add that this in fact seems to be a darcs bug, so one approach would be to fix it in darcs; then no workaround would be needed in tailor.

The relevant darcs bug is http://bugs.darcs.net/issue1281.
