Ticket #28 (closed defect: worksforme)
Encoding problems when converting svn -> darcs (or bzr).
| Reported by: | Luca <luca@…> | Owned by: | lele |
|---|---|---|---|
| Priority: | critical | Milestone: | VersionOne |
| Component: | svn | Version: | 0.9 |
| Keywords: | svn svndump non ascii error í | Cc: | |
Description
I'm having some problems with the encoding of the character í ('i' with acute accent, &iacute; as an HTML entity). When converting an svn repository to darcs I get this message:
    00:37:07 [I] Changeset "43"
    00:37:07 [I] Log message: - Nuevo nivel de logging CRITICAL (L_CRI) para concordar con python.
    - Mínimo cambio en el formato de logging.
    - Cambio de sección de configuración de DB_DataObject a DBO.
    00:37:07 [I] 110 pending changesets in state file
    00:37:07 [C] Upstream change application failed
    Configuration error: 'ascii' codec can't encode character u'\xed' in position 216: ordinal not in range(128): it seems that current encoding "UTF-8" cannot properly represent at least one of the characters in the upstream changelog. You need to use a wider character set, using "encoding" option.
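The failing step in the message above can be reproduced in isolation. The following is only a minimal sketch in modern Python (the ticket dates from the Python 2 era, so the exact wording of the error differs slightly); `try_encode` is a hypothetical helper written for this illustration and is not part of Tailor.

```python
# Minimal sketch: encoding a log message containing U+00ED with the
# 'ascii' codec fails, while the same text encodes fine as UTF-8.
# try_encode is a hypothetical helper, not a Tailor function.

def try_encode(text, codec):
    """Return (encoded_bytes, None) on success or (None, error) on failure."""
    try:
        return text.encode(codec), None
    except UnicodeEncodeError as exc:
        return None, exc


message = "- M\u00ednimo cambio en el formato de logging."

data, error = try_encode(message, "ascii")
print("ascii:", data, "|", error)

data, error = try_encode(message, "utf-8")
print("utf-8:", data, "|", error)
```

The error reports the offending character and its position, which is the same kind of information as "character u'\xed' in position 216" in the log.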
My locale is UTF-8, but I even used the encoding option with no results. The weird thing is that other non-ASCII characters seem to work fine (á, é, ó, ú). When I use the svndump as the source, I get no errors, but 'í' characters are not encoded properly:
    Fri Feb 4 12:19:47 ART 2005 luca
      * - Nuevo nivel de logging CRITICAL (L_CRI) para concordar con python.
      - MÃ\adnimo cambio en el formato de logging.
      - Cambio de sección de configuración de DB_DataObject a DBO.
As you can see, ó in configuración is just fine, but í in Mínimo is encoded as MÃ\adnimo, which is wrong.
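One possible reading of the MÃ\adnimo output, not confirmed anywhere in this ticket: the UTF-8 encoding of í is the two bytes 0xC3 0xAD, and 0xAD read as a single Latin-1 character is the soft hyphen, a non-printing format character that a tool might escape as \ad. The sketch below only checks that byte-level fact for the five accented vowels; `utf8_as_latin1` is a hypothetical helper name.

```python
# Sketch: show the UTF-8 bytes of each accented vowel and how the second
# byte is classified when misread as a Latin-1 character.
# Assumption: the "\ad" in the darcs output is an escaped 0xAD byte.
import unicodedata


def utf8_as_latin1(ch):
    """Return the UTF-8 bytes of ch and those bytes misread as Latin-1."""
    raw = ch.encode("utf-8")
    return raw, raw.decode("latin-1")


for ch in "áéíóú":
    raw, misread = utf8_as_latin1(ch)
    second = misread[1]
    print(
        ch,
        raw.hex(),
        unicodedata.category(second),
        unicodedata.name(second, "?"),
    )
```

Among these five characters, only í produces a second byte in Unicode category Cf (format), which would be consistent with it being the only one that shows up damaged.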
It's easy to reproduce the problem:
    cd /tmp
    svnadmin create testrepo
    svn co file:///tmp/testrepo testwc
    touch testwc/test
    svn add testwc/test
    svn ci -m 'í' testwc
Now you can tailor this repository to convert it to darcs with svn as the source repository and you'll get the error, or 'svnadmin dump' it and use svndump as the source to get the wrong encoding.
Versions:
- Subversion: 1.2.3 (r15833)
- Darcs: 1.0.4
- Tailor: 0.9.19
Change History
comment:2 Changed 7 years ago by blindglobe@…
- Summary changed from Encoding problems when converting svn -> darcs to Encoding problems when converting svn -> darcs (or bzr).
I'm getting the same problem, but with ae (a-umlaut), from a Swiss keyboard setting. I tried setting LANG to a UTF-8 locale, but it still gives an error similar to the one above (it thinks I'm trying to use an ASCII codec). So I think it's the same problem. Is there any way to force python/tailor to use a UTF-8 codec?
Tailor 0.9.19 Debian unstable (upgraded today)
repository: https://svn.r-project.org/ESS/trunk/ revision: 1641 is the bad one...
(I'm converting from SVN to BZR).
comment:3 Changed 7 years ago by lele
Uhm, so maybe it's still that old feature of Subversion of not properly escaping its own XML output, which led to the filter-badchars option (see the first option).
Could you try enabling that option and report back the result?
comment:5 Changed 7 years ago by lele
- Status changed from new to closed
- Resolution set to worksforme
I'm closing this, as the following config file works for me
    [DEFAULT]
    encoding=utf-8

    [project]
    source = svn:source
    target = darcs:target
    root-directory = /tmp/test#28
    start-revision = 1640

    [svn:source]
    repository=https://svn.r-project.org/ESS
    module=/trunk

    [darcs:target]

Uhm, I did a quick test, and everything is working smoothly here. With this configuration
I obtained a darcs repository where the following happens:
My environment says:
So, it must be something in your setup: I'll do whatever is needed to make it easier to spot this kind of problem, which is the most annoying misfeature of the millennium. In particular I'd like to understand what causes the selection of the ascii codec instead of the utf8 one...
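When comparing setups, it may help to print what Python itself derives from the environment. The following is a rough diagnostic sketch in modern Python, not something Tailor ships; `describe_encoding_environment` and `can_encode` are hypothetical names, and the set of environment variables inspected is an assumption about what is relevant here.

```python
# Diagnostic sketch: report the locale-related environment variables and
# the encodings Python derives from them, then check whether the
# character from this ticket can be encoded with the preferred encoding.
import locale
import os
import sys


def describe_encoding_environment():
    """Collect locale variables and the encodings Python reports."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


def can_encode(text, codec):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(codec)
        return True
    except (UnicodeEncodeError, LookupError):
        return False


info = describe_encoding_environment()
for key, value in info.items():
    print(f"{key:10} = {value!r}")
print("can encode U+00ED:", can_encode("\u00ed", info["preferred"]))
```

Running this on both the working and the failing machine, under the same shell that launches the conversion, should show whether the two differ in the encoding Python selects.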