Personal Video Database

English => Support => Topic started by: svenne on June 07, 2010, 12:36:48 am

Title: Bug: Export templates and UTF8BOM
Post by: svenne on June 07, 2010, 12:36:48 am
This is about export templates and the charset encoding UTF8BOM, PVD v0.9.9.21:
It seems to me that something goes wrong when exporting with:
encoding="UTF8BOM"
Every template using UTF8BOM generates three characters (the byte order mark) at the beginning of the exported file. These should be invisible, but none of the programs I tested interprets them; they are displayed instead. Special characters within the text are broken too: for example, "ü" becomes "Ã¼". Notepad (Unicode-capable, WinXP) reports the files as ANSI-encoded instead of UTF-8. Other files (not generated with PVD) saved as UTF-8 with and without a byte order mark are handled correctly by Notepad.
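The broken special characters are what you get when UTF-8 bytes are read as ANSI (Windows-1252). In UTF-8, "ü" is stored as the two bytes C3 BC, and an ANSI viewer shows each byte as a separate character. A minimal Python sketch of that misreading (an illustration only, not PVD code; the variable names are made up):

```python
# Illustration: the UTF-8 bytes for "ü" misread as Windows-1252 ("ANSI").
text = "ü"
utf8_bytes = text.encode("utf-8")        # two bytes: C3 BC
misread = utf8_bytes.decode("cp1252")    # each byte shown as its own ANSI character

print(utf8_bytes.hex(" ").upper(), "->", misread)
```

This is the same thing Notepad does once it has decided the file is ANSI rather than UTF-8.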
Title: Re: Bug: Export templates and UTF8BOM
Post by: buah on June 07, 2010, 01:11:18 am
Another unicode issue?
Title: Re: Bug: Export templates and UTF8BOM
Post by: rick.ca on June 07, 2010, 01:39:27 am
Quote
Every template using UTF8BOM generates three characters (the byte order mark) at the beginning of the exported file.

So that's why I was getting "?»?" at the beginning of my files. I'm glad I'm not alone. :-\
Title: Re: Bug: Export templates and UTF8BOM
Post by: nostra on July 05, 2010, 10:41:10 pm
Strange, but all the apps I tried really did interpret the BOM.
You are using Windows 7 as well, Rick; where do you see these characters?
Title: Re: Bug: Export templates and UTF8BOM
Post by: buah on July 05, 2010, 11:18:50 pm
Quote
Rick, where do you see these characters?

I believe in HTML files opened with Notepad or some other "pad" editor. I can confirm seeing those characters, unrelated to PVD, with earlier Unicode issues in HTML files exported from "WhereIsIt?".
Title: Re: Bug: Export templates and UTF8BOM
Post by: nostra on July 05, 2010, 11:33:28 pm
Unfortunately I can't reproduce the problem with either Notepad or Notepad++.
Title: Re: Bug: Export templates and UTF8BOM
Post by: rick.ca on July 06, 2010, 12:04:19 am
Quote
You are using Windows 7 as well, Rick, where do you see these characters?

Windows 7. If, for example, I export using the "plain list" template and open the result in Notepad++, the first three characters displayed are ?»?.
Title: Re: Bug: Export templates and UTF8BOM
Post by: nostra on July 06, 2010, 12:16:17 am
Hmm, I don't know what I am doing wrong, but the problem doesn't show up for me :(
Title: Re: Bug: Export templates and UTF8BOM
Post by: Ivek23 on July 06, 2010, 05:30:01 am
Quote
Every template using UTF8BOM generates three characters (the byte order mark) at the beginning of the exported file.

So that's why I was getting "?»?" at the beginning of my files. I'm glad I'm not alone. :-\
Same here, using XP Pro SP3.
Title: Re: Bug: Export templates and UTF8BOM
Post by: svenne on July 07, 2010, 01:03:24 pm
Quote
Unfortunately I can't reproduce the problem

Just a guess: perhaps something is different with your database that has an effect on this issue? Did you try it with a newly generated database, too?
Title: Re: Bug: Export templates and UTF8BOM
Post by: nostra on July 07, 2010, 03:38:35 pm
Quote
Unfortunately I can't reproduce the problem
Just a guess: perhaps something is different with your database that has an effect on this issue? Did you try it with a newly generated database, too?

I have tried it with a new database, with the same result. I think I must try it on a Win XP machine...
Title: Re: Bug: Export templates and UTF8BOM
Post by: mgpw4me@yahoo.com on July 07, 2010, 05:14:42 pm
Vista also shows the header, regardless of the app (e.g. Notepad++).
Title: Re: Bug: Export templates and UTF8BOM
Post by: svenne on July 08, 2010, 12:18:36 am
OK, to narrow it down... the problem seems to be this: all files exported by PVD with UTF8BOM (at least on my computer with WinXP) start with the following three bytes:
3F BB 3F
Displayed as ISO 8859-1 (or ANSI) this is: ?»?

But the correct BOM for UTF8 should be:
EF BB BF
Displayed as ISO 8859-1 (or ANSI) this is: ï»¿
This one works fine and is recognized as UTF8.
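For anyone who wants to check their own exports, here is a minimal Python sketch (a hypothetical helper, not part of PVD) that looks at the first three bytes of a file and shows both headers as ISO 8859-1 text. It also shows why the broken header fails: a real BOM is consumed by Python's "utf-8-sig" codec, while 3F BB 3F is not even valid UTF-8 (0xBB cannot follow a plain ASCII byte), so a reader that sniffs the encoding has little choice but to fall back to ANSI.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"    # correct UTF-8 byte order mark (U+FEFF encoded as UTF-8)
BROKEN_BOM = b"\x3f\xbb\x3f"  # the three bytes reported for PVD v0.9.9.21 exports

def classify_header(data: bytes) -> str:
    """Label the start of some data by which of the two headers it begins with."""
    if data.startswith(UTF8_BOM):
        return "UTF-8 BOM"
    if data.startswith(BROKEN_BOM):
        return "broken BOM (3F BB 3F)"
    return "no BOM"

def classify_file(path: Path) -> str:
    """Read only the first three bytes of a file and classify them."""
    with path.open("rb") as f:
        return classify_header(f.read(3))

if __name__ == "__main__":
    for header in (UTF8_BOM, BROKEN_BOM):
        # ISO 8859-1 maps every byte to exactly one character, so this is the "raw" view.
        print(header.hex(" ").upper(), "|", header.decode("latin-1"), "|", classify_header(header))

    # A real BOM is stripped by the utf-8-sig codec, leaving just the text.
    print(repr((UTF8_BOM + "ü".encode("utf-8")).decode("utf-8-sig")))

    # The broken header makes the whole file invalid as UTF-8.
    try:
        (BROKEN_BOM + "ü".encode("utf-8")).decode("utf-8")
    except UnicodeDecodeError as exc:
        print("not valid UTF-8:", exc.reason)
```

Usage: call `classify_file(Path("file as exported by PVD.txt"))` on an export; the file name is just the one from the attachment described here.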

I attached a zip containing a text file exported with PVD (template: "Plain list.ptm") using UTF8BOM (called "file as exported by PVD.txt") and a second file where I (manually) changed the first three bytes to EF BB BF ("same file with BOM corrected manually.txt")...
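The manual correction can be scripted. The following is a minimal Python sketch under the assumption that the only damage is the leading 3F BB 3F (the function names are hypothetical). It replaces those three bytes with EF BB BF and leaves every other file untouched.

```python
import sys
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"    # correct UTF-8 byte order mark
BROKEN_BOM = b"\x3f\xbb\x3f"  # header written by the affected PVD exports

def fix_bom(data: bytes) -> bytes:
    """Replace a leading 3F BB 3F with EF BB BF; return anything else unchanged."""
    if data.startswith(BROKEN_BOM):
        return UTF8_BOM + data[len(BROKEN_BOM):]
    return data

def fix_file(path: Path) -> bool:
    """Rewrite the file in place if its header is broken; return True if it changed."""
    data = path.read_bytes()
    fixed = fix_bom(data)
    if fixed != data:
        path.write_bytes(fixed)
        return True
    return False

if __name__ == "__main__":
    # Usage: python fix_bom.py export1.txt export2.html ...
    for name in sys.argv[1:]:
        changed = fix_file(Path(name))
        print(f"{name}: {'fixed' if changed else 'unchanged'}")
```

Because 3F BB 3F can never start valid UTF-8 text, this won't touch a correctly encoded UTF-8 file. It would, however, alter a genuine ANSI file that happens to begin with "?»?", so it is best run only on PVD exports.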

Plain list.ptm:
Code:
%OPTIONS%
filter="Text Files|*.txt"
encoding="UTF8BOM"
%OPTIONS%{%value=203}. {%value=title} / {%value=origtitle} ({%value=year}) [{%value=genre}]

Quite often, programs change characters to "?" when the exported character does not fit the target encoding. That might be a hint, but I'm not sure it makes sense, because ï and ¿ are both present in the ANSI character table.
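The "?" hint does fit if "ANSI" means a code page other than Windows-1252. For example, Windows-1251 (Cyrillic) has » at 0xBB but contains neither ï nor ¿. If the three BOM characters pass through a lossy conversion to such a code page, the result is exactly 3F BB 3F. The thread does not say what actually caused the bug, so treat this as a hypothesis only. A minimal Python sketch (`encode_lossy` is a made-up helper):

```python
# The UTF-8 BOM bytes viewed as Windows-1252 text: the three characters ï » ¿.
bom_as_ansi_text = b"\xef\xbb\xbf".decode("cp1252")

def encode_lossy(text: str, codepage: str) -> bytes:
    """Encode text, substituting '?' for characters the code page cannot represent."""
    return text.encode(codepage, errors="replace")

for cp in ("cp1252", "cp1251"):
    print(cp, encode_lossy(bom_as_ansi_text, cp).hex(" ").upper())
```

With Windows-1252 the round trip gives back EF BB BF. With Windows-1251, ï and ¿ become "?" while » survives as 0xBB.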


[attachment deleted by admin]
Title: Re: Bug: Export templates and UTF8BOM
Post by: daddydave on August 11, 2010, 01:16:17 am
I have this problem as well and was able to reproduce svenne's troubleshooting. Is the fix on the to-do list for Version 1?
Title: Re: Bug: Export templates and UTF8BOM
Post by: nostra on August 11, 2010, 06:37:16 pm
A fix will be available soon.