Bug 157929 - Lost PUA chars with xlsx/ods formats
Summary: Lost PUA chars with xlsx/ods formats
Status: NEEDINFO
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc
Version: 7.6.2.1 release (earliest affected)
Hardware: x86-64 (AMD64) Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-10-26 08:57 UTC by yukiguo
Modified: 2024-04-30 03:14 UTC
CC List: 1 user

See Also:
Crash report or crash signature:


Attachments
PUAtest.xls (137.00 KB, application/octet-stream)
2023-10-26 09:01 UTC, yukiguo
PUAtest2.png (166.14 KB, image/png)
2023-10-26 09:05 UTC, yukiguo

Description yukiguo 2023-10-26 08:57:04 UTC
Description:
When Calc opens a file and saves it in xlsx or ods format, PUA characters are lost.

Unicode has three Private Use Area (PUA) blocks:

    U+00E000-U+00F8FF Private Use Area
    U+0F0000-U+0FFFFF Supplementary Private Use Area-A
    U+100000-U+10FFFF Supplementary Private Use Area-B
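
To make the three ranges above concrete, here is a minimal Python sketch that reports which block, if any, a character falls in. The names PUA_RANGES, pua_block and find_pua are hypothetical helpers for illustration only; they are not part of LibreOffice.

```python
# Block ranges exactly as listed in the report (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF0000, 0xFFFFF, "Supplementary Private Use Area-A"),
    (0x100000, 0x10FFFF, "Supplementary Private Use Area-B"),
]


def pua_block(ch):
    """Return the name of the PUA block containing ch, or None."""
    cp = ord(ch)
    for low, high, name in PUA_RANGES:
        if low <= cp <= high:
            return name
    return None


def find_pua(text):
    """List (index, code point label, block name) for each PUA character in text."""
    hits = []
    for index, ch in enumerate(text):
        block = pua_block(ch)
        if block is not None:
            hits.append((index, f"U+{ord(ch):04X}", block))
    return hits
```

Running find_pua on the text of a cell before and after conversion would show whether any private-use characters went missing.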

However, once the original file has been saved as ODS and reopened, the PUA characters are gone.

I lost a lot of data because of this issue.

The same problem occurs with XLSX files.

Environment: Windows 10 / LibreOffice 7.6.2.1 x64

Steps to Reproduce:
1. Open the xls/xlsx file containing PUA characters.
2. Save as ods/xlsx format file.
3. Close all open files and Calc windows.
4. Open the newly saved ods/xlsx file.
5. All PUA characters are deleted and cannot be recovered.
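
The outcome of step 5 can also be checked outside Calc. The following is a rough sketch, assuming only that an .ods file is a ZIP package whose cell text lives in content.xml; count_pua_in_ods and is_pua are hypothetical helper names, not LibreOffice functions.

```python
import zipfile
import xml.etree.ElementTree as ET


def is_pua(ch):
    """True if ch lies in one of the three PUA blocks listed in the description."""
    cp = ord(ch)
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFF
        or 0x100000 <= cp <= 0x10FFFF
    )


def count_pua_in_ods(ods):
    """Count PUA characters in the text content of an .ods file.

    ods may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(ods) as package:
        xml_bytes = package.read("content.xml")
    # Parsing (rather than scanning raw bytes) also resolves numeric
    # character references such as &#xE000;.
    root = ET.fromstring(xml_bytes)
    text = "".join(root.itertext())
    return sum(1 for ch in text if is_pua(ch))
```

A count of zero for the file saved in step 2 would suggest the characters were already dropped on save rather than on reopening.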

Actual Results:
PUA characters are deleted.

Expected Results:
The original characters should be kept.


Reproducible: Always


User Profile Reset: Yes

Additional Info:
test file:
https://ask.libreoffice.org/t/lost-pua-chars-with-xlsx-ods-formats/97477
Comment 1 yukiguo 2023-10-26 09:01:36 UTC
Created attachment 190434
PUAtest.xls

PUAtest.xls is a file containing PUA characters.
Comment 2 yukiguo 2023-10-26 09:05:46 UTC
Created attachment 190435
PUAtest2.png

PUAtest2.png shows the result of converting the xls file to ods format.
Comment 3 ajlittoz 2023-10-26 11:51:42 UTC
Before concluding that something is wrong in Calc, it is necessary to check what is in the test file.

From experiment (using Alt-X), the problematic cells contain a pair of Unicode code points taken from the Surrogate block, but the order within each pair is incorrect: the first code point is a low surrogate instead of a high one.

When members of the pair are switched, the surrogate pair is recognised as such and Calc displays an X-crossed rectangle (missing glyph in font) in my 7.5.7.1 under Fedora 38, KDE Plasma desktop.

OP claims the characters are taken from a PUA block but decoding the surrogate pairs (at least in A9 and A10) shows they are somewhere in Plane 2.

I'd first suspect an incorrect designation for the intended characters.

More information is needed about the intended characters.
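
For reference, the surrogate check described in this comment can be sketched in Python as below. The function names are hypothetical, and the sample values in the usage note are illustrative only; they are not read from the attached file.

```python
def is_high_surrogate(unit):
    """High (leading) surrogates occupy U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    """Low (trailing) surrogates occupy U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def combine(high, low):
    """Standard UTF-16 decoding of a high/low surrogate pair to a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def inspect_pair(first, second):
    """Classify two UTF-16 code units.

    Returns (status, code_point, plane) where status is "ok" for a
    well-formed pair, "swapped" for low-then-high order, and
    "not a pair" otherwise (code_point and plane are then None).
    """
    if is_high_surrogate(first) and is_low_surrogate(second):
        cp = combine(first, second)
        return ("ok", cp, cp >> 16)
    if is_low_surrogate(first) and is_high_surrogate(second):
        # Decode as if the two units were exchanged.
        cp = combine(second, first)
        return ("swapped", cp, cp >> 16)
    return ("not a pair", None, None)
```

For example, inspect_pair(0xDC00, 0xD840) reports a swapped pair that decodes to U+20000 in Plane 2, whereas a supplementary private-use character would land in Plane 15 or 16.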
Comment 4 Buovjaga 2023-11-01 18:04:03 UTC
(In reply to ajlittoz from comment #3)

NEEDINFO while we wait for the reporter to respond.
Comment 5 QA Administrators 2024-04-30 03:14:21 UTC
Dear yukiguo,

This bug has been in NEEDINFO status with no change for at least
6 months. Please provide the requested information as soon as
possible and mark the bug as UNCONFIRMED. Due to regular bug
tracker maintenance, if the bug is still in NEEDINFO status with
no change in 30 days the QA team will close the bug as INSUFFICIENTDATA
due to lack of needed information.

For more information about our NEEDINFO policy please read the
wiki located here:
https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Status/NEEDINFO

If you have already provided the requested information, please
mark the bug as UNCONFIRMED so that the QA team knows that the
bug is ready to be confirmed.
 
Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-NeedInfo-Ping