Maniphest T71273

Unicode Block Not Working on Windows
Closed, Resolved

Assigned To
Germano Cavalcante (mano-wii)
Authored By
Mehmet Oguz Derin (mehmetoguzderin)
Nov 1 2019, 7:44 PM
Tags
  • BF Blender
  • Modeling
Subscribers
Bastien Montagne (mont29)
Dalai Felinto (dfelinto)
Germano Cavalcante (mano-wii)
Mehmet Oguz Derin (mehmetoguzderin)
Milan Jaros (jar091)
Ray Molenkamp (LazyDodo)

Description

System Information
Operating system: Windows 10

Blender Version
Broken: 2.80 Release
Worked: Never

Short description of error
Text objects do not display characters from the Old Turkic Unicode block (U+10C00 to U+10C4F) when using the Segoe UI Historic or Google Noto Sans Old Turkic fonts.

Exact steps for others to reproduce the error

Follow these steps:

  1. Create a text object and name it "Text".
  2. Load either the TTF or the OTF version of Segoe UI Historic or Google Noto Sans Old Turkic.
  3. Run the following code:
import bpy
# Set the body to U+10C45, a character in the Old Turkic block
bpy.data.objects["Text"].data.body = u'\U00010c45'
  4. The result is a box instead of the expected character (U+10C45).

Expected result (obtained on Linux):

Developer Notes
The code assumes wchar_t is 32 bits wide, which is not the case on Windows, where it is 16 bits. See the comments for more investigation details.

Revisions and Commits

rB Blender
D6198

Event Timeline

Mehmet Oguz Derin (mehmetoguzderin) created this task. Nov 1 2019, 7:44 PM
Mehmet Oguz Derin (mehmetoguzderin) updated the task description.
Mehmet Oguz Derin (mehmetoguzderin) updated the task description.
Mehmet Oguz Derin (mehmetoguzderin) updated the task description.
Germano Cavalcante (mano-wii) added subscribers: Bastien Montagne (mont29), Germano Cavalcante (mano-wii). Nov 4 2019, 7:50 PM

Analyzing the code, I realize that, at least on Windows, the supported character encoding is UTF-16.
u'\U00010c45' is a code point above U+FFFF. It cannot be stored in a single 16-bit UTF-16 code unit, because UTF-16 needs a surrogate pair for it, but it fits in a single UTF-32 unit.
A possible solution would be to change the size of the built-in type wchar_t to 4 bytes.

If I'm not mistaken, this is the size of wchar_t on Linux.

I think this is a known limitation.
@Bastien Montagne (mont29), what do you think?
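The UTF-16 point can be made concrete with a short example. UTF-16 can still represent a code point above U+FFFF such as U+10C45, but only as a surrogate pair of two 16-bit units. UTF-32 stores it in a single 32-bit unit. The C sketch below shows the encoding arithmetic. The name utf16_encode is hypothetical and is not a Blender function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Blender): encode one Unicode code point
 * as UTF-16. Writes one or two 16-bit code units to `out` and returns how
 * many were written, or 0 for values that are not valid scalar values. */
static size_t utf16_encode(uint32_t code_point, uint16_t out[2])
{
    if (code_point >= 0xD800 && code_point <= 0xDFFF) {
        return 0; /* the surrogate range is reserved and cannot be encoded */
    }
    if (code_point <= 0xFFFF) {
        out[0] = (uint16_t)code_point; /* Basic Multilingual Plane: one unit */
        return 1;
    }
    if (code_point <= 0x10FFFF) {
        uint32_t v = code_point - 0x10000;         /* 20 significant bits */
        out[0] = (uint16_t)(0xD800 + (v >> 10));   /* high surrogate */
        out[1] = (uint16_t)(0xDC00 + (v & 0x3FF)); /* low surrogate */
        return 2;
    }
    return 0; /* beyond the Unicode range */
}
```

For U+10C45 this produces the pair 0xD803 0xDC45, while U+0041 ("A") fits in one unit. Code that stores exactly one 16-bit wchar_t per character therefore has no room for U+10C45.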

Ray Molenkamp (LazyDodo) added a subscriber: Ray Molenkamp (LazyDodo). Nov 4 2019, 8:02 PM

sizeof(wchar_t) is implementation-defined and cannot be changed in some implementations. C11 added the char16_t and char32_t types (in <uchar.h>) for when you have size requirements.
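A quick way to see the platform difference is to report the width of wchar_t. The C sketch below does that and also shows the fixed-width alternative from <stdint.h>. It is only an illustration: the function name is hypothetical. It avoids <uchar.h> because not every C library ships that header.

```c
#include <limits.h> /* CHAR_BIT */
#include <stdint.h> /* uint32_t */
#include <wchar.h>  /* wchar_t */

/* Width of wchar_t in bits. This is implementation-defined: MSVC on
 * Windows uses 16 bits, while GCC on Linux uses 32 bits. */
static unsigned wchar_width_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* uint32_t, when provided, is exactly 32 bits on every platform, so it can
 * hold any Unicode code point (U+0000 to U+10FFFF) whatever wchar_t is. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");
```

On a Linux build with GCC, wchar_width_bits() returns 32. With MSVC it returns 16, which is the case where a 32-bit assumption breaks.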

Ray Molenkamp (LazyDodo) added a comment. Nov 5 2019, 8:33 PM

Yeah, this is a mess...

I debugged into this a little. The input to vfont_to_curve is correctly UTF-8. However, it is then passed to BLI_strncpy_wchar_from_utf8, which internally assumes wchar_t is 32 bits. That is decidedly not the case on Windows, where wchar_t is 16 bits and cannot be changed (the spec lets the implementation choose the size of wchar_t). As a result, the upper 16 bits get lost and you end up with the wrong code point.
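The bit loss described above can be reproduced in isolation. The sketch below is not Blender's BLI_strncpy_wchar_from_utf8. It assumes a well-formed four-byte UTF-8 sequence and does no validation. It decodes U+10C45 from its UTF-8 bytes F0 90 B1 85 into a 32-bit value, then stores that value in a 16-bit type, as happens when the destination is a 16-bit wchar_t. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: decode a well-formed 4-byte UTF-8 sequence
 * (lead byte 11110xxx, then three 10xxxxxx continuation bytes).
 * No validation is performed; this only shows the bit arithmetic. */
static uint32_t utf8_decode_4byte(const unsigned char s[4])
{
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6) |
           ((uint32_t)(s[3] & 0x3F));
}

/* Storing the result in a 16-bit type keeps only the low 16 bits,
 * mirroring what happens when wchar_t is 16 bits wide. */
static uint16_t store_in_16_bits(uint32_t code_point)
{
    return (uint16_t)code_point;
}
```

The decoded value is 0x10C45, but the 16-bit copy is 0x0C45, a different code point that an Old Turkic font would not be expected to cover. This is consistent with the box shown in the report.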

The only way to fix this is to change the parts of the codebase that assume wchar_t is 32 bits so that they use uint32_t instead. (I would really have preferred char32_t, but MSVC does not support this type in C mode.)
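The approach described above, decoding into uint32_t rather than wchar_t, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the actual commit. The name utf8_to_u32 and its parameters are hypothetical, loosely modeled on a strncpy-style interface, and validation is deliberately incomplete.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not Blender's actual code): decode a NUL-terminated
 * UTF-8 string into a NUL-terminated array of 32-bit code points.
 * At most maxncpy - 1 code points are written; returns the number written.
 * Invalid lead bytes and truncated sequences become U+FFFD. Overlong forms,
 * surrogates and values above U+10FFFF are not rejected, for brevity. */
static size_t utf8_to_u32(uint32_t *dst, const char *src, size_t maxncpy)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (maxncpy == 0) {
        return 0;
    }
    while (*s != '\0' && n + 1 < maxncpy) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */

        if (s[0] < 0x80) {
            cp = s[0];
            extra = 0;
        }
        else if ((s[0] & 0xE0) == 0xC0) {
            cp = s[0] & 0x1F;
            extra = 1;
        }
        else if ((s[0] & 0xF0) == 0xE0) {
            cp = s[0] & 0x0F;
            extra = 2;
        }
        else if ((s[0] & 0xF8) == 0xF0) {
            cp = s[0] & 0x07;
            extra = 3;
        }
        else {
            dst[n++] = 0xFFFD; /* invalid lead byte */
            s++;
            continue;
        }

        int i;
        for (i = 1; i <= extra; i++) {
            if ((s[i] & 0xC0) != 0x80) {
                break; /* truncated sequence; also stops at the NUL */
            }
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (i <= extra) {
            dst[n++] = 0xFFFD;
            s += i; /* skip only the bytes examined so far */
            continue;
        }
        dst[n++] = cp; /* the full code point fits: uint32_t is 32 bits */
        s += extra + 1;
    }
    dst[n] = 0;
    return n;
}
```

Consider the UTF-8 input "A" followed by the bytes F0 90 B1 85. This writes 0x41 and 0x10C45, followed by a terminating 0, and no bits are lost on any platform.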

Dalai Felinto (dfelinto) added a subscriber: Dalai Felinto (dfelinto). Nov 5 2019, 10:20 PM

Here (Ubuntu Linux, official Blender 2.80) it works fine:

Ray Molenkamp (LazyDodo) added a comment. Nov 5 2019, 10:23 PM

wchar_t is 32 bits with GCC on Linux, so yeah, you wouldn't see the issue there.

Dalai Felinto (dfelinto) renamed this task from "Unicode Block Not Working" to "Unicode Block Not Working on Windows". Nov 5 2019, 10:31 PM
Dalai Felinto (dfelinto) lowered the priority of this task from 90 to 50.
Dalai Felinto (dfelinto) updated the task description.
Dalai Felinto (dfelinto) added a project: Modeling.
Germano Cavalcante (mano-wii) changed the task status from Unknown Status to Resolved by committing rB177dfc6384b9: Fix T71273: Bad encoding of utf-8 for Text objects. Nov 22 2019, 4:28 PM
Germano Cavalcante (mano-wii) claimed this task.
Germano Cavalcante (mano-wii) added a commit: rB177dfc6384b9: Fix T71273: Bad encoding of utf-8 for Text objects.
Milan Jaros (jar091) added a subscriber: Milan Jaros (jar091). Feb 17 2020, 4:56 PM

I have a problem building this patch on Linux (CentOS) with GCC 4.9 or GCC 7.1 using blender_lite.cmake. I get the error "uchar.h: No such file or directory" in BLI_sys_types.h and in wcwidth.h. After fixing that, I get the next error: "unknown type name size_t" in wcwidth.h and wcwidth.c. After making these changes on my side, it works fine:
  • BLI_sys_types.h: replace the uchar.h include with typedefs, as is done for macOS
  • wcwidth.h: size_t -> unsigned long long int
  • wcwidth.c: size_t -> unsigned long long int
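A guarded include is one way to handle C libraries that lack <uchar.h>. The sketch below is hypothetical and is not the change that was committed. It falls back to typedefs that match how C11 defines the two types (uint_least16_t and uint_least32_t), in the spirit of the macOS typedefs mentioned above. It is C only, because in C++ char16_t and char32_t are built-in keywords. It relies on __has_include, which GCC supports from version 5, so GCC 4.9 always takes the fallback branch. The macro HAVE_UCHAR_H and the function count_code_points are illustrative names. For the "unknown type name size_t" error, the usual fix is to include <stddef.h>, the standard header that defines size_t, rather than change the type.

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint_least16_t, uint_least32_t */

/* Hypothetical sketch (C only): use <uchar.h> when the compiler can confirm
 * it exists, otherwise define the two types the way C11 specifies them. */
#if defined(__has_include)
#  if __has_include(<uchar.h>)
#    include <uchar.h>
#    define HAVE_UCHAR_H 1
#  endif
#endif

#ifndef HAVE_UCHAR_H
typedef uint_least16_t char16_t;
typedef uint_least32_t char32_t;
#endif

/* Example use: size_t comes from <stddef.h>, so there is no need to
 * replace it with unsigned long long. */
static size_t count_code_points(const char32_t *str)
{
    size_t n = 0;
    while (str[n] != 0) {
        n++;
    }
    return n;
}
```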