Maniphest T93425

(Again) makesdna crashes during the build with LTO on s390x architecture (Linux) :
Closed, Resolved

Assigned To
Campbell Barton (campbellbarton)
Authored By
Mamoru TASAKA (mtasaka)
Nov 27 2021, 3:21 PM
Tags
  • BF Blender
Subscribers
Campbell Barton (campbellbarton)
Mamoru TASAKA (mtasaka)
Pratik Borhade (PratikPB2123)
Richard Antalik (ISS)

Description

As the original bug https://developer.blender.org/T80639 is already closed, I am creating a new report; this time a detailed analysis is included.

System Information
Operating system: Linux (Fedora)
Graphics card: N/A

Blender Version
Broken: 2.90 and master
Worked: without link time optimization (LTO)

Short description of error
makesrna crashes during the build when LTO is enabled. See https://bugzilla.redhat.com/show_bug.cgi?id=1874398#c6
Linux distributions such as Fedora enable LTO by default (https://fedoraproject.org/wiki/LTOByDefault), which exposes the failure.

Exact steps for others to reproduce the error
Yes, it happens with Blender 2.90 as well. I believe this is another case where enabling LTO reveals a real bug in the source code.

build output

[ 82%] Built target makesrna
make  -f source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/build.make source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/depend
make[2]: Entering directory '/builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu'
[ 82%] Generating rna_ID_gen.c, rna_action_gen.c, rna_animation_gen.c, rna_animviz_gen.c, rna_armature_gen.c, rna_boid_gen.c, rna_brush_gen.c, rna_cachefile_gen.c, rna_camera_gen.c, rna_cloth_gen.c, rna_collection_gen.c, rna_color_gen.c, rna_constraint_gen.c, rna_context_gen.c, rna_curve_gen.c, rna_curveprofile_gen.c, rna_depsgraph_gen.c, rna_dynamicpaint_gen.c, rna_fcurve_gen.c, rna_fluid_gen.c, rna_gpencil_gen.c, rna_gpencil_modifier_gen.c, rna_image_gen.c, rna_key_gen.c, rna_lattice_gen.c, rna_layer_gen.c, rna_light_gen.c, rna_lightprobe_gen.c, rna_linestyle_gen.c, rna_main_gen.c, rna_mask_gen.c, rna_material_gen.c, rna_mesh_gen.c, rna_meta_gen.c, rna_modifier_gen.c, rna_movieclip_gen.c, rna_nla_gen.c, rna_nodetree_gen.c, rna_object_gen.c, rna_object_force_gen.c, rna_packedfile_gen.c, rna_palette_gen.c, rna_particle_gen.c, rna_pose_gen.c, rna_render_gen.c, rna_rigidbody_gen.c, rna_rna_gen.c, rna_scene_gen.c, rna_screen_gen.c, rna_sculpt_paint_gen.c, rna_sequencer_gen.c, rna_shader_fx_gen.c, rna_sound_gen.c, rna_space_gen.c, rna_speaker_gen.c, rna_test_gen.c, rna_text_gen.c, rna_texture_gen.c, rna_timeline_gen.c, rna_tracking_gen.c, rna_ui_gen.c, rna_userdef_gen.c, rna_vfont_gen.c, rna_volume_gen.c, rna_wm_gen.c, rna_wm_gizmo_gen.c, rna_workspace_gen.c, rna_world_gen.c, rna_xr_gen.c, rna_prototypes_gen.h
cd /builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/source/blender/makesrna/intern && ../../../../bin/makesrna /builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/source/blender/makesrna/intern/
Attempt to free NULL pointer
make[2]: *** [source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/build.make:84: source/blender/makesrna/intern/rna_ID_gen.c] Aborted (core dumped)
make[2]: Leaving directory '/builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu'
make[1]: *** [CMakeFiles/Makefile2:5901: source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/all] Error 2

running under gdb gives:

<mock-chroot> sh-5.0# gdb ../../../../bin/makesrna
GNU gdb (GDB) Fedora 9.2-6.fc33
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "s390x-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../../../../bin/makesrna...
(gdb) set args /builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/source/blender/makesrna/intern/
(gdb) run
Starting program: /builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/bin/makesrna /builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/source/blender/makesrna/intern/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Attempt to free NULL pointer

Program received signal SIGABRT, Aborted.
0x000003fffda4ab86 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x000003fffda4ab86 in raise () from /lib64/libc.so.6
#1  0x000003fffda2b808 in abort () from /lib64/libc.so.6
#2  0x000002aa0018f5fa in MEM_lockfree_freeN (vmemh=<optimized out>) at /builddir/build/BUILD/blender-2.90.0/intern/guardedalloc/intern/mallocn_lockfree_impl.c:114
#3  MEM_lockfree_freeN (vmemh=0x0) at /builddir/build/BUILD/blender-2.90.0/intern/guardedalloc/intern/mallocn_lockfree_impl.c:102
#4  0x000002aa00190a6c in DNA_sdna_free (sdna=0x2aa002e9808) at /builddir/build/BUILD/blender-2.90.0/source/blender/makesdna/intern/dna_genfile.c:146
#5  0x000002aa0005f67e in DNA_sdna_from_data (data=<optimized out>, do_endian_swap=false, data_alloc=false, r_error_message=<synthetic pointer>, data_len=103476)
    at /builddir/build/BUILD/blender-2.90.0/source/blender/makesdna/intern/dna_genfile.c:335
#6  RNA_create () at /builddir/build/BUILD/blender-2.90.0/source/blender/makesrna/intern/rna_define.c:708
#7  0x000002aa0005ab78 in rna_preprocess (outfile=0x3fffffff47c "/builddir/build/BUILD/blender-2.90.0/s390x-redhat-linux-gnu/source/blender/makesrna/intern/")
    at /builddir/build/BUILD/blender-2.90.0/source/blender/makesrna/intern/makesrna.c:5016
#8  0x000002aa0005442a in main (argc=<optimized out>, argv=0x3fffffff1a8) at /builddir/build/BUILD/blender-2.90.0/source/blender/makesrna/intern/makesrna.c:5174
(gdb)

Revisions and Commits

rB Blender

Event Timeline

Mamoru TASAKA (mtasaka) created this task. Edited Nov 27 2021, 3:21 PM

So the backtrace shows that:

  • In DNA_sdna_from_data() (in source/blender/makesdna/intern/dna_genfile.c), DNA_sdna_free() is called. That means init_structDNA() (called from DNA_sdna_from_data()) failed (returned false), which is most likely the real error.
  • Also, once DNA_sdna_free() is called, the backtrace shows that sdna->names_array_len is NULL; the allocator complains that a NULL pointer is about to be freed and then calls abort(). This means that init_structDNA() hit an error and returned false before the names_array_len buffer was created.
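
The failure mode described in the two points above can be sketched as follows. This is a reduced illustration, not Blender's code: DemoSDNA, demo_init(), demo_strict_free() and demo_free_all() are hypothetical names, and the strict free only reports a NULL pointer where the real guarded allocator aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for the SDNA struct: two owned buffers. */
typedef struct DemoSDNA {
  char *names;
  short *names_array_len;
} DemoSDNA;

/* Strict free: NULL is treated as an error instead of being ignored.
 * Returns false where a guarded allocator would abort. */
bool demo_strict_free(void *ptr)
{
  if (ptr == NULL) {
    fprintf(stderr, "Attempt to free NULL pointer\n");
    return false;
  }
  free(ptr);
  return true;
}

/* Parse step that can fail between the two allocations. */
bool demo_init(DemoSDNA *sdna, bool type_tag_found)
{
  sdna->names = calloc(16, 1);
  sdna->names_array_len = NULL;
  if (sdna->names == NULL || !type_tag_found) {
    return false; /* bail out before names_array_len exists */
  }
  sdna->names_array_len = calloc(16, sizeof(short));
  return sdna->names_array_len != NULL;
}

/* Cleanup that assumes every field was allocated.
 * Returns the number of frees that hit a NULL pointer. */
int demo_free_all(DemoSDNA *sdna)
{
  int null_frees = 0;
  null_frees += demo_strict_free(sdna->names) ? 0 : 1;
  null_frees += demo_strict_free(sdna->names_array_len) ? 0 : 1;
  return null_frees;
}

/* Usage: an early parse failure leaves one buffer unallocated. */
void demo_check_cleanup(void)
{
  DemoSDNA sdna;
  assert(!demo_init(&sdna, false));
  assert(demo_free_all(&sdna) == 1);
}
```

In other words, the abort in the free path is only a symptom; the interesting question is why the parse step failed in the first place.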

Next I debugged init_structDNA() to find where the failure happened:

  • It turns out that the line *r_error_message = "TYPE error in SDNA file"; is executed. That means that at the check if (*data == MAKE_ID('T', 'Y', 'P', 'E')), the 'data' pointer did not point to the expected address.
  • In fact, at that line the 'data' pointer pointed 2 bytes before the expected MAKE_ID('T', 'Y', 'P', 'E') address.
  • The cause is the earlier line cp = pad_up_4(cp); . Before this line is executed, cp points to the expected address (compared with the x86_64 results). After it is executed, cp has advanced to the expected address on x86_64, but on s390x cp has not moved at all.
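
The arithmetic behind these observations can be sketched as below. It assumes that pad_up_4() rounds the cursor address up to a multiple of 4, and that the data was written with padding relative to the start of the buffer; the function names and the base addresses 0x1000 and 0x1002 are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes skipped when the cursor ADDRESS is rounded up to a multiple of 4
 * (assumed to be what pad_up_4() does). */
size_t demo_skip_by_address(uintptr_t base_addr, size_t offset)
{
  const uintptr_t cursor = base_addr + offset;
  const uintptr_t padded = (cursor + 3) & ~(uintptr_t)3;
  return (size_t)(padded - cursor);
}

/* Bytes skipped when the OFFSET from the start of the buffer is rounded up
 * to a multiple of 4 (what the writer of the data is assumed to have done). */
size_t demo_skip_by_offset(size_t offset)
{
  const size_t padded = (offset + 3) & ~(size_t)3;
  return padded - offset;
}

/* Usage: the cursor sits 6 bytes into the buffer. */
void demo_check_padding(void)
{
  assert(demo_skip_by_offset(6) == 2);
  assert(demo_skip_by_address(0x1000, 6) == 2); /* 4-byte-aligned base */
  assert(demo_skip_by_address(0x1002, 6) == 0); /* 2-byte-aligned base */
}
```

With a 4-byte-aligned base, both roundings agree. With a base that is only 2-byte aligned, the address-based rounding skips nothing, so the cursor stays put and the next tag is read 2 bytes too early, which matches what was seen on s390x.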

So, finally, this means that pad_up_4() is not doing what is expected here. pad_up_4() rounds the given address up to a 4-byte-aligned address, so the code implicitly expects sdna->data itself to be 4-byte aligned. On s390x, however, sdna->data turns out to be only 2-byte aligned, not 4-byte aligned.
By the way,

  • This "sdna->data" is actually the "DNAstr" global array: RNA_create() in rna_define.c passes it as the first argument of DNA_sdna_from_data(), and DNA_sdna_from_data() in dna_genfile.c stores that argument with "sdna->data = data;".
  • This "DNAstr" global array is defined in "dna.c".
  • This "dna.c" is generated by the "makesdna" program:
DEBUG: gmake[2]: Entering directory '/builddir/build/BUILD/blender-2.93.5/redhat-linux-build'
DEBUG: [ 60%] Generating dna.c, dna_type_offsets.h, dna_verify.c
DEBUG: cd /builddir/build/BUILD/blender-2.93.5/redhat-linux-build/source/blender/makesdna/intern && ../../../../bin/makesdna /builddir/build/BUILD/blender-2.93.5/redhat-linux-build/source/blender/makesdna/intern/dna.c /builddir/build/BUILD/blender-2.93.5/redhat-linux-build/source/blender/makesdna/intern/dna_type_offsets.h /builddir/build/BUILD/blender-2.93.5/redhat-linux-build/source/blender/makesdna/intern/dna_verify.c /builddir/build/BUILD/blender-2.93.5/source/blender/makesdna/

So, looking at main() in makesdna.c:

fprintf(file_dna, "extern const unsigned char DNAstr[];\n");
fprintf(file_dna, "const unsigned char DNAstr[] = {\n");

DNAstr is defined only as a const unsigned char array. The alignment requirement for a char array is only 1 byte on every architecture, so there is no guarantee that DNAstr is placed at a 4-byte-aligned address; that is up to the compiler and linker. On x86_64, DNAstr seems always to end up at a 4-byte-aligned address, but on s390x with LTO (link time optimization) the toolchain seems to place DNAstr at an address that is only 2-byte aligned, and AFAIK we cannot complain about this.

So the correct fix is perhaps to force DNAstr onto a 4-byte-aligned address, although how to do that is toolchain-dependent.
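
As one possible illustration of forcing the alignment, the sketch below uses the C11 alignas specifier from <stdalign.h>. It is not necessarily what the attached patch or the eventual fix does; DNAstr_demo and demo_is_aligned_4() are hypothetical names, and a toolchain without C11 support would need a compiler-specific attribute instead.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* A plain byte array only has to satisfy 1-byte alignment. */
static_assert(alignof(unsigned char) == 1, "char types have alignment 1");

/* Hypothetical stand-in for the generated DNAstr[] array; alignas(4)
 * requests a 4-byte-aligned address from the toolchain. */
alignas(4) const unsigned char DNAstr_demo[] = {
    'S', 'D', 'N', 'A', 'N', 'A', 'M', 'E',
};

/* Returns non-zero when ptr is a multiple of 4. */
int demo_is_aligned_4(const void *ptr)
{
  return ((uintptr_t)ptr & 3u) == 0;
}

/* Usage: with the specifier in place the check holds by construction. */
void demo_check_alignment(void)
{
  assert(demo_is_aligned_4(DNAstr_demo));
}
```

An alternative that avoids relying on the data's address at all would be to pad relative to the start of the buffer when reading, but that changes the reader rather than the generated array.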

Mamoru TASAKA (mtasaka) added a comment. Nov 27 2021, 3:23 PM


Suggestion patch

Pratik Borhade (PratikPB2123) added a project: BF Blender. Nov 27 2021, 4:35 PM
Pratik Borhade (PratikPB2123) added a subscriber: Pratik Borhade (PratikPB2123).
Mamoru TASAKA (mtasaka) added a comment. Nov 27 2021, 4:58 PM

Oops, the line that was actually executed is *r_error_message = "TYPE error in SDNA file"; , corrected.

Richard Antalik (ISS) added subscribers: Campbell Barton (campbellbarton), Richard Antalik (ISS). Feb 4 2022, 8:48 AM

@Mamoru TASAKA (mtasaka) Sorry for the late answer. I am not sure whether this is still an issue, but I would suggest sending the patch via https://developer.blender.org/differential/diff/create/ (or the big "submit code" button on the main page of this site).

CC @Campbell Barton (campbellbarton)

Campbell Barton (campbellbarton) renamed this task from (Again) makesrna crashes during the build with LTO on s390x architecture (Linux) : to (Again) makesdna crashes during the build with LTO on s390x architecture (Linux) :.Feb 4 2022, 9:45 AM
Campbell Barton (campbellbarton) closed this task as Resolved by committing rB2d429bfdf8a4: Fix T93425: makesdna crashes during build with LTO on s390x Linux.Feb 4 2022, 9:50 AM
Campbell Barton (campbellbarton) claimed this task.
Campbell Barton (campbellbarton) added a commit: rB2d429bfdf8a4: Fix T93425: makesdna crashes during build with LTO on s390x Linux.