Maniphest T73406

Navigating 3D Viewport while Armature is selected causes AMDGPU crash
Closed, Resolved

Assigned To
Richard Antalik (ISS)
Authored By
Fureloka
Jan 26 2020, 3:01 PM
Tags
  • BF Blender
Subscribers
Fureloka
Germano Cavalcante (mano-wii)
Richard Antalik (ISS)

Description

System Information
Operating system: Linux-5.5.0-rc7-x86_64-AMD_Ryzen_5_2600X_Six-Core_Processor-with-gentoo-2.6 64 Bits
Graphics card: AMD Radeon RX 5700 XT (NAVI10, DRM 3.36.0, 5.5.0-rc7, LLVM 11.0.0) X.Org 4.6 (Core Profile) Mesa 20.0.0-devel (git-4d03e53127)

Blender Version
Broken: version: 2.82 (sub 6), branch: master, commit date: 2020-01-24 17:48, hash: rBfc1f5bded46a
Worked: version 2.81a (also tested 2.82 somewhere between 2020-01-07 and 2020-01-21)

Short description of error
Navigating the 3D Viewport while the Armature is selected causes the AMDGPU driver to crash, and a full system reset becomes necessary.
I've been unable to reproduce said crash with other scenes.

Exact steps for others to reproduce the error

  • Open attached .blend file
  • Rotate the 3D Viewport
  • Enjoy artifacts!

Since this is a driver crash, I've also submitted an issue over at Mesa: https://gitlab.freedesktop.org/mesa/mesa/issues/2426

Event Timeline

Fureloka created this task. Jan 26 2020, 3:01 PM
Richard Antalik (ISS) added a subscriber: Richard Antalik (ISS). Jan 27 2020, 4:36 PM

Seems like the same issue as T71458: System hangs/crashes randomly when entering/exiting edit mode.

Not sure whether the driver support situation has changed.

Fureloka added a comment. Jan 27 2020, 6:27 PM

A slight difference: I'm not using the AMD Pro drivers, only the Mesa drivers.
That said, I'm not experiencing any crashes while entering/exiting edit mode in general. It only happens with this Armature in the recent 2.82 build; 2.81a works wonders and I can mess around as much as I want without any crashes.

Unfortunately, to my knowledge, older builds of Blender 2.82 aren't available; otherwise I would have performed a binary search to figure out where it started misbehaving.
I guess I could try compiling from source and figure it out that way, though it will take a while. I'll report back if I find anything interesting.

Germano Cavalcante (mano-wii) added a subscriber: Germano Cavalcante (mano-wii). Jan 27 2020, 7:14 PM

I can't reproduce the problem on my end.

Operating system: Windows-10-10.0.18941 64 Bits
Graphics card: Radeon (TM) RX 480 Graphics ATI Technologies Inc. 4.5.13559 Core Profile Context 26.20.12028.2

So it really seems to be a driver-specific problem.
I don't think we can do much without being able to reproduce it.

Fureloka added a comment. Jan 28 2020, 5:31 AM

Managed to find the commit that makes the radeonsi driver hang: 9516921c05bd9fee5c94942eb8e38f47ba7e4351.
Builds from this commit onward hang; builds from earlier commits work.

I'm unfortunately not familiar with the source code of Blender, so this is sadly as far as I can personally take it.
Though I'll gladly do whatever I can to help further.
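
For readers unfamiliar with bisecting, the search described above is a binary search over the commit history. The sketch below is only an illustration under stated assumptions: the commit names are placeholders (not real Blender commits), `is_bad` stands in for "build this commit and check whether the viewport hangs", and the history is assumed to flip from good to bad exactly once.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first bad
    one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: placeholder names, not real Blender commits.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]
hangs = {"c4", "c5", "c6"}  # assumed set of builds that hang
print(first_bad_commit(history, lambda c: c in hangs))  # prints "c4"
```

Each step halves the remaining range, so only about log2(n) builds need testing. In practice `git bisect` automates the same search on a real repository.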

Germano Cavalcante (mano-wii) added a comment. Jan 28 2020, 12:04 PM

@Fureloka, thanks for the bisect!
Can you provide a backlog indicating where it crashes?
It could be very useful as well.

Germano Cavalcante (mano-wii) changed the task status from Needs Triage to Needs Information from Developers. Jan 29 2020, 6:33 PM
Fureloka added a comment. Jan 29 2020, 7:53 PM

Sorry for the late update!

Unfortunately, a backlog (backtrace?) isn't possible: it hangs at the GPU level (i.e. the entire system comes to a halt).
That said, Pierre-Eric Pelloux-Prayer over at AMD was able to reproduce the hang, and fixed it with a radeonsi patch.

Since this is technically solved (though not yet committed to Mesa master), I'm not sure what you guys wish to do.

By the way, this only seems to affect gfx10 GPUs (RX 5000+), since it happens within the NGG culling part of the radeonsi driver.

Richard Antalik (ISS) closed this task as Resolved. Feb 12 2020, 6:53 PM
Richard Antalik (ISS) claimed this task.

Looking at https://gitlab.freedesktop.org/mesa/mesa/issues/2426#note_397006

Seems that this issue has been resolved, so I will close this report.