Maniphest T48422

Blender freezes in multi-threaded tasks since recent rB98123ae91680, on windows - atomic ops issue?
Closed, Resolved

Assigned To
Bastien Montagne (mont29)
Authored By
Denis Belov (dihotom)
May 13 2016, 5:29 PM
Tags
  • BF Blender
  • Platform: Windows
Subscribers
Adam Friesen (ace_dragon)
Bastien Montagne (mont29)
Campbell Barton (campbellbarton)
Cédric (Clarkx)
Denis Belov (dihotom)
Germano Cavalcante (mano-wii)
Massimiliano Puliero (mmaaxx)
Vertex (VertexPainter)

Description

System Information
Win 7x64, Nvidia GTX 580

Blender Version
Broken: blender-2.77.0-git.b72aef9-AMD64
Working: blender-2.77.0-git.898d040-AMD64

Blender freezes while trying to do vertex snapping with subdivision modifier enabled both in object and edit mode.

  1. Create Suzanne
  2. Add Subsurface modifier, duplicate mesh
  3. Try to manipulate meshes with vertex snapping enabled

Sometimes I can reproduce this instantly, sometimes not, so try opening the attached file; it freezes every time.

Related Objects

Mentioned In
rB93240ba79888: Fix an error in new lockfree parallel_range_next_iter_get() helper.
T48448: insert edge make blender freeze in some cases
T48445: Blender crashes when move or edit a mask object
T48432: Blender locks up almost instantly when starting to sculpt on a multires object.
T48437: Blender hangs when entering in Sculpt Mode
rBa83bc4f59707: Fix an error in new lockfree parallel_range_next_iter_get() helper.
Mentioned Here
T48437: Blender hangs when entering in Sculpt Mode
rB98123ae91680: BLI_task: nano-optimizations to BLI_task_parallel_range feature.
rBb72aef92c4fc: install_deps: Avoid conflicts on Arch-based systems when gcc-multilib is…

Event Timeline

Denis Belov (dihotom) created this task. May 13 2016, 5:29 PM
Denis Belov (dihotom) raised the priority of this task from to 90.
Denis Belov (dihotom) updated the task description.
Denis Belov (dihotom) added a project: BF Blender.
Denis Belov (dihotom) edited a custom field.
Denis Belov (dihotom) added a subscriber: Denis Belov (dihotom).
Campbell Barton (campbellbarton) lowered the priority of this task from 90 to 50. May 13 2016, 8:39 PM
Campbell Barton (campbellbarton) added a subscriber: Campbell Barton (campbellbarton).

This is caused by rB98123ae91680289255f5fa6cf6ae0ff6dcba251b

Bastien Montagne (mont29) added a subscriber: Bastien Montagne (mont29). May 14 2016, 10:15 AM

Cannot reproduce that here… @Campbell Barton (campbellbarton) you just opened the file and did some snapped transform in Object or Edit mode, and got the freeze?

Anyway, if this commit causes issues, it can be reverted; it gave nearly no speed gain anyway…

Germano Cavalcante (mano-wii) added a subscriber: Germano Cavalcante (mano-wii). Edited May 14 2016, 4:01 PM

I don't like to revert optimizations (no matter how small) :(
@Bastien Montagne (mont29) if it did not freeze the first time, try again; there are times when it works.
(Just snap to vertices in Object mode. Edit mode also freezes.)

Bastien Montagne (mont29) mentioned this in rBa83bc4f59707: Fix an error in new lockfree parallel_range_next_iter_get() helper.. May 14 2016, 6:06 PM
Bastien Montagne (mont29) added a comment. May 14 2016, 6:07 PM

I tried it several times of course, with both release and debug builds. I am on Linux though, and not sure on which OS Campbell reproduced it.

I did find an error in the new code that could create issues and committed a fix, please give it a try. :)

Germano Cavalcante (mano-wii) added a comment. May 14 2016, 7:45 PM

Oops, by "try again" I meant closing and reopening Blender (but you must have already tried that too).

At first it seemed to be fixed, but the problem came back on the second try :( (no fix)

Still cannot reproduce at all…

Might be related to T48437, can you please try and see if you can reproduce it?

Germano Cavalcante (mano-wii) added a comment. May 15 2016, 4:08 PM

Yes, I can reproduce it.
And the race condition also occurs in the loop "while (UNLIKELY(previter != olditer))".

I found a strange thing - after the loop finishes executing, suddenly, out of nowhere, it runs again without going through the expected sequence of the function.

(I do not understand this atomic thing however)

Bastien Montagne (mont29) added a comment. May 15 2016, 4:28 PM

Grumph… atomic means the operation is done in 'a single step' from the CPU's point of view, i.e. you cannot have thread 1 start an atomic operation, then thread 2 modify one of its operands, then thread 1 finish the operation.

Those atomic ops are implemented in all modern CPUs, and are much cheaper than using regular thread synchronization primitives like mutex or spinlock.

That looping function is a way to perform an operation that does not exist among the atomic primitives. The idea is to:

  1. Read the current value of the shared data we want to modify (32-bit data; reading is assumed atomic, i.e. you cannot read part of the value, have it changed by another thread, then read the remaining part).
  2. Do the operation and store its result in a local variable (this can take any amount of time, since it only uses local or read-only variables).
  3. Do an atomic CAS to set the shared variable we want to modify.
  4. Repeat as long as the value returned by CAS is not the same as the one we stored at the beginning (meaning the shared variable has been modified by another thread in between).

The atomic CAS (compare-and-swap) compares the value of the data to modify with a given 'reference', sets the former to the new value only if it equals the reference value, and then returns the old value of the modified data.
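
The read / compute / CAS / retry steps described above can be sketched as follows. This is only a minimal illustration in C11 using <stdatomic.h>, not the actual Blender code: the names cas_u32 and next_chunk and their parameters are hypothetical, and the 'operation' is assumed to be claiming the next chunk of a range of iterations, clamped at a stop value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CAS that returns the value held before the call.
 * atomic_compare_exchange_strong() writes the current value into
 * 'expected' on failure and leaves it untouched on success, so in both
 * cases 'expected' ends up holding the old value. */
static uint32_t cas_u32(_Atomic uint32_t *v, uint32_t expected, uint32_t desired)
{
    atomic_compare_exchange_strong(v, &expected, desired);
    return expected;
}

/* Hypothetical lock-free "claim the next chunk" function.
 * Returns false once the shared counter has reached 'stop'. */
static bool next_chunk(_Atomic uint32_t *iter, uint32_t stop, uint32_t chunk_size,
                       uint32_t *r_start, uint32_t *r_count)
{
    uint32_t olditer, newiter, previter;

    assert(chunk_size > 0);

    do {
        /* 1. Read the current value of the shared data. */
        olditer = atomic_load(iter);
        if (olditer >= stop) {
            return false; /* Nothing left to hand out. */
        }
        /* 2. Compute the new value using only local data (clamped to stop). */
        newiter = (stop - olditer < chunk_size) ? stop : olditer + chunk_size;
        /* 3. Try to publish it; this only succeeds if nobody changed *iter. */
        previter = cas_u32(iter, olditer, newiter);
        /* 4. Retry if another thread modified *iter in the meantime. */
    } while (previter != olditer);

    *r_start = olditer;
    *r_count = newiter - olditer;
    return true;
}
```

With a shared counter starting at 0, stop = 10 and chunk_size = 4, successive calls would hand out the chunks [0, 4), [4, 8) and [8, 10), then return false. Each thread that wins the CAS owns its chunk exclusively; a thread that loses simply re-reads the counter and tries again, so no lock is ever held.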

So in theory, this is perfectly safe and no deadlock should happen. Actually, no real deadlock is possible here, since there is no lock - I’d rather suspect an infinite loop due to something messed up in the MSVC version of our atomic primitives.

I would suspect some stupid conversion mismatch between signed and unsigned integers (though afaik uint32 < INT_MAX should not be an issue here :| ).
Can you please try replacing line 76 of the intern/atomic/intern/atomic_ops_msvc.h file with the following one, and check again?

	return InterlockedCompareExchange((long *)v, *(long *)(&_new), *(long *)(&old));
Germano Cavalcante (mano-wii) added a comment. May 15 2016, 5:04 PM

(I'm still trying to understand the whole explanation ...)

However, I made the change you requested (line 76 of atomic_ops_msvc.h), and the problem persists :(

Bastien Montagne (mont29) added a comment. May 15 2016, 6:00 PM

Guess I’ll have to go and debug this myself on my win VM (provided blender still runs on it). It looks like our win atomics are broken somehow (unless I am missing something else; maybe we'd need some kind of memory fence here, not sure why or where though)…

Bastien Montagne (mont29) renamed this task from Blender freezes while trying to do vertex snapping with subdivision modifier enabled. to Blender freezes in multi-threaded tasks since recent rB98123ae91680, on windows - atomic ops issue?. May 15 2016, 8:40 PM
Bastien Montagne (mont29) claimed this task.
Bastien Montagne (mont29) added a project: Platform: Windows.
Bastien Montagne (mont29) added subscribers: Vertex (VertexPainter), Adam Friesen (ace_dragon).
Bastien Montagne (mont29) added subscribers: Cédric (Clarkx), Massimiliano Puliero (mmaaxx).
Bastien Montagne (mont29) changed the task status from Unknown Status to Resolved by committing rBbb7da630bacf: Fix T48422: Revert "BLI_task: nano-optimizations to BLI_task_parallel_range…. May 15 2016, 9:15 PM
Bastien Montagne (mont29) added a commit: rBbb7da630bacf: Fix T48422: Revert "BLI_task: nano-optimizations to BLI_task_parallel_range….
Jeroen Bakker (jbakker) mentioned this in rB93240ba79888: Fix an error in new lockfree parallel_range_next_iter_get() helper.. Jun 8 2016, 9:47 PM