Blender can now render using both the CPU and the GPU in a single pass (https://developer.blender.org/D2873). However, CPUs prefer smaller tile sizes (32x32), while GPUs prefer larger ones (256x256). On several machines I've tested, mixed-mode rendering often yields no benefit over pure CPU or pure GPU rendering on the same machine. I'd like to propose a major optimization in which the tile size within a single render pass differs between the CPU and the GPU.
One method could be to first divide the render area into larger tiles (e.g. 256x256), a size that is optimal for the GPU to render whole. A tile can instead become owned by the CPU renderer, which would subdivide that large block into smaller tiles (32x32 or smaller) and process those.
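
To make the idea concrete, here is a rough sketch of the two-level tiling in Python. It is only an illustration of the proposal, not how Cycles schedules tiles: the names `Tile`, `split_region` and `work_units` are made up for this example, and the 256 and 32 defaults are just the sizes mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tile:
    """A rectangular region of the image, in pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def split_region(x: int, y: int, w: int, h: int, size: int) -> List[Tile]:
    """Cut a region into size-by-size tiles, clipping tiles at the right and bottom edges."""
    tiles = []
    for ty in range(y, y + h, size):
        for tx in range(x, x + w, size):
            tiles.append(Tile(tx, ty, min(size, x + w - tx), min(size, y + h - ty)))
    return tiles


def work_units(tile: Tile, device: str, cpu_size: int = 32) -> List[Tile]:
    """Return what a device would actually render once it owns a large tile.

    Assumption for this sketch: a GPU renders the large tile as one unit,
    while a CPU subdivides it into cpu_size tiles first.
    """
    if device == "GPU":
        return [tile]
    return split_region(tile.x, tile.y, tile.w, tile.h, cpu_size)


if __name__ == "__main__":
    # First level: GPU-sized tiles over a 1920x1080 frame.
    big_tiles = split_region(0, 0, 1920, 1080, 256)
    print(len(big_tiles), "large tiles")

    # Second level: the same large tile seen by each device type.
    print(len(work_units(big_tiles[0], "GPU")), "unit(s) for a GPU")
    print(len(work_units(big_tiles[0], "CPU")), "unit(s) for a CPU")
```

With this layout the scheduler only ever hands out large tiles, so the GPU path is unchanged, and the subdivision happens only after a CPU has claimed a tile. Edge tiles that are smaller than 256x256 are subdivided the same way, just into fewer pieces.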