This MR adds most of the changes from my GPU fork to algo, in particular:
Note: This MR depends on !1152 (closed) and !1159 (merged).