Probably the most directly relevant recent paper for censoring task fMRI datasets, and the one whose recommendations we're broadly following, is Siegel, et. al (2014). They explored the effects of censoring on three datasets at various FD thresholds. As is reasonable, given the great variations in experiments, they refrain from making "universal recommendations", but do provide useful summaries and guidelines.
As in most things, there's no free lunch with censoring: increasing the amount of censoring reduces the number of trials available for response estimation, but hopefully lets those estimates be more reliable. Siegel, et. al (2014) found that a threshold of FD > 0.9 did well in many cases, and generally suggest fairly light censoring - removing the highest-motion frames, not every frame with any movement (see the Discussion section, page 1994). Further, they suggest removing only the above-threshold frames, not adjacent frames (page 1992):
"In a one-factor ANOVA, the FD > 0.9, (f0, b0) mask produced significantly higher zscores than all of the other masks except FD > 1.1 mm (f0,b1) which was not significantly different. On the basis of these results, we do not recommend removing volumes proceeding or following high-motion volumes ..."Siegel, et. al (2014) didn't attempt to interpolate censored frames, citing the difficulty in accurately interpolating gaps of more than one TR. This strikes me as reasonable, particularly in task designs, where, depending on the analysis, it may be best to simply omit trials with above-threshold movement.
Setting the censoring threshold for any particular study is at least partially subjective, which is unfortunate, given the already-too-many experimenter degrees of freedom. We decided to see if the FD > 0.9 threshold suggested by Siegel, et. al (2014) seemed reasonable: did it capture rare spiky motion, but not oscillations? What percentage of frames were censored? This effort is what let to the images in the previous post: I marked the censored frames on plots of each run's motion, and we judged whether the marked frames seemed reasonable. In our case, no run had more than 12% of the frames censored, and most had less than 2%, so we decided to proceed with the FD > 0.9 threshold.
Looking at papers citing Siegel, et. al (2014), I found one using FD > 0.9 for censoring (Davis, Goldwater, & Giron, 2017), one with FD > 0.8 (O'Hearn, et. al, 2016), and one with FD > 0.5 (Bakkour et. al, 2017). Others mention censoring for motion, but without giving details, and I've heard people mention censoring based on standard deviations of the estimates within the particular dataset. Censoring based on enorm values is pretty similar to the FD used by Siegel, though afni tends to recommend a smaller threshold, such as 0.3, for adult task fMRI. I don't have time to compile a summary of common enorm-based thresholds, but would be interested if someone else finds or creates one!
A final consideration is whether to use only censoring, or censoring plus having the motion estimates as nuisance regressors in the GLM. As summarized in Siegel, et. al (2014), page 1992:
"Motion censoring generally outperformed motion regressions. Censoring at FD > 0.9 mm performed significantly better than the best regression .... To see whether a combination of censoring and regression might most benefit the data, a GLM was created using the default censoring settings and regressions of the derivatives of realignment estimates. The changes in z-score produced by this GLM were not significantly different from censoring alone ..."We are currently including 6 motion regressors in the (3dREMLfit) GLMs, plus omitting the censored frames. We might need to reevaluate that choice at some point; we did a bit of looking at including more regressors, but haven't previously considered using only censoring.