Authors
Maier O, Baete SH, Fyrdahl A, Hammernik K, Harrevelt S, Kasper L, Karakuzu A, Loecher M, Patzig F, Tian Y, Wang K, Gallichan D, Uecker M, Knoll F
Journal
Magnetic Resonance in Medicine
Citation
Magn Reson Med. 2020 Nov 12.
Abstract
Purpose: The aim of this work is to shed light on the issue of reproducibility in MR image reconstruction in the context of a challenge. Participants had to recreate the results of “Advances in sensitivity encoding with arbitrary k-space trajectories” by Pruessmann et al.
Methods: The task of the challenge was to reconstruct radially acquired multicoil k-space data (brain/heart) following the method in the original paper, reproducing its key figures. Results were compared to consolidated reference implementations created after the challenge, accounting for the two most common programming languages used in the submissions (Matlab/Python).
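For orientation, the core of CG-SENSE is a conjugate-gradient iteration on the normal equations E^H E x = E^H y, where E is the coil-sensitivity-weighted encoding operator and y the measured multicoil k-space data. The sketch below is illustrative only and is not one of the challenge submissions or reference implementations: it substitutes a Cartesian FFT for the gridding/NUFFT step that radial trajectories require, assumes coil sensitivity maps smaps are given, and omits density compensation; all function names are hypothetical.

import numpy as np

def sense_forward(x, smaps):
    # E: image -> multicoil k-space (Cartesian FFT standing in for the NUFFT)
    return np.fft.fft2(smaps * x[None, ...], norm="ortho")

def sense_adjoint(y, smaps):
    # E^H: multicoil k-space -> coil-combined image
    return np.sum(np.conj(smaps) * np.fft.ifft2(y, norm="ortho"), axis=0)

def cg_sense(y, smaps, n_iter=20, lam=0.0):
    # Conjugate gradients on (E^H E + lam * I) x = E^H y;
    # lam > 0 adds the optional Tikhonov regularization discussed below.
    b = sense_adjoint(y, smaps)
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = np.vdot(r, r).real
    for _ in range(n_iter):
        Ap = sense_adjoint(sense_forward(p, smaps), smaps) + lam * p
        alpha = rr / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rr_new = np.vdot(r, r).real
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x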
Results: Visually, differences between submissions were small. Pixel-wise differences originated from differences in image orientation, assumed field of view, or resolution. The reference implementations were in good agreement, both visually and in terms of image similarity metrics.
Discussion and conclusion: While the description level of the published algorithm enabled participants to reproduce CG-SENSE in general, details of the implementation varied, for example, density compensation or Tikhonov regularization. Implicit assumptions about the data led to further differences, emphasizing the importance of sufficient metadata accompanying open datasets. Defining reproducibility quantitatively turned out to be nontrivial for this image reconstruction challenge, in the absence of ground-truth results. Typical similarity measures such as NMSE or SSIM were misled by image intensity scaling and outlier pixels. Thus, to facilitate reproducibility, researchers are encouraged to publish code and data alongside the original paper. Future methodological papers on MR image reconstruction might benefit from the consolidated reference implementations of CG-SENSE presented here as a benchmark for methods comparison.
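To illustrate the scaling sensitivity noted above, a minimal sketch with hypothetical data (NMSE taken as ||x - ref||^2 / ||ref||^2, SSIM from scikit-image): a global intensity rescaling, as can arise from differing FFT normalization conventions between implementations, inflates NMSE for an otherwise near-perfect reconstruction, which is why images must be intensity-normalized before such metrics are meaningful.

import numpy as np
from skimage.metrics import structural_similarity

def nmse(x, ref):
    # Normalized mean squared error relative to the reference image
    return np.linalg.norm(x - ref) ** 2 / np.linalg.norm(ref) ** 2

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
recon = ref + 0.01 * rng.standard_normal((64, 64))  # near-perfect reconstruction

for scale in (1.0, 1.5):  # 1.5 mimics a global intensity scaling mismatch
    scaled = scale * recon
    print(scale,
          nmse(scaled, ref),
          structural_similarity(scaled, ref, data_range=ref.max() - ref.min()))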
DOI