Monday, August 3, 2015

R? python? MATLAB?

I've been asked a few times why I use R for MVPA, and what I think people just getting into MVPA should use. I don't think that there is a universally "best" package for MVPA (or neuroimaging, or statistics), but here are some musings.

The question as to why I personally started using R for MVPA is easy: I started before MVPA packages were available, so I had to write my own scripts, and I prefer scripting in R (then and now). Whether to keep using my own scripts or switch to pyMVPA (or some other package) is something I reconsider occasionally.

A very big reason to an established package is that it's a known quantity: coding bugs have hopefully been caught, and analyses can be reproduced. Some packages are more open (and have more stringent tests) than others, but in general, the more eyes that have studied the code and tried the routines, the better. This need for openness was one of my motivations for starting this blog: to post bits of code and detailed methods descriptions. I think the more code and details we share (blog, OSF, github, whatever), the better, regardless of what software we use (and I wish code was hosted by journals, but that's another issue).

I'm a very, very big fan of using R for statistical analyses, and of knitr (sweave and RMarkdown are also viable options in R) for documenting the various analyses, results, impressions, and decisions as the research progresses (see my demo here), regardless of the program that generated the raw data. My usual workflow is to switch to knitr once an analysis reaches the "what happened?" stage, regardless of the program that generated the data being analyzed (e.g., I have knitr files summarizing the motivation, procedure, and calculating results from cvMANOVA analyses run in MATLAB). Python has the iPython Notebook, which is sort of similar to knitr (I don't think as aesthetically pleasing, but that's a matter of taste); I don't think MATLAB has anything equivalent. Update 12 August 2015 (Thanks, Dov!): MATLAB comes with Publishing Markup, which (at a quick glance) looks similar to RMarkdown.

All neuroimaging (and psychology, neuroscience, ...) graduate students should expect to learn a proper statistical analysis language, by which I mostly mean R, with MATLAB and python coming in as secondary options. In practice, if you have proficiency in one of these programs you can use the others as needed (the syntax isn't that different), or have them work together (e.g., calling MATLAB routines from R; calling R functions from python). The exact same MVPA can be scripted in all three languages (e.g., read in NIfTI images, fit a linear SVM, write the results into a file), and I don't see that any one of the three languages is clearly best or worst. MATLAB has serious licensing issues (and expense); python dependencies can be a major headache, but which program is favored seems to go more with field (engineers for MATLAB, statisticians for R) and personal preference than intrinsic qualities.

So, what should a person getting started with MVPA use? I'd say an R, python, or MATLAB-based package/set of scripts, with which exact one depending on (probably most important!) what your colleagues are using, personal preference and experience (e.g, if you know python in and out, try pyMVPA), and what software you're using for image preprocessing (e.g., if SPM, try PRoNTO). Post-MVPA (and non-MVPA) investigations will likely involve R at some point (e.g., for fitting mixed models or making publication-quality graphs), since it has the most comprehensive set of functions (statisticians favor R), but that doesn't mean everything needs to be run in R.

But don't start from scratch; use existing scripts/programs/functions as much as possible. You should mostly be writing code for analysis-specific things (e.g., the cross-validation scheme, which subjects are patients, which ROIs to include), not general things (like reading NIfTI images, training a SVM, fitting a linear model). Well-validated functions exist for those more general things (e.g., oro.nifti, libsvm); use them.

5 comments:

  1. Hi Jo, nice topic as always, allow me to add how I do something similar, but via Matlab.

    Well, I think Matlab has something similar to Kniter, you could write a program to read your results and do whatever (statistical) analysis you like and graph them. However, the reason I stopped doing this is that one cannot change the graphs once they are embedded inside word, and you'll have to change the code if you want to change something. Currently, I prefer to import all the results into an excel file, then do most of the statistical analysis, e.g. correlations, t-tests, as well as the graphs. I like using MS excel since the graphs can be adjusted easily inside MS word, e.g. changing the graph style, colors, markers, font relative to/and-or size, etc. And one can at any time do further analysis if needed, by adding more results, clinical or psychological tests.

    Cheers,
    -Rawi

    ReplyDelete
    Replies
    1. I just updated the post to mention MATLAB's "Publishing Markup", which indeed looks somewhat similar to RMarkdown and knitr. I've never personally been a fan of microsoft's graphics; I find it much more convenient to create graphs in R, then paste them into Word if necessary. Perhaps you'd like R even more than Excel/Word for your post-MATLAB analyses. ;)

      Delete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete