Setting up Eclipse to use MKL-Linked R
In my previous post, I showed how to compile R 2.10.1 using Intel's Math Kernel Library for the BLAS/LAPACK interface. Even though it takes a bit of time to setup, I think the noticeably improved calculation speed justifies the effort. Although I'm happy to use R from the command line for basic stuff, I prefer to develop my code in Eclipse and there is one configuration step needed for Eclipse to talk properly to MKL-linked R (assuming you followed the instructions in the last post).
If you haven't used Eclipse before, I would strongly recommend it as an Integrated Development Environment (IDE) for R. It's free, has a large user base, and is regularly updated. The editor is very flexible and allows for multiple tabs and easy arrangement of several windows. The StatET plugin links R with Eclipse through the rJava package, thereby allowing the user to browse objects in the R environment using the object browser. You can also get auto-completion suggestions for functions in the current environment by pressing Ctrl+Space. See Jeremy Anglim's post for some useful links about StatET and Eclipse. Before proceeding, if you haven't done so already, install the rJava package in R using install.packages("rJava").
Because I compiled R as a shared library linked against Intel's MKL, Eclipse struggles to locate libraries for MKL (and the shared R library). To fix this problem, one needs to set the LD_LIBRARY_PATH environment variable to include the MKL library directory, as well as the R shared library directory. By default, Java will not look for libraries in these directories, leading to errors such as this one (assuming you've enabled debugging as described on the StatET site:
java.lang.UnsatisfiedLinkError: /usr/local/lib64/R/library/rJava/jri/libjri.so: libmkl_gf_lp64.so: cannot open shared object file: No such file or directory
java.lang.UnsatisfiedLinkError: /usr/local/lib64/R/library/rJava/jri/libjri.so: libR.so: cannot open shared object file: No such file or directory
Although some documentation suggested that I set the java.library.path variable in the VM arguments in the JRE tab of the Eclipse run configuration, this is problematic because this setting only helps to load libraries directly requested by the application, not those libraries that reference other libraries (as is the case for MKL). Anyhow, the trick is to set the LD_LIBRARY_PATH environment variable in the Environment tab of the Eclipse run configuration, as seen here:
If you've installed things to the same place as me, in the Environment tab, you'll want to click "New" to add a new environment variable, call it LD_LIBRARY_PATH, and give it a value of /opt/intel/mkl/10.2.4.032/lib/em64t;/usr/local/lib64/R/lib.
The only other thing to be done is to verify that you've setup an R environment that points to the MKL-based installation. R environments are setup within the StatET preferences: Windows > Preferences > StatET > Run/Debug > R Environments. In my case, I installed R to /usr/local/lib64/R, so I specified that as the target and named the R installation R 2.10.1 x64 MKL. Then, under the R Config tab in your R run configuration (the same one as above), make sure you select the environment you just created. If you haven't setup StatET for Eclipse before, I would strongly encourage you to read Longhow Lam's guide, which provides more details (see especially the Configuring R section).
That's basically it! The major tweak is to tell Java where to find linked libraries using the LD_LIBRARY_PATH environment variable. Happy coding!
Monday, April 12, 2010
Saturday, April 10, 2010
Compiling 64-bit R 2.10.1 with MKL in Linux
The rationale for compiling R using the Intel Math Kernel Library
Recently, there has been a surge in the use of Intel's Math Kernel Library (MKL; http://software.intel.com/en-us/intel-mkl/) among data analysis packages. MKL is a highly optimized set of linear algebra libraries that includes full Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) implementations, as well as fast Fourier transforms and vector math. I think the folk interpretation is that Intel engineers have inside knowledge on how to exploit fully the number crunching powers of Intel CPUs, thereby allowing them to produce a remarkably fast math library. REvolution Computing has developed a version of R that is linked against MKL with impressive speedups in many functions that rely on complex algebraic manipulation. Recently, Enthought Inc. has also begun to provide Python binaries linked against MKL, with similarly improved performance. And although it's not emphasized directly, Matlab links to MKL in most recent releases.
The good news for R users on Linux is that Intel provides a free license for MKL, assuming that it is used for personal, non-commercial purposes. I set out to compile R 2.10.1 from source on 64-bit Gentoo Linux, linking to the latest version of MKL (10.2.4.032). My major goal was to create a super-fast version of R to be used within the StatET plugin for Eclipse, my favorite IDE for R development. Given that the process was bumpy and took the better part of an afternoon, I thought I would post my experiences in hopes that they might be useful to others. The notes below should mostly apply to 32-bit Linux OSs, but I need 64-bit R to process some rather large psychophysiology datasets, so I'll assume you're running 64-bit Linux, too.
Getting ready
The R Installation and Administration guide is the best place to start when learning to compile R. It gives a good listing of prerequisites for installation, configuration options, suggestions for compilation, linking to BLAS and LAPACK libraries, and even good starting points for linking to MKL. I would recommend at least skimming this guide before you try to compile R. Pay particular attention to Appendix A, which details the programs and libraries that need to be present prior to compiling R.
I think Gentoo is an awesome Linux distribution: all packages are compiled from source and are optimized for your processor. Plus, the basic installation is fairly bare-bones and the package management system (emerge) is very smart. Because of Gentoo's preference for compiling packages from source, all of the required tools for compiling R (detailed in the R Installation guide) were already in place on my machine, including gcc (4.3.4), libiconv, and make. Thus, other than downloading MKL, I didn't have to install anything. If you don't have prerequisite packages installed on your Linux distribution, you should be able to track them down easily.
You'll need to get a license for MKL and download the latest version. Extract the archive, then install it using the install.sh script provided by Intel. Read the Install.txt file for details on the MKL installation and licensing process. In the instructions below, I'll assume that MKL has successfully been installed to: /opt/intel/mkl/10.2.4.032. By default, MKL installs to the /opt directory.
Configuring and compiling R with MKL
Download the R 2.10.1 source from CRAN. Extract the archive to a directory of your choice using tar xvzf R-2.10.1.tar.gz.
Before you run the configure script in the R-2.10.1 directory, you'll want to setup the environment variables to ensure that R is compiled with the best code and linking optimizations and that it is linked against MKL. I've adapted the commands below from the R Installation and Administration guide. I would suggest using a bash script to automate this (i.e., paste all of the commands together in a single .sh file to be executed using the source command), but you could also just type in the commands at a bash shell:
make compiles the source and make check runs some basic tests of the compiled program to ensure that R is functioning properly. Note that the lapack.R test from make check will differ from the expected output and may be flagged as an error. At least on my machine, the differences result because the MKL-linked R finds a different, but valid, set of solutions to a system of equations, relative to R's internal LAPACK routines, so I'm not worried.
If you've come this far, all that's left is to type
How much does MKL improve R performance relative to the built-in BLAS/LAPACK?
After someone asked in a comment below, I ran a few quick tests to determine how much MKL sped up my particular installation of R. To do this, I compiled a version of R 2.10.1 using the default settings (just ./configure; make). I then ran an established set of R benchmarks (also used in REvolution's calculations) from here: http://r.research.att.com/benchmarks/. The benchmark script was run in a fresh R session each time, and the benchmarks were repeated 15 times for each R distribution. My computer is a Intel Core 2 Quad 9550 (2.83GHz) with 7GB RAM. The results are impressive and very similar to those reported by REvolution. (The means below represent the number of seconds required to run the full R-benchmark-25.R script.)
Default R:
- Mean=64.95s; SD=11.83s
R with MKL:
- Mean: 11.84; SD=0.13
In other words, the MKL version was around 5.5 times faster than R using the built-in BLAS/LAPACK. Caveat: The speedups may have been due, in part, to the use of -O3 and -march flags, as well as the linker optimizations, but I bet that the vast majority is due to MKL.
----
Next up, I'll write a quick post on how to use your MKL-supercharged R installation within Eclipse. I hope that this guide proves useful and stimulates more people to try out MKL for R.
Recently, there has been a surge in the use of Intel's Math Kernel Library (MKL; http://software.intel.com/en-us/intel-mkl/) among data analysis packages. MKL is a highly optimized set of linear algebra libraries that includes full Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) implementations, as well as fast Fourier transforms and vector math. I think the folk interpretation is that Intel engineers have inside knowledge on how to exploit fully the number crunching powers of Intel CPUs, thereby allowing them to produce a remarkably fast math library. REvolution Computing has developed a version of R that is linked against MKL with impressive speedups in many functions that rely on complex algebraic manipulation. Recently, Enthought Inc. has also begun to provide Python binaries linked against MKL, with similarly improved performance. And although it's not emphasized directly, Matlab links to MKL in most recent releases.
The good news for R users on Linux is that Intel provides a free license for MKL, assuming that it is used for personal, non-commercial purposes. I set out to compile R 2.10.1 from source on 64-bit Gentoo Linux, linking to the latest version of MKL (10.2.4.032). My major goal was to create a super-fast version of R to be used within the StatET plugin for Eclipse, my favorite IDE for R development. Given that the process was bumpy and took the better part of an afternoon, I thought I would post my experiences in hopes that they might be useful to others. The notes below should mostly apply to 32-bit Linux OSs, but I need 64-bit R to process some rather large psychophysiology datasets, so I'll assume you're running 64-bit Linux, too.
Getting ready
The R Installation and Administration guide is the best place to start when learning to compile R. It gives a good listing of prerequisites for installation, configuration options, suggestions for compilation, linking to BLAS and LAPACK libraries, and even good starting points for linking to MKL. I would recommend at least skimming this guide before you try to compile R. Pay particular attention to Appendix A, which details the programs and libraries that need to be present prior to compiling R.
I think Gentoo is an awesome Linux distribution: all packages are compiled from source and are optimized for your processor. Plus, the basic installation is fairly bare-bones and the package management system (emerge) is very smart. Because of Gentoo's preference for compiling packages from source, all of the required tools for compiling R (detailed in the R Installation guide) were already in place on my machine, including gcc (4.3.4), libiconv, and make. Thus, other than downloading MKL, I didn't have to install anything. If you don't have prerequisite packages installed on your Linux distribution, you should be able to track them down easily.
You'll need to get a license for MKL and download the latest version. Extract the archive, then install it using the install.sh script provided by Intel. Read the Install.txt file for details on the MKL installation and licensing process. In the instructions below, I'll assume that MKL has successfully been installed to: /opt/intel/mkl/10.2.4.032. By default, MKL installs to the /opt directory.
Configuring and compiling R with MKL
Download the R 2.10.1 source from CRAN. Extract the archive to a directory of your choice using tar xvzf R-2.10.1.tar.gz.
Before you run the configure script in the R-2.10.1 directory, you'll want to setup the environment variables to ensure that R is compiled with the best code and linking optimizations and that it is linked against MKL. I've adapted the commands below from the R Installation and Administration guide. I would suggest using a bash script to automate this (i.e., paste all of the commands together in a single .sh file to be executed using the source command), but you could also just type in the commands at a bash shell:
export FFLAGS="-march=core2 -O3"These set the gcc compiler flags to compile for a particular architecture (here, Intel Core 2 processors) and to use the highest level of code optimization (O3, that's an "o" not a "zero"). Note that core2 is a supported option for -march as of gcc 4.3. In gcc 4.2, Core 2 processors were optimized using -march=nocona. If you're using a different processor, look here, or try -march=native, which should detect your setup. Some Linux programs won't compile correctly using -O3, which nominally provides the most optimized code, but R compiled perfectly on my box -- and using O3 may lead to noticeable performance enhancements over O2. So, I recommend that you use it.
export CFLAGS="-march=core2 -O3"
export CXXFLAGS="-march=core2 -O3"
export FCFLAGS="-march=core2 -O3"
MKL_LIB_PATH=/opt/intel/mkl/10.2.4.032/lib/em64tThese lines define the location of the 64-bit MKL libraries (MKL_LIB_PATH) and tell the gcc linker where to look for the MKL libraries when compiling R (LD_LIBRARY_PATH).
export LD_LIBRARY_PATH=$MKL_LIB_PATH
export LDFLAGS="-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1"This line instructs the linker to look in the MKL_LIB_PATH directory for relevant libraries throughout the compile process and it optimizes the way in which linked libraries are loaded, as discussed here.
export SHLIB_LDFLAGS="-lpthread"These lines are only relevant if you want to compile R as a shared library. In my case, I want to use R within Eclipse, which relies on the JRI package within rJava. If you want to run within an embedded program, such as Eclipse, you will want to compile it as a shared library. Otherwise, it's probably better not to compile R as a shared library (see here for details). The SHLIB_LDFLAGS line above requests that the shared library is linked against the pthread library, which supports multithreading (useful for speeding up R through MKL). If you don't have this line but use the configuration below, the compilation will break.
export MAIN_LDFLAGS="-lpthread"
MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread"This specifies how to dynamically link MKL to R (i.e., use MKL as the BLAS for R). MKL has numerous linking options. I've adopted the recommendations provided in the R Installation and Administration guide. Intel provides a link advisor tool for MKL here. Interestingly, the link advisor gives a different result than the recommendation above, but I haven't tried compiling R with a different link to MKL.
./configure --enable-R-shlib --with-blas="$MKL" --with-lapackThe configure line requests that R be compiled as a shared library (in my case, so that I can use it within Eclipse) and that it use MKL for the BLAS, as defined by the $MKL environment variable above. Note that the inclusion of --with-lapack indicates that the specified BLAS (MKL) also contains a LAPACK library.
make
make check
make compiles the source and make check runs some basic tests of the compiled program to ensure that R is functioning properly. Note that the lapack.R test from make check will differ from the expected output and may be flagged as an error. At least on my machine, the differences result because the MKL-linked R finds a different, but valid, set of solutions to a system of equations, relative to R's internal LAPACK routines, so I'm not worried.
If you've come this far, all that's left is to type
make installR will be installed in the /usr/local directory by default and the primary R library structure is located in /usr/local/lib64/R. You can now run R by typing: /usr/local/bin/R. Or, if /usr/local is in your PATH, just type R. On Gentoo, you'll want to type env-update && source /etc/profile for the R program to be accessible in your PATH.
How much does MKL improve R performance relative to the built-in BLAS/LAPACK?
After someone asked in a comment below, I ran a few quick tests to determine how much MKL sped up my particular installation of R. To do this, I compiled a version of R 2.10.1 using the default settings (just ./configure; make). I then ran an established set of R benchmarks (also used in REvolution's calculations) from here: http://r.research.att.com/benchmarks/. The benchmark script was run in a fresh R session each time, and the benchmarks were repeated 15 times for each R distribution. My computer is a Intel Core 2 Quad 9550 (2.83GHz) with 7GB RAM. The results are impressive and very similar to those reported by REvolution. (The means below represent the number of seconds required to run the full R-benchmark-25.R script.)
Default R:
- Mean=64.95s; SD=11.83s
R with MKL:
- Mean: 11.84; SD=0.13
In other words, the MKL version was around 5.5 times faster than R using the built-in BLAS/LAPACK. Caveat: The speedups may have been due, in part, to the use of -O3 and -march flags, as well as the linker optimizations, but I bet that the vast majority is due to MKL.
----
Next up, I'll write a quick post on how to use your MKL-supercharged R installation within Eclipse. I hope that this guide proves useful and stimulates more people to try out MKL for R.
Wednesday, April 7, 2010
What this blog is about
I recently decided to start this blog to discuss my experiences as a former computer programmer who is pursuing a career in psychological research. I completed my Ph.D. in clinical psychology in 2009 and am currently a postdoc in the Department of Psychiatry at the University of Pittsburgh. My research focuses on improving the classification of personality disorders using advanced statistical methods, particularly latent variable modeling. I am also interested in the development of personality problems during childhood and adolescence and am starting to pursue developmental neuroimaging training to understand the role of cognitive control in personality dysfunction.
This blog is primarily intended to discuss my experiences programming in R (http://www.r-project.org), which is the main statistical package I use for data analysis. I am passionate about data visualization, so you're likely to see some of my explorations using ggplot2, an awesome R plotting package. I also use SAS, Mplus, and Matlab for some projects. And I use AFNI and FSL to analyze neuroimaging data, so those may float into the mix at times. My work computer dual boots Windows 7 or Gentoo Linux, depending on the task.
I am especially interested to discuss best coding practices in R and am always eager to learn better methods in data analysis. I look forward to your feedback!
Cheers!
Michael
This blog is primarily intended to discuss my experiences programming in R (http://www.r-project.org), which is the main statistical package I use for data analysis. I am passionate about data visualization, so you're likely to see some of my explorations using ggplot2, an awesome R plotting package. I also use SAS, Mplus, and Matlab for some projects. And I use AFNI and FSL to analyze neuroimaging data, so those may float into the mix at times. My work computer dual boots Windows 7 or Gentoo Linux, depending on the task.
I am especially interested to discuss best coding practices in R and am always eager to learn better methods in data analysis. I look forward to your feedback!
Cheers!
Michael
Subscribe to:
Posts (Atom)