Comments on Computing in Psychological Research: "Compiling 64-bit R 2.10.1 with MKL in Linux" (8 comments)

GeorgeSalt (2012-07-27 18:19):
This comment has been removed by the author.

GeorgeSalt (2012-07-27 17:13):
Great post. I'll give this a try.

You state: "Recently, Enthought Inc. has also begun to provide Python binaries linked against MKL, with similarly improved performance."

I have the EPD bundle and I've experimented with its numpy linked against MKL. Surprisingly, my numpy, linked against my locally compiled and tuned ATLAS libraries, outperforms the EPD numpy by approximately 15%. By taking the time to tune ATLAS for your environment you can get very good performance, perhaps as good as or better than that provided by precompiled MKL.
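The ATLAS-vs-MKL numpy comparison above is easy to reproduce in outline. The snippet below is a sketch, not the commenter's actual benchmark: it assumes numpy is installed, the library names printed by `show()` vary by build, and the matrix size and timing method are my choices. It checks which BLAS/LAPACK a given numpy build is linked against and times the dgemm-style matrix multiply that dominates such benchmarks.

```python
# Sketch: identify the BLAS behind a numpy build and time a matrix multiply.
# Run it once under an ATLAS-linked numpy and once under an MKL-linked one
# to make the kind of comparison described in the comment above.
import time
import numpy as np

np.__config__.show()  # prints the BLAS/LAPACK libraries this build was linked against

n = 1000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

t0 = time.perf_counter()
c = a @ b             # dgemm: the operation that gains most from a tuned BLAS
elapsed = time.perf_counter() - t0
print(f"{n}x{n} matmul took {elapsed:.3f} s")
```

Because cache sizes and thread counts differ per machine, a locally tuned ATLAS beating a generic MKL binary by ~15%, as reported above, is plausible; only timing both on your own hardware settles it.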
aaron (2011-09-08 20:36):
Hey, have you done this with 2.13?
Thanks!

Vlad (2010-08-24 13:59):
Just a comment on my previous post: 'MKL_NUM_THREADS' and 'OMP_NUM_THREADS' both work. I should have checked with a small thread count, and looked at the specific tests that are supposed to gain from BLAS/LAPACK optimization. On this task the speedup is proportional to the number of threads, and the gain plateaus above 4 threads. The patched version is ~40% faster than the official release:

[rstats:R-2.10.1] grep ^Linear ~/tmp/Rbench.2.1*
/home/vmorozov/tmp/Rbench.2.10.1:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 16.6253333333333
/home/vmorozov/tmp/Rbench.2.11.1:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.845666666666664
/home/vmorozov/tmp/Rbench.2.11.1.patch:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.495666666666665
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT1:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 2.12966666666667
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT2:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.13433333333334
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT4:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.708333333333333
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT4OMP:Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 0.690000000000001

Vlad
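The thread scaling in the timings above can be sanity-checked with simple arithmetic. In the snippet below the labels are mine, the seconds are the values quoted in the comment (rounded), and "NT" denotes the thread count set via MKL_NUM_THREADS or OMP_NUM_THREADS:

```python
# Speedups relative to the reference R 2.10.1 BLAS, using the quoted timings
# for the "Linear regr. over a 3000x3000 matrix" test.
timings = {
    "reference (R 2.10.1)": 16.625,
    "MKL, 1 thread": 2.130,
    "MKL, 2 threads": 1.134,
    "MKL, 4 threads": 0.708,
    "MKL, all threads (patched)": 0.496,
}
base = timings["reference (R 2.10.1)"]
speedups = {name: base / t for name, t in timings.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")

# Going from 1 to 2 threads cuts the time ~1.9x, from 2 to 4 only ~1.6x:
# close to linear scaling at first, flattening toward the 4-thread plateau
# the comment describes.
step12 = timings["MKL, 1 thread"] / timings["MKL, 2 threads"]
step24 = timings["MKL, 2 threads"] / timings["MKL, 4 threads"]
print(f"1->2 threads: {step12:.2f}x, 2->4 threads: {step24:.2f}x")
```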
Vlad (2010-08-24 13:19):
Hi Michael,

Thanks for the great instructions! I was able to get a similar speedup (5-6 times) on SUSE 11.0 on a dual quad-core system with Intel(R) Xeon(R) E5310 CPUs @ 1.60GHz. Interestingly, the patched version, R-patched_2010-08-17, runs ~10% faster than R-2.11.1.

/home/vmorozov/tmp/Rbench.2.10.1:Total time for all 15 tests_________________________ (sec): 113.439333333333
/home/vmorozov/tmp/Rbench.2.11.1:Total time for all 15 tests_________________________ (sec): 22.697
/home/vmorozov/tmp/Rbench.2.11.1.patch:Total time for all 15 tests_________________________ (sec): 20.435
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT6:Total time for all 15 tests_________________________ (sec): 20.5283333333333
/home/vmorozov/tmp/Rbench.2.11.1.patch.NT8:Total time for all 15 tests_________________________ (sec): 21.2033333333333

"NT6" and "NT8" stand for 6 and 8 threads set via the "OMP_NUM_THREADS" variable. Apparently it has no effect, which I don't understand.

Vlad

Michael (2010-04-11 20:58):
Hi Dirk, thanks for pointing out that Debian uses a shared BLAS and that Ubuntu packages are now using MKL. I wasn't aware of that, but it's great that Intel has been flexible about distributing these libraries along with R. You're right that one can easily switch out the reference BLAS/LAPACK libraries, but only if R is compiled with BLAS as a shared library (using the --enable-BLAS-shlib configure option).
Here, I wasn't interested in that route (I'm not planning to switch away from MKL), but I can see its potential advantages.

edd (2010-04-11 12:11):
You misunderstood the BLAS interfaces. To use MKL, you do NOT have to rebuild R: you simply replace your reference BLAS, ATLAS, etc. libraries. We have been doing that transparently on Debian for over five years. Plus, these MKLs were actually included with REvolution R in Ubuntu 9.10.

Shige (2010-04-11 08:17):
How much faster is it compared to a non-MKL R?
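One concrete answer to this question appears earlier in the thread, in Vlad's 15-test benchmark totals. The snippet below (labels mine, seconds as quoted, rounded) turns them into overall speedup factors: roughly 5-6x across the whole suite, with far larger gains on BLAS-heavy tests such as the 3000x3000 regression (16.6 s vs 0.5 s, about 33x).

```python
# Overall speedup of MKL-linked R builds over the reference BLAS,
# computed from the 15-test totals quoted in Vlad's comment above.
totals = {
    "R 2.10.1, reference BLAS": 113.439,
    "R 2.11.1, MKL": 22.697,
    "R-patched 2010-08-17, MKL": 20.435,
}
base = totals["R 2.10.1, reference BLAS"]
factors = {name: base / t for name, t in totals.items()}
for name, f in factors.items():
    print(f"{name}: {f:.1f}x faster than reference")
```

The exact factor depends heavily on how much of a workload is linear algebra; purely scalar R code sees little or no benefit from swapping the BLAS.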