<span style="font-weight: bold;">Computing in Psychological Research</span><br />Michael (http://www.blogger.com/profile/16966694492052508244)<br /><br /><span style="font-weight: bold;">Developing a user-friendly regular expression function (easyGregexpr)</span><br />Posted 2010-05-04<br /><br />In the past few months, I've developed a set of functions for automating model estimation and interpretation using <a href="http://www.statmodel.com/">Mplus</a>, an outstanding latent variable modeling program that has unparalleled flexibility for complex models (e.g., factor mixture models). I recently rolled these functions into an R package called <a href="http://cran.r-project.org/web/packages/MplusAutomation/">MplusAutomation</a>. Because the package focuses on extracting various parameters from text output files, I've learned a lot about regular expressions, particularly Perl-compatible regular expressions using <a href="http://www.pcre.org/">PCRE</a>. R provides a handful of useful regular expression routines that are Perl-compatible (when <span style="font-family:courier new;">perl=TRUE</span> is included as a parameter) and I've made frequent use of <span style="font-family:courier new;">grep</span>, <span style="font-family:courier new;">regexpr</span>, and <span style="font-family:courier new;">gregexpr</span>.<br /><br />The problem with <span style="font-family:courier new;">regexpr </span>and <span style="font-family:courier new;">gregexpr</span>, in particular, is that their output is wonky and does not lend itself to easy string manipulations. 
The rest of the post will focus on <span style="font-family:courier new;">gregexpr</span>, which is identical to <span style="font-family:courier new;">regexpr</span>, except that it returns all matches for a regular expression, whereas <span style="font-family:courier new;">regexpr </span>returns only the first. So, if you're searching for all instances of the letter "a" in the line "abcacdabb", regexpr would only match the first a, whereas gregexpr would find all three a's.<br /><br />Let's take a simple example. We want R to extract all HTML tags from a text file read into a character vector using the <span style="font-family:courier new;">scan</span> function. So that it's easy to follow, I've just defined a character vector with a simple HTML example.<br /><br /><blockquote style="font-family: courier new;">> exampleText <- c("<html>< head>", "<title>This is a title.</title>", "</head>", "<body>", "<h1>This is an example header.</h1><p>And here is some basic text.</p>", "A line without any tags.", "</body>", "</html>")<br /><br />> exampleText<br />[1] "<html>< head>" <br />[2] "<title>This is a title.</title>" <br />[3] "</head>" <br />[4] "<body>" <br />[5] "<h1>This is an example header.</h1><p>And here is some basic text.</p>"<br />[6] "A line without any tags." <br />[7] "</body>" <br />[8] "</html>"</blockquote><br />Our goal is to locate all of the opening and closing HTML tags in this file with the intention of processing them further in some way. In a real-world example, we might want to compute quantities based on certain numbers extracted from text or to replace certain strings after making some changes. 
Here is the output from <span style="font-family:courier new;">gregexpr</span> using a simple regular expression that matches all HTML tags in the source above.<br /><blockquote style="font-family: courier new;">> gregexpr("<\\s*/*\\s*\\w+\\s*>", exampleText, perl=TRUE)<br />[[1]]<br />[1] 1 7<br />attr(,"match.length")<br />[1] 6 7<br /><br />[[2]]<br />[1] 1 24<br />attr(,"match.length")<br />[1] 7 8<br /><br />[[3]]<br />[1] 1<br />attr(,"match.length")<br />[1] 7<br /><br />[[4]]<br />[1] 1<br />attr(,"match.length")<br />[1] 6<br /><br />[[5]]<br />[1] 1 31 36 67<br />attr(,"match.length")<br />[1] 4 5 3 4<br /><br />[[6]]<br />[1] -1<br />attr(,"match.length")<br />[1] -1<br /><br />[[7]]<br />[1] 1<br />attr(,"match.length")<br />[1] 7<br /><br />[[8]]<br />[1] 1<br />attr(,"match.length")<br />[1] 7</blockquote><br />As can be seen above, <span style="font-family:courier new;">gregexpr </span>returns a list where each element in the character vector is represented as a list element and the starting positions for each match on a line are returned as numeric vectors within the list elements. The length of the match (i.e., the number of characters) is stored as an attribute "match.length".<br /><br />There are a few things that irk me about the output of <span style="font-family:courier new;">gregexpr</span>. First, the matched string itself is not returned (one would need to use <span style="font-family:courier new;">substr </span>to obtain this). If nothing else, this makes it very difficult to know if you've written a regular expression correctly (i.e., debugging). Second, storing vectors within lists makes it more difficult to extract the lines and positions of matches relative to a simple data.frame. I suppose one could use some combination of <span style="font-family:courier new;">lapply </span>and <span style="font-family:courier new;">sapply </span>to extract values of interest, but it seems unintuitive to me. 
Third, if one were interested in the matched string, the <span style="font-family:courier new;">substr</span> function expects to receive a <span style="font-family:courier new;">start </span>and <span style="font-family:courier new;">stop </span>character within the string, but <span style="font-family:courier new;">gregexpr </span>returns a match length, so one must resort to expressions like<br /><br /><span style="font-family:courier new;">result <- gregexpr("<\\s*/*\\s*\\w+\\s*>", exampleText, perl=TRUE)</span><br /><br /><span style="font-family:courier new;">matchedString <- substr(exampleText[1], result[[1]][2], result[[1]][2] + attr(result[[1]], "match.length")[2] - 1)</span><br /><br />Now that is some painful code!<br /><br />My goal was to develop a simple wrapper for <span style="font-family:courier new;">gregexpr </span>that would return a <span style="font-family:courier new;">data.frame</span> with the starting and stop positions of each match, as well as the matched string itself. The wrapper is useful for parsing character vectors (although it would be easy to extend it to <span style="font-family:courier new;">lists</span> or <span style="font-family:courier new;">data.frames</span>).<br /><br />Here's the code:<br /><pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; color: #000000; background-color: #eee;font-size: 12px;border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%"><code>easyGregexpr <- function(pattern, charvector, ...) 
{<br />  #plyr supplies adply; stop early if it is not installed<br />  if (!require(plyr)) stop("easyGregexpr requires the plyr package.")<br /><br />  if (!is.character(charvector)) stop("easyGregexpr expects charvector to be a character vector.")<br /><br />  #identify all matches<br />  regexpMatches <- gregexpr(pattern, charvector, ...)<br /><br />  convertMatches <- c()<br />  #seq_along (rather than 1:length) skips the loop cleanly when charvector is empty<br />  for (i in seq_along(regexpMatches)) {<br />    thisLine <- regexpMatches[[i]]<br />    #only append if there is at least one match on this line<br />    if (thisLine[1] != -1) {<br />      convertMatches <- rbind(convertMatches, data.frame(element=i, start=thisLine, end=thisLine + attr(thisLine, "match.length") - 1))<br />    }<br />  }<br /><br />  #if no matches exist, return null (otherwise, will break adply)<br />  if (is.null(convertMatches)) return(NULL)<br /><br />  #We now have a data frame with the line, starting position, and ending position of every match<br />  #Add the matched string to the data.frame<br />  #Use adply to iterate over rows and apply substr func<br />  convertMatches <- adply(convertMatches, 1, function(row) {<br />    row$match <- substr(charvector[row$element], row$start, row$end)<br />    return(as.data.frame(row))<br />  })<br /><br />  #need to convert from factor to character because adply uses stringsAsFactors=TRUE even when overridden<br />  convertMatches$match <- as.character(convertMatches$match)<br />  return(convertMatches)<br />}<br /></code></pre><br />Any option for <span style="font-family:courier new;">gregexpr</span> (e.g., <span style="font-family:courier new;">perl=TRUE</span>) can be passed identically to <span style="font-family:courier new;">easyGregexpr</span> (because of the ... parameter). 
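<br /><br />For readers who would rather not depend on plyr, the same idea can be sketched in base R. This is only an illustration under stated assumptions, not the function from this post: the name <span style="font-family:courier new;">matchTable</span> is hypothetical, the input is assumed to be a character vector, and the shortened example text is made up for the demo. As with <span style="font-family:courier new;">easyGregexpr</span>, extra arguments such as <span style="font-family:courier new;">perl=TRUE</span> are forwarded to <span style="font-family:courier new;">gregexpr</span> through the ... parameter.<br /><br />

```r
# Hypothetical base-R variant of the wrapper idea (not the post's easyGregexpr).
matchTable <- function(pattern, charvector, ...) {
  stopifnot(is.character(charvector))

  # '...' is forwarded unchanged, so perl = TRUE, ignore.case = TRUE, etc. work
  hits <- gregexpr(pattern, charvector, ...)

  # build one small data.frame per element that has at least one match
  rows <- lapply(seq_along(hits), function(i) {
    start <- as.integer(hits[[i]])
    if (is.na(start[1]) || start[1] == -1L) return(NULL)  # NA input or no match
    end <- start + attr(hits[[i]], "match.length") - 1L
    data.frame(element = i, start = start, end = end,
               match = substring(charvector[i], start, end),
               stringsAsFactors = FALSE)
  })

  # a single rbind at the end; returns NULL when nothing matched anywhere
  do.call(rbind, rows)
}

# shortened, made-up input in the spirit of exampleText
demoText <- c("<html>< head>", "A line without any tags.", "</body>")
tags <- matchTable("<\\s*/*\\s*\\w+\\s*>", demoText, perl = TRUE)
print(tags)
```

The sketch trades the row-wise adply call for lapply plus a single rbind; the <span style="font-family:courier new;">easyGregexpr</span> function above is unchanged by it.<br /><br />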
The code relies on Hadley Wickham's useful <a href="http://cran.r-project.org/web/packages/plyr/index.html">plyr</a> package, although I wish that <span style="font-family:courier new;">ddply</span> would accept an argument of 1 for <span style="font-family:courier new;">.variables</span> to allow for processing by row (I've seen workarounds like the adply above or using transform, but these seem less intuitive). I also know that the memory management for the function isn't ideal because of the repeated <span style="font-family:courier new;">rbind </span>calls to an initially empty variable. That said, I developed a version that preallocated the <span style="font-family:courier new;">convertMatches</span><span style="font-family:courier new;"><span style="font-family:georgia;"> data.frame starting with the call <span style="font-family:courier new;">numMatches <- sum(sapply(regexpMatches, function(x) sum(x > 0)))</span></span></span> and it was no faster in my profiling runs and the code was less readable.<br /><br />When we use the <span style="font-family:courier new;">easyGregexpr</span> function to parse the <span style="font-family:courier new;">exampleText</span> above, here is the output:<br /><br /><blockquote style="font-family: courier new;"><br />> easyGregexpr("<\\s*/*\\s*\\w+\\s*>", exampleText, perl=TRUE)<br /> element start end match<br />1 1 1 6 <html><br />2 1 7 13 < head><br />3 2 1 7 <title><br />4 2 24 31 </title><br />5 3 1 7 </head><br />6 4 1 6 <body><br />7 5 1 4 <h1><br />8 5 31 35 </h1><br />9 5 36 38 <p><br />10 5 67 70 </p><br />11 7 1 7 </body><br />12 8 1 7 </html><br /></blockquote><br /><br />A friendly, simple, clean <span style="font-family: courier new;">data.frame</span> with the elements, positions, and strings matched! I hope that this function proves useful to others. Maybe in future iterations of R, the built-in functions will provide more useful output. 
For now, I find myself using this wrapper frequently for parsing text.<br /><br /><span style="font-weight: bold;">Using MKL-Linked R in Eclipse</span><br />Posted 2010-04-12<br /><br /><span style="font-weight: bold;">Setting up Eclipse to use MKL-Linked R</span><br /><br />In my previous post, I showed how to compile R 2.10.1 using Intel's Math Kernel Library for the BLAS/LAPACK interface. Even though it takes a bit of time to set up, I think the noticeably improved calculation speed justifies the effort. Although I'm happy to use R from the command line for basic stuff, I prefer to develop my code in <a href="http://www.eclipse.org/">Eclipse</a> and there is one configuration step needed for Eclipse to talk properly to MKL-linked R (assuming you followed the instructions in the last post).<br /><br />If you haven't used Eclipse before, I would strongly recommend it as an Integrated Development Environment (IDE) for R. It's free, has a large user base, and is regularly updated. The editor is very flexible and allows for multiple tabs and easy arrangement of several windows. The <a href="http://www.walware.de/goto/statet">StatET plugin</a> links R with Eclipse through the rJava package, thereby allowing the user to browse objects in the R environment using the object browser. You can also get auto-completion suggestions for functions in the current environment by pressing Ctrl+Space. See <a href="http://jeromyanglim.blogspot.com/2009/03/user-interface-for-r-statet-and-eclipse.html">Jeromy Anglim's post</a> for some useful links about StatET and Eclipse. 
Before proceeding, if you haven't done so already, install the rJava package in R using <span style="font-family:courier new;">install.packages("rJava")</span>.<br /><br />Because I compiled R as a shared library linked against Intel's MKL, Eclipse struggles to locate libraries for MKL (and the shared R library). To fix this problem, one needs to set the LD_LIBRARY_PATH environment variable to include the MKL library directory, as well as the R shared library directory. By default, Java will not look for libraries in these directories, leading to errors such as these (assuming you've enabled debugging as described on the <a href="http://www.walware.de/it/statet/installation.html">StatET</a> site):<br /><br /><span style="font-family:courier new;">java.lang.UnsatisfiedLinkError: /usr/local/lib64/R/library/rJava/jri/libjri.so: libmkl_gf_lp64.so: cannot open shared object file: No such file or directory<br /><br />java.lang.UnsatisfiedLinkError: /usr/local/lib64/R/library/rJava/jri/libjri.so: libR.so: cannot open shared object file: No such file or directory</span><br /><br />Although some documentation suggested that I set the <span style=";font-family:courier new;font-size:100%;" >java.library.path</span> variable in the VM arguments in the JRE tab of the Eclipse run configuration, this is problematic because this setting only helps to load libraries directly requested by the application, <a href="http://kalblogs.blogspot.com/2009/01/java.html">not the libraries that those libraries in turn depend on</a> (as is the case for MKL). 
Anyhow, the trick is to set the LD_LIBRARY_PATH environment variable in the Environment tab of the Eclipse run configuration, as seen here:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxrTkGI6kfT6fz4ZVrXL6snGm0bIv81lef0yMc1EIZ8cu397VPHuFC29c7nvLqtx8AyvRy6hC8MRW_oxx8kqp7NtSdo1GFv-dbcE1l7STgdmxPQH1x_9X2yBJKWCLCHYN9N6sdS8735DQ/s1600/2010-04-19-150502_1680x1050_scrot.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 225px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxrTkGI6kfT6fz4ZVrXL6snGm0bIv81lef0yMc1EIZ8cu397VPHuFC29c7nvLqtx8AyvRy6hC8MRW_oxx8kqp7NtSdo1GFv-dbcE1l7STgdmxPQH1x_9X2yBJKWCLCHYN9N6sdS8735DQ/s320/2010-04-19-150502_1680x1050_scrot.png" alt="" id="BLOGGER_PHOTO_ID_5461927430377446050" border="0" /></a><br />If you've installed things to the same place as I did, in the Environment tab, you'll want to click "New" to add a new environment variable, call it LD_LIBRARY_PATH, and give it a value of <span style="font-family:courier new;">/opt/intel/mkl/10.2.4.032/lib/em64t;/usr/local/lib64/R/lib</span>.<br /><br />The only other thing to be done is to verify that you've set up an R environment that points to the MKL-based installation. R environments are set up within the StatET preferences: Window > Preferences > StatET > Run/Debug > R Environments. In my case, I installed R to <span style="font-family:courier new;">/usr/local/lib64/R</span>, so I specified that as the target and named the R installation R 2.10.1 x64 MKL. Then, under the R Config tab in your R run configuration (the same one as above), make sure you select the environment you just created. 
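<br /><br />One quick way to confirm that the run configuration picked up the variable is to inspect it from the R console launched by that configuration. The sketch below is an assumption-laden check rather than anything provided by StatET: <span style="font-family:courier new;">checkLibPath</span> is a hypothetical helper that splits LD_LIBRARY_PATH on colons or semicolons and reports which of the listed directories actually exist.<br /><br />

```r
# Hypothetical helper: list the directories in LD_LIBRARY_PATH and flag
# any that do not exist on disk.
checkLibPath <- function(value = Sys.getenv("LD_LIBRARY_PATH")) {
  dirs <- strsplit(value, "[:;]")[[1]]  # entries may be separated by ':' or ';'
  dirs <- dirs[nzchar(dirs)]            # drop empty entries
  data.frame(dir = dirs,
             exists = file.exists(dirs),
             stringsAsFactors = FALSE)
}

# With no argument, this inspects the environment of the current R session
print(checkLibPath())
```

If the MKL or R library directory is missing from the result, or is listed with exists equal to FALSE, the Environment tab entry is the first thing to recheck.<br /><br />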
If you haven't set up StatET for Eclipse before, I would strongly encourage you to read <a href="http://www.splusbook.com/R_Eclipse_StatET.pdf">Longhow Lam's guide</a>, which provides more details (see especially the Configuring R section).<br /><br />That's basically it! The major tweak is to tell Java where to find linked libraries using the LD_LIBRARY_PATH environment variable. Happy coding!<br /><br /><span style="font-weight: bold;">Compiling 64-bit R 2.10.1 with MKL in Linux</span><br />Posted 2010-04-10<br /><br /><span style="font-weight: bold;">The rationale</span><span style="font-weight: bold;"> for compiling R using the Intel Math Kernel Library</span><br /><br />Recently, there has been a surge in the use of Intel's Math Kernel Library (MKL; <a href="http://software.intel.com/en-us/intel-mkl/">http://software.intel.com/en-us/intel-mkl/</a>) among data analysis packages. MKL is a highly optimized set of linear algebra libraries that includes full Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) implementations, as well as fast Fourier transforms and vector math. I think the folk interpretation is that Intel engineers have inside knowledge on how to exploit fully the number crunching powers of Intel CPUs, thereby allowing them to produce a remarkably fast math library. REvolution Computing has developed a version of R that is linked against MKL with <a href="http://www.revolution-computing.com/products/r-performance.php">impressive speedups</a> in many functions that rely on complex algebraic manipulation. Recently, Enthought Inc. has also begun to provide Python binaries linked against MKL, with <a href="http://www.enthought.com/epd/mkl/">similarly improved performance</a>. 
And although it's not emphasized directly, Matlab links to MKL in most <a href="http://www.mathworks.com/access/helpdesk/help/techdoc/rn/f14-998197.html">recent releases</a>.<br /><br />The good news for R users on Linux is that Intel provides a <a href="http://software.intel.com/en-us/articles/non-commercial-software-download/">free license</a> for MKL, assuming that it is used for <a href="http://software.intel.com/en-us/articles/non-commercial-software-development/">personal, non-commercial purposes</a>. I set out to compile <a href="http://cran.r-project.org/">R 2.10.1</a> from source on 64-bit Gentoo Linux, linking to the latest version of MKL (10.2.4.032). My major goal was to create a super-fast version of R to be used within the <a href="http://www.walware.de/goto/statet">StatET plugin</a> for <a href="http://eclipse.org/">Eclipse</a>, my favorite IDE for R development. Given that the process was bumpy and took the better part of an afternoon, I thought I would post my experiences in hopes that they might be useful to others. The notes below should mostly apply to 32-bit Linux OSs, but I need 64-bit R to process some rather large psychophysiology datasets, so I'll assume you're running 64-bit Linux, too.<br /><br /><span style="font-weight: bold;">Getting ready</span><br /><br />The <a href="http://cran.r-project.org/doc/manuals/R-admin.html">R Installation and Administration guide</a> is the best place to start when learning to compile R. It gives a good listing of <a href="http://cran.r-project.org/doc/manuals/R-admin.html#Essential-and-useful-other-programs-under-Unix">prerequisites for installation</a>, configuration options, suggestions for compilation, linking to BLAS and LAPACK libraries, and even good starting points for <a href="http://cran.r-project.org/doc/manuals/R-admin.html#MKL">linking to MKL</a>. I would recommend at least skimming this guide before you try to compile R. 
Pay particular attention to Appendix A, which details the programs and libraries that need to be present prior to compiling R.<br /><br />I think <a href="http://www.gentoo.org/">Gentoo</a> is an awesome Linux distribution: all packages are compiled from source and are optimized for your processor. Plus, the basic installation is fairly bare-bones and the package management system (emerge) is very smart. Because of Gentoo's preference for compiling packages from source, all of the required tools for compiling R (detailed in the R Installation guide) were already in place on my machine, including gcc (4.3.4), libiconv, and make. Thus, other than downloading MKL, I didn't have to install anything. If you don't have prerequisite packages installed on your Linux distribution, you should be able to track them down easily.<br /><br />You'll need to get a license for MKL and <a href="http://software.intel.com/en-us/articles/non-commercial-software-download/">download the latest version</a>. Extract the archive, then install it using the <span style="font-family:courier new;">install.sh</span> script provided by Intel. Read the <span style="font-family:courier new;">Install.txt</span> file for details on the MKL installation and licensing process. In the instructions below, I'll assume that MKL has successfully been installed to: <span style="font-family:courier new;">/opt/intel/mkl/10.2.4.032</span>. By default, MKL installs to the <span style="font-family:courier new;">/opt</span> directory.<br /><br /><span style="font-weight: bold;">Configuring and compiling </span><span style="font-weight: bold;">R</span><span style="font-weight: bold;"> with MKL</span><br /><br />Download the <a href="http://cran.r-project.org/src/base/R-2/R-2.10.1.tar.gz">R 2.10.1 source</a> from CRAN. 
Extract the archive to a directory of your choice using <span style="font-family:courier new;">tar xvzf R-2.10.1.tar.gz</span>.<br /><br />Before you run the configure script in the R-2.10.1 directory, you'll want to set up the environment variables to ensure that R is compiled with the best code and linking optimizations and that it is linked against MKL. I've adapted the commands below from the R Installation and Administration guide. I would suggest using a bash script to automate this (i.e., paste all of the commands together in a single .sh file to be executed using the source command), but you could also just type in the commands at a bash shell:<br /><br /><blockquote style="font-family: courier new;">export FFLAGS="-march=core2 -O3"<br />export CFLAGS="-march=core2 -O3"<br />export CXXFLAGS="-march=core2 -O3"<br />export FCFLAGS="-march=core2 -O3"</blockquote>These set the gcc compiler flags to compile for a particular architecture (here, Intel Core 2 processors) and to use the highest level of code optimization (O3, that's an "o" not a "zero"). Note that <span style="font-family:courier new;">core2 </span>is a supported option for <span style="font-family:courier new;">-march</span> as of gcc 4.3. In gcc 4.2, Core 2 processors were optimized using <span style="font-family:courier new;">-march=nocona</span>. If you're using a different processor, <a href="http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html">look here</a>, or try <span style="font-family:courier new;">-march=native</span>, which should detect your setup. Some Linux programs won't compile correctly using <span style="font-family:courier new;">-O3</span>, which nominally provides the most optimized code, but R compiled perfectly on my box -- and using O3 may lead to noticeable performance enhancements over O2. 
So, I recommend that you use it.<br /><br /><blockquote style="font-family: courier new;">MKL_LIB_PATH=/opt/intel/mkl/10.2.4.032/lib/em64t<br /><br />export LD_LIBRARY_PATH=$MKL_LIB_PATH<br /></blockquote>These lines define the location of the 64-bit MKL libraries (<span style="font-family:courier new;">MKL_LIB_PATH</span>) and tell the gcc linker where to look for the MKL libraries when compiling R (<span style="font-family:courier new;">LD_LIBRARY_PATH</span>).<br /><br /><span style="font-family:courier new;"></span><blockquote><span style="font-family:courier new;">export LDFLAGS="-L${MKL_LIB_PATH},-Bdirect,--hash-style=both,-Wl,-O1"</span></blockquote>This line instructs the linker to look in the MKL_LIB_PATH directory for relevant libraries throughout the compile process and it optimizes the way in which linked libraries are loaded, as discussed <a href="http://lwn.net/Articles/192624/">here</a>.<br /><br /><span style="font-family:courier new;"></span><blockquote><span style="font-family:courier new;">export SHLIB_LDFLAGS="-lpthread"<br />export MAIN_LDFLAGS="-lpthread"<br /></span></blockquote>These lines are only relevant if you want to compile R as a shared library. In my case, I want to use R within Eclipse, which relies on the JRI package within <a href="http://www.rforge.net/rJava/">rJava</a>. If you want to run within an embedded program, such as Eclipse, you will want to compile it as a shared library. Otherwise, it's probably better not to compile R as a shared library (see <a href="http://cran.r-project.org/doc/manuals/R-admin.html#Configuration-options">here</a> for details). The SHLIB_LDFLAGS line above requests that the shared library is linked against the pthread library, which supports multithreading (useful for speeding up R through MKL). 
If you don't have this line but use the configuration below, the compilation will break.<br /><br /><blockquote style="font-family: courier new;">MKL="-L${MKL_LIB_PATH} -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_lapack -lmkl_core -liomp5 -lpthread"<br /></blockquote>This specifies how to dynamically link MKL to R (i.e., use MKL as the BLAS for R). MKL has numerous linking options. I've adopted the recommendations provided in the R Installation and Administration guide. Intel provides a link advisor tool for MKL <a href="http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/">here</a>. Interestingly, the link advisor gives a different result than the recommendation above, but I haven't tried compiling R with a different link to MKL.<br /><br /><blockquote><span style="font-family:courier new;">./configure --enable-R-shlib --with-blas="$MKL" --with-lapack<br /><br />make<br /><br />make check<br /></span></blockquote>The configure line requests that R be compiled as a shared library (in my case, so that I can use it within Eclipse) and that it use MKL for the BLAS, as defined by the $MKL environment variable above. Note that the inclusion of <span style="font-family:courier new;">--with-lapack</span> indicates that the specified BLAS (MKL) also contains a LAPACK library.<br /><br /><span style="font-family:courier new;">make</span> compiles the source and <span style="font-family:courier new;">make check</span> runs some basic tests of the compiled program to ensure that R is functioning properly. Note that the lapack.R test from <span style="font-family:courier new;">make check</span> will differ from the expected output and may be flagged as an error. 
At least on my machine, the differences result because the MKL-linked R finds a different, <span style="font-style: italic;">but valid</span>, set of solutions to a system of equations, relative to R's internal LAPACK routines, so I'm not worried.<br /><br />If you've come this far, all that's left is to type<br /><br /><blockquote><span style="font-family:courier new;">make install</span><br /></blockquote>R will be installed in the <span style="font-family:courier new;">/usr/local</span> directory by default and the primary R library structure is located in <span style="font-family:courier new;">/usr/local/lib64/R</span>. You can now run R by typing: <span style="font-family:courier new;">/usr/local/bin/R</span>. Or, if <span style="font-family:courier new;">/usr/local</span> is in your <span style="font-family:courier new;">PATH</span>, just type <span style="font-family:courier new;">R.<span style="font-family:georgia;"> On Gentoo, you'll want to type <span style="font-family:courier new;">env-update && source /etc/profile<span style="font-family:georgia;"> for the R program to be accessible in your <span style="font-family:courier new;">PATH.</span></span></span></span><br /><br /><span style="font-weight: bold;font-family:georgia;" >How much does MKL improve R performance relative to the built-in BLAS/LAPACK?</span><br /><span style="font-weight: bold;"><br /></span><span style="font-family:georgia;">After someone asked in a comment below, I ran a few quick tests to determine how much MKL sped up my particular installation of R. To do this, I compiled a version of R 2.10.1 using the default settings (just ./configure; make). I then ran an established set of R benchmarks (also used in REvolution's calculations) from here: <a href="http://r.research.att.com/benchmarks/">http://r.research.att.com/benchmarks/</a>. The benchmark script was run in a fresh R session each time, and the benchmarks were repeated 15 times for each R distribution. 
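<br /><br />As a rough in-session illustration of how a mean and SD over repeated runs can be computed, one might time a single BLAS-heavy operation several times. This is a simplified sketch and not the procedure used for the numbers reported in this post, which came from the full benchmark script run in fresh R sessions; the helper name <span style="font-family:courier new;">timeRepeated</span>, the matrix size, and the number of repetitions are arbitrary assumptions.<br /><br />

```r
# Hypothetical helper: run a function 'reps' times and summarize elapsed seconds.
timeRepeated <- function(f, reps = 15) {
  elapsed <- vapply(seq_len(reps),
                    function(i) system.time(f())[["elapsed"]],
                    numeric(1))
  c(mean = mean(elapsed), sd = sd(elapsed))
}

set.seed(1)
m <- matrix(rnorm(300 * 300), nrow = 300)

# crossprod(m) computes t(m) %*% m, which is handed to the BLAS
timing <- timeRepeated(function() crossprod(m), reps = 5)
print(timing)
```

Running the same call under two R builds gives a quick, informal comparison, although a single operation says far less than the full benchmark suite.<br /><br />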
My computer is an Intel Core 2 Quad 9550 (2.83GHz) with 7GB RAM. The results are impressive and very similar to those reported by REvolution. (The means below represent the number of seconds required to run the full </span></span><a href="http://r.research.att.com/benchmarks/R-benchmark-25.R">R-benchmark-25.R</a> script.)<br /><span style="font-family:courier new;"><span style="font-family:georgia;"><br />Default R:<br /> - Mean=64.95s; SD=11.83s<br /><br />R with MKL:<br /> - Mean=11.84s; SD=0.13s<br /><br />In other words, the MKL version was around 5.5 times faster than R using the built-in BLAS/LAPACK. Caveat: The speedups may have been due, in part, to the use of -O3 and -march flags, as well as the linker optimizations, but I bet that the vast majority is due to MKL.<br /><br /></span>----<br /><span style="font-family:georgia;">Next up, I'll write a quick post on how to use your MKL-supercharged R installation within Eclipse. I hope that this guide proves useful and stimulates more people to try out <span style="font-family:georgia;">MKL</span> for R.</span><br /></span><br /><span style="font-weight: bold;">What this blog is about</span><br />Posted 2010-04-07<br /><br />I recently decided to start this blog to discuss my experiences as a former computer programmer who is pursuing a career in psychological research. I completed my Ph.D. in clinical psychology in 2009 and am currently a postdoc in the Department of Psychiatry at the University of Pittsburgh. My research focuses on improving the classification of personality disorders using advanced statistical methods, particularly latent variable modeling. 
I am also interested in the development of personality problems during childhood and adolescence and am starting to pursue developmental neuroimaging training to understand the role of cognitive control in personality dysfunction.<br /><br />This blog is primarily intended to discuss my experiences programming in R (<a href="http://www.r-project.org">http://www.r-project.org</a>), which is the main statistical package I use for data analysis. I am passionate about data visualization, so you're likely to see some of my explorations using ggplot2, an awesome R plotting package. I also use SAS, Mplus, and Matlab for some projects. And I use AFNI and FSL to analyze neuroimaging data, so those may float into the mix at times. My work computer dual-boots Windows 7 and Gentoo Linux, depending on the task.<br /><br />I am especially interested in discussing best coding practices in R and am always eager to learn better methods in data analysis. I look forward to your feedback!<br /><br />Cheers!<br />Michael