enabling data science: February 2018

As an enhancement to machine learning servers built on AWS or Azure, it is often necessary to set up an R development environment to meet the needs of the data science community.

Adapt these steps to your specific environment. Here we assume the AWS Deep Learning conda image (Ubuntu). Specifically, we use the "python3" virtual environment (source activate python3). One reason to use this environment is that it is already set up to run Jupyter Notebook (see auto start jupyter), so we can add an R kernel to it. The result is a consolidated image that can be offered to both Python and R users.

The easiest way to install R is with conda, run inside the activated "python3" environment:
conda install r r-essentials
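
To confirm that Jupyter actually picked up an R kernel after the install, we can inspect the output of "jupyter kernelspec list". The helper below is a minimal sketch; the function name has_ir_kernel is our own, and it assumes the R kernel is registered under the usual IRkernel name "ir", which may differ on your image.

```shell
# has_ir_kernel: read `jupyter kernelspec list` output on stdin and succeed
# only if a kernel named "ir" is listed.
# Assumption: the R kernel uses the default IRkernel name "ir".
has_ir_kernel() {
    awk '$1 == "ir" { found = 1 } END { exit !found }'
}
```

With the environment active, this could be used as: jupyter kernelspec list | has_ir_kernel && echo "R kernel is registered"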

RStudio is a popular development environment for R. Follow the RStudio Server installation instructions, for example:
sudo apt-get install gdebi-core
wget https://download2.rstudio.org/rstudio-server-1.1.419-i386.deb
sudo gdebi rstudio-server-1.1.419-i386.deb

The above procedure also sets RStudio Server to start automatically by adding /etc/systemd/system/rstudio-server.service. However, because the only available procedure installs RStudio with "sudo" into the default system environment, RStudio cannot find R, which was installed into a separate conda environment. As a result, RStudio Server fails to start, with an error indicating that it cannot find R:
rstudio-server verify-installation
Unable to find an installation of R on the system (which R didn't return valid output); Unable to locate R binary by scanning standard locations
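
The failure comes down to which directories are searched for R. The sketch below makes that visible by looking up R against an explicit search path instead of the current one. The function name r_visible_in is hypothetical, and the system path used in the usage note is only an assumed example of what a service might see.

```shell
# r_visible_in PATHLIST: print the location of R if it can be found in the
# given colon-separated directory list, and fail otherwise. The lookup runs
# in a subshell so the caller's own PATH is left untouched.
r_visible_in() {
    ( PATH="$1"; command -v R )
}
```

In the activated environment, r_visible_in "$PATH" should print the conda copy of R, while r_visible_in "/usr/sbin:/usr/bin:/sbin:/bin" (an assumed system-only path) prints nothing and fails. That is the situation RStudio Server is in.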

The fix is to give RStudio the exact path to R. Replace the path below with the location of your own R installation:
sudo sh -c 'echo "rsession-which-r=/home/ubuntu/anaconda3/envs/python3/bin/R" >> /etc/rstudio/rserver.conf'
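
Note that the command above appends a new line each time it is run, so repeating it leaves duplicate entries in the file. A repeatable variant might look like the sketch below. The function name set_which_r and its two arguments are our own invention, and it has to run with enough privileges to write the config file (for example from a root shell or a script invoked with sudo).

```shell
# set_which_r CONF_FILE R_BINARY
# Ensure CONF_FILE holds exactly one rsession-which-r line pointing at
# R_BINARY. Hypothetical helper, not part of RStudio.
set_which_r() {
    conf=$1
    r_bin=$2
    # Refuse to write a path that is not an executable file.
    if [ ! -x "$r_bin" ]; then
        echo "not an executable: $r_bin" >&2
        return 1
    fi
    touch "$conf"
    # Keep every other setting, drop any earlier rsession-which-r line.
    grep -v '^rsession-which-r=' "$conf" > "$conf.tmp" || true
    echo "rsession-which-r=$r_bin" >> "$conf.tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}
```

For the setup in this post the call would be: set_which_r /etc/rstudio/rserver.conf /home/ubuntu/anaconda3/envs/python3/bin/R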

Restart the instance, and RStudio Server now starts successfully. Log in with your Linux credentials at:
"http://(server IP):8787"

Sunday, February 4, 2018

Auto starting R Studio on AWS Deep Learning server