Download and Install Apache Spark to a Local Directory
install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory is found, it is returned. The Spark version installed is the same as the SparkR version. Users can specify a desired Hadoop version, the remote mirror site, and the directory where the package is installed locally.
Arguments
- hadoopVersion

  Version of Hadoop to install. Default is "3". If hadoopVersion = "without", the "Hadoop free" build is installed. See "Hadoop Free" Build for more information. Other patched version names can also be used.

- mirrorUrl

  Base URL of the repositories to use. The directory layout should follow Apache mirrors.
- localDir

  A local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is the path to the cache directory:

  - Mac OS X: ~/Library/Caches/spark
  - Unix: $XDG_CACHE_HOME if defined, otherwise ~/.cache/spark
  - Windows: %LOCALAPPDATA%\Apache\Spark\Cache
- overwrite

  If TRUE, download and overwrite the existing tar file in localDir and force re-installation of Spark (in case the local directory or file is corrupted).
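A minimal sketch of how these arguments might be combined in a call; the argument names come from this page, while the specific mirror and directory values are illustrative assumptions:

```r
library(SparkR)

# Default install: Hadoop 3 build, cached under the platform cache directory
install.spark()

# Illustrative call: custom mirror and local directory, forcing a fresh download
install.spark(hadoopVersion = "3",
              mirrorUrl = "https://archive.apache.org/dist/spark",  # assumed mirror
              localDir = "~/spark",                                 # assumed directory
              overwrite = TRUE)
```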
Details
The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (which corresponds to the SparkR version), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, the full path for a Spark 3.3.1 package from https://archive.apache.org is:

http://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
For hadoopVersion = "without", [Hadoop version] in the filename is then without-hadoop.
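The URL composition described above can be illustrated in plain R; this is a sketch of the naming scheme, not SparkR's internal code:

```r
# Illustration of the download URL layout (not SparkR internals).
mirrorUrl     <- "http://archive.apache.org/dist/spark"
sparkVersion  <- "3.3.1"   # matches the SparkR version
hadoopVersion <- "3"       # or "without" for the "Hadoop free" build

# [Spark version]-bin-[Hadoop version].tgz; "without" maps to "without-hadoop"
hadoopPart  <- if (hadoopVersion == "without") "without-hadoop"
               else paste0("hadoop", hadoopVersion)
packageName <- paste0("spark-", sparkVersion, "-bin-", hadoopPart)
fullUrl     <- paste0(mirrorUrl, "/spark-", sparkVersion, "/", packageName, ".tgz")
# fullUrl: "http://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz"
```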
See also
See available Hadoop versions: Apache Spark