The whole project is built using Maven. The build also includes a Docker image, which requires that Docker
is installed on the build machine.

## Prerequisites

You need the following tools installed on your machine:
* JDK 1.8 or later. If you build a variant with Scala 2.11, you have to use JDK 1.8 (and not anything newer like
Java 11). This mainly affects builds with Spark 2.x.
* Apache Maven (install via package manager or download from https://maven.apache.org/download.cgi )
* npm (install via package manager or download from https://www.npmjs.com/get-npm )
* Windows users also need the Hadoop winutils installed. Those can be retrieved from https://github.com/cdarlint/winutils .
See some additional details for building on Windows below.
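
A quick way to check that these tools are available on your `PATH` is to print their versions (a minimal sanity check; Docker is only needed because the build also produces a Docker image):

``` shell
# print the versions of the required build tools
java -version
mvn --version
npm --version
docker --version
```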

## Build with Maven
Building Flowman with the default settings (i.e. Hadoop and Spark version) is as easy as
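
``` shell
mvn clean install
```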
## Build on Windows
Although you can normally build Flowman on Windows, it is recommended to use Linux instead. Nevertheless, Windows
is still supported to some extent, but requires some extra care. You will need the Hadoop WinUtils installed. You can
download the binaries from https://github.com/cdarlint/winutils and install an appropriate version somewhere onto
your machine. Do not forget to set the HADOOP_HOME or PATH environment variable to the installation directory of these
utils!
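
For example, assuming the binaries were unpacked to `C:\hadoop-3.2.0` (an illustrative path, not a requirement), the environment could be set up like this:

``` shell
rem point Hadoop at the WinUtils installation (adjust the path to your setup)
set HADOOP_HOME=C:\hadoop-3.2.0
set PATH=%HADOOP_HOME%\bin;%PATH%
```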
You should also configure git such that all files are checked out using "LF" endings instead of "CRLF", otherwise
some unittests may fail and Docker images might not be usable. This can be done by setting the git configuration
value `core.autocrlf` accordingly, for example:
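
``` shell
# commit with LF endings and keep LF on checkout
git config --global core.autocrlf input
```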
## Build for Custom Spark / Hadoop Version
By default, Flowman will be built for fairly recent versions of Spark (3.0.2 as of this writing) and Hadoop (3.2.0).
But of course you can also build for a different version, either by using a profile:
``` shell
mvn install -Pspark-2.4 -Phadoop-2.7 -DskipTests
```
This will always select the latest bugfix version within the minor version. You can also specify versions explicitly
as follows:
``` shell
mvn install -Dspark.version=2.4.3 -Dhadoop.version=2.7.3
```
Note that using profiles is the preferred way, as this guarantees that dependencies are also selected
in the correct version. The following profiles are available:
* spark-2.4
* spark-3.0
* spark-3.1
* hadoop-2.6
* hadoop-2.7
* hadoop-2.8
* hadoop-2.9
* hadoop-3.1
* hadoop-3.2
* CDH-6.3
With these profiles it is easy to build Flowman to match your environment.
## Building for Open Source Hadoop and Spark
### Spark 2.4 and Hadoop 2.6:
``` shell
mvn clean install -Pspark-2.4 -Phadoop-2.6
```
## Building for Cloudera
The Maven project also contains preconfigured profiles for Cloudera CDH 6.3.
``` shell
mvn clean install -Pspark-2.4 -PCDH-6.3 -DskipTests
```