Reading source code is a great way to learn about open-source projects. I used to read Java projects' source code on GrepCode, since it is online and has very nice cross-referencing features. For Scala projects such as Apache Spark, the source code can be found on GitHub, but it's well worth setting up an IDE to browse the code more efficiently. Here's a how-to for viewing Spark's source code in Eclipse.
Install Eclipse and Scala IDE Plugin
One can download Eclipse from here. I recommend the "Eclipse IDE for Java EE Developers" package, which contains a lot of commonly used features.
Then go to Scala IDE's official site and install the plugin, either through the update site or from a zip archive.
Generate Project File with Maven
Spark is mainly built with Maven, so make sure you have Maven installed on your box. Download the latest Spark source code from here, extract it, and execute the following command:
```shell
$ mvn -am -pl core dependency:resolve eclipse:eclipse
```
This command does a bunch of things. First, it indicates which modules should be built. Spark is a large project with multiple modules, and currently we're only interested in its core module, so `-pl` (or `--projects`) is used to select it. `-am` (or `--also-make`) tells Maven to also build the modules that core depends on. We can see the module list in the output:
```
[INFO] Scanning for projects...
```
`dependency:resolve` tells Maven to download all dependencies, and `eclipse:eclipse` generates the `.project` and `.classpath` files for Eclipse. The result is not perfect, though: both files need some fixes.
Edit `core/.classpath`, changing source entries like the following:
```xml
<classpathentry kind="src" path="src/main/scala" including="**/*.java"/>
```
to
```xml
<classpathentry kind="src" path="src/main/scala" including="**/*.java|**/*.scala"/>
```
Edit `core/.project` and make it look like this:
```xml
<buildSpec>
```
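The snippet above is truncated in this copy. As a minimal sketch of the usual fix, the Java builder is replaced with the Scala IDE builder and the Scala nature is added; the builder and nature IDs here (`org.scala-ide.sdt.core.scalabuilder`, `org.scala-ide.sdt.core.scalanature`) are assumptions taken from a stock Scala IDE install, not from Spark's own files:

```xml
<!-- Illustrative only: use the Scala IDE builder so Eclipse
     compiles .scala files, and declare both natures. -->
<buildSpec>
  <buildCommand>
    <name>org.scala-ide.sdt.core.scalabuilder</name>
    <arguments>
    </arguments>
  </buildCommand>
</buildSpec>
<natures>
  <nature>org.scala-ide.sdt.core.scalanature</nature>
  <nature>org.eclipse.jdt.core.javanature</nature>
</natures>
```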
Now you can import "Existing Projects into Workspace", including `core`, `launcher`, `network`, and `unsafe`.
Miscellaneous
Access restriction: The type ‘Unsafe’ is not API
For the module `spark-unsafe`, Eclipse will report an error: "Access restriction: The type 'Unsafe' is not API (restriction on required library /path/to/jre/lib/rt.jar)". To fix this, right-click the "JRE System Library" entry in Package Explorer and change it to "Workspace default JRE".
Download Sources and Javadocs
Add the following entry to the pom's project / build / plugins section:
```xml
<plugin>
```
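The plugin entry above is truncated in this copy. For maven-eclipse-plugin, the switches that fetch sources and javadocs are `downloadSources` and `downloadJavadocs`, so the entry presumably looked roughly like this sketch:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-eclipse-plugin</artifactId>
  <configuration>
    <!-- Fetch the -sources and -javadoc jars for dependencies and
         attach them in the generated .classpath -->
    <downloadSources>true</downloadSources>
    <downloadJavadocs>true</downloadJavadocs>
  </configuration>
</plugin>
```

With this in place, re-running `mvn eclipse:eclipse` regenerates the Eclipse files with source and javadoc attachments, so you can jump into dependency code from Eclipse.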
build-helper-maven-plugin
Since Spark is a mixture of Java and Scala code, and maven-eclipse-plugin only knows about Java source directories, we need build-helper-maven-plugin to register the Scala sources as well, as described here. Fortunately, Spark's pom.xml already includes this setting.
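For reference, that configuration takes roughly this shape. This is a sketch of the plugin's `add-source` goal; the execution id and the exact directories in Spark's pom.xml are assumptions here:

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>add-scala-sources</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <!-- Register src/main/scala as an extra source root so that
             eclipse:eclipse emits it into .classpath -->
        <sources>
          <source>src/main/scala</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```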