Reading source code is a great way to learn about open-source projects. I used to read Java projects’ source code on GrepCode, since it is online and has very nice cross-reference features. As for Scala projects such as Apache Spark, the source code can be found on GitHub, but it’s worth setting up an IDE to browse the code more efficiently. Here’s a how-to for viewing Spark source code in Eclipse.
One can download Eclipse from here. I recommend the “Eclipse IDE for Java EE Developers”, which bundles a lot of commonly used features.
Then go to the Scala IDE official site and install the plugin via the update site or a zip archive.
Spark is mainly built with Maven, so make sure you have Maven installed on your box, and download the latest Spark source code from here, unarchive it, and execute the following command:
$ mvn -am -pl core dependency:resolve eclipse:eclipse
This command does several things. Spark is a large project with multiple modules, and here we’re only interested in the core module, so -pl (short for --projects) restricts the build to it. -am (short for --also-make) tells Maven to build core’s dependencies as well. We can see the module list in the output:
[INFO] Scanning for projects...
dependency:resolve tells Maven to download all dependencies.
eclipse:eclipse generates the .project and .classpath files for Eclipse. But the result is not perfect; both files need some fixes.
In core/.classpath, change this line:
<classpathentry kind="src" path="src/main/scala" including="**/*.java"/>
into:
<classpathentry kind="src" path="src/main/scala" including="**/*.java|**/*.scala"/>
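If you prefer to script this edit, a sed one-liner can rewrite the entry. Below is a self-contained demo (it first fabricates a minimal core/.classpath; against a real checkout you would run only the sed command, and GNU sed is assumed for -i):

```shell
# Demo of the .classpath fix as a one-liner (GNU sed assumed).
# The demo creates a minimal core/.classpath first; against a real
# Spark checkout you would run only the sed command.
mkdir -p core
echo '<classpathentry kind="src" path="src/main/scala" including="**/*.java"/>' > core/.classpath
# Rewrite the entry so Eclipse picks up Scala sources as well as Java.
sed -i 's#including="\*\*/\*\.java"#including="**/*.java|**/*.scala"#' core/.classpath
cat core/.classpath
```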
In core/.project, make it look like this:
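The exact contents depend on your checkout, but a .project for a Scala IDE project typically looks roughly like the sketch below (the project name is illustrative; the builder and nature ids are the standard Scala IDE ones):

```xml
<projectDescription>
  <name>spark-core</name>
  <comment></comment>
  <projects></projects>
  <buildSpec>
    <buildCommand>
      <!-- Let the Scala IDE builder compile both Scala and Java sources -->
      <name>org.scala-ide.sdt.core.scalabuilder</name>
    </buildCommand>
  </buildSpec>
  <natures>
    <nature>org.scala-ide.sdt.core.scalanature</nature>
    <nature>org.eclipse.jdt.core.javanature</nature>
  </natures>
</projectDescription>
```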
Now you can import the projects via “Existing Projects into Workspace”. For spark-unsafe, Eclipse will report an error: “Access restriction: The type ‘Unsafe’ is not API (restriction on required library /path/to/jre/lib/rt.jar)”. To fix this, right-click the “JRE System Library” entry in the Package Explorer and change it to “Workspace default JRE”.
Since Spark is a mixture of Java and Scala code, and the maven-eclipse-plugin only knows about Java source files, the build-helper-maven-plugin is needed to include the Scala sources as well, as described here. This is done by adding an entry to the pom’s project / build / plugins section. Fortunately, Spark’s pom.xml has already included this setting, so no change is required.
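For reference, a typical build-helper-maven-plugin entry looks roughly like this (a sketch: the execution id and source path are illustrative, and Spark’s actual pom.xml may differ):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <executions>
    <execution>
      <!-- Register the extra Scala source directory with the build,
           so plugins like maven-eclipse-plugin can see it. -->
      <id>add-scala-sources</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <source>src/main/scala</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```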