Use WebJars in Scalatra Project

While working on my first Scalatra project, I naturally thought of using WebJars to manage JavaScript library dependencies, since it’s convenient and seems like good practice. Though there’s no official WebJars support for the Scalatra framework, the setup is not very complex. That doesn’t mean it cost me no time, though: I’m still a newbie to Scala, and there are only a few materials on this subject.

Add WebJars Dependency in SBT Build File

Scalatra uses a .scala build file instead of .sbt, so let’s add the dependency to project/build.scala, taking Dojo as the example.

object DwExplorerBuild extends Build {
  ...
  lazy val project = Project (
    ...
    settings = Defaults.defaultSettings ++ ScalatraPlugin.scalatraWithJRebel ++ scalateSettings ++ Seq(
      ...
      libraryDependencies ++= Seq(
        ...
        "org.webjars" % "dojo" % "1.9.3"
      ),
      ...
    )
  )
}

To see this dependency in Eclipse, run sbt eclipse again. Under Referenced Libraries you will find a dojo-1.9.3.jar, and the library files live under META-INF/resources/webjars/ inside it.
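Because the jar puts the assets on the classpath rather than in the webapp folder, the application needs some way to serve them. Below is a minimal sketch of one way to do that with a dedicated Scalatra servlet; the name WebJarsServlet and the /webjars mount point are illustrative choices of mine, not something Scalatra or WebJars prescribes.

import org.scalatra.ScalatraServlet

// Hypothetical servlet: maps GET /webjars/<path> onto the resources that
// live under META-INF/resources/webjars/ on the classpath.
// Content-type handling is omitted in this sketch.
class WebJarsServlet extends ScalatraServlet {
  get("/*") {
    val path = multiParams("splat").headOption.getOrElse(halt(404))
    Option(getClass.getResourceAsStream(s"/META-INF/resources/webjars/$path")) match {
      case Some(stream) => stream // Scalatra copies an InputStream straight to the response
      case None         => halt(404)
    }
  }
}

Mounted at /webjars/* in ScalatraBootstrap, a page can then load Dojo from /webjars/dojo/1.9.3/dojo/dojo.js.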

Read More

Generate Auto-increment Id in Map-reduce Job

In the DBMS world, it’s easy to generate a unique, auto-increment id, using MySQL’s AUTO_INCREMENT attribute on a primary key or MongoDB’s Counters Collection pattern. But in a distributed, parallel processing framework like Hadoop Map-reduce, it is not that straightforward. The best way to identify every record in such a framework is to use a UUID, as in the sketch below. But when an integer id is required, it takes some extra steps.
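For example, a mapper can simply tag every record with a random UUID, with no coordination between tasks. A quick sketch; the class name UuidTagMapper and the key/value types are only illustrative:

import java.util.UUID
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Hypothetical mapper: emits (uuid, original record) pairs.
class UuidTagMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    context.write(new Text(UUID.randomUUID().toString), value)
  }
}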

Solution A: Single Reducer

This is the most obvious and simplest one: just set the number of reducers to 1 with the following code:

job.setNumReduceTasks(1);

The demerits are also obvious:

  1. All mapper output will be copied to one task tracker.
  2. Only one process works on shuffle & sort.
  3. When producing output, there’s also only one process.

This is not a problem for small data sets, or at least for small mapper outputs, and it is also the approach Pig and Hive take when they need to perform a total sort. But once the data hits a certain threshold, the copy and sort phases become unacceptably slow.
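With a single reducer, one task sees every record, so a plain in-memory counter is enough to hand out a gap-free sequence. A minimal sketch under that assumption; the class name AutoIncrementReducer and the key/value types are mine, not from the original post:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Reducer

// Hypothetical reducer: since exactly one reducer instance runs,
// incrementing a local counter yields a global, sequential id.
class AutoIncrementReducer extends Reducer[Text, Text, LongWritable, Text] {
  private var nextId = 0L

  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      context: Reducer[Text, Text, LongWritable, Text]#Context): Unit = {
    val it = values.iterator()
    while (it.hasNext) {
      nextId += 1
      context.write(new LongWritable(nextId), it.next())
    }
  }
}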

Read More

Manage Leiningen Project Configuration

In Maven projects, we tend to use .properties files to store various configurations, and use Maven profiles (with resource filtering) to switch between development and production environments, like the following example:

# database.properties
mydb.jdbcUrl=${mydb.jdbcUrl}
<!-- pom.xml -->
<profiles>
  <profile>
    <id>development</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <properties>
      <mydb.jdbcUrl>jdbc:mysql://127.0.0.1:3306/mydb</mydb.jdbcUrl>
    </properties>
  </profile>
  <profile>
    <id>production</id>
    <!-- This profile could be moved to ~/.m2/settings.xml to increase security. -->
    <properties>
      <mydb.jdbcUrl>jdbc:mysql://10.0.2.15:3306/mydb</mydb.jdbcUrl>
    </properties>
  </profile>
</profiles>

As for Leiningen projects, the profile facility has no variable substitution, and although we could use :resources in a profile to pack production-specific files into the jar, those files replace the original ones instead of being merged with them. One solution is to strictly separate environment-specific configs from the rest, so that the replacement is fine. But here I take another approach: manually load the files from different locations and then merge them.

Read More