A Hibernate Optimization Not Quite Available in JPA

Object Relational Mapping tools (ORM), especially Hibernate, are very widely deployed, so they’re not exactly controversial. However, from time to time we hear complaints about performance of these tools. A common complaint is that every field and every relationship are loaded when only specific fields, relationships, and aggregations are required to be loaded. This post will address a potential cause of performance issues common to database growth and will show a simple Hibernate trick that can greatly improve the performance of this application.

Dashboard

In this example, we are building a system for tracking fitness activities. We are building a screen that will display future workouts as of a given date. This is a pretty straight forward query in and of itself. However, multiple tables must be joined together to fully represent a workout. In our case, we have a WorkoutDaySignup, a WorkoutDayResult, a user, a WorkoutRoutineWorkoutDay, and a UserStats aggregation table. There are multiple relationships between these tables, so for a given upcoming workout, we may need upwards of 40 queries if we are pre-fetching all related tables. Obviously we can toy with lazy loading, and in many cases this may work fantastically to reduce the overhead of a query. But in this case we need to eagerly fetch for several other use cases. There are ways to handle this problem, but this post shows a solution that Hibernate offers that can greatly enhance performance. The following ER model snippet roughly demonstrates the above relationship.

ER Model

The following DAO code can be used to retrieve all workout signups for a specified period of time.

public List findNextNWorkoutSignupsForUser(
Integer userId, Integer numWorkouts, Date startDate) {
String hql = "select distinct wds from WorkoutDaySignup wds " +
" left join wds.workoutDayResult wdr " +
" where wds.user.userId = ? " + 
" and (wdr is null or (wdr.isSkipped = 0 and wdr.complete = 0))";
if (startDate != null)
hql += " and wds.workoutDayDate >= ? ";
hql += " order by wds.workoutDayDate, wds.workoutDaySignupId ";
logger.debug(String.format("hql is:%s", hql));
Query query = null;
if (startDate == null)
query = doQuery(hql, userId);
else
query = doQuery(hql, userId, startDate);
return query.setMaxResults(numWorkouts).list();
}

As can be seen in the above code, we built a query that relies on HQL to filter and to hydrate the WorkoutDaySignup on the underlying Hibernate infrastructure. The method doQuery() is a shortcut method that builds an org.hibernate. Query from the current Hibernate Session. However, there are plenty of other ways to build this query object. The code builds the query and then executes it, mapping the query results into a List of WorkoutDaySignups. This generated over 40 sql calls to process (more on the performance later). Consider the following code, which is similar but slightly more verbose.

public List findNextNLightweightWorkoutSignupsForUser(Integer userId, Integer numWorkouts, Date date) {
String hql = "select distinct wds.id as id, " + 
" case when wd is not null then wd.workoutDayId else wd2.workoutDayId end as workoutDayId, " +
" wds.user.userId as userId,case when wd is not null then wd.name else wd2.name end as name, case when wd is not null then wd.description else wd2.description end as description, " + 
" wds.workoutDayDate as workoutDate, wds.user.fullName as fullName, wds.user.username as username, case when wdr is null then false elsecase when (wdr.complete = 1 or wdr.isSkipped = 1) then true else false end end as isComplete, "+ 
" case when wd is not null then case when wd.hasImage = 1 then true else false end else case when wd2.hasImage = 1 then true else false end end as hasImage, " + 
" case when wd is null then wd2.estimatedDurationSeconds else wd.estimatedDurationSeconds end as durationSeconds " +
" from WorkoutDaySignup wds " +
" left join wds.workoutDayResult wdr " +
" left join wds.workoutDay wd left join wds.workoutRoutineWorkoutDay.workoutDay wd2 " + 
" where wds.user.userId = ? " + 
" and (wdr is null or (wdr.isSkipped = 0 and wdr.complete = 0)) and wds.workoutDayDate >= ? " + 
" order by wds.workoutDayDate, wds.workoutDaySignupId ";
return doQuery(hql, userId, date)
.setMaxResults(numWorkouts)
.setResultTransformer(Transformers.aliasToBean(LightweightWorkoutDaySignup.class))
.list();
}

This code has more HQL in it. Instead of relying on Hibernate to decide what fields to retrieve, we explicitly select only the fields that we want to include. We use a case statement to flatten a few multi table joins into single rows. This query utilizes an org.hibernate.Query object as well, but before executing the query it sets an org.hibernate.transform.ResultTransformer. In this case it creates an org.hibernate.transform.AliasToBeanResultTransformer from a Lightweight DTO class we’ve created. This particular transformer matches the aliases defined in the HQL to JavaBean properties on the specified class. We essentially project those fields onto a List of DTO classes. In this case, our 40+ queries were reduced to one, with obvious implications for overhead. When running the above code in a single thread, the first query took an average of 79 ms to execute. The second one, an average of 16 ms, or 5x faster. The results would be even more staggering if we took the further step of serializing the results into a wire format.

It turns out that in the JPA world, you can practically do the same thing, but the code is much more difficult to maintain and is prone towards brittleness. You end up creating HQL queries that create an inline DTO object. But there’s no notion of an alias to bean result transformer, so the developer is left to create constructors.

List result = em.createQuery("select distinct new LightweightWorkoutDaySignup(wds.workoutDaySignupId, wd.workoutDayId, " +
" wds.user.userId,wd.name, wd.description, " + 
" wds.workoutDayDate, wds.user.fullName as fullName, wds.user.username as username, case when wdr is null then false elsecase when (wdr.complete = 1 or wdr.isSkipped = 1) then true else false end end as isComplete, "+ 
" wd.hasImage, wd.estimatedDurationSeconds)" +
" from WorkoutSignup ws " +
" join ws.workoutRoutineSignups wrs join wrs.workoutDaySignups wds join wds.workoutRoutineWorkoutDay wrwd join wrwd.workoutDay wd left join wds.workoutDayResult wdr " + 
" where ws.workoutSignupId = ? and ws.user.userId = ? " + 
" order by wds.workoutDayDate, wds.workoutDaySignupId ")
.setParameter(1, workoutSignupId)
.setParameter(2, userId)
.getResultList();

This is a suboptimal approach because the number of fields that may be represented in the Java object could be large. As fields are added, new fields must be added to the constructor, and any HQL that is not updated appropriately will not break until run time (since the HQL is not compiled until runtime). Furthermore, new code that is written will have to re-create the constructor with all of its arguments in the correct order. This is much more difficult than it sounds.

For this reason, we often find DAOs, that otherwise would be implemented as pure JPA, “dipping” into Hibernate, when this particular optimization is required. We recommend keeping this in mind if large object graphs become a performance issue in your application.