Problem n + 1 in Hibernate

Radosław Kondziołka
Calendar icon
21 stycznia 2021

Hibernate is a very popular ORM (object-relational mapping) framework dedicated to programs written in Java or other languages running in the Java virtual machine, such as Kotlin. These types of tools enable a bidirectional mapping of the world of database relationships to the world of objects. The popularity of relational databases as repositories is reflected in the popularity of ORM solutions. And - as is often the case - this is where oversights are often made, resulting in either incorrect behavior or performance degradation. One such well-known problem is the eponymous n + 1 problem, which this post treats.

The n + 1 problem

The n + 1 problem can arise in the case where one entity (table) references another entity (table). In such a situation, it happens that in order to retrieve the value of the dependent entity, n redundant queries are performed while only one would suffice. No one needs to be convinced that this has a negative impact on system performance and generates unnecessary load on the database. Especially since the number of queries increases with n. The problem itself is often presented as occurring only inone-to-many relationships(javax.persistence.OneToMany) or only in the case of lazy data loading(javax.persistence.FetchType.LAZY). This is not true, and it should be remembered that this problem can also occur in one-to-one relationships and in "greedy" loading of dependent entities.

Let's imagine that we are modeling a box and toys relationship. First, let's create a simple class to retrieve a certain number of boxes from the database:

1class Storage {
2    fun getBoxes(limit: Int): List<Box> {
3        return getEntityManager()
4            .createQuery("select b from Box b order by id",  
5                         Box::class.java)
6   		.setMaxResults(limit)
7   		.resultList
8    }
9
10    private fun getEntityManager(): EntityManager {
11        return Persistence.createEntityManagerFactory("persistence")
12            .createEntityManager()
13    }
14}
15
16fun main(args: Array<String>) {
17    val storage = Storage()
18    println(storage.getBoxes(4))
19}

Now let's model the toy box relationship. Let's assume that multiple toys can belong to one box. The database schema could look like this:

obraz1blogsages.webp

Here we see a one-to-many relationship between tables. Using JPA (Java Persistence API) in the Hibernate implementation, we can write this in Kotlin as follows:

1@Table(name = "box")
2@Entity
3data class Box(
4    @Id
5    @GeneratedValue(strategy = GenerationType.AUTO)
6    val id: Int,
7
8    @Column(name = "name", length = 50, nullable = false)
9    val name: String,
10
11    @OneToMany(fetch = FetchType.EAGER, cascade = [CascadeType.ALL])
12    @JoinColumn(name = "box_id")
13    val toys: List<Toy>
14)
15
16@Table(name = "toy")
17@Entity
18data class Toy(
19    @Id
20    @GeneratedValue(strategy = GenerationType.AUTO)
21    val id: Int,
22
23    @Column(name = "name", length = 50, nullable = false)
24    val name: String
25)

Let's execute a query from the main function to retrieve four boxes from the database and look at the queries executed by Hibernate:

1select box.id, box.name from box order by box.id limit 4
2select toy.box_id, toy.id, toy.name from toy where toy.box_id=4
3select toy.box_id, toy.id, toy.name from toy where toy.box_id=2
4select toy.box_id, toy.id, toy.name from toy where toy.box_id=1
5select toy.box_id, toy.id, toy.name from toy where toy.box_id=3

It is clear that 4 + 1 queries were executed to retrieve four boxes. First, Hibernate retrieved any four boxes, then for each box it executed one query to retrieve the toys belonging to it. This task could be successfully accomplished using only one query, changing the JPQL query itself:

1fun getBoxes(limit: Int): List<Box> {
2    return getEntityManager()
3        .createQuery("select b from Box b join fetch b.toys order by
4                      b.id", Box::class.java)
5	  .setMaxResults(limit)
6   	  .resultList
7}

Now, the number of queries sent to the database has been reduced to one:

1select box.id, toy.id, box.name, toy.name, toy.box_id, from box inner join toy on box.id=toy.box_id order by box.id

However, writing custom queries is not always the best way to perform this type of task. What about the situation when we use, for example, repositories provided by Spring Data? There is another approach to solving this problem, namely using the org.hibernate.annotations.BatchSize annotation, which we can find in the Hibernate ORM library Hibernate Core. We will apply this annotation by placing it above the toys field:

1@Table(name = "box")
2@Entity
3data class Box(
4    @Id
5    @GeneratedValue(strategy = GenerationType.AUTO)
6    val id: Int,
7
8    @Column(name = "name", length = 50, nullable = false)
9    val name: String,
10
11    @OneToMany(fetch = FetchType.EAGER, cascade = [CascadeType.ALL])
12    @JoinColumn(name = "box_id")
13    @BatchSize(size = 256)
14    val toys: List<Toy>
15)

Adding the @BatchSize annotation over the toys field makes it so that Hibernate will retrieve the toy data assigned to a given box in "batches"(batch), i.e. for up to 256 Box instances, Hibernate will retrieve their toys within a single query. Let's look at the queries that were generated for the first version of the getBoxes function:

1select box.id, box.name from box order by box.id limit 4
2select toy.box_id, toy.id, toy.name from toy where toy.box_id in 
3    (4, 1, 2, 3)

It is indisputable that the second query retrieves the entire "package" of boxes. If, on the other hand, the size of the package was limited to two**(@BatchSize(size = 2)**) then we would see two queries, each retrieving two items per package:

1select box.id, box.name from box order by box.id limit 4
2select toy.box_id, toy.id, toy.name from toy where toy.box_id in
3	(4, 1)
4select toy.box_id, toy.id, toy.name from toy where toy.box_id in
5	(2, 3)

Summary

One-to-one or one-to-many relationships are quite natural in relational database systems. For this reason, it is not uncommon to encounter this problem in applications that use Hibernate. The results presented here were obtained using the libraries:

  • Hibernate Core Relocation 5.4.24.Final,
  • Hibernate JPA 2.0 API 1.0.0.Final,
  • MySQL Connector/J 8.0.22, and from the MySQL database 5.7.25. The queries have been slightly simplified for readability, but their meaning and sense have been completely preserved.

Read also

Calendar icon

22 sierpień

A new era of knowledge management: Omega-PSIR at Kozminski University

Kozminski University in Warsaw, one of the leading universities in Poland, has been using the Omega-PSIR system we have implemented t...

Calendar icon

12 sierpień

What is Event-Driven Achitecture and why do you need it?

Event-Driven Architecture (EDA) is a modern approach to IT system design. Learn how EDA can impact your organization's growth!

Calendar icon

31 lipiec

How to use Rust with Python?

Learn how to integrate Rust and Python using PyO3 and Maturin. Learn how to write native Python modules in Rust and how to build and ...