MindIQ Academy

04 - Spring Data JPA Notes

A beginner-to-advanced guide to Spring Data JPA, entity lifecycle, repositories, query methods, fetch strategies, cascade types, and Hibernate integration for Spring Professional Certification candidates. Covers Spring Boot 3 and Hibernate 6 concepts.


Table of Contents

  1. Entity Lifecycle
  2. Fetch Strategies
  3. Cascade Types
  4. Repository Abstraction
  5. Query Methods
  6. Interview Questions
  7. Cheat Sheet

1. Entity Lifecycle

An entity lifecycle describes the state of a JPA entity instance in relation to the persistence context.

Lifecycle States

State | Meaning | Example
Transient | Object is not associated with a persistence context and has no database identity | new User()
Managed | Object is associated with the persistence context; changes are tracked automatically | Entity returned by findById()
Detached | Object was managed earlier but is no longer attached to a persistence context | Entity after transaction/session closes
Removed | Object is scheduled for deletion from the database | Entity passed to delete()

State Transitions

Operation | Transition
persist() / save() for a new entity | Transient to managed
Entity lookup | Database row to managed entity
Transaction commit / flush | Managed changes synchronized to database
detach() / persistence context close | Managed to detached
merge() | Detached state copied into a managed entity
remove() / delete() | Managed to removed

Dirty Checking

Dirty checking is the automatic detection of changes made to managed entities. If a managed entity is modified inside a transaction, JPA synchronizes those changes to the database during flush or commit.

@Transactional
public void updateEmail(Long id, String email) {
    User user = userRepository.findById(id).orElseThrow();
    user.setEmail(email);
}

No explicit save() is required here because user is managed.
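
To make the mechanism concrete, here is a minimal plain-Java sketch of snapshot-based dirty checking. It is only a model: ToyUser, ToyPersistenceContext, and the recorded SQL strings are hypothetical names invented for this illustration, and a real provider such as Hibernate does considerably more work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity used only for this illustration.
class ToyUser {
    final long id;
    String email;

    ToyUser(long id, String email) {
        this.id = id;
        this.email = email;
    }
}

// Toy model of a persistence context: it remembers the state each entity
// had when it became managed and compares against that snapshot on flush.
class ToyPersistenceContext {
    private final Map<Long, ToyUser> managed = new HashMap<>();
    private final Map<Long, String> loadedEmail = new HashMap<>();
    final List<String> executedSql = new ArrayList<>();

    // Start tracking an entity and snapshot its current state.
    ToyUser manage(ToyUser user) {
        managed.put(user.id, user);
        loadedEmail.put(user.id, user.email);
        return user;
    }

    // Record an UPDATE only for entities whose state differs from the snapshot.
    void flush() {
        for (ToyUser user : managed.values()) {
            if (!user.email.equals(loadedEmail.get(user.id))) {
                executedSql.add("update users set email = '" + user.email
                        + "' where id = " + user.id);
                loadedEmail.put(user.id, user.email); // refresh the snapshot
            }
        }
    }
}

class DirtyCheckingDemo {
    public static void main(String[] args) {
        ToyPersistenceContext context = new ToyPersistenceContext();
        ToyUser user = context.manage(new ToyUser(1L, "old@example.com"));
        user.email = "new@example.com"; // no explicit save
        context.flush();
        System.out.println(context.executedSql);
    }
}
```

A second flush() without further changes records nothing, because the snapshot was refreshed after the first one.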

Flush

Flush synchronizes the persistence context with the database, but it does not necessarily commit the transaction.

Common flush triggers:

  • Transaction commit
  • Explicit flush()
  • Before query execution, depending on flush mode

2. Fetch Strategies

Fetch strategy controls when associated entities are loaded from the database.

Lazy Fetching

Lazy fetching loads an association only when it is accessed.

@OneToMany(mappedBy = "department", fetch = FetchType.LAZY)
private List<Employee> employees;

Advantages:

  • Better initial query performance
  • Avoids loading unnecessary associations
  • Usually preferred for collections

Risks:

  • LazyInitializationException when accessed outside an active persistence context
  • N+1 query problem when associations are accessed repeatedly in loops
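
The first risk can be modelled in a few lines of plain Java. This is a sketch under stated assumptions: ToySession and LazyRef are hypothetical stand-ins for the persistence context and a lazy proxy, and IllegalStateException stands in for Hibernate's LazyInitializationException.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an open or closed persistence context.
class ToySession {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

// Hypothetical lazy holder: the loader runs on first access only.
class LazyRef<T> {
    private final ToySession session;
    private final Supplier<T> loader;
    private T value;
    private boolean initialized;

    LazyRef(ToySession session, Supplier<T> loader) {
        this.session = session;
        this.loader = loader;
    }

    T get() {
        if (!initialized) {
            // Loading needs an open session, just as a lazy association does.
            if (!session.isOpen()) {
                throw new IllegalStateException("could not initialize: session is closed");
            }
            value = loader.get();
            initialized = true;
        }
        return value;
    }
}

class LazyLoadingDemo {
    public static void main(String[] args) {
        ToySession session = new ToySession();
        LazyRef<List<String>> employees =
                new LazyRef<>(session, () -> List.of("Ada", "Linus"));
        session.close();
        try {
            employees.get(); // first access happens after close
        } catch (IllegalStateException e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
    }
}
```

An association that was already initialized while the session was open stays readable after close; only the first access needs the session.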

Eager Fetching

Eager fetching loads an association immediately with the owning entity.

@ManyToOne(fetch = FetchType.EAGER)
private Department department;

Advantages:

  • Association is available immediately
  • Can avoid lazy loading issues for small, always-needed relationships

Risks:

  • Loads data even when not needed
  • Can create large joins and performance problems
  • Can accidentally fetch deep object graphs

Default Fetch Types

Association | Default Fetch Type
@OneToOne | EAGER
@ManyToOne | EAGER
@OneToMany | LAZY
@ManyToMany | LAZY

Handling N+1 Queries

N+1 occurs when one query loads parent records and then one additional query is executed for each parent to load children.

Common solutions:

  • Use JOIN FETCH
  • Use @EntityGraph
  • Use DTO projections
  • Tune batch fetching with Hibernate-specific settings

@Query("select d from Department d join fetch d.employees where d.id = :id")
Optional<Department> findByIdWithEmployees(@Param("id") Long id);
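
The cost difference is easiest to see by counting queries. The sketch below is a plain-Java model, not real data access code: ToyDatabase, its hard-coded data, and the method names are assumptions made for illustration, with each method call standing for one SQL round trip.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory "database" that counts every query it serves.
class ToyDatabase {
    int queryCount = 0;

    private final Map<Long, List<String>> employeesByDepartment = Map.of(
            1L, List.of("Ada", "Grace"),
            2L, List.of("Linus"),
            3L, List.of("Margaret"));

    // One query for the parent rows.
    List<Long> findDepartmentIds() {
        queryCount++;
        return List.of(1L, 2L, 3L);
    }

    // One query per parent when children are loaded on access.
    List<String> findEmployees(long departmentId) {
        queryCount++;
        return employeesByDepartment.get(departmentId);
    }

    // One query for parents and children together, like a fetch join.
    Map<Long, List<String>> findDepartmentsWithEmployees() {
        queryCount++;
        return employeesByDepartment;
    }
}

class NPlusOneDemo {
    // 1 query for the departments plus N queries for their employees.
    static int countNamesLazily(ToyDatabase db) {
        int names = 0;
        for (long id : db.findDepartmentIds()) {
            names += db.findEmployees(id).size();
        }
        return names;
    }

    // A single query returns everything that is needed.
    static int countNamesWithFetchJoin(ToyDatabase db) {
        int names = 0;
        for (List<String> employees : db.findDepartmentsWithEmployees().values()) {
            names += employees.size();
        }
        return names;
    }

    public static void main(String[] args) {
        ToyDatabase lazyDb = new ToyDatabase();
        countNamesLazily(lazyDb);
        ToyDatabase joinDb = new ToyDatabase();
        countNamesWithFetchJoin(joinDb);
        System.out.println("lazy: " + lazyDb.queryCount + ", fetch join: " + joinDb.queryCount);
    }
}
```

With three departments the lazy variant issues 1 + 3 queries while the fetch-join variant issues one; the gap grows linearly with the number of parents.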

3. Cascade Types

Cascade defines which entity operations should propagate from a parent entity to its associated child entities.

Cascade Type | Meaning
PERSIST | Propagates persist operation
MERGE | Propagates merge operation
REMOVE | Propagates remove operation
REFRESH | Propagates refresh operation
DETACH | Propagates detach operation
ALL | Includes all cascade operations

Example

@OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
private List<OrderItem> items = new ArrayList<>();

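Because of CascadeType.ALL, persisting, merging, or removing the Order propagates the same operation to its OrderItem entities. Because of orphanRemoval = true, an OrderItem that is removed from the items collection is also deleted from the database.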

Cascade vs Orphan Removal

Feature | Purpose
Cascade remove | Deletes child entities when the parent is deleted
Orphan removal | Deletes child entities when they are removed from the parent collection

Use orphanRemoval = true when a child should not exist without its parent.

Best Practices

  • Use cascades carefully on aggregate boundaries.
  • Avoid CascadeType.REMOVE on @ManyToMany.
  • Prefer CascadeType.ALL only when child lifecycle is fully owned by the parent.
  • Always keep both sides of bidirectional relationships in sync.
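
The last point is commonly handled with helper methods on the parent. The following is a minimal sketch in plain Java with the JPA annotations left out so that it runs on its own; ToyOrder, ToyOrderItem, addItem, and removeItem are illustrative names, not part of any API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical child; in a real mapping the order field is the @ManyToOne side.
class ToyOrderItem {
    final String sku;
    ToyOrder order;

    ToyOrderItem(String sku) {
        this.sku = sku;
    }
}

// Hypothetical parent; in a real entity the list carries the @OneToMany mapping.
class ToyOrder {
    private final List<ToyOrderItem> items = new ArrayList<>();

    // Update the collection and the back-reference together.
    void addItem(ToyOrderItem item) {
        items.add(item);
        item.order = this;
    }

    // Clear the back-reference only if the item really belonged to this order.
    void removeItem(ToyOrderItem item) {
        if (items.remove(item)) {
            item.order = null;
        }
    }

    // Read-only view so callers cannot bypass the helper methods.
    List<ToyOrderItem> getItems() {
        return Collections.unmodifiableList(items);
    }
}

class BidirectionalSyncDemo {
    public static void main(String[] args) {
        ToyOrder order = new ToyOrder();
        ToyOrderItem item = new ToyOrderItem("SKU-1");
        order.addItem(item);
        System.out.println(item.order == order);
        order.removeItem(item);
        System.out.println(item.order == null);
    }
}
```

Exposing only an unmodifiable view of the collection is a design choice in this sketch; it forces every change through the helpers so the two sides cannot drift apart.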

4. Repository Abstraction

Spring Data JPA repositories reduce boilerplate data access code by generating implementations at runtime.

Common Repository Interfaces

Interface | Description
Repository<T, ID> | Marker interface
CrudRepository<T, ID> | Basic CRUD operations
PagingAndSortingRepository<T, ID> | Pagination and sorting support (since Spring Data 3 it no longer extends CrudRepository)
JpaRepository<T, ID> | JPA-specific operations and batch methods

Example

public interface UserRepository extends JpaRepository<User, Long> {
}

This provides methods such as:

  • save(entity)
  • findById(id)
  • findAll()
  • delete(entity)
  • count()
  • existsById(id)
  • flush()

Pagination and Sorting

Page<User> page = userRepository.findAll(PageRequest.of(0, 20, Sort.by("name")));
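
PageRequest.of(0, 20, ...) asks for the first page (page indexes are zero-based) with 20 rows per page. The arithmetic behind the row offset and the total page count can be sketched as follows; PageMath is a hypothetical helper written for this note, not a Spring class.

```java
class PageMath {
    // Zero-based page index times page size gives the row offset.
    static long offset(int page, int size) {
        return (long) page * size;
    }

    // Ceiling division: a partially filled last page still counts as a page.
    static int totalPages(long totalElements, int size) {
        return (int) ((totalElements + size - 1) / size);
    }

    public static void main(String[] args) {
        System.out.println(offset(2, 20));      // prints 40
        System.out.println(totalPages(45, 20)); // prints 3
    }
}
```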

Custom Repository Methods

Use custom repository implementations when query methods or @Query are not enough.

public interface UserRepositoryCustom {
    List<User> findActivePremiumUsers();
}
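
The fragment interface needs an implementation class, which Spring Data locates by the Impl suffix by default. The sketch below shows the shape of that pairing in plain Java. It makes two assumptions for the sake of a runnable example: User is a simplified record rather than the real entity, and an in-memory list stands in for the injected EntityManager.

```java
import java.util.List;

// Simplified stand-in for the User entity (hypothetical fields).
record User(String name, boolean active, boolean premium) {}

interface UserRepositoryCustom {
    List<User> findActivePremiumUsers();
}

// Named after the fragment interface plus "Impl" so it can be discovered.
class UserRepositoryCustomImpl implements UserRepositoryCustom {
    // Stand-in for an injected EntityManager in a real implementation.
    private final List<User> store;

    UserRepositoryCustomImpl(List<User> store) {
        this.store = store;
    }

    @Override
    public List<User> findActivePremiumUsers() {
        // A real implementation would build a JPQL or Criteria query here.
        return store.stream()
                .filter(user -> user.active() && user.premium())
                .toList();
    }
}

class CustomRepositoryDemo {
    public static void main(String[] args) {
        UserRepositoryCustom repository = new UserRepositoryCustomImpl(List.of(
                new User("ada", true, true),
                new User("bob", true, false)));
        System.out.println(repository.findActivePremiumUsers());
    }
}
```

In an application, UserRepository would extend both JpaRepository<User, Long> and UserRepositoryCustom, so callers see a single repository type.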

5. Query Methods

Spring Data JPA can derive queries from repository method names.

Derived Query Examples

List<User> findByLastName(String lastName);

Optional<User> findByEmail(String email);

List<User> findByAgeGreaterThan(int age);

List<User> findByStatusAndCreatedAtAfter(Status status, LocalDateTime createdAt);

boolean existsByEmail(String email);

long countByStatus(Status status);

void deleteByStatus(Status status);

Common Keywords

Keyword | Example
And | findByStatusAndType
Or | findByStatusOrType
Between | findByCreatedAtBetween
LessThan | findByAgeLessThan
GreaterThan | findByAgeGreaterThan
Like | findByNameLike
Containing | findByNameContaining
StartingWith | findByNameStartingWith
EndingWith | findByNameEndingWith
IsNull | findByDeletedAtIsNull
IsNotNull | findByDeletedAtIsNotNull
In | findByStatusIn
OrderBy | findByStatusOrderByCreatedAtDesc
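
To build intuition for how a method name becomes a query, here is a toy translator in plain Java. It is a deliberately tiny sketch, not Spring Data's actual parser: DerivedQuerySketch is a hypothetical class that handles only findBy with And, GreaterThan, LessThan, IsNull, and plain equality.

```java
import java.util.ArrayList;
import java.util.List;

class DerivedQuerySketch {
    // Translate a small subset of derived-query names into a JPQL-like string.
    static String toJpql(String methodName, String entity) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("only findBy... is supported in this sketch");
        }
        // Split on "And" only when it is followed by an upper-case letter,
        // so a property such as "Android" is not cut in half.
        String[] parts = methodName.substring("findBy".length()).split("And(?=[A-Z])");
        List<String> predicates = new ArrayList<>();
        int param = 1;
        for (String part : parts) {
            if (part.endsWith("GreaterThan")) {
                predicates.add("e." + property(part, "GreaterThan") + " > ?" + param++);
            } else if (part.endsWith("LessThan")) {
                predicates.add("e." + property(part, "LessThan") + " < ?" + param++);
            } else if (part.endsWith("IsNull")) {
                predicates.add("e." + property(part, "IsNull") + " is null");
            } else {
                predicates.add("e." + property(part, "") + " = ?" + param++);
            }
        }
        return "select e from " + entity + " e where " + String.join(" and ", predicates);
    }

    // Strip the keyword suffix and lower-case the first letter: "LastName" -> "lastName".
    private static String property(String part, String keyword) {
        String name = part.substring(0, part.length() - keyword.length());
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toJpql("findByStatusAndAgeGreaterThan", "User"));
    }
}
```

The real parser also validates each property against the entity model at startup, which is why a misspelled property name in a derived method fails when the application context loads.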

JPQL with @Query

@Query("select u from User u where u.status = :status")
List<User> findUsersByStatus(@Param("status") Status status);

Native Query

@Query(value = "select * from users where email = :email", nativeQuery = true)
Optional<User> findByEmailNative(@Param("email") String email);

Modifying Query

@Modifying
@Transactional
@Query("update User u set u.status = :status where u.id = :id")
int updateStatus(@Param("id") Long id, @Param("status") Status status);

Projections

Interface-based projection:

public interface UserSummary {
    String getName();
    String getEmail();
}

List<UserSummary> findByStatus(Status status);

DTO projection:

@Query("select new com.example.UserDto(u.id, u.name) from User u")
List<UserDto> findUserDtos();
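
The constructor expression requires a DTO constructor whose parameters match the selected values in order and type. A Java record is one convenient way to get that. The sketch below mirrors the projection in memory; UserRow is a hypothetical stand-in for the entity, and the package prefix com.example is omitted so the example runs standalone.

```java
import java.util.List;

// Stand-in for the full entity; the email value is deliberately not projected.
record UserRow(Long id, String name, String email) {}

// The canonical constructor (Long, String) matches the order of
// u.id, u.name in the constructor expression.
record UserDto(Long id, String name) {}

class DtoProjectionDemo {
    // In-memory equivalent of "select new UserDto(u.id, u.name) from User u".
    static List<UserDto> project(List<UserRow> rows) {
        return rows.stream()
                .map(row -> new UserDto(row.id(), row.name()))
                .toList();
    }

    public static void main(String[] args) {
        List<UserRow> rows = List.of(new UserRow(1L, "Ada", "ada@example.com"));
        System.out.println(project(rows));
    }
}
```

Unlike an entity, the resulting DTO is not tracked by the persistence context, so it takes no part in dirty checking.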

6. Interview Questions

1. What is the difference between JPA, Hibernate, and Spring Data JPA?

JPA is a specification for ORM in Java. Hibernate is an implementation of the JPA specification. Spring Data JPA is an abstraction over JPA that reduces boilerplate repository code.

2. What are the entity lifecycle states?

The main states are transient, managed, detached, and removed.

3. What is dirty checking?

Dirty checking is the automatic detection and persistence of changes made to managed entities inside a transaction.

4. What is the difference between persist() and merge()?

persist() makes a new transient entity managed. merge() copies the state of a detached entity into a managed entity and returns that managed instance.
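
A frequent follow-up is which instance ends up managed. The toy model below illustrates the difference in plain Java; ToyEntityManager and Account are hypothetical, and a real merge() would load the row from the database when the entity is not yet in the persistence context rather than create an empty instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity for this illustration.
class Account {
    final long id;
    String owner;

    Account(long id, String owner) {
        this.id = id;
        this.owner = owner;
    }
}

class ToyEntityManager {
    private final Map<Long, Account> managed = new HashMap<>();

    // persist: the passed instance itself becomes managed.
    void persist(Account account) {
        managed.put(account.id, account);
    }

    // merge: state is copied into the managed instance, which is returned;
    // the argument itself stays detached.
    Account merge(Account detached) {
        Account target = managed.computeIfAbsent(detached.id, id -> new Account(id, null));
        target.owner = detached.owner;
        return target;
    }

    // True only if this exact instance is the managed one.
    boolean contains(Account account) {
        return managed.get(account.id) == account;
    }
}

class PersistVersusMergeDemo {
    public static void main(String[] args) {
        ToyEntityManager em = new ToyEntityManager();
        Account original = new Account(1L, "Ada");
        em.persist(original);
        Account detached = new Account(1L, "Grace");
        Account merged = em.merge(detached);
        System.out.println(merged == original);    // prints true
        System.out.println(em.contains(detached)); // prints false
    }
}
```

The practical consequence is to keep working with the instance returned by merge(), since later changes to the detached argument are not tracked.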

5. What is the N+1 query problem?

N+1 occurs when one query loads a list of parent entities and then one additional query is executed for each parent to load related data.

6. How can N+1 be fixed?

Use fetch joins, entity graphs, DTO projections, or batch fetching.

7. What is the difference between lazy and eager loading?

Lazy loading loads associations when accessed. Eager loading loads associations immediately with the owning entity.

8. What are default fetch types in JPA?

@OneToOne and @ManyToOne are eager by default. @OneToMany and @ManyToMany are lazy by default.

9. What is cascade in JPA?

Cascade propagates entity operations from one entity to associated entities.

10. What is the difference between cascade remove and orphan removal?

Cascade remove deletes children when the parent is deleted. Orphan removal deletes a child when it is removed from the parent relationship.

11. Why should CascadeType.REMOVE be avoided on many-to-many relationships?

Because the two sides are usually independent entities that other rows may still reference. Removing one entity should usually delete only join table rows, not the related entity itself.

12. What is the persistence context?

The persistence context is the set of managed entity instances tracked by the entity manager. It also acts as the first-level cache.

13. What is the first-level cache?

The first-level cache is the persistence-context-level cache. Within the same persistence context, the same entity ID maps to the same managed object instance.

14. What is the difference between getReferenceById() and findById()?

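findById() loads the entity right away, returning the managed instance if it is already in the persistence context and querying the database otherwise, and wraps the result in an Optional. getReferenceById() returns a lazy proxy without querying the database; the query runs only when the proxy's state is accessed, and it fails with EntityNotFoundException if no matching row exists.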

15. What is the purpose of @Transactional?

@Transactional defines a transaction boundary. It allows multiple database operations to succeed or fail as one unit and keeps managed entities attached during the transaction.

16. What is the difference between JPQL and native SQL?

JPQL works with entity names and fields. Native SQL works directly with database tables and columns.

17. What are projections?

Projections allow queries to return selected fields instead of full entity objects.

18. What is optimistic locking?

Optimistic locking uses a version field, usually annotated with @Version, to detect concurrent updates.

@Version
private Long version;
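
The check that the provider performs can be modelled as a compare-and-increment on the version column. The sketch below is a plain-Java model with hypothetical names (ToyVersionedStore, VersionedRow); IllegalStateException stands in for JPA's OptimisticLockException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row with a version counter.
record VersionedRow(String value, long version) {}

class ToyVersionedStore {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    void insert(long id, String value) {
        rows.put(id, new VersionedRow(value, 0));
    }

    VersionedRow read(long id) {
        return rows.get(id);
    }

    // Models "update ... set value = ?, version = version + 1
    //         where id = ? and version = ?".
    void update(long id, String newValue, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current.version() != expectedVersion) {
            throw new IllegalStateException("stale version " + expectedVersion
                    + ", current is " + current.version());
        }
        rows.put(id, new VersionedRow(newValue, expectedVersion + 1));
    }
}

class OptimisticLockingDemo {
    public static void main(String[] args) {
        ToyVersionedStore store = new ToyVersionedStore();
        store.insert(1L, "draft");
        long seenByA = store.read(1L).version();
        long seenByB = store.read(1L).version();
        store.update(1L, "edited by A", seenByA); // succeeds, version becomes 1
        try {
            store.update(1L, "edited by B", seenByB); // B still holds version 0
        } catch (IllegalStateException e) {
            System.out.println("Conflict detected: " + e.getMessage());
        }
    }
}
```

No database lock is held between read and write; the conflict surfaces only when the second writer tries to update, which is what makes the strategy optimistic.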

19. What is the difference between save() and saveAndFlush()?

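save() persists a new entity or merges an existing one; the SQL is usually deferred until the next flush or commit. saveAndFlush() does the same and then flushes immediately, so the SQL is sent to the database right away, but the transaction is still not committed.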

20. When should custom repository implementations be used?

Use custom repositories when derived methods, specifications, query annotations, or projections are not expressive enough.

7. Cheat Sheet

Entity Lifecycle

Concept | Quick Note
Transient | New object, not tracked
Managed | Tracked by persistence context
Detached | Previously tracked, now outside context
Removed | Scheduled for deletion
Dirty checking | Auto-persist changes to managed entities
Flush | Synchronizes persistence context with database

Fetching

Topic | Quick Note
Lazy | Load on access
Eager | Load immediately
Collection default | Lazy
To-one default | Eager
N+1 fix | Fetch join, entity graph, projection, batching

Cascades

Cascade | Propagates
PERSIST | Save new child
MERGE | Merge detached child
REMOVE | Delete child
REFRESH | Reload child
DETACH | Detach child
ALL | All cascade operations

Repositories

Method | Purpose
save() | Insert or update
findById() | Find by primary key
findAll() | Find all rows
delete() | Delete entity
existsById() | Check existence
count() | Count rows
flush() | Force synchronization

Query Method Patterns

Pattern | Example
Equality | findByEmail
Multiple conditions | findByStatusAndType
Range | findByCreatedAtBetween
Comparison | findByAgeGreaterThan
Null check | findByDeletedAtIsNull
Collection match | findByStatusIn
Sorting | findByStatusOrderByCreatedAtDesc

Annotations

Annotation | Purpose
@Entity | Marks a JPA entity
@Id | Primary key
@GeneratedValue | Primary key generation
@Table | Maps entity to table
@Column | Maps field to column
@OneToOne | One-to-one relationship
@ManyToOne | Many-to-one relationship
@OneToMany | One-to-many relationship
@ManyToMany | Many-to-many relationship
@JoinColumn | Foreign key column
@JoinTable | Join table mapping
@Query | Custom JPQL/native query
@Modifying | Update/delete query
@Transactional | Transaction boundary
@Version | Optimistic locking

Best Practices

  • Prefer lazy loading by default.
  • Use DTO projections for read-heavy screens.
  • Avoid exposing entities directly from REST APIs.
  • Keep transactions short.
  • Do not use CascadeType.ALL unless the parent truly owns the child lifecycle.
  • Avoid bidirectional relationships unless needed.
  • Use fetch joins or entity graphs intentionally, not globally.
  • Monitor SQL logs when tuning performance.