使用JPA和Hibernate时DISTINCT是如何工作的

8gsdolmq  于 2022-11-24  发布在  其他
关注(0)|答案(7)|浏览(179)

DISTINCT在JPA中使用什么列?可以更改它吗?
下面是一个使用DISTINCT的JPA查询示例:

select DISTINCT c from Customer c

这没有多大意义--distinct基于什么列?是因为我找不到注解而在实体上将其指定为注解吗?
我想指定要进行区分的列,如下所示:

select DISTINCT(c.name) c from Customer c

我使用MySQL和Hibernate。

0md85ypi

0md85ypi1#

你很接近了。

select DISTINCT(c.name) from Customer c
kuuvgm7e

kuuvgm7e2#

根据底层JPQL或Criteria API查询类型,DISTINCT在JPA中有两种含义。

标量查询

对于返回标量投影的标量查询,如以下查询:

List<Integer> publicationYears = entityManager.createQuery("""
    select distinct year(p.createdOn)
    from Post p
    order by year(p.createdOn)
    """, Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

应该将DISTINCT关键字传递给底层SQL语句,因为我们希望DB引擎在返回结果集之前过滤重复项:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]

休眠6
Hibernate 6可以自动删除父实体引用的重复项,因此不需要像Hibernate 5那样使用DISTINCT关键字。
因此,在运行以下查询时:

List<Post> posts = entityManager.createQuery("""
    select p
    from Post p
    left join fetch p.comments
    where p.title = :title
    """, Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

assertEquals(1, posts.size());
assertEquals(2, posts.get(0).getComments().size());

我们可以看到,提取了单个Post实体,即使它有两个关联的PostComment子实体。

Hibernate 5实体查询

在JPA中,对于实体查询,DISTINCT具有不同的含义。
如果不使用DISTINCT,则执行如下查询:

List<Post> posts = entityManager.createQuery("""
    select distinct p
    from Post p
    left join fetch p.comments
    where p.title = :title
    """, Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

将对postpost_comment表执行JOIN操作,如下所示:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

但是,每个相关post_comment数据列的结果集中,父项post记录都是重复的。因此,Post实体的List将包含重复的Post实体指涉。
为了消除Post实体引用,我们需要使用DISTINCT

List<Post> posts = entityManager.createQuery("""
    select distinct p
    from Post p
    left join fetch p.comments
    where p.title = :title
    """, Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

但是DISTINCT也被传递给SQL查询,这是完全不可取的:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

通过将DISTINCT传递给SQL查询,EXECUTION PLAN将执行一个额外的Sort阶段,该阶段会增加开销,但不会带来任何值,因为父子组合始终返回唯一的记录,这是因为存在子PK列:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms

使用HINT_PASS_DISTINCT_THROUGH执行Hibernate 5实体查询

要从执行计划中消除排序阶段,我们需要使用HINT_PASS_DISTINCT_THROUGH JPA查询提示:

List<Post> posts = entityManager.createQuery("""
    select distinct p
    from Post p
    left join fetch p.comments
    where p.title = :title
    """, Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

现在,SQL查询将不包含DISTINCT,但将删除Post实体引用重复项:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

执行计划将确认这次我们不再有额外的排序阶段:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms

如果您使用的是Hibernate 6,则不再需要QueryHints.HINT_PASS_DISTINCT_THROUGH,应该在从框架中删除它时将其删除。

0yycz8jy

0yycz8jy3#

更新:请查看得票最多的答案。
我自己的版本已过时。仅出于历史原因保留在此处。

在HQL中,Distinct通常在连接中需要,而不是在像您自己的示例这样的简单示例中。
另请参阅How do you create a Distinct query in HQL

yws3nbqq

yws3nbqq4#

@Entity
@NamedQuery(name = "Customer.listUniqueNames", 
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        public static List<String> listUniqueNames() {
             return = getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}
ltskdhd1

ltskdhd15#

我同意 kazanaki 的答案,这对我很有帮助。我想选择整个实体,所以我使用

select DISTINCT(c) from Customer c

在我的例子中,我有多对多关系,我想在一个查询中加载带有集合的实体。
我使用了LEFT JOIN FETCH,最后我必须使结果清晰。

0h4hbjxa

0h4hbjxa6#

我会使用JPA的构造函数表达式特性。
JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException:Table is not mapped
按照问题中的例子,应该是这样的。

SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
a0zr77ik

a0zr77ik7#

我添加了一个稍微具体的答案,以防有人遇到与我相同的问题并发现这个问题。
我使用了带有查询注解的JPQL(没有查询构建),并且我需要为一个实体获取不同的值,该实体是 * 嵌入 * 到另一个实体中的,关系是通过多对一注解Assert的。
我有两个数据库表:

*MainEntity,我希望它具有不同的值
*LinkEntity,这是MainEntity与另一个表之间的关系表。它有一个由其三列组成的复合主键。

在Java Spring代码中,这将导致实现三个类:

链接实体:

@Entity
@Immutable
@Table(name="link_entity")
public class LinkEntity implements Entity {

    @EmbeddedId
    private LinkEntityPK pk;

    // ... Getter, setter, toString()
}

链接实体主键:

@Embeddable
public class LinkEntityPK implements Entity, Serializable {

    /** The main entity we want to have distinct values of */
    @ManyToOne
    @JoinColumn(name = "code_entity")
    private MainEntity mainEntity;

    /** */
    @Column(name = "code_pk2")
    private String codeOperation;

    /** */
    @Column(name = "code_pk3")
    private String codeFonction;

主要实体:

@Entity
@Immutable
@Table(name = "main_entity")
public class MainEntity implements Entity {

    /** We use this for LinkEntity*/
    @Id
    @Column(name="code_entity")
    private String codeEntity;

    private String name;
    // And other attributes, getters and setters

因此,获取主实体的不同值的最终查询是:

@Repository
public interface EntityRepository extends JpaRepository<LinkEntity, String> {

    @Query(
        "Select " +
            "Distinct linkEntity.pk.intervenant " +
        "From " +
            "LinkEntity as linkEntity " +
            "Join MainEntity as mainEntity On " +
                 "mainEntity = linkEntity.pk.mainEntity ")
    List<MainEntity> getMainEntityList();

}

希望这能对某人有所帮助。

相关问题