java.lang.OutOfMemoryError: Java heap space in Hibernate entityIsPersistent

Problem description

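I am reading a 5 GB XML file with the code below and writing the data to the database with Spring Data JPA. The snippet is only sample logic; in the real code we do close the input stream and the xsr object.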

 XMLInputFactory xf = XMLInputFactory.newInstance();
 XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("test.xml")));
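
For context, a minimal sketch of how a StAX reader like the one above can hand records to the persistence code one at a time, so that the XML side never holds the whole file in memory. The class name `StaxRecordReader`, the method `forEachRecord`, and the element name `um` are hypothetical; the real file layout is not shown in the question, and the sketch assumes each record is a text-only element.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams an XML document and passes the text of every
// element named elementName to the sink, one record at a time.
final class StaxRecordReader {

    static int forEachRecord(Reader in, String elementName, Consumer<String> sink) {
        int count = 0;
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (xsr.hasNext()) {
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(xsr.getLocalName())) {
                        // getElementText() reads up to the matching end tag
                        // (valid only for text-only elements, an assumption here)
                        sink.accept(xsr.getElementText());
                        count++;
                    }
                }
            } finally {
                // XMLStreamReader is not AutoCloseable, and close() does not
                // close the underlying Reader; the caller still owns that.
                xsr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int n = forEachRecord(new StringReader("<ums><um>a</um><um>b</um></ums>"), "um", seen::add);
        System.out.println(n + " " + seen);
    }
}
```

In a real import the sink would build an entity and pass it to the batching code rather than collect values in a list.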

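I have configured a maximum heap of 8 GB (-Xms7000m and -Xmx8000m), but I still run into the Hibernate heap problem below while saving the data. It had inserted roughly 700,000 of the 2,100,000 records in total before it failed.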

[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space] with root cause

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.IdentityHashMap.resize(IdentityHashMap.java:472) ~[na:na]
    at java.base/java.util.IdentityHashMap.put(IdentityHashMap.java:441) ~[na:na]
    at org.hibernate.event.internal.DefaultPersistEventListener.entityIsPersistent(DefaultPersistEventListener.java:159) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:124) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl$$Lambda$1620/0x00000008010a3040.applyEventToListener(Unknown Source) ~[na:na]
    at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:113) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.persistOnFlush(SessionImpl.java:765) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.spi.CascadingActions$8.cascade(CascadingActions.java:341) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeToOne(Cascade.java:492) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:416) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:218) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeCollectionElements(Cascade.java:525) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeCollection(Cascade.java:456) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:419) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:218) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:151) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.cascadeOnFlush(AbstractFlushingEventListener.java:158) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.prepareEntityFlushes(AbstractFlushingEventListener.java:148) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:81) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl$$Lambda$1597/0x0000000801076040.accept(Unknown Source) ~[na:na]
    at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:102) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1362) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:453) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:3212) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:2380) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:447) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.beforeCompletionCallback(JdbcResourceLocalTransactionCoordinatorImpl.java:183) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.access$300(JdbcResourceLocalTransactionCoordinatorImpl.java:40) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:281) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:534) ~[spring-orm-5.2.10.RELEASE.jar:5.2.10.RELEASE]

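Based on the trace above, there seems to be a problem with the Hibernate save cascade, but I cannot figure it out. Below are the entity classes used to save the data to the database.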

@Data
@EqualsAndHashCode
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
public class UMEnityPK implements Serializable {
    private static final long serialVersionUID=1L;

    private String batchId;
    private Long batchVersion;
    private BigInteger umId;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"batchId","batchVersion","umId"})
@Entity
@Table(name ="um_base")
@IdClass(UMEnityPK.class)
public class UMBase {

    @Id private String batchId;
    @Id private Long batchVersion;
    @Id private BigInteger umId;

    private String firstName;
    private String lastName;
    private String umType;
    private String umLevel;

    @OneToMany(mappedBy = "umBase",cascade = CascadeType.ALL)
    private List<UMAddress> umAddresses;

    @OneToMany(mappedBy = "umBase",cascade = CascadeType.ALL)
    private List<UMIdentifier> umIdentifiers;

    @OneToOne(mappedBy = "umBase",cascade = CascadeType.ALL)
    private UMHierarchy umHierarchy;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"id"})
@Entity
@Table(name = "um_identifier")
public class UMIdentifier {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE,generator = "um_address")
    @SequenceGenerator(name = "um_address",sequenceName = "SEQ_UM_ADDRESS",allocationSize = 1)
    private Long id;

    private String idValue;
    private String idType;
    private String groupType;

    @ManyToOne
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID",referencedColumnName = "batchId"),
            @JoinColumn(name = "BATCH_VERSION",referencedColumnName = "batchVersion"),
            @JoinColumn(name = "UM_ID",referencedColumnName = "umId")
    })
    private UMBase umBase;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"batchId","umId"})
@Entity
@Table(name ="um_hierarchy")
public class UMHierarchy {

    @Id
    private String batchId;
    @Id private Long batchVersion;
    @Id private BigInteger umId;

    private String hierarchyTpe;
    private String umStatusCode;
    private String immediateParentId;
    private Date hierarchyDate;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID",referencedColumnName = "umId")
    })
    private UMBase umBase;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"id"})
@Entity
@Table(name = "um_address")
public class UMAddress {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE,allocationSize = 1)
    private Long id;
    private String addresstype;
    private String addressLine1;
    private String getAddressLine2;
    private String city;
    private String state;
    private String postalCode;
    private String country;

    @ManyToOne
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID",referencedColumnName = "umId")
    })
    private UMBase umBase;
}

Is there a problem in the Hibernate entity mapping that is consuming the memory?

Solution

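After checking the heap dump, the problem turned out to be a memory leak in org.hibernate.engine.StatefulPersistenceContext -> org.hibernate.util.IdentityMap, so I used the approach below and it works fine: create a custom JPARepository with the following sample method logic.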

public <S extends T> void saveInBatch(Iterable<S> entities) {

        if (entities == null) {
            return;
        }

        EntityManager entityManager = entityManagerFactory.createEntityManager();
        EntityTransaction entityTransaction = entityManager.getTransaction();

        try {
            entityTransaction.begin();

            int i = 0;
            for (S entity : entities) {
                if (i % batchSize == 0 && i > 0) {
                    entityTransaction.commit();
                    entityTransaction.begin();

                    entityManager.clear();
                }

                entityManager.persist(entity);
                i++;
            }
            entityTransaction.commit();
        } catch (RuntimeException e) {
            if (entityTransaction.isActive()) {
                entityTransaction.rollback();
            }

            throw e;
        } finally {
            entityManager.close();
        }
    }
}

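When you process and convert a data set this large, you need to do it in batches. Read 100 records from the XML, convert them to entities, save each one with em.persist(record), then call em.flush() and em.clear() to detach them from Hibernate, clear them from your local collection, and then request garbage collection manually with System.gc(). You may even want to use Hibernate's batching, as described in this tutorial.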

In pseudocode, this would be:

boolean finished = false;
List<Entity> locals = new ArrayList<>(100);
while (!finished) {
  for (int records = 0; records < 100; records++) {
    Entity ent = readEntityFrom(xml);
    // readEntity function must return null when no more remain to read
    if (ent == null) {
      finished = true;
      break;
    }
    locals.add(ent);
  }
  for (Entity ent : locals) em.persist(ent);
  em.flush(); // send any that are still waiting to the database
  em.clear(); // remove references Hibernate holds to these entities
  locals.clear(); // remove references we hold to these entities
  // nothing references these entities any more, so they are eligible for garbage collection
  System.gc(); // ask the JVM to collect them (only a hint; the JVM may ignore it)
}
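
The Hibernate batching mentioned above is switched on through configuration rather than code. The following is a rough sketch of the relevant settings, assuming an EntityManagerFactory that is bootstrapped from a property map; the class name `BatchingSettings` and the batch size of 50 are illustrative choices, not values from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the Hibernate settings that enable JDBC batching.
final class BatchingSettings {

    static Map<String, String> jdbcBatching(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        // Group this many statements into one JDBC batch
        props.put("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        // Sort statements by entity type so parent and child inserts batch well
        props.put("hibernate.order_inserts", "true");
        props.put("hibernate.order_updates", "true");
        return props;
    }

    public static void main(String[] args) {
        jdbcBatching(50).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Such a map can be passed to Persistence.createEntityManagerFactory(unitName, props); in a Spring Boot application the same keys go under the spring.jpa.properties. prefix. Keep in mind that Hibernate does not batch inserts for entities that use GenerationType.IDENTITY; the entities in the question use SEQUENCE, so that restriction should not apply here.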

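In addition, you may want to begin and commit a transaction manually around each insert loop, so that the database does not have to hold the entire import in a single transaction; otherwise it may be the database, rather than the Java application, that runs out of memory.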
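
One way to keep the chunking separate from the JPA calls is sketched below. The helper `ChunkedImport.inChunks` is hypothetical: it only splits a stream of records into fixed-size chunks and passes each chunk to a callback, and the callback is where a transaction would be begun, the entities persisted, and the persistence context flushed and cleared before the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: feeds records to writeChunk in groups of chunkSize.
final class ChunkedImport {

    static <T> int inChunks(Iterator<T> source, int chunkSize, Consumer<List<T>> writeChunk) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> chunk = new ArrayList<>(chunkSize);
        int chunks = 0;
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                writeChunk.accept(chunk);
                // Start a fresh list so the written chunk can be collected
                chunk = new ArrayList<>(chunkSize);
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            // Final, possibly smaller, chunk
            writeChunk.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> records = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        int chunks = inChunks(records, 3, chunk -> {
            // In a real import (assumption): begin a transaction, em.persist(...)
            // each element, em.flush(), em.clear(), then commit the transaction.
            System.out.println("writing " + chunk);
        });
        System.out.println(chunks + " chunks");
    }
}
```

Because each chunk gets its own transaction, a failure part-way through leaves the earlier chunks committed, so the import would need to be restartable or the partial data cleaned up.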
