Importing a huge CSV file (550,000+ rows) into Access

ffscu2ro posted on 2023-09-27 in Other

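I have a CSV file with 550,000+ rows.
I need to import this data into Access, but when I try, it throws an error saying the file is too large (1.7 GB).
Can you recommend a way to get this file into Access?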

pgvzfuti #1

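Try linking instead of importing ("Get External Data" -> "Link Tables" in Access 2003). Linking leaves the data in the CSV file and reads it directly from there. It does not impose a size limit (at least not anywhere near 1.7 GB). It may restrict some of your read/update operations, but it will at least get you started.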

gkl3eglg #2

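I would either try the CSV ODBC connector, or first import the file into a less limited database (MySQL, SQL Server) and import it into Access from there.
It seems that some versions of Access have a hard 2 GB limit on MDB files, so you may run into trouble either way.
Good luck.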

fjnneemd #3

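You can also use an ETL tool. Kettle is an open-source one (http://kettle.pentaho.org/) and is very easy to use. To import a file into a database you need a single transformation with 2 steps: CSV Text Input and Table Output.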

t2a7ltrp #4

Why are you using Access for huge files? Use SQL Server Express or Firebird instead.

nbnkbykc #5

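I remember that Access has a size limit of around 2 GB. Using the free SQL Server Express (limited to 4 GB per database in older releases, 10 GB since 2008 R2) or the free MySQL (no size limit) could be easier.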

p4rjhz4m #6

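Another option is to do away with the standard import functions and write your own. I have done this once before, when some specific logic had to be applied to the data before import. The basic structure is:
1. Open the file.
2. Get the first line.
3. Loop to the end of the line; each time we find a comma, move on to the next field.
4. Put the record into the database.
5. Get the next line and repeat.
I wrapped this in a transaction that committed every 100 rows, because that improved performance in my case; whether it helps will depend on your data.
However, I would say that linking the data, as others have suggested, is the best solution. Writing your own import is only an option if you absolutely must have the data in Access itself.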
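
As a rough illustration of that structure, here is a minimal Python sketch. It is not the answerer's code. The function name `import_csv_in_batches` and its parameters are hypothetical, and it assumes a DB-API 2.0 connection whose driver accepts `?` placeholders, for example pyodbc with the Access ODBC driver (neither is mentioned in this thread). It also assumes the target table already exists and that the CSV header row matches its column names. Python's `csv` module does the comma splitting, so quoted fields that contain commas stay intact.

```python
import csv


def import_csv_in_batches(conn, csv_path, table, batch_size=100):
    """Insert each data row of csv_path into table, committing every batch_size rows.

    conn: any DB-API 2.0 connection using '?' placeholders (assumption).
    Returns the number of rows inserted.
    """
    cursor = conn.cursor()
    inserted = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)      # splits on commas, honours quoting
        header = next(reader, None)      # first line: column names
        if header is None:
            return 0                     # empty file, nothing to import
        columns = ", ".join(f"[{name}]" for name in header)
        placeholders = ", ".join("?" for _ in header)
        sql = f"INSERT INTO [{table}] ({columns}) VALUES ({placeholders})"
        for row in reader:
            if not row:
                continue                 # skip blank lines
            # Any per-row logic (cleaning, filtering) would go here.
            cursor.execute(sql, row)
            inserted += 1
            if inserted % batch_size == 0:
                conn.commit()            # close the current batch
    conn.commit()                        # commit the final partial batch
    return inserted
```

A call such as `import_csv_in_batches(conn, "data.csv", "MyTable")` (both names are placeholders) returns the number of rows inserted. The table and column names are spliced into the SQL text, so they should come from a trusted source. Batching commits can help speed, but it does not raise the 2 GB file limit mentioned in the other answers.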

jslywgbw #7

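Access creates a lot of overhead, so even relatively small data sets can bloat the file to 2 GB, at which point it stops working. Here are a couple of straightforward ways to do the import. I have not tested them on huge files, but the concepts definitely work on regular files.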

Import data from a closed workbook (ADO)

If you want to import a lot of data from a closed workbook you can do this with ADO and the macro below. If you want to retrieve data from another worksheet than the first worksheet in the closed workbook, you have to refer to a user defined named range. The macro below can be used like this (in Excel 2000 or later):
GetDataFromClosedWorkbook "C:\FolderName\WorkbookName.xls", "A1:B21", ActiveCell, False
GetDataFromClosedWorkbook "C:\FolderName\WorkbookName.xls", "MyDataRange", Range("B3"), True

Sub GetDataFromClosedWorkbook(SourceFile As String, SourceRange As String, _
    TargetRange As Range, IncludeFieldNames As Boolean)
' requires a reference to the Microsoft ActiveX Data Objects library
' if SourceRange is a range reference:
'   this will return data from the first worksheet in SourceFile
' if SourceRange is a defined name reference:
'   this will return data from any worksheet in SourceFile
' SourceRange must include the range headers
'
Dim dbConnection As ADODB.Connection, rs As ADODB.Recordset
Dim dbConnectionString As String
Dim TargetCell As Range, i As Integer
    dbConnectionString = "DRIVER={Microsoft Excel Driver (*.xls)};" & _
        "ReadOnly=1;DBQ=" & SourceFile
    Set dbConnection = New ADODB.Connection
    On Error GoTo InvalidInput
    dbConnection.Open dbConnectionString ' open the database connection
    Set rs = dbConnection.Execute("[" & SourceRange & "]")
    Set TargetCell = TargetRange.Cells(1, 1)
    If IncludeFieldNames Then
        For i = 0 To rs.Fields.Count - 1
            TargetCell.Offset(0, i).Formula = rs.Fields(i).Name
        Next i
        Set TargetCell = TargetCell.Offset(1, 0)
    End If
    TargetCell.CopyFromRecordset rs
    rs.Close
    dbConnection.Close ' close the database connection
    Set TargetCell = Nothing
    Set rs = Nothing
    Set dbConnection = Nothing
    On Error GoTo 0
    Exit Sub
InvalidInput:
    MsgBox "The source file or source range is invalid!", _
        vbExclamation, "Get data from closed workbook"
End Sub

Another method that doesn't use the CopyFromRecordSet-method

With the macro below you can perform the import and have better control over the results returned from the RecordSet.

Sub TestReadDataFromWorkbook()
' fills data from a closed workbook in at the active cell
Dim tArray As Variant, r As Long, c As Long
    tArray = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:B21")
    If Not IsArray(tArray) Then Exit Sub ' the read failed, so there is nothing to write
    ' without using the transpose function
    For r = LBound(tArray, 2) To UBound(tArray, 2)
        For c = LBound(tArray, 1) To UBound(tArray, 1)
            ActiveCell.Offset(r, c).Formula = tArray(c, r)
        Next c
    Next r
    ' using the transpose function (has limitations)
'    tArray = Application.WorksheetFunction.Transpose(tArray)
'    For r = LBound(tArray, 1) To UBound(tArray, 1)
'        For c = LBound(tArray, 2) To UBound(tArray, 2)
'            ActiveCell.Offset(r - 1, c - 1).Formula = tArray(r, c)
'        Next c
'    Next r
End Sub

Private Function ReadDataFromWorkbook(SourceFile As String, SourceRange As String) As Variant
' requires a reference to the Microsoft ActiveX Data Objects library
' if SourceRange is a range reference:
'   this function can only return data from the first worksheet in SourceFile
' if SourceRange is a defined name reference:
'   this function can return data from any worksheet in SourceFile
' SourceRange must include the range headers
' examples:
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:A21")
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "A1:B21")
' varRecordSetData = ReadDataFromWorkbook("C:\FolderName\SourceWbName.xls", "DefinedRangeName")
Dim dbConnection As ADODB.Connection, rs As ADODB.Recordset
Dim dbConnectionString As String
    dbConnectionString = "DRIVER={Microsoft Excel Driver (*.xls)};ReadOnly=1;DBQ=" & SourceFile
    Set dbConnection = New ADODB.Connection
    On Error GoTo InvalidInput
    dbConnection.Open dbConnectionString ' open the database connection
    Set rs = dbConnection.Execute("[" & SourceRange & "]")
    On Error GoTo 0
    ReadDataFromWorkbook = rs.GetRows ' returns a two dim array with all records in rs
    rs.Close
    dbConnection.Close ' close the database connection
    Set rs = Nothing
    Set dbConnection = Nothing
    On Error GoTo 0
    Exit Function
InvalidInput:
    MsgBox "The source file or source range is invalid!", vbExclamation, "Get data from closed workbook"
    Set rs = Nothing
    Set dbConnection = Nothing
End Function

For very large files, you could try something like this:

INSERT INTO [Table] (Column1, Column2)
SELECT *
FROM [Excel 12.0 Xml;HDR=No;Database=C:\your_path\excel.xlsx].[SHEET1$];

SELECT * INTO [NewTable]
FROM [Excel 12.0 Xml;HDR=No;Database=C:\your_path\excel.xlsx].[SHEET1$];
