Best way to import large .txt files into a database in Laravel

Problem description

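I am importing AWStats files into a database. There are many files; the largest is about 10 MB and contains roughly 200K lines. Each file is divided into sections, one of which looks like this: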

BEGIN_GENERAL 8
LastLine 20150101000000 1379198 369425288 17319453580950
FirstTime 20141201000110
LastTime 20141231235951
LastUpdate 20150101000142 12317 0 12316 0 0
TotalVisits 146425
TotalUnique 87968
MonthHostsKnown 0
MonthHostsUnknown 103864
END_GENERAL

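This is a small section with little data; other sections contain thousands of lines. I am using Laravel and MySQL for this project, and I store each section in the table as JSON. Here is the controller code that saves the file data to the database: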

<?php

namespace App\Http\Controllers;

use Validator;
use App\Models\Site;
use Illuminate\Http\Request;
use App\Helpers\AwstatsDataParser;
use App\Jobs\ProcessNewSiteStats;

class SiteController extends Controller
{
    private $dir_path;

    public function __construct(){
        $this->dir_path = config('settings.files_path');
    }
    /**
     * Store a newly created resource in storage.
     *
     * @param  \Illuminate\Http\Request  $request
     * @return \Illuminate\Http\Response
     */
    public function store(Request $request)
    {
        $request->validate([
            'title' => 'required|string',
            'domain' => 'required|regex:/(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]/i|unique:sites,domain',
        ]);

        $site = Site::create([
            'title' => $request->title,
            'domain' => $request->domain,
            'status' => true,
        ]);

        // Import the stats file in the background instead of during the request.
        ProcessNewSiteStats::dispatch($site);

        return back()->with('success', 'Site is created Successfully');
    }
}

This controller action saves the site and dispatches a job that imports the current month's file into the database:

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use App\Models\Site;
use App\Models\Webstat;
use App\Helpers\AwstatsDataParser;

class ProcessNewSiteStats implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    private $site;
    private $dir_path;


    /**
     * Create a new job instance.
     *
     * @return void
     */
    public function __construct(Site $site)
    {
        $this->site = $site;
        $this->dir_path = config('settings.files_path');
    }

    /**
     * Execute the job.
     *
     * @return void
     */
    public function handle()
    {
        if (is_dir($this->dir_path)) {
            $year = date('Y');
            $month = date('m');

            $fileName = "awstats{$month}{$year}.{$this->site->domain}.txt";

            $files_path = "{$this->dir_path}/$fileName";

            if (file_exists($files_path)) {

                $parser = new AwstatsDataParser($files_path);
                $time = collect($parser->TIME);

                $webstat = Webstat::where('file_name',$fileName)->first();

                if(!$webstat){

                    // Summary columns plus every parsed section, all stored in one row.
                    $data = [
                        'file_name' => $fileName,
                        'month' => $month,
                        'year' => $year,
                        'total_visits' => $parser->GENERAL['TotalVisits'],
                        'total_unique' => $parser->GENERAL['TotalUnique'],
                        'total_hosts_known' => $parser->GENERAL['MonthHostsKnown'],
                        'total_hosts_unknown' => $parser->GENERAL['MonthHostsUnknown'],
                        'page_count' => $time->sum('Pages'),
                        'hit_count' => $time->sum('Hits'),
                        'bandwidth_count' => $time->sum('Bandwidth'),
                        'not_viewed_page_count' => $time->sum('NotViewedPages'),
                        'not_viewed_hit_count' => $time->sum('NotViewedHits'),
                        'not_viewed_bandwidth_count' => $time->sum('NotViewedBandwidth'),
                        'general' => $parser->GENERAL,
                        'time' => $parser->TIME,
                        'day' => $parser->DAY,
                        'login' => $parser->LOGIN,
                        'robot' => $parser->ROBOT,
                        'worms' => $parser->WORMS,
                        'email_sender' => $parser->EMAILSENDER,
                        'email_receiver' => $parser->EMAILRECEIVER,
                        'sider' => $parser->SIDER,
                        'domain' => $parser->DOMAIN,
                        'session' => $parser->SESSION,
                        'file_types' => $parser->FILETYPES,
                        'visitor' => $parser->VISITOR,
                        'downloads' => $parser->DOWNLOADS,
                        'os' => $parser->OS,
                        'browser' => $parser->BROWSER,
                        'screen_size' => $parser->SCREENSIZE,
                        'unknown_referer' => $parser->UNKNOWNREFERER,
                        'unknown_referer_browser' => $parser->UNKNOWNREFERERBROWSER,
                        'origin' => $parser->ORIGIN,
                        'se_referrals' => $parser->SEREFERRALS,
                        'page_refs' => $parser->PAGEREFS,
                        'search_words' => $parser->SEARCHWORDS,
                        'keywords' => $parser->KEYWORDS,
                        'misc' => $parser->MISC,
                        'errors' => $parser->ERRORS,
                        'cluster' => $parser->CLUSTER,
                        'sider_404' => $parser->SIDER_404,
                        'plugin_geoip_city_maxmind' => json_encode($parser->PLUGIN_geoip_city_maxmind, JSON_INVALID_UTF8_SUBSTITUTE),
                        'is_sync' => true,
                    ];

                    $webstats = $this->site->webstats()->create($data);

                }

            }

        }
    }
}

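The code works well for small files but not for large ones. I usually get a "MySQL server has gone away" error; the error type is reported as max_allocated_package (presumably this refers to MySQL's max_allowed_packet limit).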
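To see which sections are responsible, it may help to measure how large each one becomes once JSON-encoded. The following is a minimal sketch, assuming the parsed sections are available as a plain array; the function name `oversizedSections`, the sample data, and the 1024-byte budget are illustrative and not part of the code above.

```php
<?php

/**
 * Report which parsed sections exceed a byte budget once JSON-encoded.
 * Hypothetical helper for diagnosis only.
 *
 * @param array<string, mixed> $sections section name => parsed data
 * @return array<string, int> section name => encoded size in bytes, largest first
 */
function oversizedSections(array $sections, int $limitBytes): array
{
    $sizes = [];

    foreach ($sections as $name => $data) {
        $json = json_encode($data, JSON_INVALID_UTF8_SUBSTITUTE);
        if ($json === false) {
            continue; // Encoding failed; skip rather than guess a size.
        }

        $size = strlen($json); // strlen counts bytes, which is what matters here.
        if ($size > $limitBytes) {
            $sizes[$name] = $size;
        }
    }

    arsort($sizes); // Largest sections first, keys preserved.

    return $sizes;
}

// Illustrative data: one tiny section and one with a thousand rows.
$sections = [
    'GENERAL' => ['TotalVisits' => 146425, 'TotalUnique' => 87968],
    'SIDER'   => array_fill(0, 1000, ['/some/page', 10, 2048]),
];

$report = oversizedSections($sections, 1024);
```

In a real run the budget would be chosen relative to the server's configured packet limit, leaving headroom for the rest of the query.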

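I have tried the following improvements:

  • Saving the data in parts (for example, splitting the data into three parts, saving the required data first, and then updating the row with the remaining data)
  • Increasing the memory limit, execution time, and so on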
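The "save in parts" idea could be made less ad hoc by grouping rows into batches that each stay under a byte budget, so no single query grows with the file size. This is only a sketch: `batchByBytes` is a hypothetical helper, the row shape is made up, and the `webstat_rows` table in the comment is an assumption, not something in the existing schema.

```php
<?php

/**
 * Group rows into batches whose combined JSON-encoded size stays under $maxBytes.
 * A single row larger than the budget is still emitted, alone in its own batch.
 *
 * @param iterable<mixed> $rows
 * @return Generator<int, array<int, mixed>>
 */
function batchByBytes(iterable $rows, int $maxBytes): Generator
{
    $batch = [];
    $bytes = 0;

    foreach ($rows as $row) {
        $rowBytes = strlen((string) json_encode($row, JSON_INVALID_UTF8_SUBSTITUTE));

        // Flush the current batch before it would exceed the budget.
        if ($batch !== [] && $bytes + $rowBytes > $maxBytes) {
            yield $batch;
            $batch = [];
            $bytes = 0;
        }

        $batch[] = $row;
        $bytes += $rowBytes;
    }

    if ($batch !== []) {
        yield $batch; // Remaining rows.
    }
}

// Illustrative rows; each encodes to 30 bytes.
$rows = array_fill(0, 10, ['host' => '192.0.2.1', 'pages' => 3]);
$batches = iterator_to_array(batchByBytes($rows, 100), false);

// Inside the job, each batch could become its own query, for example
// (assumed table name): DB::table('webstat_rows')->insert($batch);
```

Because the function is a generator, only one batch is held in memory at a time when it is consumed with `foreach`.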

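However, I am looking for a proper way to save this data, because on top of it I have to write a scheduler and several other jobs that can import many files in a single request. Any good ideas or suggestions for this problem would be much appreciated.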
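One direction that might fit, sketched under assumptions: read the file as a stream and hand over one section at a time, so each section can be saved (or queued) separately instead of building one huge row. `readAwstatsSections` below is a hypothetical helper, not the `AwstatsDataParser` used above, and it relies only on the BEGIN_/END_ layout shown in the sample section.

```php
<?php

/**
 * Yield AWStats sections one at a time from an open stream.
 *
 * @param resource $handle readable stream positioned at the start of the file
 * @return Generator<string, array<int, array<int, string>>> section name => rows split on whitespace
 */
function readAwstatsSections($handle): Generator
{
    $name = null;
    $rows = [];

    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }

        if ($name === null) {
            // A section header looks like "BEGIN_GENERAL 8"; anything else
            // outside a section is ignored.
            if (preg_match('/^BEGIN_(\S+)/', $line, $m)) {
                $name = $m[1];
                $rows = [];
            }
            continue;
        }

        if ($line === 'END_' . $name) {
            yield $name => $rows; // Only this section is held in memory.
            $name = null;
            $rows = [];
            continue;
        }

        $rows[] = preg_split('/\s+/', $line);
    }
}

// Usage with an in-memory stream standing in for a real file handle.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "BEGIN_GENERAL 8\nTotalVisits 146425\nTotalUnique 87968\nEND_GENERAL\n");
rewind($handle);

$sections = iterator_to_array(readAwstatsSections($handle));
fclose($handle);
```

With a real file the handle would come from `fopen($files_path, 'r')`, and the loop body would save each section before reading the next, which keeps peak memory close to the size of the largest section rather than the whole file.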

Thanks
