How to Save Scraped Data to a Database

1. Create the migration

First, create a table to store the scraped data.

bash

php artisan make:migration create_scraped_data_table

Write the following in database/migrations/YYYY_MM_DD_HHMMSS_create_scraped_data_table.php (the date-time prefix is generated by the command).

php

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up()
    {
        Schema::create('scraped_data', function (Blueprint $table) {
            $table->id();
            $table->string('title')->nullable();
            $table->string('link_text')->nullable();
            $table->text('url');
            $table->timestamps();
        });
    }

    public function down()
    {
        Schema::dropIfExists('scraped_data');
    }
};

Run the migration to create the table.

bash

php artisan migrate
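
$table->string() creates a VARCHAR(255) column by default, but h1 text and link text have no length limit. Depending on the database, an over-long value is rejected with an error (MySQL in strict mode, PostgreSQL) or stored as-is (SQLite does not enforce the length). A minimal plain-PHP sketch of cutting values down before saving; fitVarchar is a hypothetical helper, not part of Laravel.

```php
<?php

// Hypothetical helper: trims a value so it fits a VARCHAR column.
// 255 is Laravel's default length for $table->string().
// mb_substr counts characters, not bytes, which matches how MySQL (utf8mb4)
// and PostgreSQL measure VARCHAR length.
function fitVarchar(?string $value, int $max = 255): ?string
{
    if ($value === null) {
        return null; // nullable columns can stay null
    }
    return mb_substr($value, 0, $max, 'UTF-8');
}

echo mb_strlen(fitVarchar(str_repeat('あ', 300)), 'UTF-8'), "\n"; // 255
```

When long values are expected, declaring title and link_text with $table->text(), as url already is, avoids the problem without cutting anything.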

2. Create the model

Create the model used to save data to the database.

bash

php artisan make:model ScrapedData

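Write the following in app/Models/ScrapedData.php. $table maps the model explicitly to the scraped_data table, and $fillable lists the columns that create() is allowed to fill (mass assignment).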

php

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Factories\HasFactory;
use Illuminate\Database\Eloquent\Model;

class ScrapedData extends Model
{
    use HasFactory;

    protected $table = 'scraped_data';

    protected $fillable = [
        'title',
        'link_text',
        'url',
    ];
}
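
To show what $fillable does during create(), here is a simplified plain-PHP illustration of the filtering step. It roughly mirrors the idea behind Eloquent's mass-assignment guard but is not Laravel's code; filterFillable is a hypothetical name.

```php
<?php

// Simplified illustration: keep only the attributes whose keys are in $fillable.
// Not Laravel's implementation, just the same idea in plain PHP.
function filterFillable(array $attributes, array $fillable): array
{
    return array_intersect_key($attributes, array_flip($fillable));
}

$fillable = ['title', 'link_text', 'url'];

$input = [
    'title'     => 'Example',
    'link_text' => 'Home',
    'url'       => '/',
    'id'        => 999, // not in $fillable, so it is dropped
];

print_r(filterFillable($input, $fillable));
```

This is why the keys passed to ScrapedData::create() must match the column names listed in $fillable: a key that is not listed never reaches the INSERT.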

3. Update WebScraperService

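Change the service so that it saves the scraped data to the database. Each link found on the page becomes one row, stored together with the text of the page's first h1 (or null if the page has none).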

app/Services/WebScraperService.php

php

<?php

namespace App\Services;

use GuzzleHttp\Client;
use DOMDocument;
use DOMXPath;
use App\Models\ScrapedData;

class WebScraperService
{
    protected $client;

    public function __construct()
    {
        $this->client = new Client();
    }

    public function scrape(string $url): array
    {
        // 1. Fetch the URL with Guzzle; the timeout (seconds) keeps the request from waiting forever
        $response = $this->client->request('GET', $url, [
            'timeout' => 10,
            'headers' => [
                'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            ]
        ]);

        if ($response->getStatusCode() !== 200) {
            throw new \Exception("Failed to fetch the URL: " . $url);
        }

        $html = (string) $response->getBody();

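        // 2. Parse the HTML with DOMDocument
        // Non-ASCII characters are turned into numeric entities so libxml reads UTF-8 correctly
        // (mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8') is deprecated since PHP 8.2)
        $previous = libxml_use_internal_errors(true); // suppress warnings for malformed HTML
        $dom = new DOMDocument();
        $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
        libxml_clear_errors();
        libxml_use_internal_errors($previous);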

        // 3. Extract data with DOMXPath
        $xpath = new DOMXPath($dom);

        // Collect the text of every h1 element
        $titles = [];
        foreach ($xpath->query('//h1') as $node) {
            $titles[] = trim($node->nodeValue);
        }

        // Collect the text and href of every a element that has an href attribute
        $links = [];
        foreach ($xpath->query('//a[@href]') as $node) {
            $linkText = trim($node->nodeValue);
            $linkHref = $node->getAttribute('href');

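            // Save to the database.
            // The array keys must match the column names and be listed in $fillable;
            // keys missing from $fillable are discarded by default.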
            ScrapedData::create([
                'title'     => $titles[0] ?? null,
                'link_text' => $linkText,
                'url'       => $linkHref,
            ]);

            $links[] = [
                'text' => $linkText,
                'href' => $linkHref,
            ];
        }

        return [
            'titles' => $titles,
            'links'  => $links,
        ];
    }
}
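
The parsing part of scrape() can be checked without network access or a database by pulling it into a pure function. A minimal sketch under that assumption: extractRows is a hypothetical helper that returns the rows scrape() would pass to ScrapedData::create(), one per link, all sharing the first h1.

```php
<?php

// Hypothetical helper: the parsing half of WebScraperService::scrape().
// Returns one row per <a href="..."> with the same keys as the scraped_data columns.
function extractRows(string $html): array
{
    $previous = libxml_use_internal_errors(true); // ignore malformed-HTML warnings
    $dom = new DOMDocument();
    // Encode non-ASCII characters as numeric entities so UTF-8 text survives parsing
    $dom->loadHTML(mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8'));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Text of the first h1 on the page, or null if there is none
    $h1 = $xpath->query('//h1')->item(0);
    $title = $h1 !== null ? trim($h1->nodeValue) : null;

    $rows = [];
    foreach ($xpath->query('//a[@href]') as $node) {
        $rows[] = [
            'title'     => $title,
            'link_text' => trim($node->nodeValue),
            'url'       => $node->getAttribute('href'),
        ];
    }
    return $rows;
}

$html = '<html><body><h1>ニュース</h1><a href="/a">First</a><a>No href</a>'
      . '<a href="https://example.com/b"> Second </a></body></html>';
print_r(extractRows($html)); // two rows; the <a> without href is skipped
```

With parsing isolated like this, the service itself only has to fetch the page and loop over the rows to save them.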

4. Update the controller

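scrape() runs the scraper, which now saves to the database as it goes, and returns the result as JSON. getScrapedData() returns every saved row.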

app/Http/Controllers/ScraperController.php

php

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use App\Services\WebScraperService;
use App\Models\ScrapedData;

class ScraperController extends Controller
{
    protected $scraperService;

    public function __construct(WebScraperService $scraperService)
    {
        $this->scraperService = $scraperService;
    }

    public function scrape()
    {
        $url = 'https://example.com'; // URL of the site to scrape

        try {
            $data = $this->scraperService->scrape($url);
            return response()->json($data);
        } catch (\Exception $e) {
            return response()->json(['error' => $e->getMessage()], 500);
        }
    }

    public function getScrapedData()
    {
        $data = ScrapedData::all();
        return response()->json($data);
    }
}

5. Set up the routes

Register one route that runs the scraper and one that returns the saved data.

routes/web.php

php

use App\Http\Controllers\ScraperController;
use Illuminate\Support\Facades\Route;

Route::get('/scrape', [ScraperController::class, 'scrape']);
Route::get('/scraped-data', [ScraperController::class, 'getScrapedData']);

6. Verify

Scraping the data

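Visiting the following URL runs the scraper and saves the results to the database. Every visit inserts the links again, so repeated visits create duplicate rows.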

http://localhost/scrape
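
Because ScrapedData::create() always inserts, a page that links to the same URL twice also produces two rows within a single run. A minimal sketch of dropping repeated URLs from a batch before saving; uniqueByUrl is a hypothetical helper. Across runs, Eloquent's firstOrCreate(['url' => ...], [...]) is one option: it looks a row up by url and inserts only when none is found.

```php
<?php

// Hypothetical helper: keeps the first row for each 'url' and drops later repeats.
function uniqueByUrl(array $rows): array
{
    $seen = [];
    $result = [];
    foreach ($rows as $row) {
        if (isset($seen[$row['url']])) {
            continue; // URL already kept: skip this row
        }
        $seen[$row['url']] = true;
        $result[] = $row;
    }
    return $result;
}

$rows = [
    ['title' => 'Top', 'link_text' => 'Home',  'url' => '/'],
    ['title' => 'Top', 'link_text' => 'About', 'url' => '/about'],
    ['title' => 'Top', 'link_text' => 'Logo',  'url' => '/'],
];

print_r(uniqueByUrl($rows)); // 2 rows: '/' (Home) and '/about'
```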

Checking the saved data

Visiting the following URL returns the scraped data stored in the database as JSON.

http://localhost/scraped-data
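
The url values in this output are the raw href attributes, so relative links such as /about or page.html are saved exactly as written on the page. If absolute URLs are needed, each href can be resolved against the scraped page's URL before saving. A simplified sketch; toAbsoluteUrl is a hypothetical helper that skips parts of RFC 3986 (it does not collapse . and .. segments).

```php
<?php

// Hypothetical helper: resolves an href against the page URL.
// Simplified: "." and ".." path segments are NOT normalized.
// Returns null for links that do not point to an http(s) page.
function toAbsoluteUrl(string $href, string $baseUrl): ?string
{
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null; // empty or same-page fragment
    }

    // Has a scheme: keep http(s), reject mailto:, javascript:, tel:, ...
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $href, $m)) {
        return in_array(strtolower($m[1]), ['http', 'https'], true) ? $href : null;
    }

    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return null; // base URL is not usable
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';

    if (str_starts_with($href, '//')) {
        return $base['scheme'] . ':' . $href; // protocol-relative
    }
    if ($href[0] === '/') {
        return $origin . $href;               // root-relative
    }
    if ($href[0] === '?') {
        return $origin . $path . $href;       // query-only
    }
    // Path-relative: replace everything after the last "/" of the base path
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

echo toAbsoluteUrl('page.html', 'https://example.com/docs/index.html'), "\n";
// https://example.com/docs/page.html
```

In WebScraperService::scrape() it would be applied to $linkHref with $url as the base, skipping links for which it returns null, since the url column does not accept null.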

Summary

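Create a migration → creates the scraped_data table
Create the ScrapedData model → works with the data
Update WebScraperService → saves data while scraping
Update the controller → runs the scraper and returns saved data
Set up the routes → trigger scraping and retrieve data
Verify → scrape with /scrape, view the saved data with /scraped-data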

Laravel can now save scraped data to a database!