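Cheerio v1.2.0 Documentation

Cheerio is a fast, flexible, and elegant library for parsing and manipulating HTML and XML on the server. It implements a subset of core jQuery, giving developers a familiar API while staying optimized for server-side environments.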

Features in the current version (1.2.0)

Core loading methods

The current version provides several ways to load and parse HTML/XML documents:

import * as fs from 'node:fs';
import * as cheerio from 'cheerio';

// Basic loading from string
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

// Loading from buffer with encoding detection
const buffer = fs.readFileSync('index.html');
const $fromBuffer = cheerio.loadBuffer(buffer);

// Loading from URL with automatic encoding detection
const $fromURL = await cheerio.fromURL('https://example.com');

// Stream-based loading for large documents: write string chunks to the
// returned Writable; the callback runs once the stream has ended
const stream = cheerio.stringStream({}, (err, $) => {
  if (!err) {
    console.log($('h1').text());
  }
});

Enhanced TypeScript support

Version 1.2.0 ships comprehensive TypeScript definitions with improved type safety:

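import * as cheerio from 'cheerio';
import type { Cheerio, CheerioAPI } from 'cheerio';
// The Element node type is defined by domhandler, the DOM package Cheerio builds on
import type { Element } from 'domhandler';

// Placeholder markup so the snippet is self-contained
const html = '<div class="my-class">Hello</div>';

// Strongly typed element selection
const $: CheerioAPI = cheerio.load(html);
const elements: Cheerio<Element> = $('.my-class');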

New extract API

A powerful new feature for extracting structured data from HTML documents:

const $root = $.root();
const data = $root.extract({
  title: 'h1',
  links: [{ selector: 'a', value: 'href' }],
  metadata: {
    selector: '.meta',
    value: {
      author: '.author',
      date: '.date'
    }
  }
});

Advanced URL handling

Enhanced support for resolving URLs against a baseURI:

const $ = cheerio.load(html, { 
  baseURI: 'https://example.com/page/' 
});

// Automatically resolves relative URLs
$('a').prop('href'); // Returns absolute URL
$('img').prop('src'); // Returns absolute URL

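Key features by category

DOM manipulation

jQuery-style element selection with CSS selectors
Comprehensive attribute and property manipulation
Full support for DOM traversal methods
Inserting, removing, and modifying elements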

Form handling

// Serialize forms to URL-encoded strings
$('form').serialize(); // 'name=value&email=test%40example.com'

// Get form data as structured arrays
$('form').serializeArray();
// [{ name: 'username', value: 'john' }, ...]

CSS and styling

// Get/set CSS properties
$('.element').css('color', 'red');
$('.element').css(['margin', 'padding']); // Get multiple properties

// Class manipulation
$('.item').addClass('active selected');
$('.item').removeClass('old-class');
$('.item').toggleClass('visible');

Data attributes

// HTML5 data-* attribute support with automatic type coercion
$('.widget').data('config'); // Parses JSON automatically
$('.widget').data('count', 42); // Set data programmatically

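Migrating from older versions

From 0.x to 1.x

Breaking change: Node.js 18.17 or later is required as of 1.0.0 (older Node.js versions are no longer supported)
Breaking change: some internal APIs have changed
Improvement: better TypeScript support throughout
New: stream-based parsing for better memory efficiency

Notable changes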

// Old way (0.x) - still works but discouraged
const cheerio = require('cheerio');
const $ = cheerio.load(html);

// New way (1.x) - recommended
import * as cheerio from 'cheerio';
const $ = cheerio.load(html);

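Performance improvements

Version 1.2.0 includes performance improvements in the following areas:

Faster parsing: an HTML parser with improved error handling
Memory efficiency: a smaller memory footprint on large documents
Streaming support: parse large documents from a stream instead of first buffering the entire input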

// Stream processing for large files
const stream = cheerio.decodeStream({
  encoding: { defaultEncoding: 'utf8' }
}, (err, $) => {
  // Called once the stream has ended and the whole document has been parsed
});

Advanced features

Custom parser options

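// XML mode: htmlparser2 options live under the `xml` key
const $xml = cheerio.load(xml, {
  xml: { xmlMode: true, decodeEntities: false }, // Parse as XML; don't decode entities
});
// HTML mode: parse5 options such as scriptingEnabled stay at the top level
const $ = cheerio.load(html, {
  scriptingEnabled: false, // Parse <noscript> contents as markup instead of raw text
});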

Loading over the network with options

const $ = await cheerio.fromURL('https://api.example.com/data', {
  requestOptions: {
    headers: { 'User-Agent': 'MyBot/1.0' }
  },
  encoding: { defaultEncoding: 'utf8' }
});

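Best practices

Use TypeScript: take advantage of the comprehensive type definitions
Stream large documents: use the streaming API for better memory management
Use the extract API: use the new extract method for structured data extraction
Set baseURI: when scraping websites, set baseURI so that relative URLs resolve correctly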

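Browser vs. server differences

Cheerio is designed for server-side use. It removes browser-specific jQuery features and adds server-oriented ones, such as:

No browser DOM inconsistencies
Better error handling for malformed HTML
Memory-efficient parsing
Stream processing
Automatic encoding detection

This makes Cheerio well suited to web scraping, server-side rendering, and other HTML/XML processing tasks in Node.js environments.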
