Cheerio v1.2.0 Documentation

Cheerio is a fast, flexible, and elegant library for parsing and manipulating HTML and XML on the server side. It implements a subset of jQuery's core functionality, providing a familiar API for developers while being optimized for server-side environments.

Current Version (1.2.0) Features

Core Loading Methods

The current version provides several powerful ways to load and parse HTML/XML documents:

import * as cheerio from 'cheerio';
import * as fs from 'node:fs';

// Basic loading from string
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

// Loading from buffer with encoding detection
const buffer = fs.readFileSync('index.html');
const $fromBuffer = cheerio.loadBuffer(buffer);

// Loading from URL with automatic encoding detection
const $fromUrl = await cheerio.fromURL('https://example.com');

// Stream-based loading for large documents:
// the callback runs once the input has been fully parsed
const stream = cheerio.stringStream({}, (err, $doc) => {
  if (!err) {
    console.log($doc('h1').text());
  }
});
fs.createReadStream('index.html', { encoding: 'utf8' }).pipe(stream);

Enhanced TypeScript Support

Version 1.2.0 includes comprehensive TypeScript definitions with improved type safety:

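import * as cheerio from 'cheerio';
import type { CheerioAPI, Cheerio } from 'cheerio';
import type { Element } from 'domhandler'; // DOM node types come from domhandler, a Cheerio dependency

const html = '<p class="my-class">Hello</p>'; // sample input

// Strongly typed element selection
const $: CheerioAPI = cheerio.load(html);
const elements: Cheerio<Element> = $('.my-class');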
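
As one illustration of how these types can be used, the sketch below passes a CheerioAPI instance into a small typed helper. The helper name textsOf and the sample markup are assumptions made for this example; they are not part of Cheerio.

```typescript
import * as cheerio from 'cheerio';
import type { CheerioAPI } from 'cheerio';

// Hypothetical helper (not part of Cheerio): returns the trimmed text
// of every element that matches `selector`.
function textsOf($: CheerioAPI, selector: string): string[] {
  return $(selector)
    .map((_, el) => $(el).text().trim())
    .get();
}

// Sample markup invented for this example.
const $ = cheerio.load(
  '<ul><li class="my-class"> One </li><li class="my-class">Two</li></ul>'
);

const texts: string[] = textsOf($, '.my-class');
console.log(texts); // [ 'One', 'Two' ]
```

Typing the parameter as CheerioAPI lets the compiler check both the selector call and the string[] return value.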

New Extract API

A powerful new feature for data extraction from HTML documents:

const data = $.extract({
  title: 'h1',
  links: [{ selector: 'a', value: 'href' }],
  metadata: {
    selector: '.meta',
    value: {
      author: '.author',
      date: '.date'
    }
  }
});
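
To make the shape of the result concrete, here is a minimal sketch that runs the same extraction map against a small document. The markup and its values are invented for this example.

```typescript
import * as cheerio from 'cheerio';

// Sample markup invented for this example.
const $ = cheerio.load(`
  <h1>Release notes</h1>
  <div class="meta">
    <span class="author">Ada</span>
    <span class="date">2024-08-09</span>
  </div>
  <a href="/docs">Docs</a>
  <a href="/blog">Blog</a>
`);

const data = $.extract({
  // A plain selector yields the text of the first match.
  title: 'h1',
  // An array descriptor yields one value per match; here, each href.
  links: [{ selector: 'a', value: 'href' }],
  // A nested map is evaluated inside the element matched by `selector`.
  metadata: {
    selector: '.meta',
    value: { author: '.author', date: '.date' },
  },
});

console.log(data);
// {
//   title: 'Release notes',
//   links: [ '/docs', '/blog' ],
//   metadata: { author: 'Ada', date: '2024-08-09' }
// }
```

No baseURI is set here, so the links come back exactly as written in the markup.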

Advanced URL Handling

Enhanced support for URL resolution with baseURI:

const $ = cheerio.load(html, { 
  baseURI: 'https://example.com/page/' 
});

// Automatically resolves relative URLs
$('a').prop('href'); // Returns absolute URL
$('img').prop('src'); // Returns absolute URL
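
The difference between reading the raw attribute and reading the resolved property can be sketched as follows. The markup and URLs are placeholders chosen for this example.

```typescript
import * as cheerio from 'cheerio';

// Sample markup invented for this example.
const $ = cheerio.load(
  '<a href="../docs">Docs</a><img src="logo.png">',
  { baseURI: 'https://example.com/page/' }
);

// attr() returns the attribute exactly as written in the markup.
const rawHref = $('a').attr('href');

// prop() resolves href and src against baseURI.
const absHref = $('a').prop('href');
const absSrc = $('img').prop('src');

console.log(rawHref); // ../docs
console.log(absHref); // https://example.com/docs
console.log(absSrc);  // https://example.com/page/logo.png
```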

Key Features by Category

DOM Manipulation

jQuery-style element selection with CSS selectors
Comprehensive attribute and property manipulation
Full support for DOM traversal methods
Element insertion, removal, and modification
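
A short sketch of the four capabilities listed above, using a small list invented for this example:

```typescript
import * as cheerio from 'cheerio';

// Sample markup invented for this example.
const $ = cheerio.load(
  '<ul id="fruits"><li class="apple">Apple</li><li class="pear">Pear</li></ul>'
);

// Selection and traversal: the sibling that follows .apple
const afterApple = $('.apple').next().text();

// Attribute manipulation
$('.apple').attr('data-rank', '1');
const rank = $('.apple').attr('data-rank');

// Insertion and removal
$('#fruits').append('<li class="plum">Plum</li>');
$('.pear').remove();

const remaining = $('#fruits li')
  .map((_, el) => $(el).text())
  .get();

console.log(afterApple); // Pear
console.log(rank);       // 1
console.log(remaining);  // [ 'Apple', 'Plum' ]
```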

Form Handling

// Serialize forms to URL-encoded strings
$('form').serialize(); // 'name=value&email=test%40example.com'

// Get form data as structured arrays
$('form').serializeArray();
// [{ name: 'username', value: 'john' }, ...]
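
The two calls can be compared on a concrete form. The sketch below uses markup invented for this example; the submit button has no name attribute, so it does not appear in either result.

```typescript
import * as cheerio from 'cheerio';

// Sample form invented for this example.
const $ = cheerio.load(`
  <form>
    <input name="username" value="john">
    <input name="email" value="test@example.com">
    <button type="submit">Send</button>
  </form>
`);

// URL-encoded string; note that "@" is percent-encoded.
const query = $('form').serialize();

// One { name, value } object per submittable field, values left unencoded.
const fields = $('form').serializeArray();

console.log(query);
// username=john&email=test%40example.com
console.log(fields);
// [
//   { name: 'username', value: 'john' },
//   { name: 'email', value: 'test@example.com' }
// ]
```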

CSS and Styling

// Get/set CSS properties
$('.element').css('color', 'red');
$('.element').css(['margin', 'padding']); // Get multiple properties

// Class manipulation
$('.item').addClass('active selected');
$('.item').removeClass('old-class');
$('.item').toggleClass('visible');
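
As a rough illustration of what these calls return, the sketch below applies them to a single element. Cheerio reads and writes the inline style attribute only; there is no stylesheet or computed style on the server. The markup is invented for this example.

```typescript
import * as cheerio from 'cheerio';

// Sample markup invented for this example.
const $ = cheerio.load(
  '<div class="item old-class" style="margin: 0; padding: 4px"></div>'
);
const $item = $('.item');

// Reading several properties returns an object keyed by property name.
const spacing = $item.css(['margin', 'padding']);

// Setting a property updates the inline style attribute.
$item.css('color', 'red');
const color = $item.css('color');

// Class methods are chainable.
$item.addClass('active selected').removeClass('old-class').toggleClass('visible');

console.log(spacing);                    // { margin: '0', padding: '4px' }
console.log(color);                      // red
console.log($item.hasClass('visible'));  // true
console.log($item.hasClass('old-class')); // false
```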

Data Attributes

// HTML5 data-* attribute support with automatic type coercion
$('.widget').data('config'); // Parses JSON automatically
$('.widget').data('count', 42); // Set data programmatically
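
The coercion rules are easier to see with concrete values. In this sketch (markup invented for the example), a JSON attribute becomes an object and a numeric attribute becomes a number. Setting a value with data() changes only the in-memory data and leaves the underlying attribute untouched.

```typescript
import * as cheerio from 'cheerio';

// Sample markup invented for this example.
const $ = cheerio.load(
  `<div class="widget" data-config='{"theme":"dark"}' data-count="3"></div>`
);
const $widget = $('.widget');

// JSON-looking values are parsed into objects.
const config = $widget.data('config') as { theme: string };

// Numeric strings are coerced to numbers.
const countFromAttr = $widget.data('count');

// Setting a value stores it in memory; the attribute is not rewritten.
$widget.data('count', 42);
const countAfterSet = $widget.data('count');
const attrValue = $widget.attr('data-count');

console.log(config.theme);   // dark
console.log(countFromAttr);  // 3
console.log(countAfterSet);  // 42
console.log(attrValue);      // 3 (still the original string)
```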

Migration from Older Versions

From 0.x to 1.x

Breaking: Node.js 18.17 or later required as of 1.0.0 (support for older versions dropped)
Breaking: The deprecated default export was removed, and htmlparser2 options are now read only from the xml option
Improved: Better TypeScript support throughout
New: Stream-based parsing for better memory efficiency

Key Changes to Watch

// Old way (0.x) - CommonJS require still works in 1.x
const cheerio = require('cheerio');
const $ = cheerio.load(html);

// New way (1.x) - recommended; there is no default export, so use a namespace import
import * as cheerio from 'cheerio';
const $ = cheerio.load(html);

Performance Improvements

Version 1.2.0 includes significant performance enhancements:

Faster parsing: Improved HTML parser with better error handling
Memory efficiency: Reduced memory footprint for large documents
Streaming support: Feed large documents to the parser in chunks instead of buffering the whole input first

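// Stream processing for large files
const stream = cheerio.decodeStream({
  encoding: { defaultEncoding: 'utf8' }
}, (err, $) => {
  // Called once, after the whole document has been parsed
});

// Pipe raw bytes into the stream; 'large.html' is a placeholder path
// and fs is the node:fs module
fs.createReadStream('large.html').pipe(stream);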

Advanced Features

Custom Parser Options

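// XML: htmlparser2 options go under the `xml` key
const $xml = cheerio.load(xml, {
  xml: { xmlMode: true, decodeEntities: false } // Parse as XML, don't decode entities
});

// HTML: parse5 options are passed at the top level
const $ = cheerio.load(html, {
  scriptingEnabled: false // Parse <noscript> contents as markup instead of raw text
});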
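
For XML input specifically, a minimal sketch using the `xml: true` shorthand might look like the following. The document is invented for this example. In XML mode, self-closing tags such as `<item/>` are treated as complete elements, so the two items end up as siblings.

```typescript
import * as cheerio from 'cheerio';

// Sample XML invented for this example.
const xml = '<catalog><item sku="A1"/><item sku="B2"/></catalog>';

// `xml: true` switches Cheerio to its XML parser with default settings.
const $ = cheerio.load(xml, { xml: true });

const itemCount = $('catalog').children().length;
const skus = $('item')
  .map((_, el) => $(el).attr('sku'))
  .get();

console.log(itemCount); // 2
console.log(skus);      // [ 'A1', 'B2' ]
```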

Network Loading with Options

const $ = await cheerio.fromURL('https://api.example.com/data', {
  requestOptions: {
    headers: { 'User-Agent': 'MyBot/1.0' }
  },
  encoding: { defaultEncoding: 'utf8' }
});

Best Practices

Use TypeScript: Take advantage of the comprehensive type definitions
Stream large documents: Use streaming APIs for better memory management
Leverage the extract API: Use the new extract method for structured data extraction
Set baseURI: When scraping websites, set baseURI for proper URL resolution

Browser vs Server Differences

Cheerio is designed for server-side use and removes browser-specific jQuery features while adding server-optimized functionality like:

No browser DOM inconsistencies
Better error handling for malformed HTML
Memory-efficient parsing
Stream processing capabilities
Automatic encoding detection

This makes Cheerio ideal for web scraping, server-side rendering, and any HTML/XML processing tasks in Node.js environments.
