Cheerio v1.2.0 Documentation
Cheerio is a fast, flexible, and elegant library for parsing and manipulating HTML and XML on the server side. It implements a subset of jQuery's core functionality, providing a familiar API for developers while being optimized for server-side environments.
Current Version (1.2.0) Features
Core Loading Methods
The current version provides several powerful ways to load and parse HTML/XML documents:
import * as fs from 'node:fs';
import * as cheerio from 'cheerio';
// Basic loading from a string
const $ = cheerio.load('<h2 class="title">Hello world</h2>');
// Loading from a buffer, with the encoding sniffed from the bytes
const buffer = fs.readFileSync('index.html');
const $fromBuffer = cheerio.loadBuffer(buffer);
// Loading from a URL, with the encoding sniffed from the response
const $fromURL = await cheerio.fromURL('https://example.com');
// Stream-based loading: returns a writable stream; the callback runs once the input has ended
const stream = cheerio.stringStream({}, (err, $doc) => {
  if (!err) {
    console.log($doc('h1').text());
  }
});
Enhanced TypeScript Support
Version 1.2.0 includes comprehensive TypeScript definitions with improved type safety:
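import * as cheerio from 'cheerio';
import type { CheerioAPI, Cheerio } from 'cheerio';
// Element is defined by domhandler, the DOM implementation Cheerio builds on
import type { Element } from 'domhandler';
// Strongly typed element selection
const html = '<p class="my-class">Hello</p>';
const $: CheerioAPI = cheerio.load(html);
const elements: Cheerio<Element> = $('.my-class');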
New Extract API
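The extract method pulls structured data out of a document in a single call. Each key maps to a selector string (the text of the first match), an array (one entry per match), or an object with a selector and a value to read:
const data = $.extract({
  title: 'h1',
  links: [{ selector: 'a', value: 'href' }],
  metadata: {
    selector: '.meta',
    value: {
      author: '.author',
      date: '.date'
    }
  }
});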
Advanced URL Handling
Enhanced support for URL resolution with baseURI:
const $ = cheerio.load(html, {
  baseURI: 'https://example.com/page/'
});
// Automatically resolves relative URLs
$('a').prop('href'); // Returns absolute URL
$('img').prop('src'); // Returns absolute URL
Key Features by Category
DOM Manipulation
- jQuery-style element selection with CSS selectors
- Comprehensive attribute and property manipulation
- Full support for DOM traversal methods
- Element insertion, removal, and modification
Form Handling
// Serialize forms to URL-encoded strings
$('form').serialize(); // 'name=value&email=test%40example.com'
// Get form data as structured arrays
$('form').serializeArray();
// [{ name: 'username', value: 'john' }, ...]
CSS and Styling
// Get/set CSS properties (reads and writes the inline style attribute only; stylesheets are not applied)
$('.element').css('color', 'red');
$('.element').css(['margin', 'padding']); // Get multiple properties
// Class manipulation
$('.item').addClass('active selected');
$('.item').removeClass('old-class');
$('.item').toggleClass('visible');
Data Attributes
// HTML5 data-* attribute support with automatic type coercion
$('.widget').data('config'); // Parses JSON automatically
$('.widget').data('count', 42); // Set data programmatically
Migration from Older Versions
From 0.x to 1.x
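- Breaking: Cheerio 1.0.0 requires Node.js 18.17 or later (support for older versions was dropped)
- Breaking: The deprecated default export was removed, so import cheerio from 'cheerio' no longer works; some internal APIs have also changed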
- Improved: Better TypeScript support throughout
- New: Stream-based parsing for better memory efficiency
Key Changes to Watch
// The two styles below are alternatives; use one or the other
// CommonJS (0.x style) - still works
const cheerio = require('cheerio');
const $ = cheerio.load(html);
// ES modules (1.x) - recommended; use a namespace import
import * as cheerio from 'cheerio';
const $ = cheerio.load(html);
Performance Improvements
Version 1.2.0 includes significant performance enhancements:
- Faster parsing: Improved HTML parser with better error handling
- Memory efficiency: Reduced memory footprint for large documents
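- Streaming support: Feed large inputs to the parser in chunks instead of buffering the whole source first (the parsed document is still built in memory)
// Stream processing for large files
const stream = cheerio.decodeStream({
  encoding: { defaultEncoding: 'utf8' }
}, (err, $) => {
  // Called once, after the input has ended and the document is fully parsed
});
// Write or pipe the raw bytes into `stream`; it is a writable stream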
Advanced Features
Custom Parser Options
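const $ = cheerio.load(html, {
  // htmlparser2 options go under `xml`; setting it switches away from the default parse5 parser
  xml: {
    xmlMode: true, // Parse as XML
    decodeEntities: false // Don't decode entities
  },
  // parse5 (HTML) option, ignored when `xml` is set: false parses <noscript> content as markup
  scriptingEnabled: false
});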
Network Loading with Options
const $ = await cheerio.fromURL('https://api.example.com/data', {
  requestOptions: {
    headers: { 'User-Agent': 'MyBot/1.0' }
  },
  encoding: { defaultEncoding: 'utf8' }
});
Best Practices
- Use TypeScript: Take advantage of the comprehensive type definitions
- Stream large documents: Use streaming APIs for better memory management
- Leverage the extract API: Use the new extract method for structured data extraction
- Set baseURI: When scraping websites, set baseURI for proper URL resolution
Browser vs Server Differences
Cheerio is designed for server-side use. It drops browser-specific jQuery features (it does not render pages, apply CSS, load external resources, or execute JavaScript) and in exchange offers:
- No browser DOM inconsistencies
- Better error handling for malformed HTML
- Memory-efficient parsing
- Stream processing capabilities
- Automatic encoding detection
This makes Cheerio ideal for web scraping, server-side rendering, and any HTML/XML processing tasks in Node.js environments.