Garbled Characters in Bing Index: When zstd Meets Search Engine Crawlers

The blog had zstd, gzip, and brotli compression enabled simultaneously. Since zstd achieves compression ratios similar to gzip's while using significantly less CPU, I made zstd the default compression method. Compatibility measures were in place: when a browser's accept-encoding header does not include zstd, the server falls back to gzip or brotli, and the cache stores separate variants keyed on the accept-encoding header. However, when I recently searched for my site, I discovered that some pages appeared as garbled characters.
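The fallback logic amounts to simple content negotiation. Here is a minimal Python sketch of what the server is supposed to do; the actual negotiation happens inside the web server, and the exact preference order among br and gzip is my assumption, not something the config above pins down.

```python
def choose_encoding(accept_encoding: str) -> str:
    """Pick a Content-Encoding from the client's Accept-Encoding header.

    Illustrative only: prefer zstd, then fall back to brotli or gzip.
    The br-before-gzip ordering is an assumption.
    """
    # Accept-Encoding is a comma-separated list, optionally with
    # quality values like "gzip;q=0.8" -- strip those off.
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    for candidate in ("zstd", "br", "gzip"):
        if candidate in offered:
            return candidate
    return "identity"  # nothing supported: send the page uncompressed
```

If this logic holds, a crawler that never advertises zstd should never receive a zstd body, which is exactly the invariant that was violated here.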

Garbled Characters in Bing Search Results

Checking Bing Webmaster Tools revealed:

SEO Issues Reported by Bing

HTML Content Crawled by Bing

HTTP Response Indicates Bing Spider Crawled a zstd-Encoded Page

This is a normally indexed page:

Normally Indexed Page is Compressed with gzip

In theory, if Bing's spider didn't send an accept-encoding header containing zstd, zstd compression shouldn't be used. However, I couldn't identify exactly where the failure occurred. Ultimately, I reverted to gzip as the default compression.
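One way to confirm what the crawler actually received is to sniff the raw response body: the "garbled text" of a mis-indexed page is just a compressed frame rendered as characters, and zstd and gzip both begin with well-known magic bytes. A small sketch (brotli has no magic number, so it cannot be identified this way):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic, little-endian 0xFD2FB528
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header

def sniff_encoding(payload: bytes) -> str:
    """Guess whether a raw response body is zstd- or gzip-compressed
    by checking its leading magic bytes."""
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"
```

Running this over the bytes Bing shows in Webmaster Tools would distinguish "the server sent zstd to a client that didn't ask for it" from a cache serving the wrong stored variant, though in my case the root cause remained unclear either way.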

By the way, let's examine the compression ratios. Compression levels were set to 6 for zstd, 5 for gzip, and 4 for brotli. Article 1 contains more code blocks.

| Compression Algorithm | Article 1 | Article 2 |
|-----------------------|-----------|-----------|
| Uncompressed          | 309 KB    | 69.6 KB   |
| zstd                  | 27.9 KB   | 17.2 KB   |
| gzip                  | 39.5 KB   | 16.8 KB   |
| br                    | 28.5 KB   | 17 KB     |
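For anyone who wants to reproduce this kind of comparison, the ratio itself is just compressed size over original size. A minimal sketch using only the standard library's gzip module (the zstd and brotli numbers would additionally need the third-party `zstandard` and `brotli` packages, which I'm not assuming here):

```python
import gzip

def ratio(data: bytes, level: int = 5) -> float:
    """Compressed-to-original size ratio for gzip at the given level
    (level 5 matches the setting used for the table above)."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)

# Highly repetitive content, like pages full of code blocks,
# compresses far better than ordinary prose -- which is consistent
# with Article 1 (more code blocks) shrinking more than Article 2.
sample = b"<p>hello compression</p>" * 1000
print(f"gzip level 5 ratio: {ratio(sample):.3f}")
```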
