I recently came across an SEO test that attempted to verify whether compression ratio affects rankings. Some people apparently believe that higher compression ratios correlate with lower rankings. Before deciding whether that is an SEO myth, it helps to understand compressibility in the context of SEO by reading both the original source on compression ratios and the research paper itself.
Search Engines Compress Web Pages
Compressibility, in the context of search engines, refers to how much a web page can be compressed. Shrinking a document into a zip file is an example of compression. Search engines compress indexed web pages because it saves space and speeds up processing. It's something that all search engines do.
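To make the idea concrete, here is a minimal sketch of lossless compression using Python's standard `gzip` module. The page content is a hypothetical, deliberately repetitive example, and gzip is only a stand-in: the article doesn't say which storage format any search engine actually uses.

```python
import gzip

# Hypothetical, deliberately repetitive HTML page used only for illustration.
page = (
    "<html><body>"
    + "<p>Web pages often contain a lot of repeated markup.</p>" * 200
    + "</body></html>"
).encode("utf-8")

# Assumption: gzip stands in for whatever format an index really uses.
stored = gzip.compress(page)

# Compression is lossless: decompressing returns the exact original bytes.
restored = gzip.decompress(stored)

print(f"original: {len(page)} bytes, compressed: {len(stored)} bytes")
assert restored == page
```

The repeated paragraph compresses to a small fraction of its original size, which is why storing compressed pages saves space without losing any content.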
Websites & Host Providers Compress Web Pages
Web page compression is a good thing because it helps search crawlers access web pages quickly, which in turn signals to Googlebot that crawling won't strain the server and that it's okay to fetch even more pages for indexing.
Compression speeds up websites, giving site visitors a better user experience. Most web hosts enable compression automatically because it's good for websites and site visitors, and it's also good for web hosts because it reduces bandwidth usage.
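On the server side, compression typically works through HTTP content negotiation: the browser or crawler lists the encodings it accepts in the `Accept-Encoding` request header, and the server may respond with a gzip-compressed body and a `Content-Encoding: gzip` header. The sketch below is a simplified, hypothetical helper (`compress_response` is not part of any real web server's API). It honors `q=0` but otherwise ignores quality values and the `*` wildcard.

```python
import gzip


def _accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header lists gzip with a nonzero q-value."""
    for part in accept_encoding.split(","):
        token, _, params = part.partition(";")
        if token.strip().lower() != "gzip":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False


def compress_response(body: bytes, accept_encoding: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: gzip the body only if the client accepts gzip."""
    headers = {"Vary": "Accept-Encoding"}  # caches must key on this request header
    if _accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body
```

For example, `compress_response(html_bytes, "gzip, deflate, br")` returns a gzip-compressed body, while a client that sends `identity` receives the body unchanged.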
Everybody wins with website compression.
High Levels Of Compression Correlate With Spam
Researchers at a search engine discovered that highly compressible web pages correlated with low-quality content. The study, called Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages (PDF), was conducted in 2006 by two of the world's leading researchers, Marc Najork and Dennis Fetterly.
Najork currently works at DeepMind as a Distinguished Research Scientist. Fetterly, a software engineer at Google, has authored many important research papers on search, content analysis and related topics. This isn't just any research paper; it's an important one.
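The 2006 research paper shows that 70% of web pages with a compression ratio of 4.0 or higher were low-quality pages with a high level of redundant word usage. The compression ratio is the uncompressed size divided by the compressed size, so a ratio of 4.0 means a page shrinks to a quarter of its original size. The average compression ratio of sites was around 2.0.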
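The ratio itself is easy to compute. The sketch below assumes gzip as the compressor and UTF-8 byte counts; the paper's exact procedure may differ, and `looks_redundant` is a hypothetical helper that only illustrates the 4.0 cutoff. It is not how any search engine ranks pages.

```python
import gzip

RATIO_THRESHOLD = 4.0  # the 4.0 cutoff discussed in the paper


def compression_ratio(text: str) -> float:
    """Uncompressed size divided by gzip-compressed size, both in bytes."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


def looks_redundant(text: str, threshold: float = RATIO_THRESHOLD) -> bool:
    """Hypothetical flag: True when the text compresses at or above the threshold."""
    return compression_ratio(text) >= threshold


# Keyword-stuffed text repeats the same words, so it compresses extremely well.
stuffed = "cheap hotels " * 500

# Ordinary prose has little exact repetition, so its ratio stays low.
varied = (
    "Compressibility, in the context of search engines, refers to how much "
    "a web page can be compressed. Shrinking a document into a zip file is an "
    "example of compression. Search engines compress indexed web pages because "
    "it saves space and speeds up processing."
)

print(f"stuffed: {compression_ratio(stuffed):.1f}, varied: {compression_ratio(varied):.1f}")
```

One design caveat: gzip adds a fixed header of a few dozen bytes at most, so very short texts get artificially low ratios. Any threshold like 4.0 is only meaningful for pages with a reasonable amount of text.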