Extract Article Content from any Website with Readability.js

These days, many websites are so cluttered with ads that the actual content is hard to read.

That's why most browsers come with a reader mode that strips the page down to the article so you can focus on it.

Ever wondered how to build something similar?

That's where Readability.js comes in.

It is the library that powers Firefox's Reader View under the hood.

Here is how to use it:

Dependencies

npm install @mozilla/readability
npm install jsdom

Snippet

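const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');

const url = 'https://nesin.io/blog/opencommit-auto-generate-commit-message';

const main = async () => {
    // Fetch the page and build a DOM from it
    const dom = await JSDOM.fromURL(url);
    // Readability modifies the document it is given while parsing
    const reader = new Readability(dom.window.document);
    // parse() returns null if no article content could be extracted
    const article = reader.parse();
    console.log(article);
};

// Surface network or parsing errors instead of leaving the promise unhandled
main().catch(console.error);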
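
The object returned by parse() carries fields such as title, byline, excerpt and textContent, and it can be null when nothing article-like is found. As a rough sketch of what you might do with it, the helper below condenses that object into a short summary. The names summarizeArticle, WORDS_PER_MINUTE and sample are hypothetical and not part of Readability.js, and the 200 words per minute reading speed is only an assumption.

```javascript
// Hypothetical helper (not part of Readability.js): condenses the object
// returned by reader.parse() into a few fields worth logging.
const WORDS_PER_MINUTE = 200; // assumed average reading speed

function summarizeArticle(article) {
    // parse() returns null when no article content could be extracted
    if (!article) {
        return null;
    }
    const text = (article.textContent || '').trim();
    // Count whitespace-separated words; an empty string counts as zero
    const wordCount = text === '' ? 0 : text.split(/\s+/).length;
    return {
        title: article.title || '(untitled)',
        byline: article.byline || null,
        excerpt: article.excerpt || '',
        wordCount,
        // Round up, and never report less than one minute
        readingMinutes: Math.max(1, Math.ceil(wordCount / WORDS_PER_MINUTE)),
    };
}

// Stand-in for a real parse() result so the sketch runs without network access
const sample = {
    title: 'Hello',
    byline: null,
    excerpt: 'A short post.',
    textContent: '  A short post about extracting articles.  ',
};
console.log(summarizeArticle(sample));
```

In the snippet above, you could swap console.log(article) for console.log(summarizeArticle(article)) to print this summary instead of the full object.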

Happy extracting!