Extract Article Content from any Website with Readability.js
Author: Ashik Nesin (@AshikNesin)
These days many websites are so cluttered with ads that the actual content is hard to read.
That's why most browsers come with a reader mode, which strips the page down to just the article so you can focus on it.
Ever wanted to build something similar?
That's where Readability.js comes in.
It is used under the hood for Mozilla Firefox's Reader View.
Here is how to use it:
Dependencies
npm install @mozilla/readability
npm install jsdom
Snippet
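const { Readability } = require('@mozilla/readability');
const { JSDOM } = require('jsdom');

const url = 'https://nesin.io/blog/opencommit-auto-generate-commit-message';

const main = async () => {
  // Fetch the page and build a DOM from the returned HTML
  const dom = await JSDOM.fromURL(url);
  // Hand the document to Readability and extract the main article
  const reader = new Readability(dom.window.document);
  const article = reader.parse(); // null if no article could be detected
  console.log(article);
};

main().catch(console.error);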
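The object returned by `reader.parse()` carries fields such as `title`, `byline`, `excerpt` and `textContent`, and `parse()` returns `null` when Readability cannot find an article. Here is a minimal sketch of one way to turn that result into a short summary. Note that `summarizeArticle` and its `previewLength` parameter are hypothetical names made up for this example; they are not part of Readability.js.

```javascript
// Hypothetical helper (not part of Readability.js): condenses the object
// returned by reader.parse() into a short summary, or returns null when
// Readability could not find an article.
function summarizeArticle(article, previewLength = 200) {
  if (!article) {
    return null; // parse() returns null when no article is detected
  }
  // Collapse runs of whitespace so the word count and preview are clean
  const text = (article.textContent || '').replace(/\s+/g, ' ').trim();
  return {
    title: article.title || '(untitled)',
    byline: article.byline || null,
    excerpt: article.excerpt || null,
    wordCount: text === '' ? 0 : text.split(' ').length,
    preview:
      text.length > previewLength
        ? text.slice(0, previewLength) + '...'
        : text,
  };
}

// Usage with a hand-written object shaped like a parse() result
const sample = {
  title: 'Hello',
  byline: null,
  excerpt: 'A short post.',
  textContent: '  A short   post about\nReadability.js.  ',
};
console.log(summarizeArticle(sample));
console.log(summarizeArticle(null));
```

In the snippet above, you could call `summarizeArticle(article)` in place of logging the raw object, which also keeps the `null` case from causing trouble later on.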
Happy extracting!