Web scraping in R
Learn how to extract data directly from a web page, a technique known as web scraping, in R through a real-life example
This post has been written in collaboration with Pietro Zanotta.
Introduction
Almost everyone is familiar with web pages (otherwise you would not be here), but what if we told you that the way you see a site is different from the way Google or your browser sees it?
In fact, when you type a site address into your browser, the browser downloads and renders the page for you, but to render the page it needs some instructions.
There are three types of instructions:
- HTML: describes the structure of a web page;
- CSS: defines the appearance of a site;
- JavaScript: controls the behavior of the page.
Web scraping is the art of extracting information from the HTML, CSS and JavaScript code of a page. The term usually refers to an automated process, which is less error-prone and faster than gathering data by hand.
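To make this concrete, here is a minimal sketch of the idea. It assumes the rvest package is installed, and it parses a small made-up page stored in a string rather than a real site; the page content, the `#title` id and the `price` class are all hypothetical and chosen only for illustration.

```r
# Minimal sketch, assuming the rvest package is installed.
# The page below is hypothetical: it only illustrates the three kinds
# of instructions (HTML, CSS, JavaScript) a browser receives.
library(rvest)

page_source <- '
<html>
  <head>
    <style>h1 { color: steelblue; }</style>
  </head>
  <body>
    <h1 id="title">Monthly prices</h1>
    <ul>
      <li class="price">10.5</li>
      <li class="price">12.0</li>
    </ul>
    <script>console.log("page loaded");</script>
  </body>
</html>'

# Parse the raw HTML into a document we can query
page <- read_html(page_source)

# Select the heading by its id and keep only its text
title_node <- html_element(page, "#title")
title <- html_text2(title_node)

# Select every list item with class "price" and convert the text to numbers
price_nodes <- html_elements(page, "li.price")
prices <- as.numeric(html_text2(price_nodes))

print(title)
print(prices)
```

The same CSS selectors that style a page are used here to locate the pieces of data we want, which is why knowing a little HTML and CSS helps a lot when scraping.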
It is important to note that web scraping can raise ethical concerns, as it often involves accessing and using data from websites without the explicit permission of the website owner. It is good practice to respect…