Web scraping in R

Learn how to extract data directly from a web page in R, a process referred to as web scraping, through a real-life example

Antoine Soetewey
13 min read · Jan 16, 2023
Photo by Florian Olivo

This post has been written in collaboration with Pietro Zanotta.

Introduction

Almost everyone is familiar with web pages (otherwise you would not be here), but what if we told you that the way you see a site is different from the way Google or your browser sees it?

In fact, when you type a site address into your browser, the browser downloads the page and renders it for you, but to render the page it needs some instructions.

There are 3 types of instructions:

  • HTML: describes a web page’s structure and content;
  • CSS: defines the appearance of a site;
  • JavaScript: decides the behavior of the page.

Web scraping is the art of extracting information from the HTML, CSS and JavaScript code that makes up a page. The term usually refers to an automated process, which is less error-prone and faster than gathering data by hand.
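
To make the idea concrete, here is a minimal sketch of extracting information from HTML in R. It assumes the rvest package is installed. Both the choice of package and the tiny inline page are illustrative assumptions, and the page is made up so that the example runs without any network access.

```r
# Assumption: the rvest package is installed (install.packages("rvest"))
library(rvest)

# A tiny, made-up page kept inline so the sketch needs no network access
page <- minimal_html('
  <h1 class="title">Web scraping in R</h1>
  <p class="intro">First paragraph</p>
  <p>Second paragraph</p>
  <a href="https://example.com">A link</a>
')

# Select nodes with CSS selectors, then pull out their text or attributes
title <- html_text2(html_element(page, "h1.title"))
paragraphs <- html_text2(html_elements(page, "p"))
link <- html_attr(html_element(page, "a"), "href")

print(title)       # "Web scraping in R"
print(paragraphs)  # "First paragraph"  "Second paragraph"
print(link)        # "https://example.com"
```

The same selectors that CSS uses to style a page are used here to locate the elements of interest, which is why knowing a little HTML and CSS helps when scraping.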

It is important to note that web scraping can raise ethical concerns, as it involves accessing and using data from websites without the explicit permission of the website owner. It is a good practice to respect…

Antoine Soetewey

Doctoral researcher in statistics at UCLouvain. Interested in statistics, R, and making them accessible to everyone. Author of statsandr.com.