Scrape Yahoo search engine results with R
Learn how to scrape Yahoo search engine results with R using the {rvest} package
--
Introduction
Web scraping is the process of extracting data from websites. It is usually automated, so that large amounts of data can be collected from many websites without gathering them by hand.
In a previous post, we introduced this method and illustrated it with a Wikipedia page. Web scraping has many use cases, but in this blog post we restrict ourselves to scraping search results from Yahoo using R. Scraping search engine results can help with SEO analysis, competitor analysis, keyword research, trend analysis, and more.
Scraping Yahoo search engine results with R
After installing R and RStudio, we first need to load the necessary packages by running the following commands:¹
# install.packages("rvest")
# install.packages("jsonlite")
# install.packages("purrr")
library(rvest)
library(jsonlite)
library(purrr)
The {rvest} package is for web scraping, the {jsonlite} package is for working with JSON data, and the {purrr} package is for working with functions and vectors.
It is always better to decide in advance what exactly we are going to scrape. For this tutorial, we are going to scrape search results from this URL: https://search.yahoo.com/search?p=pizza
We are going to scrape the following data points from this page:
- Link
- Title
- Description
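To make these three data points concrete, here is a minimal sketch of how each one can be pulled out of a page with {rvest}. It runs on a small inline HTML snippet rather than on Yahoo itself: the markup and the class names `result`, `title` and `desc` are made up for illustration and are not the selectors used on the real results page.

```r
library(rvest)

# Hypothetical markup standing in for a results page; the class names
# "result", "title" and "desc" are invented for this illustration.
html <- '<div class="result">
  <a class="title" href="https://example.com/a">First result</a>
  <p class="desc">Description of the first result.</p>
</div>
<div class="result">
  <a class="title" href="https://example.com/b">Second result</a>
  <p class="desc">Description of the second result.</p>
</div>'

page <- read_html(html)

# One node per search result
results <- html_elements(page, "div.result")

# Extract the three data points from each result
scraped <- data.frame(
  link = html_attr(html_element(results, "a.title"), "href"),
  title = html_text2(html_element(results, "a.title")),
  description = html_text2(html_element(results, "p.desc"))
)

print(scraped)
```

The link comes from the `href` attribute, while the title and description come from the text of their elements. On a real page, only the CSS selectors would need to change.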
To collect these data points, we first define the URL of the Yahoo search results page that we want to scrape. In this case, we are searching for the word “pizza”.
# URL of the Yahoo search results page
url <- "https://search.yahoo.com/search?p=pizza"
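If you want to search for something other than “pizza”, the URL can be built from the search term. The sketch below assumes that the query only needs to be passed through the `p` parameter seen above. `build_search_url()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build a Yahoo search URL for an arbitrary query.
# URLencode() percent-encodes spaces and other reserved characters.
build_search_url <- function(query) {
  paste0(
    "https://search.yahoo.com/search?p=",
    utils::URLencode(query, reserved = TRUE)
  )
}

url <- build_search_url("pizza")
url_multiword <- build_search_url("pizza near me")
```

Encoding the query matters as soon as the search term contains spaces or characters such as `&`, which would otherwise break the URL.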
We then use the read_html() function from the {rvest} package to read the HTML content of the provided URL:
# Read the HTML content of the page…