Steve Breese

A Chicago-based Full-stack JavaScript Developer

Node.js Web Scraping


Node.js excels at many things, and web scraping is one of them. The simple web scraper demoed here performs the following tasks in just milliseconds:

  • An AJAX request is received and initiates the web scraping task
  • The target web page is requested
  • The returned HTML is passed to Cheerio for DOM parsing
  • Cheerio finds the unique element in the DOM or, if no unique element exists, the first tag that matches your query, and returns the data in a JavaScript object
  • For each link in the list, the link text and href are pulled and stored in variables
  • The data is cross-referenced and combined with my own data, and the output is stored in a JSON object
  • The JSON is returned to the browser, where it is parsed and displayed in a formatted HTML table
  • A snapshot of the browser-rendered HTML is sent back to the server with a second AJAX request
  • The formatted HTML snapshot is saved to a MongoDB database for use in the DYNAMIC WEBSITE CONTENT WITH MONGODB demo.
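
The parsing and cross-referencing steps in the list above could look roughly like the sketch below. This is not the demo's actual source: the function names `extractLinks` and `mergeWithLocalData`, the `note` field, and the idea of keying my own data by href are assumptions made for illustration. It only assumes the `cheerio` package is installed.

```javascript
// Illustrative sketch only: names, fields, and data shapes are hypothetical.

// Pull the text and href of every link inside the first element
// matching `selector`.
function extractLinks(html, selector) {
  // Loaded inside the function so the merge helper below can be used
  // (and tested) without the third-party dependency.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);
  return $(selector)
    .first()
    .find('a')
    .map((i, el) => ({
      text: $(el).text().trim(),
      href: $(el).attr('href'),
    }))
    .get(); // convert the Cheerio collection to a plain array
}

// Cross-reference the scraped links with locally held data.
// `localData` is assumed to be a Map keyed by href.
function mergeWithLocalData(links, localData) {
  return links.map((link) => ({
    ...link,
    // null marks links that have no matching local entry
    note: localData.get(link.href) ?? null,
  }));
}
```

Passing the fetched HTML and a selector such as `'#links'` (a made-up example) to `extractLinks`, then handing the result to `mergeWithLocalData` with a `Map` of my own data, produces an array that `JSON.stringify` can serialize for the AJAX response.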