Node.js Web Scraping
Demo
Node.js excels at a lot of things, and web scraping is one of them. The simple web scraper demoed here performs the following tasks in just milliseconds:
- The server receives an AJAX request, which initiates the web scraping task
- The server requests the web page
- It grabs the HTML and passes it to Cheerio for DOM parsing
- Cheerio finds the unique element in the DOM or, if no unique element exists, the first tag that matches your query, and returns the data in a JavaScript object
- For each link in the list, the link text and href are pulled and stored in variables
- The data is cross-referenced and combined with my own data, and the output is stored in a JSON object
- JSON is returned to the browser, where it is parsed and displayed in a formatted HTML table
- A snapshot of the browser-rendered HTML is sent back to the server with a second AJAX request
- The formatted HTML snapshot is saved to a MongoDB database for use in the Dynamic Website Content with MongoDB demo
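
The cross-referencing, JSON, and table steps in the list above can be sketched in plain JavaScript. This is a minimal illustration and not the demo's actual code: the sample links, the `ownData` lookup, and the `combine`, `escapeHtml`, and `toTable` helpers are all hypothetical names chosen for the example, and the Cheerio parsing and MongoDB steps are left out so the sketch runs with no dependencies.

```javascript
// Hypothetical input: link text and href pairs, standing in for
// what the scraping step pulls from each link in the list.
const scrapedLinks = [
  { text: 'Example One', href: 'https://example.com/one' },
  { text: 'Example Two', href: 'https://example.com/two' },
];

// Hypothetical local records keyed by href (a stand-in for "my own data").
const ownData = {
  'https://example.com/one': { category: 'news' },
};

// Cross-reference each scraped link with a matching local record, if any.
function combine(links, extra) {
  return links.map((link) => ({ ...link, ...(extra[link.href] || {}) }));
}

// Escape a value before placing it inside HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a simple HTML table from the parsed rows (the browser-side step).
function toTable(rows) {
  const body = rows
    .map(
      (r) =>
        `<tr><td><a href="${escapeHtml(r.href)}">${escapeHtml(r.text)}</a></td>` +
        `<td>${escapeHtml(r.category || '')}</td></tr>`
    )
    .join('');
  return `<table>${body}</table>`;
}

// Server side: serialize the combined data as JSON for the response.
const json = JSON.stringify(combine(scrapedLinks, ownData));

// Browser side: parse the JSON and format it as a table.
const html = toTable(JSON.parse(json));
console.log(html);
```

Keying the local data by href keeps the lookup to a single property access per link. Links with no matching record pass through unchanged and get an empty cell in the table.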