Better Web Scraping Through Reverse Engineering AJAX Calls
By Diego Berrocal

Have you ever wished you could click a button and order a pizza? There's a catch, though: your favorite pizzeria doesn't have a button-to-order app, its website is client-side rendered, and it has no public developer API. We'll learn two ways to still get you that pizza, compare them, and learn when to use each.

Saturday 4 p.m.–4:30 p.m.

Have you ever wished you could automate any task on a website? Be it ordering food from your favorite vendor with an online shopping cart, or finding out the weekly sale items on your local grocery store's website. With the rise of libraries like React and the proliferation of client-side rendered webapps, chances are you may be inclined to use a dynamic scraper like Selenium or Puppeteer. However, because these webapps need to initialize their components, they still make AJAX calls to backend services, exposing "internal" APIs we can use. This is not how the developers intended us to use the site, and these APIs can change without notice, but that doesn't mean they aren't useful! After this talk, you'll be able to reverse-engineer these "internal" APIs and replicate any simple flow, as well as know how to use Puppeteer to interact with the page directly. You'll also learn the criteria for choosing between a dynamic scraper and the "internal" API a webapp uses.
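As a taste of the first approach: once you've watched the browser's Network tab and spotted the JSON endpoint the webapp calls while rendering, you can call it yourself with plain HTTP. This is a minimal sketch using only the Python standard library; the endpoint, parameters, and headers are hypothetical stand-ins for whatever you discover in DevTools.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical "internal" endpoint discovered in the browser's Network tab.
API_BASE = "https://example-pizzeria.com/api/v1"

def build_menu_url(store_id, category="pizza"):
    """Build the same URL the webapp requests when rendering its menu."""
    query = urllib.parse.urlencode({"store": store_id, "category": category})
    return f"{API_BASE}/menu?{query}"

def fetch_menu(store_id):
    """Fetch the menu JSON directly, skipping the client-side rendering."""
    req = urllib.request.Request(
        build_menu_url(store_id),
        headers={
            # Mimic what the browser sends so the server treats us like the app.
            "Accept": "application/json",
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because this skips the browser entirely, it's far faster and lighter than driving Puppeteer or Selenium, at the cost of breaking silently if the site ships a new API version.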

Diego Berrocal

I'm a Peruvian ex-physicist and developer living in New York, interested in programming, Linux desktop ricing, music, astrophotography, and AI. I tackle challenging problems and try to solve them with code. I also spend a lot of time using Emacs and contributing to Spacemacs in order to end the Editor Wars (Emacs + Vim = ❤).