Ecommerce Image Scraper

3/28/2025 · Qorvinus

Why I Built This Scraper

The client runs a decent-sized e-commerce shop, and they needed to consolidate their product data. We're talking hundreds of items, each with images, alt attributes, descriptions, and pricing. Doing this manually? Nightmare fuel.

There wasn't an API to work with (of course), and the site's CMS made exports clunky and incomplete. So, the natural move: build a scraper tailored to their structure.

The Approach I Took

The site had a fairly consistent product page layout, which made the scraping job easier. I inspected the HTML to identify reliable selectors for each piece of info I needed: image URLs, alt text, descriptions, and prices.
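As a rough illustration of that selector step, here is a minimal sketch using BeautifulSoup on a made-up product page. The markup and the class names (product-image, product-description, price) are assumptions for the example, since the real site's selectors are not shown here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one product page; the class names
# are placeholders, not the client's real ones.
SAMPLE_HTML = """
<div class="product">
  <img class="product-image" src="/media/tote.jpg" alt="Beige canvas tote bag">
  <div class="product-description">Sturdy everyday tote.</div>
  <span class="price">$24.00</span>
</div>
"""


def extract_product(html):
    """Pull image URL, alt text, description, and price from one page."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.select_one("img.product-image")
    desc = soup.select_one(".product-description")
    price = soup.select_one(".price")
    # Each lookup can come back as None, so guard before reading values.
    return {
        "image_url": img.get("src") if img else None,
        "alt": img.get("alt") if img else None,
        "description": desc.get_text(strip=True) if desc else None,
        "price": price.get_text(strip=True) if price else None,
    }


product = extract_product(SAMPLE_HTML)
```

Returning None for anything that is missing keeps one odd page from crashing the whole run and makes gaps easy to spot in the output.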

I used Python with requests for fetching pages and BeautifulSoup for HTML parsing. Once the data was collected, I used csv to dump the metadata and saved the image files locally using urllib.request.
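The output side could look something like the sketch below, which uses only the standard library: csv for the metadata, urllib.parse for resolving and naming image URLs, and urllib.request for the actual download. The column names and helper names (write_metadata, image_filename, save_image) are illustrative assumptions, not the original script.

```python
import csv
import os
import shutil
import urllib.request
from urllib.parse import urljoin, urlparse

# Assumed column layout for the metadata file.
FIELDS = ["image_url", "alt", "description", "price"]


def write_metadata(records, csv_path):
    """Write one row per product to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def image_filename(image_url):
    """Derive a local file name from the path part of the image URL."""
    name = os.path.basename(urlparse(image_url).path)
    return name or "image"


def save_image(page_url, image_src, out_dir):
    """Resolve a possibly relative src and download it into out_dir."""
    absolute = urljoin(page_url, image_src)
    os.makedirs(out_dir, exist_ok=True)
    target = os.path.join(out_dir, image_filename(absolute))
    # Stream the response body straight to disk.
    with urllib.request.urlopen(absolute) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

Taking the file name from the URL path drops query strings such as ?v=2, so the saved files keep clean names.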

The scraper looped through product listing pages, visited each individual product, extracted the relevant data, and exported it all in one go.
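In outline, that loop might be sketched as follows. The a.product-link selector is a placeholder, and fetch and parse_product are hypothetical callables passed in by the caller so the loop itself stays independent of the HTTP and parsing details.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def product_links(listing_html, listing_url):
    """Return absolute product URLs found on one listing page."""
    soup = BeautifulSoup(listing_html, "html.parser")
    # "a.product-link" is a placeholder selector, not the real site's.
    return [
        urljoin(listing_url, a["href"])
        for a in soup.select("a.product-link[href]")
    ]


def crawl(listing_urls, fetch, parse_product):
    """Visit every listing page, then every product page it links to."""
    seen = set()
    records = []
    for listing_url in listing_urls:
        for url in product_links(fetch(listing_url), listing_url):
            if url in seen:  # the same product can show up on several pages
                continue
            seen.add(url)
            record = parse_product(fetch(url))
            record["page_url"] = url
            records.append(record)
    return records
```

In a real run, fetch could be something like lambda url: requests.get(url, timeout=10).text, and parse_product would be whatever function turns a product page's HTML into a dict of fields.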

Challenges Along the Way

  • Some images were lazy-loaded with JavaScript, so the real URL sat in data-src rather than src. I had to tweak things to grab data-src instead.
  • A few product pages were missing alt attributes entirely. I flagged those for the client so they could fix them manually.
  • Rate limiting kicked in during my first full run. I threw in some random delays between requests to stay under the limit.
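The three workarounds above can be sketched as small helpers. This is an illustration under assumptions: the function names are made up, the 1 to 3 second delay range is arbitrary, and img stands for anything with a dict-style .get(), such as a BeautifulSoup tag.

```python
import random
import time


def pick_image_url(img):
    """Prefer data-src (lazy-loaded images), fall back to src."""
    return img.get("data-src") or img.get("src")


def missing_alt(img):
    """True when the alt attribute is absent or blank."""
    alt = img.get("alt")
    return alt is None or not alt.strip()


def polite_sleep(min_s=1.0, max_s=3.0, sleep=time.sleep):
    """Pause for a random interval and return the delay that was used."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Calling polite_sleep() before each request spreads the traffic out, and collecting the page URLs where missing_alt() is true gives a ready-made list to hand to the client.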

Tools and Tech Stack

  • Python for the whole script
  • requests for HTTP calls
  • BeautifulSoup for parsing HTML
  • urllib for downloading images
  • csv for output

Things I'd Improve Next Time

If I had more time, I'd add:

  • A small CLI with arguments for output path, rate limits, etc.
  • Option to export to JSON or a database instead of just CSV
  • Maybe a web UI for non-tech clients to run it themselves
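For the CLI and JSON ideas, a starting point might look like the sketch below, built on argparse and the standard json and csv modules. The flag names and defaults are hypothetical, and the database and web UI options are left out.

```python
import argparse
import csv
import io
import json


def build_parser():
    """Command-line options for a scraper run (all names are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Scrape product images and metadata."
    )
    parser.add_argument("--output", default="products.csv",
                        help="path of the metadata file to write")
    parser.add_argument("--format", choices=["csv", "json"], default="csv",
                        help="output format")
    parser.add_argument("--min-delay", type=float, default=1.0,
                        help="shortest pause between requests, in seconds")
    parser.add_argument("--max-delay", type=float, default=3.0,
                        help="longest pause between requests, in seconds")
    return parser


def serialize(records, fmt):
    """Render a list of record dicts as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    buf = io.StringIO()
    # Column order follows the keys of the first record.
    fields = list(records[0]) if records else []
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

A run might then parse its options with build_parser().parse_args() and write serialize(records, args.format) to args.output.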

For now though, it got the job done, saved the client a bunch of manual labor, and kept things lightweight. Always a win in my book.