Data Crawling and Web Scraping

Welcome to Emmela’s blog.

I have been in the industry for 8 years now, and I thought I would put my experiences and thoughts into writing and share them with you all.

In Aug 2007, we first decided to crawl and fetch data from different websites for analytics purposes. We started with RSS feeds, extracting and parsing the data with PHP libraries.
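For readers new to the idea, here is a rough idea of what "extracting and parsing an RSS feed" involves. This is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`, not the PHP code we used back then. The feed content, the `SAMPLE_RSS` string and the `parse_rss_items` helper are hypothetical and exist only for illustration. A real crawler would first download the feed (for example with `urllib.request.urlopen`) and would need to handle namespaces, encodings and malformed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 06 Aug 2007 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item>, with title, link and pubDate.

    Missing child elements become empty strings.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

RSS is an easy place to start because the feed already arrives as structured XML. Scraping ordinary HTML pages means finding that structure yourself.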

Today, in Dec 2016, I lead a team of 60 Python engineers who scrape data from around the globe in 14 different languages and from some 40 countries, and we have built an in-house framework. It is based on Scrapy, with our own in-house wrapper on top.

From RSS feeds, we have grown to the point where we extract data:

  • From complex websites
  • From login-based sources
  • From HTML / JSON / XML or any other web page format
  • From offline files: Excel / Docs / PDF
  • From emails
