Data Crawling and Web Scraping

Raja Emmela | December 29, 2016 | June 6, 2017

Welcome to Emmela’s blog.

Having been in the industry for 8 years to date, I thought I would put down my experiences and thoughts and share them with you all.

In August 2007, we first thought of crawling and fetching data from different websites for analytics purposes. We started with RSS feeds, extracting and parsing the data using PHP libraries.
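
To give a feel for what that early RSS work involved, here is a minimal sketch of pulling items out of a feed. It is written in Python with the standard library rather than the PHP libraries we used back then, so treat it as an illustration and not our original code. The sample feed, the `parse_rss_items` function and the field names are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Mon, 06 Aug 2007 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with title, link and published date.

    Fields missing from an item come back as None.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Real feeds are messier than this (namespaces, Atom instead of RSS, broken markup), which is part of why a dedicated library is usually the better choice in practice.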

Today, in December 2016, I am leading a team of 60 Python engineers who scrape data from across the globe, in 14 different languages and from some 40 countries. We work with the Scrapy framework and have built an in-house wrapper on top of it.

From RSS feeds, we have grown to the point where we extract data from:

Complex websites
Login-based sources
HTML / JSON / XML or any other webpage format
Offline files: Excel / Docs / PDF
Emails
