Data Crawling and Web Scraping

Welcome to Emmela’s blog.

I have been in the industry for 8 years to date, and I thought I would collect my experiences and thoughts and share them with you all.

In August 2007, we first set out to crawl and fetch data from different websites for analytics purposes. We started with RSS feeds, extracting and parsing the data with PHP libraries.
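To give a flavour of that first step, here is a minimal sketch of pulling item titles, links and publication dates out of an RSS 2.0 feed. It uses Python's standard-library `xml.etree.ElementTree` rather than the PHP libraries we used at the time. The sample feed and the `parse_rss_items` helper are hypothetical. A real crawler would also fetch the feed over HTTP and deal with namespaces and Atom feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would download this over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 06 Aug 2007 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict (title, link, publication date) per RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # findtext returns None when the element is missing entirely
            "pub_date": item.findtext("pubDate"),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

Returning plain dictionaries keeps the parsing step separate from storage, so the same records can be written to a database or passed on for analytics.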

Today, in December 2016, I lead a team of 60 Python engineers who scrape data from 40-odd countries in 14 different languages. We have built an in-house framework: a custom wrapper on top of the Scrapy framework.

From RSS feeds, we have grown to the point where we extract data from:

  • Complex websites
  • Login-based sources
  • Web pages in HTML, JSON, XML, or any other format
  • Offline documents: Excel, Word, PDF
  • Emails
