Scraping FAQs

As business requirements grow, we are collecting the frequently asked questions about crawling and scraping here, and we answer them in detail.

[ At times we will also write up a Q&A as a blog post, with an example, scenario, or use case for better understanding. ]

  1. Question: Can you guys set up the whole scraping ecosystem for us?
    1. Answer: Yes, we do.
  2. Q: Do you crawl all the websites?
    1. A: Technically we can crawl any website you ask us to, given a business requirement. At the same time we respect the website owners and the hard work behind their content. Hence we scrape only publicly available data that is allowed to be crawled.
    2. Q: How do you know if the online content is allowed to be crawled?
      1. A: Our crawlers automatically check robots.txt first and proceed according to its rules.
        1. We also check the domain and the subdomains while crawling.
  3. Q: How long does it take to crawl a website?
    1. A: It depends entirely on the data we want to crawl and the patterns in the target website.
      1. Typically, extracting the data takes anywhere from less than a day to a week.
  4. Q: Is it legal to crawl data?
    1. A: Any publicly available data can be scraped, just as search engines do. Before crawling the data we check robots.txt for the allowed content.
      1. Scraped data is usually used for educational or research purposes.
      2. If it is to be used for commercial purposes, one has to follow the respective terms & conditions and data usage policy, or obtain reuse permission.
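
The robots.txt and domain/subdomain checks mentioned in the answers above can be sketched roughly as follows. This is a minimal illustration using Python's standard `urllib.robotparser`, not a description of our actual crawler. The user agent name `example-crawler`, the helper names `build_parser`, `in_scope`, and `may_crawl`, and the choice to skip any host whose robots.txt has not been loaded are all assumptions made for this example.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name, used only for this illustration.
USER_AGENT = "example-crawler"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def in_scope(host: str, root_domain: str) -> bool:
    """True if host is the root domain itself or one of its subdomains."""
    host = host.lower()
    root = root_domain.lower()
    return host == root or host.endswith("." + root)


def may_crawl(url: str, root_domain: str, parsers: dict, agent: str = USER_AGENT) -> bool:
    """Decide whether a URL may be fetched.

    `parsers` maps a host name to the RobotFileParser for that host.
    Each subdomain serves its own robots.txt, so one parser is kept per host.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not in_scope(host, root_domain):
        return False  # outside the domain we were asked to crawl
    parser = parsers.get(host)
    if parser is None:
        return False  # assumption: skip hosts whose robots.txt is not loaded yet
    return parser.can_fetch(agent, url)
```

To use it, a crawler would download `/robots.txt` for each host it meets, pass the text to `build_parser`, store the result in the `parsers` dictionary under that host name, and call `may_crawl` before every request. A real crawler would also have to decide what to do when robots.txt is missing or unreachable; skipping the host, as done here, is just the most conservative choice.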
