Anyone here doing web scraping?

Web scraping is a business area that I think would be a good fit for our platform, so I’m trying to learn more about the space and see how we can help users solve their problems in it.

If anyone here is doing a startup based around web scraping/internet data mining, could I ask a few questions:

  • What are you using it for?
  • How much data are you dealing with?
  • What are some of your major challenges?
  • What platforms/tools are you using right now?

thanks
Steve

Hi Steve,

One of my ventures helps college students with course registration. As part of that, we have to scrape course catalogs from a university’s public website. We’re dealing with roughly 1-2K pages of data (I don’t have a data size to give you, but it’s a fair amount). Our biggest challenge is page format changes, since our scraping is somewhat format dependent: we try to avoid very specific XPath queries, but things still break because the universities don’t notify us when they change their pages! We do this with Python and BeautifulSoup, which does a reasonable job for what we need.
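
For anyone hitting the same unannounced-layout-change problem, here is a minimal sketch of one way to notice it early. It is not what we run, just an illustration: it hashes the tag/class skeleton of a page using only the standard library's html.parser (not BeautifulSoup), so you can compare today's fingerprint against a stored one before trusting the scrape. The names StructureParser and layout_fingerprint and the sample markup are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Records each opening tag with its class attribute; text is ignored."""

    def __init__(self):
        super().__init__()
        self.shape = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = dict(attrs).get("class") or ""
        self.shape.append(f"{tag}.{classes}")


def layout_fingerprint(html):
    """Return a hash of the page's tag/class skeleton.

    Repeated shapes are collapsed (order preserved), so a catalog that
    gains or loses rows of the same shape keeps the same fingerprint.
    """
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    skeleton = list(dict.fromkeys(parser.shape))
    return hashlib.sha256("|".join(skeleton).encode("utf-8")).hexdigest()


# Hypothetical catalog snippets, for illustration only.
old_page = (
    '<table class="courses"><tr><td class="code">CS101</td>'
    '<td class="title">Intro</td></tr></table>'
)
redesigned_page = (
    '<div class="course-list"><div class="course">'
    '<span class="code">CS101</span></div></div>'
)

if layout_fingerprint(old_page) != layout_fingerprint(redesigned_page):
    print("Layout changed: review the scraper before trusting new data")
```

The trade-off is that this only flags structural changes (new tags or class names). A redesign that keeps the same skeleton but moves data between cells would still slip through, so it complements sanity checks on the extracted values rather than replacing them.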

Hey Steve,

I don’t actually do scraping, but I hear a lot of good things about https://www.kimonolabs.com/
Even their demonstration is well made, and honestly it has convinced me to use it if I ever need to in the future.

Hope it helps.

Thanks for the info. Can I ask how often you update it? And how long does it take to scrape the full set of pages?