FAQ

Yes, any visible text on a web page can be extracted. We can also extract other attributes such as weight, size and quantity in stock. Collectively this is known as Web Crawling, Web Data Extraction, Web Harvesting, Web Data Mining or Screen Scraping.
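As a rough illustration of what "visible text" means in practice, the sketch below collects the text a reader would see on a page and skips script and style content. It uses only Python's standard library; the class and function names are made up for this example and do not describe our actual crawler.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text chunks that are not inside script/style/noscript tags."""

    HIDDEN_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a hidden tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html):
    """Return the visible text chunks of an HTML string, in document order."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style>"
        "<script>var x = 1;</script></head>"
        "<body><h1>Widget</h1><p>Weight: 2kg</p></body></html>"
    )
    print(extract_visible_text(page))
```

A real page usually needs more than this (for example, elements hidden by CSS or content loaded by JavaScript), so treat it as a starting point only.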

Yes, images can be downloaded and are provided to you in a zip file. Each image file name is stored against a product row so you can easily match the image to the product text.
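To show how the file name in each product row lines up with the zip file, here is a minimal sketch that reads a CSV export and looks up each row's image in the archive. The column names (`sku`, `image_file`) and the demo data are assumptions for this example, not the actual delivery layout.

```python
import csv
import io
import zipfile


def match_images(csv_text, zip_bytes):
    """Map each product SKU to the bytes of its image, or None if it is missing.

    Assumes hypothetical CSV columns named `sku` and `image_file`.
    """
    matched = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        for row in csv.DictReader(io.StringIO(csv_text)):
            image_name = row["image_file"]
            if image_name in names:
                matched[row["sku"]] = archive.read(image_name)
            else:
                matched[row["sku"]] = None
    return matched


def build_demo_zip():
    """Build a tiny in-memory zip so the example runs without real files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("A100.jpg", b"fake-image-bytes")
    return buffer.getvalue()


if __name__ == "__main__":
    demo_csv = "sku,name,image_file\nA100,Widget,A100.jpg\nB200,Gadget,B200.jpg\n"
    for sku, data in match_images(demo_csv, build_demo_zip()).items():
        print(sku, "image found" if data is not None else "image missing")
```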

Yes, our crawler can identify all PDF links on a web page and download them automatically, just like images, so you have an offline copy of the PDFs. Extracting data from PDFs is a different technique from extracting (scraping) data from an HTML page. In the past we have used different tools to open each PDF and extract the desired information. Which tool we use or recommend depends on how the PDF is structured.
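Identifying PDF links can be sketched roughly as follows: parse the anchors on a page, resolve each link against the page address, and keep those whose path ends in `.pdf`. This is an illustration using Python's standard library only. Matching on the file extension is an assumption of the example (some sites serve PDFs from URLs without it), and the download step is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings such as ?v=2 are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


if __name__ == "__main__":
    page = (
        '<a href="/docs/spec.PDF?v=2">Spec sheet</a>'
        '<a href="page.html">Another page</a>'
        '<a href="http://other.example/manual.pdf">Manual</a>'
    )
    for link in find_pdf_links(page, "http://example.com/products/"):
        print(link)
```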

As often as required: one-offs, daily or weekly. Let us know your requirements.

Yes, we can auto-fill login names and passwords, fill in dates, submit text and select from drop-down lists. We can automate anything that a human can do manually.

This is tricky. If you are looking at scraping a site with a CAPTCHA input box, contact our team and we’ll see if we can assist. Ultimately, some human input might be needed here.

Generally we provide the data in CSV, XLS, XML or SQL Insert Statements. However, if you have a custom requirement, please contact our team and we’ll see if we can assist.
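To give a feel for two of these formats, the sketch below turns the same product rows into CSV text and into SQL INSERT statements. The table name, column names and quoting rules are assumptions for illustration. Real exports would be tailored to your database and schema.

```python
import csv
import io


def to_csv(rows, columns):
    """Render rows (a list of dicts) as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def sql_literal(value):
    """Format a Python value as a SQL literal, doubling single quotes in text."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_sql_inserts(rows, columns, table):
    """Render rows as one INSERT statement per row for a hypothetical table."""
    column_list = ", ".join(columns)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(row.get(col)) for col in columns)
        statements.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    columns = ["sku", "name", "price"]
    rows = [{"sku": "A100", "name": "O'Brien Widget", "price": 9.5}]
    print(to_csv(rows, columns))
    for statement in to_sql_inserts(rows, columns, "products"):
        print(statement)
```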

Yes, this is useful for categorising the data into sections and subsections.

Breadcrumbs typically appear horizontally across the top of a web page, usually below title bars or headers. They provide links back to each previous page the user navigated through to get to the current page or—in hierarchical site structures—the parent pages of the current one. Breadcrumbs provide a trail for the user to follow back to the starting or entry point. A greater-than sign (>) often serves as hierarchy separator, although designers may use other glyphs (such as » or ›), as well as various graphical treatments.

Typical breadcrumbs look like this:

Home page > Section page > Subsection page
or
Home page >> Section page >> Subsection page

For more information, see http://en.wikipedia.org/wiki/Breadcrumb_(navigation)
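As a small illustration of how a breadcrumb trail can be turned into sections and subsections, the sketch below splits a trail on the common separators mentioned above. The function names and the section/subsection mapping are assumptions for this example. Real sites vary in depth and markup.

```python
import re

# Separators mentioned above: >, >>, » and ›, with optional surrounding spaces.
SEPARATORS = re.compile(r"\s*(?:>>|>|»|›)\s*")


def parse_breadcrumb(trail):
    """Split a breadcrumb trail into its levels, dropping empty pieces."""
    return [part for part in SEPARATORS.split(trail.strip()) if part]


def categorise(trail):
    """Map a trail to a section and subsection, skipping the home page level."""
    levels = parse_breadcrumb(trail)
    return {
        "section": levels[1] if len(levels) > 1 else None,
        "subsection": levels[2] if len(levels) > 2 else None,
    }


if __name__ == "__main__":
    print(parse_breadcrumb("Home page > Section page > Subsection page"))
    print(categorise("Home page >> Section page >> Subsection page"))
```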

Yes, we can provide a “WebShot” of a page. The WebShot will “auto-scroll” to ensure the entire page is captured from top to bottom.

If you have any questions about our products, services, or about Screen Scraping, Data Extraction, Web Harvesting, Data Mining, Competitor Price Analysis, please don’t hesitate to contact us.

FAQ didn’t answer your question?