What is Web Data Harvesting?
Web Data Harvesting is the art of accurately extracting data, images and other files from a web site.
Generally Web Harvesting works coupled with a Web Crawler. The Web Crawler is responsible for automatically navigating a web site. The crawler follows every link in a methodical way, hunting for data to be harvested. When the crawler (sometimes known as a web spider or web bot) has visited a page, the page is saved and added to the pile for the web harvester to extract the required information.
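As a rough illustration of the crawling step described above, here is a minimal sketch in Python using only the standard library. It is not how iHarvest or any particular crawler works. The names `LinkCollector`, `crawl`, `fetch` and `FAKE_SITE`, and the `shop.example` addresses, are all assumptions made for the example. An in-memory dictionary stands in for real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html} for the harvester."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    saved_pages = {}
    while queue and len(saved_pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetch function: returns HTML or None
        if html is None:
            continue
        saved_pages[url] = html  # save the page for the harvester
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            link = urljoin(url, href).split("#")[0]  # resolve and drop fragments
            # Stay on the same site and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved_pages


# Hypothetical in-memory site standing in for real HTTP requests.
FAKE_SITE = {
    "http://shop.example/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://shop.example/a": '<a href="/b">B</a> <a href="http://other.example/">X</a>',
    "http://shop.example/b": '<a href="/">Home</a>',
}

pages = crawl("http://shop.example/", FAKE_SITE.get)
```

In a real project the `fetch` argument would be replaced by a function that downloads the page, and a polite crawler would also respect the site's robots.txt and rate limits.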
The harvested (extracted) data is then cleansed, processed, transformed and translated as required, and then stored elsewhere, such as in a spreadsheet or database. Once the data is in Excel, CSV (Comma Separated Values) or a database, it is much easier to work with.
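The extract, cleanse and store steps might look something like the sketch below. It assumes, purely for illustration, that each product on a saved page is marked up with `<span class="name">` and `<span class="price">` and that prices are in pounds. Real sites will differ, and `ProductParser`, `cleanse`, `harvest_to_csv` and `SAMPLE_PAGE` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Pulls name/price pairs out of the assumed <span class="..."> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # A row is complete once both fields have been seen.
            if "name" in self._current and "price" in self._current:
                self.rows.append(self._current)
                self._current = {}


def cleanse(row):
    """Collapse stray whitespace and turn the price text into a number."""
    name = " ".join(row["name"].split())
    price = float(row["price"].replace("£", "").replace(",", "").strip())
    return {"name": name, "price": price}


def harvest_to_csv(html):
    """Extract, cleanse and return the products on one page as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    for row in parser.rows:
        writer.writerow(cleanse(row))
    return out.getvalue()


# Hypothetical saved page, as a crawler might have stored it.
SAMPLE_PAGE = """
<div><span class="name">  Blue   Widget </span><span class="price">£1,250.00</span></div>
<div><span class="name">Red Widget</span><span class="price"> £9.50 </span></div>
"""

csv_text = harvest_to_csv(SAMPLE_PAGE)
```

Writing to a file instead of a string only needs `open("products.csv", "w", newline="")` in place of `io.StringIO()`. The resulting file opens directly in Excel.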
A user can easily copy and paste data from a web page. They can even download images and store the file names with the associated product text. For small amounts of data this is fine. But for many rows of data across tens, hundreds or even thousands of pages, this is very time consuming and prone to human error.
Web Data Harvesting
Save 100s of hours of manual data entry
Why use Web Data Harvesting?
- Extract data and images from a web site very quickly.
- Analyse a competitor’s site. “Measure” their product range.
- Identify a competitor’s items that are In Stock and Out of Stock.
- Identify a competitor’s brand proportion: how much of one brand do they sell, and in what product types.
- Identify if a competitor is selling products you are not, and vice-versa
- Extract data and images from a web site very accurately.
- Compile the extracted information into a database or spreadsheet for further analysis.
- Screen Scraping data helps with data mining and business intelligence.
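Once the data is harvested, the competitor analysis mentioned in the list above can be quite simple. The sketch below works out brand proportion and stock counts from a handful of made-up rows. The brands, product types, field names and the functions `brand_share` and `stock_summary` are all invented for the example.

```python
from collections import Counter

# Hypothetical harvested rows; a real run would load these from CSV or a database.
harvested = [
    {"brand": "Acme", "type": "Kettle", "in_stock": True},
    {"brand": "Acme", "type": "Toaster", "in_stock": False},
    {"brand": "Bolt", "type": "Kettle", "in_stock": True},
    {"brand": "Acme", "type": "Kettle", "in_stock": True},
]


def brand_share(rows):
    """Fraction of the listed products that belongs to each brand."""
    counts = Counter(row["brand"] for row in rows)
    total = sum(counts.values())
    return {brand: count / total for brand, count in counts.items()}


def stock_summary(rows):
    """Number of items In Stock and Out of Stock."""
    in_stock = sum(1 for row in rows if row["in_stock"])
    return {"in_stock": in_stock, "out_of_stock": len(rows) - in_stock}


shares = brand_share(harvested)
stock = stock_summary(harvested)
```

The same approach extends to product types by counting `row["type"]` instead of `row["brand"]`.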
Extract Automatically – no more copy and paste
iHarvest can save you hours and hours of manual effort.
Do you have a Screen Scraping project or idea? Contact iHarvest today. We’ll happily discuss your idea and take a look at the web site you want to extract data from. Initially we’ll help you establish how scrape-able the data is. Again, it’s 100% no obligation.