One of our clients has an XML feed with over 10K jobs. They need to find all “remote” and “work from home” jobs for IT specialists.
Our new job tagging feature let us create a separate category-based dictionary, which we called “Remote jobs”, and set specific tags on the XML feed to filter jobs by keywords such as “remote” and “work from home”.
After going through all 10K jobs in the XML feed we received from the client, our spider software found every job with the requested keywords in the job title or job description field and assigned the new “Remote jobs” category to each.
In a further processing step, our spider team excluded every job that lacked those tags and the “Remote jobs” category, so we were able to provide a clean feed with only remote jobs in a matter of minutes.
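To make the tagging and filtering steps concrete, here is a minimal sketch in Python. It is not our production spider code. The feed layout (a root element whose direct children are `<job>` elements with `<title>`, `<description>`, and `<category>` children), the keyword list, and the function names are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical keyword dictionary for the "Remote jobs" category.
REMOTE_KEYWORDS = ["remote", "work from home"]
REMOTE_CATEGORY = "Remote jobs"

# Match any keyword as a whole word or phrase, ignoring case.
REMOTE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in REMOTE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def tag_remote_jobs(root):
    """Add a "Remote jobs" category to every job whose title or
    description contains a keyword. Returns the number of jobs tagged."""
    tagged = 0
    for job in root.findall("job"):  # assumed element name
        text = " ".join(
            job.findtext(field, default="") for field in ("title", "description")
        )
        if REMOTE_PATTERN.search(text):
            ET.SubElement(job, "category").text = REMOTE_CATEGORY
            tagged += 1
    return tagged


def keep_only_remote(root):
    """Remove every job that does not carry the "Remote jobs" category."""
    for job in list(root.findall("job")):  # copy the list: we mutate root
        categories = [c.text for c in job.findall("category")]
        if REMOTE_CATEGORY not in categories:
            root.remove(job)
    return root


# Tiny made-up feed, used only to demonstrate the two steps.
SAMPLE_FEED = """<jobs>
  <job><title>Remote Python Developer</title><description>Build APIs.</description></job>
  <job><title>Systems Administrator</title><description>This is a work from home position.</description></job>
  <job><title>Network Engineer</title><description>Based in our Austin office.</description></job>
</jobs>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_FEED)
    tag_remote_jobs(root)
    keep_only_remote(root)
    print(ET.tostring(root, encoding="unicode"))
```

Plain keyword matching like this is only a starting point: it would also tag a job whose description mentions a "remote office", so a real dictionary would likely need exclusions or manual review.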
We noticed you mentioned scraping Indeed.com.
Just to confirm: Indeed.com prohibits spidering of its content and will block anyone who tries to scrape it.
Normally, our clients ask us to spider jobs from direct employer websites and ATSes.
In some cases we can spider commercial job boards, but only if there is a formal agreement between our client and the job board that allows spidering.