Job wrapping: job-scraping software navigates the pages of a career website and collects jobs, then converts and re-posts them to the client's job board database. The job wrapping process in detail:
The client supplies the employer website URL to scrape jobs from. The spider software crawls the job URLs and saves the HTML source of each page.
The spider's parsing module extracts job data from the HTML and converts it into the format used by the target job board database. The data can then be posted to the job board with the job fields mapped.
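The extraction and field-mapping step can be illustrated with a minimal sketch. It is not SpiderMount's actual parser: the CSS class names, the target field names and the sample HTML below are all assumptions made for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Hypothetical mapping from source-page CSS classes to target job board fields.
FIELD_MAP = {
    "job-title": "title",
    "job-location": "location",
    "job-type": "employment_type",
}


class JobFieldParser(HTMLParser):
    """Collects the text of elements whose class attribute is in FIELD_MAP."""

    def __init__(self):
        super().__init__()
        self.job = {}
        self._current = None  # target field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in FIELD_MAP:
            self._current = FIELD_MAP[css_class]
            self.job[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.job[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes mapped elements contain no nested tags.
        self._current = None


def parse_job(html):
    """Return a dict of target-board fields extracted from one job page."""
    parser = JobFieldParser()
    parser.feed(html)
    parser.close()
    return {field: text.strip() for field, text in parser.job.items()}


# Invented sample of a saved source page.
SAMPLE_HTML = """
<div class="job">
  <h1 class="job-title">Data Engineer</h1>
  <span class="job-location">Leeds, UK</span>
  <span class="job-type">Full-time</span>
</div>
"""

job = parse_job(SAMPLE_HTML)
```

In practice each source website would need its own mapping, since every employer site marks up its job pages differently.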
As per screenshot above:
Source page update monitoring: the spider tracks changes to the source website and reports them to the support team. Updates are then applied to keep job data scraping and parsing correct.
Yes, spider includes powerful filtering tools. For example:
Yes, source website job formatting and HTML are normally converted and cleaned up so that the desired target job board format is used and job seekers have the same experience as with manually posted jobs.
The job description is normally spidered as HTML (including font and paragraph formatting). Destination advert formatting (HTML) is normally identical to the source formatting unless it is cleaned up. However, the CSS / styles used on the destination board can differ from those of the source. For example, headings on the source and destination might not look identical, while the general formatting is preserved.
Plain text info / dropdown listings (e.g. job title, employment type, location) are matched with the source website. These will look identical to other job adverts published manually on the destination job board.
Job spider helps clean up source HTML via following options:
Spider tool can match spidered keyword data with your lists / IDs.
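As a rough illustration of matching spidered keyword data against a job board's own lists / IDs, the sketch below normalizes a source value and looks it up. The category names, IDs and synonyms are invented for the example; a real job board would supply its own lists.

```python
# Hypothetical destination board list: category name -> ID.
CATEGORY_IDS = {"full-time": 1, "part-time": 2, "contract": 3}

# Hypothetical synonyms seen on source sites, mapped to the board's own names.
SYNONYMS = {"permanent": "full-time", "full time": "full-time", "temp": "contract"}


def match_category(source_value):
    """Return the board's ID for a spidered value, or None if nothing matches."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(source_value.lower().split())
    key = SYNONYMS.get(key, key)
    return CATEGORY_IDS.get(key)
```

Here `match_category("  Full   Time ")` and `match_category("Permanent")` both resolve to ID 1, while an unknown value such as `"Internship"` returns `None` and could be flagged for manual mapping.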
Yes, job spider can be set to “Synchronization” mode:
The job spider revisits jobs on the employer website and expires them on your job board once they are removed from the source website / URL.
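The core of such a synchronization pass is a comparison of two sets of job URLs. The sketch below assumes jobs are identified by their source URL, which is an assumption of this example rather than a documented detail of the spider.

```python
def jobs_to_expire(board_job_urls, source_job_urls):
    """Jobs still live on the board but no longer present on the source site."""
    return sorted(set(board_job_urls) - set(source_job_urls))


def jobs_to_post(board_job_urls, source_job_urls):
    """Jobs present on the source site but not yet on the board."""
    return sorted(set(source_job_urls) - set(board_job_urls))


# Invented example data.
on_board = ["https://example.com/jobs/1", "https://example.com/jobs/2"]
on_source = ["https://example.com/jobs/2", "https://example.com/jobs/3"]

expired = jobs_to_expire(on_board, on_source)
new = jobs_to_post(on_board, on_source)
```

With the sample data, job 1 would be expired on the board and job 3 would be posted, while job 2 is left unchanged.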
Spider can be set up to save a job or apply button URL.
As per your choice / job board application process:
It is based on your job board software settings.
Candidates can be redirected to a job (or a job-specific application form) only if the employer website's navigation / URL structure provides this data.
If source jobs are not located on unique URLs, candidates will be redirected to a specified default page (e.g. a job search form on the employer website).
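The fallback described above can be sketched as a simple preference order. The field names `apply_url` and `job_url` are hypothetical and stand for whatever the spider saved for each job.

```python
def apply_redirect_url(job, default_url):
    """Pick where to send a candidate: apply form, then job page, then default."""
    # Empty or missing values fall through to the next option.
    return job.get("apply_url") or job.get("job_url") or default_url


# Hypothetical default page, e.g. the employer's job search form.
DEFAULT_PAGE = "https://employer.example/careers/search"

with_apply = {"apply_url": "https://employer.example/apply/42",
              "job_url": "https://employer.example/jobs/42"}
job_only = {"job_url": "https://employer.example/jobs/43"}
no_unique_url = {"title": "Warehouse Operative"}
```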
Based on your job board software setting.
Spidering / posting sessions can be scheduled on an hourly, daily or weekly basis, on specific weekdays and at a certain time of day. Real-time updates are also possible to reduce job distribution lag.
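To make the weekday / time-of-day option concrete, here is a small sketch that computes the next session start. The function name and parameters are illustrative only and do not reflect the actual scheduler configuration.

```python
from datetime import datetime, time, timedelta


def next_run(now, weekdays, run_at):
    """Return the next session start strictly after `now`.

    weekdays: set of ints, Monday=0 ... Sunday=6
    run_at:   datetime.time for the session start
    """
    # Look at today plus the following seven days, so that a slot already
    # passed today is still found on the same weekday next week.
    for offset in range(8):
        day = now.date() + timedelta(days=offset)
        candidate = datetime.combine(day, run_at)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    return None  # no weekdays selected


# Example: sessions on Mondays and Thursdays at 09:00.
monday_morning = datetime(2024, 1, 1, 10, 0)  # 1 January 2024 was a Monday
upcoming = next_run(monday_morning, {0, 3}, time(9, 0))
```

In this example the Monday 09:00 slot has already passed, so the next session falls on Thursday 4 January at 09:00.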
99% of career centers are technically OK to scrape.
The remaining 1% are websites built on technology that blocks access to either job browsing or content retrieval. In this case, alternative options can be considered (e.g. XML feed, CSV file, FTP access).
Examples of source sites that are hard or impossible to configure initially:
The job spider can post jobs via an existing bulk posting interface (API) available on your job board (e.g. Broadbean, Idibu).
Alternative options for job posting to 3rd party
Custom posting options can be mapped for target job boards that have no standard XML-over-HTTP interface.
Client target job board can have:
See full description on SpiderMount job wrapping API
Scrapes are managed via the secure client dashboard:
SpiderMount posts jobs to most job board software platforms and custom job sites via both commonly used bulk posting APIs (BroadBean, eQuest, etc.) and custom APIs. More details: SpiderMount API.
We noticed you mentioned scraping Indeed.com
Just to confirm: Indeed.com prohibits spidering of its content and will block anyone who tries to scrape it.
Normally, our clients ask us to spider jobs from direct employer websites and ATSes.
In some cases we can spider commercial job boards, provided there is a formal agreement between our client and the job board that allows spidering.