
Job web spider service FAQ

What is a job wrapping (scraping) service? What does job wrapping software do?

Job wrapping: job data scraping software navigates career website pages, collects jobs, and converts and re-posts them to the client's job board database. The job wrapping process is explained in detail below.

Frequently asked questions: SpiderMount job wrapping

How does the spider extract jobs from an employer website?

The client supplies the URL of the employer website to scrape jobs from. The spider software crawls the job URLs and saves the source HTML of each page.
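A minimal sketch of this crawl step, assuming job pages can be recognised by a `/jobs/` segment in their links (a hypothetical pattern; real career sites need per-site configuration). `collect_job_urls`, `fetch_html` and `save_pages` are illustrative names, not SpiderMount's API; only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class JobLinkCollector(HTMLParser):
    """Collects absolute URLs of <a> links that look like job pages."""

    def __init__(self, base_url: str, marker: str = "/jobs/") -> None:
        super().__init__()
        self.base_url = base_url
        self.marker = marker  # hypothetical URL pattern identifying job pages
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # skip duplicate links to the same job
                self.links.append(absolute)


def collect_job_urls(listing_html: str, base_url: str) -> list[str]:
    """Return job URLs found on a listing page, in first-seen order."""
    collector = JobLinkCollector(base_url)
    collector.feed(listing_html)
    collector.close()
    return collector.links


def fetch_html(url: str) -> str:
    """Download one page and decode it as UTF-8."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def save_pages(urls: list[str], fetch=fetch_html) -> dict[str, str]:
    """Download each job URL and keep its source HTML, keyed by URL."""
    return {url: fetch(url) for url in urls}
```

The `fetch` parameter is injectable so the sketch can be exercised offline; a production crawler would also store the saved HTML for the parsing step and respect the source site's crawl limits.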

The spider's parsing module extracts job data from the HTML and converts it into the format used by the target job board database. The data can then be posted to the job board, with each job field mapped to the corresponding board field.

As shown in the screenshot above:

  • The Job ID from the employer website is put into the “Job Ref” field of the target job board.
  • The location on the employer website contains both State and City. The parsing module normally splits them and places each into the corresponding field of the job board.
  • Job Description: the parsing module either saves the exact HTML of the job description or converts it to the desired format so that it is displayed correctly on the target job board.
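The field mapping described in these three points can be sketched as follows, assuming the employer site writes the location as "City, State". The `BoardJob` record and its field names are hypothetical stand-ins for a target job board's fields, not a real schema.

```python
from dataclasses import dataclass


@dataclass
class BoardJob:
    """Hypothetical target job board record; field names are illustrative."""
    job_ref: str
    city: str
    state: str
    description_html: str


def map_job(source: dict[str, str]) -> BoardJob:
    """Map parsed source fields onto the job board fields described above."""
    # Assumption: the employer site writes the location as "City, State".
    city, _, state = source["location"].partition(",")
    return BoardJob(
        job_ref=source["job_id"],                 # employer Job ID -> "Job Ref"
        city=city.strip(),
        state=state.strip(),
        description_html=source["description"],  # exact HTML kept as-is
    )
```

If a location has no comma, the whole value ends up in `city` and `state` stays empty; real configurations handle each source's location format separately.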

Source page update monitoring: the spider tracks changes to the source website and reports them to the support team. The necessary updates are then applied so that job data continues to be scraped and parsed correctly.
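One way such monitoring could work, sketched under stated assumptions: fingerprint each page's markup structure (tag names plus class attributes) rather than its text, so ordinary edits to a job advert do not raise alerts, while a layout change that could break parsing does. This is an illustrative technique, not necessarily how SpiderMount detects changes; all names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class StructureFingerprint(HTMLParser):
    """Records tag names and class attributes, ignoring text content."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        self.parts.append(f"{tag}.{cls}")


def fingerprint(page_html: str) -> str:
    """SHA-256 of the page's tag/class sequence."""
    parser = StructureFingerprint()
    parser.feed(page_html)
    parser.close()
    return hashlib.sha256("|".join(parser.parts).encode()).hexdigest()


def changed_pages(previous: dict[str, str], pages: dict[str, str]) -> list[str]:
    """Return URLs whose layout fingerprint differs from the stored one."""
    return [url for url, page_html in pages.items()
            if url in previous and previous[url] != fingerprint(page_html)]
```

The returned URLs are what would be reported to the support team for review.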

Can we scrape only specific jobs?

Yes, the spider includes powerful filtering tools. For example:

  • filter jobs with specific keywords
  • filter jobs by category, job type, etc.
  • filter specific jobs by Ref number or URL
  • scrape only the latest jobs, e.g. the newest 50 jobs
  • and others.
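The filters listed above can be combined in a single pass. This is a minimal sketch; the `Job` record, its fields and `filter_jobs` are hypothetical, and keyword matching is assumed to be a case-insensitive substring match on the job title.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Job:
    ref: str
    url: str
    title: str
    category: str
    posted: date


def filter_jobs(jobs: Iterable[Job], keywords: Iterable[str] = (),
                categories: Iterable[str] = (), refs_or_urls: Iterable[str] = (),
                newest: Optional[int] = None) -> list[Job]:
    """Apply keyword, category and ref/URL filters; empty arguments mean 'no filter'."""
    keywords = [k.lower() for k in keywords]
    categories = set(categories)
    wanted = set(refs_or_urls)
    result = []
    for job in jobs:
        if keywords and not any(k in job.title.lower() for k in keywords):
            continue
        if categories and job.category not in categories:
            continue
        if wanted and job.ref not in wanted and job.url not in wanted:
            continue
        result.append(job)
    if newest is not None:  # e.g. newest=50 keeps only the 50 most recent jobs
        result = sorted(result, key=lambda j: j.posted, reverse=True)[:newest]
    return result
```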

Will the scraped jobs look the same as on the source site, and the same as my other job adverts on the destination job board?

Yes. Normally the source website's job formatting and HTML are converted and cleaned up so that the target job board's format is used and job seekers have the same experience as with jobs posted manually.

The job description is normally spidered as HTML (including font and paragraph formatting). The destination advert's HTML formatting is normally identical to the source formatting unless it is cleaned up. However, the CSS styles used on the destination board can differ from those on the source site, so headings, for example, might not look identical even though the general formatting is preserved.

Plain-text fields and dropdown values (e.g. job title, employment type, location) are matched with the source website. These will look identical to other job adverts published manually on the destination job board.

The job spider helps clean up source HTML with the following options:

  • Remove all HTML tags or keep only some of them (e.g. remove all except <br />, <strong>, <div>)
  • Convert HTML to plain text
  • Replace specific HTML content
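The three clean-up options above can be sketched with the standard library's `html.parser`. This is a minimal sketch, not SpiderMount's implementation: allowed tags are re-emitted without attributes, `<script>`/`<style>` content is dropped, and the plain-text conversion does not try to preserve line breaks. All function names are hypothetical.

```python
import html
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}     # tags written without a closing tag
SKIP_CONTENT = {"script", "style"}  # tags whose text should be dropped too


class TagWhitelist(HTMLParser):
    """Rebuilds HTML keeping only allowed tags (attributes dropped) plus text."""

    def __init__(self, allowed: set[str]) -> None:
        super().__init__()
        self.allowed = allowed
        self.skip_depth = 0
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in self.allowed and not self.skip_depth:
            self.out.append(f"<{tag} />" if tag in VOID_TAGS else f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br />: emit it only if allowed.
        if tag in self.allowed and not self.skip_depth:
            self.out.append(f"<{tag} />")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.allowed and tag not in VOID_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def clean_html(source: str, allowed=frozenset({"br", "strong", "div"})) -> str:
    """Option 1: remove all tags except the allowed ones."""
    parser = TagWhitelist(set(allowed))
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


def to_plain_text(source: str) -> str:
    """Option 2: strip every tag and unescape entities."""
    return html.unescape(clean_html(source, allowed=frozenset()))


def replace_content(source: str, replacements: dict[str, str]) -> str:
    """Option 3: literal find-and-replace on the HTML."""
    for old, new in replacements.items():
        source = source.replace(old, new)
    return source
```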

How can I sort jobs on my website if I have predefined lists of states or categories?

The spider can match spidered keyword data with your lists / IDs.
For example:

  • Source website indicates Employment Type as “Full-Time”.
  • Your job board uses a different name for this value, “Permanent”, which means the same as “Full-Time”. The job spider can be set up to match these values and sort the jobs on your job board accordingly.
  • This means that all “Full-Time” jobs will be posted to the “Permanent” section, because the two terms are matched in the spider settings.
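The matching described in this example amounts to a lookup table from normalised source values to your board's values. A minimal sketch; `EMPLOYMENT_TYPE_MAP` and `map_value` are hypothetical, and the same table could map to category or state IDs instead of labels.

```python
from typing import Optional

# Hypothetical spider setting: normalised source value -> job board value.
EMPLOYMENT_TYPE_MAP = {
    "full-time": "Permanent",
    "full time": "Permanent",
    "part-time": "Part-Time",
}


def map_value(value: str, mapping: dict[str, str],
              default: Optional[str] = None) -> Optional[str]:
    """Normalise case and whitespace, then look the value up in the board's list."""
    return mapping.get(value.strip().lower(), default)
```

Values with no match fall back to `default`, which is where an "Other" section or a manual-review flag could be used.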

Once a posting expires on the employer’s website, is it automatically removed from my job board?

Yes, the job spider can be set to “Synchronization” mode:

The job spider revisits jobs on the employer website and expires them on your job board once they are removed from the source website / URL.
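At its core, synchronization compares the jobs currently on your board with the jobs found on the latest revisit. A minimal sketch, assuming each job has a stable ref (for example the employer Job ID); `synchronize` is a hypothetical name.

```python
def synchronize(board_refs: set[str], source_refs: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_post, to_expire) by comparing the board with the latest crawl."""
    if not source_refs and board_refs:
        # An empty crawl more likely means a failure than "all jobs removed".
        raise RuntimeError("source returned no jobs; refusing to expire everything")
    to_post = source_refs - board_refs    # new on the employer site
    to_expire = board_refs - source_refs  # removed from the employer site
    return to_post, to_expire
```

The guard against an empty crawl is a design choice in this sketch: without it, a temporary outage of the employer site would expire every job on the board.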

What happens when a candidate applies to a scraped job?

The spider can be set up to save the job URL or the apply button URL.

Depending on your choice and your job board's application process:

  • This URL can be used to redirect candidates to the source website.
  • If the job URL is not available, a generic employer application URL can be used.
  • Your job board's own application process can be used instead.
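The choices listed above can be expressed as a simple fallback chain. A minimal sketch; the parameter names are hypothetical, and the order (apply button URL, then job page URL, then the generic employer URL) is an assumption about which target is most specific.

```python
from typing import Optional


def apply_target(job_url: Optional[str], apply_url: Optional[str],
                 employer_default_url: Optional[str], board_apply_url: str,
                 use_board_process: bool = False) -> str:
    """Choose where a candidate's Apply click goes."""
    if use_board_process:
        return board_apply_url  # the job board's own application process
    # Most specific first: apply button URL, job page URL, generic employer URL.
    for candidate in (apply_url, job_url, employer_default_url):
        if candidate:
            return candidate
    return board_apply_url  # nothing saved for this job: use the board's process
```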


Are the applications saved / logged in job board database?

This depends on your job board software settings.

Will candidates always be redirected to a specific job page on the employer website, or to an application page?

Candidates can be redirected to a job (or a job-specific application form) only if the employer website's navigation / URL structure provides this data.

If source jobs are not located on unique URLs, candidates will be redirected to a specified default page (e.g. a job search form on the employer website).

Will employers get application notifications?

This depends on your job board software settings.

How often can the spider revisit the employer website for job updates?

Spidering / posting sessions can be scheduled on an hourly, daily or weekly basis, on specific weekdays and at a certain time of day. Real-time updates are also possible to reduce job distribution lag.
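Computing the next session for such a schedule is a small date calculation. A minimal sketch using naive local datetimes (time zones are ignored as a simplifying assumption); `next_run` and `next_hourly_run` are hypothetical names. A daily schedule is simply all seven weekdays.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, weekdays: set[int], at: time) -> datetime:
    """Next run on one of `weekdays` (Monday=0 ... Sunday=6) at time `at`."""
    for offset in range(8):  # today plus the next 7 days covers every weekday
        day = (now + timedelta(days=offset)).date()
        candidate = datetime.combine(day, at)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    raise ValueError("weekdays must contain at least one value in 0..6")


def next_hourly_run(now: datetime) -> datetime:
    """Start of the next full hour."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
```

For example, with `now` on Monday 1 January 2024 at 10:00, a Monday/Thursday 09:00 schedule next runs on Thursday 4 January at 09:00.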

Can any website be spidered?

99% of career centers are technically suitable for scraping.

The remaining 1% are websites built on technology that blocks access to either job browsing or content retrieval. In this case, alternative options can be considered (e.g. XML feed, CSV file, FTP access, etc.).

Examples of source sites that are hard or impossible to configure initially:

  • The source website admin detects and blocks the spider, or scraping protection is implemented. In this case an agreement with the site owner is required, and an XML feed is the recommended alternative.
  • Vacancies in PDF format. These can sometimes be spidered if the PDF files are uniform and not protected by PDF security restrictions, which normally takes extra configuration effort.
  • HTTPS sites with an incorrect certificate.
  • Flash-based websites (cannot be spidered).
  • Sites without a uniform structure, where all data is placed without formatting (e.g. a page manually pasted from a word processor).
  • Sites with over 500k jobs can be problematic, as it takes too long to download the original data or to send it to the recipient API. Such sources need to be evaluated first, and an XML feed is the recommended alternative.

How do I set up job scraping for my job site software?

The job spider can post jobs via an existing bulk posting interface (API) available on your job board (e.g. Broadbean, Idibu).

Alternative options for posting jobs to third parties

Custom posting options can be mapped for target job boards that have no standard XML-over-HTTP interface.

The client's target job board can have:

  • a unique XML or CSV format mapped (based on the target job board's database settings)
  • posting organized via FTP/SFTP, email, or by fetching the XML from our location
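Producing such a mapped XML or CSV file from the scraped jobs is straightforward with the standard library. A minimal sketch; the `jobs`/`job` element names and the field names are hypothetical placeholders for whatever the target board's format requires, and field names must be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def jobs_to_xml(jobs: list[dict[str, str]]) -> str:
    """One <job> element per job, one child element per mapped field."""
    root = ET.Element("jobs")
    for job in jobs:
        item = ET.SubElement(root, "job")
        for field, value in job.items():
            ET.SubElement(item, field).text = value  # ElementTree escapes &, <, >
    return ET.tostring(root, encoding="unicode")


def jobs_to_csv(jobs: list[dict[str, str]], fields: list[str]) -> str:
    """CSV with a header row; column order follows the board's mapping."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()
```

The resulting string could then be uploaded via FTP/SFTP, emailed, or published at a URL for the board to fetch.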

See the full description of the SpiderMount job wrapping API.

Please describe the process of managing scrapes with SpiderMount.

Scrapes are managed via the secure client dashboard:

  • New scrape setup requests are submitted via the dashboard, email or the Scrapes API
  • The scraping task is evaluated and scheduled by our configuration team
  • Once configuration is done, the scrape is sent for approval
  • Once a scrape is live, it is shown in the dashboard together with stats and options to restart it manually, change its end date, add comments, etc.
  • Task tracking, assignments, etc. for ongoing tasks and changes are also part of the dashboard.

Which job board software platforms can SpiderMount post jobs to / integrate with?

  • JobMount
  • Jobboard.io
  • Madgex
  • Matchwork
  • SmartJobBoard
  • WordPress-based job boards (via WP All Import)
  • Taleo
  • iCIMS
  • Brassring
  • virtually any other type of site.

SpiderMount posts jobs to most job board software platforms and custom job sites, via both commonly used bulk posting APIs (BroadBean, eQuest, etc.) and custom APIs. More details: SpiderMount API.