Job Wrapping | Scraping 

Accurate & effortless job wrapping from career websites. Wrapping quality monitoring and support. Request a quote.

 

Archive for the ‘Configuration’ Category

Job Spider XML Interface For Bulk Posting

Job Spider auto-posts jobs packaged into an XML file via HTTP. A specific URL either accepts new jobs or removes expired ones. The whole process is managed automatically by the Job Spider scheduler.

 

Download XML/HTTP bulk posting interface description.
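
The bulk posting format itself is defined in the downloadable interface description, which is not reproduced here. As a rough illustration only, the Python sketch below packs several jobs into one XML document and prepares the HTTP POST. The element names (jobs, job, the action attribute), the endpoint URL and the content type are assumptions, not the actual Job Spider schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical field list; the real interface description defines its own.
FIELDS = ("id", "title", "location", "description")

def build_bulk_xml(jobs):
    """Pack job dicts into one XML document, one <job> element per job."""
    root = ET.Element("jobs")
    for job in jobs:
        # Assumed attribute: "add" for new jobs, "remove" for expired ones.
        job_el = ET.SubElement(root, "job", action=job.get("action", "add"))
        for field in FIELDS:
            if field in job:
                ET.SubElement(job_el, field).text = str(job[field])
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

def make_post_request(endpoint_url, payload):
    """Prepare, but do not send, an HTTP POST carrying the XML payload."""
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="POST",
    )

jobs = [
    {"id": "https://example.com/jobs/101", "title": "Staff Nurse", "location": "Leeds"},
    {"id": "https://example.com/jobs/87", "action": "remove"},
]
request = make_post_request("https://jobboard.example/xml-import", build_bulk_xml(jobs))
# urllib.request.urlopen(request) would send it; a scheduler would repeat this.
```

Packing all jobs into one document means a single request per session rather than one per job.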

Synchronize Jobs Via Incremental Scraping

Keep your job listings up to date with source employer websites by using the Synchronization feature of Job Spider. The tool makes sure that only new jobs are added to your system and that vacancies removed from source websites are expired.


Incremental scraping
The Synchronization feature downloads incrementally: instead of overloading source websites with excessive requests, it downloads only the jobs newly added since the previous session.
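
Assuming the job URL is used as the unique job identifier (as in the Synchronization settings described below), the incremental step comes down to comparing the URL sets of two sessions. This Python sketch (the function name is hypothetical) splits the current session into jobs to download, jobs already stored, and jobs that have disappeared from the source site.

```python
def classify_session(previous_urls, current_urls):
    """Compare two sessions' job URLs (URL = unique job identifier)."""
    previous, current = set(previous_urls), set(current_urls)
    return {
        "new": sorted(current - previous),      # only these job pages are downloaded
        "old": sorted(current & previous),      # already stored, skipped
        "deleted": sorted(previous - current),  # removed from the source website
    }

result = classify_session(
    previous_urls=["https://example.com/jobs/1", "https://example.com/jobs/2"],
    current_urls=["https://example.com/jobs/2", "https://example.com/jobs/3"],
)
# result["new"] == ["https://example.com/jobs/3"]
```

In this sketch the listing pages are still read on every run to learn the current URLs; only the job pages under "new" are then fetched.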


Job Sync requirements:
The Job Sync feature requires an HTTP/XML posting interface on the target job board.
The interface must provide “Add” and “Remove” commands.
It must also return the job ID to the spider upon successful posting.
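
The exact command format depends on the job board. Purely as an illustration, the sketch below assumes the board answers an “Add” with a small XML document such as <response status="ok"><jobId>4711</jobId></response>, and that “Remove” takes that same ID. Both formats and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_add_response(response_body):
    """Return the job board's ID from an assumed Add response, or None on failure."""
    root = ET.fromstring(response_body)
    if root.get("status") != "ok":
        return None
    id_el = root.find("jobId")
    return id_el.text.strip() if id_el is not None and id_el.text else None

def build_remove_command(received_id):
    """Build an assumed Remove command for a previously posted job."""
    root = ET.Element("command", action="remove")
    ET.SubElement(root, "jobId").text = received_id
    return ET.tostring(root, encoding="unicode")
```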


Sync process:
1. The spider runs a search on the employer's career center and assigns a unique ID to each job (the specific URL scraped).
2. The spider posts the job via HTTP/XML; the job board's posting interface returns a unique job ID for each job posting.
3. Job Spider saves the job ID from the job board and links it to the spider's own job ID.
4. The spider runs subsequent scraping sessions. If a job is gone from the job search results, Job Spider sends a Remove command with that ID to the job board's XML interface.
5. The job board removes/expires the job.
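
The five steps can be sketched as a small Python class that keeps the link between the spider's ID (the job URL) and the ID returned by the job board, i.e. the Entity ID and Received ID shown further down. The post_job and remove_job callables stand in for whatever HTTP/XML calls your board's interface needs; they, the class name and the in-memory dictionary are assumptions for illustration (a real setup would persist the links between sessions).

```python
class JobSync:
    """Links spider job IDs (job URLs) to the IDs returned by the job board."""

    def __init__(self, post_job, remove_job):
        self.post_job = post_job      # hypothetical: job data -> board's job ID
        self.remove_job = remove_job  # hypothetical: board's job ID -> None
        self.links = {}               # spider ID -> job board ID

    def run_session(self, scraped_jobs):
        """scraped_jobs maps spider ID -> job data found in this session."""
        # Steps 1-3: post jobs not seen before and remember the board's ID.
        for spider_id, job in scraped_jobs.items():
            if spider_id not in self.links:
                self.links[spider_id] = self.post_job(job)
        # Step 4: jobs missing from the search results get a Remove command.
        for spider_id in list(self.links):
            if spider_id not in scraped_jobs:
                self.remove_job(self.links.pop(spider_id))

# A fake job board used to exercise the sketch.
board, next_id = {}, iter(range(1000, 2000))

def fake_post(job):
    board_id = str(next(next_id))
    board[board_id] = job
    return board_id

def fake_remove(board_id):
    board.pop(board_id, None)  # step 5: the board removes/expires the job

sync = JobSync(fake_post, fake_remove)
sync.run_session({"https://example.com/jobs/1": {"title": "Chef"},
                  "https://example.com/jobs/2": {"title": "Driver"}})
sync.run_session({"https://example.com/jobs/2": {"title": "Driver"}})
# board == {"1001": {"title": "Driver"}}
```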



Job Spider screenshots:


Activate Synchronization for any scraping package/website:

Set the job URL as the unique identifier:

Resulting list of scraping sessions:

Click “Items” to view the downloaded jobs:

Repeated URLs (jobs scraped in earlier sessions) were not downloaded (status “Old”):

Updates to the stored job data:
- New: new jobs scraped during this session
- Deleted: jobs removed from the source website


Posting to your job board / database:

Job Spider synchronizes job data with your recipient website or database.

The job ID from the job board (Received ID) is received during posting and matched to the Spider job ID (Entity ID):

Jobs deleted from the client's source website are removed from the job board.

Integrate ATS With No APIs

Job Spider offers a simple way to avoid complex ATS integration: it retrieves jobs daily from online vacancy listings, extracts the job data and forwards applications to the desired URL.

 

Some Applicant Tracking Systems do not provide an API or XML export feature but do publish vacancy listings in predefined formats.

 

Job Spider can be configured to grab the full list or selected openings, parse the content and publish the extracted data to your career site or job board.
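
For an ATS that publishes its openings as an HTML list, the grabbing step can be approximated with Python's standard html.parser module. The sketch assumes each vacancy is a link carrying a "job-link" class; that markup, the class name and the function names are hypothetical and would be adapted to each ATS's actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class JobLinkParser(HTMLParser):
    """Collects (title, absolute URL) pairs for <a class="job-link"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.jobs = []
        self._href = None   # set while inside a job link
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "job-link" in (attrs.get("class") or "").split():
            self._href = urljoin(self.base_url, attrs.get("href") or "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.jobs.append(("".join(self._text).strip(), self._href))
            self._href = None

def extract_job_links(html, base_url):
    parser = JobLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.jobs
```

Each collected URL would then be downloaded and parsed like any other job page before publishing.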

Posting To Multiple Websites

A new feature has been added to Job Spider that lets you aggregate jobs and post them to multiple websites of your own.

 

Aggregate jobs from various sources, filter them by keywords, set up different XML or CSV posting formats for each recipient website, and post.
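
One way to picture posting to several websites is a per-recipient configuration that names the output format and maps your field names to the recipient's. The Python sketch below is a minimal version under that assumption; the site names, field names and configuration layout are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

def render_xml(jobs, field_map):
    root = ET.Element("jobs")
    for job in jobs:
        job_el = ET.SubElement(root, "job")
        for our_field, their_tag in field_map.items():
            ET.SubElement(job_el, their_tag).text = job.get(our_field, "")
    return ET.tostring(root, encoding="unicode")

def render_csv(jobs, field_map):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(field_map.values())  # recipient's column names
    for job in jobs:
        writer.writerow(job.get(f, "") for f in field_map)
    return buf.getvalue()

RENDERERS = {"xml": render_xml, "csv": render_csv}

def render_for_recipients(jobs, recipients):
    """recipients: site -> {"format": "xml" or "csv", "fields": {ours: theirs}}."""
    return {site: RENDERERS[cfg["format"]](jobs, cfg["fields"])
            for site, cfg in recipients.items()}

jobs = [{"title": "Chef", "location": "Leeds"}]
out = render_for_recipients(jobs, {
    "board-a": {"format": "xml", "fields": {"title": "JobTitle", "location": "City"}},
    "board-b": {"format": "csv", "fields": {"title": "Position", "location": "Town"}},
})
```

Keeping the mapping in data rather than code means adding a recipient website is a configuration change only.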

 

Frequency of scraping

Run your scraping sessions:
- automatically, daily or weekly
- manually, whenever required
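
The scheduling rule fits in a few lines. This sketch assumes each scraping package stores its last run time and a frequency of "daily", "weekly" or "manual"; those names are illustrative, not Job Spider settings.

```python
from datetime import datetime, timedelta

INTERVALS = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}

def next_run(last_run, frequency):
    """When the next automatic scrape is due; None for manual-only packages."""
    if frequency == "manual":
        return None
    return last_run + INTERVALS[frequency]

def is_due(last_run, frequency, now):
    """True if a scheduler polling at `now` should start a new session."""
    due = next_run(last_run, frequency)
    return due is not None and now >= due

last = datetime(2024, 1, 1, 6, 0)
print(next_run(last, "weekly"))  # 2024-01-08 06:00:00
```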

Job Scraping Example

Example of basic employer job search form usage, vacancy list spidering, parsing and XML generation:

 

1. Job search form and list of jobs to spider:

- scrape all jobs or specify search criteria
- schedule for daily or weekly spidering
- filter out by desired / non-desired keywords


 

2. Excerpt of the job page downloading instructions (Job Spider tool):

 

3. Example of a scraped job advert:
The highlighted fields will be extracted according to the instructions below.


 

4. HTML source of the job to parse:
HTML and JavaScript tags are used to identify the job content.


 

5. Job Spider configuration instructions for parsing:
Regular expressions are used for flexible content extraction from HTML source.


 

6. Resulting XML to be auto-posted to your job board interface:
The XML file is matched to your job board fields so that jobs are posted correctly.

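
Step 5 can be approximated in Python as a table of regular expressions, one per job field, each with a single capture group around the value. The patterns and the sample markup below are invented for the example; real rules are written against the actual source of each employer's job page.

```python
import re

# Hypothetical parsing rules: field name -> pattern with one capture group.
PARSING_RULES = {
    "title": re.compile(r'<h1 class="job-title">(.*?)</h1>', re.S),
    "location": re.compile(r'<span class="location">(.*?)</span>', re.S),
    "salary": re.compile(r'<span class="salary">(.*?)</span>', re.S),
}

def parse_job(html, rules=PARSING_RULES):
    """Apply every rule to the page source; unmatched fields become None."""
    job = {}
    for field, pattern in rules.items():
        match = pattern.search(html)
        job[field] = match.group(1).strip() if match else None
    return job

page = """
<h1 class="job-title">Site Engineer</h1>
<span class="location">
  Bristol
</span>
"""
job = parse_job(page)
# job == {"title": "Site Engineer", "location": "Bristol", "salary": None}
```

The non-greedy .*? stops at the first closing tag, which suits flat markup like this but not nested elements. The resulting dictionary is what would then be mapped onto the job board's XML fields in step 6.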

Redundant HTML Tags Removal From Content

Jobs downloaded from employer websites quite often contain redundant HTML tags such as <b>, <br>, <font>, etc., since jobs are sometimes posted by copy-pasting from MS Word into WYSIWYG editors.

 

We have added a tag clean-up feature to Job Spider to solve this issue, so that excessive formatting does not compromise the presentation of job descriptions, page titles and headings on your job board.
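
A clean-up of this kind can be sketched with Python's html.parser module: keep a short whitelist of structural tags, drop everything else (including attributes such as inline styles), keep the text, and turn <br> into a space so words do not run together. The whitelist, the <br> handling and the skipping of <script>/<style> content are assumptions; the actual Job Spider rules are not described here.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "ul", "ol", "li"}  # hypothetical whitelist
SKIP_CONTENT = {"script", "style"}      # drop these tags and their text

class TagCleaner(HTMLParser):
    """Rebuilds an HTML fragment keeping only whitelisted tags, without attributes."""

    def __init__(self, allowed=ALLOWED_TAGS):
        super().__init__(convert_charrefs=True)
        self.allowed = set(allowed)
        self.out = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip += 1
        elif tag == "br":
            self.out.append(" ")          # keep words on either side apart
        elif tag in self.allowed:
            self.out.append(f"<{tag}>")   # attributes (style, class, ...) dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))  # re-escape &, <, >

def clean_html(fragment, allowed=ALLOWED_TAGS):
    cleaner = TagCleaner(allowed)
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out)
```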

Selective Job Posting: Filtering Results

Jobs spidered from source websites are not always a perfect fit for your job board's niche. SpiderMount Job Spider solves this by adding must-have / must-not-have filtering criteria to job parsing, so not every job downloaded from an employer site reaches your jobs database.
 

Any parsed job field can be checked automatically, so that only relevant jobs are posted to your job board.
  

Example: filtering criteria configuration for the job title:
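
A must-have / must-not-have check on a single parsed field can be sketched as follows. The matching rules are assumptions: a job passes if the field contains at least one must-have keyword (when any are set) and none of the must-not-have keywords, matched case-insensitively as whole words.

```python
import re

def contains_keyword(text, keyword):
    """Case-insensitive whole-word match of keyword inside text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def passes_filter(value, must_have=(), must_not_have=()):
    """Apply must-have / must-not-have keyword criteria to one parsed field."""
    if must_have and not any(contains_keyword(value, k) for k in must_have):
        return False
    return not any(contains_keyword(value, k) for k in must_not_have)

jobs = [{"title": "Staff Nurse"}, {"title": "Nursery Assistant"}, {"title": "Agency Nurse"}]
selected = [job for job in jobs
            if passes_filter(job["title"], must_have=["nurse"], must_not_have=["agency"])]
# selected == [{"title": "Staff Nurse"}]
```

Whole-word matching keeps "nurse" from matching "Nursery"; keywords that start or end with punctuation, such as "C++", would need a different boundary rule.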

Jobs HTML Scraping & Preview

Once you have configured your scraping algorithm, run a test task to spider the HTML of the job pages:
 

Once the task is complete, review the log and the resulting File List.

 

The File List holds the scraping session result: the HTML files of the jobs located.

 

Resulting HTML files are ready for parsing and posting to your job board.

Job Scraping: Advanced Job Search Form

Some career sites require you to fill out a job search form to retrieve the vacancies you need.

 

Here is a preview of one of the implementations:

 

Job Spider scraping configuration example:

 

Target sector fields to fill out (part of the search form):
 

 

Spider configuration for the form element choices:
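
Filling out a search form programmatically amounts to sending the same name/value pairs a browser would submit. This sketch assumes a form that posts to https://careers.example.com/search with fields named keywords, sector and location; all of these are hypothetical, and the real names come from the form's HTML (some forms also use GET instead of POST).

```python
import urllib.parse
import urllib.request

def build_search_request(form_action, fields):
    """Encode search-form choices and prepare a POST like a browser form submit."""
    # doseq=True turns a list into repeated fields, as a multi-select would send.
    body = urllib.parse.urlencode(fields, doseq=True).encode("ascii")
    return urllib.request.Request(
        form_action,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

req = build_search_request(
    "https://careers.example.com/search",
    {"keywords": "engineer", "sector": ["IT", "Engineering"], "location": "London"},
)
# urllib.request.urlopen(req) would return the results page to spider.
```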

Job Posting: XML Interface To Your Job Board

Job board owners are currently offered two options for publishing jobs to their websites:

 

1. Job Board XML interface
Your job board has a Broadbean or other XML posting interface.
 

Job Spider is configured to match your job / XML format:

 

Example of the source job and parsed content passed via XML:

 

2. CSV file (MS Excel spreadsheet)
Your job board can have the jobs uploaded in CSV format.
SpiderMount saves the downloaded jobs into a CSV spreadsheet, which can then be uploaded to your job board.
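
A minimal Python version of the CSV option could look like this. The column list is a placeholder for your job board's import columns; writing with the utf-8-sig encoding adds a byte-order mark, which helps Excel recognise the file as UTF-8 so accented characters display correctly.

```python
import csv

# Placeholder columns; use the ones your job board's CSV import expects.
COLUMNS = ["reference", "title", "location", "salary"]

def write_jobs_csv(path, jobs, columns=COLUMNS):
    """Write job dicts to a CSV file; missing fields are left blank, extras ignored."""
    # newline="" lets the csv module choose the line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(jobs)

# Usage: write_jobs_csv("jobs.csv", [{"reference": "J-101", "title": "Café Manager"}])
```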


© Copyright 2008. All rights reserved
Aspen Technology Labs