Mastermind. Brain. Call it whatever you want. Algorithms and all the good stuff that helps us create the Job Web.
An algorithm that allows us to mine and discover inter-dependencies and associations among job positions within a sector and across various sectors.
Finely tuned algorithms from the association rule mining family uncover hidden patterns in job position trends, conditioned on sector trends.
Using a two-level data collection process (job position and sector trends), we are able to reduce noise in our novel Job Web. We condition the job position trends on the sector trends that each job position falls under.
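The conditioning step above can be sketched in a few lines. This is a minimal, illustrative take on association mining: the transactions, sector labels, and job titles below are hypothetical, and the real system would use a proper algorithm like Apriori or FP-Growth rather than raw pair counting.

```python
from itertools import combinations
from collections import Counter

# Toy "transactions": for each time window, the job positions whose postings
# trended upward, tagged with the sector trend they fall under (hypothetical data).
transactions = [
    {"sector": "tech_up", "jobs": {"data engineer", "ml engineer", "devops"}},
    {"sector": "tech_up", "jobs": {"data engineer", "ml engineer"}},
    {"sector": "tech_up", "jobs": {"ml engineer", "devops"}},
    {"sector": "tech_down", "jobs": {"qa tester"}},
]

def mine_pairs(transactions, sector, min_support=0.5):
    """Naive association mining: frequent job-position pairs, conditioned on
    the windows where the given sector trend held (this is the noise-reduction
    step: other sectors' windows are simply excluded)."""
    relevant = [t["jobs"] for t in transactions if t["sector"] == sector]
    counts = Counter()
    for jobs in relevant:
        for pair in combinations(sorted(jobs), 2):
            counts[pair] += 1
    n = len(relevant)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(mine_pairs(transactions, "tech_up"))
```

Conditioning on the sector trend keeps a rare pair in a small sector from being drowned out by (or confused with) noise from unrelated sectors.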
State-of-the-art tokenization, lemmatization, stopword removal and POS tagging to extract only the relevant information from the large data dumps we have collected.
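To make the pipeline concrete, here is a stdlib-only sketch of the first few stages. In practice a library like spaCy or NLTK would handle all four steps; the regex tokenizer, the tiny stopword set, and the crude suffix stripping standing in for lemmatization below are all simplifications for illustration, and POS tagging is left out because it needs a trained model.

```python
import re

# A tiny illustrative stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "for", "in", "to", "is", "are"}

def preprocess(text):
    # Tokenization: lowercase word tokens via regex (a stand-in for a real tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Crude suffix stripping as a stand-in for true lemmatization.
    lemmas = []
    for t in tokens:
        if t.endswith("ies") and len(t) > 4:
            t = t[:-3] + "y"
        else:
            for suffix in ("ing", "ers", "er", "s"):
                if t.endswith(suffix) and len(t) > len(suffix) + 2:
                    t = t[: -len(suffix)]
                    break
        lemmas.append(t)
    return lemmas

print(preprocess("The companies are hiring engineers for data roles"))
```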
This distributed, open-source search and analytics engine has drastically reduced our query time and in short, made our lives easier.
It also accommodates all types of data, from text and geospatial to unstructured data.
Apart from being simple and fast, it’s extremely scalable and therefore ties in well with the rest of our solution.
Postgres is another open-source database that is perfect for our processed and structured data.
This is the database our server directly interacts with. It’s highly extensible and like all our other technologies, a great fit for our solution.
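A rough sketch of what that structured storage might look like. The table layout and column names here are hypothetical, not our actual schema, and sqlite3 stands in for Postgres purely so the example is self-contained; the relational idea (sectors referenced by job positions) carries over directly.

```python
import sqlite3

# In-memory sqlite3 database standing in for Postgres (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sectors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE job_positions (
    id            INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    sector_id     INTEGER REFERENCES sectors(id),
    median_salary REAL
);
""")
conn.execute("INSERT INTO sectors (name) VALUES ('Technology')")
conn.execute(
    "INSERT INTO job_positions (title, sector_id, median_salary) VALUES (?, 1, ?)",
    ("Data Engineer", 95000.0),
)

# The kind of join the server runs against the processed data.
row = conn.execute(
    "SELECT s.name, j.title FROM job_positions j JOIN sectors s ON j.sector_id = s.id"
).fetchone()
print(row)
```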
Our historical data comes from many sources, including CEIC, the Central Statistics Office and Statista, just to name a few.
This data is crucial for understanding patterns and trends, and for seeing whether they recur in the future.
Our RSS bot scrapes live news so we can keep track of upcoming industry events.
It could be new offices, headquarters, layoffs or mergers. We want to make sure we’re up to date so that our users are better informed of jobs to be on the lookout for.
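The filtering step can be sketched with the standard library alone. The feed snippet and keyword list below are hypothetical stand-ins for the live feeds our bot actually reads.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS snippet standing in for a live feed (hypothetical data).
RSS = """<rss version="2.0"><channel>
<item><title>TechCorp opens new Dublin headquarters</title></item>
<item><title>RetailCo announces layoffs</title></item>
<item><title>Weather: sunny spells expected</title></item>
</channel></rss>"""

# The event types called out above: offices, headquarters, layoffs, mergers.
EVENT_KEYWORDS = ("headquarters", "layoffs", "merger", "new office")

def industry_events(rss_text):
    """Keep only the feed items whose titles mention events we track."""
    root = ET.fromstring(rss_text)
    titles = [item.findtext("title", "") for item in root.iter("item")]
    return [t for t in titles if any(k in t.lower() for k in EVENT_KEYWORDS)]

print(industry_events(RSS))
```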
We use web scrapers to get job data from websites like Indeed and Monster.
This data enables us to predict the salary and requirements for future jobs that may come up, by looking at the qualifications required for jobs that currently exist.
Hopefully we can partner with these companies in the future and refine our data even more.
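As a rough illustration of the scraping step, here is a stdlib-only parser for a listing page. The markup below is hypothetical; real job boards each have their own structure (and terms of service), and a production scraper would use something like Scrapy or BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical listing-page markup; real job-board HTML differs per site.
SAMPLE = """
<div class="job"><h2>Data Analyst</h2><span class="salary">€45,000</span></div>
<div class="job"><h2>Backend Developer</h2><span class="salary">€60,000</span></div>
"""

class JobParser(HTMLParser):
    """Collect {title, salary} records from pages shaped like SAMPLE."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # which record field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "salary":
            self._field = "salary"

    def handle_data(self, data):
        if self._field == "title":
            self.jobs.append({"title": data})
        elif self._field == "salary":
            self.jobs[-1]["salary"] = data
        self._field = None

parser = JobParser()
parser.feed(SAMPLE)
print(parser.jobs)
```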
To make sense of all our data and extract the relevant information we need, we use our summariser program.
It takes a text file and gives us a concise understanding of the text, without us having to spend a huge amount of time reading the entire article.
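One classic way to do this is frequency-based extractive summarisation: score each sentence by how often its words appear in the whole document, and keep only the top scorers. The sketch below is an assumption about the approach, not our exact summariser, and the sample text is made up.

```python
import re
from collections import Counter

def summarise(text, n_sentences=1):
    """Frequency-based extractive summary: score each sentence by the total
    document-wide frequency of its words, keep the top n in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
    )
    keep = sorted(scored[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)

text = ("Hiring is rising across the tech sector. "
        "Tech hiring growth means tech salaries are rising. "
        "Unrelated note about the weather.")
print(summarise(text))
```

The sentence that repeats the document's most frequent words ("tech", "hiring", "rising") wins, which is exactly the intuition behind extractive summarisation.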
We came across a lot of scanned copies of information that we could not possibly parse.
Our OCR bot converts the text inside those images and scanned copies into machine-readable text, after which it's ready to be put into our database.