How does the current PDF server interact with the system?
When a PDF is requested, the configuration is checked to determine what it should contain. This is usually a “view” of the candidate data (the HTML printable view), along with a number of attachments, such as a CV.
The PDF server is used to generate a PDF of each of these items: one PDF of the HTML printable view, and one PDF of each attached document.
It then takes all of these PDFs and merges them into one.
It compares the date and time stamps of the uploaded attachments and the HTML printable view against any PDF it generated previously, to work out whether that PDF is still up to date or needs to be regenerated.
The PDF server itself is given the URL or document that needs converting. It then browses to that page, or opens that document in a suitable application, “prints” it as a PDF, and returns the resulting PDF. For example, if a Word document is uploaded, it will open it in Word and then print it as a PDF.
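The time stamp comparison described above can be sketched as follows. This is only an illustration of the idea, not the system's actual code: the function name needs_regeneration and its parameters are hypothetical, and it assumes the time stamps are available as comparable datetime values.

```python
from datetime import datetime
from typing import Iterable, Optional


def needs_regeneration(
    cached_pdf_time: Optional[datetime],
    source_times: Iterable[datetime],
) -> bool:
    """Return True when the cached PDF is missing or older than any source.

    cached_pdf_time: when the previous PDF was generated (None if there is none).
    source_times: time stamps of the HTML printable view and each attachment.
    Both names are assumptions made for this sketch.
    """
    if cached_pdf_time is None:
        # Nothing has been generated yet, so a PDF must be produced.
        return True
    # Any source changed after the cached PDF was built makes it stale.
    return any(changed > cached_pdf_time for changed in source_times)
```

With a check like this, the PDF is only regenerated when the candidate data or one of the attachments has changed since the previous PDF was produced.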
Limitations of the PDF server
Because of this approach, the PDF server may not be able to open some files or web pages. If, for example, the user took an Excel spreadsheet and renamed it to MyCV.Doc, the PDF server would not be able to convert that document to a PDF. Likewise, if the uploaded document was a zip file or an executable, there may be no way of displaying it as a PDF.
Possible root cause of performance problems
The PDF server has been written to expect everything it receives to be malicious or invalid. Its approach is to “try it” with anything that is uploaded, as there is no way of knowing in advance whether what has been uploaded is a genuine document.
Therefore, it is very likely that some uploaded documents will contain Word macro viruses, or will simply be too big for an application to load. We have even seen instances where a candidate has uploaded a Word document that is nothing more than a few thousand blank pages.
The server attempts to be very resilient against these things, but there will always be a chance that something is uploaded that causes it to fail. In such circumstances it will try to recover, although occasionally we have to reset the server.
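One common way to make a “try it” approach resilient is to run each conversion as a separate process with a time limit, so that an oversized or hostile document cannot hang the server indefinitely. The sketch below illustrates that general technique only. It is not a description of how the PDF server is actually implemented, and the function name run_conversion and the idea that the converter writes the PDF to standard output are assumptions.

```python
import subprocess
from typing import List, Optional


def run_conversion(command: List[str], timeout_seconds: float) -> Optional[bytes]:
    """Run a converter command and return its standard output, or None on failure.

    command is a hypothetical converter invocation; in this sketch the
    converter is assumed to write the resulting PDF to standard output.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    except OSError:
        # The converter could not be started at all.
        return None
    if result.returncode != 0:
        # The converter ran but reported that it could not handle the input.
        return None
    return result.stdout
```

A caller would treat None as “this document could not be converted” and carry on with the remaining items, rather than letting one bad upload stop the whole request.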
Description of proposed new solution
The new PDF Server works exactly as described above, but runs in a load-balanced configuration: there are multiple underlying PDF Servers available to undertake the actual work. Should one PDF Server become unstable, the work can be carried out by the remaining PDF Servers.
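The failover behaviour can be pictured with a small sketch. It shows only the idea that work passes to another server when one fails. A real load balancer also spreads requests across servers, which is not shown here. The function name convert_with_failover is hypothetical, and each server is represented as a plain callable purely for illustration.

```python
from typing import Callable, Optional, Sequence


def convert_with_failover(
    document: bytes,
    servers: Sequence[Callable[[bytes], bytes]],
) -> bytes:
    """Try each PDF server in turn and return the first successful result.

    Each entry in servers is a hypothetical callable standing in for one
    underlying PDF Server; it returns PDF bytes or raises on failure.
    """
    last_error: Optional[Exception] = None
    for convert in servers:
        try:
            return convert(document)
        except Exception as error:
            # This server is unstable or failed; move on to the next one.
            last_error = error
    # Only reached when every server has failed (or none were configured).
    raise RuntimeError("all PDF servers failed") from last_error
```

The request only fails outright when every server in the pool has failed.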
How will this improve performance problems?
This will address the performance problems because there will always be more than one PDF Server available, so all of them would have to fail before we encountered a problem.
We are working on enhanced monitoring of these servers so that, should one fail, it can be restarted automatically and brought back online without manual intervention.
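As a rough illustration of this kind of monitoring, the sketch below checks each server and restarts any that do not respond as healthy. The monitoring itself is still being worked on, so this is not its design: the function name restart_unhealthy, the health check callables and the restart callable are all hypothetical.

```python
from typing import Callable, Dict, List


def restart_unhealthy(
    servers: Dict[str, Callable[[], bool]],
    restart: Callable[[str], None],
) -> List[str]:
    """Restart every server whose health check fails; return their names.

    servers maps a server name to a hypothetical health check that returns
    True when the server is responding. restart is a hypothetical action
    that brings the named server back online.
    """
    restarted: List[str] = []
    for name, is_healthy in servers.items():
        try:
            healthy = is_healthy()
        except Exception:
            # A health check that raises is treated as a failed server.
            healthy = False
        if not healthy:
            restart(name)
            restarted.append(name)
    return restarted
```

Run on a schedule, a check like this would bring a failed server back into the pool without anyone having to intervene.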
Risk analysis of moving to the load balanced solution
The major risk is that this new mechanism has not yet been used under the same level of load as the current server.
We have load tested it and tried a number of scenarios, but the problems this server encounters are by nature unpredictable.
The switchover to the new server is a single configuration setting. Whilst making this change does result in all logged-in users and candidates being logged out of the system, we are in a position to revert it should we encounter any problems.