Given that, you might try increasing the amount of time that alfserver waits before escalating to SIGKILL. One second is a long time for a process to be unresponsive. Try small increments, for example, in alfserver.ini change this line:
SetPref timerAlfserverSignalEscalation 3
Also, the log indicates that you are running the server as root, which would be ok if you were causing the actually-launched apps to run as a different user. I think you’ve got “AlfProcessOwnerConfig login root” set in your alfserver.ini ? There is a chance that the “login” mode is adding an extra layer of complexity to this issue.
Usually we suggest you run your alfserver in one of two modes:
mode A:
- run alfserver as root
- use “AlfProcessOwnerConfig setuid $dispatcherUser”
- set up env configuration in alfserver.ini
mode B:
- run alfserver as some other user (like create an “alfserver” account), with a well-defined operating environment
- use “AlfProcessOwnerConfig server” in alfserver.ini
The above is quoted from Pixar’s reply about alfserver configuration.
At Digimax we essentially use mode A, with a small trick for handling environment variables. It is common for an animation studio to run two or more productions concurrently, each using different versions of software (or renderers). For example, project A uses prman 13.3 while project B needs prman 14.0. Each package and version needs its own environment variables, and these have to be designed carefully so that they work in every situation and in every combination (e.g., maya 7.0 + prman 13.0, maya 8.5 + prman 13.5.4).
To handle this on your alfserver (that is, on the render nodes of the farm), you would have to put lots of “env” attributes in alfserver.ini. Worse, the environment keeps changing, so you have to edit those “env” settings again and again. Situations where you need to reconfigure them include:
- Adding new software to the render farm (e.g., the mental ray renderer).
- Adding a different version of existing software (e.g., houdini 9.0 and houdini 9.5).
- Two productions use the same software but different environments (e.g., a different MAYA_PLUGIN_PATH).
There are many ways to handle this; what we do at Digimax is:
Wrap the job command
By default, alfserver invokes the program (e.g., prman, maya, nuke, …) directly after setting up some environment variables (in mode A). We take an indirect route instead and wrap the whole command (program + arguments).
For example, the original command is:
prman -Progress -t:2 "//file_server/.../ooxx.rib"
We wrap it into something like this:
alf_runner --prog prman --envkey rps-13.5 \
--envkey RenderMan_Studio-1.0.1-Maya8.5 -- \
-Progress -t:2 "//file_server/.../ooxx.rib"
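As a rough sketch of how a job-submission script might produce such a wrapped command: the helper below, wrap_cmd, is hypothetical (it is not part of alfserver or of our actual scripts), and so are the comma-separated envkey list and the placeholder scene path.

```shell
#!/bin/bash
# Hypothetical helper (not part of alfserver): build an alf_runner command
# line from a program name, a comma-separated envkey list, and the
# program's own arguments.
wrap_cmd() {
    local prog=$1 envkeys=$2
    shift 2
    local -a keys cmd
    # split the envkey list on commas
    IFS=',' read -r -a keys <<< "$envkeys"
    cmd=(alf_runner --prog "$prog")
    local key
    for key in "${keys[@]}"; do
        cmd+=(--envkey "$key")
    done
    # everything after "--" belongs to the wrapped program
    cmd+=(-- "$@")
    # %q shell-quotes each word, so arguments containing spaces survive
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Example with a placeholder scene path
wrap_cmd prman "prman-13.5.4,RenderMan_Studio-1.0.1-Maya8.5" \
    -Progress -t:2 "/path/to/scene.rib"
```

The point of building the line in one place is that the job generator only has to know envkey names, never the environment variables behind them.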
Prepare a script to handle all kinds of job running and environment configuration
Next, we prepare a script, alf_runner, which looks something like this:
#!/bin/bash
###################
#
# a script cooperating with RenderMan programs to handle envkeys
# please refer to ~RPS/etc/alfserver.ini for the counterpart
#
# author: Shuen-Huei (Drake) Guan
#
# input: alf_runner
# --prog prman
# --envkey RenderMan_Studio-1.0.1-Maya8.5 --envkey prman-13.5
# -- -Progress "ooxx.rib"
# output: prman -Progress "ooxx.rib"
#
# note: take care of envkeys. their naming is very sensitive to lots of things.
###################
# parsing envkeys
#
OOXX=$(getopt -o e: --long prog:,envkey: -- "$@")
if [ $? != 0 ] ; then echo "Terminating..." >&2 ; exit 1 ; fi
eval set -- "$OOXX"
PROG=''
RAT=''
RPS=''
MAYA=''
while true ; do
    case "$1" in
        --prog)
            ##################
            # get program name
            PROG=$2
            shift 2 ;;
        -e|--envkey)
            #################
            # for each envkey, source different env file
            # (-e is the short form declared in the getopt call above)
            case "$2" in
                RenderMan_Studio-1.0.1-Maya8.5)
                    . "/.../rms_1.0.1_for_maya_8.5_env"
                    RAT=$2
                    shift 2 ;;
                prman-13.0.3)
                    . "/.../rps_13.0_env"
                    RPS=$2
                    shift 2 ;;
                prman-13.5.4)
                    . "/.../rps_13.5.4_env"
                    RPS=$2
                    shift 2 ;;
                maya-8.5)
                    . "/.../maya_8.5_env"
                    MAYA=$2
                    shift 2 ;;
                *)
                    # unknown envkey: print it and carry on
                    echo "$2"
                    shift 2 ;;
            esac ;;
        --)
            shift
            break ;;
        *)
            echo "Internal error!"
            exit 1 ;;
    esac
done
if ! command -v "${PROG}" >/dev/null 2>&1 ; then
    # report the failure with a non-zero exit status
    echo "can't find program: '${PROG}' @ $(hostname)" >&2
    exit 1
fi
# quote "$@" so arguments containing spaces reach the program intact
exec "${PROG}" "$@"
# vim: ai sw=4 sts=4 et nu:
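For reference, each env file sourced by the script is just a list of exports. Here is a minimal sketch of what something like rps_13.5.4_env might contain. The install path is a placeholder and the choice of variables is an assumption (RMANTREE is the usual RenderMan Pro Server root variable), not a copy of our actual file.

```shell
# Hypothetical env file (e.g. rps_13.5.4_env); sourced by alf_runner, not executed.
# The install path below is a placeholder, not a real studio path.
export RMANTREE="/path/to/RenderManProServer-13.5.4"
# put this version's binaries first so "prman" resolves to it
export PATH="${RMANTREE}/bin:${PATH}"
```

Because the file is sourced with ".", its exports end up in the environment of alf_runner itself and are inherited by the program it finally exec's.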
With this in place, we only need to edit alf_runner and the corresponding ooxx.env files; alfserver.ini stays untouched and alfserver does not need to be restarted. In other words, we can change the environment on the fly, without any server restart!
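One possible refinement, sketched here under assumptions rather than taken from our setup: if every env file were named after its envkey, the per-version branches of the case statement could collapse into a single lookup, and a new software version would only need a new file. ENV_DIR, the "<envkey>.env" naming scheme and the function source_envkey are all hypothetical.

```shell
#!/bin/bash
# Hypothetical layout: one env file per envkey, named "<envkey>.env",
# all under ENV_DIR. Neither the directory nor the naming scheme comes
# from the original setup.
ENV_DIR=${ENV_DIR:-/path/to/envs}

source_envkey() {
    local key=$1
    local file="${ENV_DIR}/${key}.env"
    # reject empty keys and anything that could escape ENV_DIR
    case "$key" in
        ''|*/*)
            echo "bad envkey: '$key'" >&2
            return 1 ;;
    esac
    if [ -r "$file" ]; then
        . "$file"
    else
        echo "unknown envkey: '$key' @ $(hostname)" >&2
        return 1
    fi
}
```

Inside alf_runner, the envkey branch would then reduce to something like `source_envkey "$2" || exit 1`. The trade-off is that a typo in an envkey name now fails the job instead of being echoed and ignored.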
Note: alfserver is the service for remote job handling. It is part of RMS (RenderMan Studio) and RAT (RenderMan Artist Tools), which are trademarks of Pixar.