Distributed computing
A framework for distributed computing -- that is, running jobs across multiple machines.
Compute core API version 1
Functions that the compute core .so library should expose, without name mangling:
- int32_t worker_init(int32_t version)
  - Called when the library is initialised. The integer argument is the version of this API specification; the library should check that it is equal to the expected value.
    Should return 0 on successful initialisation, or nonzero if an error occurred.
- int32_t worker_run_job(uint64_t size, void *data, uint64_t *outsize, void **outdata)
  - Run a job. The job data is specified in the data blob pointed to by data, which is size bytes in size. The worker is allowed to modify the memory behind data during execution of this function, but the memory will be deallocated as soon as worker_run_job returns.
    A pointer to memory containing the computed results should be stored in outdata, and its size in outsize. When the memory allocated for the output data may be freed, worker_free_outdata will be called. If there is no data to return, for example because an error occurred, store 0 in outsize and a null pointer in outdata.
    Should return 0 on successful execution, or nonzero if an error occurred.
- void worker_free_outdata(uint64_t size, void *outdata)
  - Free memory allocated for the output data of a job. This is called when the memory for the output data of the last job is no longer needed, and will always be called before the next job is started.
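To make the three entry points concrete, here is a minimal sketch of a compute core in C. The job itself (returning a copy of the input with every byte incremented) and the EXPECTED_API_VERSION macro are invented for illustration; only the three function signatures come from the specification above. In C the symbols are exported unmangled by default; a C++ core would need to wrap them in extern "C".

```c
#include <stdint.h>
#include <stdlib.h>

/* Version of the API specification this core was written against. */
#define EXPECTED_API_VERSION 1

int32_t worker_init(int32_t version) {
    /* Refuse to run against a different API specification. */
    return version == EXPECTED_API_VERSION ? 0 : 1;
}

/* Hypothetical job: output is the input with every byte incremented by one. */
int32_t worker_run_job(uint64_t size, void *data, uint64_t *outsize, void **outdata) {
    /* Default to "no output", as required when there is nothing to return. */
    *outsize = 0;
    *outdata = NULL;
    if (size == 0) return 0;

    uint8_t *result = malloc((size_t)size);
    if (result == NULL) return 1; /* allocation failed: report an error */

    const uint8_t *input = data;
    for (uint64_t i = 0; i < size; i++) result[i] = (uint8_t)(input[i] + 1);

    /* The input buffer is freed after we return, so the result is a separate allocation. */
    *outsize = size;
    *outdata = result;
    return 0;
}

void worker_free_outdata(uint64_t size, void *outdata) {
    (void)size;    /* not needed by a plain malloc/free allocator */
    free(outdata); /* free(NULL) is a no-op */
}
```

With GCC or Clang on Linux, a file like this can typically be built into a loadable library with cc -shared -fPIC core.c -o core.so.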
Note that there is no function called before unloading the library. If you need such a thing, please use destructors, or unload necessary things in worker_free_outdata.
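One way to follow the destructor suggestion is the GCC/Clang destructor function attribute, which runs a function when the shared library is unloaded or the process exits. The sketch below assumes such a compiler; the scratch buffer and the core_cleanup name are hypothetical and only stand in for whatever long-lived state a core might hold.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state that lives for as long as the library is loaded. */
static uint8_t *scratch = NULL;

int32_t worker_init(int32_t version) {
    if (version != 1) return 1;
    scratch = malloc(4096);
    return scratch != NULL ? 0 : 1;
}

/* GCC/Clang extension: called automatically on library unload or process exit. */
__attribute__((destructor))
static void core_cleanup(void) {
    free(scratch); /* safe even if worker_init was never called */
    scratch = NULL;
}
```

Resetting the pointer to NULL keeps the cleanup harmless if it ends up running more than once.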
Worker socket protocol version 1
All integers in the below description are little-endian.
Common data types used in the message descriptions below:
- String/Blob: 8-byte unsigned integer indicating the length of the data, then that many bytes making up the string or blob. A string is valid UTF-8, while a blob can contain arbitrary data.
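As an illustration of the String/Blob layout, the following C sketch writes the 8-byte little-endian length followed by the data. The helper names put_u64_le and encode_blob are made up for this example, and returning 0 when the buffer is too small is a convention chosen here, not something the protocol prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as 8 little-endian bytes, independent of the host byte order. */
static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

/* Encode a string or blob into dst.
 * Returns the number of bytes written, or 0 if dst (capacity cap) is too small. */
static size_t encode_blob(uint8_t *dst, size_t cap, const void *data, uint64_t len) {
    if (cap < 8 || len > cap - 8) return 0;
    put_u64_le(dst, len);
    if (len > 0) memcpy(dst + 8, data, (size_t)len);
    return 8 + (size_t)len;
}
```

For example, encoding the two-byte string "hi" produces 10 bytes: the length 2 as 02 00 00 00 00 00 00 00, then the bytes 'h' and 'i'.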
A message from controller to worker has the following format:
- Message type [1 byte]
- ID [8 bytes]
- Payload length [8-byte unsigned integer]
- Payload [variable length and contents]
A response from worker to controller has the following format:
- Response type [1 byte]
- ID of message replied to [8 bytes]
- Payload length [8-byte unsigned integer]
- Payload [variable length and contents]
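Messages and responses share the same 17-byte header layout, so a single framing routine can serve both directions. The C sketch below assumes that the 8-byte ID is treated as a little-endian unsigned integer like the other integer fields; the names HEADER_SIZE, put_u64_le and encode_frame are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 byte type + 8 bytes ID + 8 bytes payload length. */
#define HEADER_SIZE ((size_t)17)

/* Write v as 8 little-endian bytes, independent of the host byte order. */
static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

/* Frame one message or response: type, ID, payload length, payload.
 * Returns the number of bytes written, or 0 if dst (capacity cap) is too small. */
static size_t encode_frame(uint8_t *dst, size_t cap, uint8_t type, uint64_t id,
                           const void *payload, uint64_t payload_len) {
    if (cap < HEADER_SIZE || payload_len > cap - HEADER_SIZE) return 0;
    dst[0] = type;
    put_u64_le(dst + 1, id);
    put_u64_le(dst + 9, payload_len);
    if (payload_len > 0) memcpy(dst + HEADER_SIZE, payload, (size_t)payload_len);
    return HEADER_SIZE + (size_t)payload_len;
}
```

A message with a 4-byte payload therefore occupies 21 bytes on the wire.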
The possible response types are the following:
- 0x01: Successful response to a message, as described in the list of message types.
- 0xff: An error response; something went wrong. The entire payload is a UTF-8 error message.
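On the controller side, a response header can be decoded and its type checked along the following lines. This is a sketch, not a reference implementation: the struct, the macro names and the choice to reject unknown response types with -1 are assumptions made for the example, as is reading the ID as a little-endian integer.

```c
#include <stddef.h>
#include <stdint.h>

#define RESPONSE_OK    0x01
#define RESPONSE_ERROR 0xff

struct response_header {
    uint8_t  type;        /* RESPONSE_OK or RESPONSE_ERROR */
    uint64_t id;          /* ID of the message replied to */
    uint64_t payload_len; /* number of payload bytes that follow the header */
};

/* Read 8 little-endian bytes as an unsigned integer. */
static uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if the header is incomplete or the type is unknown. */
static int parse_response_header(const uint8_t *buf, size_t len,
                                 struct response_header *out) {
    if (len < 17) return -1;
    if (buf[0] != RESPONSE_OK && buf[0] != RESPONSE_ERROR) return -1;
    out->type = buf[0];
    out->id = get_u64_le(buf + 1);
    out->payload_len = get_u64_le(buf + 9);
    return 0;
}
```

When the type is RESPONSE_ERROR, the payload_len bytes following the header are the UTF-8 error message and can be logged or reported as-is.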
The possible message types are the following:
- 0x01: Version exchange
  - Payload: 4-byte unsigned integer, the protocol version of the server. In this version, this is 1.
  - Successful response: 1 byte, 1 if the version is accepted by the worker, 0 if not. If the version is not accepted, the connection is closed by both sides.
- 0x02: New compute core
  - Payload: A string giving the name of the compute core, then a blob giving the contents of a dynamic library file that can be loaded at runtime, e.g. a .so file. This library will be loaded as the compute core for the worker.
  - Successful response: Empty.
- 0x03: New job
  - Payload: An 8-byte unsigned integer giving the ID of the job, then a blob giving the input data for the compute core.
  - Successful response: A 4-byte signed integer giving the exit code of the job as returned by the compute core, then a blob giving the output data.
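The New job exchange can be sketched in C as a pair of helpers: one builds the 0x03 payload, the other decodes the payload of a successful response. All function names here are hypothetical, and the parser returns a pointer into the received buffer rather than copying the output data, which is a design choice for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++) dst[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Payload of a 0x03 message: job ID, then the input data as a blob.
 * Returns the number of bytes written, or 0 if dst (capacity cap) is too small. */
static size_t build_new_job_payload(uint8_t *dst, size_t cap, uint64_t job_id,
                                    const void *input, uint64_t input_len) {
    if (cap < 16 || input_len > cap - 16) return 0;
    put_u64_le(dst, job_id);
    put_u64_le(dst + 8, input_len);
    if (input_len > 0) memcpy(dst + 16, input, (size_t)input_len);
    return 16 + (size_t)input_len;
}

/* Payload of a successful reply: 4-byte signed exit code, then the output blob.
 * On success *out points into payload (no copy). Returns 0, or -1 if malformed. */
static int parse_new_job_response(const uint8_t *payload, size_t len,
                                  int32_t *exit_code,
                                  const uint8_t **out, uint64_t *out_len) {
    if (len < 12) return -1;
    uint32_t raw = 0;
    for (int i = 0; i < 4; i++) raw |= (uint32_t)payload[i] << (8 * i);
    memcpy(exit_code, &raw, sizeof raw); /* reinterpret the bits as signed */
    uint64_t n = get_u64_le(payload + 4);
    if (n > len - 12) return -1; /* blob claims more bytes than were received */
    *out = payload + 12;
    *out_len = n;
    return 0;
}
```

The exit code decoded here is the value the compute core returned from worker_run_job, so a nonzero code with an empty blob corresponds to a failed job.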