Distributed computing

A framework for distributed computing -- that is, running jobs across multiple machines.

Compute core API version 1

Functions that should be exposed, without mangling, by the compute core .so library:

  • int32_t worker_init(int32_t version)

    • Will be called when initialising the library. The integer argument is the version of this API specification; the library should check that it is equal to the expected value.

      Should return 0 on successful initialisation, or nonzero if an error occurred.

  • int32_t worker_run_job(uint64_t size, void *data, uint64_t *outsize, void **outdata)

    • Run a job. The job data is the blob pointed to by data, which is size bytes long. The worker may modify the memory behind data while this function executes, but that memory is deallocated as soon as worker_run_job returns, so the worker must not keep pointers into it.

      A pointer to memory containing the computed results should be stored in outdata, and its size in outsize. When the memory allocated for the output data may be freed, worker_free_outdata will be called. If there is no data to return, for example because an error occurred, store 0 in outsize and a null pointer in outdata.

      Should return 0 on successful execution, or nonzero if an error occurred.

  • void worker_free_outdata(uint64_t size, void *outdata)

    • Free memory allocated for the output data of a job. This is called when the memory for the output data of the last job is no longer needed, and will always be called before the next job is started.

Note that no function is called before the library is unloaded. If you need such a hook, please use destructors, or release the necessary resources in worker_free_outdata.
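
As an illustration of the three functions above, here is a minimal sketch of a compute core. The job it performs (returning the input bytes in reverse order) and the name EXPECTED_API_VERSION are made up for this example; only the three function signatures come from the specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constant: the API version this example core was written for. */
#define EXPECTED_API_VERSION 1

int32_t worker_init(int32_t version) {
    /* Refuse to initialise if the worker speaks a different API version. */
    return version == EXPECTED_API_VERSION ? 0 : 1;
}

int32_t worker_run_job(uint64_t size, void *data,
                       uint64_t *outsize, void **outdata) {
    assert(outsize != NULL && outdata != NULL);

    /* Default to "no data to return": 0 in outsize, null in outdata. */
    *outsize = 0;
    *outdata = NULL;

    if (size == 0)
        return 0;              /* empty job, empty result */
    if (size > SIZE_MAX)
        return 1;              /* cannot allocate that much on this host */

    unsigned char *out = malloc((size_t)size);
    if (out == NULL)
        return 1;              /* allocation failed: error, no output */

    /* Example job: copy the input in reverse order. */
    const unsigned char *in = data;
    for (uint64_t i = 0; i < size; i++)
        out[i] = in[size - 1 - i];

    *outsize = size;
    *outdata = out;
    return 0;
}

void worker_free_outdata(uint64_t size, void *outdata) {
    (void)size;                /* not needed when freeing with free() */
    free(outdata);             /* free(NULL) is a no-op */
}
```

Built as a shared library (for example with cc -shared -fPIC), this exposes the three symbols unmangled because it is plain C; a C++ core would need extern "C" on each function.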

Worker socket protocol version 1

All integers in the below description are little-endian.

Common data types used in the message descriptions below:

  • String/Blob: 8-byte unsigned integer indicating the length of the data, then that many bytes making up the string or blob. A string is valid UTF-8, while a blob can contain arbitrary data.
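
The string/blob encoding can be sketched in C as follows. The function names put_u64_le, get_u64_le and encode_blob are hypothetical helpers invented for this example; the byte layout is the one described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as 8 little-endian bytes, independent of host byte order. */
void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Read 8 little-endian bytes back into a uint64_t. */
uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Encode a string or blob: 8-byte length, then the bytes themselves.
 * Returns the number of bytes written, or 0 if dst is too small
 * (0 is unambiguous because even an empty blob takes 8 bytes). */
size_t encode_blob(uint8_t *dst, size_t cap, const void *data, uint64_t len) {
    assert(dst != NULL);
    if (cap < 8 || len > cap - 8)
        return 0;
    put_u64_le(dst, len);
    if (len > 0)
        memcpy(dst + 8, data, (size_t)len);
    return 8 + (size_t)len;
}
```

For instance, encoding the 3-byte string "abc" yields 11 bytes: 03 00 00 00 00 00 00 00 followed by 61 62 63. Checking that a string is valid UTF-8 is left out of this sketch.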

A message from controller to worker has the following format:

  • Message type [1 byte]
  • ID [8 bytes]
  • Payload length [8-byte unsigned integer]
  • Payload [variable length and contents]

A response from worker to controller has the following format:

  • Response type [1 byte]
  • ID of message replied to [8 bytes]
  • Payload length [8-byte unsigned integer]
  • Payload [variable length and contents]

The possible response types are the following:

  • 0x01: Successful response to a message, as described in the table of message types.
  • 0xff: An error response; something went wrong. The entire payload is a UTF-8 error message.

The possible message types are the following:

  • 0x01: Version exchange

    • Payload: 4-byte unsigned integer, the protocol version of the controller. In this version, this is 1.
    • Successful response: 1 byte, 1 if the version is accepted by the worker, 0 if not. If the version is not accepted, the connection is closed by both sides.
  • 0x02: New compute core

    • Payload: A string giving the name of the compute core, then a blob giving the contents of a dynamic library file that can be loaded at runtime, e.g. a .so file. This library will be loaded as the compute core for the worker.
    • Successful response: Empty.
  • 0x03: New job

    • Payload: An 8-byte unsigned integer giving the ID of the job, then a blob giving the input data for the compute core.
    • Successful response: A 4-byte signed integer giving the exit code of the job as returned by the compute core, then a blob giving the output data.