# Distributed computing

A framework for distributed computing -- that is, running jobs across multiple
machines.

## Compute core API version 1

Functions that should be exposed, without mangling, by the compute core `.so`
library:

- `int32_t worker_init(int32_t version)`
  - Will be called when initialising the library. The integer argument is the
    version of this API specification; the library should check that it is
    equal to the expected value.

    Should return 0 on successful initialisation, or nonzero if an error
    occurred.

- `int32_t worker_run_job(uint64_t size, void *data, uint64_t *outsize, void **outdata)`
  - Run a job. The job data is the blob pointed to by `data`, which is `size`
    bytes long. The worker may modify the memory behind `data` while this
    function executes, but that memory is deallocated as soon as
    `worker_run_job` returns, so the worker must not keep pointers into it.

    A pointer to memory containing the computed results should be stored in
    `outdata`, and its size in `outsize`. When the memory allocated for the
    output data may be freed, `worker_free_outdata` will be called. If there
    is no data to return, for example because an error occurred, store 0 in
    `outsize` and a null pointer in `outdata`.

    Should return 0 on successful execution, or nonzero if an error occurred.

- `void worker_free_outdata(uint64_t size, void *outdata)`
  - Free memory allocated for the output data of a job. This is called when the
    memory for the output data of the last job is no longer needed, and will
    always be called before the next job is started.

Note that no function is called before the library is unloaded. If you need
cleanup at that point, use [destructors][1], or release the necessary resources
in `worker_free_outdata`.
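
As an illustration of the three functions above, here is a minimal sketch of a
compute core. The job it performs (returning the input bytes in reverse order)
is a made-up example and not part of this specification; only the function
names and signatures come from the API description.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example core: the "job" reverses the input bytes. */

int32_t worker_init(int32_t version) {
    /* This sketch only understands API version 1. */
    return version == 1 ? 0 : 1;
}

int32_t worker_run_job(uint64_t size, void *data,
                       uint64_t *outsize, void **outdata) {
    /* Start from "no output" so every error path leaves valid values. */
    *outsize = 0;
    *outdata = NULL;
    if (size == 0) return 0;                       /* empty job, empty result */
    if (size != (uint64_t)(size_t)size) return 1;  /* does not fit in size_t */

    uint8_t *result = malloc((size_t)size);
    if (result == NULL) return 1;

    const uint8_t *input = data;
    for (uint64_t i = 0; i < size; i++)
        result[i] = input[size - 1 - i];

    *outsize = size;
    *outdata = result;
    return 0;
}

void worker_free_outdata(uint64_t size, void *outdata) {
    (void)size;     /* not needed when the buffer came from malloc */
    free(outdata);  /* free(NULL) is a no-op */
}
```

With GCC or Clang, something like `cc -shared -fPIC -o core.so core.c` should
produce a loadable library. C symbols are not mangled; a core written in C++
would need to declare these functions `extern "C"`.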

## Worker socket protocol version 1

All integers in the below description are little-endian.

Common data types used in the message descriptions below:
- **String**/**Blob**: 8-byte unsigned integer indicating the length of
  the data, then that many bytes making up the string or blob. A string is
  valid UTF-8, while a blob can contain arbitrary data.
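
The string/blob encoding could be written roughly as follows. This is a sketch
only; the helper names (`put_u64_le`, `get_u64_le`, `encode_blob`) are invented
for this example. Writing the length byte by byte keeps the output
little-endian regardless of the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as 8 little-endian bytes, independent of host byte order. */
static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Read 8 little-endian bytes back into an integer. */
static uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Encode a string or blob as an 8-byte length followed by the bytes.
 * Returns the number of bytes written, or 0 if dst (capacity cap) is
 * too small. */
static size_t encode_blob(uint8_t *dst, size_t cap,
                          const void *data, uint64_t len) {
    if (cap < 8 || len > cap - 8) return 0;
    put_u64_le(dst, len);
    if (len > 0) memcpy(dst + 8, data, (size_t)len);
    return 8 + (size_t)len;
}
```

For example, the two-byte string `hi` encodes to ten bytes: the length `02 00
00 00 00 00 00 00` followed by `68 69`.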

A **message from controller to worker** has the following format:
- Message type [1 byte]
- ID [8 bytes]
- Payload length [8-byte unsigned integer]
- Payload [variable length and contents]

A **response from worker to controller** has the following format:
- Response type [1 byte]
- ID of message replied to [8 bytes]
- Payload length [8-byte unsigned integer]
- Payload [variable length and contents]
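
Messages and responses share the same 17-byte header layout, so one pair of
helpers can handle both. The sketch below assumes the ID is treated as an
8-byte unsigned little-endian integer, which the description above does not
state explicitly; the struct and function names are hypothetical.

```c
#include <stdint.h>

enum { HEADER_SIZE = 1 + 8 + 8 };  /* type + ID + payload length */

struct header {
    uint8_t  type;         /* message type or response type */
    uint64_t id;           /* assumption: ID handled as an integer */
    uint64_t payload_len;  /* number of payload bytes that follow */
};

static void put_u64_le(uint8_t *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Serialise a header into exactly HEADER_SIZE bytes. */
static void encode_header(uint8_t out[HEADER_SIZE], const struct header *h) {
    out[0] = h->type;
    put_u64_le(out + 1, h->id);
    put_u64_le(out + 9, h->payload_len);
}

/* Parse HEADER_SIZE bytes received from the socket. */
static struct header decode_header(const uint8_t in[HEADER_SIZE]) {
    struct header h;
    h.type = in[0];
    h.id = get_u64_le(in + 1);
    h.payload_len = get_u64_le(in + 9);
    return h;
}
```

A receiver would read `HEADER_SIZE` bytes first, decode them, and then read
exactly `payload_len` more bytes before interpreting the payload.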

The possible **response types** are the following:
- `0x01`: Successful response to a message, as described in the list of
  message types below.
- `0xff`: An error response; something went wrong. The entire payload is a
  UTF-8 error message.

The possible **message types** are the following:

- `0x01`: Version exchange
  - Payload: 4-byte unsigned integer, the protocol version of the
    controller. In this version, this is 1.
  - Successful response: 1 byte, 1 if the version is accepted by the worker, 0
    if not. If the version is not accepted, both sides close the connection.

- `0x02`: New compute core
  - Payload: A string giving the name of the compute core, then a blob giving
    the contents of a dynamic library file that can be loaded at runtime, e.g.
    a `.so` file. This library will be loaded as the compute core for the
    worker.
  - Successful response: Empty.

- `0x03`: New job
  - Payload: An 8-byte unsigned integer giving the ID of the job, then a blob
    giving the input data for the compute core.
  - Successful response: A 4-byte signed integer giving the exit code of the
    job as returned by the compute core, then a blob giving the output data.
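
On the controller side, the payload of a successful "New job" response could
be unpacked roughly as follows. This is a sketch: the struct and function
names are hypothetical, and rejecting payloads with bytes left over after the
blob is a strictness choice made for this example, not something the protocol
description requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct job_result {
    int32_t        exit_code;  /* as returned by the compute core */
    uint64_t       out_len;    /* length of the output blob */
    const uint8_t *out;        /* points into the payload buffer */
};

static uint32_t get_u32_le(const uint8_t *src) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)src[i] << (8 * i);
    return v;
}

static uint64_t get_u64_le(const uint8_t *src) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Parse "4-byte signed exit code, then blob" from a response payload.
 * Returns 0 on success, -1 if the payload is malformed. */
static int parse_job_response(const uint8_t *payload, uint64_t len,
                              struct job_result *r) {
    if (len < 4 + 8) return -1;                 /* exit code + blob length */
    uint64_t out_len = get_u64_le(payload + 4);
    if (out_len != len - 12) return -1;         /* blob must fill the rest */

    uint32_t raw = get_u32_le(payload);
    memcpy(&r->exit_code, &raw, sizeof raw);    /* reinterpret as signed */
    r->out_len = out_len;
    r->out = payload + 12;
    return 0;
}
```

The `out` pointer refers to the caller's payload buffer, so the result is only
valid for as long as that buffer is.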


[1]: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-destructor-function-attribute