High performance lib

Hello All,

I need to write a block file store library for work. It has to be fast, because I need to saturate a 10 GbE link. It basically just stores file blocks, along with their SHA-256 hashes, in an SQLite3 database. I also need to write a server application to serve files from the block store. I started writing this in C, but there isn’t really any heavy CPU work other than SHA-256 and LZ4 compression (which I assume would be calls into C libraries via CFFI anyway). I’m thinking about using Lisp for this project, because the requirements aren’t completely nailed down and Lisp is great in situations like this. I have never really tried to write anything performance-critical in Lisp. This is a rare situation where I can use any language I wish. Could someone give me some advice on whether it would be possible to get the performance I need?
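For concreteness, here is a minimal sketch of the data model described above. It is written in Python only because Python's standard library includes sqlite3 and SHA-256, and it says nothing about Lisp or C performance. Assumptions: the table layout, the 64 KiB block size and the function names are hypothetical, and zlib stands in for LZ4, which is not in Python's standard library.

```python
import hashlib
import sqlite3
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical block size; tune it by experiment


def open_store(path=":memory:"):
    """Open (or create) a store whose blocks are keyed by their SHA-256 digest."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        " hash BLOB PRIMARY KEY,"  # SHA-256 of the uncompressed block
        " data BLOB NOT NULL)"     # compressed block contents (zlib standing in for LZ4)
    )
    return db


def put_file(db, payload):
    """Split payload into blocks, store each distinct block once, return the block hashes."""
    hashes = []
    with db:  # a single transaction for the whole file
        for off in range(0, len(payload), BLOCK_SIZE):
            block = payload[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # INSERT OR IGNORE skips blocks that are already stored (deduplication).
            db.execute("INSERT OR IGNORE INTO blocks (hash, data) VALUES (?, ?)",
                       (digest, zlib.compress(block)))
            hashes.append(digest)
    return hashes


def get_file(db, hashes):
    """Reassemble a file from its block hashes, re-verifying every block."""
    parts = []
    for digest in hashes:
        row = db.execute("SELECT data FROM blocks WHERE hash = ?", (digest,)).fetchone()
        if row is None:
            raise KeyError(digest.hex())
        block = zlib.decompress(row[0])
        if hashlib.sha256(block).digest() != digest:
            raise ValueError("corrupt block " + digest.hex())
        parts.append(block)
    return b"".join(parts)
```

In this layout a file is just its ordered list of block hashes, and identical blocks are stored only once.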

I was considering actor-based concurrency for the file server. Does anyone know whether someone has written a scheduler that maps green threads onto native threads (as Erlang does)? I’ve looked around, but I haven’t found anything yet.
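The idea of mapping many lightweight actors onto a few native threads can be sketched as follows. This is a hypothetical Python illustration of the scheduling pattern. It is not an existing Lisp library and not how Erlang's BEAM works internally. Each actor has a mailbox and is handed to a shared thread pool only when it has mail and is not already scheduled. In CPython the GIL limits CPU parallelism, so treat this as a structural sketch only.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """A lightweight actor: a mailbox plus a handler, run on a shared thread pool.

    Many actors share a few native threads (an M:N mapping). An actor is submitted
    to the pool only when it has mail and is not already scheduled, so its
    messages are handled one at a time, in arrival order.
    """

    def __init__(self, pool, handler, batch=64):
        self._pool = pool
        self._handler = handler
        self._batch = batch  # max messages per turn, so one busy actor cannot hog a thread
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False

    def send(self, msg):
        with self._lock:
            self._mailbox.append(msg)
            if self._scheduled:
                return  # already queued or running; it will pick up this message
            self._scheduled = True
        self._pool.submit(self._run)

    def _run(self):
        # Error handling is omitted: if the handler raises, this actor stalls.
        for _ in range(self._batch):
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                msg = self._mailbox.popleft()
            self._handler(msg)
        # Batch used up: give the thread back and requeue behind other actors.
        self._pool.submit(self._run)
```

A pool created with, say, `ThreadPoolExecutor(max_workers=4)` can then host thousands of `Actor` instances. The batch limit is the cooperative equivalent of a time slice.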

Thanks,

-G

RE: High performance lib

Hi Jerome,

I am aware of bordeaux-threads, but I never looked at it in depth. I will definitely take a look. I appreciate the pointer.

Thanks,

-G

From: Jerome Chan <eviltofu@mac.com>
Sent: Saturday, May 23, 2020 10:00 PM
To: Gerry Weaver <gerryw@compvia.com>
Subject: Re: High performance lib

Have you taken a look at https://github.com/sionescu/bordeaux-threads ? There is a LispWorks implementation. I haven’t been able to try it in LispWorks, as the Personal Edition does not allow me to load it via Quicklisp.

On 24 May 2020, at 10:23, Gerry Weaver <gerryw@compvia.com> wrote:

Re: High performance lib

Hi Gerry,

I have written a lot of high-performance code in Lisp, mostly for signal and image processing. And while working on the open-source Emotiq / Stegos effort we used the Actors system, which I happen to think is a terrific way to go.

For file I/O I have used memory-mapped files for very high persistent-store performance. LZ4 doesn’t need FFI; it can be implemented quite easily in high-level Lisp. For SHA-256 there is the Ironclad library, also written in high-level Lisp. Going to C might buy you a bit of speed, but I doubt that will be the bottleneck in your code.
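A minimal sketch of the memory-mapped approach, hashing a file block by block straight out of the mapping. This is Python rather than the Lisp code mentioned above, and the block size and function name are assumptions.

```python
import hashlib
import mmap
import os

BLOCK_SIZE = 64 * 1024  # hypothetical block size


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return the SHA-256 digest of each fixed-size block of a file, read via mmap."""
    size = os.path.getsize(path)
    if size == 0:
        return []  # an empty file cannot be memory-mapped
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m, \
            memoryview(m) as view:
        # Slicing a memoryview does not copy, so each block is hashed in place.
        return [hashlib.sha256(view[off:off + block_size]).digest()
                for off in range(0, size, block_size)]
```

The nested `with` releases the memoryview before the mapping is closed, which Python requires.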

Instead, I think your biggest challenge will be the 10 Gb/s network connection. Networking in Lisp is still a bit of a mash-up…
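To put a number on the 10 Gb/s target, a back-of-the-envelope budget helps. The 64 KiB block size is an assumption, and the calculation ignores Ethernet, IP and TCP framing overhead, so the real payload rate is somewhat lower.

```python
def link_budget(link_gbps=10.0, block_bytes=64 * 1024):
    """Rough throughput budget for keeping a link saturated (ignores protocol overhead).

    Returns (bytes_per_second, blocks_per_second, microseconds_per_block).
    """
    bytes_per_sec = link_gbps * 1e9 / 8          # bits -> bytes
    blocks_per_sec = bytes_per_sec / block_bytes
    us_per_block = 1e6 / blocks_per_sec
    return bytes_per_sec, blocks_per_sec, us_per_block
```

With the defaults this gives 1.25 GB/s, about 19,000 blocks per second, or roughly 52 µs per block when blocks are handled one after another. That budget covers the SQLite lookup, decompression, hashing and the send.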

- DM

On May 23, 2020, at 7:21 PM, Gerry Weaver <gerryw@compvia.com> wrote:

Re: High performance lib

Hi Gerry

Start with simple experiments, then characterize them.

Answer these questions:

- What is your biggest unknown?

- What is the cheapest experiment you can do to better understand that unknown?

- How can you characterize that experiment and its results?

Then, repeat.

For example, I think your biggest unknown is speed.

Speed of what?

Simple experiment - how fast can you pull characters out of SQLite?
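One way to run this experiment, sketched in Python. The table name `t`, the row count and the blob size are arbitrary assumptions. Point `path` only at a scratch database, because the function drops and recreates `t`.

```python
import os
import sqlite3
import time


def sqlite_read_throughput(n_rows=500, row_bytes=64 * 1024, path=":memory:"):
    """Insert n_rows random blobs, then time one full scan.

    Returns (bytes_read, bytes_per_second). With the default in-memory database
    this measures SQLite and driver overhead only; a file path exercises the
    file-backed pager, though the OS page cache may still hide disk latency.
    """
    db = sqlite3.connect(path)
    db.execute("DROP TABLE IF EXISTS t")  # hypothetical scratch table
    db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data BLOB)")
    with db:
        db.executemany("INSERT INTO t (data) VALUES (?)",
                       ((os.urandom(row_bytes),) for _ in range(n_rows)))
    start = time.perf_counter()
    total = 0
    for (data,) in db.execute("SELECT data FROM t"):
        total += len(data)
    elapsed = time.perf_counter() - start
    db.close()
    return total, total / elapsed
```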

Simple experiment - how fast can you push characters down the throat of the link?

Simple experiment(s) - increase the buffer size from single characters to ??? (1K, 100K, 1M, 1G)
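A sketch of the buffer-size sweep for the "push characters down the link" experiments. It uses a local socket pair, so it measures the per-call cost of the send path rather than the 10 GbE NIC itself. The function name and defaults are assumptions.

```python
import socket
import threading
import time


def socket_throughput(buf_size, total=16 * 1024 * 1024):
    """Push `total` bytes through a local socket pair in `buf_size`-byte writes.

    Returns bytes per second. Repeat with different buf_size values to see
    where per-call overhead stops dominating.
    """
    a, b = socket.socketpair()
    payload = memoryview(b"\0" * buf_size)

    def drain():
        # Reader thread: consume everything so the sender never blocks forever.
        remaining = total
        while remaining:
            chunk = b.recv(min(remaining, 1 << 20))
            if not chunk:
                break
            remaining -= len(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    start = time.perf_counter()
    sent = 0
    while sent < total:
        n = min(buf_size, total - sent)
        a.sendall(payload[:n])  # slicing a memoryview does not copy
        sent += n
    reader.join()
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return total / elapsed
```

Calling it with, say, 1 KiB, 100 KiB and 1 MiB buffers and comparing the results gives a first characterization of the send path.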

Simple experiment - which operating systems / languages / products let you get at SMP?

Simple question - will you need MMAP? What will make this less painful?

Simple characterization (for speed-related unknowns): profiling.
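A sketch of that characterization step using Python's built-in profiler. `process_blocks` is a hypothetical stand-in workload (hash plus compress). In Lisp you would use your implementation's own profiler instead.

```python
import cProfile
import hashlib
import io
import pstats
import zlib


def process_blocks(blocks):
    """Hypothetical stand-in workload: hash and compress each block."""
    out = []
    for block in blocks:
        out.append((hashlib.sha256(block).digest(), zlib.compress(block)))
    return out


def profile_report(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()
```

For example, `print(profile_report(process_blocks, [bytes(64 * 1024)] * 200))` shows whether hashing or compression dominates for that input.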

Simple question - build vs. buy. What can you buy in (from, say, github) that will help you? What do you need to control (i.e. build yourself, instead of githubbing)?

My gut says that using C instead of Lisp won't buy you nearly as much as getting rid of the operating system (and SQLite?).

When I needed bare-metal speeds, I wrote device drivers. I used Harel Statecharts to avoid spaghetti code.[1]

Simple question - can you find any Ethernet stacks written on top of any operating system? Are such stacks written only at the device-driver level? How does your problem compare to Ethernet speeds?

Green threads are closure wannabes. (Greenspun's 10th Rule).

Red threads are green-thread wannabes, with the added complication of Time-Sharing (full preemption). Time-sharing is the reason that multi-tasking has a bad name. AFAIK, Erlang is based on the Time-sharing paradigm (using call-counting, i.e. "reductions", instead of timer interrupts).
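To make the "green threads as suspended computations" point concrete, here is a small round-robin scheduler over Python generators. A task is preempted after a fixed number of steps (`budget`), loosely in the spirit of reduction counting rather than timer interrupts. The names and the budget value are illustrative assumptions, not Erlang's actual mechanism.

```python
from collections import deque


def green_scheduler(tasks, budget=3):
    """Round-robin scheduler for generator-based green threads.

    Each `yield` counts as one step ("reduction"); a task is preempted after
    `budget` steps and moved to the back of the run queue.
    Returns the order in which steps ran, as (task_name, value) pairs.
    """
    run_queue = deque(tasks.items())
    trace = []
    while run_queue:
        name, gen = run_queue.popleft()
        for _ in range(budget):
            try:
                trace.append((name, next(gen)))
            except StopIteration:
                break  # task finished; do not requeue it
        else:
            run_queue.append((name, gen))  # budget used up: preempt and requeue
    return trace


def counter(n):
    """A toy task: each loop iteration is one step."""
    for i in range(n):
        yield i
```

For example, `green_scheduler({"a": counter(4), "b": counter(2)}, budget=3)` returns `[("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1), ("a", 3)]`: task `a` is preempted after three steps, `b` runs to completion, then `a` finishes.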

Simple question - can you get away with using Bordeaux Threads? It's just an abstraction, a union of what is available in various implementations. Create an experiment. Can you get away with using any Actors libraries?

Other places to look: Doug Hoyte's "Anti-Web", a web server that doesn't use O/S threads, AFAIK.
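For comparison with thread-per-connection designs, here is the general shape of a server that multiplexes all connections on one OS thread with an event loop. It is a minimal Python echo-server sketch using the standard `selectors` module, not Anti-Web's code. `max_events` is a hypothetical knob that lets the loop stop in a test.

```python
import selectors
import socket


def serve_echo(listener, max_events=None):
    """Single-threaded echo server: one event loop multiplexes all connections.

    Stops after `max_events` readiness events (None means run forever).
    """
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    handled = 0
    while max_events is None or handled < max_events:
        for key, _ in sel.select():
            handled += 1
            if key.data is None:  # listener is readable: a new connection
                conn, _ = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="client")
            else:  # a client socket is readable: data or EOF
                conn = key.fileobj
                chunk = conn.recv(65536)
                if chunk:
                    # Simplification: a real server would queue unsent bytes and
                    # wait for EVENT_WRITE, since sendall on a non-blocking socket
                    # can raise BlockingIOError when the send buffer is full.
                    conn.sendall(chunk)
                else:
                    sel.unregister(conn)
                    conn.close()
    sel.close()
```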

pt

[1] Threads/Processes are state machines. State is squirreled away on the stacks. Separate stacks for processes are overhead. O/S'es hide this overhead from you. Separate stacks are OO. OO is a skin on closures. OO targets data only, and screws up control flow. The worst problem in Software is Design Intent and Structure (aka "Architecture"). Speed problems have tools (profilers). Spaghetti Architectures do not have tools, AFAIK. HLLs are the best that we have. And self-discipline.

Re: High performance lib

Hi Paul,

Great answer. I haven't tried this personally, but I have read that a lot of the overhead is at the OS/user-process boundary, especially in networking (copying data between kernel space and user space), and that one option is a “unikernel” solution. I wonder if anyone here has experience running a LispWorks application as a unikernel.

The networking guys told me that the difference between the unikernel and the standard solution (a few years back) was quite dramatic in their use case: going from 20 MB a second to 400 MB a second, just by eliminating the kernel/user-space copy in the unikernel solution.
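On a conventional kernel, one widely used way to cut the kernel/user-space copy when serving files is `sendfile`, which lets the kernel move file pages to the socket directly. This is a different technique from a unikernel. Below it is shown as a Python sketch next to the copying loop for comparison. The function names and the 64 KiB buffer are assumptions.

```python
def send_file_zero_copy(path, sock):
    """Send a file with socket.sendfile(), which uses os.sendfile() where available
    so the kernel moves file pages to the socket without a user-space copy.
    (Python falls back to plain send() where os.sendfile() is unavailable.)"""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def send_file_copying(path, sock, buf_size=64 * 1024):
    """Baseline: read each chunk into user space, then copy it back to the kernel."""
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(buf_size):
            sock.sendall(chunk)
            sent += len(chunk)
    return sent
```

Timing both against the same file and socket is a direct way to see how much the extra copy costs on a given machine.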

Ladislav Koščo

+421-949-49-36-36

On 24 May 2020, at 12:19, paul tarvydas <paultarvydas@gmail.com> wrote:

Re: High performance lib

… yep, that sounds right. One of my old profs developed something called the X Kernel (not X Windows) that explicitly avoided buffer copying for high-performance networking.
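The same principle of avoiding extra buffer copies applies inside an application too. Here is a minimal Python sketch: `recv_into` fills one preallocated buffer in place, so the only copy is the kernel-to-user one. The function name is hypothetical, and this illustrates the principle only, not how the X Kernel worked.

```python
import socket


def recv_exact_into(sock, buf):
    """Fill a preallocated writable buffer (e.g. a bytearray) from a socket.

    recv_into() writes straight into `buf` through a memoryview window, so no
    intermediate bytes objects are created; recv() plus concatenation would
    allocate and copy every chunk again.
    """
    view = memoryview(buf)
    filled = 0
    while filled < len(buf):
        n = sock.recv_into(view[filled:])
        if n == 0:
            raise ConnectionError("peer closed before the buffer was filled")
        filled += n
    return filled
```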

- DM

On May 24, 2020, at 10:49 PM, laci.kosco@gmail.com wrote:
