Modular design #15

Closed
foxcpp opened this issue Mar 10, 2019 · 19 comments

Comments

@foxcpp
Owner

foxcpp commented Mar 10, 2019

#15 (comment)

--
Original post:

Let's say I configured IMAP endpoint as follows (where first line creates IMAP backend):

imap://127.0.0.1:1993 {
    sql sqlite3 maddy.db
}

Then I want to have SMTP endpoint that will deliver mail to same storage. How would I do this?

  1. Assuming that backend also provides implementation of SMTP upstream (perhaps as separate object).
smtp://127.0.0.1:1025 {
	sql sqlite3 maddy.db
}

This approach creates another set of problems, because it now requires two separate backend/upstream objects to coordinate access to the same storage (think of IMAP unilateral updates).
Global variables? External IPC sockets? All of these seem like dirty solutions.

What if we can create one "storage" object and associate it with multiple IMAP/SMTP endpoints?
This will transform "another set of problems" into just serialization of access to storage object. Which is easily solved by throwing some mutexes into it (or even without them, I haven't tested that but go-sqlmail backend object should be safe for concurrent use by multiple goroutines).

It also reduces resource usage (we will have only one SQLite "connection" page cache, for example).
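
A minimal sketch of the shared-object idea, assuming hypothetical names (`Storage`, `smtpEndpoint`, `imapEndpoint`); none of them are go-sqlmail or maddy APIs. Both endpoints hold a pointer to the same object and a mutex serializes access:

```go
package main

import (
	"fmt"
	"sync"
)

// Storage is a hypothetical shared storage object (not the go-sqlmail API).
// One instance is handed to every endpoint that needs it.
type Storage struct {
	mu       sync.Mutex
	messages map[string][]string // mailbox name -> raw messages
}

func NewStorage() *Storage {
	return &Storage{messages: make(map[string][]string)}
}

// Append stores a message; the mutex serializes access from all endpoints.
func (s *Storage) Append(mailbox, msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.messages[mailbox] = append(s.messages[mailbox], msg)
}

// Count reports how many messages a mailbox holds.
func (s *Storage) Count(mailbox string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.messages[mailbox])
}

// Both endpoint types point at the same Storage.
type smtpEndpoint struct{ store *Storage }
type imapEndpoint struct{ store *Storage }

func (e smtpEndpoint) Deliver(rcpt, msg string)  { e.store.Append(rcpt, msg) }
func (e imapEndpoint) Status(mailbox string) int { return e.store.Count(mailbox) }

func main() {
	store := NewStorage()
	smtp := smtpEndpoint{store: store}
	imap := imapEndpoint{store: store}

	// Ten concurrent deliveries, as if from ten SMTP sessions.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			smtp.Deliver("alice", fmt.Sprintf("message %d", n))
		}(i)
	}
	wg.Wait()
	fmt.Println(imap.Status("alice")) // 10
}
```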

Now I can imagine something like this:

backend sql arbitrary_name {
  driver sqlite3
  dsn maddy.db
}

imap://127.0.0.1:1993 {
  backend arbitrary_name 
  # of course this requires storage object to implement go-imap's Backend interface
}

smtp://127.0.0.1:1025 {
  backend arbitrary_name 
  # and also go-smtp's Backend here now.
}

What do you think?

@emersion
Collaborator

emersion commented Mar 10, 2019

Yeah, +1 from me. I wonder if we need some kind of Storage interface in maddy. This may make things like unilateral updates easier to handle.

@emersion
Collaborator

Also we need to abstract away authentication

@foxcpp
Owner Author

foxcpp commented Mar 11, 2019

Continuing idea of separation of backends from endpoints...

Module-based maddy design

Module concept

Each interface required by maddy for operation is provided by some object called "module".
This includes authentication, storage backends, DKIM, email filters, etc.
Each module may serve multiple functions. For example, go-sqlmail module could implement IMAP backend/storage, delivery to IMAP mailboxes (thus SMTP backend) and authentication. In order to use module you need to first create instance of it (read on).

Each module gets its own unique name (sqlmail for go-sqlmail, proxy for proxy module, local for local delivery perhaps, etc). Each module instance also gets its own (unique too) name which is used to refer to it in configuration. Both module and instance names are allowed to be any strings allowed in Caddyfiles without escaping (???, I don't know much about Caddyfile format, correct me here).

Endpoint listeners are modules too, they just don't implement any interface and just start listening on address from instance name after initialization.

Here is the most minimal interface for any module:

type Module interface {
  // Unique module name. Used in configuration and in logs.
  Name() string

  // Returns module version. May be printed to log and probably exposed to clients using extensions like IMAP ID.
  Version() string
}

type NewModule func(instName string, cfg caddyfile.???) error

And here is generic syntax for configuration:

module-name instance-name {
  module-configuration
}

Each block of this form creates new instance of module. Failure in module initialization (error returned by NewModule) is a fatal error and maddy will terminate after it.
Modules can refer to each other using application-global index. For example, when you specify auth. provider to use in endpoint's block by instance name, endpoint module can get instance object using this name.
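
A rough sketch of how such an application-global index could look. The names `Factory`, `Register`, `Instantiate` and `Lookup` are made up for illustration, and the config is simplified to a string map because the real config type is still undecided above:

```go
package main

import (
	"errors"
	"fmt"
)

// Module mirrors the minimal interface proposed above.
type Module interface {
	Name() string
	Version() string
}

// Factory is a hypothetical constructor type; cfg is a stand-in for the
// still-undecided configuration type.
type Factory func(instName string, cfg map[string]string) (Module, error)

var (
	factories = map[string]Factory{} // module name -> constructor
	instances = map[string]Module{}  // instance name -> live instance
)

func Register(modName string, f Factory) { factories[modName] = f }

// Instantiate handles one "module-name instance-name { ... }" block.
func Instantiate(modName, instName string, cfg map[string]string) error {
	f, ok := factories[modName]
	if !ok {
		return fmt.Errorf("unknown module %q", modName)
	}
	if _, dup := instances[instName]; dup {
		return fmt.Errorf("instance name %q is already in use", instName)
	}
	inst, err := f(instName, cfg)
	if err != nil {
		return err // the caller treats this as fatal, per the proposal
	}
	instances[instName] = inst
	return nil
}

// Lookup is what an endpoint would call to resolve e.g. "auth sqlstorage".
func Lookup(instName string) (Module, error) {
	inst, ok := instances[instName]
	if !ok {
		return nil, errors.New("no such instance: " + instName)
	}
	return inst, nil
}

// dummy is a placeholder module used only by this example.
type dummy struct{ inst string }

func (dummy) Name() string    { return "dummy" }
func (dummy) Version() string { return "0.0" }

func main() {
	Register("dummy", func(inst string, _ map[string]string) (Module, error) {
		return dummy{inst: inst}, nil
	})
	if err := Instantiate("dummy", "sqlstorage", nil); err != nil {
		fmt.Println("fatal:", err)
		return
	}
	m, _ := Lookup("sqlstorage")
	fmt.Println(m.Name(), m.Version())
}
```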

'storage' interface

When you are storing information about email in the modern world, you definitely store it together with IMAP meta-data and in an IMAP-friendly format. That's it. IMAP defines the email storage structure. There is very little room for freedom. For this very reason we make the terms "IMAP backend" and "storage" mean the same idea: a place where we can put emails and read them later. In this case, a proxy is a kind of storage too: it stores email on a different server. To avoid confusion we will use only the term "storage" in the future.

Basically, the go-imap/backend.Backend interface with authentication removed. There is just GetUser(username string) backend.User instead.

IMAP extensions that modify/extend storage behavior or require knowledge of its state are handled here. These extensions are enabled on endpoint-level only if they are supported by 'storage' implementation.

'auth' interface

type AuthProvider interface {
  CheckPlain(username, password string) bool
}
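
For illustration, a minimal sketch of something satisfying this interface. `staticAuth` is a made-up in-memory provider, not a proposed module; a real one would compare password hashes rather than plaintext:

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

type AuthProvider interface {
	CheckPlain(username, password string) bool
}

// staticAuth is a hypothetical in-memory provider for illustration only;
// a real module would store password hashes, not plaintext.
type staticAuth map[string]string

func (a staticAuth) CheckPlain(username, password string) bool {
	want, ok := a[username]
	if !ok {
		return false
	}
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(want), []byte(password)) == 1
}

func main() {
	var auth AuthProvider = staticAuth{"alice": "s3cret"}
	fmt.Println(auth.CheckPlain("alice", "s3cret")) // true
	fmt.Println(auth.CheckPlain("alice", "wrong"))  // false
	fmt.Println(auth.CheckPlain("bob", "s3cret"))   // false
}
```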

'filter' interface

Used in SMTP pipeline to mutate or drop messages during processing.

type Filter interface {
  // Allowed to change body.
  // opts are optional "context values" set in configuration, can be used to tweak
  // filter behavior on per-message basis.
  // If it returns a non-nil error, the message is dropped.
  Apply(ctx *DeliveryContext, body *bytes.Buffer) error
}

Here DeliveryContext is a structure that contains basic information about SMTP client, SMTP envelope information (FROM, RCPT), opts set in configuration and arbitrary values that may be set by other filters.
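
A sketch of what this could look like in practice. The `DeliveryContext` field names are guesses based on the description above, and `stampFilter` and `runFilter` are made-up helpers for the example:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// DeliveryContext follows the description above; field names are assumptions.
type DeliveryContext struct {
	SrcIP       string                 // IP of the connected client
	SrcHostname string                 // name given in EHLO/HELO
	From        string                 // envelope sender
	To          []string               // envelope recipients
	Opts        map[string]string      // opts from the configuration line
	Values      map[string]interface{} // arbitrary values set by filters
}

type Filter interface {
	Apply(ctx *DeliveryContext, body *bytes.Buffer) error
}

// stampFilter is a hypothetical filter: it rejects empty messages and
// prepends a header to everything else.
type stampFilter struct{}

func (stampFilter) Apply(ctx *DeliveryContext, body *bytes.Buffer) error {
	if body.Len() == 0 {
		return errors.New("empty message") // non-nil error: message is dropped
	}
	header := "X-Seen-From: " + ctx.SrcHostname + "\r\n"
	stamped := append([]byte(header), body.Bytes()...)
	body.Reset()
	body.Write(stamped)
	if ctx.Values == nil {
		ctx.Values = make(map[string]interface{})
	}
	ctx.Values["stamped"] = true // visible to later filters
	return nil
}

// runFilter applies one filter to a message given as a string.
func runFilter(f Filter, ctx *DeliveryContext, msg string) (string, error) {
	body := bytes.NewBufferString(msg)
	if err := f.Apply(ctx, body); err != nil {
		return "", err
	}
	return body.String(), nil
}

func main() {
	ctx := &DeliveryContext{SrcHostname: "mx.example.org", From: "bob@example.org"}
	out, err := runFilter(stampFilter{}, ctx, "Subject: hi\r\n\r\nhello\r\n")
	if err != nil {
		fmt.Println("dropped:", err)
		return
	}
	fmt.Print(out)
}
```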

'delivery' interface

Basically the same as filter except it is not allowed to change anything.

type Delivery interface {
  Deliver(ctx DeliveryContext, body bytes.Buffer) error
}

'imap' module

Configuration options: auth - sets auth provider to use, storage - sets storage backend to use.
Etc, tls, blah-blah.

SMTP pipeline

SMTP doesn't create any restrictions on how we can process email. SMTP is basically "I give you that message and I want it to be seen by Alice and Bob, do whatever you need to get this done".

So we define "SMTP pipeline" concept here: Sequence of module instances that can transform messages how they want or probably save it somewhere or send it to a different server or all this at once. This allows users to construct infinitely complex chains to describe any logic they need.

There are several variables (like tls) and set of possible pipeline steps (described below).
auth variable sets auth provider to use. Pipeline steps are applied in order in which they are defined in config.

'filter' step applies instance_name filter to message, passing specified opts as first argument.

filter <instance_name> [opts]

'delivery' step pushes message to instance_name SMTP backend and continues processing (this is necessary to correctly support multiple recipients both local and remote).

delivery <instance_name> [opts]

Pipeline steps wrapped in match block run only if condition of match block matches.

match value-name pattern {
  other-pipeline-items
}

Adding no between match and value-name inverts the condition.

Value-name can be one of these:

  • rcpt-domain - recipient's domain
  • rcpt - recipient's email
  • from - sender email
  • src-ip - IP of connected client
  • src-hostname - FQDN as reported by connected client in EHLO/HELO

Pattern is not actually a pattern by default: the value just has to be equal to it. If you wrap it in forward slashes /like that/, it is interpreted as a regexp and a partial match is enough.
For values with multiple possible values (rcpt, rcpt-domain), only one match is required.
TBD: Is it enough for most common cases?
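
The matching rules above could be sketched roughly like this. `matchPattern` is a made-up helper name, and case-insensitive domain comparison is deliberately left out:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchPattern reports whether any of the values matches the pattern.
// A pattern wrapped in slashes is a regexp (partial match is enough);
// anything else must be exactly equal to the value.
func matchPattern(pattern string, values []string) (bool, error) {
	if len(pattern) >= 2 && strings.HasPrefix(pattern, "/") && strings.HasSuffix(pattern, "/") {
		re, err := regexp.Compile(pattern[1 : len(pattern)-1])
		if err != nil {
			return false, err
		}
		for _, v := range values {
			if re.MatchString(v) {
				return true, nil
			}
		}
		return false, nil
	}
	for _, v := range values {
		if v == pattern {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Two recipients: one match among them is enough.
	ok, _ := matchPattern("emersion.fr", []string{"example.org", "emersion.fr"})
	fmt.Println(ok) // true

	ok, _ = matchPattern(`/\.fr$/`, []string{"example.org"})
	fmt.Println(ok) // false

	// Regexp patterns only need a partial match.
	ok, _ = matchPattern("/mers/", []string{"emersion.fr"})
	fmt.Println(ok) // true
}
```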

Obviously, this stops processing:

stop

Continue processing if client is logged in anonymously or using account from auth. provider. Otherwise - send "access denied" error and stop.

require-anonymous-auth

Require successful authentication using account from set auth. provider (auth variable).

require-auth
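
To make the control flow concrete, here is a simplified sketch of a pipeline runner. All names (`msgState`, `step`, `process`, and so on) are hypothetical, filters and deliveries only record that they ran, and match is reduced to rcpt-domain equality:

```go
package main

import (
	"errors"
	"fmt"
)

// msgState is a stand-in for the message being processed.
type msgState struct {
	RcptDomain string
	AuthUser   string   // empty when the client did not authenticate
	Log        []string // records which steps ran
}

var (
	errStop         = errors.New("stop")
	errAccessDenied = errors.New("access denied")
)

// step is one pipeline item; a non-nil error ends processing.
type step func(m *msgState) error

func filterStep(name string) step {
	return func(m *msgState) error { m.Log = append(m.Log, "filter "+name); return nil }
}

func deliveryStep(name string) step {
	return func(m *msgState) error { m.Log = append(m.Log, "delivery "+name); return nil }
}

func stopStep() step { return func(*msgState) error { return errStop } }

func requireAuth() step {
	return func(m *msgState) error {
		if m.AuthUser == "" {
			return errAccessDenied
		}
		return nil
	}
}

// matchRcptDomain runs inner steps only if the condition holds;
// invert corresponds to "match no ...".
func matchRcptDomain(domain string, invert bool, inner ...step) step {
	return func(m *msgState) error {
		if (m.RcptDomain == domain) == invert {
			return nil // condition not met, skip the block
		}
		return run(m, inner)
	}
}

// run applies steps in the order they were defined.
func run(m *msgState, steps []step) error {
	for _, s := range steps {
		if err := s(m); err != nil {
			return err
		}
	}
	return nil
}

// process runs the whole pipeline; "stop" is not an error for the client.
func process(m *msgState, steps []step) error {
	err := run(m, steps)
	if errors.Is(err, errStop) {
		return nil
	}
	return err
}

func main() {
	pipeline := []step{
		filterStep("verify-hostname"),
		matchRcptDomain("example.org", false, deliveryStep("local"), stopStep()),
		matchRcptDomain("example.org", true, requireAuth(), deliveryStep("queue")),
	}
	local := &msgState{RcptDomain: "example.org"}
	fmt.Println(process(local, pipeline), local.Log)

	remote := &msgState{RcptDomain: "elsewhere.test"}
	fmt.Println(process(remote, pipeline), remote.Log)
}
```

Because stop is propagated as a sentinel error, a stop inside a match block ends the whole pipeline, not just the block.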

Config example

Here is example of complete IMAP+SMTP server configuration:
(again, I'm really not sure how much of this is allowed by Caddyfile syntax, correct me if I'm wrong)

# implements 'delivery', 'storage' and 'auth' interfaces.
sqlmail sqlstorage {
  driver sqlite3
  dsn /var/lib/maddy/maddy.db
}

# implements 'filter' interface
dkim dkim {
  public filepath
  private filepath
}

imap 0.0.0.0:993 {
  # Configuration variable
  tls auto
  
  auth sqlstorage
  storage sqlstorage
}

smtp 0.0.0.0:25 {
  # Configuration variables
  tls auto
  auth sqlstorage

  # SMTP pipeline definition.
  filter verify-hostname
  match rcpt-domain emersion.fr {
    require-anonymous-auth 
    filter dkim verify=true
    delivery sqlstorage
  }
  match no rcpt-domain emersion.fr {
    require-auth
    filter dkim sign=true
    delivery outgoing-queue
  }
}

This is nowhere near a complete proposal, just dumping some ideas for discussion. Any questions, additions or related ideas?

@foxcpp foxcpp changed the title Share storage backends between IMAP and SMTP Modular design Mar 11, 2019
@xeoncross

xeoncross commented Mar 13, 2019

I would love to see support for a flexible pipeline as not all SMTP servers deliver emails to an end client via IMAP. Here is the main use-case I have been working on:

SMTP-to-_________ Gateway. This might be an HTTP postback/webhook, a NATS message queue, or a logging system. The idea is that data is consumed from regular emails and pushed into a non-email system.

This would also allow many creative types of email delivery (IPFS storage, encryption, posting articles to a blog) in addition to queued processing for automated systems (logging, ticket creation, NLP, etc...)

If storage is flexible, and the pipeline chain-able, I can parse a streaming MIME message body in a memory safe way, encrypt the payloads, store it on S3 and further process it from another system.

@foxcpp
Owner Author

foxcpp commented Mar 13, 2019

@emersion, I would like to know your opinion on proposed design. I think I'm done with basic ideas.

Will start experimenting with implementation of ideas stated above in my maddy fork.

foxcpp added a commit to foxcpp/go-imap-sql that referenced this issue Mar 14, 2019
@foxcpp
Owner Author

foxcpp commented Mar 15, 2019

Alright, here we are hit by limitations of Caddyfile format.
We can't describe proposed SMTP pipeline configuration with it.
No nested blocks, and repeated directives require a lot of hacks to be parsed at all.

What should we do? I would really like to switch to a different config format but I think it diverges too much from emersion's ideas about Caddy-like server.

@emersion
Collaborator

The overall goal of the configuration format is to keep it as simple as possible while still allowing for more complex (not too complex) scenarios. An example would be your smtp pipeline proposal: it's already difficult to configure the basic "put it in the IMAP storage and don't bother me" setup.

I'd also like to make things secure by default: no need to configure complex pipelines to get DKIM.

A problem with that is losing customizability. Thoughts? I'll try to think of a better approach.

Your current approach looks pretty reasonable regardless. I think it'd be best to experiment with it and adjust it as needed. Here are a few more minor comments:

We can probably simplify dkim dkim { blocks to just dkim { (this is a DKIM block with an empty name).

Deliver(ctx DeliveryContext, body bytes.Buffer) error

I'd prefer to use streaming interfaces (io.Reader).


Sorry for taking so long to give feedback. While I'm pretty busy with IRL stuff right now (moving to a different country), I'd like to contribute too. Things will likely slow down in the next days/weeks.

If you want, you can join the ##emersion channel on Freenode to discuss.

@emersion
Collaborator

emersion commented Mar 16, 2019

I would really like to switch to a different config format but I think it diverges too much from emersion's ideas about Caddy-like server.

I'm fine with using a different parser btw. Caddy's is tedious to use imho.

@foxcpp
Owner Author

foxcpp commented Mar 16, 2019

The overall goal of the configuration format is to keep it as simple as possible while still allowing for more complex (not too complex) scenarios. An example would be your smtp pipeline proposal: it's already difficult to configure the basic "put it in the IMAP storage and don't bother me" setup.
I'd also like to make things secure by default: no need to configure complex pipelines to get DKIM.

I guess we can have a reasonable default pipeline configuration while still allowing the user to redefine it if they are ok with increased complexity. Also we can have a default set of backends (say, go-sqlmail with sqlite3 configured to store stuff at /var/maddy/messages.db).

smtp 0.0.0.0:25 {
  hostname emersion.fr
}

Expanding to something similar to what I showed as the full config example.

This approach preserves full flexibility while making maddy almost zero-configuration.

We can probably simplify dkim dkim { blocks to just dkim { (this is a DKIM block with an empty name).

Except that it probably should be DKIM instance with "dkim" name because instance names should be unique.

I'd prefer to use streaming interfaces (io.Reader).

Probably we can pass io.Reader and io.Writer to filters.
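
A sketch of what a streaming variant might look like, using only the standard library. `StreamFilter`, `addHeader`, `chain` and `runChain` are hypothetical names, and error handling is simplified (a real version would also close the pipe readers when a later stage fails):

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// StreamFilter is a hypothetical streaming variant of Filter.
type StreamFilter interface {
	Apply(in io.Reader, out io.Writer) error
}

// addHeader writes one header line, then copies the message through.
type addHeader struct{ line string }

func (f addHeader) Apply(in io.Reader, out io.Writer) error {
	if _, err := io.WriteString(out, f.line+"\r\n"); err != nil {
		return err
	}
	_, err := io.Copy(out, in)
	return err
}

// chain connects filters with io.Pipe so no stage buffers the whole message.
func chain(in io.Reader, out io.Writer, filters ...StreamFilter) error {
	if len(filters) == 0 {
		_, err := io.Copy(out, in)
		return err
	}
	src := in
	for _, f := range filters[:len(filters)-1] {
		pr, pw := io.Pipe()
		go func(f StreamFilter, r io.Reader) {
			// CloseWithError(nil) acts like Close: the reader sees EOF.
			pw.CloseWithError(f.Apply(r, pw))
		}(f, src)
		src = pr
	}
	// The last filter runs in the calling goroutine and writes to out.
	return filters[len(filters)-1].Apply(src, out)
}

// runChain is a small helper for trying the chain on a string.
func runChain(msg string, filters ...StreamFilter) (string, error) {
	var out strings.Builder
	err := chain(strings.NewReader(msg), &out, filters...)
	return out.String(), err
}

func main() {
	out, err := runChain("Subject: hi\r\n\r\nhello\r\n",
		addHeader{line: "X-First: 1"},
		addHeader{line: "X-Second: 2"},
	)
	fmt.Printf("%q %v\n", out, err)
}
```

Each filter prepends to whatever it reads, so the header added by the last filter ends up on top.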

@foxcpp
Owner Author

foxcpp commented Mar 16, 2019

Support for additional SASL authentication methods

Used authentication module should implement at least plaintext authentication.
Additionally it may implement additional interfaces for other authentication methods.
If auth. module implements additional interface known to maddy - it will expose corresponding auth. method to clients (AUTH=METHOD capability for IMAP, for example).
Something like that for XOAUTH2:

type OAuthAuth interface {
  CheckOAuth(username, token string) bool
}
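
Detecting such optional interfaces can be done with a plain type assertion. A small sketch, where `saslMechanisms`, `plainOnly` and `withOAuth` are made-up names for the example:

```go
package main

import "fmt"

type AuthProvider interface {
	CheckPlain(username, password string) bool
}

type OAuthAuth interface {
	CheckOAuth(username, token string) bool
}

// saslMechanisms lists the mechanisms an endpoint could advertise
// for the given auth module instance.
func saslMechanisms(p AuthProvider) []string {
	mechs := []string{"PLAIN"}
	if _, ok := p.(OAuthAuth); ok {
		mechs = append(mechs, "XOAUTH2")
	}
	return mechs
}

// plainOnly implements only the mandatory interface.
type plainOnly struct{}

func (plainOnly) CheckPlain(username, password string) bool { return false }

// withOAuth additionally implements the optional OAuthAuth interface.
type withOAuth struct{ plainOnly }

func (withOAuth) CheckOAuth(username, token string) bool { return false }

func main() {
	fmt.Println(saslMechanisms(plainOnly{})) // [PLAIN]
	fmt.Println(saslMechanisms(withOAuth{})) // [PLAIN XOAUTH2]
}
```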

Default configuration

Unless user explicitly specifies auth. module instance to use we try to use default-auth or default (in that order).
Same goes for IMAP storage backend.
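
The lookup order could be sketched as follows. `resolveInstance` is a hypothetical helper and the set of known instances is simplified to a map of names:

```go
package main

import "fmt"

// resolveInstance picks the instance an endpoint should use for one role
// (e.g. "auth" or "storage"). An explicit name from the config wins;
// otherwise "default-<role>" and then "default" are tried.
func resolveInstance(known map[string]bool, explicit, role string) (string, error) {
	candidates := []string{"default-" + role, "default"}
	if explicit != "" {
		candidates = []string{explicit}
	}
	for _, name := range candidates {
		if known[name] {
			return name, nil
		}
	}
	return "", fmt.Errorf("no usable %s instance (tried %v)", role, candidates)
}

func main() {
	known := map[string]bool{"default": true, "default-auth": true}
	fmt.Println(resolveInstance(known, "", "auth"))
	fmt.Println(resolveInstance(known, "", "storage"))
	fmt.Println(resolveInstance(known, "sqlstorage", "auth"))
}
```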

SMTP pipeline

If the user didn't specify a custom pipeline (no pipeline step declarations in the server block) and specified a hostname, we use a default pipeline that does the following:

  • Verifies that FQDN, IP and rDNS of the connected client match.
  • Applies DKIM verification for incoming messages.
  • Probably does something with SPF
  • Delivers messages with domain equal to our hostname to default-storage or default.
  • Adds DKIM signatures for outgoing messages.
  • Passes messages with recipients other than local ones to message queue (also defined as a module, default-queue or default).

Default modules

Unless overridden by a user, maddy adds default module that implements authentication, email storage and delivery target (perhaps go-sqlmail? :)). So you can then literally specify hostname and have it just work.

@emersion
Collaborator

Probably does something with SPF

SPF is gross (doesn't handle relaying). Maybe we should just drop it?

We could add DMARC checks though.

@foxcpp
Owner Author

foxcpp commented Mar 16, 2019

We need a collection of use-cases to check how well our design (and more importantly -- config structure) works for them.

@xeoncross, @sapiens-sapide, any thoughts?

@foxcpp
Owner Author

foxcpp commented Mar 16, 2019

Probably does something with SPF

SPF is gross (doesn't handle relaying). Maybe we should just drop it?

We could add DMARC checks though.

Sure.

@foxcpp
Owner Author

foxcpp commented Mar 17, 2019

It is relatively easy to use caddyfile lexer to parse config into tree structure: https://hastebin.com/ovozofiweh.go

So I guess our problem with configuration format is solved.
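
For readers without access to the linked snippet, here is an independent, simplified sketch of parsing nested blocks into a tree. It does not use the caddyfile lexer and is not the linked code; it assumes no quoting, that "{" ends its line and that "}" sits alone on a line:

```go
package main

import (
	"fmt"
	"strings"
)

// Node is one directive: a name, its arguments and an optional nested block.
type Node struct {
	Name     string
	Args     []string
	Children []Node
}

// parse turns a config text into a tree of Nodes.
func parse(src string) ([]Node, error) {
	lines := strings.Split(src, "\n")
	pos := 0
	return parseBlock(lines, &pos, false)
}

// parseBlock reads directives until the input ends or, when nested,
// until the closing "}" of the current block.
func parseBlock(lines []string, pos *int, nested bool) ([]Node, error) {
	var nodes []Node
	for *pos < len(lines) {
		fields := strings.Fields(lines[*pos])
		*pos++ // *pos is now the 1-based number of the line just read
		if len(fields) == 0 || strings.HasPrefix(fields[0], "#") {
			continue // blank line or comment
		}
		if fields[0] == "}" {
			if !nested || len(fields) > 1 {
				return nil, fmt.Errorf("line %d: unexpected }", *pos)
			}
			return nodes, nil
		}
		n := Node{Name: fields[0], Args: fields[1:]}
		if last := len(n.Args) - 1; last >= 0 && n.Args[last] == "{" {
			n.Args = n.Args[:last]
			children, err := parseBlock(lines, pos, true)
			if err != nil {
				return nil, err
			}
			n.Children = children
		}
		nodes = append(nodes, n)
	}
	if nested {
		return nil, fmt.Errorf("unexpected end of input: missing }")
	}
	return nodes, nil
}

// dump prints the tree with indentation.
func dump(nodes []Node, indent string) {
	for _, n := range nodes {
		fmt.Println(indent+n.Name, n.Args)
		dump(n.Children, indent+"  ")
	}
}

func main() {
	cfg := `
smtp 0.0.0.0:25 {
  tls auto
  match rcpt-domain example.org {
    delivery local
  }
}`
	nodes, err := parse(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	dump(nodes, "")
}
```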

@sapiens-sapide

sapiens-sapide commented Mar 17, 2019

Adds DKIM signatures for outgoing messages.

This implies setting a DNS record for DKIM. I'm not sure it's good to sign outgoing messages by default if the DKIM signature can't be verified by peers because the record is missing.
At least, a message should be printed out to user showing the TXT record that should be set in DNS zone.
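
A sketch of how such a record could be produced with the Go standard library. The selector and domain are placeholders, `dkimRecord` and `demoRecord` are made-up helper names, and the key is encoded as a base64 DER SubjectPublicKeyInfo, which is how DKIM keys are commonly published:

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/base64"
	"fmt"
)

// dkimRecord formats the TXT record for an RSA public key.
// Note: a single TXT string is limited to 255 bytes, so for a 2048-bit key
// the value has to be split into several quoted strings in a real zone.
func dkimRecord(selector, domain string, pub *rsa.PublicKey) (string, error) {
	der, err := x509.MarshalPKIXPublicKey(pub)
	if err != nil {
		return "", err
	}
	p := base64.StdEncoding.EncodeToString(der)
	return fmt.Sprintf("%s._domainkey.%s. IN TXT \"v=DKIM1; k=rsa; p=%s\"",
		selector, domain, p), nil
}

// demoRecord generates a throwaway key and returns its record.
func demoRecord(selector, domain string) (string, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return "", err
	}
	return dkimRecord(selector, domain, &key.PublicKey)
}

func main() {
	rec, err := demoRecord("default", "example.org")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Add this record to your DNS zone:")
	fmt.Println(rec)
}
```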

@emersion
Collaborator

The user will need to set up a bunch of records anyway (MX, DMARC, MTA-STS, etc). I think @foxcpp's idea of an embeddable zone file could solve this issue.

@xeoncross

xeoncross commented Mar 18, 2019

The user will need to setup a bunch of records anyway (MX, DMARC, MTA-STS, etc)

Not if maddy runs a DNS server itself. I've been looking at adding a crippled DNS server to projects using different Go libraries and it seems pretty doable. Simply set the domain's DNS to point to the same box, then maddy only replies to requests for [domains here] and provides the needed TXT, MX, etc. records.

We need a collection of use-cases to check how well our design (and more importantly -- config structure) works for them.

I have little interest in a configuration file for the use-cases I mentioned above. I would like to use maddy programmatically, wiring in pipelines on a per-project basis. Then again, I see maddy not as a simple MTA/MDA/MSA, but as a powerful library to add a full SMTP/IMAP server to other projects.

The benefit is a single binary / process which also runs an HTTP server, slack bot, queue client, etc.

@foxcpp
Owner Author

foxcpp commented Mar 18, 2019

@xeoncross I guess "maddy as a library" is not going to be the top-priority use-case to support. First we want to make a "simple MTA/MDA/MSA" and only then a generic IMAP/SMTP framework.

@emersion
Collaborator

emersion commented Mar 18, 2019

If you want libraries, you can already use go-imap, go-smtp et al.

Maddy could become a DNS server, but just generating a zone file as @foxcpp suggested is probably better. We could always think again about it if there are issues with this approach.
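
A sketch of what generating such a zone fragment might look like. `zoneFragment` is a hypothetical helper, and the DMARC and MTA-STS values are placeholders that a real deployment would have to choose:

```go
package main

import (
	"fmt"
	"strings"
)

// zoneFragment returns zone-file lines the user could paste or include.
// The policy values below are placeholders, not recommendations.
func zoneFragment(domain, mxHost string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s. IN MX 10 %s.\n", domain, mxHost)
	fmt.Fprintf(&b, "_dmarc.%s. IN TXT \"v=DMARC1; p=none\"\n", domain)
	fmt.Fprintf(&b, "_mta-sts.%s. IN TXT \"v=STSv1; id=1\"\n", domain)
	return b.String()
}

func main() {
	fmt.Print(zoneFragment("example.org", "mx.example.org"))
}
```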


4 participants