diff --git a/docs/review_02.03.22/architecture.png b/docs/review_02.03.22/architecture.png new file mode 100644 index 0000000..2c488d5 Binary files /dev/null and b/docs/review_02.03.22/architecture.png differ diff --git a/docs/review_02.03.22/architecture.puml b/docs/review_02.03.22/architecture.puml new file mode 100644 index 0000000..64f89e3 --- /dev/null +++ b/docs/review_02.03.22/architecture.puml @@ -0,0 +1,49 @@ +@startuml +hide circle +hide members + +entity "Light Wallet client" as Client { + * e.g. MyMonero app, OpenMonero + html, etc. +} +show Client fields + +package "monero-lws" <> { + entity "monero-lws-daemon" as MoneroLWSDaemon { + * implements light wallet server REST API + * takes addresses and view keys and scans + for transactions in real-time + } + show MoneroLWSDaemon fields + entity "monero-lws-admin" as MoneroLWSAdmin { + * admin uses this program to perform various + administrative tasks managing the server, + such as accept or reject requests to + add an address to the server + } + show MoneroLWSAdmin fields + entity "monero-lws database" as MoneroLWSdb { + * uses LMDB, which reads/writes from/to + files on the host machine by default + } + show MoneroLWSdb fields + + MoneroLWSDaemon -up-> MoneroLWSdb + MoneroLWSAdmin -up-> MoneroLWSdb +} + +entity monerod { + * running Monero daemon +} +show monerod fields + +entity "Exchange rate API provider (optional)" as ExchangeRateProvider { + * currently cryptocompare.com + * disabled by default +} +show ExchangeRateProvider fields + +Client -right-> MoneroLWSDaemon +MoneroLWSDaemon -down-> monerod +MoneroLWSDaemon -down-> ExchangeRateProvider +@enduml diff --git a/docs/review_02.03.22/review_02.03.22.md b/docs/review_02.03.22/review_02.03.22.md new file mode 100644 index 0000000..b2e4b8c --- /dev/null +++ b/docs/review_02.03.22/review_02.03.22.md @@ -0,0 +1,546 @@ +## Overview + +This review is meant to serve as an informal audit of `monero-lws` as of +February 3, 2022. It is divided into 3 parts: + +1. How does `monero-lws` work? +2. The code +3. Suggestions + +It was put together as part of [a CCS proposal](https://ccs.getmonero.org/proposals/j-berman-3-months-full-time.html) +with the intention of getting `monero-lws` further along (and potentially +included in the [Monero project github](https://github.com/monero-project)). + +Note that this report does not contain documentation for all code in +`monero-lws`, and has to-do's for a number of folders and files. + +In my review, I did not find anything that would compromise `monero-lws` when +examining from the lens of the following threat model: + +``` +A user who is running `monerod` + `monero-lws` on a machine only the user has +access to does not leak any information about their Monero transactions to a 3rd +party through normal usage. Examples of leaks would be: + +- a backdoor that sends sensitive data from `monero-lws` out to a 3rd party +- `get_random_outs` responds with decoys that compromise the user's transactions +when stored on chain +- calls to the exchange rate API reveal information to the service provider +``` + +Disclaimer: I wouldn't call this audit definitively conclusive that the above +threat model is successfully defended, but I hope that it does inspire more +confidence in this fine piece of software, and serves as a useful document for +curious users, potential contributors, or any future auditors of `monero-lws`. +I reviewed (and tried to provide documentation for) nooks and crannies across +the code, and so at the very least can say with confidence there aren't any +obvious backdoors. + +## How does `monero-lws` work? + +Monero wallets need to scan all transactions in the blockchain to find which +ones belong to a user. `monero-lws` ("Monero light wallet server") is designed +to offload scanning for a user's transactions from the client to a server. As +stated in the [README.md](/README.md), it is an implementation of the +[Monero light wallet REST API](https://github.com/monero-project/meta/blob/master/api/lightwallet_rest.md), and is therefore compatible with e.g. a +[MyMonero frontend](https://github.com/mymonero/mymonero-app-js). A user may use +a MyMonero app (or any other compatible frontend) to connect with their own +running `monero-lws` instance, so that when the user revisits the app, the app +is immediately ready to use for spending. `monero-lws` works by taking in a +user's address and view key, and perpetually scanning for transactions received +and plausibly spent by the user. + +The primary risk to a user using `monero-lws` as their backend is that the +device where it runs is an additional target for someone malicious to compromise +a user's privacy. A user should assume that someone with access to wherever the +`monero-lws` backend is hosted will be able to determine all outputs received to +the user, and in most cases today, the outputs spent by the user as well. This +also includes **amounts** received to and spent by the user. If a `monero-lws` +instance is compromised, user funds are still SAFU since the private spend key +is never sent to `monero-lws`. Ongoing research may improve this risk profile +further, but the above should be users' default assumptions today. + +Whoever hosts a `monero-lws` instance can run the server via the +`monero-lws-daemon` as described in the [README.md](/README.md#running-monero-lws-daemon) +("Running monero-lws-daemon"). The daemon is expected to run 24/7 and +perpetually scan for transactions recevied to (and plausibly spent from) the +addresses known to the server in real-time. + +Whoever hosts a `monero-lws` instance can perform administrative tasks via the +`monero-lws-admin` program, as described in +[administration.md](/docs/administration.md). The `monero-lws-daemon` does not +need to be running in order for the admin to execute commands via +`monero-lws-admin`. + +Both `monero-lws-daemon` and `monero-lws-admin` read/write from/to an LMDB +database on the same machine hosting `monero-lws`. + +`monero-lws-daemon` can also fetch exchange rates from a 3rd party API provider +if an administrator wants. This is turned off by default. + +Here is a simplified diagram of the `monero-lws` architecture: + +
+

+
+ +In the [next section](#the-code), the code will be explained folder-by-folder, +file-by-file, and in some cases where appropriate, function-by-function. The +[final section](#areas-worth-exploring) will highlight suggestions to +potentially improve `monero-lws`. + +## The code + +### /src + +This is where the source code lives. The starter file for `monero-lws-daemon` is +[/src/server_main.cpp](/src/server_main.cpp). The starter file for +`monero-lws-admin` is [/src/admin_main.cpp](/src/admin_main.cpp). Both use +a lot of shared code across the repo. + +#### /src/server_main.cpp + +Starts in `main()` initializing logs and loading tools, then passes command line +args to a function `get_program()` to parse the args and construct the program +to run. The resulting program is then passed to `run()`. Failures are caught +gracefully. + +Passing `--help` to `monero-lws-daemon` is the one exception that does not run +a program, and instead outputs the following (note the configurable defaults): + +``` +Usage: [options] +Options: + --db-path arg (=~/.bitmonero/light_wallet_server) + Folder for LMDB files + --network arg (=main) <"main"|"stage"|"test"> - Blockchain + net type + --help Produce help message + --daemon arg (=tcp://127.0.0.1:18082) ://
: of a + monerod ZMQ RPC + --sub arg tcp://address:port or ipc://path of a + monerod ZMQ Pub + --rest-server arg (=https://0.0.0.0:8443) + [(https|http)://
:] for + incoming connections, multiple + declarations allowed + --rest-ssl-key arg to PEM formatted SSL key for + https REST server + --rest-ssl-certificate arg to PEM formatted SSL certificate + (chains supported) for https REST + server + --rest-threads arg (=1) Number of threads to process REST + connections + --scan-threads arg (=8) Maximum number of threads for account + scanning + --access-control-origin arg Specify a whitelisted HTTP control + origin domain + --confirm-external-bind Allow listening for external + connections + --create-queue-max arg (=10000) Set pending create account requests + maximum + --exchange-rate-interval arg (=0) Retrieve exchange rates in minute + intervals from cryptocompare.com if + greater than 0 + --log-level arg (=1) Log level [0-4] +``` + +Note that `scan-threads` defaults to the maxmimum number of available threads +on the host machine. + +Also note that the `rest-server` defaults to being served over https. + +Here are sample logs when you start the default program with `log-level` 4. + +``` +2022-01-11 07:56:42.713 I Using monerod ZMQ RPC at tcp://127.0.0.1:18082 +2022-01-11 07:56:42.713 I Starting blockchain sync with daemon +2022-01-11 07:56:42.715 I [PARSE URI] regex not matched for uri: ^((.*?)://)?(\[(.*)\](:(\d+))?)(.*)? +2022-01-11 07:56:42.715 I Binding on 0.0.0.0 (IPv4):8443 +2022-01-11 07:56:42.715 I Generating SSL certificate +2022-01-11 07:56:43.092 D start accept (IPv4) +2022-01-11 07:56:43.092 D Spawned connection #0 to 0.0.0.0 currently we have sockets count:1 +2022-01-11 07:56:43.092 D test, connection constructor set m_connection_type=1 +2022-01-11 07:56:43.092 I Run net_service loop( 1 threads)... +2022-01-11 07:56:43.092 D Run server thread name: NET +2022-01-11 07:56:43.092 D Reiniting OK. +2022-01-11 07:56:43.092 I Listening for REST clients at https://0.0.0.0:8443 +``` + +When `get_program()` calls `run()`, this creates the LMDB environment and +initializes all tables in the db, syncs `monero-lws` with the running `monerod` +daemon, subscribes to the monero ZMQ RPC, initializes an http client to query +for exchange rates from cryptocompare.com, initializes the REST server so that +it can start listening for requests, and starts the scanner so that it will +perpetually scan for users' transactions. + +#### /src/rest_server.cpp + /src/rest_server.h + +The REST server exposes the following 7 endpoints: + +- `/login` +- `/import_wallet_request` +- `/get_address_info` +- `/get_address_txs` +- `/get_random_outs` +- `/get_unspent_outs` +- `/submit_raw_tx` + +It handles requests via the function `handle_http_request()`, which does +standard validation such as whether the endpoint is implemented, if the request +is appropriately sized (each endpoint has sane default byte size allowances), +if the request is an HTTP POST request. Then executes the respective endpoint +passing in params included in the request, then handles any errors and responds +accordingly, or returns with a standard success response and body. + +##### Authentication + +Every endpoint that accepts an `address` and `view_key` as params +(`/login`, `/get_address_info`, `/get_address_txs`, `/get_unspent_outs`, and +`/import_wallet_request`) also enforces that the public view key included in +the `address` is derived from the private `view_key` provided in the request +(via the `key_check()` function). Additionally, addresses are indexed in the db +by the public view key. Thus, even if the public view key *is* derived from the +private view key, if the public view key is not stored in the db, then the +response will fail. This way if a malicious user provides an honest user's +public spend key, and also provides a public view key that is derived from some +random private view key, the account would not be found in the db. This +ensures that only the user with the correct `address` and `view_key` pair +can access their own data on the server through the REST API. + +##### Exchange Rates + +`/get_address_info` is the only endpoint that responds with exchange rates iff +the server has exchange rates enabled. The server serves the latest cached +exchange rates, rather than makes requests to the exchange rate API provider +every time exchange rates are included in a response. The server requests +exchange rates and caches them in intervals of `exchange-rate-interval`, which +is an arg the admin can set when starting the `monero-lws-daemon`. This caching +in set intervals strategy is a good privacy-respecting approach, otherwise the +exchange rate API provider could collect usage stats on how often a user calls +`/get_address_info`. + +##### /login + +> Check for the existence of an account or create a new one. + +The very first thing done is authentication on the provided `address` and +`view_key` as described above. On failure, it returns `bad_view_key`, which is +great. Next, it queries the db for the `account` with the given `address`. If +the account is found, it checks if the account has status `hidden`, which is +a status the admin can set via `monero-lws-admin`. If hidden, it will return +`account_not_found`. Otherwise, if it exists, it will simply return with +`create_account` set to false (account was already created), and with whatever +`generated_locally` flag was set at account creation. If the account does not +exist and the user is not attempting to `create_account`, or there was an error +reading the db, then it will return with an error. Finally, if the account does +not already exist, and the user is attempting to `create_account`, then the +server will write a "creation request" to the db, which is just a request to +create an account with that `address` and `view_key` pair (and provided flags). +The admin will need to then manually approve the creation request via +`monero-lws-admin` in order to add the address to the server. + +##### /import_wallet_request + +> Request an account scan from the genesis block. + +This first calls `open_account()`, which authenticates the `address` and +`view_key`, then retrieves the account from the db by address. Just like above, +this function returns with `account_not_found` if hidden or not found, or +returns with the account. If the account's starting scanning height +(`start_height`) is already 0, then the import request returns as fulfilled. +Otherwise, it checks for the existence of another import request. +If one already exists, then the endpoint returns saying `Waiting for Approval`. +If not, then it says `Accepted, waiting for approval`. + +##### /get_address_info + +> Returns the minimal set of information needed to calculate a wallet balance. The server cannot calculate when a spend occurs without the spend key, so a list of candidate spends is returned. + +First calls `open_account()`. Then, using the account's unique identifier in the +db, queries for all the user's received outputs and candidate spends. Then +checks the db for the latest height the server knows. Then starts constructing +height metadata for response. Then iterates over all received outputs, collects +spend metadata for each, and totals up `total_received` and `locked_funds`. +Then iterates over candidate spends, retrieves spend metadata collected in +prior step, and totals up `total_sent`. Finally, retrieves exchange rates +if the exchange rate interval is set. Then returns the constructed response. + +##### /get_address_txs + +> Returns information needed to show transaction history. The server cannot calculate when a spend occurs without the spend key, so a list of candidate spends is returned. + +First calls `open_account()`. Then queries db for received outputs, candidate +spends, and latest height data. Constructs the same height metadata as above. +Then, iterates over all received and candidates spends at one time, placing +candidate spends in an array inside each received output. + +Note there is some more work done before sending the final response over the +wire in `/src/rpc/light_wallet.cpp` that will be explained further later on. + +##### /get_random_outs + +> Selects random outputs to use in a ring signature of a new transaction. If the `amount` is `0` then the `monerod` RPC `get_output_distribution` should be used to locally select outputs using a gamma distribution as described in "An Empirical Analysis of Traceability in the Monero Blockchain". If the `amount` is not `0`, then the `monerod` RPC `get_output_histogram` should be used to locally select outputs using a triangular distribution (`uint64_t dummy_out = histogram.total * sqrt(float64(random_uint53) / float64(2^53))`). + +First, limits requests to a max mixin (ring size) of 50, or max amounts +requested (the number of different rings) to 20 to avoid overloading the server +with a single request. Sets up daemon RPC client to query for output data. Then +for all non-0 amounts, retrieives output histograms and places in an array. A +relevant query parameter to highlight here is the `recent_cutoff` when +requesting histograms; the recent cutoff was used in wallet2 to bias toward +recent outputs when selecting decoys for **pre**-RingCT outputs. However, +pre-RingCT outputs are infrequent enough today that it should no longer yield +practical privacy benefits to try and select more recent pre-RingCT outputs in +higher frequency. Thus, it is ok that `monero-lws` sets this `recent_cutoff` +param to 0, so that the query wastes no time in factoring in recency for +pre-RingCT outputs. + +After retrieving histograms for pre-RingCT outputs, retrieves the output +distribution of RingCT outputs across the whole chain (if 0 amount outputs +are requested). This distribution is later passed into a `lws:gamma_picker` +declaration, where it's used to select RingCT decoys. + +Then sets up a constructor to fetch output public keys over RPC (accepts output +id's as param). + +Final step, the function passes in all data (and fetch function) needed +to actually make decoy selections from the distribution/histograms. This +function is called `pick_random_outputs` and will be explained in greater +detail later. + +The response from `pick_random_outputs` is the final response to +`/get_random_outs`. + +Note: it should be noted here that any server that records the selected decoys +and cross-references them with the outputs included in a transaction can easily +tell which output is real (the one not included as a decoy in the response). +Thus, a user should expect that if they are using a light wallet server to +construct transactions, the server can trivially tell which outputs are +spent in the user's transactions, no heuristics needed. + +##### /get_unspent_outs + +> Returns a list of received outputs. The client must determine when the output was actually spent. + +First calls `open_account()`. Then makes an RPC request to `monerod` for the +dynamic fee estimate. A relevant query parameter to highlight here is +`num_grace_blocks`, which is the +["number of blocks we want the fee to be valid for"](https://github.com/monero-project/monero/blob/298c9a357f6e57eccf28db1f3e734eb6da080d9a/src/cryptonote_core/blockchain.h#L646). +It is set to `10`, which matches [wallet2](https://github.com/monero-project/monero/blob/319b831e65437f1c8e5ff4b4cb9be03f091f6fc6/src/wallet/wallet2.cpp#L118). +It is important this estimate matches what other Monero ecosystem users would +use, since if the fee were different, it would cause transactions constructed +with the different fee to be fingerprintable. Thus, matching `num_grace_blocks` +is good. + +The endpoint then gets all of a user's received outputs and for all outputs that +are >= requested `dust_threshold`, or are in tx's with ring sizes >= requested +`mixin_count`, gets their associated key images, then tacks the final `output` +object onto what will eventually become the `outputs` array in the response. + +Finally, calculates the `per_kb_fee` factoring in `size_scale` and `fee_mask` +returned by the daemon. Note that `per_kb_fee` has been deprecated in the +light wallet REST API, and this should be updated to `per_byte_fee`. See [this +issue](https://github.com/vtnerd/monero-lws/issues/25) for more. + +Note there is some more work done before sending the final response over the +wire in `/src/rpc/light_wallet.cpp` that will be explained further later on. + +##### /submit_raw_tx + +> Submit raw transaction to be relayed to monero network. + +Very simple. Just relays the transaction over the daemon RPC `send_raw_tx_hex` +endpoint. Returns error if not relayed successfully. + +#### /src/scanner.cpp + /src/scanner.h + +This code handles scanning the blockchain for users' transactions while +`monero-lws-daemon` is running. It makes sure `monero-lws` is in sync with the +chain, stays in sync, and just keeps scanning all the time. + +##### lws::scanner::stop() + +Stops the scanner, naturally. + +Sets the `monero-lws-daemon`'s `running` state to false. It is called from +`server_main.cpp`, which is listening for a SIGINT signal in order to trigger +`lws::scanner::stop()`. + +##### lws::scanner::sync() + +Syncs the running `monero-lws-daemon` with the blockchain, making sure +`monero-lws-daemon` knows what blocks still need to be scanned. + +First thing this function does is reads the `monero-lws` db to find the most +recent known block hashes. Then it requests block hashes from the running +`monerod` daemon via RPC endpoint `/get_hashes_fast`, providing the known hashes +as a param. Uses while loops to wait on daemon response and times out or exits +if admin signals to end the program via SIGINT. Once daemon returns the +response, if the daemon has already synced to the top of the chain, the `sync()` +function returns successfully. If not, writes the returned new hashes to the +`monero-lws` db, and rolls back/updates the chain if a reorg has been detected. +If there are still more hashes to request from the daemon, continues loop +requesting from the daemon. Otherwise, breaks and the function returns. + +##### lws::scanner::run() + +Runs the scanner in a perpetual loop, so that it continues scanning for users' +transactions while `monero-lws-daemon` is running. Also updates exchange rates +held in the cache. + +First retrieives all active accounts and their received outputs from the +`monero-lws` db. If no active accounts, sleeps for a bit. Otherwise, proceeds +to scan for transactions for all active accounts inside `check_loop()`. Then, +checks if the scanner has stopped and should exit `run()`, otherwise continues. +Next initializes an rpc client connection to the daemon if one already not +initialized. Then calls `lws::scanner::sync()` to make sure `monero-lws` is +still in sync with `monerod`. Finally, if there is a sync error that is not a +timeout, throws; if not timeout, warns in a log. Continues looping until +there are users to scan transactions for. + +###### check_loop() + +Relevant comments: + +> Launches `thread_count` threads to run `scan_loop`, and then polls for active account changes in background + +``` +The algorithm here is extremely basic. Users are divided evenly amongst +the configurable thread count, and grouped by scan height. If an old +account appears, some accounts (grouped on that thread) will be delayed +in processing waiting for that account to catch up. Its not the greatest, +but this "will have to do" for the first cut. +Its not expected that many people will be running +"enterprise level" of nodes where accounts are constantly added. + +Another "issue" is that each thread works independently instead of more +cooperatively for scanning. This requires a bit more synchronization, so +was left for later. Its likely worth doing to reduce the number of +transfers from the daemon, and the bottleneck on the writes into LMDB. + +If the active user list changes, all threads are stopped/joined, and +everything is re-started. +``` + +Exactly as described, the first thing it does is sorts all users by +`scan_height`, so that it can send batches of users at a time into threads +that execute `scan_loop()` over those users (so that users with similar heights +are processed together). Then it enters a while loop that updates exchange rates +over the exchange rate interval, and then perpetually checks for changes to +the users stored in the `monero-lws` db. If it finds that there is a change +in active users stored, `check_loop()` returns so that `lws::scanner::run()` +can do another loop, collecting all active users again and re-entering +`check_loop()` with the updated list. + +###### scan_loop() + +TO-DO + +#### /src/admin_main.cpp + +#### /src/CMakeLists.txt + +#### /src/config.cpp + +#### /src/config.h + +#### /src/error.cpp + +#### /src/error.h + +#### /src/fwd.h + +#### /src/options.h + +#### /src/wire.h + +#### /src/db + +##### /src/db/account.cpp + +##### /src/db/account.h + +##### /src/db/CMakeLists.txt + +##### /src/db/data.cpp + +##### /src/db/data.h + +##### /src/db/fwd.h + +##### /src/db/storage.cpp + +##### /src/db/storage.h + +##### /src/db/string.cpp + +##### /src/db/string.h + +#### /src/rpc + +##### /src/rpc/client.cpp + +##### /src/rpc/client.h + +##### /src/rpc/CMakeLists.txt + +##### /src/rpc/daemon_pub.cpp + +##### /src/rpc/daemon_pub.h + +##### /src/rpc/daemon_zmq.cpp + +##### /src/rpc/daemon_zmq.h + +##### /src/rpc/fwd.h + +##### /src/rpc/json.h + +##### /src/rpc/ligt_wallet.cpp + +##### /src/rpc/ligt_wallet.h + +##### /src/rpc/rates.cpp + +##### /src/rpc/rates.h + +#### /src/util + +#### /src/wire + +### /CMakeLists.txt + +## Suggestions + +- Consider indexing addresses by a hash of the private view key instead of just +public view key, so that timing analysis under perfect conditions cannot reveal +whether an address is stored. + +- See the current state of subaddress support +[over here](https://github.com/monero-project/meta/pull/647). + +- Share a guide on hosting `monero-lws` behind a .onion, and enable a frontend +to make calls to it. + +- The exchange rate API defaults to off. When turned on, it fetches prices in a +custom interval over the clearnet. `monero-lws` could be bundled with tor or +another anonymity network, so that queries for exchange rates are made over the +anonymity network by default. + +- Consider suggesting a particular interval to admins who wish to turn on the +exchange rate interval or requiring intervals at particular windows (e.g. 10 +seconds, 30 seconds, 60 seconds, etc.), to prevent an admin from fingerprinting +themselves via the exchange rate interval they choose to set. + +- Allow users to add addresses and import requests in a secure way without +needing manual input from the admin, for example, by using a special auth token +(or username/password combo) that allows a user to make as many requests as +they want (or comes with special rules). This minimizes need for out-of-band +solutions to automate this process in a potentially insecure way (e.g. +auto-accepting all requests). + +- Be mindful that even if an account has a `hidden` status, someone would be +able to still tell that account is stored on the server since `/login` will +always return `account_not_found`, even if trying to create the account. This +is not a big deal, because you'd need the `address` and `view_key` to be able +to reach this point. Just worth being mindful of.