Firefish + Cloudflare R2 Tutorial: A Quick Fix for Media Upload Safety

by bumblefudge

Introduction

These days, Fediverse activity is ramping up on many fronts: extensions are being designed and specified, development is proceeding quickly, major platforms are considering interoperating and/or federating, communities are growing, and infrastructure for shared and/or federated moderation is on its way. One small way to help is to document little things that the semi-technical and under-funded can do to upgrade their own community infrastructure.

Context: A Maslowian Triangle of moderation problems

A conversation has been happening lately across various overlapping fora. The Stanford Internet Observatory released a report on moderation compliance on the Fediverse. Daphne Keller wrote a great crash course for Americans on the incoming API-based moderation-compliance requirements of the Digital Services Act and its draft data model. The EU is imposing these requirements on servers above a certain user-count threshold so that regulators and researchers alike can audit them. IFTAS has been coordinating on its Matrix server with all kinds of moderators and instance operators to put together infrastructure for shared and/or mutually auditable moderation. That might well prove the most fedi-native and long-term-viable approach, based as it is on a platform coöperative resourcing model and fine-tuned to Fediverse experience.

In this context, some people have been asking for "low-hanging fruit": quick fixes that medium-to-large instances can make to cover their, um, data liabilities. One reason is that public and open servers may find themselves dealing with bad apples in higher numbers over the coming years. Another is that operators may simply want to sleep a little better at night.

The most toxic liability remains so-called "CSAM" (child sexual abuse material). A new user of the instance might upload this toxic digital sludge without a moderator noticing. Alternatively, a user on another instance might send it to the instance attached to a DM or along a follow relationship. Either way, it could be quite damaging to an otherwise 100%-compliant and well-moderated instance. That instance could swiftly be booted by its infrastructure operators (if hosted on a virtual cloud). It could just as swiftly be added to a DNS blocklist and find itself "disconnected from the internet". More nuanced solutions are being developed to meet rising moderation compliance standards (and to secure cross-server follows). In the meantime, an easy gap to fill is simply to scan all media at upload time. This is the base of Maslow's pyramid of things to keep out of community spaces.

One commercial solution is to store all user-generated content in a storage backend that already scans everything at ingress for CSAM. Such a backend rejects or deletes content whose detection confidence is above a given threshold. There are several of these, but one of the easiest to configure is Cloudflare's. Cloudflare has been generally supportive of the Fediverse open-source developer community, so I don't feel bad recommending their commercial solutions to instance operators who are currently DIYing all their infrastructure. Heck, if you have a small but public instance, your usage might even stay within the free tier until you can set up something more complex, inshallah.

Huge thanks to user @gme@bofh.social for sharing his own install notes, which I recreated here with screengrabs.

Prerequisites

Virtual server - e.g. Hetzner

Firefish install - Dockerized

The Docker instructions seem to be a little out of date at the moment. As soon as the open merge requests go in, though, they should be pretty newbie-friendly on a virtual server à la Hetzner, Cloudflare, or DigitalOcean.

Cloudflare R2 - e.g. free tier

Set up payment
Name your bucket
Price-limit your bucket
(Optional) If you also manage DNS through the same CF account, you can configure a custom domain for your bucket. This is the domain from which your uploaded media will be downloaded; without one, you'll get an ugly r2.dev subdomain.
Configure your bucket
- Note the "R2.dev subdomain" setting, which I had to enable explicitly since I'm not configuring a custom subdomain just for this tutorial, heh. Otherwise I'd have to "connect domain" to a DNS domain controlled through the same CF account.
- Note that I added a CORS policy to allow queries from my firefish server by domain or by IP. Here again I'm being lazy and not configuring a reverse proxy for my staging server yet, heh.
- Note that I had to remove the default Object Lifecycle rule of deleting all content after 7 days. Unless you run a "blissful goldfish memory" instance, most users will expect you to host their uploads for longer than 7 days.
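You might rather keep the CORS policy in a file than retype it in the dashboard. Here's a minimal Python sketch that prints one. It assumes the S3-style rule format (`AllowedOrigins`, `AllowedMethods`, `MaxAgeSeconds`) that the R2 CORS editor accepts. The domain and IP are hypothetical placeholders (203.0.113.10 is a documentation-only address). Allowing only GET/HEAD assumes that browsers only ever read media from the bucket, while Firefish's server does the uploading.

```python
import json

# Hypothetical origins: swap in your own Firefish domain and/or server IP.
FIREFISH_ORIGINS = [
    "https://firefish.example.social",
    "http://203.0.113.10",
]


def build_cors_policy(origins):
    """Return an R2 CORS policy (a list of S3-style rules) allowing reads from `origins`.

    Assumption: browsers only fetch media from the bucket and Firefish's server
    does the uploading, so no PUT/POST is allowed here.
    """
    return [
        {
            "AllowedOrigins": list(origins),
            "AllowedMethods": ["GET", "HEAD"],
            "MaxAgeSeconds": 3600,  # how long browsers may cache a preflight response
        }
    ]


if __name__ == "__main__":
    # Paste the printed JSON into the bucket's CORS policy editor.
    print(json.dumps(build_cors_policy(FIREFISH_ORIGINS), indent=2))
```

If you later put a reverse proxy and a proper domain in front of your server, you only need to edit the origins list.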

Open the R2 submenu in the left-side navigation and click "Overview".
- Click "Manage R2 API tokens", then click "Create API token" on the following page.
Configure your API token. "Object read & write" is probably fine unless you're doing anything fancy.
- Note that I'm hosting my server from a static IPv4 address, so I can restrict the token to that IP. This keeps the rest of the web from using it and cuts down on the risk of spam or malice happening in a bucket I pay for.
- I'm not setting an expiry on this token because I am lazy.
On the following screen you'll get an Access Key ID and a Secret Access Key.
- You can click each one to copy it and paste it into a notepad-type doohickey, unless you know how to access multiple recent clipboard entries on your operating system. Windows key+V for the zero Windows users who will be reading this!
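If the notepad doohickey makes you nervous, one alternative is to keep the keys in environment variables on the server. The sketch below is a hypothetical helper that loads them and derives R2's S3-compatible endpoint, `https://<ACCOUNT_ID>.r2.cloudflarestorage.com`. This is the endpoint S3 clients talk to, as opposed to the public URL your images are served from. The variable names `R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, and `R2_SECRET_ACCESS_KEY` are my own invention, not something Firefish reads.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class R2Credentials:
    account_id: str
    access_key_id: str
    # repr=False keeps the secret out of logs and tracebacks that print this object.
    secret_access_key: str = field(repr=False)

    @property
    def endpoint(self) -> str:
        """R2's S3-compatible API endpoint for this Cloudflare account."""
        return f"https://{self.account_id}.r2.cloudflarestorage.com"


# Hypothetical variable names; pick whatever your deployment uses.
ENV_NAMES = ("R2_ACCOUNT_ID", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY")


def load_r2_credentials(env: Mapping[str, str] = os.environ) -> R2Credentials:
    """Read the three R2 values from `env`, failing loudly if any is missing or empty."""
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return R2Credentials(*(env[name] for name in ENV_NAMES))
```

Usage: `creds = load_r2_credentials(); print(creds.endpoint)`. Printing `creds` itself shows the account ID and key ID but never the secret.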

Configuring Firefish for the Bucket - GUI edition

The Firefish Control Panel has a whole "Object Storage" section.
- Note that the Base URL is the URL under which each uploaded image will be viewable, not the URL of the API endpoint. You set up either a custom subdomain or an ugly r2.dev subdomain for the bucket earlier. The latter came with a little button inviting you to copy its long hexadecimal string to your clipboard.
- I set this up with SSL (I believe Cloudflare requires it) and without an outbound proxy via nginx/apache. YMMV, this is just a demo, etc.
- Path-based endpoint URLs are what Cloudflare R2 currently uses.
- Don't forget to hit "Save" in the upper right.
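The Base URL/endpoint mix-up is the easiest one to make, so here's a small Python sanity check you could run over your values before typing them in. The dictionary keys are hypothetical labels loosely mirroring the form fields, not Firefish's internal setting names, and the values are placeholders. The rules encode this post's advice: https, path-style URLs, and a public Base URL. They also encode one assumption of mine: the Endpoint field wants a bare host without `https://`.

```python
from urllib.parse import urlparse

# Hypothetical placeholder values; keys loosely mirror the "Object Storage" form.
settings = {
    "base_url": "https://pub-0123456789abcdef0123456789abcdef.r2.dev",
    "bucket": "firefish-media",
    "endpoint": "0123456789abcdef0123456789abcdef.r2.cloudflarestorage.com",
    "use_ssl": True,
    "path_style": True,
}


def check_object_storage_settings(s):
    """Return a list of likely mistakes in the object storage settings (empty if none found)."""
    problems = []
    base = urlparse(s["base_url"])
    if base.scheme != "https":
        problems.append("Base URL should be an https:// URL")
    # The public Base URL must not be the S3 API host; that host doesn't serve images to browsers.
    if (base.hostname or "").endswith(".r2.cloudflarestorage.com"):
        problems.append("Base URL is the S3 API endpoint, not the public URL images are served from")
    if "://" in s["endpoint"]:
        problems.append("Endpoint looks like a full URL; assumption: Firefish wants a bare host here")
    elif not s["endpoint"].endswith(".r2.cloudflarestorage.com"):
        problems.append("Endpoint does not look like an R2 S3 API host")
    if not s["use_ssl"]:
        problems.append("SSL is off, and Cloudflare probably requires it")
    if not s["path_style"]:
        problems.append("path-style URLs are off, but R2 uses path-based URLs")
    if not s["bucket"]:
        problems.append("bucket name is empty")
    return problems


for problem in check_object_storage_settings(settings):
    print("-", problem)
```

An empty result doesn't prove the settings work. It only means none of these common slip-ups were spotted.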

Testing and Troubleshooting

This part's pretty simple. If you go to upload an image to a post and get an error instead:
- Your Firefish isn't connecting to your bucket; double-check the endpoint, bucket name, and API keys.
If you messed up the Base URL value, the "blurhash" placeholder that normally stands in for an image while it loads may seemingly never be replaced. That happens because the path Firefish requests doesn't actually produce the image. Otherwise, no errors are thrown.
And if you configured nothing wrong, you'll see sweet, sweet images (the ones Cloudflare doesn't immediately nuke from space) in the timeline.
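For the blurhash-that-never-resolves case, one way to triage is to copy the image address from the broken post and poke at it directly. This is a minimal Python sketch using only the standard library. `diagnose_media_url` is a hypothetical helper of mine, and the `base_url` you pass it should be whatever you typed into Firefish. An HTTP error status suggests the host answers but the path doesn't serve the object. A connection-level error suggests the hostname itself is wrong.

```python
import urllib.error
import urllib.request


def diagnose_media_url(url: str, base_url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: roughly classify why an uploaded image might not be loading."""
    # Media URLs should live under the Base URL you configured in Firefish.
    if not url.startswith(base_url.rstrip("/") + "/"):
        return "URL is not under the configured Base URL"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type") or "unknown type"
            return f"reachable: HTTP {response.status}, {content_type}"
    except urllib.error.HTTPError as err:
        # The host answered, but not with the object: the path is probably wrong.
        return f"HTTP {err.code}: the Base URL host answers but doesn't serve this object"
    except urllib.error.URLError as err:
        # No usable answer at all: DNS, TLS, or connection trouble.
        return f"unreachable: {err.reason}"
    except OSError as err:
        # Catches e.g. a timeout while waiting for the response.
        return f"unreachable: {err}"
```

Usage: `print(diagnose_media_url(broken_image_url, your_base_url))`. If your setup rejects HEAD requests, switching `method` to `"GET"` is a reasonable fallback.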