Sovereign Social-Web Technology: Milestone 1 Report-Out

by bumblefudge

Context

The Social Web Co-operative is a young organization founded by three public-good technologists working towards a standards-based and publicly-governed social web. In Autumn of 2023, we worked with the Sovereign Technology Fund to define some high-stakes, low-risk goals towards that end:

Tools that can compose into a conformance system, rendering most claims of conformance with the existing ActivityPub standard testable
Some extensions to that standard that increase user agency vis-a-vis their "servers" by offering interoperability standards for data- and user-migrations
Focused and sustained community work to ensure a healthy adoption and feedback loop with implementers of the two community contributions above.

Today we are proud to announce the release of the first work package addressing the conformance needs of ActivityPub, and we would like to thank the STF and show our work in the interest of transparency and community ownership of these outputs.

Work Package 1: Tools for Conformance Testing

Analysis of the ActivityPub Specification

Our first chore was to read through the final text of the 2018 ActivityPub standard, splicing out every MUST and SHOULD/MAY into a spreadsheet for objectively tracking its requirements and recommendations. This can be tedious and legalistic, but doing it in deliberate isolation from analytic or design thinking is crucial to objectivity, and to not pre-interpreting (i.e. prejudicing) the design of the test cases corresponding to each requirement. As part of the process, we also added some columns to the spreadsheet to capture some initial thoughts about the architecture implied by these requirements, which were handy in the later breakdown into core requirements and behavioral/advanced requirements. For more detail about this stage in the process, see the "piece offering" section in this blog post.

The core behaviors that we distilled from the specification can be browsed in a reactive web interface here.

Conformance Requirements Dataset

The above web interface is handy for a human-readable interpretation of conformance requirements, but there are really four different corpuses and codebases that need to interface with one another:

these required behaviors,
the test cases that capture the logic of a test of those requirements,
the runnable tests instantiating those test cases in code, and
the results output by a collection of runnable tests.

Using strings to key together these four very different groups wouldn't really scale, so as a stitch in time we assigned arbitrary UUIDs of the classic RFC 4122 variety to each, and used these to create a structured data object in YAML for each, which serves as the machine-readable and highly pluggable/portable form taken by each behavior defined in the spec.

This breaks out the behaviors at the core of any conformance regime into its own distinct layer of the codebase, tracked by git and changelog in our testing repo and loadable dynamically as its own NPM package.
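As a rough illustration of keying the four layers together by UUID, the sketch below walks from a result back to the requirement it ultimately exercises. Every field name (`id`, `requirementId`, `testCaseId`, `testId`, `outcome`) and every UUID value here is made up for the example; the actual YAML schema is whatever the testing repo defines.

```javascript
// Hypothetical records: field names and UUID values are illustrative only.
const requirements = [
  { id: "5e1b3c0a-0000-4000-8000-000000000001", text: "Example required behavior" },
];
const testCases = [
  { id: "5e1b3c0a-0000-4000-8000-000000000002", requirementId: requirements[0].id },
];
const tests = [
  { id: "5e1b3c0a-0000-4000-8000-000000000003", testCaseId: testCases[0].id },
];
const results = [
  { testId: tests[0].id, outcome: "passed" },
];

// Build a Map from UUID to record so lookups do not depend on names or titles.
function indexById(records) {
  return new Map(records.map((record) => [record.id, record]));
}

// Follow the UUID references: result -> test -> test case -> requirement.
function requirementForResult(result) {
  const test = indexById(tests).get(result.testId);
  const testCase = test && indexById(testCases).get(test.testCaseId);
  return testCase ? indexById(requirements).get(testCase.requirementId) : undefined;
}

console.log(requirementForResult(results[0]).text);
```

Because each layer refers to its neighbor only by identifier, any one layer can be renamed, reworded, or swapped out without breaking the links.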

Conformance Requirements Website

One thing we gain from this methodical layering is that it allows tracking at and across all other layers by UUID, a feature you might notice if you click on any of the requirements listed on the requirements website.

This lightweight, portable rendering engine can serve various purposes, but its main one is to allow a translation from UUIDs (from, say, an error message or test result) into a maximally useful HTML page. It transforms the canonical dataset above into a navigable resource. One reason we added this layer is that the website rendering these YAML files can compose nicely if a given implementer or tester wants to add a whole additional corpus of UUID+YAML tests from another specification or interoperability target (we hope the format will catch on in other open-source conformance projects!). Another reason it was worth open-sourcing in this form is that it is not a "website" in the sense of requiring a public or dynamic web server to be useful: it can be forked, modified, rendered and served locally, reformatted, indexed and searched locally, etc. It's a reactive "manual" you can refer to locally in any testing environment.
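To make the UUID-to-page translation concrete, here is a minimal sketch of the idea, not the project's actual rendering code. It assumes a hypothetical `dataset` Map from UUID to a record with `name` and `description` fields; both the function names and the record shape are invented for illustration.

```javascript
// Escape text so that requirement wording cannot break the generated markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical: `dataset` is a Map from UUID to { name, description }.
function renderRequirementPage(uuid, dataset) {
  const requirement = dataset.get(uuid);
  if (!requirement) {
    // Unknown UUIDs still produce a page, so a stale link is easy to diagnose.
    return `<h1>Unknown requirement</h1>\n<p><code>${escapeHtml(uuid)}</code></p>`;
  }
  return [
    `<h1>${escapeHtml(requirement.name)}</h1>`,
    `<p>${escapeHtml(requirement.description)}</p>`,
    `<p><code>${escapeHtml(uuid)}</code></p>`,
  ].join("\n");
}
```

Since the function only needs a Map and a string, it runs equally well in a local script, a static-site build, or a browser, which is the property the paragraph above describes.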

ActivityPub Actor-Testing Harness and Core Unit Tests

Here, finally, we've reached the pay-off of all this methodical note-showing and cross-referencing. In tandem, we tried to define the actual JavaScript test and the Markdown test case together for each requirement. (Note: some of the tests naturally compose into one another, can only pass if other tests have already passed, and so on.)

Working back and forth, updating one and then making the other match, was a great exercise: it pointed out incrementally what kind of logic there needed to be between inputs and outputs, and identified complexity missing on both the runnable-test side and the human-readable logic documentation side. It really helped identify the need for many shades of grey between "pass," "fail" and "inapplicable". These additional options are anchored to the testing taxonomy specified in 2010 by the Mobile Web Test Suites WG at W3C. (For more on the nuances of non-passing, non-failing results, see the section "Let the Gifts Through" in bengo's blog post announcing the initial release of the tools.)
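One way to picture why more than two or three outcomes matter is a result summary that keeps each outcome separate instead of collapsing everything into pass or fail. The sketch below is only an illustration: the outcome names `cannotTell` and `notRun` are assumptions made for this example, not the vocabulary of the W3C taxonomy or of the released tools.

```javascript
// Illustrative outcome names only; the taxonomy's real vocabulary may differ.
const OUTCOMES = ["passed", "failed", "inapplicable", "cannotTell", "notRun"];

// Count how many results fall under each outcome.
function summarize(results) {
  const counts = new Map(OUTCOMES.map((outcome) => [outcome, 0]));
  for (const { outcome } of results) {
    if (!counts.has(outcome)) {
      throw new Error(`Unknown outcome: ${outcome}`);
    }
    counts.set(outcome, counts.get(outcome) + 1);
  }
  return Object.fromEntries(counts);
}

// In this sketch, only an explicit "failed" counts against conformance;
// inapplicable or inconclusive results are reported but not penalized.
function hasFailures(results) {
  return results.some((result) => result.outcome === "failed");
}
```

Keeping the inconclusive outcomes visible in the summary lets a reader tell "nothing went wrong" apart from "nothing could be checked".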

The harness we came up with delivered, we think, on all the core capabilities while minimizing dependencies and opinions that might keep the core logic from being able to run anywhere:

It can run from a command-line, in a CI environment, in a browser process, on a server;
It can work with standard inputs and outputs for composability;
It can take objects and variables as inputs per-test, or failing that fall back to environment variables, or failing that fall back to sensible defaults;
It can incorporate complex scripts (examples coming soon!) to "mock" authorization tokens, generation of test objects on live servers, and other "interactive" tests of server-to-server interactive behaviors;
It comes packaged along with a bare-bones, partial "toy" implementation that passes all tests to date and can be marshalled into the service of those server-to-server tests;
It was structured to optimize for explicit decisions and flexibility for those that want to swap out tests for more opinionated ones, contribute tests, add optional tests to accelerate testable adoption of interoperable FEPs (community extensions to the protocol), etc.
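The input fallback described in the list above (per-test inputs, then environment variables, then defaults) can be sketched as a simple precedence chain. This is a hypothetical helper, not the harness's real API: `resolveInput`, the option names, and the `actorUrl` key are all invented for the example.

```javascript
// True only for properties defined directly on the object itself.
function hasOwn(source, name) {
  return Object.prototype.hasOwnProperty.call(source, name);
}

// Hypothetical helper: return the first defined value for `name`,
// checking per-test inputs, then the environment, then defaults.
function resolveInput(name, { inputs = {}, env = {}, defaults = {} } = {}) {
  for (const source of [inputs, env, defaults]) {
    if (hasOwn(source, name) && source[name] !== undefined) {
      return source[name];
    }
  }
  return undefined;
}

// Example: no per-test input is given, so the environment value is used.
const actorUrl = resolveInput("actorUrl", {
  inputs: {},
  env: { actorUrl: "https://example.com/actor" },
  defaults: { actorUrl: "http://localhost:3000/actor" },
});
console.log(actorUrl);
```

Passing the environment in as a plain object, rather than reading a global directly, is what would let the same logic run from a command line, in CI, or in a browser.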

In other words, it's ready to serve as the backbone of many different conformance test systems, whether these be a subset or a superset of the ActivityPub protocol!

Excursus: Flexible Design is Decentralizing Design

Our goal with this kind of segmented, layered, UUID-indexed design was to document every decision down to the most minute, such that even people uncomfortable suggesting changes in JavaScript could read the corresponding test-case document and identify assumptions or corner-cases. Crucially, this also maximizes the potential of forks, so that the entire project could be forked and still be useful downstream, preventing our small team from being the single point of failure if we got swamped by an avalanche of contributions or weren't serving the communities they came from due to differences of interpretation. We feel the most useful test harness is one that exists in various form factors and parallel or downstream forks, helping different communities achieve totally different things from the AP federation protocol.

The Implementation Guide

As one might expect given the features described above, the testing tools can be a little daunting without a tutorial, so one separate deliverable in our contract was some basic "## Getting Started" type guidance for all the various ways of using the tools in different environments. These are all collected in the lightweight Implementation Guide, and here more than anywhere else issues and merge requests are warmly welcome to help usher in feedback and contributions from all interested parties.

Next Steps: Adoption

The activitypub-testing repository already houses a number of issues that we identified and backlogged ourselves to work on in the coming months, now that the core harness and tooling are all released.

These include:

additions to the implementation guide based on feedback from implementers
further test coverage
hardening of our work-in-progress generator that attempts to make tests human-readable as test cases when the former are submitted without the latter
better handling of HTTP POST codes and authorization (testing outbox functioning in the most implementation-agnostic way)
more readable outputs
staying up to date with onepage.pub, another reference implementation being worked on by one of the editors of the ActivityStreams specification

We are hosting an issue tracker for the community so that people can discuss test results and identify gaps in the test-case logic or otherwise workshop the results of their own (or other) live implementations. We will also keep attending calls of the W3C Social Web Community Group's testing taskforce, which includes other Fediverse interoperability targets beyond the protocol itself. We have been active for years in this and other FOSS communities, across conferences and meet-ups, so we are optimistic that the feedback we have been getting along the way will continue into contributions from users working across the many major implementations in today's fediverse.

Next Steps: Portability Tools and other Extension Points

As mentioned above, one time-sensitive goal of this testing system was to accommodate new tests that correspond to community extensions beyond the W3C process and the "mandatory" behaviors provided by the core protocol. The reason this was time-sensitive is that the coöp will now be turning its attention to its next work package: adding some such extensions to implementations, and adding tests for these extensions to the test suite, to encourage the developer community to align on these and establish a precedent for "bottom-up" design of extensions.

Specifically, we are interested in making user accounts and user data more portable by offering an upgrade path for both, such that implementations can incrementally opt into standardized and migration-friendly forms of each. There are interesting knock-on effects to this kind of alignment, since it can also help moderation policies be more transparent and auditable if they reference user accounts and user data in archival and implementation-independent ways, so this work could not be more timely in supporting other community endeavors in the Fediverse. Stay tuned!