ATProtocol Record References: Authoritative vs Unauthoritative Patterns

Published by @smokesignal.events on 2025-07-31 15:40 UTC.

If you've been building on ATProtocol, you've probably wrestled with this question: should I reference another record using a strongRef, or should I just embed the data directly? It's one of those architectural decisions that seems simple at first but has surprisingly deep implications.

I've been calling these two approaches "authoritative" and "unauthoritative" records, though I'll admit I'm not entirely sold on the terminology. The ATProtocol community also uses terms like "embedded records," "inline records," or "denormalized records" for what I'm calling unauthoritative records. I also considered "anonymous records," but that felt too overloaded and could cause confusion with actual anonymous data.

Here's the basic distinction:

Authoritative record references maintain a reference to the authority and key. Think of a com.atproto.repo.strongRef object or an at:// URI. When you encounter one of these in a record, you need to do an additional lookup to actually use the data.

Unauthoritative records embed the content directly. No references, no lookups - the data is right there in the record.

Let me show you what I mean with a concrete example from the Lexicon Community calendar event system I've been working on.

Example: Calendar Events with Locations

Here's what an authoritative record reference looks like in practice. A Lexicon Community calendar event with an authoritative reference to a location:

{
    "locations": [
        {
            "$type": "com.atproto.repo.strongRef",
            "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
            "cid": "13609594aa.."
        }
    ]
}

And here's the same event using an unauthoritative (embedded) approach:

{
    "locations": [
        {
            "$type": "community.lexicon.location.address",
            "country": "US",
            "locality": "Oakwood",
            "name": "Orchardly Park",
            "postalCode": "45419",
            "region": "Ohio",
            "street": "2599 Delaine Ave"
        }
    ]
}

At first glance, the embedded version looks simpler, right? And in many ways it is. But this choice can have a substantial impact on your system architecture.
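To make the difference concrete from a consumer's point of view, here's a minimal Python sketch. It assumes location entries look like the two examples above. The helper names (parse_at_uri, classify_location) are mine and not part of any ATProtocol SDK, and the URI parsing only handles the simple at://authority/collection/rkey form.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # a DID or handle
    collection: str  # the record's NSID
    rkey: str        # the record key


def parse_at_uri(uri: str) -> AtUri:
    """Split an at:// URI of the form at://authority/collection/rkey."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


def classify_location(entry: dict) -> str:
    """Return 'reference' if the entry needs a lookup, else 'embedded'."""
    if "uri" in entry:
        parse_at_uri(entry["uri"])  # raises if the reference is malformed
        return "reference"
    return "embedded"


referenced = {
    "$type": "com.atproto.repo.strongRef",
    "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
    "cid": "13609594aa..",
}
embedded = {
    "$type": "community.lexicon.location.address",
    "name": "Orchardly Park",
    "locality": "Oakwood",
}

print(classify_location(referenced))  # reference
print(classify_location(embedded))    # embedded
print(parse_at_uri(referenced["uri"]).collection)
```

The embedded entry can be rendered as soon as it's read. The referenced one only tells you where to look, so every consumer has to carry the lookup logic as well.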

Why This Matters

The two different ways to reference record data have a big impact on accessibility, complexity, and ownership.

Take Smoke Signal, an event and RSVP management and discovery site, as an example. Because locations are embedded directly in the event, the event organizer "owns" the location data along with the rest of the event. They're responsible for updating it, and its visibility is tied directly to the event itself. Systems that watch for relay or jetstream commits of community.lexicon.calendar.event records can see the location data immediately and index it accordingly.
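Here's a rough sketch of what the inner loop of such an indexer might look like. It assumes messages arrive as already-decoded dicts shaped like Jetstream commit messages (kind, did, commit.operation, commit.collection, commit.rkey, commit.record). Connecting to the stream is left out, and index_event_locations is a hypothetical name, not something Smoke Signal ships.

```python
EVENT_COLLECTION = "community.lexicon.calendar.event"
ADDRESS_TYPE = "community.lexicon.location.address"


def index_event_locations(message: dict) -> list:
    """Return the embedded addresses found in one Jetstream-style commit."""
    if message.get("kind") != "commit":
        return []
    commit = message.get("commit", {})
    if commit.get("collection") != EVENT_COLLECTION:
        return []
    if commit.get("operation") not in ("create", "update"):
        return []  # deletes carry no record body
    record = commit.get("record", {})
    event_uri = f"at://{message['did']}/{EVENT_COLLECTION}/{commit['rkey']}"
    # Embedded addresses are usable right away; no second fetch is needed.
    return [
        {"event": event_uri, **location}
        for location in record.get("locations", [])
        if location.get("$type") == ADDRESS_TYPE
    ]


# Illustrative message; the DID and rkey are made up.
sample = {
    "did": "did:example:organizer",
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": EVENT_COLLECTION,
        "rkey": "event1",
        "record": {
            "name": "Weekly meetup",
            "locations": [
                {"$type": ADDRESS_TYPE, "name": "Orchardly Park", "locality": "Oakwood"}
            ],
        },
    },
}

for row in index_event_locations(sample):
    print(row["event"], row["name"])
```

With a strongRef in the locations array instead, the same loop would find nothing to index until it fetched the referenced record from the other repository.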

There are some benefits to doing this, and they revolve around the core idea that addresses are folksonomies. There is some data duplication, but I believe that it's an acceptable compromise. For example, a regular meetup at a venue may have slight changes between events that don't warrant creating entire location records. The street address, locality, and administrative area may be the same, but the meeting room or area may be different week by week.

Data Changes and Updates

The major arguments for using authoritative record references typically come down to:

Data deduplication - Store once, reference many times
Updates - Change once, update everywhere

For locations that are "permanent" and public, having an authoritative reference can make a lot of sense. The address of the local park won't change very often, and if it does, each version of the record gets its own CID, so a reference can pin a specific version. By the same token, when a reference follows the current version of the location, content changes are inherited and the events that reference it may not need to be updated.

But here's where theory meets practice...

Technical Implementation of Record References

The strongRef Architecture

The com.atproto.repo.strongRef represents ATProtocol's solution for immutable, verifiable references. It combines two critical components:

{
  "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jvz2442yt32g",
  "cid": "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua"
}

Key Properties:

URI: Provides addressability within the repository structure
CID: Content Identifier that cryptographically verifies the record's content using a SHA-256 hash

This dual-component design ensures both location (where to find it) and integrity (what you should find) are preserved. The CID makes the reference immutable - any content change produces a new CID, so a reference pinned to the old CID no longer matches the current record.
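To show what the CID actually carries, here's a small standard-library sketch that decodes a base32 CIDv1 string and checks a block of bytes against it. It assumes the CID uses single-byte header fields (true for DAG-CBOR plus SHA-256), and it takes the encoded block bytes as given: producing the DAG-CBOR encoding of a record is out of scope here. The function names are my own.

```python
import base64
import hashlib

DAG_CBOR = 0x71  # multicodec code for DAG-CBOR
SHA2_256 = 0x12  # multihash code for SHA-256


def decode_cid(cid: str) -> tuple:
    """Decode a base32 CIDv1 string into (version, codec, hash_code, digest)."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32)")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, codec, hash_code, length = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if len(digest) != length:
        raise ValueError("digest length does not match the multihash header")
    return version, codec, hash_code, digest


def encode_cid(block: bytes) -> str:
    """Build a CIDv1 (DAG-CBOR, SHA-256) for already-encoded block bytes."""
    digest = hashlib.sha256(block).digest()
    raw = bytes([1, DAG_CBOR, SHA2_256, len(digest)]) + digest
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def block_matches_cid(block: bytes, cid: str) -> bool:
    """Check that the block's SHA-256 digest is the one pinned by the CID."""
    _, _, hash_code, digest = decode_cid(cid)
    return hash_code == SHA2_256 and hashlib.sha256(block).digest() == digest


version, codec, hash_code, digest = decode_cid(
    "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua"
)
print(version, hex(codec), hex(hash_code), len(digest))  # 1 0x71 0x12 32

block = b"stand-in for a DAG-CBOR encoded record"
cid = encode_cid(block)
print(block_matches_cid(block, cid))         # True
print(block_matches_cid(block + b"!", cid))  # False
```

A single changed byte gives a different digest, which is why a strongRef can't silently follow an edited record.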

Weak References

Beyond strongRef, ATProtocol also supports what I call "weak" authoritative references. These are objects that contain just a uri field holding a URI or AT-URI (at://) value:

{
    "$type": "weakLocation",
    "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v"
}

The main use case here is when you're saying "there's a record that can be found at this location" but you don't care about the specific version or content integrity. This is useful when:

You want to reference the "current" version of something, whatever that may be
The referenced content updates frequently and you always want the latest
You're building looser coupling between records
You don't need cryptographic proof that the content hasn't changed

Think of it as the difference between linking to a specific Git commit (strongRef) versus linking to the HEAD of a branch (weak reference). Both have their place, depending on your needs.
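The commit-versus-HEAD analogy can be sketched in a few lines. This assumes a hypothetical in-memory STORE standing in for a repository lookup; the URI, the CID strings, and the function names are all made up for illustration.

```python
class StaleReference(Exception):
    """The record exists but its CID no longer matches the pinned one."""


# Hypothetical stand-in for a repository lookup: uri -> (current cid, record).
STORE = {
    "at://did:example:alice/community.lexicon.location.address/park": (
        "cid-v2",
        {"name": "Orchardly Park", "street": "2599 Delaine Ave"},
    ),
}


def resolve_weak(ref: dict) -> dict:
    """Return whatever currently lives at the URI (like a branch HEAD)."""
    _cid, record = STORE[ref["uri"]]
    return record


def resolve_strong(ref: dict) -> dict:
    """Return the record only if it is still the pinned version (like a commit)."""
    cid, record = STORE[ref["uri"]]
    if cid != ref["cid"]:
        raise StaleReference(f"{ref['uri']} is now {cid}, pinned {ref['cid']}")
    return record


uri = "at://did:example:alice/community.lexicon.location.address/park"
print(resolve_weak({"uri": uri})["name"])  # Orchardly Park

try:
    resolve_strong({"uri": uri, "cid": "cid-v1"})
except StaleReference as err:
    print("stale:", err)
```

Whether a stale strongRef should be treated as an error, a warning, or a prompt to re-confirm is an application decision. The sketch just surfaces the mismatch.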

Embedded Record Pattern

The alternative pattern embeds data directly within the parent record:

{
    "locations": [
        {
            "$type": "community.lexicon.location.address",
            "country": "US",
            "locality": "Oakwood",
            "name": "Orchardly Park",
            "postalCode": "45419",
            "region": "Ohio",
            "street": "2599 Delaine Ave"
        }
    ]
}

This approach trades referential integrity for self-containment and simplicity.

Real-World Implementation Examples

Smoke Signal (Events & RSVPs)

Smoke Signal uses both patterns in different ways.

Events embed unauthoritative record data for locations. This creates a strong link between the owner of an event and the event details.

RSVPs use a strongRef to event records, capturing the exact version of the event at RSVP time. This allows users to prove they RSVP'd to a specific version of an event, even if details change later.

Important caveat: When events are deleted, RSVP references become orphaned. The solution was to discourage deletion in favor of state changes (canceled, postponed).
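Here's a sketch of how an AppView might classify an RSVP against the events it has indexed. It assumes the RSVP keeps its strongRef in a subject field, that events carry a status field, and that the index is a plain dict. The state names and the literal "cancelled" value are placeholders of mine, not Smoke Signal's actual implementation.

```python
def rsvp_state(rsvp: dict, events: dict) -> str:
    """Classify an RSVP against the current event index.

    `events` maps event URI -> {"cid": ..., "record": {...}} and stands in
    for whatever index an AppView keeps.
    """
    subject = rsvp["subject"]  # assumed strongRef: {"uri": ..., "cid": ...}
    current = events.get(subject["uri"])
    if current is None:
        return "orphaned"          # the event record was deleted
    if current["record"].get("status") == "cancelled":
        return "event-cancelled"   # still resolvable, clearly marked
    if current["cid"] != subject["cid"]:
        return "event-changed"     # the RSVP pinned an earlier version
    return "current"


events = {
    "at://did:example:org/community.lexicon.calendar.event/a": {
        "cid": "cid-2",
        "record": {"name": "Meetup", "status": "scheduled"},
    },
    "at://did:example:org/community.lexicon.calendar.event/b": {
        "cid": "cid-9",
        "record": {"name": "Picnic", "status": "cancelled"},
    },
}

rsvps = [
    {"subject": {"uri": "at://did:example:org/community.lexicon.calendar.event/a", "cid": "cid-2"}},
    {"subject": {"uri": "at://did:example:org/community.lexicon.calendar.event/a", "cid": "cid-1"}},
    {"subject": {"uri": "at://did:example:org/community.lexicon.calendar.event/b", "cid": "cid-9"}},
    {"subject": {"uri": "at://did:example:org/community.lexicon.calendar.event/gone", "cid": "cid-0"}},
]
print([rsvp_state(r, events) for r in rsvps])
```

A cancelled event still resolves, so the RSVP can be shown with context. A deleted one leaves the RSVP pointing at nothing.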

WhiteWind (Blogging Platform)

The first third-party AppView demonstrates:

Custom lexicons (com.whtwnd.blog.entry) that extend ATProtocol
Standard reference patterns for cross-blog citations
Challenges with limited documentation on record structures

Record Editing Implementation (verdverm.com)

A copy-on-write pattern for edit history:

$orig field stores strongRef to original record
$hist array maintains chronological edit history
Maintains stable URI while CID changes with edits
Trade-off: Creates potential social attack vectors, as likes/comments remain attached to content that has since changed

Note that official recommendations from protocol engineers discourage the use of fields starting with the "$" character, so this approach diverges from standard practices.
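As a rough illustration of the copy-on-write idea, here's a sketch of producing the next version of a record. The list above doesn't say what a $hist entry looks like, so this assumes each entry is a strongRef-shaped dict pointing at a prior version, oldest first. Treat both the shape and the edit_record helper as my guesses rather than the verdverm.com implementation.

```python
def edit_record(uri: str, record: dict, record_cid: str, changes: dict) -> dict:
    """Return the next version of `record`, to be written back to the same URI."""
    previous = {"uri": uri, "cid": record_cid}
    updated = {**record, **changes}
    # The first edit pins the original; later edits keep that pointer.
    updated["$orig"] = record.get("$orig", previous)
    # Assumption: history is a list of strongRef-shaped dicts, oldest first.
    updated["$hist"] = [*record.get("$hist", []), previous]
    return updated


uri = "at://did:example:alice/app.example.post/1"  # made-up URI
v1 = {"text": "helo world"}
v2 = edit_record(uri, v1, "cid-1", {"text": "hello world"})
v3 = edit_record(uri, v2, "cid-2", {"text": "hello, world"})

print(v3["text"])
print(v3["$orig"]["cid"])
print([entry["cid"] for entry in v3["$hist"]])
```

The URI never changes, so anything that referenced it weakly now points at the edited text, which is where the attack vector mentioned above comes from.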

Comparative Analysis with Other Protocols

ActivityPub: Origin-Based Decisions

ActivityPub's approach provides a valuable lesson:

Same-server content: Embedded for performance
Cross-server content: Referenced to maintain authority
Decision based on trust boundaries rather than content type
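The origin-based rule is easy to express in code. This sketch assumes objects are dicts with an id URL, as ActivityPub objects are, and that "same server" can be approximated by comparing hostnames. serialize_attachment is a hypothetical helper, not part of any ActivityPub library.

```python
from typing import Union
from urllib.parse import urlsplit


def serialize_attachment(obj: dict, local_host: str) -> Union[dict, str]:
    """Embed same-origin objects; reference cross-origin ones by their id."""
    if urlsplit(obj["id"]).hostname == local_host:
        return obj       # inside the trust boundary: embed the full object
    return obj["id"]     # outside it: keep only the reference


local_note = {"id": "https://social.example/notes/1", "type": "Note", "content": "hi"}
remote_note = {"id": "https://other.example/notes/9", "type": "Note", "content": "yo"}

print(serialize_attachment(local_note, "social.example"))
print(serialize_attachment(remote_note, "social.example"))
```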

IPFS/IPLD: Pure Content-Addressing

Everything is a cryptographically verifiable reference:

Minimal metadata embedding (size, MIME type)
Enables cross-protocol interoperability
Trade-off: Performance depends on content availability

Matrix: Operational vs Relational

Matrix embeds operational data (messages, state) but references relational data (replies, threads):

Strong eventual consistency through state resolution
Good performance with minimal fetches
Clear separation of concerns

Privacy Considerations

I think privacy is a non-starter here. From the information we have about where ATProtocol is going with private records, they likely won't have the same shape as current records and repositories. It's hard to speculate what they will look like, how they will be referenced, or what their access patterns will be.

For that reason, I'm removing privacy and security as a consideration for this analysis.

Best Practices and Recommendations

Decision Framework

Use authoritative references when:

Content has clear authority/ownership
Updates should propagate automatically (use a weak, URI-only reference here, since a strongRef pins one version)
Referential integrity is critical
Storage efficiency matters at scale
Version-specific references are needed (this is what the CID in a strongRef provides)
You want to give identities a way to reference a specific version, protecting them from having the rug pulled out from under them if the referenced content changes dramatically

Use embedding when:

Offline access is required
Performance is critical
Content is small and unlikely to change
Simplified client implementation is a priority
Avoiding orphaned references is important

Implementation Guidelines

Start simple: Begin with embedding, add references as needs emerge
Design for degradation: Ensure core functionality works without references
Cache strategically: Balance freshness with performance
Version explicitly: Use CIDs for immutable references when needed
Monitor orphans: Implement cleanup strategies for broken references
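The caching and orphan-monitoring guidelines can be combined in one small helper. This is a sketch under assumptions: fetch is a hypothetical lookup that returns a record dict or None when the record is gone, and the five-minute default TTL is an arbitrary starting point rather than a recommendation.

```python
import time
from typing import Callable, Optional


class ReferenceCache:
    """Cache resolved records for `ttl` seconds and remember broken references."""

    def __init__(self, fetch: Callable[[str], Optional[dict]], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # hypothetical lookup: uri -> record or None
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # uri -> (time fetched, record)
        self.orphans = set()   # URIs that no longer resolve

    def get(self, uri: str) -> Optional[dict]:
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]   # fresh enough: skip the network
        record = self._fetch(uri)
        if record is None:
            self._entries.pop(uri, None)
            self.orphans.add(uri)  # candidate for cleanup or a UI warning
            return None
        self.orphans.discard(uri)
        self._entries[uri] = (now, record)
        return record


# Fake lookup for demonstration; a real one would query a PDS or an index.
def fake_fetch(uri: str) -> Optional[dict]:
    known = {"at://did:example:venue/community.lexicon.location.address/park": {"name": "Orchardly Park"}}
    return known.get(uri)


cache = ReferenceCache(fake_fetch, ttl=60)
print(cache.get("at://did:example:venue/community.lexicon.location.address/park"))
print(cache.get("at://did:example:venue/community.lexicon.location.address/missing"))
print(sorted(cache.orphans))
```

The clock is injectable so the expiry logic can be tested without sleeping. Tune the TTL to how stale a venue name or address can be before it matters.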

Community Patterns Emerging

The ATProtocol community is converging on:

Hybrid approaches for complex data types
Commons infrastructure for shared resources (like location data)
Progressive enhancement from embedded to referenced
Community curation for deduplication
Explicit version management via CIDs

Conclusion

The choice between authoritative and unauthoritative records in ATProtocol isn't binary - it's contextual. The most successful implementations use hybrid approaches that balance the trade-offs based on specific use cases.

For calendar events with locations, embedding essential location data while maintaining optional references to canonical venues provides the best balance of accessibility, performance, and maintainability.
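One way to picture that hybrid is an embedded address that carries an optional pointer to a canonical venue record. The canonical field name below is hypothetical (it isn't part of the community.lexicon.location.address lexicon), and resolve stands in for whatever lookup your application uses.

```python
from typing import Callable, Optional


def display_location(location: dict, resolve: Callable[[str], Optional[dict]]) -> dict:
    """Prefer the canonical venue when it resolves; otherwise use the embedded copy."""
    embedded = {k: v for k, v in location.items() if k != "canonical"}
    canonical_uri = location.get("canonical")
    if canonical_uri:
        try:
            canonical = resolve(canonical_uri)
        except Exception:
            canonical = None  # network or lookup failure: degrade, don't fail
        if canonical:
            return {**embedded, **canonical}
    return embedded


location = {
    "$type": "community.lexicon.location.address",
    "name": "Orchardly Park",
    "street": "2599 Delaine Ave",
    # Hypothetical optional pointer to a shared venue record.
    "canonical": "at://did:example:venues/community.lexicon.location.address/orchardly",
}


def offline(uri: str) -> Optional[dict]:
    raise ConnectionError("no network")


def online(uri: str) -> Optional[dict]:
    return {"name": "Orchardly Park (Shelter 2)"}


print(display_location(location, offline)["name"])
print(display_location(location, online)["name"])
```

The event stays fully usable with no network at all, and the reference only ever adds detail on top of it.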

My experience suggests starting with embedded data for simplicity, then selectively adding references where their benefits (deduplication, authority, updates) outweigh their costs (complexity, network dependency). As the ecosystem matures, expect more standardized patterns and tooling to emerge around these fundamental architectural choices.
