Smoke Signal Blog

ATProtocol Record References: Authoritative vs Unauthoritative Patterns

Published by @smokesignal.events on 2025-07-31 15:40 UTC.

If you've been building on ATProtocol, you've probably wrestled with this question: should I reference another record using a strongRef, or should I just embed the data directly? It's one of those architectural decisions that seems simple at first but has surprisingly deep implications.

I've been calling these two approaches "authoritative" and "unauthoritative" records, though I'll admit I'm not entirely sold on the terminology. The ATProtocol community also uses terms like "embedded records," "inline records," or "denormalized records" for what I'm calling unauthoritative records. I also considered "anonymous records," but that felt too overloaded and could cause confusion with actual anonymous data.

Here's the basic distinction:

Authoritative record references maintain a reference to the authority and key. Think of a com.atproto.repo.strongRef object or an at:// URI. When you encounter one of these in a record, you need to do an additional lookup to actually use the data.

Unauthoritative records embed the content directly. No references, no lookups - the data is right there in the record.

Let me show you what I mean with a concrete example from the Lexicon Community calendar event system I've been working on.

Example: Calendar Events with Locations

Here's what an authoritative record reference looks like in practice: a Lexicon Community calendar event whose location is a strongRef to a separate location record.

{
    "locations": [
        {
            "$type": "com.atproto.repo.strongRef",
            "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
            "cid": "13609594aa.."
        }
    ]
}

And here's the same event using an unauthoritative (embedded) approach:

{
    "locations": [
        {
            "$type": "community.lexicon.location.address",
            "country": "US",
            "locality": "Oakwood",
            "name": "Orchardly Park",
            "postalCode": "45419",
            "region": "Ohio",
            "street": "2599 Delaine Ave"
        }
    ]
}

At first glance, the embedded version looks simpler, right? And in many ways it is. But this choice can have a substantial impact on your system architecture.
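To make the difference concrete from a consumer's point of view, here is a minimal Python sketch that splits an event's locations into entries that need a lookup and entries that can be used as-is. The helper names (needs_lookup, split_locations) are mine, not part of any ATProtocol library, and the check only looks at "$type", which is an assumption that holds for the two examples above.

```python
STRONG_REF_TYPE = "com.atproto.repo.strongRef"


def needs_lookup(entry: dict) -> bool:
    """True when the entry is a strongRef rather than inline data."""
    return entry.get("$type") == STRONG_REF_TYPE


def split_locations(event: dict) -> tuple[list, list]:
    """Split an event's locations into (references, embedded records)."""
    references, embedded = [], []
    for entry in event.get("locations", []):
        (references if needs_lookup(entry) else embedded).append(entry)
    return references, embedded


# One location of each kind, copied from the two examples above.
event = {
    "locations": [
        {
            "$type": "com.atproto.repo.strongRef",
            "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
            "cid": "13609594aa..",  # truncated, as in the example above
        },
        {
            "$type": "community.lexicon.location.address",
            "name": "Orchardly Park",
            "locality": "Oakwood",
        },
    ]
}

references, embedded = split_locations(event)
print(f"{len(references)} need a lookup, {len(embedded)} usable immediately")
```

Every entry in the first list costs at least one extra fetch before it can be displayed or indexed, which is where the architectural impact starts.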

Why This Matters

The two different ways to reference record data have a big impact on accessibility, complexity, and ownership.

Take Smoke Signal, an event and RSVP management and discovery site, as an example. Because locations are embedded directly in the event, the event organizer "owns" the event data, location included. They're responsible for updating it, and the visibility of the location data is tied directly to the event itself. Systems that watch for relay or Jetstream commits of community.lexicon.calendar.event records can see the location data immediately and index it accordingly.
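Here's a rough sketch of what "see the location data immediately" can look like for an indexer. It assumes a Jetstream-style JSON message with "kind", "commit.collection", "commit.operation", and "commit.record" fields. That shape is my recollection of Jetstream's output, so check it against the Jetstream documentation before relying on it. The DID and record key in the sample are made up.

```python
import json

EVENT_COLLECTION = "community.lexicon.calendar.event"
ADDRESS_TYPE = "community.lexicon.location.address"


def embedded_addresses(message: str) -> list:
    """Return the embedded address locations found in one commit message.

    Assumes a Jetstream-style JSON envelope; returns [] for anything else.
    """
    event = json.loads(message)
    if event.get("kind") != "commit":
        return []
    commit = event.get("commit", {})
    if commit.get("collection") != EVENT_COLLECTION:
        return []
    if commit.get("operation") not in ("create", "update"):
        return []  # deletes carry no record body
    record = commit.get("record", {})
    return [
        loc for loc in record.get("locations", [])
        if loc.get("$type") == ADDRESS_TYPE
    ]


# Hypothetical message; the did and rkey values are placeholders.
sample = json.dumps({
    "did": "did:plc:example",
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": EVENT_COLLECTION,
        "rkey": "example-rkey",
        "record": {
            "name": "Weekly meetup",
            "locations": [
                {"$type": ADDRESS_TYPE, "name": "Orchardly Park", "locality": "Oakwood"}
            ],
        },
    },
})

for address in embedded_addresses(sample):
    print(address["name"], "-", address["locality"])
```

No second request is needed here: the address arrives in the same commit as the event, so it can be indexed in one pass.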

There are some benefits to doing this, and they revolve around the core idea that addresses are folksonomies. There is some data duplication, but I believe that it's an acceptable compromise. For example, a regular meetup at a venue may have slight changes between events that don't warrant creating entire location records. The street address, locality, and administrative area may be the same, but the meeting room or area may be different week by week.

Data Changes and Updates

The major arguments for using authoritative record references typically come down to:

  1. Data deduplication - Store once, reference many times
  2. Updates - Change once, update everywhere

For locations that are "permanent" and public, having a strong reference can make a lot of sense. The address of the local park won't change very often, and if it does, each version of the location record gets its own CID, so a reference can say exactly which version it means. On the same note, a reference that isn't pinned to a CID inherits content changes automatically, so events that reference the location may not need to be updated.

But here's where theory meets practice...

Technical Implementation of Record References

The strongRef Architecture

The com.atproto.repo.strongRef represents ATProtocol's solution for immutable, verifiable references. It combines two critical components:

{
  "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jvz2442yt32g",
  "cid": "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua"
}

Key Properties:

This dual-component design ensures both location (where to find it) and integrity (what you should find) are preserved. The CID makes the reference immutable - any content change produces a new CID, so an existing strongRef no longer matches the current version of the record.
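As a small illustration of the two components, the sketch below splits the "uri" into its parts and compares the pinned "cid" against the CID of whatever version was just fetched. The names AtUri, parse_at_uri, and pin_matches are my own, and the parser only handles full record URIs (authority, collection, record key). It does not recompute the CID from the record content, which would need a DAG-CBOR and multihash library.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # the repo's DID or handle
    collection: str  # the record's NSID
    rkey: str        # the record key


def parse_at_uri(uri: str) -> AtUri:
    """Parse a full record at:// URI into its three path components."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


def pin_matches(strong_ref: dict, current_cid: str) -> bool:
    """True when the record's current CID is the one the strongRef pinned."""
    return strong_ref["cid"] == current_cid


strong_ref = {
    "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jvz2442yt32g",
    "cid": "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua",
}

parsed = parse_at_uri(strong_ref["uri"])
print(parsed.authority, parsed.collection, parsed.rkey)
print(pin_matches(strong_ref, strong_ref["cid"]))
```

If pin_matches returns False, the record still exists but has been edited since the reference was created, and the application has to decide whether that matters.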

Weak References

Beyond strongRef, ATProtocol also supports what I call "weak" authoritative references. These are objects that contain just a uri field with a URI or AT-URI (at://) formatted value:

{
    "$type": "weakLocation",
    "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v"
}

The main use case here is when you're saying "there's a record that can be found at this location" but you don't care about the specific version or content integrity. This is useful when:

Think of it as the difference between linking to a specific Git commit (strongRef) versus linking to the HEAD of a branch (weak reference). Both have their place, depending on your needs.
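In lookup terms, the Git analogy might be sketched like this: both kinds of reference resolve through com.atproto.repo.getRecord, and the only difference is whether the optional cid parameter is sent. The function name get_record_url and the host pds.example.com are placeholders of mine. A real client would first resolve the DID to find the right PDS.

```python
from urllib.parse import urlencode


def get_record_url(pds_host: str, ref: dict) -> str:
    """Build a com.atproto.repo.getRecord URL for a weak or strong reference.

    A strongRef carries a "cid", so the request asks for that exact version.
    A weak reference has none, so the request returns the current version.
    """
    authority, collection, rkey = ref["uri"][len("at://"):].split("/")
    params = {"repo": authority, "collection": collection, "rkey": rkey}
    if "cid" in ref:
        params["cid"] = ref["cid"]
    return f"https://{pds_host}/xrpc/com.atproto.repo.getRecord?{urlencode(params)}"


weak_ref = {
    "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
}
strong_ref = {
    "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jvz2442yt32g",
    "cid": "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua",
}

print(get_record_url("pds.example.com", weak_ref))    # "HEAD of the branch"
print(get_record_url("pds.example.com", strong_ref))  # "specific commit"
```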

Embedded Record Pattern

The alternative pattern embeds data directly within the parent record:

{
    "locations": [
        {
            "$type": "community.lexicon.location.address",
            "country": "US",
            "locality": "Oakwood",
            "name": "Orchardly Park",
            "postalCode": "45419",
            "region": "Ohio",
            "street": "2599 Delaine Ave"
        }
    ]
}

This approach trades referential integrity for self-containment and simplicity.

Real-World Implementation Examples

Smoke Signal (Events & RSVPs)

Smoke Signal uses both patterns in different ways.

Events embed unauthoritative record data for locations. This creates a strong link between the owner of an event and the event details.

RSVPs use a strongRef to event records, capturing the exact version of the event at RSVP time. This allows users to prove they RSVP'd to a specific version of an event, even if details change later.

Important caveat: When events are deleted, RSVP references become orphaned. The solution was to discourage deletion in favor of state changes (canceled, postponed).
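A sketch of how a client might classify an RSVP against the current state of its event, to show why a state change is friendlier than a deletion. This is not Smoke Signal's actual code. The event argument mirrors the uri/cid/value shape returned by com.atproto.repo.getRecord, while the "status" field and its plain-string values are simplified placeholders I chose for illustration.

```python
from typing import Optional


def rsvp_state(subject: dict, event: Optional[dict]) -> str:
    """Classify an RSVP given its strongRef and the event as fetched now.

    subject: the RSVP's strongRef, {"uri": ..., "cid": ...}
    event:   {"uri": ..., "cid": ..., "value": {...}}, or None if deleted
    """
    if event is None:
        return "orphaned"  # the event is gone; nothing left to show
    status = event["value"].get("status", "scheduled")
    if status in ("canceled", "postponed"):
        return status  # the record survives, so the RSVP still has context
    if event["cid"] != subject["cid"]:
        return "event-changed"  # details were edited after the RSVP
    return "current"


subject = {"uri": "at://did:plc:example/community.lexicon.calendar.event/abc", "cid": "cid-v1"}

print(rsvp_state(subject, None))
print(rsvp_state(subject, {"uri": subject["uri"], "cid": "cid-v2", "value": {"status": "canceled"}}))
print(rsvp_state(subject, {"uri": subject["uri"], "cid": "cid-v2", "value": {}}))
print(rsvp_state(subject, {"uri": subject["uri"], "cid": "cid-v1", "value": {}}))
```

With a state change, the RSVP can still render "this event was canceled". With a deletion, all that remains is a URI that resolves to nothing.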

WhiteWind (Blogging Platform)

The first third-party AppView demonstrates:

Record Editing Implementation (verdverm.com)

A copy-on-write pattern for edit history:

Note that official recommendations from protocol engineers discourage the use of fields starting with the "$" character, so this approach diverges from standard practices.

Comparative Analysis with Other Protocols

ActivityPub: Origin-Based Decisions

ActivityPub's approach provides a valuable lesson:

IPFS/IPLD: Pure Content-Addressing

Everything is a cryptographically verifiable reference:

Matrix: Operational vs Relational

Matrix embeds operational data (messages, state) but references relational data (replies, threads):

Privacy Considerations

I think privacy is a non-starter here. From the information we have about where ATProtocol is going with private records, they likely won't have the same shape as current records and repositories. It's hard to speculate what they will look like, how they will be referenced, or what their access patterns will be.

For that reason, I'm removing privacy and security as a consideration for this analysis.

Best Practices and Recommendations

Decision Framework

Use strongRef when:

Use embedding when:

Implementation Guidelines

  1. Start simple: Begin with embedding, add references as needs emerge
  2. Design for degradation: Ensure core functionality works without references
  3. Cache strategically: Balance freshness with performance
  4. Version explicitly: Use CIDs for immutable references when needed
  5. Monitor orphans: Implement cleanup strategies for broken references
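Guidelines 2 and 3 can be sketched together as a small resolver: embedded entries pass straight through, referenced entries are fetched once and cached, and a failed lookup degrades to a placeholder instead of breaking the page. The LocationResolver class, the fetch callback, and the "unresolved" marker are all hypothetical names of mine, and the cache never expires, which a real implementation would need to address.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[dict]]


class LocationResolver:
    """Resolve location entries, caching lookups and degrading on failure."""

    def __init__(self, fetch: Fetcher):
        self._fetch = fetch
        self._cache = {}

    def resolve(self, entry: dict) -> dict:
        uri = entry.get("uri")
        if uri is None:
            return entry  # embedded data: already usable, no lookup
        # Key on uri + cid so a pinned version and the latest don't collide.
        key = (uri, entry.get("cid"))
        if key in self._cache:
            return self._cache[key]
        try:
            record = self._fetch(uri)
        except Exception:
            record = None  # treat network errors like a missing record
        if record is None:
            return {"unresolved": True, "uri": uri}  # degrade, don't cache
        self._cache[key] = record
        return record


def fake_fetch(uri: str) -> Optional[dict]:
    """Stand-in for a network lookup; only one URI 'exists'."""
    if uri == "at://did:plc:example/community.lexicon.location.address/park":
        return {"name": "Orchardly Park"}
    return None


resolver = LocationResolver(fake_fetch)
print(resolver.resolve({"$type": "community.lexicon.location.address", "name": "Room B"}))
print(resolver.resolve({"uri": "at://did:plc:example/community.lexicon.location.address/park"}))
print(resolver.resolve({"uri": "at://did:plc:example/community.lexicon.location.address/gone"}))
```

Failures are deliberately not cached here, so a record that was briefly unreachable gets retried on the next request. Whether that is the right trade-off depends on how expensive lookups are for your service.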

Community Patterns Emerging

The ATProtocol community is converging on:

Conclusion

The choice between authoritative and unauthoritative records in ATProtocol isn't binary - it's contextual. The most successful implementations use hybrid approaches that balance the trade-offs based on specific use cases.

For calendar events with locations, embedding essential location data while maintaining optional references to canonical venues provides the best balance of accessibility, performance, and maintainability.

My experience suggests starting with embedded data for simplicity, then selectively adding references where their benefits (deduplication, authority, updates) outweigh their costs (complexity, network dependency). As the ecosystem matures, expect more standardized patterns and tooling to emerge around these fundamental architectural choices.
