ATProtocol Record References: Authoritative vs Unauthoritative Patterns
Published by @smokesignal.events on 2025-07-31 15:40 UTC.
If you've been building on ATProtocol, you've probably wrestled with this question: should I reference another record using a strongRef, or should I just embed the data directly? It's one of those architectural decisions that seems simple at first but has surprisingly deep implications.
I've been calling these two approaches "authoritative" and "unauthoritative" records, though I'll admit I'm not entirely sold on the terminology. The ATProtocol community also uses terms like "embedded records," "inline records," or "denormalized records" for what I'm calling unauthoritative records. I also considered "anonymous records," but that felt too overloaded and could be confused with data that is actually anonymous.
Here's the basic distinction:
Authoritative record references maintain a reference to the authority and key. Think of a com.atproto.repo.strongRef object or an at:// URI. When you encounter one of these in a record, you need to do an additional lookup to actually use the data.
Unauthoritative records embed the content directly. No references, no lookups - the data is right there in the record.
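To make that "additional lookup" concrete, here's a rough Python sketch that splits a record-level at:// URI into its parts and builds the matching com.atproto.repo.getRecord request URL. The function names and the pds.example.com host are placeholders of mine, and the parser only handles the plain at://authority/collection/rkey form; in a real client you'd first resolve the DID to find the right host.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlencode


class AtUri(NamedTuple):
    authority: str   # DID (or handle) that owns the repository
    collection: str  # NSID of the record type
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split a record-level at:// URI into authority, collection, and rkey."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


def get_record_url(pds_host: str, uri: str, cid: Optional[str] = None) -> str:
    """Build a com.atproto.repo.getRecord URL for the referenced record.

    pds_host is assumed to be already known; this sketch does no DID resolution.
    """
    ref = parse_at_uri(uri)
    params = {"repo": ref.authority, "collection": ref.collection, "rkey": ref.rkey}
    if cid is not None:
        params["cid"] = cid  # ask for one specific version of the record
    return f"https://{pds_host}/xrpc/com.atproto.repo.getRecord?{urlencode(params)}"
```

An embedded record needs none of this, which is the whole appeal of the unauthoritative approach.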
Let me show you what I mean with a concrete example from the Lexicon Community calendar event system I've been working on.
Example: Calendar Events with Locations
Here's what an authoritative record reference looks like in practice. A Lexicon Community calendar event with an authoritative reference to a location:
{
  "locations": [
    {
      "$type": "com.atproto.repo.strongRef",
      "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v",
      "cid": "13609594aa.."
    }
  ]
}
And here's the same event using an unauthoritative (embedded) approach:
{
  "locations": [
    {
      "$type": "community.lexicon.location.address",
      "country": "US",
      "locality": "Oakwood",
      "name": "Orchardly Park",
      "postalCode": "45419",
      "region": "Ohio",
      "street": "2599 Delaine Ave"
    }
  ]
}
At first glance, the embedded version looks simpler, right? And in many ways it is. But this choice can have substantial impact on your system architecture.
Why This Matters
The two ways of referencing record data have a big impact on accessibility, complexity, and ownership.
Take Smoke Signal, an event and RSVP management and discovery site, as an example. Because locations are embedded directly in the event, the event organizer "owns" all of the event data, location included. They're responsible for updating it, and the visibility of the event data is tied directly to the event itself. Systems that watch relay or Jetstream commits of community.lexicon.calendar.event records can see the location data immediately and index it accordingly.
There are some benefits to doing this, and they revolve around the core idea that addresses are folksonomies. There is some data duplication, but I believe that it's an acceptable compromise. For example, a regular meetup at a venue may have slight changes between events that don't warrant creating entire location records. The street address, locality, and administrative area may be the same, but the meeting room or area may be different week by week.
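Going back to the indexing point, here's a small sketch of what "see the location data immediately" can look like for a consumer. It assumes the Jetstream JSON message shape (a top-level kind of "commit" with a nested commit object carrying operation, collection, and record); the function name and constants are mine, and this is not how Smoke Signal itself does it.

```python
from typing import List

EVENT_COLLECTION = "community.lexicon.calendar.event"
ADDRESS_TYPE = "community.lexicon.location.address"


def embedded_addresses(message: dict) -> List[dict]:
    """Return the address objects embedded in an event commit, if any."""
    if message.get("kind") != "commit":
        return []  # identity and account messages carry no records
    commit = message.get("commit", {})
    if commit.get("collection") != EVENT_COLLECTION:
        return []
    if commit.get("operation") not in ("create", "update"):
        return []  # delete operations have no record body
    record = commit.get("record", {})
    # Keep only embedded addresses; strongRef entries would need a lookup.
    return [
        loc for loc in record.get("locations", [])
        if isinstance(loc, dict) and loc.get("$type") == ADDRESS_TYPE
    ]
```

No second request is needed: everything the indexer wants is already in the commit it just received.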
Data Changes and Updates
The major arguments for using authoritative record references typically come down to:
- Data deduplication - Store once, reference many times
- Updates - Change once, updates everywhere
For locations that are "permanent" and public, having a strong reference can make a lot of sense. The address of the local park won't change very often, and if it were to, then the CID can point to different versions or forward changes. On the same note, if the location were to have a content change, then it would be inherited and events that reference it may not need to be updated.
But here's where theory meets practice...
Technical Implementation of Record References
The strongRef Architecture
The com.atproto.repo.strongRef represents ATProtocol's solution for immutable, verifiable references. It combines two critical components:
{
  "uri": "at://did:plc:44ybard66vv44zksje25o7dz/app.bsky.feed.post/3jvz2442yt32g",
  "cid": "bafyreigbtj4x7ip5legnfznufuopl4sg4knzc2cof6duas4b3q2fy6swua"
}
Key Properties:
- URI: Provides addressability within the repository structure
- CID: Content Identifier that cryptographically verifies the record's content using a SHA-256 hash
This dual-component design ensures both location (where to find it) and integrity (what you should find) are preserved. The CID makes the reference immutable - any content change produces a new CID, effectively breaking old references.
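To show what the CID half actually carries, here's a standard-library-only sketch that unpacks a base32 CIDv1 string into its version, codec, hash function, and digest, then checks a block of bytes against it. A few assumptions to flag: every header field is treated as a single byte (true for the dag-cbor / sha2-256 CIDs shown here, not for CIDs in general), the bytes you pass in must already be the DAG-CBOR encoding of the record (this sketch does no CBOR encoding), and all the function names are mine.

```python
import base64
import hashlib
from typing import NamedTuple


class Cid(NamedTuple):
    version: int
    codec: int      # 0x71 is dag-cbor
    hash_code: int  # 0x12 is sha2-256
    digest: bytes


def decode_cid(cid: str) -> Cid:
    """Decode a base32 CIDv1 string, assuming single-byte header fields."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (lowercase base32)")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    raw = base64.b32decode(body)
    version, codec, hash_code, length = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:]
    if len(digest) != length:
        raise ValueError("digest length does not match the multihash header")
    return Cid(version, codec, hash_code, digest)


def block_matches_cid(block: bytes, cid: str) -> bool:
    """True if sha2-256 of the already-encoded block equals the CID's digest."""
    parsed = decode_cid(cid)
    if parsed.hash_code != 0x12:
        raise ValueError("only sha2-256 is handled in this sketch")
    return hashlib.sha256(block).digest() == parsed.digest


def encode_cid(block: bytes) -> str:
    """Helper for experiments: build a CIDv1 (dag-cbor, sha2-256) string."""
    raw = bytes([0x01, 0x71, 0x12, 0x20]) + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
```

Changing a single byte of the block changes the digest, which is why an edited record can never satisfy an old strongRef.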
Weak References
Beyond strongRef, ATProtocol also supports what I call "weak" authoritative references. These are objects that contain just a uri field with a URI or AT-URI (at://) formatted value:
{
  "$type": "weakLocation",
  "uri": "at://did:plc:rkh7hrrprgtarfraofpbh4vt/community.lexicon.location.address/01k1gmwnhn2s7c0y55p5jwbe2v"
}
The main use case here is when you're saying "there's a record that can be found at this location" but you don't care about the specific version or content integrity. This is useful when:
- You want to reference the "current" version of something, whatever that may be
- The referenced content updates frequently and you always want the latest
- You're building looser coupling between records
- You don't need cryptographic proof that the content hasn't changed
Think of it as the difference between linking to a specific Git commit (strongRef) versus linking to the HEAD of a branch (weak reference). Both have their place, depending on your needs.
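The commit-versus-HEAD distinction can be sketched in a few lines. This is an illustration under my own assumptions: anything with an at:// uri counts as a reference, a cid alongside it makes the reference strong, and everything else is treated as embedded data. The function names are hypothetical.

```python
from typing import Optional


def classify(entry: dict) -> str:
    """Label an entry as 'strong', 'weak', or 'embedded'."""
    uri = entry.get("uri")
    if isinstance(uri, str) and uri.startswith("at://"):
        # A CID alongside the URI pins one exact version of the record.
        return "strong" if isinstance(entry.get("cid"), str) else "weak"
    return "embedded"


def accepts(entry: dict, current_cid: Optional[str]) -> bool:
    """Decide whether the record now found at the URI satisfies the reference.

    current_cid is the CID of whatever lives at the URI today, or None if
    the lookup found nothing.
    """
    kind = classify(entry)
    if kind == "embedded":
        return True  # nothing to resolve
    if current_cid is None:
        return False  # the target is gone, so neither kind can be satisfied
    if kind == "strong":
        return entry["cid"] == current_cid  # like a pinned Git commit
    return True  # weak: whatever is there now is what was meant, like HEAD
```

After the target record is edited, a weak reference still resolves, while a strongRef reports a mismatch that the application then has to decide how to handle.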
Embedded Record Pattern
The alternative pattern embeds data directly within the parent record:
{
  "locations": [
    {
      "$type": "community.lexicon.location.address",
      "country": "US",
      "locality": "Oakwood",
      "name": "Orchardly Park",
      "postalCode": "45419",
      "region": "Ohio",
      "street": "2599 Delaine Ave"
    }
  ]
}
This approach trades referential integrity for self-containment and simplicity.
Real-World Implementation Examples
Smoke Signal (Events & RSVPs)
Smoke Signal uses both patterns in different ways.
Events embed unauthoritative record data for locations. This creates a strong link between the owner of an event and the event details.
RSVPs use a strongRef to event records, capturing the exact version of the event at RSVP time. This allows users to prove they RSVP'd to a specific version of an event, even if details change later.
Important caveat: When events are deleted, RSVP references become orphaned. The solution was to discourage deletion in favor of state changes (canceled, postponed).
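Here's a hedged sketch of how a consumer might tell these cases apart. It assumes the RSVP's strongRef is available as a dict with uri and cid, and that you keep some index from event URI to the CID you last saw for it. The function name, the state labels, and the index shape are all hypothetical, not Smoke Signal's actual implementation.

```python
from typing import Dict


def rsvp_subject_state(subject: dict, current_cids: Dict[str, str]) -> str:
    """Compare an RSVP's strongRef against the event records known right now.

    current_cids maps event URIs to the CID most recently seen for them.
    """
    current = current_cids.get(subject["uri"])
    if current is None:
        return "orphaned"  # the event record was deleted or never indexed
    if current != subject["cid"]:
        return "stale"     # the event was edited after this RSVP was made
    return "current"       # the RSVP points at the version that exists today
```

With events that are canceled or postponed rather than deleted, the "orphaned" branch should rarely fire, and "stale" becomes the case worth surfacing to attendees.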
WhiteWind (Blogging Platform)
The first third-party AppView demonstrates:
- Custom lexicons (com.whtwnd.blog.entry) that extend ATProtocol
- Standard reference patterns for cross-blog citations
- Challenges with limited documentation on record structures
Record Editing Implementation (verdverm.com)
A copy-on-write pattern for edit history:
- $orig field stores strongRef to original record
- $hist array maintains chronological edit history
- Maintains stable URI while CID changes with edits
- Trade-off: Creates potential social attack vectors as likes/comments remain attached
Note that official recommendations from protocol engineers discourage the use of fields starting with the "$" character, so this approach diverges from standard practices.
Comparative Analysis with Other Protocols
ActivityPub: Origin-Based Decisions
ActivityPub's approach provides a valuable lesson:
- Same-server content: Embedded for performance
- Cross-server content: Referenced to maintain authority
- Decision based on trust boundaries rather than content type
IPFS/IPLD: Pure Content-Addressing
Everything is a cryptographically verifiable reference:
- Minimal metadata embedding (size, MIME type)
- Enables cross-protocol interoperability
- Trade-off: Performance depends on content availability
Matrix: Operational vs Relational
Matrix embeds operational data (messages, state) but references relational data (replies, threads):
- Strong eventual consistency through state resolution
- Good performance with minimal fetches
- Clear separation of concerns
Privacy Considerations
I think privacy is a non-starter here. From the information we have about where ATProtocol is going with private records, they likely won't have the same shape as current records and repositories. It's hard to speculate what they will look like, how they will be referenced, or what their access patterns will be.
For that reason, I'm removing privacy and security as a consideration for this analysis.
Best Practices and Recommendations
Decision Framework
Use strongRef when:
- Content has clear authority/ownership
- Updates should propagate automatically (in practice this calls for a URI-only weak reference, since a strongRef pins one version)
- Referential integrity is critical
- Storage efficiency matters at scale
- Version-specific references are needed
- You want to give identities a way to reference a specific version, protecting them from having the rug pulled out from under them if the referenced content changes dramatically
Use embedding when:
- Offline access is required
- Performance is critical
- Content is small and unlikely to change
- Simplified client implementation is a priority
- Avoiding orphaned references is important
Implementation Guidelines
- Start simple: Begin with embedding, add references as needs emerge
- Design for degradation: Ensure core functionality works without references
- Cache strategically: Balance freshness with performance
- Version explicitly: Use CIDs for immutable references when needed
- Monitor orphans: Implement cleanup strategies for broken references
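As one way to picture "design for degradation," here's a sketch of a hybrid location entry that carries an embedded copy plus an optional reference. The wrapper shape (the embedded and ref keys) is a hypothetical of mine and not part of any published lexicon, and fetch stands in for whatever lookup your application uses.

```python
from typing import Callable, Optional

# fetch takes an at:// URI and returns the record, or None if it is missing.
Fetch = Callable[[str], Optional[dict]]


def resolve_location(entry: dict, fetch: Fetch) -> dict:
    """Prefer the referenced record, fall back to the embedded copy."""
    ref = entry.get("ref")
    if ref is not None:
        try:
            record = fetch(ref["uri"])
        except OSError:
            record = None  # network trouble: degrade to the embedded data
        if record is not None:
            return record
    return entry["embedded"]  # always present, so the event stays usable
```

The core feature (showing where the event is) works with no network at all, and the reference only ever improves on it.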
Community Patterns Emerging
The ATProtocol community is converging on:
- Hybrid approaches for complex data types
- Commons infrastructure for shared resources (like location data)
- Progressive enhancement from embedded to referenced
- Community curation for deduplication
- Explicit version management via CIDs
Conclusion
The choice between authoritative and unauthoritative records in ATProtocol isn't binary - it's contextual. The most successful implementations use hybrid approaches that balance the trade-offs based on specific use cases.
For calendar events with locations, embedding essential location data while maintaining optional references to canonical venues provides the best balance of accessibility, performance, and maintainability.
My experience suggests starting with embedded data for simplicity, then selectively adding references where their benefits (deduplication, authority, updates) outweigh their costs (complexity, network dependency). As the ecosystem matures, expect more standardized patterns and tooling to emerge around these fundamental architectural choices.