Smoke Signal Blog

Off-Protocol Data in ATProtocol Records

Published by @smokesignal.events on 2025-08-01 15:00 UTC.

The ATProtocol ecosystem presents an interesting architectural challenge: how do we handle data that doesn't naturally fit into highly structured schemas, and how do we leverage the massive external datasets that already exist? This post explores a potential solution using off-protocol data references, with location data as our primary example.

As developers building on ATProtocol, we often encounter data that exists in a gray area between rigid structure and complete flexibility. Location data perfectly exemplifies this challenge — it's simultaneously universal (everyone understands "where") and incredibly variable (from GPS coordinates to "the coffee shop on Main Street").

The Challenge: Locations as Folksonomies

Let's start with a real-world scenario. Smoke Signal, an event management and discovery application, needs to handle location data for events. These locations could be anything from GPS coordinates to "the coffee shop on Main Street."

Fundamentally, locations as communicated between humans are folksonomies — classification systems created through collaborative tagging and natural language. Their structure and completeness vary wildly based on context, culture, and purpose.

ATProtocol's Structured World

ATProtocol differentiates itself through its use of Lexicons: strongly typed schemas that define data structures. This creates predictability and interoperability but also rigidity. Here's how an address location currently looks in the Lexicon Community:

{
    "lexicon": 1,
    "id": "community.lexicon.location.address",
    "defs": {
        "main": {
            "type": "object",
            "description": "A physical location in the form of a street address.",
            "required": [
                "country"
            ],
            "properties": {
                "country": {
                    "type": "string",
                    "description": "The ISO 3166 country code. Preferably the 2-letter code.",
                    "minLength": 2,
                    "maxLength": 10
                },
                "postalCode": {
                    "type": "string",
                    "description": "The postal code of the location."
                },
                "region": {
                    "type": "string",
                    "description": "The administrative region of the country. For example, a state in the USA."
                },
                "locality": {
                    "type": "string",
                    "description": "The locality of the region. For example, a city in the USA."
                },
                "street": {
                    "type": "string",
                    "description": "The street address."
                },
                "name": {
                    "type": "string",
                    "description": "The name of the location."
                }
            }
        }
    }
}
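To make the "strongly typed" point concrete, here is a minimal Python sketch of what the lexicon above asks of a client. The `Address` class is a hypothetical illustration, not part of any ATProtocol SDK, and it assumes that Lexicon string lengths are counted in UTF-8 bytes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    """Hypothetical mirror of community.lexicon.location.address."""

    country: str  # the only required property in the lexicon
    postalCode: Optional[str] = None
    region: Optional[str] = None
    locality: Optional[str] = None
    street: Optional[str] = None
    name: Optional[str] = None

    def __post_init__(self) -> None:
        # The lexicon bounds `country` with minLength 2 and maxLength 10.
        # Assumption: those limits are measured in UTF-8 bytes.
        if not isinstance(self.country, str):
            raise ValueError("country must be a string")
        size = len(self.country.encode("utf-8"))
        if not 2 <= size <= 10:
            raise ValueError("country must be 2 to 10 bytes long")


# Illustrative values only; "City Park" is borrowed from the schema docs.
park = Address(country="US", name="City Park")
```

Everything except `country` is optional, which is what lets the same type describe both a full street address and a bare country.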

This structure covers many use cases, and it is flexible enough to handle most traditional addresses. But locations vary wildly in how they're expressed, so it doesn't have to be the only way we reference street addresses and landmarks. The real world is messier than any single schema can capture.

The Off-Protocol Solution

What if we could reference structured data that lives outside the ATProtocol ecosystem while maintaining type safety and discoverability? This approach would allow us to:

  1. Leverage existing massive datasets (OpenStreetMap, Overture, Foursquare)
  2. Support multiple formats without lexicon explosion
  3. Avoid duplicating millions of location records on-protocol

Example: JSON Structure Schema

Using JSON Structure (json-structure.org), we can define location schemas that live off-protocol but maintain strong typing:

{
  "$schema": "https://json-structure.org/meta/extended/v0/#",
  "$id": "https://lexicon.community/schema/location/address/v0/#",
  "type": "object",
  "name": "Address",
  "description": "A physical location in the form of a street address.",
  "properties": {
    "name": {
      "type": "string",
      "maxLength": 512,
      "descriptions": {
        "lang:en": "The name of the location. For example, City Park."
      }
    },
    "country": {
      "type": "string",
      "maxLength": 8,
      "descriptions": {
        "lang:en": "The ISO 3166 country code."
      }
    }
    // ... other properties
  },
  "required": ["country"]
}
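As a rough illustration of what "strong typing" buys a client, the sketch below checks a document against only the keywords that appear in the schema above (`type`, `maxLength`, `required`). It is not a JSON Structure implementation, and the `validate` helper and `ADDRESS_SCHEMA` constant are hypothetical names; the elided properties are left out.

```python
from typing import Any

# The subset of the Address schema shown above (elided properties omitted).
ADDRESS_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 512},
        "country": {"type": "string", "maxLength": 8},
    },
    "required": ["country"],
}


def validate(document: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    if not isinstance(document, dict):
        return ["document is not an object"]
    problems = []
    # Every property named in `required` must be present.
    for field in schema.get("required", []):
        if field not in document:
            problems.append(f"missing required property: {field}")
    # Properties that are present must match their declared rules.
    for field, rules in schema.get("properties", {}).items():
        if field not in document:
            continue
        value = document[field]
        if rules.get("type") == "string":
            if not isinstance(value, str):
                problems.append(f"{field} is not a string")
            elif "maxLength" in rules and len(value) > rules["maxLength"]:
                problems.append(f"{field} exceeds maxLength {rules['maxLength']}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an app decide how strict to be with external data.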

Referencing Off-Protocol Data

In an ATProtocol record, we could reference this external data:

{
    "name": "Walk in the park",
    "locations": [
        {
            "$type": "org.json-structure.record",
            "schema": "https://lexicon.community/schema/location/address/v0/#",
            "record": "https://dayton-pokemon.club/locations/orchardly-park"
        }
    ]
}
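On the consuming side, an app first has to find these references. The sketch below assumes the record shape shown above and uses a hypothetical helper, `off_protocol_refs`, to pull out the schema and record URLs while skipping any other location types.

```python
from typing import Any

OFF_PROTOCOL_TYPE = "org.json-structure.record"


def off_protocol_refs(record: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (schema URL, record URL) pairs from a record's locations."""
    refs = []
    for location in record.get("locations", []):
        if not isinstance(location, dict):
            continue
        if location.get("$type") != OFF_PROTOCOL_TYPE:
            continue  # another location type; leave it to its own handler
        schema, target = location.get("schema"), location.get("record")
        # Ignore malformed references instead of failing the whole record.
        if isinstance(schema, str) and isinstance(target, str):
            refs.append((schema, target))
    return refs


event = {
    "name": "Walk in the park",
    "locations": [
        {
            "$type": "org.json-structure.record",
            "schema": "https://lexicon.community/schema/location/address/v0/#",
            "record": "https://dayton-pokemon.club/locations/orchardly-park",
        }
    ],
}
refs = off_protocol_refs(event)
```

Because unknown `$type` values are skipped rather than rejected, an app that doesn't understand off-protocol references can still read the rest of the record.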

Implementation Considerations

Making this work isn't rocket science—it's mostly patterns we already know from building web apps. Clients need to fetch and cache schemas, validate external data, and handle failures gracefully. The interesting part comes with trust: who controls the external data, and what happens when it changes or disappears? These aren't deal-breakers, just questions to think through. Start conservative and expand trust boundaries as you learn what works.
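The fetch, cache, and fail-gracefully loop might look something like the sketch below. `CachingResolver`, its one-hour default `max_age`, and the five-second timeout are all assumptions made for illustration; the fetcher is injectable so the same class can resolve schemas or records and can be tested without a network.

```python
import json
import time
from typing import Any, Callable, Optional
from urllib.request import urlopen


def http_get_json(url: str) -> Any:
    """Default fetcher: GET the URL and parse the body as JSON."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


class CachingResolver:
    """Hypothetical resolver for off-protocol schemas and records."""

    def __init__(
        self,
        fetch: Callable[[str], Any] = http_get_json,
        max_age: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._cache: dict[str, tuple[float, Any]] = {}

    def resolve(self, url: str) -> Optional[Any]:
        now = self._clock()
        cached = self._cache.get(url)
        # Serve a fresh cached copy without touching the network.
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        try:
            document = self._fetch(url)
        except (OSError, ValueError):
            # Network errors and malformed JSON: fall back to the stale
            # copy if there is one, otherwise report "unavailable".
            return cached[1] if cached is not None else None
        self._cache[url] = (now, document)
        return document
```

A caller that gets `None` back can fall back to a simpler on-protocol location. The trust questions (which hosts to fetch from, how long to keep stale data) are deliberately left out of this sketch.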

Benefits of This Approach

The big win here is avoiding the duplication of millions of existing records. Why recreate massive geographic databases, business directories, or government address systems when we can reference what's already there?

This approach also handles the reality that locations come in many flavors—traditional addresses, what3words (like "filled.count.soap"), Plus Codes, coordinates, or even "the bench near the fountain." Each format can coexist peacefully, and apps can support whatever makes sense for their users without forcing complexity on everyone else.

Best of all, it enables progressive enhancement. Start simple with basic on-protocol locations, then add fancier formats as you grow. When external services go down, your app keeps working with cached data or simpler fallbacks. It's about building systems that improve over time while staying resilient when things aren't perfect.

Building on Record Reference Patterns

So far we've been using what I'm calling the "authoritative record reference" pattern, as described in our previous post on ATProtocol record references. We're pointing to external content that has a defined schema and structure, and the external source is authoritative for that data.

What we haven't covered here is embedding unstructured, externally sourced content. ATProtocol's Lexicon system includes "unknown" and "object" types that could handle more flexible data structures. Imagine referencing a blog post, a JSON-LD document, or other semi-structured content that doesn't fit neatly into predefined schemas. That's a whole other exploration waiting to happen: how do we balance the benefits of structure with the reality of messy, evolving data formats?

Conclusion

Off-protocol data references offer a pragmatic solution to ATProtocol's structure-flexibility tension. By embracing the web as a decentralized database, we can build richer applications without sacrificing the protocol's core principles.

This approach acknowledges a fundamental truth: not all data belongs on-protocol. Some data is too large, too varied, or too specialized. By creating well-defined bridges to external data, we expand what's possible while maintaining the benefits of a structured protocol.

The location data example demonstrates both the challenges and opportunities. As the ATProtocol ecosystem grows, patterns like off-protocol references will become increasingly important for building practical, scalable applications.


Have thoughts on this approach? Find me on Bluesky (@ngerakines) or join the discussion in the Smoke Signal Discourse.
