Last update: March 28, 2023.

InfoCentral is in transition…

Breakthrough LLM AI technologies have changed the game

The InfoCentral project has historically focused more heavily on the symbolic (human-meaningful) side of AI. This included derivatives of Semantic Web and related knowledge representation research, which depended on human effort to curate and categorize structured data. The symbolic approach to AI has largely lost the race to the neural/neuro-symbolic (opaque vectors, statistical) side of AI research, most notably the success of the Large Language Model (LLM) paradigm. Thanks to LLMs’ facility with natural language, remarkably little human involvement is needed to infer structure and meaning from traditional (mostly unstructured) information. As such, some of the approaches previously explored in the InfoCentral project have lost relevance. That said, LLM-based tools can perhaps help us finish the symbolic side of the grand AI project, by doing the tedious work that humans were reluctant to do! It has long been theorized that a combination of approaches will ultimately provide the most robust and performant systems, able to work with messy, transient, unstructured information while also building networks of verifiable, persistent, structured information that both humans and machines can cross-verify. This may prove useful both in ensuring that future AI systems remain aligned with human goals and values and in ensuring that inevitable abuses and failures can be easily recovered from. To that end, some research results of this project may be reusable in a new context.

Most of the information on this website should be considered historical at this time. A summary of prior research results will be provided soon, along with a condensed version of the proposed “Global Information Graph” (GIG) content-addressed data model that may be useful moving forward.

Historical project content:

An Architectural Approach to Decentralization

InfoCentral is an information-centered architecture for better software and a decentralizable internet. It is foremost concerned with data portability, semantics, and interoperability. Decentralized information itself is the platform – a neutral foundation that software and networks can evolve around. This avoids the adoption pitfalls of systems-first approaches, such as particular network designs or programming platforms, which often require a high investment from early adopters, with a risk of total loss if a design does not succeed. Instead, supporting systems will evolve through open collaboration and market forces to meet the needs of decentralized information.

To learn more about this approach, read our introductory article about decentralized information: Decentralized Information and the Future of Software (draft)

InfoCentral is also connected with the Future of Text project. See Chris’ submission to the Future of Text 2020 book.

Our latest technical design proposal can be found here: Initial Design Proposal (huge update coming soon!)

The first round of core specifications will soon be posted to GitHub, followed by prototype repository implementations in Scala and Rust.

Slides from the lightning talk given at Decentralized Web Summit 2018: Lightning Talk Slides

Slides about relation of InfoCentral to the Semantic Web effort: InfoCentral and the Semantic Web

A Unifier of Decentralized Internet Technologies

Many decentralized internet projects have produced valuable ideas and inspiration. Unfortunately, their contributions are often difficult to combine. Consensus on shared foundations is needed to integrate the best ideas into a unified technology ecosystem. There is no one-size-fits-all solution; a wider architecture is therefore needed that allows various efforts to specialize on quality-of-service properties.

InfoCentral’s minimalist, graph-oriented Persistent Data Model provides an ideal foundation to promote collaboration and cross-pollination among decentralized internet technology projects. Our design proposal describes this model in detail, along with other unifying abstractions built upon it. A simplified explanation is that the Persistent Data Model is a refactoring of the Semantic Web that better separates concerns and dramatically reduces the learning curve. In the process, it eliminates dependencies on legacy Web architecture.

A New Hypermedia for the Information-Centric Internet

InfoCentral provides for the information-centric internet what HTML, XML, DNS, and URIs provided for the classic host-centric internet. The Persistent Data Model is an extensible, cryptography-minded standard for containing, linking, and layering all types of data, with no dependence on particular infrastructure, whether centralized, decentralized, or somewhere in between. It is not a blockchain or Merkle DAG, but it can support these and other higher-order data models. We propose the Persistent Data Model as the “thin neck” of future internet systems.
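To make this concrete, here is a minimal sketch in Scala of what such a data entity might look like: an immutable payload plus outgoing hash references, with identity derived by hashing a canonical encoding. The names (Entity, HashRef) and the toy encoding are illustrative assumptions, not the InfoCentral specification.

```scala
import java.security.MessageDigest

// Illustrative types only; not the InfoCentral wire format.
final case class HashRef(algorithm: String, digest: Vector[Byte])

final case class Entity(payload: Array[Byte], references: Vector[HashRef]) {
  // Toy canonical encoding: each reference's algorithm name and digest,
  // followed by the raw payload.
  def canonicalBytes: Array[Byte] =
    references.toArray.flatMap(r =>
      r.algorithm.getBytes("UTF-8") ++ r.digest.toArray) ++ payload

  // The entity's identity is the hash of its canonical encoding, so the
  // ID is mathematically bound to the content it names.
  def id: HashRef =
    HashRef("sha2-256",
      MessageDigest.getInstance("SHA-256").digest(canonicalBytes).toVector)
}
```

Identical content yields an identical ID, which is what makes precise third-party references stable.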

Alongside the Persistent Data Model, InfoCentral proposes neutral standards for network repositories and for the metadata collections used to track and propagate knowledge of relationships among graph data. Taking lessons learned from HTTP, these standards ensure universal baseline compatibility while allowing future evolution.

By mandating independence from centralized components and hierarchical structure, InfoCentral’s hypermedia design and supporting software architecture ensure that all information is fluid and recomposable by users and software agents. This opens up fundamentally new modes of interaction, collaboration, and integration.

An Archival-Oriented Data Architecture

The internet has brought about a global explosion of creativity and knowledge production. We now risk losing our own history amidst the maddening pace of technological progress, as systems and data are constantly transformed and migrated. Proprietary systems and mutable named data are among the highest risk factors. The InfoCentral Persistent Data Model is an ideal foundation for archiving human digital history. Under this model, data is in archival format by default. It does not need to be sampled from transient sources of mutable documents and databases, as on the web. Nevertheless, until we fully renovate the internet, the Persistent Data Model is a good tool for doing just that. Once information has been sampled, it becomes a decentralized immutable record that can be further annotated and layered upon. This is also an avenue for driving conversion from centralized systems. Each immutable sampled data entity is a nexus for third-party interaction, not merely a static library facsimile. For example, a web browser plugin could add sidebar interactivity to every page, based on the latest sampled content. Once its popularity overtakes the original web context, the switch to native decentralized publishing is trivial for the content creators. In this manner, a decentralized internet can rise up alongside and eventually supplant the legacy internet.

A Post-Application Software Architecture

We need more than mere decentralized versions of existing web and cloud services. The software architecture native to decentralized graph data is far more powerful and exciting than its supporting networks. The future is app-free computing – fully integrated, composable, and adaptive software functionality that comes alongside neutral information rather than creating artificial boundaries.

InfoCentral radically departs from the approaches of other decentralized internet projects, most of which are still based around application-oriented software architecture. We envision user environments that are fully dynamic and integrated rather than focused on pre-designed interactions and modalities wrapped up as static, self-contained applications. While this transition will not happen overnight, we should begin laying the foundations today.

Renovating software architecture is ultimately about getting the abstractions right. Application-oriented software architecture is bound by assumptions that need not apply to decentralized systems. For example, cryptographic measures can replace database access control. This liberates users from centralized sources of data that must be protected and abstracted by centralized business logic. Without these restrictions, users can freely extend data and software in ways not anticipated by the original designers. Each user can layer and weave customizations useful to their needs without risk of compromising shared data. Anyone can publish graph data while trust networks guide end-user filtration and replication.

Post-application software architecture makes heavy use of declarative programming paradigms. This promotes runtime interpretation and composition, leaving room for much more fluid interactivity and customization than static applications.

An Ideal Substrate for AI Development

Human-centric software technologies are a hindrance to AI because they are littered with implicit knowledge and manual processes. Authoritative naming is the worst offender because it is an anchor to manual data management (i.e., meaningful names must be arbitrarily chosen and enforced by a system governed by human rules). Likewise, software applications, with their human-centric UI paradigms and interaction modalities, are clumsy barriers to AI agents.

By re-centering computing around independent, graph-structured, semantically-rich data, the InfoCentral architecture paves the way for future AI development.

A Fresh Vision for Social Computing

We believe in re-humanizing technology, to ensure that it helps real people and is more widely accessible, understandable, and personalizable. Our vision of social computing promotes:

A Foundation for Learnable Programming Environments

We admire thought leaders like Bret Victor, Chris Granger, and Paul Chiusano, who have all recognized that the way we program today (and even just use computers in general) doesn’t make sense. Current methods aren’t natural for humans, widen digital divides by making technology too difficult, and at the same time hinder deep innovation in machine learning and intelligence. As Granger notes, programming needs to be direct, observable, and free of incidental complexity – not an arcane exercise in weird syntaxes, wrangling of black boxes, and removal from the problems at hand by endless layers of abstraction. For programming to become learnable and thereby accessible to all, it must be possible to see all state and control flow in operation, to access all vocabulary in context, to modify behavior in place with immediate results, and to decompose, recompose, and abstract at will.

Existing projects in the area of natural UIs and programming understandably tend to focus first on human-facing languages, visualizable data structures, and related interaction modalities. While inspirational, none propose a standard, globally scalable, graph-structured persistent data model capable of bridging their experiments with broader research in distributed systems. We believe that user environments are best built up from shared, semantically rich information that is designed before a single piece of code is written. Taking the information-centric approach allows everything around information to develop independently. It is insufficient to simply wrap relational or JSON document databases, leaving semantics to casually evolve through competing remotable code interfaces. Likewise, starting with a functional language or VM leads to immediate dependencies and adoption hurdles. Composition over the neutral foundation of shared semantic graph data allows for unhindered language, UI, and network research. To avoid leaky abstractions, the complexities of secure distributed interaction must be addressed from the beginning, in a platform-, language-, and network-neutral manner. Factors around these global concerns directly affect how features needed by programmable UIs are later provided. They also determine whether the resulting systems will be machine-friendly as well.

A Unified Communication, Collaboration, and Community-Building Platform

Redundant communication protocols and social network services continue to proliferate wildly. It makes no sense for dozens of competing methods to exist for sending or publishing small pieces of text or performing simple collaborations. This is the application-centric philosophy at work: the welding together of data, business logic, presentation, and quality of service into ever more functionality silos.

InfoCentral standardizes the data and basic interaction patterns around communication, collaboration, and social computing, separating related information from supporting services and software. Competing networks may then evolve around the open data, providing for varying quality-of-service needs. Composable software, under full local control of users, adapts shared communications data toward unlimited varieties of interactions and user experiences.

A Unified Global Information Management Platform

Designing a unified information management platform starts with accepting that it is inherently impossible to create a consistent and unified view of the world. The real world is eventually consistent, and so is all real-world information. Truth exists at the edges of networks and propagates through available channels. Ambiguities, falsehoods, and contradictions also arise and propagate. Social trust varies over time. Decisions must be made with incomplete and often conflicting information.

The only plausible solution to this dilemma is to assume that information will be multi-sourced, but to make it easily layerable. This demands stability of reference, so that compositions and annotations can be built across even antagonistic datasets. This is a primary motivation for our exclusive use of hash-based data referencing.

One of the primary challenges of the Semantic Web effort has been the creation of useful ontologies. It is notoriously difficult to achieve global, cross-cultural standardization of even simple concepts, with parallels seen in natural language processing. If we expected perfect consistency, this would indeed be intractable. Recent successes in deep learning translation research may point the way, however. Instead of starting by gathering domain experts to manually design ontologies, machine-generated concept maps can be used to seed the process of collaborative ontology development. InfoCentral’s proposal for stable hash-based references to concept nodes, along with layering and context discovery, makes this feasible as a globally scalable, evolvable solution. Unlimited specialization of concepts via annotation alleviates the need for universal agreement on terms. With layering, a system can use whatever it understands and refine its understanding over time.

A Unified Private Information Management Platform

There is no valid reason for personal and private business information to be scattered across dozens of isolated filesystems, databases, storage mediums, devices, public and private internet services, and web applications. This is simply an artifact of the past era of computing, in which devices and software were largely designed as standalone “appliances” that didn’t need to interact with one another – forcing the user to do all the work in between.

We believe that all information should be integrated, all the time, without artificial boundaries. Users shouldn’t have to worry about manually moving data around or wrestling it into different formats for different uses. Information should never be trapped at service or application boundaries. And it should be trivial to ensure that all of one’s information is stored redundantly.

A Secure, Private, User-controlled Environment

InfoCentral promotes users’ control of their own information, with flexible control of data visibility through ubiquitous cryptography and reliable attribution through signing. InfoCentral promotes direct network service models over the user-surveillance and forced-advertising models relied upon by nearly all proprietary websites and apps. Unlike other projects, however, InfoCentral does not propose that everyone use the same network model (e.g., it has no dependencies on blockchains or DHTs). By standardizing information first, users are free to switch among networks and software at will. Because it is no longer embedded, any advertising must be re-imagined as an independently desirable commercial service rather than a form of manipulation. Users could actively choose to opt in if they find a genuine benefit.

InfoCentral will let us create…

Standardized Interaction Patterns

Interaction Patterns are declarative contracts for how shared graph data is used to accomplish things like sending a message, collaboratively editing a document, engaging in a threaded discussion, conducting a secret ballot, bidding on an auction, making a reservation, conducting any manner of business, or playing a game. Today, all of these sorts of interactions would need specialized software, like mobile apps. In the InfoCentral model, users can just grab a pattern, share it with participants, and start interacting.
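As an illustration only (the field names below are invented, not drawn from the InfoCentral specifications), an Interaction Pattern might be expressed as declarative data stating which entity types each participant may add and what those entities must reference:

```scala
// Hypothetical declarative description of an Interaction Pattern.
final case class Step(role: String, addsType: String, mustReference: String)
final case class InteractionPattern(name: String, steps: Vector[Step])

// A threaded discussion: authors add Posts that reference a Topic;
// repliers add Replies that reference an existing Post.
val threadedDiscussion = InteractionPattern(
  name = "threaded-discussion",
  steps = Vector(
    Step(role = "author",  addsType = "Post",  mustReference = "Topic"),
    Step(role = "replier", addsType = "Reply", mustReference = "Post")))
```

Because the pattern is data rather than code bound to a service, any participant’s software can interpret it.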

Decentralizable services

Any user or system can operate over the global data graph. There is no need for custom web services as coordination points; any sharable data repository will do. Services are simply automated participants in Interaction Patterns. A service might be a local, centralized, trusted system, or a Decentralized Autonomous Organization living in a blockchain network. The data model is agnostic to these details.

Custom communities and social networks

One size never fits all. The InfoCentral model promotes diverse public and private networks that can be woven together seamlessly thanks to layerable data and reliable referencing.

Quality discourses

While it’s hard to get people to agree, it’s even harder today to get people to talk constructively. Layered, hash-referenced information allows many participants to engage one another without censorship on any side. It ensures reliable conversation history and the ability to contextualize, cross-reference, annotate, and revise discussion points over time. With engaged communities, the best information and arguments can rise to the top, even amid a lack of true consensus. It almost goes without saying that such tools will also be a boon to communities already accustomed to civil discourse, such as academic and scientific research.

Ubiquitous Computing Environments

The Internet of Things is a great idea without the proper infrastructure to support it. Current solutions are embarrassingly clumsy, insecure, inflexible, and unreliable. Consider the absurdity of a home thermostat or lighting appliance that must connect to a central web server just to set an integer value or an array of datetime-value tuples – all through an opaque proprietary interface that can only talk to a special mobile app. Such solutions are nowhere close to the promise of universal interoperability that has defined Ubiquitous Computing research.

The semantic graph data standardization that InfoCentral proposes is the ideal universal interface for composing tomorrow’s Ubiquitous Computing environments, bringing IoT devices into genuinely integrated meshes of personal and commercial functionality.

A Formal Introduction…

InfoCentral is a next-generation internet engineering project and proposal. It combines Information-Centric Networking, persistent graph data models, declarative programming, and the best elements of the Semantic Web into a new software and internet architecture – one that is fundamentally decentralized and distributable, while also easier to secure.

An information-centric internet and software ecosystem is fundamentally more composable, contextual, and collaborative. Apps and sites are replaced by a fully integrated information environment and personalizable workspaces. The user is free to layer and adapt information and software to their needs, whether that user is human or AI.

InfoCentral has exciting practical applications for early adopters. However, it ultimately designs for a future driven by practical forms of artificial intelligence, more collaborative social and economic patterns, and an expectation of universal technology interoperability.

Purpose

Current software and internet architectures no longer properly support our ambitions. The InfoCentral proposal comprises a vision and set of principles to create clean-slate, future-proof open standards for information management, software engineering, and Internet communication. While InfoCentral builds upon academic research, it is a practical engineering project intent on real-world results.

Architectural Pillars

Secure-hash-based data identity and referencing

Within the InfoCentral data model, entities are exclusively referenceable using cryptographically secure hash values. Unlike URIs, hash IDs never go stale. They are mathematically linked to the data they reference, making them as reliable as the hash algorithm. InfoCentral designs take into account the need to migrate to stronger algorithms over time, while also mitigating the impact of discovered weaknesses (e.g., via multi-hash references, nonces, MACs, size and other reference metadata, and strict schema validation).
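The following sketch illustrates one way algorithm agility could work (the types are assumptions for illustration, not the specified reference format): a reference carries digests under multiple hash algorithms plus size metadata, and verification requires every recorded digest to match.

```scala
import java.security.MessageDigest

// Illustrative algorithm-agile reference: multiple digests plus size,
// so a weakness in one algorithm does not silently break identity.
final case class MultiHashRef(digests: Map[String, Vector[Byte]], size: Long)

object MultiHashRef {
  // Map from our label to the JCE algorithm name.
  private val algos = Map("sha2-256" -> "SHA-256", "sha2-512" -> "SHA-512")

  def of(data: Array[Byte]): MultiHashRef =
    MultiHashRef(
      algos.map { case (name, jce) =>
        name -> MessageDigest.getInstance(jce).digest(data).toVector
      },
      data.length.toLong)

  // Verification succeeds only if the size and every digest match.
  def verify(ref: MultiHashRef, data: Array[Byte]): Boolean =
    data.length.toLong == ref.size &&
      ref.digests.forall { case (name, digest) =>
        MessageDigest.getInstance(algos(name)).digest(data).toVector == digest
      }
}
```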

Mutable pointers are strictly disallowed by the data model because reference instability is not conducive to decentralized collaborative information. Human-meaningful naming is likewise disallowed in the data model due to its hierarchical nature, its implicit encoding of semantics, and its requirement for arbitrary manual human labor. While arbitrary object-name metadata is supported at the UI level, memorable identifiers comparable to DNS names and file paths are a false requirement based on legacy designs. There is no need to remember and input arbitrary names and addresses in a properly designed information environment. Likewise, AI has no use for human naming but does require the mathematical reliability that only hash-based identities can provide.

No single authoritative dereferencing scheme

Global, reliable dereferencing has historically proven unrealistic in practice, even before considering the need for permanent, flat data identity. Current approaches are costly and fragile. Going forward, the best approach is to support modularity. Network innovation must be unhindered, so that economics and popularity can drive QoS. Many networks, and the information overlays they contain, will also be private. The InfoCentral proposal has no expectation of a single global DHT, blockchain, or similar structure, though such approaches may be useful for spreading lightweight information about available networks and for serving as a bootstrapping mechanism.

We wholesale reject hierarchical naming and resolution schemes (i.e., two-phase schemes) in which data identity is inseparably conflated with a network-specific locator component – even if it is PKI/hash-based. However, for the internal management of data exchange, networks may use any suitable packet identification, routing, and metadata schemes. These are invisible and orthogonal to the Persistent Data Model, which is entirely portable between systems and networks.

Information-Centric Networking

Information-centric networks make data directly addressable and routable, abstracting most or all aspects of physical networks and storage systems. This causes data itself to become independent of the artifacts that support its physical existence, effectively removing the distinction between local and global resources. Users and high-level software are thus liberated from worrying about these artifacts and may treat all data as if it were local. A request for a data entity by its hash ID returns its contents, without knowledge of where it came from or how it was retrieved.
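In code, that contract might look like the sketch below (the interface names are our assumptions; HashRef is the illustrative type from the earlier sketch). Callers ask for bytes by hash ID and never learn which store or network supplied them.

```scala
// Minimal repository abstraction: retrieval by hash ID only.
trait Repository {
  def get(id: HashRef): Option[Array[Byte]]
  def put(data: Array[Byte]): HashRef
}

// Toy in-memory implementation; a real backing store could equally be a
// local disk, a federated server, or a content-addressed network.
final class InMemoryRepository extends Repository {
  private val store = scala.collection.mutable.Map.empty[HashRef, Array[Byte]]

  def get(id: HashRef): Option[Array[Byte]] = store.get(id)

  def put(data: Array[Byte]): HashRef = {
    val id = HashRef("sha2-256",
      java.security.MessageDigest.getInstance("SHA-256").digest(data).toVector)
    store(id) = data
    id
  }
}
```

Because identity is content-derived, a local cache, a peer, and an archive all return bit-identical results for the same ID.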

Unlike some related projects, InfoCentral intentionally does not specify a single, particular networking scheme. One-size-fits-all network designs are economically detrimental. Redundancy and performance needs vary greatly and often cannot be predicted. Many host-based and content-based networks can be used to transparently back InfoCentral-style repositories, each bringing their own unique economics and QoS parameters. Meanwhile, information itself has permanence while the networks and software around it evolve.

Networks of the future will be smarter, with content-awareness often driving replication. Constellations of linked, related, and adjacently accessed information will tend to become clustered near locations where popularity is high. Serving subscriptions and interest registrations will likewise play a large role in shaping data flows.

Reference metadata collection

In any system founded upon immutable data structures, an out-of-band mechanism is needed to aggregate new data, and to notify interested parties of it, over time. Having rejected mutable pointers, InfoCentral instead uses reference metadata collections to track newly discovered data surrounding what is already known. Reference metadata records which data entities reference a given entity (and potentially why). For example, a new revision references a previous revision or a revision collection root. Upon its creation, knowledge of its existence can be propagated to interested users.

Any given reference metadata collection inherently represents partial knowledge of the references to an entity that exist globally. All nodes have their own collections per entity. The means of management are left unspecified because there are many possible methods of propagation across and between varied networks. Again, this allows for endless specialization without changing the data model – from state-based CRDTs to even fully synchronous replication among federated repositories.

Metadata collections allow for unlimited layering of information from unlimited sources. It is up to data consumers to decide which metadata is useful, for example based on type, timestamp, or signatures from trusted parties. Networks may also have rules about what metadata references they are willing to collect and/or they may provide query capabilities for clients.
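As a rough sketch of the mechanics (type and method names are our assumptions, building on the HashRef type from the earlier sketch), a reference metadata collection is essentially a local reverse index that consumers filter on their own terms:

```scala
// Toy reference-metadata collection: for each known entity, track which
// entities reference it and why. Any instance holds only partial,
// local knowledge of the references that exist globally.
final case class RefMeta(referrer: HashRef, refType: String)

final class RefMetadataCollection {
  private val index =
    scala.collection.mutable.Map.empty[HashRef, Vector[RefMeta]]

  // Record that `referrer` references `target`, e.g. as a new revision.
  def record(target: HashRef, referrer: HashRef, refType: String): Unit =
    index(target) = index.getOrElse(target, Vector.empty[RefMeta]) :+
      RefMeta(referrer, refType)

  // Consumers filter by whatever criteria they trust; here, by type.
  def referrers(target: HashRef, refType: Option[String] = None): Vector[RefMeta] =
    index.getOrElse(target, Vector.empty[RefMeta])
      .filter(m => refType.forall(_ == m.refType))
}
```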

Graph-based data models

Structuring information as a persistent graph is the only method that allows unlimited, global-scale, coordination-free composition and collaboration. Persistent graphs are even more powerful for data than hyperlinks were for the web of HTML pages. They allow precise third-party references that cannot break later, so long as the referenced entity exists somewhere in the world. The exclusive use of hash-based references means that data entities natively form a directed acyclic graph. With reference metadata collection, however, this becomes a bidirectional graph in the local scope (similar to a web search engine’s “referenced by” indexing).

All higher-level data structures built upon the persistent data model may take advantage of basic graph semantics. Semantic Web data is an obvious native fit, but all forms of personal and business data will be able to take advantage of the features that the graph data model provides, such as default versioning and annotation capabilities.

Declarative programming models

Programming models in which code owns mutable data are incredibly fragile and the source of most software problems today. Code and data must become orthogonal so that re-use is not hindered. Code may be applied to operate upon data and produce new data, but it may not own data or change what already exists. This is a sharp departure from mainstream Object-Oriented Programming, and it requires a complete paradigm shift in thinking and development methodology. Fortunately, functional programming research has already paved the way to this future. Functional programming is a natural fit for the persistent graph data model we envision, in combination with the other declarative models of which it is a branch.
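A tiny sketch of this discipline, reusing the illustrative HashRef type from above: a revision is never an in-place update, only a new immutable value that references its predecessor by hash.

```scala
// Illustrative only: updating content never mutates anything; it yields
// a new entity that points back at the prior revision's hash ID.
final case class Revision(previous: Option[HashRef], content: String)

// Pure function: same inputs, same output, no side effects.
def revise(previous: HashRef, newContent: String): Revision =
  Revision(Some(previous), newContent)

// The full history remains intact as a chain of immutable entities that
// any party may traverse or annotate.
```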

Declarative code is the most effective route toward widespread parallelization. As processor core count continues to grow exponentially, this will quickly become non-negotiable. Declarative code is also the shortest path to verifiably secure systems and is the easiest for AI to reason about. Likewise, flow of data and control can be easily visualized and analyzed in a working system.

Pattern-driven graph interactions replace APIs

The graph of immutable entities is the universal software interface. Users, whether human or machine, interact solely by adding new entities that reference existing entities. Patterns of doing so are captured by declarative code, enabling standardization of useful interactions without the data encapsulation and dependency creation of traditional APIs. Many Interaction Patterns can be used over the same open public data graph. Thanks to the elimination of shared writable objects through data entity immutability, users’ interactions cannot interfere with one another. This allows unlimited public and private overlays without needing permission or coordination. There is likewise no need to sandbox code; rather, we may designate read-access policies. Like the patterns themselves, these policies can be collaboratively developed and trusted.
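A sketch of how such a check might look, reusing the invented Step type from the Interaction Pattern sketch above: the only write operation is appending a new entity, and a declarative predicate decides whether the addition fits a step of the pattern in use.

```scala
// Does a proposed addition satisfy a step of an Interaction Pattern?
// Purely declarative: it inspects the new entity's type and references,
// never executes foreign code, and never mutates shared state.
def conforms(step: Step, addedType: String, referencedTypes: Set[String]): Boolean =
  addedType == step.addsType && referencedTypes.contains(step.mustReference)
```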

Dynamic, generated, multi-modal human user interfaces

Modern software design usually starts with human-oriented user stories, often focused on views, and is dictated by a hierarchy of functionality designed to support these. This is incompatible with creating systems that are natively useful to AI. It is also incompatible with creating fully integrated information environments for humans, the ultimate realization of which is Ubiquitous Computing.

Pattern-driven graph interactions form the foundation upon which all higher-level user stories are realized. By reducing interaction to declared abilities and intentions, all human UI modalities can be automatically generated. Preferences and the user’s environment may be taken into account automatically.

Cryptography preferred to access control lists

Access controls are notoriously difficult to enforce perfectly. They also result in data being bound to particular systems and harder to securely and reliably back up. While cryptography is no panacea, it can at least consolidate security practices to a manageable number.
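As a rough illustration (standard Java crypto calls; the scheme is a toy, not InfoCentral’s actual key-management design), a payload can be encrypted before publication so that visibility is governed by key possession rather than by a server-enforced access control list:

```scala
import java.security.SecureRandom
import javax.crypto.{Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec

// Toy payload encryption with AES-GCM: anyone may store or replicate the
// ciphertext entity, but only holders of the key can read it.
object PayloadCrypto {
  def encrypt(plain: Array[Byte]): (SecretKey, Array[Byte], Array[Byte]) = {
    val keyGen = KeyGenerator.getInstance("AES")
    keyGen.init(256)
    val key = keyGen.generateKey()
    val iv = new Array[Byte](12)                  // 96-bit GCM nonce
    new SecureRandom().nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    (key, iv, cipher.doFinal(plain))              // (key, nonce, ciphertext)
  }

  def decrypt(key: SecretKey, iv: Array[Byte], ciphertext: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv))
    cipher.doFinal(ciphertext)
  }
}
```

Sharing then reduces to key distribution, a concern independent of where the ciphertext is stored or replicated.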

Architecture Quick Summary

A decentralized and distributable data management architecture

An ‘Information Environment’ software architecture

Project Philosophy

Why does technology architecture matter?

In the modern world, the architecture of information and the technology surrounding it dramatically influences how people interact with both technology and each other. As with public infrastructure, changes to IT architecture often produce massive downstream social changes. There should therefore be a great sense of responsibility when engineers design information systems.

We believe that the InfoCentral vision can especially improve society in areas of collaboration, community-building, contextualization, and civility. Education, healthcare, government, commerce, media, the arts, religion, and even interpersonal relations can all benefit from such improvements.

Who is involved with InfoCentral?

Because InfoCentral is a multi-disciplinary effort, it aims to draw a diverse community of participants. As an open source project, it will involve many developers. As a practical application of research, it has connections to academia. As a tool for social progress, it requires involvement with the public, NPO, and NGO sectors. As a platform for innovative development and commerce, it is of interest to entrepreneurs and business leaders.

How does the InfoCentral project operate?

InfoCentral has two primary operational arenas: core architecture and practical applications. The core architecture division is responsible for all low-level design and reference implementation of the data management and information environment standards. Numerous application teams focus on building generic modules and support necessary to enable particular end-user interactions and use cases. These may include crowd-sourced efforts, industry-specific consortiums, consultants, etc. Application teams build infrastructure, not “applications” in the software lingo sense. Because infrastructure is shared, cross-team collaboration should be the norm. The goal is that as little code as possible should be dedicated to meeting particular end-user needs.

Other Article Drafts

You may contact the project lead by emailing him at his first name at the domain of this website.

Copyright 2023, by Chris Gebhardt.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.