What was built
A Graph Neural Network (QGTv10, ~45M parameters) trained over 2,860 epochs on a knowledge graph assembled from Wikidata SPARQL, biological databases, and scientific literature. Training ran on H100/H200 GPUs via RunPod.
The system explored relationships across scientific and symbolic domains, including biology, chemistry, astronomy, physics, language, and ontology structures.
How the graph was built
Nodes represent named concepts: biological structures, physical laws, chemical compounds, organisms, astronomical objects, symbolic entities, and scientific terms.
Edges ("superedges") record how often two nodes were co-activated during traversal of the graph. A higher edge count means the network repeatedly visited both concepts within related traversal contexts.
This is not a hand-curated ontology or a fact database. It is an emergent co-activation archive extracted from training behavior.
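A minimal sketch of how such co-activation counts could be accumulated, assuming each traversal context is available as a list of node names and that two nodes count as co-activated when they appear in the same context. The function `count_coactivations` and its input format are hypothetical; the archive's own extraction code is not published.

```python
from collections import Counter
from itertools import combinations


def count_coactivations(contexts):
    """Count how often each unordered pair of nodes shares a traversal context.

    Hypothetical input: an iterable of lists of node names.
    """
    counts = Counter()
    for context in contexts:
        # Deduplicate so a single context adds at most 1 to each pair.
        unique_nodes = sorted(set(context))
        for a, b in combinations(unique_nodes, 2):
            counts[(a, b)] += 1  # pair keys are stored in sorted order
    return counts


contexts = [
    ["ATP", "Calcium Atom", "Mitochondrion"],
    ["ATP", "Calcium Atom"],
    ["Gravity", "Black hole"],
]
edges = count_coactivations(contexts)
# edges[("ATP", "Calcium Atom")] is 2; every other pair here was seen once.
```

Sorting the names before pairing makes ("A", "B") and ("B", "A") the same undirected edge.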
Simulated populations and training environments
Part of the broader experiment involved maintaining simulated populations of autonomous traversal agents inside simplified embodied environments.
These agents were not modeled as biological humans, but they operated with layered internal state systems assembled from publicly available biological, anatomical, symbolic, and linguistic structures.
Different runs experimented with synthetic DNA-derived parameter variation, simplified body-state variables (age, growth, weight, internal condition), multimodal perception pipelines, language exposure, memory persistence, traversal reward systems, and dynamically shifting long-term objectives.
Some agents operated with text-based internal narration, speech transcription pipelines, visual captioning systems, and limited language-decoding experiments.
One area of exploration involved persistence-oriented reward pressure and changing lifespan targets inside simulated environments. In some runs, longer-lasting traversal behavior appeared to emerge over time as reward conditions evolved. However, these observations were exploratory and informal. The project does not claim evidence of consciousness, general intelligence, or genuine autonomous life-like behavior.
The archive presented here preserves only the resulting traversal structure extracted from those experiments.
What is preserved in this archive
A frozen snapshot exported at epoch 2,860:
· 150,304 named nodes
· 140,026 superedges
Each node retains its global activation score, connection degree, and traversal centrality. Each edge retains co-activation count and traversal weight.
The archive does not contain the model weights. It is the traversal surface: a record of which regions of the graph the system repeatedly activated together during training.
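The per-node and per-edge fields listed above can be pictured as two small record types. This is a sketch only: the class and field names are hypothetical, and the values in the usage lines are placeholders, not data from the archive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen, mirroring a snapshot that no longer changes
class NodeRecord:
    """One archived node (hypothetical field names)."""
    name: str
    activation_score: float  # global activation score
    degree: int              # connection degree
    centrality: float        # traversal centrality


@dataclass(frozen=True)
class SuperedgeRecord:
    """One archived co-activation edge (hypothetical field names)."""
    source: str
    target: str
    coactivation_count: int  # how often the pair was co-activated
    traversal_weight: float


# Placeholder values for illustration only.
node = NodeRecord("ATP", activation_score=1.0, degree=3, centrality=0.5)
edge = SuperedgeRecord("ATP", "Calcium Atom", coactivation_count=2, traversal_weight=0.1)
```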
What was removed
Internal ENC: nodes (encoding nodes used during training for node disambiguation) are filtered out. Wikidata QID-only nodes without resolved human-readable names are excluded from FILTERED mode but may still appear in RAW mode.
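The two removal rules can be expressed as a single predicate. This sketch assumes that internal nodes are recognizable by a literal `ENC:` prefix and that an unresolved QID node is named by its bare Wikidata ID (for example `Q42`); `keep_node` is a hypothetical helper, not the archive's API.

```python
import re

# A bare Wikidata ID such as "Q42" with no resolved human-readable label.
QID_ONLY = re.compile(r"Q\d+")


def keep_node(name: str, mode: str) -> bool:
    """Decide whether a node name is shown in "RAW" or "FILTERED" mode."""
    if name.startswith("ENC:"):
        return False  # internal encoding nodes are removed in every mode
    if mode == "FILTERED" and QID_ONLY.fullmatch(name):
        return False  # unresolved QIDs are hidden only in FILTERED mode
    return True
```

`fullmatch` is used so that a real name that merely starts with "Q" (such as "Quasar") is not mistaken for a QID.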
What are attractors
Attractor nodes are concepts with extremely high global connectivity. During training they appeared in many unrelated traversal paths and became structurally dominant regions of the graph.
Examples: Seed · ATP · Gravity · Calcium Atom · Electromagnetic Force
Because attractors can overwhelm local traversal, FILTERED mode suppresses them by default to allow more domain-specific paths to emerge. RAW mode includes them.
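A sketch of attractor detection and suppression, assuming "connections" means node degree in the superedge list and using the ">100 connections" threshold given in the archive architecture notes. The helper names are hypothetical.

```python
from collections import Counter

ATTRACTOR_THRESHOLD = 100  # nodes with more than 100 connections


def find_attractors(edges, threshold=ATTRACTOR_THRESHOLD):
    """edges: list of (a, b) pairs. Return nodes whose degree exceeds threshold."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node for node, d in degree.items() if d > threshold}


def suppress_attractors(edges, attractors):
    """Drop every edge that touches an attractor node."""
    return [(a, b) for a, b in edges if a not in attractors and b not in attractors]


# A hub linked to 101 leaves, plus one leaf-to-leaf edge.
hub_edges = [("ATP", f"leaf{i}") for i in range(101)] + [("leaf0", "leaf1")]
attractors = find_attractors(hub_edges)             # {"ATP"}
remaining = suppress_attractors(hub_edges, attractors)  # [("leaf0", "leaf1")]
```

Removing a single hub deletes all of its edges at once, which is why suppression can hide a large part of the graph.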
RAW vs FILTERED
FILTERED — returns only cleaner semantic concepts: named scientific entities, validated concepts, reduced ontology noise, attractor suppression enabled. This mode attempts to surface more domain-specific traversal structure.
RAW — returns the broader unfiltered traversal artifact: lexical tokens, unresolved ontology nodes, attractor-heavy pathways, noisy graph regions. Less clean, but preserves more of the original traversal behavior.
What traversal means
When you submit a query, the system searches for matching anchor nodes and expands outward through the graph using weighted traversal paths.
The result is a local neighborhood built from co-activation frequency during training.
This is not semantic search, not an AI assistant, not generated text reasoning. It is a replay of traversal structure extracted from a trained graph system.
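The query flow above can be sketched as a breadth-first expansion that follows the strongest co-activation edges first. Several details here are assumptions, not the archive's documented behavior: anchors are found by case-insensitive substring match, expansion stops at `max_depth`, and each node expands to at most `top_k` unvisited neighbors. The edge list is assumed to contain each pair once.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """edges: iterable of (a, b, count). Returns node -> [(neighbor, count), ...]."""
    adj = defaultdict(list)
    for a, b, count in edges:
        adj[a].append((b, count))
        adj[b].append((a, count))
    return adj


def traverse(query, edges, max_depth=2, top_k=5):
    """Expand outward from anchor nodes, strongest co-activation edges first."""
    adj = build_adjacency(edges)
    q = query.casefold()
    # Hypothetical anchor rule: case-insensitive substring match on names.
    anchors = [node for node in adj if q in node.casefold()]
    depth = {node: 0 for node in anchors}
    queue = deque(anchors)
    kept = {}  # (parent, child) -> co-activation count
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue
        # Only consider neighbors not reached yet, heaviest edges first.
        fresh = [(n, c) for n, c in adj[node] if n not in depth]
        for neighbor, count in sorted(fresh, key=lambda nc: nc[1], reverse=True)[:top_k]:
            depth[neighbor] = depth[node] + 1
            kept[(node, neighbor)] = count
            queue.append(neighbor)
    return anchors, depth, kept


edges = [
    ("Black hole", "Gravity", 40),
    ("Black hole", "Event horizon", 12),
    ("Gravity", "ATP", 3),
    ("Event horizon", "Seed", 2),
]
anchors, depth, kept = traverse("black hole", edges, max_depth=2, top_k=1)
# With top_k=1 the path runs Black hole -> Gravity -> ATP.
```

The result is a local neighborhood shaped entirely by co-activation counts; no text is generated and no meaning is inferred.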
How to read the graph
· Blue node = anchor match (your query)
· Red nodes = attractors
· Green nodes = cleaner semantic concepts
· Amber nodes = RAW-only or noisier artifacts
Node size reflects global activation frequency. Edge thickness reflects traversal co-activation count. Drag nodes to reposition. Click any node to see its full stats.
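The legend can be read as a small set of styling rules. This is an illustrative sketch: the precedence order (anchor before attractor before clean) and the logarithmic size scaling are assumptions, and the function names are hypothetical.

```python
import math


def node_color(name, anchors, attractors, clean_nodes):
    """Map a node to its legend color (precedence order is assumed)."""
    if name in anchors:
        return "blue"   # anchor match for the query
    if name in attractors:
        return "red"    # attractor
    if name in clean_nodes:
        return "green"  # cleaner semantic concept
    return "amber"      # RAW-only or noisier artifact


def node_radius(activation, base=4.0, scale=2.0):
    """Grows with global activation frequency; log scale keeps hubs readable."""
    return base + scale * math.log1p(activation)


def edge_width(count, base=0.5, scale=0.75):
    """Grows with traversal co-activation count."""
    return base + scale * math.log1p(count)
```

A log scale is one common choice when a few hubs have counts orders of magnitude larger than typical nodes.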
Why does "black hole" connect to a 1732 German pastor?
Because during training the GNN repeatedly activated those nodes within the same traversal contexts. The graph reflects actual co-activation patterns, including cross-domain surprises. This is not a bug; it is an artifact of how the network explored the knowledge space.
Limitations
This is not a curated knowledge graph and not a source of verified scientific truth. Edges represent statistical traversal artifacts, not validated factual relationships.
Some cross-domain connections may appear strange, misleading, or semantically weak because the archive reflects what the network repeatedly traversed together during training — not what humans would necessarily consider meaningful.
Use the archive as a structural exploration tool, not as ground truth.
Why this project exists
This project started from a simple intuition:
that most things in the world are not isolated.
Cells depend on organisms.
Organisms depend on air, water, plants, bacteria, stars, planetary chemistry, and physical laws.
Ideas depend on other ideas.
Human beings move through networks of meaning the same way biological systems move through networks of energy and matter.
I wanted to explore what happens if a graph system is allowed to traverse across biology, physics, astronomy, language, and symbolic structures without being manually constrained into a fixed ontology.
The result is noisy, imperfect, and often strange.
But sometimes the traversal surfaces unexpected conceptual bridges that feel worth preserving.
Civilization Engine
Some experimental systems developed during training, including autonomous agent populations, traversal inheritance, and simulated adaptive environments, are documented separately in the Civilization Engine Notes.
Archive architecture — what is and is not deployed
✓ Superedges (140,026) — fully deployed and used for all BFS traversal. These are co-activation edges extracted from trained GNN behavior. Minimum count ≥ 2 (single-occurrence edges excluded at export time).
✓ Nodes (150,304) — fully deployed with score, connections, centrality, clean flag.
✓ discovered_edges.json (1,093 edges) — loaded and counted in status, but currently not used in BFS traversal. These are high-weight discovery edges (discovery_weight ≥ 0.72) generated during training. They are visible in /api/status but do not influence query results.
⚠ Confidence-ranked hypothesis chains — not deployed. The training process generated hypothesis-tier edges with confidence scores, but these were not included in the public export.
⚠ Export truncation — /api/export returns at most 5,000 of 140,026 edges. Full graph JSON is not publicly available.
⚠ Attractor suppression — FILTERED mode with suppress_attractors=true hides every node with more than 100 connections, and with them a large portion of the graph. RAW mode with suppress_attractors turned off exposes the most structure.
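The export rules listed above can be sketched as three small filters. The thresholds (count ≥ 2, discovery_weight ≥ 0.72, 5,000-edge cap) come from the list; the function names are hypothetical, and the ordering used when truncating the export is an assumption, since the notes do not say which 5,000 edges are returned.

```python
MIN_COUNT = 2                # single-occurrence edges excluded at export time
DISCOVERY_MIN_WEIGHT = 0.72  # threshold for discovered_edges.json
EXPORT_LIMIT = 5_000         # /api/export cap


def export_superedges(edge_counts):
    """Keep only pairs co-activated at least MIN_COUNT times."""
    return {pair: c for pair, c in edge_counts.items() if c >= MIN_COUNT}


def select_discovery_edges(discovery_weights):
    """Discovery edges are counted for status only; they are not passed to traversal."""
    return {pair: w for pair, w in discovery_weights.items() if w >= DISCOVERY_MIN_WEIGHT}


def api_export(superedges, limit=EXPORT_LIMIT):
    """Truncated export. Assumption: highest-count edges are returned first."""
    ranked = sorted(superedges.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```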
Calcium-40: exploratory hypothesis chain (archived)
The following is a traversal analysis artifact — not a scientific claim. Derived from GNN traversal output at epoch 2,860, verified against physics literature using GPT-5.5 analysis.
This section documents an exploratory traversal-analysis chain preserved from the research process, not a validated scientific discovery.
Traversal pair discovered: Calcium-40 (nuclear physics) ↔ Ca²⁺ role in ATP biology
Hypothesis explored: Does the doubly-magic nuclear structure of ⁴⁰Ca (Z = N = 20, both canonical magic numbers) explain why Ca²⁺ was evolutionarily selected as a universal biological second messenger?
Recovered causal chain:
nuclear stability → nucleosynthetic yield → geochemical availability → ion chemistry → evolutionary selection
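The only explicit formula in the chain (nucleosynthesis, under nuclear statistical equilibrium):
n(A,Z) ∝ n_p^Z · n_n^N · exp(B(A,Z) / k_B T)
where A = Z + N is the mass number, n_p and n_n are the free proton and neutron number densities, B(A,Z) is the nuclear binding energy, k_B is Boltzmann's constant, and T is the temperature. The proportionality omits the nuclear partition function and mass-dependent prefactors. This is the only point where nuclear structure directly affects downstream abundance, and thus biological availability.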
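A worked sketch of why binding energy matters in this relation: because B enters through an exponential, a small change in B shifts the equilibrium abundance by a large factor. This is not a full nuclear-statistical-equilibrium calculation. The temperature, nucleon densities, and the "1 MeV more binding" comparison are illustrative assumptions; only the binding energy of ⁴⁰Ca (about 342 MeV) is a tabulated value.

```python
import math


def log_abundance(Z, N, binding_mev, ln_np, ln_nn, kT_mev):
    """ln n(A,Z) up to an additive constant, from n ∝ n_p^Z · n_n^N · exp(B / k_B T).

    Partition functions and mass-dependent prefactors are ignored.
    """
    return Z * ln_np + N * ln_nn + binding_mev / kT_mev


kT = 0.3              # MeV, roughly 3.5e9 K (illustrative)
ln_np = ln_nn = -5.0  # illustrative free-nucleon densities, arbitrary units

base = log_abundance(20, 20, 342.0, ln_np, ln_nn, kT)   # B(40Ca) is about 342 MeV
boost = log_abundance(20, 20, 343.0, ln_np, ln_nn, kT)  # hypothetical: 1 MeV more binding
factor = math.exp(boost - base)  # exp(1 / 0.3), about 28
```

The nucleon-density terms cancel in the comparison, so at kT = 0.3 MeV each extra MeV of binding multiplies the abundance by roughly 28.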
Why Ca²⁺ is actually special (biochemistry, not nuclear physics):
· Electrochemical gradient ~10⁵ outside/inside → Nernst potential ~154 mV — a large signaling resource
· Lewis acid chemistry preferring oxygen ligands (carboxylate, phosphate, carbonyl) — ideal for fast protein binding
· Cytosolic background 50–100 nM vs. signal 0.5–10 µM → high signal-to-noise ratio
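The figures above can be checked with the Nernst equation, E = (RT / zF) · ln(c_out / c_in). This sketch assumes body temperature (37 °C, 310.15 K) and z = 2 for Ca²⁺; with those assumptions a 10⁵ concentration ratio gives the quoted ~154 mV. The signal-to-noise ratios use the endpoints of the quoted ranges.

```python
import math

R = 8.314462618  # gas constant, J/(mol·K)
F = 96485.33212  # Faraday constant, C/mol


def nernst_mV(c_out, c_in, z, temp_k=310.15):
    """Equilibrium potential E = (RT / zF) ln(c_out / c_in), in millivolts."""
    return 1000.0 * R * temp_k / (z * F) * math.log(c_out / c_in)


# A 10^5 outside/inside ratio for Ca2+ (z = 2) at 37 °C:
e_ca = nernst_mV(1e5, 1.0, z=2)  # about 153.85 mV, i.e. ~154 mV

# Signal-to-noise across the quoted ranges (concentrations in nM):
low_ratio = 500 / 100     # 0.5 µM signal over a 100 nM background -> 5x
high_ratio = 10_000 / 50  # 10 µM signal over a 50 nM background -> 200x
```

At 25 °C the same ratio gives about 148 mV, so the ~154 mV figure corresponds to physiological temperature.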
Verdict of the analysis: No direct formula B/A → P(biologically selected) exists. The magic numbers of Ca-40 contributed to elemental abundance via stellar nucleosynthesis, but biology selects based on ion chemistry and electrochemical gradients — not nuclear structure directly. The connection is real but mediated through multiple independent layers.
Traversal timestamp: 2026-05-16 · Analysis: GPT-5.5 · Source: UnThinq discovery pipeline output
Archived research note · May 2026 · Preserved as part of the UnThinq experimental archive. Not a validated scientific discovery.
PyTorch · QGTv10 · ~45M parameters · 2,860 epochs
Wikidata SPARQL · biological databases · torch-geometric · GloVe 6B 300d
Trained on H100/H200 SXM GPUs
Created by Ilnar Raisovich Tagirov