A Purely Distributional Embedding Algorithm
Abstract
This paper introduces the Distributional Embedding Algorithm (DEA), a purely deterministic framework for generating word embeddings through \emph{Iterative Structural Saliency Extraction}, a procedure based on a natural Galois correspondence. Unlike stochastic ``black-box'' machine learning models, DEA grounds semantic representation in the topological structure of a corpus, mapping the redistribution of semantic mass across identifiable structural nuclei. We apply the model to a controlled dataset of 300 propositions from David Bohm's \textit{Wholeness and the Implicate Order}, identifying four primary semantic basins that account for 74\% of the text's logical flow. By tracking the iterative expansion of these clusters, we demonstrate a ``topological collapse'' in which shared lexical pivots connect distant propositions. Validation via cosine distance confirms high structural orthogonality between core conceptual terms and extrinsic category noise (e.g., \textit{intelligence} vs. \textit{desk}, $d = 0.99$). We conclude that DEA offers a computationally efficient, transparent, and structurally aware alternative that can be integrated with existing neural architectures to enhance interpretability in semantic modeling. Moreover, DEA rests on the \textbf{Logarithmic Hypothesis}, which relates the dimension of the embedding vectors to the number of propositions in the corpus. Whereas modern AI architectures require thousands of embedding components to process on the order of $10^{13}$ propositions, DEA suggests a structural collapse of complexity in which the global semantic manifold can be distilled into $L \approx \log_{10}(10^{13}) = 13$ features. Even at a hyper-refined resolution of $L \approx 30$, the model offers a deterministic, ``white-box'' alternative to current neural networks, providing a thousand-fold increase in computational efficiency without sacrificing logical precision.
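To make the Logarithmic Hypothesis concrete, one can read the figures above as instances of a general scaling law; the closed form $L \approx \log_{10} N$ is our extrapolation from the abstract's numbers, not a formula stated explicitly in it:
\[
L \;\approx\; \log_{10} N,
\qquad
N = 10^{13} \ \text{propositions} \;\Longrightarrow\; L \;\approx\; \log_{10}\!\bigl(10^{13}\bigr) = 13,
\]
to be contrasted with embedding dimensions in the thousands for contemporary neural architectures at comparable corpus scales.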
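The cosine-distance validation quoted above ($d = 0.99$) is mechanically reproducible. The sketch below is a minimal Python illustration, assuming the standard definition $d = 1 - \cos(u, v)$; the vectors named `intelligence` and `desk` are hypothetical placeholders, since the actual DEA embedding components are not reported in this abstract.

```python
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine distance d = 1 - cos(u, v); d near 1 means near-orthogonal vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder vectors for illustration only -- NOT the actual DEA embeddings,
# whose components are not given in the abstract.
intelligence = np.array([0.9, 0.1, 0.0, 0.0])    # hypothetical core conceptual term
desk         = np.array([0.0, 0.05, 0.95, 0.3])  # hypothetical extrinsic "category noise" term

print(f"d(intelligence, desk) = {cosine_distance(intelligence, desk):.2f}")
# -> 0.99: high structural orthogonality, matching the figure quoted above
```

A distance near 1 indicates that the two embeddings are almost orthogonal, which is how the abstract distinguishes core conceptual terms from extrinsic category noise.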