LLM-Augmented Innovation Regime Classification: A Hybrid Framework for Patentometric Foresight

Serhat Burmaoglu
Ozcan Saritas
Karahan Kara


Abstract

Patent-based scientometrics has developed rich longitudinal indicators for characterizing technology innovation dynamics, yet translating continuous indicator vectors into discrete, interpretable regime categories (the vocabulary of technology foresight) remains methodologically underdeveloped. Existing rule-based classifiers generate high rates of unclassifiable observations and systematically neglect the qualitative semantic content embedded in patent text. This paper introduces a three-layer hybrid framework that integrates quantitative patentometric indicators, a purpose-built ontological classification scheme, and large language model (LLM) semantic calibration to address this gap. Rather than deploying LLMs as standalone classifiers, the framework formalizes their role as structured calibration agents within indicator-based scientometric workflows. The framework is evaluated on 41,475 green hydrogen patents across three Cooperative Patent Classification subdomains (C25B, H01M, Y02E) spanning 2005–2024. The first layer computes seven patentometric indicators across 54 rolling three-year windows; the second layer maps indicator profiles to the Minimal Foresight Ontology (MFO v1.0), an eight-regime categorical scheme with percentile-anchored threshold conditions; and the third layer employs Qwen2.5-3B-Instruct to adjudicate structurally ambiguous observations under a conservative dual-condition asymmetric overwrite rule. Calibrated regime sequences are then subjected to first-order Markov chain analysis and predictive validity testing. LLM calibration resolves the 38.9% of observations left unclassified by the rule-based layer and increases regime label diversity by ΔH = +0.298 bits. Divergence cluster analysis reveals that epistemic misalignment between text-based and indicator-based signals concentrates in periods of rapid structural change. Markov analysis identifies Emerging Trajectory as the dominant long-run attractor (π = 0.433), Volatile Expansion as the most self-persistent regime (E[T] = 2.50 windows), and current regime labels as significant predictors of next-window Shannon entropy, semantic drift, and patent volume. The proposed framework contributes a replicable pipeline for LLM-augmented patent foresight and establishes the first empirical Markov characterization of innovation regime transition dynamics in a calibrated patent corpus.
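
The quantities the abstract reports (label entropy H, the stationary distribution π, and the expected sojourn time E[T]) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the two-state toy chain, and the choice of power iteration are assumptions made for illustration, and the abstract does not say how E[T] was computed. Under the standard geometric sojourn formula E[T] = 1 / (1 − p_ii), a value of 2.50 windows would correspond to a self-transition probability of 0.60.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def transition_matrix(sequence, states):
    """Maximum-likelihood first-order transition matrix (row-normalized counts)."""
    idx = {s: i for i, s in enumerate(states)}
    k = len(states)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(sequence, sequence[1:]):
        counts[idx[a]][idx[b]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed exits is treated as absorbing.
            matrix.append([1.0 if j == i else 0.0 for j in range(k)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, iters=10_000, tol=1e-12):
    """Power iteration pi <- pi P; assumes the chain is ergodic."""
    k = len(matrix)
    pi = [1.0 / k] * k
    for _ in range(iters):
        nxt = [sum(pi[i] * matrix[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_sojourn(matrix):
    """E[T_i] = 1 / (1 - p_ii): mean number of consecutive windows in state i."""
    return [
        float("inf") if row[i] >= 1.0 else 1.0 / (1.0 - row[i])
        for i, row in enumerate(matrix)
    ]


# Hypothetical two-regime chain, used only to exercise the functions.
toy_matrix = [[0.6, 0.4], [0.2, 0.8]]
toy_pi = stationary_distribution(toy_matrix)
toy_sojourn = expected_sojourn(toy_matrix)
```

A diversity gain such as ΔH would then be the difference between `shannon_entropy` evaluated on the calibrated labels and on the rule-based labels. How unclassified observations enter that comparison is not stated in the abstract.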

Version published to 10.21203/rs.3.rs-9166858/v1 on Research Square
Mar 31, 2026

Related articles

From Text to Sectors: Classifying 140 Years of Swiss Firm Registrations

This article has 5 authors:
1. Danyl Denysenko
2. Filippo Pasquali
3. Jesper Findahl
4. Andrea Mocci
5. Gianmarco Torchetti
This article has no evaluations. Latest version: Apr 17, 2026

A Smart Contract-Based Patent Value Assessment Model

This article has 5 authors:
1. Fu Gao
2. Wenlong Feng
3. Mengxing Huang
4. Siling Feng
5. Jiangtao Li
This article has no evaluations. Latest version: Apr 2, 2026

LLM-Based Measurement of Latent Attributes in Trade Data

This article has 3 authors:
1. Matthew DiGiuseppe
2. Xuelong Fu
3. Michael E Flynn
This article has no evaluations. Latest version: Mar 27, 2026
