Entropy Hunter: Fine-Tuning Qwen3-8B for Industrial Exergy Analysis
Can an 8-billion parameter model perform exergy analysis on industrial equipment? Entropy Hunter v0.4 proves it can — achieving a 92.7% benchmark score (Grade A-) via LoRA fine-tuning on Qwen3-8B for a total cost of just $216. This post covers the role of JSON as a reasoning scaffold, the 8 thermodynamic validation checks, and exergy analysis across 7 equipment types.
Olivenet Team
IoT & Automation Experts
Large language models produce impressive results on general tasks. But can an 8-billion parameter model perform exergy analysis on industrial thermodynamic equipment? On a consumer GPU, with a startup budget?
Entropy Hunter was built to answer this question. It's a domain-specific LLM specialized for industrial equipment exergy analysis via LoRA fine-tuning on Qwen3-8B. At v0.4, it achieved a 92.7% benchmark score (Grade A-) at a $216 total cost.
What Is Exergy Analysis?
Exergy is the maximum useful work a system can deliver as it comes into equilibrium with its surroundings — its "usable energy". While energy is conserved, exergy is not — it is destroyed in every real process due to irreversibilities.
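For a steady-flow stream, with kinetic and potential terms neglected, the standard textbook expression for specific flow exergy makes this precise (added here for reference; it is not stated in the original post):

```latex
% Specific flow exergy of a stream: h and s are the stream's specific
% enthalpy and entropy; subscript 0 denotes the dead state, i.e.
% equilibrium with the surroundings at T_0, P_0.
\psi = (h - h_0) - T_0\,(s - s_0)
```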
In industrial plants, exergy analysis reveals which equipment causes how much loss, how much of that loss is avoidable, and where improvement potential is concentrated. Traditional energy analysis answers "how much energy is there?" while exergy analysis answers "how much of that energy is actually usable?"
Version Evolution
From Base Qwen3-8B to v0.4: 4 iterations, 1 critical failure, 1 fundamental insight
- Base Qwen3-8B (Grade F): No fine-tuning, zero-shot
- v0.1 (Grade D): Initial LoRA fine-tuning
- v0.2 (Grade C+): Expanded dataset
- v0.3 (Grade F): JSON-free experiment (failed)
- v0.4 (Grade A-): Final — full pipeline
v0.3 JSON-free experiment: When structured JSON output was removed, score dropped to 52.4%. JSON format isn't just output — it serves as a reasoning scaffold.
JSON = Reasoning Scaffold. Structured output format forces the model to think step-by-step and improves consistency in exergy calculations.
The path from Base Qwen3-8B to v0.4 was not a straight line. Each version tested a different strategy, and the most critical lesson came from the v0.3 failure.
v0.1 (200 examples) applied initial fine-tuning — the model began learning exergy terminology but calculation consistency was low. v0.2 (600 examples) expanded the dataset, bringing the score to 78%.
v0.3 experimented by removing JSON output format — the model was forced to respond in free text. The result was dramatic: score dropped to 52.4%. This failure revealed that JSON format wasn't just output structure — it served as a scaffold for step-by-step reasoning. Each JSON field forced the model to complete a specific calculation step.
v0.4 applied the full pipeline with this insight: JSON scaffold preserved, quality control pipeline added, and trained on 1,235 validated examples. Result: 92.7%, Grade A-.
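For illustration, the kind of structured response the scaffold enforces might look like the following (field names and values are hypothetical, not the project's published schema):

```json
{
  "equipment": "steam_turbine",
  "inlet_state":  {"T_K": 773.15, "P_kPa": 8000.0},
  "outlet_state": {"T_K": 354.5,  "P_kPa": 50.0},
  "entropy_generation_kW_per_K": 5.42,
  "exergy_destruction_kW": 1616.0,
  "exergetic_efficiency": 0.84
}
```

Each field corresponds to one calculation step, so a skipped step surfaces as a missing or inconsistent field rather than a silently wrong free-text answer.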
Training Pipeline
5-stage pipeline: From raw taxonomy to evaluation
Taxonomy
7 equipment types, 48 subtypes, 6 analysis families defined
Data Generation
Synthetic training data via Claude Opus 4.6 — structured Q&A in JSON format
Quality Control
8 thermodynamic validation checks — energy balance, entropy generation, efficiency bounds
LoRA Fine-Tuning
LoRA adapters on Qwen3-8B — rank 64, alpha 128, trained on consumer GPU
Evaluation
Automated benchmark across 6 analysis families — compared against CoolProp reference
The pipeline consists of 5 stages:
- Taxonomy: 7 equipment types, 48 subtypes, and 6 analysis families were systematically defined. This taxonomy ensured the training data comprehensively represented industrial scenarios.
- Data Generation: 1,500 structured Q&A examples were generated via Claude Opus 4.6. Each example contained temperature, pressure, and flow rate values simulating real industrial conditions.
- Quality Control: 8 thermodynamic validation checks were applied. Energy balance, mass conservation, entropy generation, and other physical constraints were verified for every example.
- LoRA Fine-Tuning: The 1,235 validated examples were used to train LoRA adapters on Qwen3-8B (rank 64, alpha 128) on a single consumer GPU (24GB VRAM) with 4-bit quantization; see the sketch after this list.
- Evaluation: Automated benchmark across 6 analysis families. Comparison against CoolProp reference values with ±2% tolerance.
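As a rough sketch of the fine-tuning stage, the setup with HuggingFace PEFT and bitsandbytes might look like this, using the hyperparameters reported above; the target modules and compute dtype are assumptions, not published details:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization (QLoRA) so the 8B base model fits in 24GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with the reported rank/alpha/dropout.
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trained
```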
Equipment & Analysis Coverage
7 equipment types × 48 subtypes × 6 analysis families — real industrial scenarios
Equipment Types
- Turbine: 7 subtypes
- Compressor: 6 subtypes
- Pump: 6 subtypes
- Heat Exchanger: 7 subtypes
- Boiler: 5 subtypes
- Mixing Chamber: 6 subtypes
- Nozzle / Diffuser: 11 subtypes
Entropy Hunter covers 7 core industrial equipment types: turbine, compressor, pump, heat exchanger, boiler, mixing chamber, and nozzle/diffuser. These equipment types are broken down into 48 subtypes, reflecting real industrial configurations like steam turbines, centrifugal compressors, and plate heat exchangers.
The 6 analysis families cover different aspects of exergy analysis:
- ED (Exergy Destruction): Irreversibility-driven exergy loss (see the worked example after this list)
- EE (Exergetic Efficiency): Component exergy utilization effectiveness
- AV (Avoidable): Exergy destruction share with improvement potential
- UN (Unavoidable): Minimum destruction within technological limits
- EI (Improvement Potential): Van Gool improvement potential calculation
- EGM (Entropy Generation Minimization): Optimization via Bejan's EGM method
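As a concrete illustration of the ED family, here is a small worked example computing turbine exergy destruction from the Gouy-Stodola theorem, with CoolProp as the property reference. The operating conditions are hypothetical, not drawn from the benchmark:

```python
from CoolProp.CoolProp import PropsSI

T0 = 298.15    # dead-state temperature [K]
m_dot = 10.0   # mass flow rate [kg/s]

# Adiabatic steam turbine: inlet 8 MPa / 500 degC, outlet 50 kPa / quality 0.95
h_in  = PropsSI("H", "P", 8e6,  "T", 773.15, "Water")   # [J/kg]
s_in  = PropsSI("S", "P", 8e6,  "T", 773.15, "Water")   # [J/(kg*K)]
h_out = PropsSI("H", "P", 50e3, "Q", 0.95,   "Water")
s_out = PropsSI("S", "P", 50e3, "Q", 0.95,   "Water")

W_dot = m_dot * (h_in - h_out)   # shaft power [W]
S_gen = m_dot * (s_out - s_in)   # entropy generation rate [W/K]
E_dest = T0 * S_gen              # Gouy-Stodola: exergy destruction [W]

print(f"Power output:       {W_dot / 1e6:.2f} MW")
print(f"Exergy destruction: {E_dest / 1e6:.2f} MW")
```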
Benchmark Results
v0.4 performance across 6 analysis families — Base Qwen3-8B and v0.2 comparison
EGM is the weakest area: Entropy generation minimization shows the model struggles with abstract optimization reasoning.
100% score on AV and UN analysis families — avoidable/unavoidable exergy destruction decomposition is fully solved.
v0.4 performance across the 6 analysis families:
100% score on the AV and UN families — showing the model fully learned avoidable/unavoidable exergy destruction decomposition. These families offer clear formula frameworks that, combined with the JSON scaffold, produce consistent results.
ED (96.2%) and EE (94.5%) — exergy destruction calculation and efficiency evaluation are performed with high accuracy. These categories require direct application of standard thermodynamic formulas.
EI (93.5%) — Van Gool improvement potential, a metric combining efficiency and exergy destruction values. The model handles this multi-step calculation well.
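For reference, the van Gool improvement potential combines exactly these two quantities; in its standard form (not stated in the post itself):

```latex
% Van Gool improvement potential: the exergy destruction rate \dot{E}_D
% weighted by the component's inefficiency, where \varepsilon is the
% exergetic efficiency.
\dot{IP} = (1 - \varepsilon)\,\dot{E}_D
```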
EGM (72.0%) — the weakest area. Entropy generation minimization requires abstract optimization reasoning, unlike other categories. The model is consistent here but accuracy is lower — likely at the capacity limits of an 8B model.
Compared to Base Qwen3-8B, an average +54.7 point improvement; compared to v0.2, a +14.7 point increase.
Quality Control
Thermodynamic Quality Checks
Every generated sample passes through 8 physical validation checks
Data quality is the foundation of model quality. Of 1,500 generated examples, 1,235 passed all 8 thermodynamic validation checks (82.3% validity rate).
The 8 checks:
- Energy Balance (ΔE = Q − W) — first law consistency
- Mass Conservation (Σṁᵢₙ = Σṁₒᵤₜ) — inlet-outlet mass flow balance
- Entropy Generation (Ṡgen ≥ 0) — second law violation check
- Exergy Balance (Ėd ≥ 0) — negative exergy destruction check
- Temperature Range (T > 0 K) — physically meaningful temperatures
- Pressure Positivity (P > 0) — physically meaningful pressures
- Efficiency Bounds (0 ≤ η ≤ 1) — physically possible efficiency values
- Second Law Compliance (COP ≤ COP_Carnot) — Carnot limit not exceeded
The 265 failed examples (17.7%) typically failed on energy balance and entropy generation checks — showing that even Opus 4.6 occasionally produces inconsistent results in complex multi-step thermodynamic calculations.
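A minimal sketch of how checks like these can be implemented; the field names and tolerances are assumptions, not the project's actual validator:

```python
# Illustrative validator sketch: three of the eight checks, applied to a
# generated sample represented as a plain dict of SI-unit values.
ABS_TOL = 1e-6  # slack for floating-point round-off

def check_energy_balance(sample: dict, rel_tol: float = 0.01) -> bool:
    """First law: dE = Q - W, within a relative tolerance."""
    residual = sample["dE"] - (sample["Q"] - sample["W"])
    scale = max(abs(sample["Q"]), abs(sample["W"]), 1.0)
    return abs(residual) <= rel_tol * scale

def check_entropy_generation(sample: dict) -> bool:
    """Second law: entropy generation rate must be non-negative."""
    return sample["S_gen"] >= -ABS_TOL

def check_efficiency_bounds(sample: dict) -> bool:
    """Efficiency must lie in the physically possible range [0, 1]."""
    return -ABS_TOL <= sample["eta"] <= 1.0 + ABS_TOL

CHECKS = [check_energy_balance, check_entropy_generation, check_efficiency_bounds]

def is_valid(sample: dict) -> bool:
    """A sample survives only if every check passes."""
    return all(check(sample) for check in CHECKS)
```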
Key Findings
The most important insights from the Entropy Hunter v0.4 development process
JSON = Reasoning Scaffold
When structured JSON output was removed, score dropped from 92.7% to 52.4%. JSON format forces step-by-step thinking and improves consistency in exergy calculations. Not just output format — it's a reasoning scaffold.
$216 Total Cost
Entire pipeline cost only $216: data generation (Opus 4.6 API) ~$180, fine-tuning (GPU) ~$28, evaluation ~$8. Frontier-model-level domain performance achievable on a startup budget.
Consumer GPU Sufficient
LoRA fine-tuning completed on a single consumer GPU (24GB VRAM) with 4-bit quantization. No full-parameter training needed to bring an 8B model to industrial-grade performance.
Thinking Mode Off
The fine-tuned model performed better with Qwen3-8B's thinking/reasoning mode disabled. Extra reasoning steps conflict with the JSON scaffold, reducing accuracy.
T=0.7 Optimal
Temperature 0.7 provided the best balance. T=0.0 is deterministic but narrow, T=1.0 is diverse but noisy. 0.7 is the sweet spot between thermodynamic terminology diversity and calculation consistency.
EGM at the Frontier
Entropy generation minimization (EGM) at 72% is the weakest area. EGM requires abstract optimization reasoning — at the limits of an 8B model's capacity. May require larger models or specialized training data.
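The two inference-side findings (thinking mode off, T=0.7) amount to a small configuration change at generation time. A minimal sketch, assuming the adapter name from the Resources section; `enable_thinking` is Qwen3's documented chat-template flag, and the remaining generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(base, "olivenet/entropy-hunter-v0.4")

messages = [{"role": "user", "content": "Compute the exergy destruction of ..."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # finding: thinking mode hurts the fine-tuned model
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512,
                         do_sample=True, temperature=0.7)  # finding: T=0.7
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```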
Known Limitations
Known limitations of Entropy Hunter v0.4:
- JSON output dependency: The model performs poorly without structured JSON and cannot perform exergy analysis in free-text format — this is an 8B model limitation.
- Arithmetic variance: The same question can produce ±2-3% different numerical results across runs. Use T=0.0 for deterministic results, but this reduces diversity.
- EGM weakness: Entropy generation minimization at 72% is the weakest area. Abstract optimization reasoning is at the limits of 8B model capacity.
- Steam table approximation: The model uses approximate property values learned from training data rather than precise databases like CoolProp or REFPROP. It stays within the ±2% tolerance, but CoolProp reference values should be preferred for engineering design.
Methodology
- Base model: Qwen3-8B (Qwen/Qwen3-8B)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- LoRA configuration: rank=64, alpha=128, dropout=0.05
- Quantization: 4-bit (QLoRA, bitsandbytes)
- Hardware: Single consumer GPU, 24GB VRAM
- Training time: ~4 hours
- Training data: 1,235 validated examples (from 1,500 generated)
- Data generation: Claude Opus 4.6 API
- Reference library: CoolProp 7.2.0
- Tolerance: ±2% (industrial engineering standard)
- Total cost: $216 (data generation ~$180, fine-tuning ~$28, evaluation ~$8)
- Thinking mode: Disabled (better performance on fine-tuned version)
- Temperature: 0.7 (optimal balance)
- Benchmark: 6 analysis families (ED, EE, AV, UN, EI, EGM)
Resources
- Model: HuggingFace — olivenet/entropy-hunter-v0.4
- Source code: GitHub — olivenet-iot/entropy-hunter
- CoolProp: coolprop.org
- Qwen3-8B: HuggingFace — Qwen/Qwen3-8B
- LoRA: HuggingFace PEFT
About the Author
Olivenet Team
IoT & Automation Experts
Technology team providing industrial IoT, smart farming, and energy monitoring solutions in Northern Cyprus and Turkey.