Entropy Hunter: Fine-Tuning Qwen3-8B for Industrial Exergy Analysis
Can an 8-billion parameter model perform exergy analysis on industrial equipment? Entropy Hunter v0.4 proves it can — achieving a 92.7% benchmark score (Grade A-) via LoRA fine-tuning on Qwen3-8B for a total cost of just $216. This post covers the role of JSON as a reasoning scaffold, the 8 thermodynamic validation checks, and exergy analysis across 7 equipment types.
Olivenet Team
IoT & Automation Experts
Large language models produce impressive results on general tasks. But can an 8-billion parameter model perform exergy analysis on industrial thermodynamic equipment? On a consumer GPU, with a startup budget?
Entropy Hunter was built to answer this question. It's a domain-specific LLM specialized for industrial equipment exergy analysis via LoRA fine-tuning on Qwen3-8B. At v0.4, it achieved a 92.7% benchmark score (Grade A-) at a $216 total cost.
What Is Exergy Analysis?
Exergy is the maximum useful work a system can deliver as it comes into equilibrium with its surroundings — its "usable energy". While energy is conserved, exergy is not — it is destroyed in every real process due to irreversibilities.
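For a steady-flow stream, with kinetic and potential terms neglected, the standard textbook expression for specific flow exergy makes this precise (added here for reference; it is not stated in the original post):

```latex
% Specific flow exergy of a stream: h and s are the stream's specific
% enthalpy and entropy; subscript 0 denotes the dead state, i.e.
% equilibrium with the surroundings at T_0, P_0.
\psi = (h - h_0) - T_0\,(s - s_0)
```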
In industrial plants, exergy analysis reveals which equipment causes how much loss, how much of that loss is avoidable, and where improvement potential is concentrated. Traditional energy analysis answers "how much energy is there?" while exergy analysis answers "how much of that energy is actually usable?"
Version Evolution
From Base Qwen3-8B to v0.4: 4 iterations, 1 critical failure, 1 fundamental insight
- Base Qwen3-8B (Grade F): No fine-tuning, zero-shot
- v0.1 (Grade D): Initial LoRA fine-tuning
- v0.2 (Grade C+): Expanded dataset
- v0.3 (Grade F): JSON-free experiment (failed)
- v0.4 (Grade A-): Final — full pipeline
v0.3 JSON-free experiment: When structured JSON output was removed, score dropped to 52.4%. JSON format isn't just output — it serves as a reasoning scaffold.
JSON = Reasoning Scaffold. Structured output format forces the model to think step-by-step and improves consistency in exergy calculations.
The path from Base Qwen3-8B to v0.4 was not a straight line. Each version tested a different strategy, and the most critical lesson came from the v0.3 failure.
v0.1 (200 examples) applied initial fine-tuning — the model began learning exergy terminology but calculation consistency was low. v0.2 (600 examples) expanded the dataset, bringing the score to 78%.
v0.3 experimented by removing JSON output format — the model was forced to respond in free text. The result was dramatic: score dropped to 52.4%. This failure revealed that JSON format wasn't just output structure — it served as a scaffold for step-by-step reasoning. Each JSON field forced the model to complete a specific calculation step.
v0.4 applied the full pipeline with this insight: JSON scaffold preserved, quality control pipeline added, and trained on 1,235 validated examples. Result: 92.7%, Grade A-.
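For illustration, the kind of structured response the scaffold enforces might look like the following (field names and values are hypothetical, not the project's published schema):

```json
{
  "equipment": "steam_turbine",
  "inlet_state":  {"T_K": 773.15, "P_kPa": 8000.0},
  "outlet_state": {"T_K": 354.5,  "P_kPa": 50.0},
  "entropy_generation_kW_per_K": 5.42,
  "exergy_destruction_kW": 1616.0,
  "exergetic_efficiency": 0.84
}
```

Each field corresponds to one calculation step, so a skipped step surfaces as a missing or inconsistent field rather than a silently wrong free-text answer.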
Training Pipeline
5-stage pipeline: From raw taxonomy to evaluation
Taxonomy
7 equipment types, 48 subtypes, 6 analysis families defined
Data Generation
Synthetic training data via Claude Opus 4.6 — structured Q&A in JSON format
Quality Control
8 thermodynamic validation checks — energy balance, entropy generation, efficiency bounds
LoRA Fine-Tuning
LoRA adapters on Qwen3-8B — rank 64, alpha 128, trained on consumer GPU
Evaluation
Automated benchmark across 6 analysis families — compared against CoolProp reference
The pipeline consists of 5 stages:
- Taxonomy: 7 equipment types, 48 subtypes, and 6 analysis families were systematically defined. This taxonomy ensured the training data comprehensively represented industrial scenarios.
- Data Generation: 1,500 structured Q&A examples were generated via Claude Opus 4.6. Each example contained temperature, pressure, and flow rate values simulating real industrial conditions.
- Quality Control: 8 thermodynamic validation checks were applied. Energy balance, mass conservation, entropy generation, and other physical constraints were verified for every example.
- LoRA Fine-Tuning: The 1,235 validated examples were used to train LoRA adapters on Qwen3-8B (rank 64, alpha 128) on a single consumer GPU (24GB VRAM) with 4-bit quantization; see the sketch after this list.
- Evaluation: Automated benchmark across 6 analysis families. Comparison against CoolProp reference values with ±2% tolerance.
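As a rough sketch of the fine-tuning stage, the setup with HuggingFace PEFT and bitsandbytes might look like this, using the hyperparameters reported above; the target modules and compute dtype are assumptions, not published details:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization (QLoRA) so the 8B base model fits in 24GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapter with the reported rank/alpha/dropout.
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trained
```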
Equipment & Analysis Coverage
7 equipment types × 48 subtypes × 6 analysis families — real industrial scenarios
Equipment Types
- Turbine: 7 subtypes
- Compressor: 6 subtypes
- Pump: 6 subtypes
- Heat Exchanger: 7 subtypes
- Boiler: 5 subtypes
- Mixing Chamber: 6 subtypes
- Nozzle / Diffuser: 11 subtypes
Entropy Hunter covers 7 core industrial equipment types: turbine, compressor, pump, heat exchanger, boiler, mixing chamber, and nozzle/diffuser. These equipment types are broken down into 48 subtypes, reflecting real industrial configurations like steam turbines, centrifugal compressors, and plate heat exchangers.
The 6 analysis families cover different aspects of exergy analysis:
- ED (Exergy Destruction): Irreversibility-driven exergy loss (see the worked example after this list)
- EE (Exergetic Efficiency): Component exergy utilization effectiveness
- AV (Avoidable): Exergy destruction share with improvement potential
- UN (Unavoidable): Minimum destruction within technological limits
- EI (Improvement Potential): Van Gool improvement potential calculation
- EGM (Entropy Generation Minimization): Optimization via Bejan's EGM method
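As a concrete illustration of the ED family, here is a small worked example computing turbine exergy destruction from the Gouy-Stodola theorem, with CoolProp as the property reference. The operating conditions are hypothetical, not drawn from the benchmark:

```python
from CoolProp.CoolProp import PropsSI

T0 = 298.15    # dead-state temperature [K]
m_dot = 10.0   # mass flow rate [kg/s]

# Adiabatic steam turbine: inlet 8 MPa / 500 degC, outlet 50 kPa / quality 0.95
h_in  = PropsSI("H", "P", 8e6,  "T", 773.15, "Water")   # [J/kg]
s_in  = PropsSI("S", "P", 8e6,  "T", 773.15, "Water")   # [J/(kg*K)]
h_out = PropsSI("H", "P", 50e3, "Q", 0.95,   "Water")
s_out = PropsSI("S", "P", 50e3, "Q", 0.95,   "Water")

W_dot = m_dot * (h_in - h_out)   # shaft power [W]
S_gen = m_dot * (s_out - s_in)   # entropy generation rate [W/K]
E_dest = T0 * S_gen              # Gouy-Stodola: exergy destruction [W]

print(f"Power output:       {W_dot / 1e6:.2f} MW")
print(f"Exergy destruction: {E_dest / 1e6:.2f} MW")
```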
Benchmark Results
v0.4 performance across 6 analysis families — Base Qwen3-8B and v0.2 comparison
EGM is the weakest area: Entropy generation minimization shows the model struggles with abstract optimization reasoning.
100% score on AV and UN analysis families — avoidable/unavoidable exergy destruction decomposition is fully solved.
v0.4 performance across the 6 analysis families:
100% score on the AV and UN families — showing the model fully learned avoidable/unavoidable exergy destruction decomposition. These families offer clear formula frameworks that, combined with the JSON scaffold, produce consistent results.
ED (96.2%) and EE (94.5%) — exergy destruction calculation and efficiency evaluation are performed with high accuracy. These categories require direct application of standard thermodynamic formulas.
EI (93.5%) — Van Gool improvement potential, a metric combining efficiency and exergy destruction values. The model handles this multi-step calculation well.
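For reference, the van Gool improvement potential combines exactly these two quantities; in its standard form (not stated in the post itself):

```latex
% Van Gool improvement potential: the exergy destruction rate \dot{E}_D
% weighted by the component's inefficiency, where \varepsilon is the
% exergetic efficiency.
\dot{IP} = (1 - \varepsilon)\,\dot{E}_D
```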
EGM (72.0%) — the weakest area. Entropy generation minimization requires abstract optimization reasoning, unlike other categories. The model is consistent here but accuracy is lower — likely at the capacity limits of an 8B model.
Compared to Base Qwen3-8B, an average +54.7 point improvement; compared to v0.2, a +14.7 point increase.
Quality Control
Thermodynamic Quality Checks
Every generated sample passes through 8 physical validation checks
Data quality is the foundation of model quality. Of 1,500 generated examples, 1,235 passed all 8 thermodynamic validation checks (82.3% validity rate).
The 8 checks:
- Energy Balance (ΔE = Q − W) — first law consistency
- Mass Conservation (Σṁᵢₙ = Σṁₒᵤₜ) — inlet-outlet mass flow balance
- Entropy Generation (Ṡgen ≥ 0) — second law violation check
- Exergy Balance (Ėd ≥ 0) — negative exergy destruction check
- Temperature Range (T > 0 K) — physically meaningful temperatures
- Pressure Positivity (P > 0) — physically meaningful pressures
- Efficiency Bounds (0 ≤ η ≤ 1) — physically possible efficiency values
- Second Law Compliance (COP ≤ COP_Carnot) — Carnot limit not exceeded
The 265 failed examples (17.7%) typically failed on energy balance and entropy generation checks — showing that even Opus 4.6 occasionally produces inconsistent results in complex multi-step thermodynamic calculations.
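A minimal sketch of how checks like these can be implemented; the field names and tolerances are assumptions, not the project's actual validator:

```python
# Illustrative validator sketch: three of the eight checks, applied to a
# generated sample represented as a plain dict of SI-unit values.
ABS_TOL = 1e-6  # slack for floating-point round-off

def check_energy_balance(sample: dict, rel_tol: float = 0.01) -> bool:
    """First law: dE = Q - W, within a relative tolerance."""
    residual = sample["dE"] - (sample["Q"] - sample["W"])
    scale = max(abs(sample["Q"]), abs(sample["W"]), 1.0)
    return abs(residual) <= rel_tol * scale

def check_entropy_generation(sample: dict) -> bool:
    """Second law: entropy generation rate must be non-negative."""
    return sample["S_gen"] >= -ABS_TOL

def check_efficiency_bounds(sample: dict) -> bool:
    """Efficiency must lie in the physically possible range [0, 1]."""
    return -ABS_TOL <= sample["eta"] <= 1.0 + ABS_TOL

CHECKS = [check_energy_balance, check_entropy_generation, check_efficiency_bounds]

def is_valid(sample: dict) -> bool:
    """A sample survives only if every check passes."""
    return all(check(sample) for check in CHECKS)
```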
Key Findings
The most important insights from the Entropy Hunter v0.4 development process
JSON = Reasoning Scaffold
When structured JSON output was removed, score dropped from 92.7% to 52.4%. JSON format forces step-by-step thinking and improves consistency in exergy calculations. Not just output format — it's a reasoning scaffold.
$216 Total Cost
Entire pipeline cost only $216: data generation (Opus 4.6 API) ~$180, fine-tuning (GPU) ~$28, evaluation ~$8. Frontier-model-level domain performance achievable on a startup budget.
Consumer GPU Sufficient
LoRA fine-tuning completed on a single consumer GPU (24GB VRAM) with 4-bit quantization. No full-parameter training needed to bring an 8B model to industrial-grade performance.
Thinking Mode Off
The fine-tuned model performed better with Qwen3-8B's thinking/reasoning mode disabled. Extra reasoning steps conflict with the JSON scaffold, reducing accuracy.
T=0.7 Optimal
Temperature 0.7 provided the best balance. T=0.0 is deterministic but narrow, T=1.0 is diverse but noisy. 0.7 is the sweet spot between thermodynamic terminology diversity and calculation consistency.
EGM at the Frontier
Entropy generation minimization (EGM) at 72% is the weakest area. EGM requires abstract optimization reasoning — at the limits of an 8B model's capacity. May require larger models or specialized training data.
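The two inference-side findings (thinking mode off, T=0.7) amount to a small configuration change at generation time. A minimal sketch, assuming the adapter name from the Resources section; `enable_thinking` is Qwen3's documented chat-template flag, and the remaining generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(base, "olivenet/entropy-hunter-v0.4")

messages = [{"role": "user", "content": "Compute the exergy destruction of ..."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # finding: thinking mode hurts the fine-tuned model
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512,
                         do_sample=True, temperature=0.7)  # finding: T=0.7
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```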
Known Limitations
Known limitations of Entropy Hunter v0.4:
- JSON output dependency: The model performs poorly without structured JSON and cannot perform exergy analysis in free-text format — this is an 8B model limitation.
- Arithmetic variance: The same question can produce ±2-3% different numerical results across runs. Use T=0.0 for deterministic results, but this reduces diversity.
- EGM weakness: Entropy generation minimization at 72% is the weakest area. Abstract optimization reasoning is at the limits of 8B model capacity.
- Steam table approximation: The model uses approximate property values learned from training data rather than precise databases like CoolProp or REFPROP. It stays within the ±2% tolerance, but CoolProp reference values should be preferred for engineering design.
Methodology
- Base model: Qwen3-8B (Qwen/Qwen3-8B)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- LoRA configuration: rank=64, alpha=128, dropout=0.05
- Quantization: 4-bit (QLoRA, bitsandbytes)
- Hardware: Single consumer GPU, 24GB VRAM
- Training time: ~4 hours
- Training data: 1,235 validated examples (from 1,500 generated)
- Data generation: Claude Opus 4.6 API
- Reference library: CoolProp 7.2.0
- Tolerance: ±2% (industrial engineering standard)
- Total cost: $216 (data generation ~$180, fine-tuning ~$28, evaluation ~$8)
- Thinking mode: Disabled (better performance on fine-tuned version)
- Temperature: 0.7 (optimal balance)
- Benchmark: 6 analysis families (ED, EE, AV, UN, EI, EGM)
Resources
- Model: HuggingFace — olivenet/entropy-hunter-v0.4
- Source code: GitHub — olivenet-iot/entropy-hunter
- CoolProp: coolprop.org
- Qwen3-8B: HuggingFace — Qwen/Qwen3-8B
- LoRA: HuggingFace PEFT
About the Author
Olivenet Team
IoT & Automation Experts
Technology team providing industrial IoT, smart farming, and energy monitoring solutions in Northern Cyprus and Turkey.