跳到主要内容
基于机器学习的生态组合塘强化城市污水处理厂脱氮优化 | 极客日志
Python AI 算法
基于机器学习的生态组合塘强化城市污水处理厂脱氮优化 综述由AI生成 一项利用机器学习优化城市污水处理厂脱氮的研究。针对生态组合塘(ECPs)工艺在进水水质波动下难以动态调整的问题,研究采集三年运行数据,采用 XGBoost 模型预测出水总氮浓度。通过 SHAP 分析和部分依赖图(PDP)识别关键参数,结合 NSGA-II 和 TOPSIS 算法进行多目标优化。最终开发了图形用户界面(GUI),实现了出水总氮、能耗及外加碳源用量的同步降低,年碳减排量显著,为污水处理厂的节能与达标排放提供了有效方案。
路由之心 发布于 2026/4/5 更新于 2026/5/26 37 浏览论文信息
题目: Machine learning-based optimization of enhanced nitrogen removal in a full-scale urban wastewater treatment plant with ecological combination ponds。
期刊: Water Research https://doi.org/10.1016/j.watres.2025.123976
摘要
【背景与痛点】在中国小城镇采用生态组合塘(ECPs)工艺的城镇污水处理厂,正面临进水水质波动下难以动态调整运行参数的困境。为满足严格的总氮(TN)排放标准,常出现曝气过量、外加碳源投加过度的问题。
【研究目的与方法】为解决这些难题,亟需为采用生态组合塘的污水处理厂建立适宜的总氮强化去除模型。本研究采集了某采用生态组合塘工艺的全尺度城镇污水处理厂三年的运行数据,通过可解释性机器学习方法对出水总氮浓度进行预测与优化。
其中,XGBoost 模型在训练集和测试集的决定系数(R²)分别达到 0.997 和 0.911,均方根误差(RMSE)分别为 0.196 和 1.283。借助 SHAP 分析与部分依赖图,本研究确定了优化的运行参数:在提升总氮去除效果的同时,实现了能耗与化学需氧量(COD)碳源投加量的平衡削减。研究还开发了图形用户界面,以支持工艺运行参数的持续预测与协同优化,最终达成了出水总氮、能耗及外加碳源用量的同步降低——其中出水总氮浓度下降 17.50%,年 COD 碳源投加量减少 33.29%。值得注意的是,采用生态组合塘工艺的污水处理厂展现出显著的碳减排潜力:仅通过强化总氮去除、降低能耗与碳源投加量,即可实现年碳减排量 788.40 吨二氧化碳。
材料与方法
生态组合塘运行概况
该大型城市污水处理厂位于中国浙江,采用生态组合塘(ECPs)工艺,设计处理能力为 60,000 m³/d。整个生态组合塘系统分为 13 个区域,各区平均水深为 8.8 m,具体构造包括:第一好氧区(1–5 区)、缺氧区(6–8 区)、第二好氧区(9–12 区)以及沉淀区(13 区)。运行过程中,在第一缺氧区(6 区)和第二缺氧区(7 区)添加了一种聚合物外加碳源,并将混合液从沉淀区回流至第一缺氧区(6 区)。
数据集信息
本研究收集了一座采用生态组合塘(ECPs)工艺的大型城市污水处理厂的日常运行数据,数据集包含自 2021 年 1 月 1 日至 2023 年 12 月 31 日期间共 1095 组时间序列数据。该厂系统监测了包括废水温度、进水 pH 值及出水化学需氧量(COD)浓度在内的 16 项关键运行参数,并通过严格的特征选择方案,确定了 13 个作为后续预测模型驱动因素的重要参数,具体包括:进水流量(Flowrateinf)、进水氨氮(NH₄⁺-N_inf)、进水总氮(TN_inf)、5 号区溶解氧(DO_5)、5 号区总氮(TN_5)、6 号区化学需氧量(COD_6)、6 号区总氮(TN_6)、7 号区溶解氧(DO_7)、7 号区化学需氧量(COD_7)、7 号区总氮(TN_7)、外加碳源(ECS)、能耗(EC)以及出水总氮(TN_eff)。
此外,研究采用 KNN 算法对采集的数据集进行了缺失值处理与离群值清洗,剔除了占总数据集不足 1% 的包含离群值的记录。
参数分类 参数名称 (中) 英文缩写 备注 / 功能说明 进水特征 进水流量 Flowrate_inf 反映污水处理厂的实时水力负荷 进水氨氮 NH₄⁺-N_inf 硝化过程的主要底物来源 进水总氮
过程监控 5 区溶解氧 DO_5 第一好氧区末端的充氧状态
(中间分区) 5 区总氮 TN_5 第一好氧区后的氮素残留情况
7 区溶解氧 DO_7 第二缺氧区的环境状态 (需维持低 DO)
7 区化学需氧量 COD_7 第二缺氧区用于反硝化的有效碳源
操作/控制 外部碳源投加量 ECS 人为干预以增强反硝化的关键手段
预测目标 出水总氮 TN_eff 核心预测变量 ,衡量排放是否达标
模型开发与评估 在模型开发阶段,预处理后的数据集按 8:2 的比例划分为训练集(80%)和测试集(20%)。为提升模型的泛化能力,研究首先通过 Pearson 相关性分析剔除预测增益低且存在共线性的特征;同时,引入 SHAP(SHapley Additive exPlanations)值系统评估包括出水总氮在内的 16 个初始特征的重要性,剔除冗余项。基于对特征显著性、相关性及模型复杂度的综合平衡,最终构建了包含出水总氮在内的 13 个关键特征的精简数据集。
模型开发在 Anaconda 平台的 JupyterLab 环境中基于 Python 3.8 完成,并采用提前停止(Early Stopping)策略防止过拟合。通过网格搜索(Grid Search)进行超参数优化,并结合 5 折交叉验证评估模型性能以降低结果方差。评价指标选用决定系数和均方根误差。研究对比了多元线性回归(MLR)、支持向量机(SVM)、轻量级梯度提升机(LightGBM)、随机森林(RF)、极端梯度提升(XGBoost)和类别特征提升(CatBoost)六种机器学习算法。
结果与讨论
基于出水总氮(TN)浓度的模型评估与比较 结果显示,MLR 的预测能力有限(测试集 R² = 0.459),而其余五种非线性模型的性能表现优异,测试集 R² 均在 0.775 至 0.829 之间。其中,XGBoost 模型表现最为突出,训练集准确率接近完美(R² = 0.999,RMSE = 0.073),但在测试集上的泛化能力略弱(R² = 0.829)。
随后,研究通过 SHAP 分析对特征贡献度进行了量化。结果发现,进水 pH 值和进水氨氮(NH₄⁺-N)的特征重要性最低,将其判定为冗余特征并予以剔除;而出水 COD 浓度和废水温度也被剔除以降低模型复杂度。
经过特征修剪后,对各模型进行了重新训练。优化后的 XGBoost 模型性能显著提升,训练集和测试集的 R² 分别达到 0.997 和 0.911,RMSE 分别为 0.196 和 1.283。结果表明,由 SHAP 特征重要性评估引导的降维策略极大地改善了模型表现。
基于 SHAP 和 PDP 的模型灵敏度分析 研究采用 SHAP 和 PDP(部分依赖图)方法分析特征重要性,阐明 XGBoost 模型的预测机制,并识别影响出水总氮浓度的关键参数。结果显示,TN_7、EC(能耗)、DO_5、TN_6、进水流量以及 DO_7 是决定出水总氮浓度的最重要特征。
为了进一步探究特征对模型输出的具体影响,研究利用 PDP 考察了单特征及多特征交互作用对出水总氮的影响。出水总氮受硝化和反硝化过程的共同调节。DO_5 部分依赖图显示,维持较高的 DO_5 浓度有利于提高总氮去除效率;然而,由于该工艺中 5 号区的曝气兼具搅拌功能,导致其 DO_5 浓度时常超过 10 mg/L,这凸显了利用机器学习优化 DO_5 以降低能耗的必要性。
DO_7 与出水总氮浓度呈正相关。这是因为 7 号区为缺氧区,较高的溶解氧会干扰反硝化效率。同时,COD_6 和 COD_7 的 PDP 结果表明,适量的 COD 投加能增强脱氮性能,但过量投加则会增加废水负荷与化学品消耗,反而对反硝化产生不利影响。
在多特征交互分析方面,较低的 DO_7 与较高的 DO_5 结合可实现更高的脱氮效率。综上所述,强化脱氮效率的最优运行区间为:DO_5 控制在 5.8–8.8 mg/L,DO_7 在 0.3–2.8 mg/L,COD_6 在 28.0–35.5 mg/L,COD_7 在 0–26.4 mg/L,且 ECS 投加量控制在 1.9–2.3 m³/d。
识别并排序影响脱氮性能的关键参数 污水处理厂的生态组合塘(ECPs)旨在通过好氧硝化和缺氧反硝化实现脱氮。如图 3 所示,TN_7、EC(能耗)和 DO_5 是影响脱氮效率的三大核心参数。由于 5 号区作为最后的好氧区且溶解氧(DO)未受控制,导致后续的 6 号区(第一缺氧区)DO 浓度升高,即便添加外加碳源也难以发生反硝化作用。相比之下,7 号区(第二缺氧区)通过控制 DO 浓度并添加外加碳源,促进了显著的反硝化过程,因此 TN_7 成为影响出水总氮浓度最关键的参数。
此外,5 号区和 7 号区不受控的 DO 浓度导致了过度曝气,从而增加了电力消耗;同时,在进水水质波动的情况下,外加碳源的不规范投加也造成了化学品的过量使用,这两者共同推高了整体能耗,使 EC 成为影响脱氮效率的第二大关键参数。
强化脱氮与平衡能耗及 COD 投药量的参数优化 机器学习模型的研究结果强调了优化工艺参数以同步降低出水总氮(TN_eff)浓度、能耗(EC)和 COD 投药量的重要性。为了便于实际工程应用,本研究开发了一个集成 TN_eff、EC 和 COD 投药量模型的图形用户界面(GUI)。
该 GUI 基于 PDP 分析推导出的最优控制参数区间,并集成了简化版的 NSGA-II(带策略非支配排序遗传算法)和 TOPSIS(逼近理想解排序法)算法。通过该优化控制策略,用户可以灵活调整关键运行参数;在输入进水水质数据及核心工艺参数后,GUI 能够自动预测并优化出水总氮、能耗和外加碳源(ECS)的使用。
建议根据 PDP 识别出的区间进行工艺优化。例如,通过优化关键参数,出水总氮浓度从 14.29 mg/L 降至 11.59 mg/L,污水处理厂能耗每天减少了 128.66 kWh,COD 投加浓度降低了 34.29 mg/L,对应外加碳源投加量从 3.72 m³/d 降至 2.25 m³/d。全年来看,出水总氮浓度平均下降了 17.50%,而 COD 投药量减少了 33.29%。
代码复现
模型开发与可视化
from sklearn.model_selection import KFold, GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
import pickle
from xgboost import XGBRegressor
import shap
file_path = r'data.csv'
df = pd.read_csv(file_path, encoding='GBK' )
X = df.drop(columns='Effluent TN' )
y = df['Effluent TN' ]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20 , random_state=42 )
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
with open ('scaler.pkl' , 'wb' ) as f:
pickle.dump(scaler, f)
param_grid = {
'n_estimators' : [100 , 300 , 500 , 1000 ],
'max_depth' : [3 , 5 , 7 , 9 , 12 ],
'learning_rate' : [0.01 , 0.05 , 0.1 , 0.2 ],
'subsample' : [0.6 , 0.8 , 1.0 ],
'colsample_bytree' : [0.6 , 0.8 , 1.0 ],
'gamma' : [0 , 1 , 5 ],
'reg_alpha' : [0 , 0.1 , 1 ],
'reg_lambda' : [1 , 1.5 , 2 ]
}
base_model = XGBRegressor(objective='reg:squarederror' , random_state=42 )
grid_search = GridSearchCV(
estimator=base_model,
param_grid=param_grid,
scoring='neg_mean_squared_error' ,
cv=5 ,
verbose=2 ,
n_jobs=-1
)
grid_search.fit(X_train_scaled, y_train)
best_params = grid_search.best_params_
print ("\n找到的最佳参数组合:" )
print (best_params)
xgboost_model = XGBRegressor(**best_params, objective='reg:squarederror' , random_state=42 )
kf = KFold(n_splits=5 , shuffle=True , random_state=42 )
train_r2_scores, train_rmse_scores = [], []
test_r2_scores, test_rmse_scores = [], []
all_y_train_pred, all_y_test_pred = [], []
all_y_train_true, all_y_test_true = [], []
for train_index, test_index in kf.split(X_train_scaled):
X_train_fold, X_test_fold = X_train_scaled[train_index], X_train_scaled[test_index]
y_train_fold, y_test_fold = y_train.iloc[train_index], y_train.iloc[test_index]
X_train_part, X_val_part, y_train_part, y_val_part = train_test_split(
X_train_fold, y_train_fold, test_size=0.2 , random_state=42 )
xgboost_model.fit(X_train_part, y_train_part, eval_set=[(X_val_part, y_val_part)], early_stopping_rounds=50 , verbose=False )
best_iteration = xgboost_model.best_iteration
xgboost_model.n_estimators = best_iteration
xgboost_model.fit(X_train_fold, y_train_fold)
y_train_pred = xgboost_model.predict(X_train_fold)
y_test_pred = xgboost_model.predict(X_test_fold)
all_y_train_pred.extend(y_train_pred)
all_y_test_pred.extend(y_test_pred)
all_y_train_true.extend(y_train_fold.values)
all_y_test_true.extend(y_test_fold.values)
train_r2_scores.append(r2_score(y_train_fold, y_train_pred))
train_rmse_scores.append(np.sqrt(mean_squared_error(y_train_fold, y_train_pred)))
test_r2_scores.append(r2_score(y_test_fold, y_test_pred))
test_rmse_scores.append(np.sqrt(mean_squared_error(y_test_fold, y_test_pred)))
print (f"\n训练集平均结果:R²: {np.mean(train_r2_scores):.3 f} , RMSE: {np.mean(train_rmse_scores):.3 f} " )
print (f"测试集平均结果:R²: {np.mean(test_r2_scores):.3 f} , RMSE: {np.mean(test_rmse_scores):.3 f} " )
plt.figure(figsize=(10 , 6 ), dpi=600 )
plt.scatter(all_y_train_true, all_y_train_pred, edgecolors='black' , c='darkgreen' , marker='^' , s=100 , alpha=0.6 , label='训练集' )
plt.scatter(all_y_test_true, all_y_test_pred, edgecolors='black' , c='lightyellow' , marker='o' , s=100 , alpha=0.6 , label='测试集' )
X_plot = np.array(all_y_test_true).reshape(-1 , 1 )
y_plot_pred = np.array(all_y_test_pred)
sorted_idx = np.argsort(X_plot.flatten())
X_sorted = X_plot[sorted_idx]
y_sorted = y_plot_pred[sorted_idx]
linear_model = sm.OLS(y_sorted, sm.add_constant(X_sorted)).fit()
y_pred_line = linear_model.predict(sm.add_constant(X_sorted))
predictions_summary = linear_model.get_prediction(sm.add_constant(X_sorted)).summary_frame(alpha=0.05 )
plt.plot(X_sorted, y_pred_line, color='red' , linewidth=2 , label='拟合回归线' )
plt.fill_between(X_sorted.flatten(), predictions_summary['obs_ci_lower' ], predictions_summary['obs_ci_upper' ], color='lightpink' , alpha=0.3 , label='95% 置信区间' )
plt.plot([0 , 17.5 ], [0 , 17.5 ], c='blue' , linestyle='--' , label='理想线 (y=x)' )
plt.legend(); plt.show()
explainer = shap.Explainer(xgboost_model, X_train_scaled)
shap_values = explainer(X_train_scaled)
shap.summary_plot(shap_values, X_train_scaled, feature_names=X_train.columns)
模型可解释性
SHAP 可解释性
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor
import numpy as np
import matplotlib.pyplot as plt
import shap
file_path = r'D:\vscode-water\TNeff-GUI-master\Modelset\Data.csv'
df = pd.read_csv(file_path, encoding='GBK' )
X = df.drop(columns='Effluent TN' )
y = df['Effluent TN' ]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2 , random_state=42 )
xgboost_model = XGBRegressor(
objective='reg:squarederror' ,
n_estimators=1000 ,
max_depth=6 ,
learning_rate=0.01 ,
subsample=0.8 ,
colsample_bytree=0.8 ,
random_state=42
)
xgboost_model.fit(X_train, y_train)
feature_importances = xgboost_model.feature_importances_
sorted_idx = np.argsort(feature_importances)[::-1 ]
plt.figure(figsize=(12 , 8 ))
plt.bar(range (len (feature_importances)), feature_importances[sorted_idx], color='deepskyblue' , alpha=0.3 )
plt.xticks(range (len (feature_importances)), [X.columns[i] for i in sorted_idx], rotation=90 , fontsize=18 )
plt.title('Feature Importance' , fontsize=20 )
plt.xlabel('Features' , fontsize=20 )
plt.ylabel('Importance' , fontsize=20 )
plt.show()
explainer = shap.TreeExplainer(xgboost_model)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, feature_names=X.columns)
单变量的 PDP
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.inspection import partial_dependence
from scipy.interpolate import splev, splrep
file_path = r'D:\vscode-water\TNeff-GUI-master\Modelset\Data.csv'
df = pd.read_csv(file_path, encoding='GBK' )
X = df.drop(columns='Effluent TN' )
y = df['Effluent TN' ]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2 , random_state=42 )
xgboost_model = XGBRegressor(
objective='reg:squarederror' ,
n_estimators=1000 ,
max_depth=6 ,
learning_rate=0.01 ,
subsample=0.8 ,
colsample_bytree=0.8 ,
random_state=42
)
xgboost_model.fit(X_train, y_train)
features_to_plot = [
'External Carbon Source' , 'COD of Zone 6' , 'COD of Zone 7' ,
'DO of Zone 7' , 'DO of Zone 5' , 'Influent NH4+-N' ,
'Influent TN' , 'Influent Flowrate' , 'TN of Zone 5' ,
'TN of Zone 6' , 'TN of Zone 7' , 'Energy Consumption'
]
sns.set_theme(style="ticks" , palette="deep" , font_scale=1.1 )
def plot_pdp (feature ):
pdp = partial_dependence(xgboost_model, X_train, [feature], kind="both" , grid_resolution=50 )
plot_x = pd.Series(pdp.grid_values[0 ]).rename('x' )
plot_y = pd.Series(pdp.average[0 ]).rename('y' )
plot_i = pdp.individual[0 ]
tck = splrep(plot_x, plot_y, s=30 )
xnew = np.linspace(plot_x.min (), plot_x.max (), 300 )
ynew = splev(xnew, tck, der=0 )
fig, ax = plt.subplots(figsize=(8 , 6 ))
for a in plot_i:
a_series = pd.Series(a)
df_i = pd.concat([plot_x, a_series.rename('y' )], axis=1 )
sns.lineplot(data=df_i, x="x" , y="y" , color='k' , linewidth=1.5 , linestyle='--' , alpha=0.6 , ax=ax)
ax.plot(xnew, ynew, color='peru' , linewidth=2 , label='Smoothed PDP' )
std_error = np.std(plot_y) / np.sqrt(len (plot_y))
lower_bound = plot_y - 1.96 * std_error
upper_bound = plot_y + 1.96 * std_error
ax.fill_between(plot_x, lower_bound, upper_bound, color='khaki' , alpha=0.3 , label='95% CI' )
sns.rugplot(data=X_train.sample(100 ), x=feature, height=0.05 , color='k' , alpha=0.3 , ax=ax)
ax.set_ylabel('Partial Dependence' )
ax.set_xlabel(feature)
x_min = plot_x.min () - 0.1 *(plot_x.max () - plot_x.min ())
x_max = plot_x.max () + 0.1 *(plot_x.max () - plot_x.min ())
ax.set_xlim(x_min, x_max)
ax.legend()
plt.savefig(f'./pdpplot_{feature} .png' , dpi=900 , bbox_inches='tight' )
plt.show()
for feature in features_to_plot:
plot_pdp(feature)
双变量的 PDP
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.inspection import PartialDependenceDisplay
file_path = r'D:\vscode-water\TNeff-GUI-master\Modelset\Data.csv'
df = pd.read_csv(file_path, encoding='GBK' )
X = df.drop(columns='Effluent TN' )
y = df['Effluent TN' ]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2 , random_state=42 )
xgboost_model = XGBRegressor(
objective='reg:squarederror' ,
n_estimators=1000 ,
max_depth=6 ,
learning_rate=0.01 ,
subsample=0.8 ,
colsample_bytree=0.8 ,
random_state=42
)
xgboost_model.fit(X_train, y_train)
target_feature = 'External Carbon Source'
related_features = ['COD of Zone 6' , 'COD of Zone 7' ]
for feature in related_features:
if feature == target_feature: continue
feature_idx = X.columns.get_loc(feature)
target_idx = X.columns.get_loc(target_feature)
plt.figure(figsize=(8 , 6 ))
PartialDependenceDisplay.from_estimator(
xgboost_model, X_train, [(target_idx, feature_idx)],
feature_names=X.columns, kind='average'
)
plt.title(f'2D Partial Dependence: {target_feature} _feature vs {feature} ' )
plt.show()
TOPSIS import numpy as np
def topsis (data, weights ):
if len (data) == 0 : return np.array([])
if data.ndim == 1 : data = data.reshape(1 , -1 )
norm_data = (data - data.min (axis=0 )) / (data.max (axis=0 ) - data.min (axis=0 ) + 1e-10 )
weighted_data = norm_data * weights
ideal_best = np.min (weighted_data, axis=0 )
ideal_worst = np.max (weighted_data, axis=0 )
dist_best = np.sqrt(np.sum ((weighted_data - ideal_best) ** 2 , axis=1 ))
dist_worst = np.sqrt(np.sum ((weighted_data - ideal_worst) ** 2 , axis=1 ))
scores = dist_worst / (dist_best + dist_worst + 1e-10 )
return scores
data = np.array([
[12.5 , 300 , 50 ],
[10.0 , 450 , 80 ],
[8.5 , 600 , 120 ],
[11.0 , 350 , 60 ]
])
weights = np.array([0.5 , 0.3 , 0.2 ])
scores = topsis(data, weights)
print ("各策略的 TOPSIS 综合得分 (得分越高越优):" )
for i, score in enumerate (scores):
print (f"策略 {i+1 } : {score:.4 f} " )
best_index = np.argmax(scores)
print (f"\n最佳推荐方案是:策略 {best_index + 1 } " )
NSGA-II.py import numpy as np
import random
from deap import base, creator, tools, algorithms
import xgboost as xgb
import pandas as pd
from sklearn.preprocessing import StandardScaler
default_data = {
'Influent NH4+-N' : 29.6 ,
'Influent TN' : 35.9 ,
'Influent Flowrate' : 38808.4 ,
'DO of Zone 5' : 7.1 ,
'TN of Zone 5' : 22.2 ,
'COD of Zone 6' : 55.0 ,
'TN of Zone 6' : 13.8 ,
'DO of Zone 7' : 4.06 ,
'COD of Zone 7' : 31.9 ,
'TN of Zone 7' : 11.7 ,
'External Carbon Source' : 3.3
}
variable_ranges = [
(5.8 , 7.5 ),
(28.0 , 35.5 ),
(0.3 , 2.8 ),
(1.0 , 26.4 ),
(1.9 , 2.3 )
]
df = pd.DataFrame([default_data])
scaler = StandardScaler()
scaler.fit(df)
model_tn = xgb.Booster()
model_power = xgb.Booster()
prediction_cache = {}
def predict_outputs (data ):
if data.ndim == 1 : key = tuple (data); data = data.reshape(1 , -1 )
else : return model_power.predict(xgb.DMatrix(data)), model_tn.predict(xgb.DMatrix(data))
if key in prediction_cache:
return np.array([prediction_cache[key][0 ]]), np.array([prediction_cache[key][1 ]])
dmatrix = xgb.DMatrix(data)
pred_energy = model_power.predict(dmatrix)
pred_tn = model_tn.predict(dmatrix)
prediction_cache[key] = (pred_energy[0 ], pred_tn[0 ])
return pred_energy, pred_tn
def evaluate (individual, fixed_water_params, true_energy, true_tn ):
full_input = np.concatenate([fixed_water_params, individual])
energy, tn = predict_outputs(np.array(full_input))
delta_energy = energy[0 ] - true_energy
delta_tn = tn[0 ] - true_tn
return delta_energy, delta_tn
def optimize_parameters (fixed_water_params, initial_operating_params ):
if not hasattr (creator, 'FitnessMin' ):
creator.create("FitnessMin" , base.Fitness, weights=(-1.0 , -1.0 ))
if not hasattr (creator, 'Individual' ):
creator.create("Individual" , list , fitness=creator.FitnessMin)
toolbox = base.Toolbox()
toolbox.register("attr_float" , lambda : [random.uniform(r[0 ], r[1 ]) for r in variable_ranges])
toolbox.register("individual" , tools.initIterate, creator.Individual, toolbox.attr_float)
toolbox.register("population" , tools.initRepeat, list , toolbox.individual)
toolbox.register("mate" , tools.cxSimulatedBinaryBounded, low=[r[0 ] for r in variable_ranges], up=[r[1 ] for r in variable_ranges], eta=15.0 )
def custom_mutate (individual ):
individual = tools.mutPolynomialBounded(individual, low=[r[0 ] for r in variable_ranges], up=[r[1 ] for r in variable_ranges], eta=20.0 , indpb=0.3 )[0 ]
for i in range (len (individual)):
low, up = variable_ranges[i]
individual[i] = max (low, min (up, individual[i]))
if isinstance (individual[i], complex ) or np.isnan(individual[i]) or np.isinf(individual[i]):
individual[i] = (low + up) / 2
return individual
toolbox.register("mutate" , custom_mutate)
toolbox.register("select" , tools.selNSGA2)
full_input = np.concatenate([fixed_water_params, initial_operating_params])
true_energy, true_tn = predict_outputs(full_input)
def fitness_func (ind ):
return evaluate(ind, fixed_water_params, true_energy[0 ], true_tn[0 ])
toolbox.register("evaluate" , fitness_func)
pop = toolbox.population(n=100 )
pop[0 ][:] = initial_operating_params.copy()
for ind in pop: ind.fitness.values = toolbox.evaluate(ind)
hof = tools.ParetoFront()
hof.update(pop)
best_fitness = min ([ind.fitness.values[0 ] for ind in pop])
stall_count = 0
for gen in range (20 ):
offspring = algorithms.varOr(pop, toolbox, lambda_=100 , cxpb=0.5 , mutpb=0.5 )
for ind in offspring:
if not ind.fitness.valid: ind.fitness.values = toolbox.evaluate(ind)
pop = toolbox.select(pop + offspring, k=20 )
hof.update(pop)
current_best = min ([ind.fitness.values[0 ] for ind in pop])
if current_best < best_fitness * 0.99 :
best_fitness = current_best
stall_count = 0
else :
stall_count += 1
if stall_count >= 5 : break
n_samples = min (60 , len (hof))
pareto_samples = [hof[i] for i in np.linspace(0 , len (hof) - 1 , n_samples, dtype=int )]
return pareto_samples, true_energy[0 ], true_tn[0 ]
模型可视化 (GUI)
import tkinter as tk
from tkinter import ttk
import xgboost as xgb
import pandas as pd
from sklearn.preprocessing import StandardScaler
default_data = {
'Influent NH4+-N' : 29.6 ,
'Influent TN' : 35.9 ,
'Influent Flowrate' : 38808.4 ,
'DO of Zone 5' : 7.1 ,
'TN of Zone 5' : 22.2 ,
'COD of Zone 6' : 55.0 ,
'TN of Zone 6' : 13.8 ,
'DO of Zone 7' : 4.06 ,
'COD of Zone 7' : 31.9 ,
'TN of Zone 7' : 11.7 ,
'External Carbon Source' : 3.3
}
df = pd.DataFrame([default_data])
scaler = StandardScaler()
scaler.fit(df)
model_tn = xgb.Booster()
model_tn.load_model('model_effluent_TN.json' )
model_power = xgb.Booster()
model_power.load_model('model_energy_consumption.json' )
run_count = 0
first_external_carbon_source = None
original_tn = tk.StringVar()
original_power = tk.StringVar()
optimized_tn = tk.StringVar()
optimized_power = tk.StringVar()
energy_savings_var = tk.StringVar()
carbon_savings_var = tk.StringVar()
output_text = tk.StringVar()
units = {
"Influent NH4+-N" : "5.4 - 60.7 mg/L" ,
"Influent TN" : "9.8 - 108.0 mg/L" ,
"TN of Zone 5" : "4.8 - 44.6 mg/L" ,
"TN of Zone 6" : "3.0 - 42.3 mg/L" ,
"TN of Zone 7" : "2.8 - 38.9 mg/L" ,
"Influent Flowrate" : "0 - 69970.0 m³/d" ,
"DO of Zone 5" : "5.8 - 8.8 mg/L" ,
"COD of Zone 6" : "28.0 - 35.5 mg/L" ,
"DO of Zone 7" : "0.3 - 2.8 mg/L" ,
"COD of Zone 7" : "0 - 26.4 mg/L" ,
"External Carbon Source" : "1.9 - 2.3 m³/d"
}
root = tk.Tk()
root.title("Effluent TN Prediction and Optimization" )
root.geometry("1200x800" )
root.configure(bg="#e8f0f2" )
main_frame = tk.Frame(root, bg="#e8f0f2" , padx=20 , pady=20 )
main_frame.pack(expand=True )
title_label = tk.Label(main_frame, text="Effluent TN Prediction and Optimization" , font=("Arial" , 18 , "bold" ), bg="#e8f0f2" )
title_label.grid(row=0 , column=0 , columnspan=8 , pady=15 )
left_labels = [
("Influent NH₄⁺-N" , "Influent NH4+-N" ),
("Influent TN" , "Influent TN" ),
("TN of Zone 5" , "TN of Zone 5" ),
("TN of Zone 6" , "TN of Zone 6" ),
("TN of Zone 7" , "TN of Zone 7" )
]
right_labels = [
("Influent Flowrate" , "Influent Flowrate" ),
("DO of Zone 5" , "DO of Zone 5" ),
("COD of Zone 6" , "COD of Zone 6" ),
("DO of Zone 7" , "DO of Zone 7" ),
("COD of Zone 7" , "COD of Zone 7" ),
("External Carbon Source" , "External Carbon Source" )
]
feature_entry_map = {}
tk.Label(main_frame, text="Water Quality Parameters" , font=("Arial" , 14 , "bold" ), bg="#e8f0f2" ).grid(row=1 , column=1 , columnspan=2 , pady=10 )
for i, (label, feature) in enumerate (left_labels, start=2 ):
tk.Label(main_frame, text=label, font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=i, column=0 , padx=10 , pady=5 , sticky="e" )
entry = tk.Entry(main_frame, font=("Arial" , 12 ), width=20 )
entry.grid(row=i, column=1 , padx=10 , pady=5 , sticky="ew" )
feature_entry_map[feature] = entry
unit_text = units.get(feature, "" )
tk.Label(main_frame, text=f"({unit_text} )" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=i, column=2 , padx=10 , pady=5 , sticky="w" )
tk.Label(main_frame, text="Controllable Parameters" , font=("Arial" , 14 , "bold" ), bg="#e8f0f2" ).grid(row=1 , column=3 , columnspan=2 , pady=10 )
for i, (label, feature) in enumerate (right_labels, start=2 ):
tk.Label(main_frame, text=label, font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=i, column=3 , padx=10 , pady=5 , sticky="e" )
entry = tk.Entry(main_frame, font=("Arial" , 12 ), width=20 )
entry.grid(row=i, column=4 , padx=10 , pady=5 , sticky="ew" )
feature_entry_map[feature] = entry
unit_text = units.get(feature, "" )
tk.Label(main_frame, text=f"({unit_text} )" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=i, column=5 , padx=10 , pady=5 , sticky="w" )
def clear_inputs ():
global run_count, first_external_carbon_source
run_count = 0
first_external_carbon_source = None
for entry in feature_entry_map.values():
entry.delete(0 , tk.END)
original_tn.set ("" )
original_power.set ("" )
optimized_tn.set ("" )
optimized_power.set ("" )
energy_savings_var.set ("" )
carbon_savings_var.set ("" )
output_text.set ("" )
def predict ():
global run_count, first_external_carbon_source
if run_count >= 2 :
output_text.set ("The prediction limit has been reached. Please click the Reset button to reset and try again." )
return
inputs = []
for feature, entry in feature_entry_map.items():
value = entry.get().strip()
if not value:
inputs.append(default_data[feature])
else :
try :
validated = float (value)
inputs.append(validated)
except ValueError:
output_text.set (f"Invalid input for {feature} . Please enter a valid number." )
return
try :
inputs_df = pd.DataFrame([inputs], columns=default_data.keys())
inputs_scaled = scaler.transform(inputs_df)
dmatrix = xgb.DMatrix(inputs_scaled)
effluent_tn_pred = model_tn.predict(dmatrix)[0 ]
power_consumption_pred = model_power.predict(dmatrix)[0 ]
if run_count == 0 :
original_tn.set (f"{effluent_tn_pred:.4 f} " )
original_power.set (f"{power_consumption_pred:.4 f} " )
first_external_carbon_source = inputs_df["External Carbon Source" ].iloc[0 ]
elif run_count == 1 :
optimized_tn.set (f"{effluent_tn_pred:.4 f} " )
optimized_power.set (f"{power_consumption_pred:.4 f} " )
original_power_value = float (original_power.get()) if original_power.get() else 0.0
energy_savings = original_power_value - power_consumption_pred
energy_savings_var.set (f"{energy_savings:.4 f} " )
second_external_carbon_source = inputs_df["External Carbon Source" ].iloc[0 ]
carbon_savings = first_external_carbon_source - second_external_carbon_source
carbon_savings_var.set (f"{carbon_savings:.4 f} " )
run_count += 1
except Exception as e:
output_text.set (f"An error occurred during prediction:\n{e} " )
predict_button = tk.Button(main_frame, text="Predict" , font=("Arial" , 14 ), command=predict, bg="#4CAF50" , fg="white" , width=18 )
predict_button.grid(row=8 , column=0 , columnspan=4 , pady=20 )
reset_button = tk.Button(main_frame, text="Reset" , font=("Arial" , 14 ), command=clear_inputs, bg="#FF5722" , fg="white" , width=18 )
reset_button.grid(row=8 , column=4 , columnspan=4 , pady=20 )
output_frame = tk.Frame(main_frame, bg="#e8f0f2" )
output_frame.grid(row=9 , column=0 , columnspan=8 , padx=10 , pady=20 )
tk.Label(output_frame, text="Original Effluent TN (mg/L):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=0 , column=0 , sticky="e" )
tk.Entry(output_frame, textvariable=original_tn, font=("Arial" , 12 ), state="readonly" ).grid(row=0 , column=1 )
tk.Label(output_frame, text="Optimized Effluent TN (mg/L):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=0 , column=2 , sticky="e" )
tk.Entry(output_frame, textvariable=optimized_tn, font=("Arial" , 12 ), state="readonly" ).grid(row=0 , column=3 )
tk.Label(output_frame, text="Original Energy Consumption (kWh):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=1 , column=0 , sticky="e" )
tk.Entry(output_frame, textvariable=original_power, font=("Arial" , 12 )).grid(row=1 , column=1 )
tk.Label(output_frame, text="Optimized Energy Consumption (kWh):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=1 , column=2 , sticky="e" )
tk.Entry(output_frame, textvariable=optimized_power, font=("Arial" , 12 ), state="readonly" ).grid(row=1 , column=3 )
tk.Label(output_frame, text="Energy Savings (kWh):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=2 , column=0 , sticky="e" )
tk.Entry(output_frame, textvariable=energy_savings_var, font=("Arial" , 12 ), state="readonly" ).grid(row=2 , column=1 )
tk.Label(output_frame, text="Carbon Source Savings (m³/d):" , font=("Arial" , 12 ), bg="#e8f0f2" ).grid(row=2 , column=2 , sticky="e" )
tk.Entry(output_frame, textvariable=carbon_savings_var, font=("Arial" , 12 ), state="readonly" ).grid(row=2 , column=3 )
output_label = tk.Label(output_frame, textvariable=output_text, font=("Arial" , 12 ), bg="#e8f0f2" , fg="#FF5722" )
output_label.grid(row=3 , column=0 , columnspan=4 )
root.mainloop()
相关免费在线工具 加密/解密文本 使用加密算法(如AES、TripleDES、Rabbit或RC4)加密和解密文本明文。 在线工具,加密/解密文本在线工具,online
RSA密钥对生成器 生成新的随机RSA私钥和公钥pem证书。 在线工具,RSA密钥对生成器在线工具,online
Mermaid 预览与可视化编辑 基于 Mermaid.js 实时预览流程图、时序图等图表,支持源码编辑与即时渲染。 在线工具,Mermaid 预览与可视化编辑在线工具,online
随机西班牙地址生成器 随机生成西班牙地址(支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选),支持数量快捷选择、显示全部与下载。 在线工具,随机西班牙地址生成器在线工具,online
Gemini 图片去水印 基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印,支持批量处理与下载。 在线工具,Gemini 图片去水印在线工具,online
curl 转代码 解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。 在线工具,curl 转代码在线工具,online