
StructBERT Chinese Sentiment Recognition in Practice: Building a Real-Time Emotion Heatmap for Short-Video Danmaku

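This article builds a real-time emotion analysis system for short-video danmaku (bullet comments) on top of the StructBERT model, which was developed at Alibaba. It covers environment setup, an API service, emotion heatmap visualization with Dash and Plotly, and advice on batching, caching, and production deployment. The system tracks audience mood in real time to support content quality assessment and highlight detection.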

古灵精怪 · Published 2026/4/6 · Updated 2026/5/20


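1. Introduction: Reading Emotion from Danmaku, an Overlooked Data Goldmine

Have you ever wondered what is hidden in the danmaku racing across the screen while you watch short videos?

'哈哈哈笑死我了' (hahaha, I'm dying of laughter), '泪目了' (tearing up), '就这？' (that's it?), '前方高能预警' (high energy ahead)... These fleeting messages are the most direct expression of audience emotion. For creators and platform operators, reading these emotions in real time reveals what viewers like and dislike, when to push harder, and when to change direction.

The problem is that danmaku arrive in huge volumes and at high speed, so manual analysis is not feasible. This is where AI sentiment analysis becomes our 'emotion translator'.

This article shows how to use the StructBERT Chinese sentiment classification model (StructBERT comes from Alibaba) to build a real-time emotion heatmap for short-video danmaku. The approach reports the current 'emotional temperature' of a video and helps locate its 'emotional peaks', which makes data-driven content optimization possible.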

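2. Why StructBERT? Balancing Accuracy and Efficiency

Before the hands-on part, a word on why we chose StructBERT. There are plenty of sentiment analysis models, but one that suits real-time danmaku analysis has to meet a few key conditions:

2.1 Fast Response

Danmaku scroll in real time, so analysis has to keep up. The base version of StructBERT infers quickly without giving up much accuracy: a single short text usually takes tens of milliseconds (the exact figure depends on hardware and sequence length), which is enough for real-time use.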
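The latency figure above is worth checking on your own hardware. Here is a minimal sketch of how that could be measured; `dummy_predict` is a hypothetical stand-in for the real model call, so the numbers it produces say nothing about StructBERT itself.

```python
import statistics
import time


def measure_latency(predict, texts, warmup=2):
    """Time predict(text) per call and return latency statistics in milliseconds."""
    # Warm-up calls are excluded so one-off setup cost does not skew the numbers.
    for text in texts[:warmup]:
        predict(text)
    samples_ms = []
    for text in texts:
        start = time.perf_counter()
        predict(text)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "count": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
    }


def dummy_predict(text):
    """Hypothetical stand-in for the real model call."""
    return "neutral" if len(text) % 2 else "positive"


stats = measure_latency(dummy_predict, ["哈哈哈", "就这？", "泪目了"] * 10)
print(stats)
```

To measure the real service, pass a function that calls the model or the HTTP API in place of `dummy_predict`, and look at the p95 value rather than the mean, since danmaku bursts are what stress the system.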

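2.2 High Accuracy

Sentiment in Chinese is subtle, and the literal meaning can be the opposite of the intended one (for example, '你可真行' can be praise or sarcasm). StructBERT's pre-training, developed at Alibaba, adds objectives over word order and sentence order, which gives the model a good grasp of Chinese sentence structure and context and helps it with many of these harder cases.

2.3 Lightweight and Easy to Deploy

The model file is moderate in size and does not need high-end hardware; an ordinary cloud server can run it. For most small and mid-sized teams, that keeps deployment cost manageable.

2.4 Three Classes Are Enough

In this article's setup, the classifier sorts sentiment into three classes: positive, negative, and neutral. For danmaku analysis this granularity is about right: finer classes add complexity, and coarser ones lose useful information.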
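To make the three-class output concrete, here is a minimal sketch of how three raw scores (logits) become a label and a confidence value. The label order follows the `labels` list in api.py; the logits are made-up numbers, not real model output.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # same index order as api.py


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence) for one set of three logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = classify([-1.2, 0.3, 2.5])  # made-up logits
print(label, round(confidence, 4))
```

The confidence is simply the probability of the winning class, which is why a text with two nearly equal scores comes back with a confidence close to 0.5.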

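The table below compares several common sentiment analysis approaches:

Approach	Strengths	Weaknesses	Best suited for
Rule matching	Fast, rules are controllable	Incomplete coverage, cannot handle complex expressions	Simple keyword filtering
Traditional machine learning	Highly interpretable	Complex feature engineering, limited accuracy	Structured text analysis
Large deep learning models	High accuracy, deep understanding	Slow, resource-hungry	In-depth content analysis
StructBERT (this article)	Fast, moderate accuracy, simple to deploy	Coarse classification granularity	Real-time danmaku analysis

As the comparison shows, StructBERT strikes a good balance between speed, accuracy, and deployment cost, which suits a scenario like ours where large volumes of short text must be processed in real time.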

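3. Environment Setup: Deploy the Sentiment Analysis Service in 5 Minutes

With the theory out of the way, let's get hands-on. The first step is to get the StructBERT sentiment analysis service running. The process is simple; just follow the steps.

3.1 Basic Environment Check

Make sure your server or local environment meets these requirements:

Operating system: Linux (Ubuntu 18.04+ recommended) or macOS
Python version: 3.7-3.9 (3.8 is the most stable)
Memory: at least 4 GB (8 GB+ recommended for heavy danmaku volume)
Disk space: at least 2 GB free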
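Part of the checklist above can be verified with a few lines of Python. This is a minimal sketch using only the standard library; the function name and the returned keys are my own, and memory is left out because the standard library has no portable way to query it.

```python
import shutil
import sys


def check_environment(path=".", min_free_gb=2.0):
    """Check the Python version and free disk space against the list above."""
    version_ok = (3, 7) <= sys.version_info[:2] <= (3, 9)
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": version_ok,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

Run it from the directory where the project will live, so that the disk check looks at the right filesystem.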

3.2 One-Click Deployment Script

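#!/bin/bash
echo "Deploying the StructBERT sentiment analysis service..."
echo "======================================"
# 1. Create the project directory
mkdir -p ~/nlp_structbert_sentiment
cd ~/nlp_structbert_sentiment
echo "Step 1: creating the virtual environment..."
python3 -m venv venv
source venv/bin/activate
echo "Step 2: installing dependencies..."
pip install torch==1.10.0 --index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.18.0
pip install flask==2.1.0
pip install gradio==3.4.1
pip install pandas==1.4.2
pip install supervisor==4.2.4
echo "Step 3: downloading model files..."
# Create the model directory
mkdir -p models
cd models
# Download the StructBERT model (a Hugging Face-format checkpoint is assumed here)
# If you already have the official model files, place them in this directory
echo "Downloading model files; this may take a few minutes..."
# NOTE: nothing is actually downloaded here. For a real deployment, obtain the
# weights and vocab.txt from ModelScope or Hugging Face and put them in this
# directory; without them, from_pretrained() in webui.py and api.py will fail.
# For now, only create a sample config file. num_labels is set to 3 because the
# code below indexes three classes (the transformers default is 2).
cat > config.json << 'EOF'
{
 "model_type": "bert",
 "hidden_size": 768,
 "num_hidden_layers": 12,
 "num_attention_heads": 12,
 "vocab_size": 21128,
 "type_vocab_size": 2,
 "max_position_embeddings": 512,
 "num_labels": 3
}
EOF
echo "Step 4: creating the WebUI app..."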
cd ~/nlp_structbert_sentiment
cat > webui.py << 'EOF'
import gradio as gr
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification
import torch
import json

# Load the model and tokenizer
print("Loading model...")
model_path = "./models"
tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.eval()

# Sentiment labels (negative, neutral, positive), indexed by class id
labels = ["负面", "中性", "正面"]

def analyze_sentiment(text):
    """Analyze the sentiment of a single text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        pred_label = torch.argmax(probabilities, dim=-1).item()
        confidence = probabilities[0][pred_label].item()
        result = {
            "text": text,
            "sentiment": labels[pred_label],
            "confidence": round(confidence, 4),
            "probabilities": {
                "负面": round(probabilities[0][0].item(), 4),
                "中性": round(probabilities[0][1].item(), 4),
                "正面": round(probabilities[0][2].item(), 4)
            }
        }
        return result

def batch_analyze(texts):
    """Analyze a batch of texts, one per line."""
    texts_list = texts.strip().split('\n')
    results = []
    for text in texts_list:
        if text.strip():
            result = analyze_sentiment(text.strip())
            results.append(result)
    # Convert to a DataFrame for easier display
    df = pd.DataFrame([{
        "文本": r["text"],
        "情感倾向": r["sentiment"],
        "置信度": r["confidence"],
        "负面概率": r["probabilities"]["负面"],
        "中性概率": r["probabilities"]["中性"],
        "正面概率": r["probabilities"]["正面"]
    } for r in results])
    return df

# Build the Gradio interface
with gr.Blocks(title="StructBERT 中文情感分析") as demo:
    gr.Markdown("# StructBERT 中文情感分析系统")
    gr.Markdown("输入中文文本，分析情感倾向（正面/负面/中性）")
    with gr.Tab("单文本分析"):
        with gr.Row():
            with gr.Column():
                input_text = gr.Textbox(label="输入文本", placeholder="请输入要分析的中文文本...", lines=3)
                analyze_btn = gr.Button("开始分析", variant="primary")
            with gr.Column():
                output_json = gr.JSON(label="分析结果")
            analyze_btn.click(analyze_sentiment, inputs=input_text, outputs=output_json)
    with gr.Tab("批量分析"):
        with gr.Row():
            with gr.Column():
                batch_input = gr.Textbox(label="批量输入", placeholder="每行一条文本...", lines=10)
                batch_btn = gr.Button("开始批量分析", variant="primary")
            with gr.Column():
                batch_output = gr.Dataframe(label="分析结果", headers=["文本", "情感倾向", "置信度", "负面概率", "中性概率", "正面概率"])
            batch_btn.click(batch_analyze, inputs=batch_input, outputs=batch_output)
    gr.Markdown("### 使用说明")
    gr.Markdown("""
1. **单文本分析**：在左侧输入文本，点击'开始分析'查看结果
2. **批量分析**：在批量输入框中每行输入一条文本，点击'开始批量分析'
3. **结果说明**：
   - 情感倾向：正面、负面或中性
   - 置信度：模型对判断的把握程度（0-1 之间）
   - 概率分布：三种情感的具体概率值
""")
if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
EOF
echo "Step 5: creating the API service..."
cat > api.py << 'EOF'
from flask import Flask, request, jsonify
from transformers import BertTokenizer, BertForSequenceClassification
import torch
import json

app = Flask(__name__)

# Load the model
print("Loading model...")
model_path = "./models"
tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForSequenceClassification.from_pretrained(model_path)
model.eval()

labels = ["negative", "neutral", "positive"]

@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint."""
    return jsonify({"status": "healthy", "model": "structbert-sentiment"})

@app.route('/predict', methods=['POST'])
def predict():
    """Predict the sentiment of a single text."""
    try:
        data = request.get_json()
        text = data.get('text', '')
        if not text:
            return jsonify({"error": "text parameter is required"}), 400
        # Run sentiment analysis
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            outputs = model(**inputs)
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
            pred_label = torch.argmax(probabilities, dim=-1).item()
            confidence = probabilities[0][pred_label].item()
            result = {
                "text": text,
                "sentiment": labels[pred_label],
                "confidence": round(confidence, 4),
                "probabilities": {
                    "negative": round(probabilities[0][0].item(), 4),
                    "neutral": round(probabilities[0][1].item(), 4),
                    "positive": round(probabilities[0][2].item(), 4)
                }
            }
            return jsonify(result)
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route('/batch_predict', methods=['POST'])
def batch_predict():
    """Predict the sentiment of a batch of texts."""
    try:
        data = request.get_json()
        texts = data.get('texts', [])
        if not texts or not isinstance(texts, list):
            return jsonify({"error": "texts parameter must be a non-empty list"}), 400
        # Tokenize all texts together and run one padded forward pass,
        # instead of one model call per text.
        inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=128)
        with torch.no_grad():
            outputs = model(**inputs)
            probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        results = []
        for text, probs in zip(texts, probabilities):
            pred_label = torch.argmax(probs).item()
            results.append({
                "text": text,
                "sentiment": labels[pred_label],
                "confidence": round(probs[pred_label].item(), 4),
                "probabilities": {
                    "negative": round(probs[0].item(), 4),
                    "neutral": round(probs[1].item(), 4),
                    "positive": round(probs[2].item(), 4)
                }
            })
        return jsonify({"results": results, "count": len(results)})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=False)
EOF
echo "Step 6: configuring Supervisor process management..."
# Assumes a system-wide Supervisor that reads /etc/supervisor/conf.d
# (for example, installed with: sudo apt-get install -y supervisor).
# Paths are derived from $HOME so the config also works for non-root users.
PROJECT_DIR="$HOME/nlp_structbert_sentiment"
sudo tee /etc/supervisor/conf.d/structbert.conf > /dev/null << EOF
[program:nlp_structbert_api]
command=$PROJECT_DIR/venv/bin/python api.py
directory=$PROJECT_DIR
autostart=true
autorestart=true
stderr_logfile=/var/log/structbert_api.err.log
stdout_logfile=/var/log/structbert_api.out.log

[program:nlp_structbert_webui]
command=$PROJECT_DIR/venv/bin/python webui.py
directory=$PROJECT_DIR
autostart=true
autorestart=true
stderr_logfile=/var/log/structbert_webui.err.log
stdout_logfile=/var/log/structbert_webui.out.log
EOF
echo "Step 7: starting services..."
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start nlp_structbert_api nlp_structbert_webui
echo "======================================"
echo "Deployment complete!"
echo "WebUI: http://localhost:7860"
echo "API: http://localhost:8080"
echo ""
echo "Common management commands:"
echo "Check status: sudo supervisorctl status"
echo "Restart API: sudo supervisorctl restart nlp_structbert_api"
echo "Restart WebUI: sudo supervisorctl restart nlp_structbert_webui"
echo "Tail logs: sudo supervisorctl tail -f nlp_structbert_api"

Save the script above as deploy.sh, then make it executable and run it:

chmod +x deploy.sh
./deploy.sh
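Loading the model can take a while after the process starts, so a script that calls the API right after deployment may hit connection errors. Here is a minimal sketch of a readiness check that polls the `/health` endpoint defined in api.py. The function names, retry count, and delay are assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(url, timeout=2.0):
    """GET the health endpoint and return the parsed JSON, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_until_healthy(url, attempts=10, delay=1.0, fetch=fetch_health, sleep=time.sleep):
    """Poll url until it reports {"status": "healthy"}; return True on success."""
    for attempt in range(attempts):
        payload = fetch(url)
        if payload and payload.get("status") == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy("http://localhost:8080/health")`. The `fetch` and `sleep` parameters exist only so the polling logic can be exercised without a running server.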

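3.3 Quick Test: Is the Service Working?

Open the WebUI at http://localhost:7860 and submit a sentence in the single-text tab. A result has this form:

{
  "text": "这部电影太好看了！",
  "sentiment": "正面",
  "confidence": 0.9567,
  "probabilities": {
    "负面": 0.0123,
    "中性": 0.0310,
    "正面": 0.9567
  }
}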

Test the API with curl:

curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"text": "这个产品质量太差了"}'

Sample response:

{
  "text": "这个产品质量太差了",
  "sentiment": "negative",
  "confidence": 0.9234,
  "probabilities": {
    "negative": 0.9234,
    "neutral": 0.0567,
    "positive": 0.0199
  }
}
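Client code will usually want to sanity-check a response like this before acting on it. Here is a minimal sketch; the field names come from api.py, while the function name and the 0.6 confidence threshold are illustrative assumptions.

```python
def parse_prediction(payload, min_confidence=0.6):
    """Validate a /predict response and return (sentiment, confidence).

    Low-confidence predictions fall back to "neutral"; the 0.6 threshold
    is an illustrative assumption, not a value from the service.
    """
    probabilities = payload["probabilities"]
    if set(probabilities) != {"negative", "neutral", "positive"}:
        raise ValueError("unexpected label set: %r" % sorted(probabilities))
    # The API rounds each value to 4 decimals, so allow a small tolerance.
    if abs(sum(probabilities.values()) - 1.0) > 0.01:
        raise ValueError("probabilities do not sum to 1")
    sentiment, confidence = payload["sentiment"], payload["confidence"]
    if confidence < min_confidence:
        return "neutral", confidence
    return sentiment, confidence


sample = {
    "text": "这个产品质量太差了",
    "sentiment": "negative",
    "confidence": 0.9234,
    "probabilities": {"negative": 0.9234, "neutral": 0.0567, "positive": 0.0199},
}
print(parse_prediction(sample))
```

Treating low-confidence results as neutral is one way to keep ambiguous danmaku from swinging the heatmap; whether that is the right trade-off depends on your data.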

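4. Case Study: Building a Real-Time Emotion Heatmap for Short-Video Danmaku

4.1 System Architecture

┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐
│                      │    │                      │    │                      │
│    Danmaku source    │───▶│  Sentiment service   │───▶│   Emotion heatmap    │
│(Bilibili/Douyin etc.)│    │     (StructBERT)     │    │    visualization     │
│                      │    │                      │    │                      │
└──────────────────────┘    └──────────────────────┘    └──────────────────────┘
           │                           │                           │
           ▼                           ▼                           ▼
┌──────────────────────┐    ┌──────────────────────┐    ┌──────────────────────┐
│ Real-time collection │    │ Emotion aggregation  │    │  Real-time updates   │
│   (WebSocket/API)    │    │ (time-window stats)  │    │     (WebSocket)      │
└──────────────────────┘    └──────────────────────┘    └──────────────────────┘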
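The aggregation stage in the middle of the diagram comes down to bucketing results into fixed time windows. Here is a minimal sketch of that idea, with a hypothetical function name and made-up timestamps:

```python
from collections import Counter, defaultdict


def aggregate_by_window(records, window_seconds=10):
    """Group (timestamp, emotion) pairs into fixed windows and count each emotion."""
    buckets = defaultdict(Counter)
    for timestamp, emotion in records:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start][emotion] += 1
    return {start: dict(counts) for start, counts in sorted(buckets.items())}


records = [(100.0, "positive"), (104.5, "negative"), (109.9, "positive"), (110.0, "neutral")]
windows = aggregate_by_window(records, window_seconds=10)
print(windows)
```

Each window start is the timestamp rounded down to a multiple of the window size, so a danmaku at 109.9 s lands in the window starting at 100 and one at 110.0 s starts a new window.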

4.2 Full Implementation

# emotion_heatmap.py
# Requires, in addition to the packages from section 3.2: requests, dash, plotly
import random
import threading
import time
from collections import defaultdict
from datetime import datetime

import dash
import pandas as pd
import plotly.graph_objects as go
import requests
from dash import dcc, html
from dash.dependencies import Input, Output

class DanmakuEmotionAnalyzer:
    """Danmaku emotion analyzer."""
    def __init__(self, api_url="http://localhost:8080"):
        self.api_url = api_url
        self.danmaku_buffer = []  # danmaku buffer
        self.emotion_history = []  # history of analyzed danmaku
        self.time_windows = [10, 30, 60]  # time windows (seconds)
        # Overall emotion counts
        self.emotion_stats = {
            "positive": 0,
            "negative": 0,
            "neutral": 0,
            "total": 0
        }
        # Per-time-window counts
        self.window_stats = defaultdict(lambda: {
            "positive": 0,
            "negative": 0,
            "neutral": 0,
            "total": 0
        })

    def analyze_emotion(self, text):
        """Analyze the sentiment of a single danmaku via the API."""
        try:
            response = requests.post(
                f"{self.api_url}/predict", json={"text": text}, timeout=2
            )
            if response.status_code == 200:
                result = response.json()
                return result["sentiment"], result["confidence"]
            else:
                # Non-200 response: fall back to neutral
                return "neutral", 0.5
        except Exception as e:
            print(f"Sentiment analysis failed: {e}")
            return "neutral", 0.5

    def process_danmaku(self, danmaku_text, timestamp=None):
        """Process a single danmaku."""
        if timestamp is None:
            timestamp = time.time()
        # Run sentiment analysis
        emotion, confidence = self.analyze_emotion(danmaku_text)
        # Build the record
        record = {
            "text": danmaku_text,
            "emotion": emotion,
            "confidence": confidence,
            "timestamp": timestamp,
            "time_str": datetime.fromtimestamp(timestamp).strftime("%H:%M:%S")
        }
        # Update overall counts
        self.emotion_stats[emotion] += 1
        self.emotion_stats["total"] += 1
        # Update per-window counts. The key includes the window size so that
        # 10 s, 30 s, and 60 s windows starting at the same instant stay separate.
        for window in self.time_windows:
            window_key = (window, int(timestamp // window) * window)
            self.window_stats[window_key][emotion] += 1
            self.window_stats[window_key]["total"] += 1
        # Append to the history (keep the most recent 1000 records)
        self.emotion_history.append(record)
        if len(self.emotion_history) > 1000:
            self.emotion_history.pop(0)
        return record

    def get_realtime_stats(self, window_seconds=30):
        """Return emotion statistics for the last window_seconds seconds."""
        current_time = time.time()
        cutoff_time = current_time - window_seconds
        # Keep only danmaku inside the time window
        recent_danmaku = [
            d for d in self.emotion_history if d["timestamp"] > cutoff_time
        ]
        if not recent_danmaku:
            return {
                "positive_ratio": 0,
                "negative_ratio": 0,
                "neutral_ratio": 0,
                "total_count": 0,
                "emotion_trend": "neutral"
            }
        # Compute emotion ratios
        total = len(recent_danmaku)
        positive = sum(1 for d in recent_danmaku if d["emotion"] == "positive")
        negative = sum(1 for d in recent_danmaku if d["emotion"] == "negative")
        neutral = total - positive - negative
        # Determine the dominant emotion
        if positive > negative and positive > neutral:
            trend = "positive"
        elif negative > positive and negative > neutral:
            trend = "negative"
        else:
            trend = "neutral"
        return {
            "positive_ratio": positive / total,
            "negative_ratio": negative / total,
            "neutral_ratio": neutral / total,
            "total_count": total,
            "emotion_trend": trend
        }

    def get_heatmap_data(self, window_size=60):
        """Return heatmap data aggregated into window_size-second windows."""
        if not self.emotion_history:
            return {"timestamps": [], "emotions": [], "intensity": []}
        # Aggregate by time window
        heatmap_data = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
        for record in self.emotion_history:
            window_key = int(record["timestamp"] // window_size) * window_size
            heatmap_data[window_key][record["emotion"]] += 1
        # Convert to flat lists
        timestamps = []
        emotions = []
        intensity = []
        for window_key, counts in sorted(heatmap_data.items()):
            time_str = datetime.fromtimestamp(window_key).strftime("%H:%M:%S")
            total = sum(counts.values())
            if total > 0:
                for emotion in ["positive", "neutral", "negative"]:
                    timestamps.append(time_str)
                    emotions.append(emotion)
                    intensity.append(counts[emotion] / total * 100)
        # intensity values are percentages of each window's total
        return {
            "timestamps": timestamps,
            "emotions": emotions,
            "intensity": intensity
        }

class DanmakuSimulator:
    """Danmaku simulator (for demonstration)."""
    def __init__(self):
        # Sample danmaku for each emotion
        self.positive_samples = [
            "哈哈哈笑死我了", "太好看了吧", "666", "神仙操作", "爱了爱了",
            "前方高能", "泪目了", "太感动了", "这个特效绝了", "UP 主太有才了",
            "收藏了", "三连了"
        ]
        self.negative_samples = [
            "就这？", "太水了", "无聊", "取关了", "广告太多了", "浪费时间",
            "不好看", "什么鬼", "退钱", "辣眼睛", "太坑了", "失望"
        ]
        self.neutral_samples = [
            "来了", "第一", "打卡", "第几？", "有人吗", "几点开播",
            "这是什么游戏", "背景音乐是什么", "UP 主哪里人", "多久更新一次"
        ]

    def generate_danmaku(self, emotion_probabilities=None):
        """Generate one simulated danmaku and its intended emotion."""
        if emotion_probabilities is None:
            emotion_probabilities = {"positive": 0.5, "negative": 0.2, "neutral": 0.3}
        import random
        # Pick an emotion according to the given probabilities
        rand = random.random()
        if rand < emotion_probabilities["positive"]:
            emotion = "positive"
            samples = self.positive_samples
        elif rand < emotion_probabilities["positive"] + emotion_probabilities["negative"]:
            emotion = "negative"
            samples = self.negative_samples
        else:
            emotion = "neutral"
            samples = self.neutral_samples
        # Pick a random sample text
        text = random.choice(samples)
        # Add some random variation
        if random.random() < 0.3:
            text = text + "！" * random.randint(1, 3)
        if random.random() < 0.2:
            text = "【" + text + "】"
        return text, emotion

def create_dashboard(analyzer):
    """Create the real-time emotion heatmap dashboard."""
    app = dash.Dash(__name__)
    app.layout = html.Div([
        html.H1("短视频弹幕实时情绪热力图", style={'textAlign': 'center'}),
        html.Div([
            html.Div([
                html.H3("实时情绪统计"),
                html.Div(id="realtime-stats", style={'fontSize': '20px'}),
                dcc.Graph(id="emotion-pie-chart"),
            ], style={'width': '30%', 'display': 'inline-block', 'verticalAlign': 'top'}),
            html.Div([
                html.H3("情绪热力图"),
                dcc.Graph(id="emotion-heatmap"),
                dcc.Interval(interval=2000, id='interval-component', n_intervals=0),
            ], style={'width': '70%', 'display': 'inline-block'}),
        ]),
        html.Div([
            html.H3("最近弹幕"),
            html.Div(id="recent-danmaku", style={
                'height': '200px', 'overflowY': 'scroll', 'border': '1px solid #ddd', 'padding': '10px'
            })
        ]),
        html.Div([
            html.H3("情绪趋势图"),
            dcc.Graph(id="emotion-trend-chart"),
            dcc.Interval(interval=5000, id='trend-interval', n_intervals=0),
        ]),
    ])

    @app.callback(
        [
            Output('realtime-stats', 'children'),
            Output('emotion-pie-chart', 'figure'),
            Output('recent-danmaku', 'children'),
            Output('emotion-heatmap', 'figure'),
            Output('emotion-trend-chart', 'figure')
        ],
        [
            Input('interval-component', 'n_intervals'),
            Input('trend-interval', 'n_intervals')
        ]
    )
    def update_dashboard(n, n_trend):
        # Get real-time statistics
        stats = analyzer.get_realtime_stats(window_seconds=30)
        # 1. Real-time statistics text
        stats_text = html.Div([
            html.P(f"总弹幕数：{analyzer.emotion_stats['total']}"),
            html.P(f"实时弹幕/30 秒：{stats['total_count']}"),
            html.P(f"正面情绪：{stats['positive_ratio']*100:.1f}%"),
            html.P(f"负面情绪：{stats['negative_ratio']*100:.1f}%"),
            html.P(f"中性情绪：{stats['neutral_ratio']*100:.1f}%"),
            html.P(f"情绪趋势：{stats['emotion_trend']}"),
        ])
        # 2. Pie chart
        pie_fig = go.Figure(data=[go.Pie(
            labels=['正面', '负面', '中性'],
            values=[stats['positive_ratio'], stats['negative_ratio'], stats['neutral_ratio']],
            hole=.3,
            marker_colors=['#2E86AB', '#A23B72', '#F18F01']
        )])
        pie_fig.update_layout(title_text="实时情绪分布")
        # 3. Recent danmaku
        recent_danmaku = analyzer.emotion_history[-10:]  # the 10 most recent
        danmaku_list = []
        for dm in reversed(recent_danmaku):
            emotion_color = {
                "positive": "#2E86AB", "negative": "#A23B72", "neutral": "#F18F01"
            }.get(dm["emotion"], "#000000")
            danmaku_list.append(html.P([
                html.Span(f"[{dm['time_str']}] ", style={'color': '#666'}),
                html.Span(dm["text"], style={'color': emotion_color}),
                html.Span(f" ({dm['emotion']})", style={'color': '#999', 'fontSize': '12px'})
            ]))
        # 4. Heatmap
        heatmap_data = analyzer.get_heatmap_data(window_size=10)  # 10-second windows
        heatmap_fig = go.Figure(data=go.Heatmap(
            z=heatmap_data["intensity"],
            x=heatmap_data["timestamps"],
            y=heatmap_data["emotions"],
            colorscale='RdBu',
            zmin=0, zmax=100,
            hoverongaps=False
        ))
        heatmap_fig.update_layout(
            title="情绪热力图（颜色越深表示比例越高）",
            xaxis_title="时间",
            yaxis_title="情绪类型",
            height=400
        )
        # 5. Trend chart
        # Emotion trend over the last 5 minutes
        trend_data = []
        current_time = time.time()
        for i in range(30):  # 30 points, one per 10 seconds
            window_start = current_time - (30 - i) * 10
            window_end = window_start + 10
            window_danmaku = [
                d for d in analyzer.emotion_history if window_start <= d["timestamp"] < window_end
            ]
            if window_danmaku:
                positive = sum(1 for d in window_danmaku if d["emotion"] == "positive")
                negative = sum(1 for d in window_danmaku if d["emotion"] == "negative")
                neutral = sum(1 for d in window_danmaku if d["emotion"] == "neutral")
                total = len(window_danmaku)
                trend_data.append({
                    "time": datetime.fromtimestamp(window_start).strftime("%H:%M:%S"),
                    "positive": positive / total * 100 if total > 0 else 0,
                    "negative": negative / total * 100 if total > 0 else 0,
                    "neutral": neutral / total * 100 if total > 0 else 0,
                })
        if trend_data:
            trend_df = pd.DataFrame(trend_data)
            trend_fig = go.Figure()
            trend_fig.add_trace(go.Scatter(
                x=trend_df["time"], y=trend_df["positive"], mode='lines+markers',
                name='正面', line=dict(color='#2E86AB', width=2)
            ))
            trend_fig.add_trace(go.Scatter(
                x=trend_df["time"], y=trend_df["negative"], mode='lines+markers',
                name='负面', line=dict(color='#A23B72', width=2)
            ))
            trend_fig.add_trace(go.Scatter(
                x=trend_df["time"], y=trend_df["neutral"], mode='lines+markers',
                name='中性', line=dict(color='#F18F01', width=2)
            ))
            trend_fig.update_layout(
                title="情绪趋势（最近 5 分钟）",
                xaxis_title="时间",
                yaxis_title="比例 (%)",
                height=300
            )
        else:
            trend_fig = go.Figure()
            trend_fig.update_layout(title="暂无数据")
        return stats_text, pie_fig, danmaku_list, heatmap_fig, trend_fig
    return app

def simulate_danmaku_stream(analyzer, duration=300):
    """Simulate a danmaku stream."""
    simulator = DanmakuSimulator()
    # Emotion mix for different stages of the video
    emotion_scenarios = [
        {"positive": 0.7, "negative": 0.1, "neutral": 0.2},  # opening: positive
        {"positive": 0.4, "negative": 0.3, "neutral": 0.3},  # middle: mixed
        {"positive": 0.2, "negative": 0.6, "neutral": 0.2},  # controversial part: negative
        {"positive": 0.8, "negative": 0.1, "neutral": 0.1},  # climax: positive
        {"positive": 0.5, "negative": 0.2, "neutral": 0.3},  # ending: neutral to positive
    ]
    start_time = time.time()
    scenario_duration = duration / len(emotion_scenarios)
    while time.time() - start_time < duration:
        # Pick the current scenario based on elapsed time
        elapsed = time.time() - start_time
        scenario_index = min(int(elapsed / scenario_duration), len(emotion_scenarios) - 1)
        current_scenario = emotion_scenarios[scenario_index]
        # Generate a danmaku
        danmaku_text, expected_emotion = simulator.generate_danmaku(current_scenario)
        # Analyze and record it
        record = analyzer.process_danmaku(danmaku_text)
        # Log the model's prediction next to the emotion the simulator intended
        print(f"[{datetime.now().strftime('%H:%M:%S')}] {danmaku_text} -> {record['emotion']} (expected {expected_emotion})")
        # Random interval to mimic a real danmaku rate
        time.sleep(0.5 + random.random() * 2)
    print("Danmaku simulation finished")

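if __name__ == "__main__":
    import random
    # Create the analyzer
    analyzer = DanmakuEmotionAnalyzer()
    # Run the danmaku simulation in a background thread
    sim_thread = threading.Thread(target=simulate_danmaku_stream, args=(analyzer, 600))  # simulate 10 minutes
    sim_thread.daemon = True
    sim_thread.start()
    # Start the dashboard. debug=False avoids Dash's reloader, which would run
    # this script in a second process and start a second simulation thread.
    app = create_dashboard(analyzer)
    app.run_server(debug=False, port=8050)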
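The introduction mentions finding 'emotional peaks'. One simple way to do that on top of per-window counts is a net sentiment score. This is a minimal sketch and not part of the listing above; the function name, the score formula, and both thresholds are my own assumptions.

```python
def find_emotion_peaks(windows, min_count=5, score_threshold=0.5):
    """Return (window_start, score) for windows with strongly one-sided emotion.

    windows maps window_start -> {"positive": n, "negative": n, "neutral": n}.
    score = (positive - negative) / total, which lies in [-1, 1].
    """
    peaks = []
    for start in sorted(windows):
        counts = windows[start]
        total = sum(counts.values())
        if total < min_count:
            continue  # too few danmaku to be meaningful
        score = (counts.get("positive", 0) - counts.get("negative", 0)) / total
        if abs(score) >= score_threshold:
            peaks.append((start, round(score, 2)))
    return peaks


example = {
    0: {"positive": 8, "negative": 1, "neutral": 1},
    10: {"positive": 3, "negative": 3, "neutral": 4},
    20: {"positive": 1, "negative": 9, "neutral": 0},
    30: {"positive": 2, "negative": 0, "neutral": 0},
}
print(find_emotion_peaks(example))
```

Windows with very few danmaku are skipped so that a single enthusiastic comment does not register as a peak.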

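5. Performance Optimization and Production Deployment Advice

5.1 Performance Optimization Tips

# Process danmaku in batches to reduce the number of API calls.
# This function is meant to be added as a method of DanmakuEmotionAnalyzer.
def batch_analyze_danmaku(self, danmaku_list):
    """Analyze the sentiment of a batch of danmaku."""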
    texts = [dm["text"] for dm in danmaku_list]
    try:
        response = requests.post(
            f"{self.api_url}/batch_predict", json={"texts": texts}, timeout=5
        )
        if response.status_code == 200:
            results = response.json()["results"]
            for i, result in enumerate(results):
                danmaku_list[i]["emotion"] = result["sentiment"]
                danmaku_list[i]["confidence"] = result["confidence"]
        else:
            # On failure, fall back to default values
            for dm in danmaku_list:
                dm["emotion"] = "neutral"
                dm["confidence"] = 0.5
    except Exception as e:
        print(f"Batch analysis failed: {e}")
        for dm in danmaku_list:
            dm["emotion"] = "neutral"
            dm["confidence"] = 0.5
    return danmaku_list
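Batching only helps if something decides when a batch is ready. Here is a minimal sketch of a micro-batcher that flushes when the buffer is full or when its oldest item has waited too long. The class name and default limits are assumptions, and the age check only runs when a new item arrives, so a real system would also flush on a timer.

```python
import time


class MicroBatcher:
    """Collect danmaku and flush them in batches by size or by age."""

    def __init__(self, flush, max_size=32, max_wait=1.0, clock=time.monotonic):
        self.flush = flush          # callable that receives a list of texts
        self.max_size = max_size    # flush when this many texts are buffered
        self.max_wait = max_wait    # or when the oldest text is this old (seconds)
        self.clock = clock
        self.buffer = []
        self.first_at = None

    def add(self, text):
        """Buffer one text and flush if a limit has been reached."""
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(text)
        if len(self.buffer) >= self.max_size or self.clock() - self.first_at >= self.max_wait:
            self.flush_now()

    def flush_now(self):
        """Send whatever is buffered, if anything."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.flush(batch)
```

In this article's setup, `flush` could be a function that posts the batch to the `/batch_predict` endpoint and records the results.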

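from datetime import datetime, timedelta

import requests


class CachedEmotionAnalyzer:
    """Sentiment analyzer with a time-limited cache."""
    def __init__(self, api_url):
        self.api_url = api_url
        self.cache = {}  # text -> (result, time cached); grows without bound
        self.cache_ttl = 3600  # keep entries for 1 hour

    def analyze_with_cache(self, text):
        """Return the cached result for text if it is still fresh, else call the API."""
        # Do not also decorate this method with functools.lru_cache: it would
        # keep returning cached values and bypass the TTL check below.
        # The text itself is the key; hash(text) could collide.
        cached = self.cache.get(text)
        if cached is not None:
            cached_result, cached_time = cached
            if datetime.now() - cached_time < timedelta(seconds=self.cache_ttl):
                return cached_result
        # Cache miss or expired entry: call the API
        result = self._call_api(text)
        # Update the cache
        self.cache[text] = (result, datetime.now())
        return result

    def _call_api(self, text):
        """Call the /predict endpoint of the API service from section 3."""
        response = requests.post(f"{self.api_url}/predict", json={"text": text}, timeout=2)
        response.raise_for_status()
        return response.json()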
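Danmaku repeat a lot ('666', '哈哈哈'), so a cache pays off, but it should be bounded in size as well as in time. Here is a minimal sketch of a cache that combines least-recently-used eviction with expiry. The class name and defaults are assumptions, and a stored value of `None` cannot be told apart from a miss.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small LRU cache whose entries also expire after ttl seconds."""

    def __init__(self, max_size=10000, ttl=3600.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self._data[key]  # expired
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self.clock())
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used
```

The injectable `clock` makes expiry testable without waiting; in normal use the default monotonic clock is fine.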

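5.2 Production Deployment Configuration

# Dockerfile
FROM python:3.8-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the code. This includes start.sh and download_model.py, which you
# must provide yourself; neither file is shown in this article.
COPY . .
# Download the model
RUN python download_model.py
# Expose ports: API (8080), WebUI (7860), dashboard (8050)
EXPOSE 8080 7860 8050
# Startup script
RUN chmod +x start.sh
CMD ["./start.sh"]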

; /etc/supervisor/conf.d/structbert_prod.conf
[program:danmaku_analyzer]
command=/app/venv/bin/python emotion_heatmap.py
directory=/app
user=www-data
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/danmaku_analyzer.err.log
stdout_logfile=/var/log/danmaku_analyzer.out.log
environment=PYTHONPATH="/app",API_URL="http://localhost:8080"
; Group the program so it can be managed as danmaku:* (a group does not limit resources)
[group:danmaku]
programs=danmaku_analyzer
priority=999

5.3 Monitoring and Alerting

# monitoring.py
# Requires: pip install prometheus_client
import time

from prometheus_client import start_http_server, Counter, Histogram

from emotion_heatmap import DanmakuEmotionAnalyzer

# Define the metrics
DANMAKU_TOTAL = Counter('danmaku_total', 'Total danmaku processed')
EMOTION_POSITIVE = Counter('emotion_positive', 'Positive emotions detected')
EMOTION_NEGATIVE = Counter('emotion_negative', 'Negative emotions detected')
EMOTION_NEUTRAL = Counter('emotion_neutral', 'Neutral emotions detected')
PROCESSING_TIME = Histogram('processing_time_seconds', 'Time spent processing danmaku')

class MonitoredAnalyzer(DanmakuEmotionAnalyzer):
    """Analyzer that records Prometheus metrics."""
    def process_danmaku(self, danmaku_text, timestamp=None):
        start_time = time.time()
        # Call the parent implementation
        result = super().process_danmaku(danmaku_text, timestamp)
        # Record processing time
        PROCESSING_TIME.observe(time.time() - start_time)
        # Record emotion counts
        DANMAKU_TOTAL.inc()
        if result["emotion"] == "positive":
            EMOTION_POSITIVE.inc()
        elif result["emotion"] == "negative":
            EMOTION_NEGATIVE.inc()
        else:
            EMOTION_NEUTRAL.inc()
        return result

# Start the metrics server; Prometheus can scrape it on port 8000
start_http_server(8000)
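The metrics above cover monitoring. For the alerting half, one simple rule is to flag a sustained spike in negative danmaku. Here is a minimal sketch; the class name, window size, and thresholds are assumptions, and in production the same rule could instead be written as a Prometheus alerting rule.

```python
from collections import deque


class NegativeRatioAlert:
    """Signal when the share of negative danmaku in a rolling window is high."""

    def __init__(self, window_size=100, threshold=0.5, min_samples=20):
        self.recent = deque(maxlen=window_size)  # True for each negative danmaku
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, emotion):
        """Record one result and return True if the alert condition currently holds."""
        self.recent.append(emotion == "negative")
        if len(self.recent) < self.min_samples:
            return False  # not enough data yet
        return sum(self.recent) / len(self.recent) >= self.threshold
```

Calling `observe(result["emotion"])` after each processed danmaku is enough to drive it; what happens when it returns True (a log line, a webhook, a chat message) is up to you.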
