🔄 卡若AI sync 2026-02-28 06:25 | Updated: 水桥平台对接, 卡木, 火炬, 运营中枢工作台 | Excluded >20MB: 14 files

2026-02-28 06:25:42 +08:00
parent 82579fb7c4
commit 19542c3d09
17 changed files with 669 additions and 13 deletions
--- a/02_卡人（水）/水桥_平台对接/需求_酷哥账号自动拉群与AI加人.md
+++ b/02_卡人（水）/水桥_平台对接/需求_酷哥账号自动拉群与AI加人.md
@@ -0,0 +1,21 @@
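+# Requirement: 酷哥 accounts · auto group invites / AI auto-add contacts
+
+## Background
+The accounts that 酷哥 runs have two related capabilities: auto group invites, and (when AI is provided) automatic WeChat contact adding. Which one applies depends on whether AI is provided.
+
+## Rules
+
+**With AI (an AI phone is provided)**
+- An AI phone is shipped to the partner.
+- WeChat on that phone **adds contacts automatically**; no manual approval is needed.
+
+**Without AI**
+- WeChat **does not add contacts automatically**.
+- Instead, use group invites: **pull the whole batch (e.g. 180 people) into one group** and reach and engage them inside the group.
+
+## Key requirements
+- Product and process must clearly distinguish: **with AI → auto-add contacts**; **without AI → no auto-add, auto group invites instead (e.g. 180 people per batch)**.
+- Copy, documentation and training must state clearly that the shipped AI phone exists to "auto-add contacts"; without AI only "group invites" are offered, with no promise that friend requests are accepted automatically.
+
+---
+*Length: about 280 Chinese characters.*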
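The with-AI / without-AI rule is essentially a routing decision plus batching. A minimal Python sketch, assuming a hypothetical `plan_outreach` helper and taking 180 (given above only as an example) as the default group size; it only plans the work and does not talk to WeChat:

```python
from dataclasses import dataclass


@dataclass
class OutreachPlan:
    mode: str                  # "auto_add" (AI phone) or "group_invite" (no AI)
    batches: list[list[str]]   # contacts grouped per step


def plan_outreach(contacts: list[str], has_ai_phone: bool, group_size: int = 180) -> OutreachPlan:
    """Hypothetical helper: route contacts according to the with-AI / without-AI rule."""
    if has_ai_phone:
        # With an AI phone, WeChat on the phone adds each contact individually.
        return OutreachPlan("auto_add", [[c] for c in contacts])
    # Without AI there is no auto-add: pull contacts into groups of at most group_size.
    batches = [contacts[i:i + group_size] for i in range(0, len(contacts), group_size)]
    return OutreachPlan("group_invite", batches)


if __name__ == "__main__":
    people = [f"user{i}" for i in range(400)]
    plan = plan_outreach(people, has_ai_phone=False)
    print(plan.mode, [len(b) for b in plan.batches])  # group_invite [180, 180, 40]
```

With 400 contacts and no AI phone, the plan is three groups of 180, 180 and 40.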
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/.feishu_tokens.json
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/.feishu_tokens.json
@@ -1,6 +1,6 @@
 {
-  "access_token": "u-4_s5zy5N5aaE2oDiiP.5MOl5kgOBk1MXVoaaJMQ00wzm",
-  "refresh_token": "ur-7ty4XF3A17nawQWUi7bm5.l5mUo5k1OpVUaaUxQ00ADn",
+  "access_token": "u-4NFObvRHB9baG_6zHLv28Al5kgW5k1ipp8aaZww00BOn",
+  "refresh_token": "ur-7XjMc1PjB1NoflqWI30rQbl5kqMBk1UhVEaaUwQ00wD6",
  "name": "飞书用户",
-  "auth_time": "2026-02-27T05:04:40.719684"
+  "auth_time": "2026-02-27T16:16:01.288755"
 }
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_add_20h_only.applescript
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_add_20h_only.applescript
@@ -0,0 +1,14 @@
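+-- Add only the recurring daily 20:00 玩值电竞 Moments-post event (shows today and on future days); there is no existence check, so running this twice creates a second series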
+tell application "Calendar"
+	set calList to (every calendar whose writable is true)
+	if (count of calList) is 0 then set calList to calendars
+	set targetCal to item 1 of calList
+	set today to current date
+	set hours of today to 20
+	set minutes of today to 0
+	set seconds of today to 0
+	set startDate to today
+	set endDate to startDate + (30 * minutes)
+	make new event at end of events of targetCal with properties {summary:"玩值电竞 · 每日朋友圈", description:"写一篇玩值电竞朋友圈", start date:startDate, end date:endDate, recurrence:"FREQ=DAILY"}
+end tell
+return "Added"
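The script creates a single event whose `recurrence` is the iCalendar (RFC 5545) rule `FREQ=DAILY`, first occurring today at 20:00 and lasting 30 minutes; Calendar then shows it on every following day. A small Python sketch of which occurrences such a bare daily rule produces; `daily_occurrences` is a hypothetical helper for illustration, not part of these scripts, and it ignores every other RRULE part:

```python
from datetime import datetime, timedelta


def daily_occurrences(first_start: datetime, duration: timedelta, n: int) -> list[tuple[datetime, datetime]]:
    """Expand a bare FREQ=DAILY rule into its first n (start, end) pairs.

    Uses naive local datetimes, so the wall-clock time stays fixed across days
    (DST transitions are not modelled). INTERVAL, COUNT and UNTIL are not supported.
    """
    occurrences = []
    for k in range(n):
        start = first_start + timedelta(days=k)
        occurrences.append((start, start + duration))
    return occurrences


if __name__ == "__main__":
    # Same first occurrence as the AppleScript: today at 20:00:00, lasting 30 minutes.
    first = datetime.now().replace(hour=20, minute=0, second=0, microsecond=0)
    for s, e in daily_occurrences(first, timedelta(minutes=30), 3):
        print(s.strftime("%Y-%m-%d %H:%M"), "-", e.strftime("%H:%M"))
```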
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_and_add_20h.applescript
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_and_add_20h.applescript
@@ -0,0 +1,77 @@
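+-- 1) Delete duplicate events on two specific days (27 March and today); 2) ensure the recurring daily 20:00 玩值电竞 Moments-post event exists
+-- Run directly: osascript <this file>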
+
+set dayToClean to (current date)
+set hours of dayToClean to 0
+set minutes of dayToClean to 0
+set seconds of dayToClean to 0
+-- Also clean 2026-03-27 (the day shown in the user's screenshot)
+set march27 to (current date)
+set year of march27 to 2026
+set month of march27 to 3
+set day of march27 to 27
+set hours of march27 to 0
+set minutes of march27 to 0
+set seconds of march27 to 0
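+set daysToClean to {dayToClean, march27} -- only duplicates on these two days (today and 27 March) are cleaned
+
+tell application "Calendar"
+	set allCals to (every calendar whose writable is true)
+	if (count of allCals) is 0 then set allCals to calendars
+	repeat with targetDayRef in daysToClean
+		-- "repeat with ... in" yields item references; dereference before using the date in a whose clause
+		set targetDay to contents of targetDayRef
+		set dayEnd to targetDay + 86400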
+		repeat with cal in allCals
+			try
+				set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
+				set seen to {}
+				set toDelete to {}
+				repeat with ev in dayEvents
+					try
+						set evSummary to summary of ev
+						set evStart to start date of ev
+						set key to (evSummary & "|" & (evStart as text))
+						if seen contains key then
+							set end of toDelete to ev
+						else
+							set end of seen to key
+						end if
+					end try
+				end repeat
+				repeat with ev in toDelete
+					try
+						delete ev
+					end try
+				end repeat
+			end try
+		end repeat
+	end repeat
+	
+	-- Add the recurring daily 20:00 玩值电竞 Moments-post event; skip it if it already exists
+	set calList to (every calendar whose writable is true)
+	if (count of calList) is 0 then set calList to calendars
+	set targetCal to item 1 of calList
+	
+	set today to current date
+	set hours of today to 20
+	set minutes of today to 0
+	set seconds of today to 0
+	set startDate to today
+	set endDate to startDate + (30 * minutes)
+	
+	set eventSummary to "玩值电竞 · 每日朋友圈"
+	set eventDesc to "写一篇玩值电竞朋友圈"
+	
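+	-- Skip if the event already exists. A recurring event's start date is its first
+	-- occurrence, so limiting the check to today's 20:00 window would miss a daily
+	-- series created on an earlier day and add a second one. Match on the summary
+	-- across the whole target calendar instead; note this also counts a one-off
+	-- event with the same summary as already existing.
+	set existingList to (every event of targetCal whose summary is eventSummary)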
+	if (count of existingList) is 0 then
+		make new event at end of events of targetCal with properties {summary:eventSummary, description:eventDesc, start date:startDate, end date:endDate, recurrence:"FREQ=DAILY"}
+	end if
+end tell
+
+return "Duplicates removed; recurring 20:00 Moments-post event ensured"
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_march27.applescript
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_march27.applescript
@@ -0,0 +1,37 @@
+-- Delete duplicate calendar events on 27 March 2026 only (keep the first event for each summary + start time, delete the rest)
+set targetDay to (current date)
+set year of targetDay to 2026
+set month of targetDay to 3
+set day of targetDay to 27
+set hours of targetDay to 0
+set minutes of targetDay to 0
+set seconds of targetDay to 0
+set dayEnd to targetDay + 86400
+
+tell application "Calendar"
+	set allCals to (every calendar whose writable is true)
+	if (count of allCals) is 0 then set allCals to calendars
+	repeat with cal in allCals
+		try
+			set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
+			set seen to {}
+			set toDelete to {}
+			repeat with ev in dayEvents
+				try
+					set k to (summary of ev) & "|" & ((start date of ev) as text)
+					if seen contains k then
+						set end of toDelete to ev
+					else
+						set end of seen to k
+					end if
+				end try
+			end repeat
+			repeat with ev in toDelete
+				try
+					delete ev
+				end try
+			end repeat
+		end try
+	end repeat
+end tell
+return "Duplicates on 27 March removed"
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_today.applescript
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/calendar_remove_dupes_today.applescript
@@ -0,0 +1,34 @@
+-- Delete duplicate calendar events for today only (keep the first event for each summary + start time)
+set targetDay to (current date)
+set hours of targetDay to 0
+set minutes of targetDay to 0
+set seconds of targetDay to 0
+set dayEnd to targetDay + 86400
+
+tell application "Calendar"
+	set allCals to (every calendar whose writable is true)
+	if (count of allCals) is 0 then set allCals to calendars
+	repeat with cal in allCals
+		try
+			set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
+			set seen to {}
+			set toDelete to {}
+			repeat with ev in dayEvents
+				try
+					set k to (summary of ev) & "|" & ((start date of ev) as text)
+					if seen contains k then
+						set end of toDelete to ev
+					else
+						set end of seen to k
+					end if
+				end try
+			end repeat
+			repeat with ev in toDelete
+				try
+					delete ev
+				end try
+			end repeat
+		end try
+	end repeat
+end tell
+return "Today's duplicates removed"
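The three cleanup scripts share one rule: within a calendar and a day, events with the same summary and the same start time are duplicates; the first one seen is kept and the rest are deleted. The same keep-first logic as a Python sketch over plain tuples; the `Event` type and `find_duplicates` are hypothetical stand-ins for Calendar events, not part of the scripts:

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    calendar: str
    summary: str
    start: datetime


def find_duplicates(events: list[Event]) -> list[Event]:
    """Return the events to delete: every repeat of (calendar, summary, start) after the first."""
    seen: set[tuple[str, str, datetime]] = set()
    to_delete: list[Event] = []
    for ev in events:
        # The AppleScript builds "summary|start" inside a per-calendar loop;
        # putting the calendar into the key gives the same per-calendar scope.
        key = (ev.calendar, ev.summary, ev.start)
        if key in seen:
            to_delete.append(ev)
        else:
            seen.add(key)
    return to_delete
```

A set makes each membership test constant-time on average, whereas `seen contains k` on an AppleScript list scans the list; for a single day's events the difference is negligible.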
--- a/02_卡人（水）/水桥_平台对接/飞书管理/脚本/feishu_wiki_rename_node.py
+++ b/02_卡人（水）/水桥_平台对接/飞书管理/脚本/feishu_wiki_rename_node.py
@@ -0,0 +1,61 @@
+#!/usr/bin/env python3
+"""Rename a Feishu Wiki node: update its title by node_token. Usage: python3 feishu_wiki_rename_node.py <node_token> <new title>"""
+import sys
+import requests
+from pathlib import Path
+
+SCRIPT_DIR = Path(__file__).resolve().parent
+sys.path.insert(0, str(SCRIPT_DIR))
+import feishu_wiki_create_doc as fwd
+
+def get_space_id(node_token: str, headers: dict) -> "str | None":
+    r = requests.get(
+        f"https://open.feishu.cn/open-apis/wiki/v2/spaces/get_node?token={node_token}",
+        headers=headers, timeout=30)
+    try:
+        j = r.json()
+    except ValueError:
+        print("get_node response is not JSON:", r.text[:300])
+        return None
+    if j.get("code") != 0:
+        print("get_node failed:", j.get("msg"), j)
+        return None
+    node = (j.get("data") or {}).get("node") or {}
+    return (node.get("space_id") or node.get("origin_space_id") or
+            (node.get("space") or {}).get("space_id"))
+
+def rename_node(space_id: str, node_token: str, new_title: str, headers: dict) -> bool:
+    # Feishu Wiki v2 "update node title": POST .../nodes/{node_token}/update_title, body {"title": ...}
+    url = f"https://open.feishu.cn/open-apis/wiki/v2/spaces/{space_id}/nodes/{node_token}/update_title"
+    r = requests.post(url, headers=headers, json={"title": new_title}, timeout=30)
+    try:
+        j = r.json()
+    except ValueError:
+        j = {"msg": f"HTTP {r.status_code}, non-JSON body: {r.text[:300]}"}
+    if j.get("code") != 0:
+        print("update_title failed:", j.get("msg"), j)
+    return j.get("code") == 0
+
+def main():
+    if len(sys.argv) < 3:
+        print("Usage: python3 feishu_wiki_rename_node.py <node_token> <new title>")
+        sys.exit(1)
+    node_token = sys.argv[1]
+    new_title = " ".join(sys.argv[2:])
+    token = fwd.get_token(node_token)
+    if not token:
+        print("❌ Invalid token")
+        sys.exit(1)
+    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
+    space_id = get_space_id(node_token, headers)
+    if not space_id:
+        print("❌ Could not get space_id")
+        sys.exit(1)
+    if rename_node(space_id, node_token, new_title, headers):
+        print(f"✅ Title updated to: {new_title}")
+    else:
+        print("❌ Failed to update the title")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
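`get_space_id` mixes the HTTP call with response parsing. Splitting the parsing into a pure function makes the fallback order testable offline. A sketch, where `extract_space_id` and the sample payload are hypothetical, and the fallback keys `origin_space_id` and `space.space_id` are copied from the script rather than confirmed against Feishu's API reference:

```python
from typing import Any, Optional


def extract_space_id(resp: dict[str, Any]) -> Optional[str]:
    """Parse a get_node JSON body the way get_space_id does; None on error or if no key is present."""
    if resp.get("code") != 0:
        return None
    node = (resp.get("data") or {}).get("node") or {}
    # Same fallback order as the script: space_id, then origin_space_id, then space.space_id
    return (node.get("space_id") or node.get("origin_space_id")
            or (node.get("space") or {}).get("space_id"))


if __name__ == "__main__":
    sample = {"code": 0, "data": {"node": {"space_id": "sp_demo", "node_token": "wik_demo"}}}
    print(extract_space_id(sample))  # sp_demo
```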
--- a/03_卡木（木）/木叶_视频内容/视频切片/参考资料/AI视频切片_GitHub替代方案.md
+++ b/03_卡木（木）/木叶_视频内容/视频切片/参考资料/AI视频切片_GitHub替代方案.md
@@ -1,7 +1,9 @@
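 # AI Video Slicing: GitHub and Alternatives

 > Simplified approach and integrable alternatives for the 卡若AI video-slicing Skill
-> Updated: 2026-02-03
+> Updated: 2026-02-27
+
+**Related**: for web and GitHub approaches to removing filler words (嗯, 啊, etc.), see → [去语助词_全网与GitHub方案汇总.md](./去语助词_全网与GitHub方案汇总.md)

 ## 1. Current approach (卡若AI local pipeline)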

--- a/03_卡木（木）/木叶_视频内容/视频切片/参考资料/去语助词_全网与GitHub方案汇总.md
+++ b/03_卡木（木）/木叶_视频内容/视频切片/参考资料/去语助词_全网与GitHub方案汇总.md
@@ -0,0 +1,136 @@
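+# Removing filler words (嗯, 啊, etc.): survey of web and GitHub approaches
+
+> Surveyed: 2026-02-27  
+> Purpose: automatically detect and delete fillers such as "嗯", "啊" and "呃" (filler words / disfluencies) in video and audio
+
+---
+
+## 1. Summary
+
+| Type | Chinese "嗯" support | Notes |
+|------|----------------|------|
+| **Commercial API** | ❌ Deepgram: English only | `filler_words` works for English only |
+| **Open source · transcription-based** | ⚠️ Whisper usually does not output 嗯 | Needs word-level timestamps plus an ASR that actually outputs 嗯 |
+| **Open source · audio-only detection** | ⚠️ Mostly trained on English data | Transfer learning or fine-tuning may work |
+| **Chinese ASR + post-processing** | ✅ Feasible in principle | Word-level timestamps from FunASR or similar, then filter out the 嗯 segments |
+
+---
+
+## 2. Commercial / online options
+
+### 1. Deepgram (API)
+
+- **Capability**: `filler_words=true` detects uh, um, mhmm, etc. and returns their timestamps.
+- **Limitation**: **English only**; filler words are not supported for Chinese (Mandarin).
+- Docs: <https://developers.deepgram.com/docs/filler-words>
+
+### 2. Descript / Cleanvoice.ai / Resound / Auphonic
+
+- **Capability**: one-click removal of um, uh, "you know", etc.; some support other languages (e.g. Cleanvoice: German, French, Australian, etc.).
+- **Limitation**: mostly English-focused; **no explicit support for Chinese 嗯/啊 was found**; mostly paid or usage-based.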
+
+---
+
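+## 3. Open-source options on GitHub
+
+### 1. daily-demos/filler-word-removal ⭐ 14
+
+- **Repo**: <https://github.com/daily-demos/filler-word-removal>
+- **Pipeline**: video → extract audio → **transcribe with word-level timestamps via Whisper or Deepgram** → locate filler segments → cut them out with ffmpeg → concatenate the rest into a new video.
+- **Notes**: same idea as the current 卡若AI pipeline; with Deepgram it relies on Deepgram's filler detection (English only).
+- **Chinese**: with Whisper, Chinese output usually **does not contain 嗯**; an ASR that does output 嗯 is needed (see FunASR below).
+
+### 2. sagniklp/Disfluency-Removal-API ⭐ 16 (ICASSP '19)
+
+- **Repo**: <https://github.com/sagniklp/Disfluency-Removal-API>
+- **Capability**: CRNN plus silence/speech classification; detects and removes disfluencies directly from **audio**.
+- **Dependencies**: TensorFlow 1.x, web.py, librosa, pydub, pyAudioAnalysis, hmmlearn, etc.
+- **Chinese**: the paper and code target English; **transfer is possible in principle** but requires your own Chinese filler data for fine-tuning or retraining.
+
+### 3. amritkromana/disfluency_detection_from_audio ⭐ 32
+
+- **Repo**: <https://github.com/amritkromana/disfluency_detection_from_audio>
+- **Capability**: disfluency detection that **does not need an existing transcript**, in three modes:
+  - **language**: Whisper verbatim transcription + BERT disfluency classification (English verbatim);
+  - **acoustic**: purely acoustic disfluency detection with WavLM;
+  - **multimodal**: fusion of the language and acoustic branches.
+- **Output**: frame-level (about 20 ms) disfluency predictions, which can be converted into time segments for cutting.
+- **Chinese**: the models are trained on English (Switchboard, etc.); **for Chinese, fine-tune on self-collected data or use it only as a reference**.
+
+### 4. adammoss/uhm ⭐ 1
+
+- **Repo**: <https://github.com/adammoss/uhm>
+- **Capability**: `pip install uhm`; uses the **Watson API** to detect and remove fillers such as uhm.
+- **Limitation**: depends on IBM Watson; **English-oriented**.
+
+### 5. umdone (notebook)
+
+- **Approach**: split words at silences → compute features such as RMS → compare against filler templates to identify and delete um, etc.
+- **Limitation**: you must build the filler templates yourself; Chinese 嗯 requires collecting your own data.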
+
+---
+
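+## 4. Viable routes for Chinese
+
+### Option A: FunASR Paraformer word-level timestamps + filtering 嗯
+
+- **Rationale**: FunASR Paraformer supports **word-level timestamps** (start/end per word); whether a Chinese ASR drops 嗯 depends on its training data and post-processing, and some industrial models keep interjections.
+- **Method**: transcribe with Paraformer → pick 嗯, 啊, 呃, etc. out of the word list → get their time segments → cut and concatenate with the existing ffmpeg logic in `remove_filler_segments.py`.
+- **Docs**: <https://github.com/modelscope/FunASR>; for timestamp examples see community blog posts (Paraformer timestamp_model=True).
+
+### Option B: purely acoustic disfluency detection (transfer / fine-tuning)
+
+- **Rationale**: amritkromana's **acoustic** branch is purely acoustic and does not depend on ASR text, so it could in principle transfer to Chinese.
+- **Method**: start from its pretrained WavLM (or similar) model, label frames on self-collected Chinese 嗯/啊 clips, fine-tune, output time segments → then cut.
+- **Cost**: requires labeled data and GPU training.
+
+### Option C: keep "transcription + manual/semi-automatic" as a fallback
+
+- **Current state**: Whisper / whisper-timestamped **rarely output a standalone 嗯** for Chinese, so the automatic detection rate is low.
+- **Fallback**: listen through the video, record the times where 嗯 occurs in a `remove_list` text file, and cut with the existing `remove_filler_segments.py --remove-list` (already implemented).
+
+---
+
+## 5. Recommended rollout order
+
+1. **Short term**: transcribe once with **FunASR Paraformer** (or another ASR with Chinese word-level timestamps) and check whether 嗯, 啊, etc. appear; if they do, feed them straight into the existing "locate filler segments → ffmpeg cut" pipeline.
+2. **Medium term**: if Paraformer also filters out 嗯, evaluate the acoustic model of **disfluency_detection_from_audio**, fine-tuning it or training a binary classifier (filler vs. non-filler) on a small amount of Chinese filler data.
+3. **Long term**: watch whether Deepgram or other vendors release a **Chinese filler-words** API, or adopt a commercial product (e.g. Cleanvoice) if it supports Chinese.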
+
+---
+
+## 6. Mapping to existing scripts in this repo
+
+| Script in this repo | Purpose | Relation to the approaches above |
+|------------|------|----------------|
+| `remove_filler_segments.py` | Cuts video by SRT or `--remove-list` time segments | Generic "cutting" layer; works with any approach that outputs time segments |
+| `remove_ng_auto.py` | whisper-timestamped word level + filler matching | 嗯 is currently almost never recognized for Chinese; can be replaced with FunASR or similar |
+| Video-slicing SKILL | Transcribe → highlights → slice → enhance | If a filler-removal step is added later, it can go before or after enhancement |
+
+---
+
+## 7. Scripts implemented in this repo
+
+| Script | Description |
+|------|------|
+| `remove_ng_funasr.py` | **Recommended**: prefers FunASR word-level timestamps (more likely to output 嗯 for Chinese); falls back to whisper-timestamped if FunASR is not installed; writes `*_去嗯.mp4` next to the input |
+| `remove_filler_segments.py` | Generic: SRT cues that contain only fillers, or manual `--remove-list` timestamps → ffmpeg cut |
+| `remove_ng_auto.py` | whisper-timestamped only; mostly fails to recognize 嗯 for Chinese |
+
+**Example** (run in the `视频切片/脚本` directory):
+```bash
+conda activate mlx-whisper
+python3 remove_ng_funasr.py "/path/to/视频.mp4" -o "/path/to/输出_去嗯.mp4"
+```
+For best results, install FunASR (`pip install funasr`) while you have a reliable network connection, then run the script.
+
+---
+
+## 8. References
+
+- Deepgram filler words (English): <https://developers.deepgram.com/docs/filler-words>
+- daily-demos/filler-word-removal: <https://github.com/daily-demos/filler-word-removal>
+- Disfluency-Removal-API (ICASSP '19): <https://github.com/sagniklp/Disfluency-Removal-API>
+- disfluency_detection_from_audio (audio-only): <https://github.com/amritkromana/disfluency_detection_from_audio>
+- FunASR: <https://github.com/modelscope/FunASR>
+- Paper: Automatic Disfluency Detection from Untranscribed Speech (submitted to IEEE TASLP)
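The disfluency_detection_from_audio entry notes that its output is a frame-level prediction (about 20 ms per frame) that can be converted into time segments for cutting. A minimal Python sketch of that conversion, assuming a plain list of per-frame probabilities, a 20 ms hop, a 0.5 threshold and a 0.1 s minimum length; all four are illustrative choices, not that repository's interface. The resulting (start, end) pairs are the same kind of time segments the cutting layer described in this survey works with.

```python
def frames_to_segments(probs: list[float], hop_s: float = 0.02, threshold: float = 0.5,
                       min_len_s: float = 0.1) -> list[tuple[float, float]]:
    """Merge consecutive frames with prob >= threshold into (start_sec, end_sec) segments.

    Assumes threshold > 0, so the 0.0 sentinel appended below always closes an open run.
    Runs shorter than min_len_s are dropped as likely noise.
    """
    min_frames = max(1, round(min_len_s / hop_s))
    segments: list[tuple[float, float]] = []
    run_start = None
    for i, p in enumerate(probs + [0.0]):
        if p >= threshold and run_start is None:
            run_start = i                      # a disfluent run begins at frame i
        elif p < threshold and run_start is not None:
            if i - run_start >= min_frames:    # frame i is the first non-disfluent frame
                segments.append((run_start * hop_s, i * hop_s))
            run_start = None
    return segments


if __name__ == "__main__":
    probs = [0.0] * 5 + [0.9] * 10 + [0.0] * 3 + [0.8] * 2
    print(frames_to_segments(probs))
```

Counting the minimum length in whole frames rather than comparing float seconds keeps the cut-off exact at segment boundaries.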
--- a/03_卡木（木）/木叶_视频内容/视频切片/脚本/remove_ng_funasr.py
+++ b/03_卡木（木）/木叶_视频内容/视频切片/脚本/remove_ng_funasr.py
@@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+"""
+Filler-word removal (preferred approach): find 嗯/啊/呃 via FunASR Paraformer Chinese word-level timestamps, then cut with ffmpeg.
+Falls back to whisper-timestamped if FunASR is not installed.
+"""
+import argparse
+import re
+import subprocess
+import tempfile
+from pathlib import Path
+
+FILLER_RE = re.compile(r"^[嗯啊呃额哦噢唉哎诶喔]+$")
+
+
+def get_duration(path: str) -> float:
+    r = subprocess.run(
+        ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", path],
+        capture_output=True, text=True
+    )
+    return float(r.stdout.strip()) if r.returncode == 0 else 0
+
+
+def extract_audio(video: str, wav: str) -> bool:
+    return subprocess.run(
+        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", wav],
+        capture_output=True
+    ).returncode == 0
+
+
+def transcribe_funasr(audio_path: str):
+    """FunASR Chinese word-level timestamps; returns [(word, start_sec, end_sec), ...]"""
+    from funasr import AutoModel
+    model = AutoModel(
+        model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
+        vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
+        punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
+        device="cpu",
+    )
+    result = model.generate(input=audio_path, batch_size_s=300)
+    words_with_ts = []
+    if not result or len(result) == 0:
+        return words_with_ts
+    for item in result:
+        if not item:
+            continue
+        text = (item.get("text") or "").strip()
+        ts = item.get("timestamp") or item.get("timestamps") or []
+        # Format 1: timestamp = [[start_ms, end_ms], ...], one pair per character. The punctuation
+        # model adds punctuation/spaces to `text` that carry no timestamp, so strip them before pairing.
+        if isinstance(ts, list) and ts and isinstance(ts[0], (list, tuple)):
+            chars = re.sub(r"[\s，。、！？；：,.!?;:]", "", text)
+            for word, pair in zip(chars, ts):
+                if len(pair) >= 2:
+                    words_with_ts.append((word, float(pair[0]) / 1000, float(pair[1]) / 1000))
+            continue
+        if isinstance(ts, list) and ts and isinstance(ts[0], dict):
+            for w in ts:
+                word = w.get("word", w.get("text", ""))
+                s = w.get("start", w.get("start_time", 0))
+                e = w.get("end", w.get("end_time", 0))
+                if s is not None and e is not None:
+                    s, e = float(s), float(e)
+                    if s > 1000:
+                        s, e = s / 1000, e / 1000
+                    words_with_ts.append((str(word), s, e))
+            continue
+        # Sentence-level span only (no per-word timestamps)
+        start = item.get("start", item.get("start_time"))
+        end = item.get("end", item.get("end_time"))
+        if start is not None and end is not None:
+            s, e = float(start), float(end)
+            if s > 1000:
+                s, e = s / 1000, e / 1000
+            words_with_ts.append((text, s, e))
+    return words_with_ts
+
+
+def transcribe_whisper_ts(audio_path: str):
+    """Fallback: whisper-timestamped word level; returns [(word, start_sec, end_sec), ...]"""
+    import whisper_timestamped as whisper
+    audio = whisper.load_audio(audio_path)
+    model = whisper.load_model("base", device="cpu")
+    result = whisper.transcribe(model, audio, language="zh", detect_disfluencies=True)
+    words_with_ts = []
+    for seg in result.get("segments", []):
+        for w in seg.get("words", []):
+            text = (w.get("text") or "").strip()
+            s, e = w.get("start"), w.get("end")
+            if s is not None and e is not None:
+                words_with_ts.append((text, float(s), float(e)))
+    return words_with_ts
+
+
+def find_filler_ranges(words_with_ts: list) -> list:
+    """Pick filler ranges [(start_sec, end_sec), ...] from (word, start, end); "[*]" is whisper-timestamped's disfluency marker."""
+    out = []
+    for word, s, e in words_with_ts:
+        t = re.sub(r"[\s，。、！？,.!?\-—…]+", "", str(word).strip())
+        if t and (FILLER_RE.match(t) or t == "[*]") and e - s > 0.05:
+            out.append((s, e))
+    return sorted(out, key=lambda x: x[0])
+
+
+def build_keep_ranges(remove_ranges: list, total_duration: float) -> list:
+    keep = []
+    current = 0.0
+    for rm_start, rm_end in sorted(remove_ranges, key=lambda x: x[0]):
+        if rm_start > current + 0.05:
+            keep.append((current, rm_start))
+        current = max(current, rm_end)
+    if current < total_duration - 0.05:
+        keep.append((current, total_duration))
+    return keep
+
+
+def run_ffmpeg(args: list) -> bool:
+    return subprocess.run(args, capture_output=True).returncode == 0
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("video", help="Input video")
+    ap.add_argument("-o", "--output", help="Output path (default: <input>_去嗯.mp4 next to the input)")
+    args = ap.parse_args()
+
+    video_path = Path(args.video).resolve()
+    if not video_path.exists():
+        print("❌ Video not found")
+        return 1
+    output_path = Path(args.output) if args.output else video_path.parent / f"{video_path.stem}_去嗯.mp4"
+    total_duration = get_duration(str(video_path))
+    if total_duration <= 0:
+        print("❌ Could not read the video duration")
+        return 1
+
+    with tempfile.TemporaryDirectory() as tmpdir:
+        tmp = Path(tmpdir)
+        audio_path = tmp / "audio.wav"
+        print("1. Extracting 16 kHz audio...")
+        if not extract_audio(str(video_path), str(audio_path)):
+            print("❌ Audio extraction failed")
+            return 1
+
+        words_with_ts = []
+        try:
+            from funasr import AutoModel  # noqa: F401  (availability check only)
+            print("2. FunASR Paraformer word-level transcription (Chinese)...")
+            words_with_ts = transcribe_funasr(str(audio_path))
+        except ImportError:
+            print("2. FunASR not installed; falling back to whisper-timestamped...")
+            try:
+                words_with_ts = transcribe_whisper_ts(str(audio_path))
+            except Exception as e:
+                print(f"❌ Transcription failed: {e}")
+                return 1
+
+        if not words_with_ts:
+            print("   No word-level timestamps; trying sentence level...")
+            try:
+                from funasr import AutoModel
+                model = AutoModel(model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch", device="cpu")
+                result = model.generate(input=str(audio_path), batch_size_s=300)
+                for item in (result or []):
+                    if not item:
+                        continue
+                    text = (item.get("text") or "").strip()
+                    ts = item.get("timestamp") or []
+                    if isinstance(ts, list) and ts and isinstance(ts[0], (list, tuple)) and len(ts[-1]) >= 2:
+                        # Whole-sentence span: first token's start to last token's end (ms -> s);
+                        # only a sentence made up entirely of fillers will match FILLER_RE later.
+                        words_with_ts.append((text, float(ts[0][0]) / 1000, float(ts[-1][1]) / 1000))
+                        continue
+                    start, end = item.get("start"), item.get("end")
+                    if start is not None and end is not None:
+                        s, e = float(start), float(end)
+                        words_with_ts.append((text, s / 1000, e / 1000) if s > 1000 else (text, s, e))
+            except Exception as e:
+                print(f"   Sentence level also failed: {e}")
+
+        remove_ranges = find_filler_ranges(words_with_ts)
+        print(f"   Detected {len(remove_ranges)} filler segment(s) (嗯/啊/呃, etc.)")
+        for s, e in remove_ranges[:15]:
+            print(f"     {s:.2f}s - {e:.2f}s")
+        if len(remove_ranges) > 15:
+            print(f"     ... {len(remove_ranges)} in total")
+
+        if not remove_ranges:
+            print("   No fillers found; copying the original video")
+            import shutil
+            shutil.copy(str(video_path), str(output_path))
+            print(f"✅ Written: {output_path}")
+            return 0
+
+        keep_ranges = build_keep_ranges(remove_ranges, total_duration)
+        print("3. Cutting and concatenating with ffmpeg...")
+        seg_files = []
+        for i, (start, end) in enumerate(keep_ranges):
+            dur = end - start
+            if dur < 0.1:
+                continue
+            seg = tmp / f"seg_{i:04d}.mp4"  # re-encoded: with -c copy, cuts snap to keyframes (too coarse for sub-second fillers)
+            if run_ffmpeg(["ffmpeg", "-y", "-ss", str(start), "-t", str(dur), "-i", str(video_path), "-c:v", "libx264", "-c:a", "aac", str(seg)]):
+                seg_files.append(seg)
+        if not seg_files:
+            print("❌ Failed to create segments")
+            return 1
+        list_path = tmp / "list.txt"
+        with open(list_path, "w") as f:
+            for seg in seg_files:
+                f.write(f"file '{seg}'\n")
+        if not run_ffmpeg(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_path), "-c", "copy", str(output_path)]):
+            print("❌ Concatenation failed")
+            return 1
+
+    print(f"✅ Written: {output_path}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
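`transcribe_funasr` pairs each FunASR `[start_ms, end_ms]` timestamp with a character of the recognized text and then keeps the filler characters. A self-contained sketch of that pairing and filtering, under the assumption (not confirmed against FunASR's documentation) that the timestamp list holds one pair per non-punctuation character of `text`; `align_chars`, `filler_ranges` and the punctuation set are hypothetical, while the filler class is copied from the script:

```python
import re

# Punctuation/whitespace assumed to be inserted by the punctuation model without timestamps.
PUNCT_RE = re.compile(r"[\s，。、！？；：,.!?;:]")
FILLER_RE = re.compile(r"^[嗯啊呃额哦噢唉哎诶喔]+$")  # same filler class as remove_ng_funasr.py


def align_chars(text: str, ts: list[list[float]]) -> list[tuple[str, float, float]]:
    """Pair each non-punctuation character with its [start_ms, end_ms], returned in seconds."""
    chars = PUNCT_RE.sub("", text)
    return [(ch, pair[0] / 1000, pair[1] / 1000) for ch, pair in zip(chars, ts)]


def filler_ranges(words: list[tuple[str, float, float]], min_dur: float = 0.05) -> list[tuple[float, float]]:
    """Keep only characters matching FILLER_RE that last longer than min_dur seconds."""
    return [(s, e) for w, s, e in words if FILLER_RE.match(w) and e - s > min_dur]


if __name__ == "__main__":
    ts = [[0, 300], [400, 600], [600, 800], [800, 1000], [1200, 1500], [1500, 1700]]
    words = align_chars("嗯，今天讲。啊好", ts)
    print(filler_ranges(words))  # [(0.0, 0.3), (1.2, 1.5)]
```

Without stripping the two punctuation marks, "啊" would be paired with the timestamp of "讲" and the wrong span would be cut.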
--- a/04_卡火（火）/火炬_全栈消息/本地项目启动/SKILL.md
+++ b/04_卡火（火）/火炬_全栈消息/本地项目启动/SKILL.md
@@ -34,8 +34,8 @@ updated: "2026-02-26"
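   - See `运营中枢/工作台/唯一MongoDB约定.md` and 金仓's `datacenter/README.md`.

 2. **website group (websites)**
-   - Website services (e.g. 玩值电竞 web, 神射手) belong to the **website** stack; do not create a separate compose file or a separate MongoDB for any single site.
-   - Compose location: `docker-compose.yml` in the 神射手 directory (project name: website); ports and groups follow **《项目与端口注册表》** (Project and Port Registry).
+   - Website services (玩值电竞 web, 神射手, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.) belong to the **website** stack; do not create a separate MongoDB for any single site. Every site's compose file sets **`name: website`**, so they all appear under the website group in Docker Desktop.
+   - Compose location: `docker-compose.yml` in the 神射手 directory (main site); for **the full list of website projects** see **`website分组清单.md`** in the workbench; ports follow **《项目与端口注册表》**.
   - Services in website reach the datacenter databases over the external network **datacenter_network**, e.g. **`MONGODB_URI=mongodb://datacenter_mongodb:27017`**.

 When deploying with Docker or editing a compose file, check against the two rules above first; fix any violation before deploying.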
--- a/运营中枢/工作台/00_账号与API索引.md
+++ b/运营中枢/工作台/00_账号与API索引.md
@@ -59,10 +59,6 @@
 | Secret | `v1:C6mw1SlvXsJdlO4VFEXSQEVf:519gA0DPqIMbjvfMh7CXf4B2` |
 | Model | `claude-opus` |

-### Gemini (卡若AI workbench)
-| Item | Value |
-|----|-----|
-| API Key | `AIzaSyCPARryq8o6MKptLoT4STAvCsRB7uZuOK8` |

 ### Gitea (self-hosted Git on CKB NAS)
 | 项 | 值 |
--- a/运营中枢/工作台/gitea_push_log.md
+++ b/运营中枢/工作台/gitea_push_log.md
@@ -167,3 +167,4 @@
 | 2026-02-26 16:43:05 | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥平台对接, 卡木, 运营中枢工作台 | Excluded >20MB: 14 files |
 | 2026-02-27 05:06:58 | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥平台对接, 运营中枢工作台 | Excluded >20MB: 14 files |
 | 2026-02-27 05:21:57 | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥平台对接, 运营中枢工作台 | Excluded >20MB: 14 files |
+| 2026-02-27 10:53:48 | 🔄 卡若AI sync 2026-02-27 10:53 | Updated: Cursor rules, 水桥平台对接, 卡木, 运营中枢工作台 | Excluded >20MB: 14 files |
--- a/运营中枢/工作台/website分组清单.md
+++ b/运营中枢/工作台/website分组清单.md
@@ -0,0 +1,54 @@
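+# Docker website group inventory
+
+> All **Docker projects that serve web pages** are grouped under **website**, so they appear together in Docker Desktop and follow one startup convention.
+> This matches the website group in the 唯一MongoDB约定 (single MongoDB) convention; database services belong to **datacenter** and are not listed here.
+
+---
+
+## Group conventions
+
+- **website**: website services (front ends / sites); no databases are created in this group. Sites that need MongoDB use the single instance `datacenter_mongodb` (host port 27017).
+- Each website project sets **`name: website`** in its own `docker-compose.yml`, so Docker Desktop shows it under the same **website** project.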
+
+---
+
+## Projects in the website group
+
+| Project | Container | Port | Compose file | Notes |
+|:---|:---|:--:|:---|:---|
+| 神射手 | website-shensheshou | **3117** | `开发/2、私域银行/神射手/docker-compose.yml` | Started from the same compose file as 玩值电竞 |
+| 玩值电竞 Web | website-wanzhi-web | **3001** | Same as above | Same as above |
+| 卡若ai网站 | website-karuo-site | **3100** | `开发/3、自营项目/卡若ai网站/docker-compose.yml` | Standalone compose, `name: website` |
+| 玩值大屏 | website-wz-screen | **3034** | `开发/3、自营项目/玩值大屏/docker-compose.yml` | Standalone compose, `name: website` |
+| Soul 创业实验 | website-soul-book | **3000** | `开发/3、自营项目/一场soul的创业实验-react/docker-compose.yml` | Standalone compose, `name: website` (local) |
+
+---
+
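+## Unified startup
+
+- **神射手 + 玩值电竞** (main site): in the 神射手 directory run  
+  `docker compose up -d` or `docker compose up -d --build`  
+  Both sites appear under the **website** group in Docker Desktop.
+- **Other sites**: in each project's own directory run  
+  `docker compose up -d`  
+  Because `name: website` is set, their containers are also grouped under **website**.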
+
+---
+
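+## Docker projects not in the website group
+
+The following are **not pure websites** (they include databases, platforms or tools); they keep their own project names and do not use website:
+
+| Type | Example projects | Group / notes |
+|:---|:---|:---|
+| Databases / middleware | datacenter (MongoDB, etc.), 上帝之眼 (MySQL/Redis/InfluxDB) | datacenter, or its own project name |
+| Full business systems | 存客宝 cunkebao_v3 (front end + back end + DB) | Own compose project |
+| Tools / tunnels | frp, clawdbot, work-phone SDK (工作手机 SDK) | Own compose project |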
+
+---
+
+## Change log
+
+| Date | Change |
+|:---|:---|
+| 2026-02-27 | Initial version; lists 神射手, 玩值电竞, 卡若ai网站, 玩值大屏, Soul 创业实验; establishes `name: website` as the grouping convention |
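The inventory table can double as a checklist: host ports and container names must not collide, and every container in it follows the `website-` naming pattern. A minimal Python sketch of that check over the rows copied from the table above; `check_inventory` is hypothetical, treating the `website-` prefix as a rule is an assumption drawn from the existing rows, and the sketch does not read any compose files:

```python
from collections import Counter

# (project, container, host port) rows copied from the inventory table
ROWS = [
    ("神射手", "website-shensheshou", 3117),
    ("玩值电竞 Web", "website-wanzhi-web", 3001),
    ("卡若ai网站", "website-karuo-site", 3100),
    ("玩值大屏", "website-wz-screen", 3034),
    ("Soul 创业实验", "website-soul-book", 3000),
]


def check_inventory(rows: list[tuple[str, str, int]]) -> list[str]:
    """Return readable problems: duplicate ports/containers or a missing website- prefix."""
    problems = []
    for port, n in Counter(p for _, _, p in rows).items():
        if n > 1:
            problems.append(f"port {port} used {n} times")
    for name, n in Counter(c for _, c, _ in rows).items():
        if n > 1:
            problems.append(f"container {name} used {n} times")
    for project, container, _ in rows:
        if not container.startswith("website-"):
            problems.append(f"{project}: container {container} lacks the website- prefix")
    return problems


if __name__ == "__main__":
    print(check_inventory(ROWS) or "inventory OK")
```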
--- a/运营中枢/工作台/代码管理.md
+++ b/运营中枢/工作台/代码管理.md
@@ -170,3 +170,4 @@
 | 2026-02-26 16:43:05 | Success | Success | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥平台对接, 卡木, 运营中枢工作台 | Excluded >20MB: 14 files | [Repo](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
 | 2026-02-27 05:06:58 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥平台对接, 运营中枢工作台 | Excluded >20MB: 14 files | [Repo](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
 | 2026-02-27 05:21:57 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥平台对接, 运营中枢工作台 | Excluded >20MB: 14 files | [Repo](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
+| 2026-02-27 10:53:48 | Success | Success | 🔄 卡若AI sync 2026-02-27 10:53 | Updated: Cursor rules, 水桥平台对接, 卡木, 运营中枢工作台 | Excluded >20MB: 14 files | [Repo](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
--- a/运营中枢/工作台/唯一MongoDB约定.md
+++ b/运营中枢/工作台/唯一MongoDB约定.md
@@ -9,9 +9,9 @@
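 | Group | Purpose | Compose location |
 |:---|:---|:---|
 | **datacenter** | All **database-related** Docker services (MongoDB, Redis, MySQL, vector stores, etc.) | 卡若AI `01_卡资（金）/金仓_存储备份/datacenter/docker-compose.yml`, or the 数据中台 (data platform) system base |
-| **website** | Website services (神射手, 玩值电竞 Web, etc.); no databases are created in this group | `docker-compose.yml` in the 神射手 directory (project name: website) |
+| **website** | Website services (神射手, 玩值电竞 Web, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.); no databases are created in this group | `docker-compose.yml` in the 神射手 directory (main site); the rest are listed in **`website分组清单.md`** |

-From now on, every new database service goes into the **datacenter** group; every new website service goes into the **website** group and connects to containers in datacenter over the external network `datacenter_network`.
+From now on, every new database service goes into the **datacenter** group; every new website service goes into the **website** group (each compose file sets `name: website`) and connects to containers in datacenter over the external network `datacenter_network`. **For the full list of website projects** see **`website分组清单.md`** in the workbench.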

 ---

@@ -55,3 +55,4 @@
 |:---|:---|
 | 2026-02-26 | Initial convention; website contains only shensheshou + wanzhi-web, both connecting to datacenter_mongodb on 27017 |
 | 2026-02-26 | Added the datacenter group convention; all database-related Docker projects go into datacenter, and website connects through datacenter_network |
+| 2026-02-27 | website group extended: 卡若ai网站, 玩值大屏 and Soul 创业实验 added to website; see `website分组清单.md` |
--- a/运营中枢/工作台/项目与端口注册表.md
+++ b/运营中枢/工作台/项目与端口注册表.md
@@ -28,7 +28,7 @@

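 - **Starting a project**: say "本地运行 玩值电竞App" (run 玩值电竞App locally), "启动玩值电竞" (start 玩值电竞), etc. → the 本地项目启动 (local project startup) Skill runs it using the path and port from the table above.
 - **Adding or changing a binding**: add or edit a row in this table, make the project's dev script use that port (e.g. Next.js: `next dev -p <port>`), then add a one-line note in the Skill.
- **Docker websites**: 玩值电竞 web has been merged into the **website** stack (same group as 神射手), container `website-wanzhi-web`, port **3001**; the **single MongoDB** is datacenter_mongodb (27017), see **`唯一MongoDB约定.md`** in the workbench; do not create new MongoDB instances.
+- **Docker websites**: 玩值电竞 web has been merged into the **website** stack (same group as 神射手), container `website-wanzhi-web`, port **3001**; the **single MongoDB** is datacenter_mongodb (27017), see **`唯一MongoDB约定.md`** in the workbench; do not create new MongoDB instances. **All Docker website projects in the website group** (神射手, 玩值电竞, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.) are listed in **`website分组清单.md`**.
 - **Docker deployment**: follow the "single MongoDB" and "container grouping" conventions; before running anything, read the "Docker deployment conventions" section of the **本地项目启动** Skill.
 - **Running the latest local code in Docker**: after every local code or content change, run **`docker compose up -d --build`** in the corresponding compose directory (e.g. the 神射手 directory for website) so that Docker runs the latest files; otherwise the container keeps running the old image. **This applies to every project.**
 - **玩值电竞 deploy/run/access**: **always use Docker, never pnpm dev**. Access it at **http://localhost:3001**; the only database is **wanzhi_esports** (datacenter_mongodb 27017); start and deploy with `docker compose up -d --build` in the 神射手 directory (after updates, `--build` is required to pick up local changes). Answers to such questions **must use the 卡若复盘 (卡若 retrospective) format**.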
@@ -44,3 +44,4 @@
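 | 2026-02-26 | 玩值电竞 focus list and Pomodoro convention: while 卡若AI is developing, log working time as Pomodoros in WebPomodoro |
 | 2026-02-26 | Running the latest local code in Docker: after updates run up -d --build; same for all projects; registry and Skill kept in sync |
 | 2026-02-26 | **Convention**: whenever a project, port, startup or deployment changes, update this table so this doc stays current |
+| 2026-02-27 | Docker website projects grouped under website; full list in `website分组清单.md` |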