🔄 卡若AI sync 2026-02-28 06:25 | Updated: 水桥 platform integration, 卡木, 火炬, operations hub workbench | Excluded (>20MB): 14 files
21
02_卡人(水)/水桥_平台对接/需求_酷哥账号自动拉群与AI加人.md
Normal file
@@ -0,0 +1,21 @@
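# Requirement: 酷哥 accounts · automatic group invites / AI auto-adding of contacts

## Background

The accounts run by 酷哥 have two related capabilities: automatically pulling people into a group, and (when an AI phone is provided) automatically adding contacts on WeChat. Which one is used depends on whether the account is equipped with AI.

## Rules

**With AI (an AI phone is provided)**

- An AI phone is shipped to the partner.
- WeChat on that phone **adds contacts automatically**; no manual approval is needed.

**Without AI**

- WeChat **does not add contacts automatically**.
- Group invites are used instead: the whole batch (for example, 180 people) is **pulled into a single group**, and outreach and operations happen inside that group.

## Key requirements

- The product and process must distinguish the two cases explicitly: **with AI → contacts are added automatically**; **without AI → no automatic adding; use automatic group invites instead (for example, 180 people per batch)**.
- Copy, documentation, and training must state clearly that the shipped AI phone exists to "add contacts automatically"; without AI only the "group invite" capability is offered, and automatic approval of friend requests is not promised.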
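
The with-AI / without-AI split above can be sketched as a small routing function. This is only an illustration of the rule, not the actual implementation: `plan_outreach`, `Plan`, and the `has_ai_phone` flag are hypothetical names, and the batch size of 180 is simply the example figure from the text.

```python
from dataclasses import dataclass
from typing import List

BATCH_SIZE = 180  # example batch size quoted in the requirement; an assumption, not a fixed limit


@dataclass
class Plan:
    mode: str                 # "auto_add" (AI phone) or "group_invite" (no AI)
    batches: List[List[str]]  # one list of contacts per group; empty for auto_add


def plan_outreach(contacts: List[str], has_ai_phone: bool, batch_size: int = BATCH_SIZE) -> Plan:
    """Pick the outreach mode and, without AI, split contacts into group-sized batches."""
    if has_ai_phone:
        # With an AI phone, contacts are added automatically; no group batching is needed.
        return Plan(mode="auto_add", batches=[])
    batches = [contacts[i:i + batch_size] for i in range(0, len(contacts), batch_size)]
    return Plan(mode="group_invite", batches=batches)
```

Keeping the decision in one function makes it harder for the "no AI" path to accidentally promise automatic friend approval.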

---

*Length: about 280 characters (Chinese original).*
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"access_token": "u-4_s5zy5N5aaE2oDiiP.5MOl5kgOBk1MXVoaaJMQ00wzm",
|
||||
"refresh_token": "ur-7ty4XF3A17nawQWUi7bm5.l5mUo5k1OpVUaaUxQ00ADn",
|
||||
"access_token": "u-4NFObvRHB9baG_6zHLv28Al5kgW5k1ipp8aaZww00BOn",
|
||||
"refresh_token": "ur-7XjMc1PjB1NoflqWI30rQbl5kqMBk1UhVEaaUwQ00wD6",
|
||||
"name": "飞书用户",
|
||||
"auth_time": "2026-02-27T05:04:40.719684"
|
||||
"auth_time": "2026-02-27T16:16:01.288755"
|
||||
}
|
||||
14
02_卡人(水)/水桥_平台对接/飞书管理/脚本/calendar_add_20h_only.applescript
Normal file
@@ -0,0 +1,14 @@
-- Add only the daily 20:00 玩值电竞 Moments event (recurring); it shows up today and on all future days
-- The event title below means "玩值电竞 · daily Moments post" and is kept as-is as calendar data
tell application "Calendar"
    set calList to (every calendar whose writable is true)
    if (count of calList) is 0 then set calList to calendars
    set targetCal to item 1 of calList
    set today to current date
    set hours of today to 20
    set minutes of today to 0
    set seconds of today to 0
    set startDate to today
    set endDate to startDate + (30 * minutes)
    make new event at end of events of targetCal with properties {summary:"玩值电竞 · 每日朋友圈", description:"写一篇玩值电竞朋友圈", start date:startDate, end date:endDate, recurrence:"FREQ=DAILY"}
end tell
return "Added"
@@ -0,0 +1,77 @@
-- 1) Delete duplicate events on the given days (March 27 and today); 2) make sure the daily 20:00 玩值电竞 Moments event (recurring) exists
-- Run directly: osascript <this file>

set dayToClean to (current date)
set hours of dayToClean to 0
set minutes of dayToClean to 0
set seconds of dayToClean to 0
-- Also clean 2026-03-27 (the day shown in the user's screenshot)
set march27 to (current date)
set year of march27 to 2026
set month of march27 to 3
set day of march27 to 27
set hours of march27 to 0
set minutes of march27 to 0
set seconds of march27 to 0
set daysToClean to {dayToClean, march27}
-- Only duplicates on today and on March 27 are cleaned

tell application "Calendar"
    set allCals to (every calendar whose writable is true)
    if (count of allCals) is 0 then set allCals to calendars

    repeat with dayRef in daysToClean
        -- The loop variable is a reference; dereference it before using it in arithmetic and in the filter
        set targetDay to contents of dayRef
        set dayEnd to targetDay + 86400
        repeat with cal in allCals
            try
                set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
                set seen to {}
                set toDelete to {}
                repeat with ev in dayEvents
                    try
                        set evSummary to summary of ev
                        set evStart to start date of ev
                        -- Events with the same title and start time count as duplicates
                        set key to (evSummary & "|" & (evStart as text))
                        if seen contains key then
                            set end of toDelete to ev
                        else
                            set end of seen to key
                        end if
                    end try
                end repeat
                repeat with ev in toDelete
                    try
                        delete ev
                    end try
                end repeat
            end try
        end repeat
    end repeat

    -- Add the daily 20:00 玩值电竞 Moments event (recurring); skip if it already exists
    set calList to (every calendar whose writable is true)
    if (count of calList) is 0 then set calList to calendars
    set targetCal to item 1 of calList

    set today to current date
    set hours of today to 20
    set minutes of today to 0
    set seconds of today to 0
    set startDate to today
    set endDate to startDate + (30 * minutes)

    set eventSummary to "玩值电竞 · 每日朋友圈"
    set eventDesc to "写一篇玩值电竞朋友圈"

    -- Check whether the event already exists at 20:00 today
    set todayStart to (current date)
    set hours of todayStart to 20
    set minutes of todayStart to 0
    -- Zero the seconds too; otherwise an event starting at exactly 20:00:00 falls before todayStart and is missed
    set seconds of todayStart to 0
    set todayEnd to todayStart + (1 * hours)
    set existingList to (every event of targetCal where summary is eventSummary and start date ≥ todayStart and start date < todayEnd)
    if (count of existingList) is 0 then
        make new event at end of events of targetCal with properties {summary:eventSummary, description:eventDesc, start date:startDate, end date:endDate, recurrence:"FREQ=DAILY"}
    end if
end tell

return "Duplicates cleaned and the recurring 20:00 Moments event ensured"
@@ -0,0 +1,37 @@
-- Delete only the duplicate calendar events on 2026-03-27 (keep the first of each kind, delete the rest)
set targetDay to (current date)
set year of targetDay to 2026
set month of targetDay to 3
set day of targetDay to 27
set hours of targetDay to 0
set minutes of targetDay to 0
set seconds of targetDay to 0
set dayEnd to targetDay + 86400

tell application "Calendar"
    set allCals to (every calendar whose writable is true)
    if (count of allCals) is 0 then set allCals to calendars
    repeat with cal in allCals
        try
            set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
            set seen to {}
            set toDelete to {}
            repeat with ev in dayEvents
                try
                    -- Events with the same title and start time count as duplicates
                    set k to (summary of ev) & "|" & ((start date of ev) as text)
                    if seen contains k then
                        set end of toDelete to ev
                    else
                        set end of seen to k
                    end if
                end try
            end repeat
            repeat with ev in toDelete
                try
                    delete ev
                end try
            end repeat
        end try
    end repeat
end tell
return "Duplicates on March 27 deleted"
@@ -0,0 +1,34 @@
-- Delete only today's duplicate calendar events (keep the first of each kind)
set targetDay to (current date)
set hours of targetDay to 0
set minutes of targetDay to 0
set seconds of targetDay to 0
set dayEnd to targetDay + 86400

tell application "Calendar"
    set allCals to (every calendar whose writable is true)
    if (count of allCals) is 0 then set allCals to calendars
    repeat with cal in allCals
        try
            set dayEvents to (every event of cal where start date ≥ targetDay and start date < dayEnd)
            set seen to {}
            set toDelete to {}
            repeat with ev in dayEvents
                try
                    -- Events with the same title and start time count as duplicates
                    set k to (summary of ev) & "|" & ((start date of ev) as text)
                    if seen contains k then
                        set end of toDelete to ev
                    else
                        set end of seen to k
                    end if
                end try
            end repeat
            repeat with ev in toDelete
                try
                    delete ev
                end try
            end repeat
        end try
    end repeat
end tell
return "Today's duplicates deleted"
61
02_卡人(水)/水桥_平台对接/飞书管理/脚本/feishu_wiki_rename_node.py
Normal file
@@ -0,0 +1,61 @@
#!/usr/bin/env python3
"""Rename a Feishu Wiki node: update the title for a given node_token. Usage: python3 feishu_wiki_rename_node.py <node_token> <new title>"""
import sys
import requests
from pathlib import Path

SCRIPT_DIR = Path(__file__).resolve().parent
sys.path.insert(0, str(SCRIPT_DIR))
import feishu_wiki_create_doc as fwd

def get_space_id(node_token: str, headers: dict) -> str | None:
    """Look up the space_id that owns the node; returns None on any failure."""
    r = requests.get(
        f"https://open.feishu.cn/open-apis/wiki/v2/spaces/get_node?token={node_token}",
        headers=headers, timeout=30)
    try:
        j = r.json()
    except Exception:
        print("get_node response is not JSON:", r.text[:300])
        return None
    if j.get("code") != 0:
        print("get_node failed:", j.get("msg"), j)
        return None
    node = (j.get("data") or {}).get("node") or {}
    space_id = (node.get("space_id") or node.get("origin_space_id") or
                (node.get("space") or {}).get("space_id"))
    return space_id

def rename_node(space_id: str, node_token: str, new_title: str, headers: dict) -> bool:
    # Feishu docs: update the node title via PATCH .../nodes/{node_token}/title
    # (a 404 means the app lacks permission or the title has to be changed manually)
    url = f"https://open.feishu.cn/open-apis/wiki/v2/spaces/{space_id}/nodes/{node_token}/title"
    r = requests.patch(url, headers=headers, json={"title": new_title}, timeout=30)
    if r.status_code != 200:
        return False
    try:
        return r.json().get("code") == 0
    except Exception:
        return False

def main():
    if len(sys.argv) < 3:
        print("Usage: python3 feishu_wiki_rename_node.py <node_token> <new title>")
        sys.exit(1)
    node_token = sys.argv[1]
    # Everything after the node token is joined into the new title
    new_title = " ".join(sys.argv[2:])
    token = fwd.get_token(node_token)
    if not token:
        print("❌ Invalid token")
        sys.exit(1)
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    space_id = get_space_id(node_token, headers)
    if not space_id:
        print("❌ Could not get space_id")
        sys.exit(1)
    if rename_node(space_id, node_token, new_title, headers):
        print(f"✅ Title updated to: {new_title}")
    else:
        print("❌ Failed to update title")
        sys.exit(1)

if __name__ == "__main__":
    main()
@@ -1,7 +1,9 @@
# AI video slicing: GitHub and alternative solutions

> Simplified solutions and integrable alternatives for the 卡若AI video slicing Skill
> Updated: 2026-02-03
> Updated: 2026-02-27

**Related**: for web-wide and GitHub solutions for removing filler words (嗯, 啊, etc.), see → [去语助词_全网与GitHub方案汇总.md](./去语助词_全网与GitHub方案汇总.md)

## 1. Current solution (卡若AI local solution)

136
03_卡木(木)/木叶_视频内容/视频切片/参考资料/去语助词_全网与GitHub方案汇总.md
Normal file
@@ -0,0 +1,136 @@
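# Removing filler words (嗯, 啊, etc.): survey of web-wide and GitHub solutions

> Survey date: 2026-02-27
> Purpose: automatically detect and delete filler words such as "嗯", "啊", "呃" (filler words / disfluency) in video and audio

---

## 1. Summary

| Type | Support for Chinese "嗯" | Notes |
|------|----------------|------|
| **Commercial API** | ❌ Deepgram: English only | `filler_words` is English only |
| **Open source · transcription-based** | ⚠️ Whisper usually does not output 嗯 | Needs word-level timestamps plus an ASR that can output 嗯 |
| **Open source · audio-only detection** | ⚠️ Mostly trained on English data | Transfer learning or fine-tuning could be tried |
| **Chinese ASR + post-processing** | ✅ Feasible in theory | Word-level timestamps from FunASR or similar, then filter out the 嗯 time ranges |

---

## 2. Commercial / online solutions

### 1. Deepgram (API)

- **Capability**: `filler_words=true` detects uh, um, mhmm, etc. and returns timestamps.
- **Limitation**: **English only**; filler words are not supported for Chinese (Mandarin).
- Docs: <https://developers.deepgram.com/docs/filler-words>

### 2. Descript / Cleanvoice.ai / Resound / Auphonic

- **Capability**: one-click removal of um, uh, you know, etc.; some support multiple languages (e.g. Cleanvoice: German, French, Australian, etc.).
- **Limitation**: mostly aimed at English; **no explicit support for Chinese "嗯" / "啊" was found**; mostly paid or usage-based.

---

## 3. Open-source solutions on GitHub

### 1. daily-demos/filler-word-removal ⭐ 14

- **Repository**: <https://github.com/daily-demos/filler-word-removal>
- **Pipeline**: video → extract audio → **transcribe with Whisper or Deepgram with word-level timestamps** → identify filler time ranges → cut those ranges out with ffmpeg → concatenate into a new video.
- **Notes**: same idea as the current 卡若AI approach; when Deepgram is used, its filler detection is used (English only).
- **Chinese**: with Whisper, Chinese transcripts usually **do not contain "嗯"**; an ASR that can output 嗯 is needed instead (see FunASR below).

### 2. sagniklp/Disfluency-Removal-API ⭐ 16 (ICASSP '19)

- **Repository**: <https://github.com/sagniklp/Disfluency-Removal-API>
- **Capability**: CRNN plus silence/speech classification; detects disfluencies directly from **audio** and removes them.
- **Dependencies**: TensorFlow 1.x, web.py, librosa, pydub, pyAudioAnalysis, hmmlearn, etc.
- **Chinese**: the paper and implementation target English; **transfer is possible in theory**, but requires your own Chinese filler-word data for fine-tuning or retraining.

### 3. amritkromana/disfluency_detection_from_audio ⭐ 32

- **Repository**: <https://github.com/amritkromana/disfluency_detection_from_audio>
- **Capability**: disfluency detection that **does not depend on a transcript**, in three modalities:
  - **language**: Whisper verbatim transcription + BERT disfluency classification (English verbatim);
  - **acoustic**: WavLM, purely acoustic disfluency detection;
  - **multimodal**: fusion of language and acoustic.
- **Output**: frame-level (about 20 ms) disfluency predictions, which can be converted into time ranges for cutting.
- **Chinese**: the models are trained on English (Switchboard, etc.); **for Chinese, collect your own data and fine-tune, or use it as a reference only**.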
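
Turning frame-level predictions into cuttable time ranges, as mentioned in the **Output** bullet above, might look like the sketch below. It assumes a flat list of 0/1 flags with a fixed 20 ms hop; `frames_to_segments` and `min_dur` are hypothetical names and not part of that repository.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(flags: Sequence[int], frame_sec: float = 0.02,
                       min_dur: float = 0.05) -> List[Tuple[float, float]]:
    """Merge consecutive positive frames into (start_sec, end_sec) ranges."""
    segments = []
    start = None  # index of the first frame of the current run, if any
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start * frame_sec, i * frame_sec))
            start = None
    if start is not None:
        # The run extends to the end of the clip.
        segments.append((start * frame_sec, len(flags) * frame_sec))
    # Drop very short runs, which are more likely noise than a real filler word.
    return [(round(s, 3), round(e, 3)) for s, e in segments if e - s >= min_dur]
```

The resulting ranges have the same shape as the time ranges consumed by an ffmpeg cutting step, so a frame-level model can be swapped in without changing the cutting layer.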
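
### 4. adammoss/uhm ⭐ 1

- **Repository**: <https://github.com/adammoss/uhm>
- **Capability**: `pip install uhm`; uses the **Watson API** to detect and remove uhm and similar fillers.
- **Limitation**: depends on IBM Watson and **targets English**.

### 5. umdone (notebook)

- **Idea**: split words at silences → compute features such as RMS → compare against "filler-word templates" to identify um and similar fillers, then delete them.
- **Limitation**: you must build the filler templates yourself; for Chinese "嗯" you need to collect your own data.

---

## 4. Feasible routes for Chinese

### Option A: FunASR Paraformer word-level timestamps + filtering 嗯

- **Rationale**: FunASR Paraformer supports **word-level timestamps** (start/end for each word). Whether a Chinese ASR drops "嗯" depends on its training data and post-processing; some industrial models keep modal particles.
- **Approach**: transcribe with Paraformer → pick "嗯", "啊", "呃", etc. out of the word list → derive time ranges → reuse the ffmpeg cut-and-concatenate logic in the existing `remove_filler_segments.py`.
- **Docs**: <https://github.com/modelscope/FunASR>; for a timestamp example see community blog posts (Paraformer timestamp_model=True).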
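
The filtering step in Option A can be sketched independently of any ASR library. The example assumes the transcription has already been flattened into `(word, start_sec, end_sec)` tuples; `filler_ranges` and the three-character filler set are illustrative choices, not the repository's exact logic.

```python
import re
from typing import List, Tuple

# Hypothetical minimal filler set; extend it to match your own material.
FILLER_CHARS = re.compile(r"^[嗯啊呃]+$")


def filler_ranges(words: List[Tuple[str, float, float]]) -> List[Tuple[float, float]]:
    """Return time ranges of filler words, merging fillers that directly follow each other."""
    ranges: List[Tuple[float, float]] = []
    prev_was_filler = False
    for word, start, end in words:
        is_filler = bool(FILLER_CHARS.match(word))
        if is_filler:
            if prev_was_filler:
                # Extend the previous range instead of emitting many tiny cuts.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
        prev_was_filler = is_filler
    return ranges
```

Merging only consecutive fillers (rather than merging by time gap) avoids accidentally swallowing a short real word that sits between two fillers.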
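
### Option B: purely acoustic disfluency detection (transfer / fine-tuning)

- **Rationale**: the **acoustic** branch of amritkromana's project is purely acoustic and does not depend on ASR text, so in theory it has a chance of transferring to Chinese.
- **Approach**: start from its WavLM (or similar) pretrained model, collect clips of Chinese 嗯/啊 with frame-level labels, fine-tune, output time ranges, then cut.
- **Cost**: requires labeled data and GPU training.

### Option C: keep "transcription + manual/semi-automatic" as a fallback

- **Current state**: Whisper / whisper-timestamped **very rarely outputs a standalone "嗯"** in Chinese, so the automatic detection rate is low.
- **Fallback**: listen through the video once, note the time points where "嗯" occurs in a `remove_list` text file, and cut with the existing `remove_filler_segments.py --remove-list` (already implemented).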
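
For the manual fallback, a remove list has to be parsed into time ranges before cutting. The actual file format expected by `--remove-list` is not shown here, so the sketch below assumes a hypothetical format: one `start-end` range per line, each side either in seconds or as `mm:ss` / `hh:mm:ss`, with `#` starting a comment.

```python
from typing import List, Tuple


def parse_timestamp(token: str) -> float:
    """Convert "12.5", "01:02.5" or "1:02:03" into seconds."""
    seconds = 0.0
    for part in token.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_remove_list(text: str) -> List[Tuple[float, float]]:
    """Parse the assumed "start-end" line format into sorted (start_sec, end_sec) ranges."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and blank lines
        if not line:
            continue
        start_s, end_s = line.split("-", 1)
        start, end = parse_timestamp(start_s), parse_timestamp(end_s)
        if end > start:  # silently skip reversed or empty ranges
            ranges.append((start, end))
    return sorted(ranges)
```

If the real script uses a different format (for example single time points with a fixed window), only `parse_remove_list` would need to change.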

---

## 5. Recommended rollout order

1. **Short term**: transcribe once with **FunASR Paraformer** (or another ASR with Chinese word-level timestamps) and check whether "嗯", "啊", etc. appear; if they do, plug the result straight into the existing "identify filler time ranges → ffmpeg cut" pipeline.
2. **Medium term**: if Paraformer also filters out 嗯, evaluate the acoustic model of **disfluency_detection_from_audio**, fine-tuning it on a small amount of Chinese filler-word data or training a binary classifier (filler vs. non-filler).
3. **Long term**: watch for a **Chinese filler words** API from Deepgram or other vendors, or adopt a commercial product (such as Cleanvoice) if it supports Chinese.

---

## 6. How this maps to existing scripts in this project

| Script in this repo | Purpose | Relation to the options above |
|------------|------|----------------|
| `remove_filler_segments.py` | Cuts the video by SRT or by `--remove-list` time ranges | Generic "cutting" layer; works with any approach that outputs time ranges |
| `remove_ng_auto.py` | whisper-timestamped word-level output + filler matching | Currently almost never recognizes 嗯 in Chinese; can be replaced with FunASR or similar |
| Video slicing SKILL | Transcribe → highlights → slice → enhance | If a filler-removal step is added later, it can go before or after enhancement |

---

## 7. Scripts already implemented in this repo

| Script | Description |
|------|------|
| `remove_ng_funasr.py` | **Recommended**: prefers FunASR word-level timestamps (Chinese output tends to include 嗯) and falls back to whisper-timestamped if FunASR is not installed; writes `*_去嗯.mp4` to the same directory |
| `remove_filler_segments.py` | Generic: SRT segments that contain only filler words, or manual time points via `--remove-list` → ffmpeg cut |
| `remove_ng_auto.py` | whisper-timestamped only; mostly fails to recognize 嗯 in Chinese |

**Example** (run in the video slicing scripts directory):
```bash
conda activate mlx-whisper
python3 remove_ng_funasr.py "/path/to/视频.mp4" -o "/path/to/输出_去嗯.mp4"
```
For best results, install FunASR while you have a working network connection (`pip install funasr`), then run this script.

---

## 8. Reference links

- Deepgram filler words (English): <https://developers.deepgram.com/docs/filler-words>
- daily-demos/filler-word-removal: <https://github.com/daily-demos/filler-word-removal>
- Disfluency-Removal-API (ICASSP '19): <https://github.com/sagniklp/Disfluency-Removal-API>
- disfluency_detection_from_audio (audio-only): <https://github.com/amritkromana/disfluency_detection_from_audio>
- FunASR: <https://github.com/modelscope/FunASR>
- Paper: Automatic Disfluency Detection from Untranscribed Speech (submitted to IEEE TASLP)
220
03_卡木(木)/木叶_视频内容/视频切片/脚本/remove_ng_funasr.py
Normal file
@@ -0,0 +1,220 @@
#!/usr/bin/env python3
"""
Filler-word removal (preferred approach): use FunASR Paraformer Chinese word-level timestamps to find 嗯/啊/呃, then cut with ffmpeg.
Falls back to whisper-timestamped if FunASR is not installed.
"""
import argparse
import re
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# A token counts as a filler only if it consists entirely of these characters
FILLER_RE = re.compile(r"^[嗯啊呃额哦噢唉哎诶喔]+$")


def get_duration(path: str) -> float:
    """Return the media duration in seconds, or 0.0 if ffprobe fails."""
    r = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True
    )
    return float(r.stdout.strip()) if r.returncode == 0 else 0.0


def extract_audio(video: str, wav: str) -> bool:
    """Extract 16 kHz 16-bit PCM audio for the ASR models."""
    return subprocess.run(
        ["ffmpeg", "-y", "-i", video, "-vn", "-acodec", "pcm_s16le", "-ar", "16000", wav],
        capture_output=True
    ).returncode == 0


def transcribe_funasr(audio_path: str):
    """FunASR Chinese word-level timestamps; returns [(word, start_sec, end_sec), ...]"""
    from funasr import AutoModel
    model = AutoModel(
        model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
        vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
        punc_model="iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",
        device="cpu",
    )
    result = model.generate(input=audio_path, batch_size_s=300)
    words_with_ts = []
    if not result or len(result) == 0:
        return words_with_ts
    for item in result:
        if not item:
            continue
        text = (item.get("text") or "").strip()
        ts = item.get("timestamp") or item.get("timestamps") or []
        # Format 1: timestamp = [[start_ms, end_ms], ...], paired with characters one by one
        if isinstance(ts, list) and ts and isinstance(ts[0], (list, tuple)):
            # Assumption: timestamps cover spoken characters only, so punctuation and
            # whitespace are dropped before pairing by index.
            chars = re.sub(r"[\W_]+", "", text)
            if len(chars) != len(ts):
                chars = text  # lengths disagree (e.g. Latin words); keep the plain index pairing
            for i, pair in enumerate(ts):
                if len(pair) >= 2:
                    s_ms, e_ms = float(pair[0]), float(pair[1])
                    word = chars[i] if i < len(chars) else ""
                    words_with_ts.append((word, s_ms / 1000, e_ms / 1000))
            continue
        # Format 2: a list of dicts with word/start/end fields
        if isinstance(ts, list) and ts and isinstance(ts[0], dict):
            for w in ts:
                word = w.get("word", w.get("text", ""))
                s = w.get("start", w.get("start_time", 0))
                e = w.get("end", w.get("end_time", 0))
                if s is not None and e is not None:
                    s, e = float(s), float(e)
                    if s > 1000:  # heuristic: large values are treated as milliseconds
                        s, e = s / 1000, e / 1000
                    words_with_ts.append((str(word), s, e))
            continue
        # Whole sentence
        start = item.get("start", item.get("start_time"))
        end = item.get("end", item.get("end_time"))
        if start is not None and end is not None:
            s, e = float(start), float(end)
            if s > 1000:
                s, e = s / 1000, e / 1000
            words_with_ts.append((text, s, e))
    return words_with_ts


def transcribe_whisper_ts(audio_path: str):
    """Fallback: whisper-timestamped word level; returns [(word, start_sec, end_sec), ...]"""
    import whisper_timestamped as whisper
    audio = whisper.load_audio(audio_path)
    model = whisper.load_model("base", device="cpu")
    result = whisper.transcribe(model, audio, language="zh", detect_disfluencies=True)
    words_with_ts = []
    for seg in result.get("segments", []):
        for w in seg.get("words", []):
            text = (w.get("text") or "").strip()
            s, e = w.get("start"), w.get("end")
            if s is not None and e is not None:
                words_with_ts.append((text, float(s), float(e)))
    return words_with_ts


def find_filler_ranges(words_with_ts: list) -> list:
    """Pick filler-word time ranges [(start_sec, end_sec), ...] out of (word, start, end) tuples."""
    out = []
    for word, s, e in words_with_ts:
        # Strip whitespace and punctuation before matching
        t = re.sub(r"[\s,。、,.\-—…]+", "", str(word).strip())
        if t and FILLER_RE.match(t) and e - s > 0.05:
            out.append((s, e))
    return sorted(out, key=lambda x: x[0])


def build_keep_ranges(remove_ranges: list, total_duration: float) -> list:
    """Invert the remove ranges into the ranges to keep."""
    keep = []
    current = 0.0
    # Loop variables renamed so they no longer shadow the `re` module
    for r_start, r_end in sorted(remove_ranges, key=lambda x: x[0]):
        if r_start > current + 0.05:
            keep.append((current, r_start))
        current = max(current, r_end)
    if current < total_duration - 0.05:
        keep.append((current, total_duration))
    return keep


def run_ffmpeg(args: list) -> bool:
    return subprocess.run(args, capture_output=True).returncode == 0


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("video", help="input video")
    ap.add_argument("-o", "--output", help="output path")
    args = ap.parse_args()

    video_path = Path(args.video).resolve()
    if not video_path.exists():
        print("❌ Video does not exist")
        return 1
    output_path = Path(args.output) if args.output else video_path.parent / f"{video_path.stem}_去嗯.mp4"
    total_duration = get_duration(str(video_path))
    if total_duration <= 0:
        print("❌ Could not get the duration")
        return 1

    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)
        audio_path = tmp / "audio.wav"
        print("1. Extracting 16k audio...")
        if not extract_audio(str(video_path), str(audio_path)):
            print("❌ Audio extraction failed")
            return 1

        words_with_ts = []
        try:
            from funasr import AutoModel  # only checks that FunASR is importable
            print("2. FunASR Paraformer word-level transcription (Chinese)...")
            words_with_ts = transcribe_funasr(str(audio_path))
        except ImportError:
            print("2. FunASR is not installed, falling back to whisper-timestamped...")
            try:
                words_with_ts = transcribe_whisper_ts(str(audio_path))
            except Exception as e:
                print(f"❌ Transcription failed: {e}")
                return 1

        if not words_with_ts:
            print(" No word-level timestamps, trying sentence level...")
            try:
                from funasr import AutoModel
                model = AutoModel(model="iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch", device="cpu")
                result = model.generate(input=str(audio_path), batch_size_s=300)
                for item in (result or []):
                    if not item:
                        continue
                    text = (item.get("text") or "").strip()
                    ts = item.get("timestamp") or []
                    if isinstance(ts, list) and len(ts) >= 2 and isinstance(ts[0], (list, tuple)):
                        for pair in ts:
                            if len(pair) >= 2:
                                s, e = float(pair[0]) / 1000, float(pair[1]) / 1000
                                words_with_ts.append((text, s, e))
                                break
                    start, end = item.get("start"), item.get("end")
                    if start is not None and end is not None:
                        words_with_ts.append((text, float(start) if start < 1000 else start / 1000, float(end) if end < 1000 else end / 1000))
            except Exception as e:
                print(f" Sentence level failed too: {e}")

        remove_ranges = find_filler_ranges(words_with_ts)
        print(f" Detected {len(remove_ranges)} filler words (嗯/啊/呃 etc.)")
        for s, e in remove_ranges[:15]:
            print(f" {s:.2f}s - {e:.2f}s")
        if len(remove_ranges) > 15:
            print(f" ... {len(remove_ranges)} in total")

        if not remove_ranges:
            print(" No filler words, copying the original video")
            shutil.copy(str(video_path), str(output_path))
            print(f"✅ Output written: {output_path}")
            return 0

        keep_ranges = build_keep_ranges(remove_ranges, total_duration)
        print("3. Cutting and concatenating with ffmpeg...")
        seg_files = []
        for i, (start, end) in enumerate(keep_ranges):
            dur = end - start
            if dur < 0.1:
                continue
            seg = tmp / f"seg_{i:04d}.mp4"
            # Stream copy (-c copy) cuts at keyframes, so cut points are approximate
            if run_ffmpeg(["ffmpeg", "-y", "-ss", str(start), "-t", str(dur), "-i", str(video_path), "-c", "copy", str(seg)]):
                seg_files.append(seg)
        if not seg_files:
            print("❌ Segment generation failed")
            return 1
        list_path = tmp / "list.txt"
        with open(list_path, "w", encoding="utf-8") as f:
            for seg in seg_files:
                f.write(f"file '{seg}'\n")
        if not run_ffmpeg(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_path), "-c", "copy", str(output_path)]):
            print("❌ Concatenation failed")
            return 1

    print(f"✅ Output written: {output_path}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
@@ -34,8 +34,8 @@ updated: "2026-02-26"
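   - See `运营中枢/工作台/唯一MongoDB约定.md` and 金仓 `datacenter/README.md` for details.

2. **website group (websites)**
   - Website services (e.g. 玩值电竞 web, 神射手) go into the **website** orchestration; do not create a separate compose file or a separate MongoDB for an individual site.
   - Orchestration location: `docker-compose.yml` in the 神射手 directory (project name: website); ports and grouping follow the **《项目与端口注册表》** (project and port registry).
   - Website services (玩值电竞 web, 神射手, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.) go into the **website** orchestration; do not create a separate MongoDB for an individual site. Every site's compose file sets **`name: website`**, so all of them appear under the website group in Docker Desktop.
   - Orchestration location: `docker-compose.yml` in the 神射手 directory (main sites); for **all website projects** see **`website分组清单.md`** in the workbench; ports follow the **《项目与端口注册表》** (project and port registry).
   - Services in website connect to the databases in datacenter over the external network **datacenter_network**, for example **`MONGODB_URI=mongodb://datacenter_mongodb:27017`**.

Before running a Docker deployment or editing a compose file, check it against the two rules above; fix any mismatch before deploying.
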
@@ -59,10 +59,6 @@
|
||||
| Secret | `v1:C6mw1SlvXsJdlO4VFEXSQEVf:519gA0DPqIMbjvfMh7CXf4B2` |
| Model | `claude-opus` |

### Gemini (卡若AI workbench)
| Item | Value |
|----|-----|
| API Key | `AIzaSyCPARryq8o6MKptLoT4STAvCsRB7uZuOK8` |

### Gitea (self-hosted Git on the CKB NAS)
| Item | Value |

@@ -167,3 +167,4 @@
| 2026-02-26 16:43:05 | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥 platform integration, 卡木, operations hub workbench | Excluded (>20MB): 14 files |
| 2026-02-27 05:06:58 | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥 platform integration, operations hub workbench | Excluded (>20MB): 14 files |
| 2026-02-27 05:21:57 | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥 platform integration, operations hub workbench | Excluded (>20MB): 14 files |
| 2026-02-27 10:53:48 | 🔄 卡若AI sync 2026-02-27 10:53 | Updated: Cursor rules, 水桥 platform integration, 卡木, operations hub workbench | Excluded (>20MB): 14 files |

54
运营中枢/工作台/website分组清单.md
Normal file
@@ -0,0 +1,54 @@
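# Docker website group list

> All **Docker projects that serve web pages externally** are placed in the **website** group, so they can be viewed together in Docker Desktop and share one startup convention.
> This matches the website group in the "唯一 MongoDB 约定" (single MongoDB convention); database services belong to **datacenter** and are not listed here.

---

## Group conventions

- **website**: website services (front ends / sites). No database is created inside this group; when MongoDB is needed, use the single instance `datacenter_mongodb` (host port 27017).
- Each website project sets **`name: website`** in its own `docker-compose.yml`, so it appears under the same **website** project in Docker Desktop.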
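
One way to check the `name: website` convention before starting a project is to read the top-level `name` key from the compose file. The sketch below is an assumption-laden helper, not an existing tool: `compose_project_name` is a hypothetical function, and it uses a simple line-based regex instead of a YAML parser, so it only handles the plain `name: value` form.

```python
import re
from typing import Optional

# Matches a top-level (unindented) "name: website" line, optionally quoted, optionally with a comment.
TOP_LEVEL_NAME = re.compile(
    r"^name:[ \t]*['\"]?([^'\"#\s]+)['\"]?[ \t]*(?:#.*)?$", re.MULTILINE
)


def compose_project_name(compose_text: str) -> Optional[str]:
    """Return the top-level project name declared in a compose file, or None if absent."""
    match = TOP_LEVEL_NAME.search(compose_text)
    return match.group(1) if match else None


def belongs_to_website_group(compose_text: str) -> bool:
    """True when the compose file follows the `name: website` convention."""
    return compose_project_name(compose_text) == "website"
```

Because the pattern is anchored at the start of a line, indented keys such as `container_name:` are not mistaken for the project name.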

---

## Projects already in website

| Project | Container | Port | Orchestration location | Notes |
|:---|:---|:--:|:---|:---|
| 神射手 | website-shensheshou | **3117** | `开发/2、私域银行/神射手/docker-compose.yml` | Started from the same file as 玩值电竞 |
| 玩值电竞 Web | website-wanzhi-web | **3001** | Same as above | Same as above |
| 卡若ai网站 | website-karuo-site | **3100** | `开发/3、自营项目/卡若ai网站/docker-compose.yml` | Separate compose file, name: website |
| 玩值大屏 | website-wz-screen | **3034** | `开发/3、自营项目/玩值大屏/docker-compose.yml` | Separate compose file, name: website |
| Soul 创业实验 | website-soul-book | **3000** | `开发/3、自营项目/一场soul的创业实验-react/docker-compose.yml` | Separate compose file, name: website (local) |

---

## Unified startup

- **神射手 + 玩值电竞** (main sites): in the 神射手 directory run
  `docker compose up -d` or `docker compose up -d --build`
  Both sites appear under the **website** group in Docker Desktop.
- **Other websites**: in each project's own directory run
  `docker compose up -d`
  Because `name: website` is set, the containers also land in the **website** group.

---

## Docker projects not in website

The following are **not pure websites** (they include databases, middle-platform services, or tools); they keep their own project name and are not set to website:

| Type | Example projects | Group / notes |
|:---|:---|:---|
| Databases / middleware | datacenter (MongoDB etc.), 上帝之眼 (MySQL/Redis/InfluxDB) | datacenter or a separate project name |
| Complete business systems | 存客宝 cunkebao_v3 (front end + back end + DB) | Separate compose file |
| Tools / tunnels | frp, clawdbot, 工作手机 SDK | Separate compose file |

---

## Version history

| Date | Change |
|:---|:---|
| 2026-02-27 | Initial version; lists 神射手, 玩值电竞, 卡若ai网站, 玩值大屏, Soul 创业实验; establishes `name: website` as the common grouping convention |
@@ -170,3 +170,4 @@
| 2026-02-26 16:43:05 | Success | Success | 🔄 卡若AI sync 2026-02-26 16:41 | Updated: 金仓, 水桥 platform integration, 卡木, operations hub workbench | Excluded (>20MB): 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
| 2026-02-27 05:06:58 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:06 | Updated: 水桥 platform integration, operations hub workbench | Excluded (>20MB): 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
| 2026-02-27 05:21:57 | Success | Success | 🔄 卡若AI sync 2026-02-27 05:21 | Updated: 水桥 platform integration, operations hub workbench | Excluded (>20MB): 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |
| 2026-02-27 10:53:48 | Success | Success | 🔄 卡若AI sync 2026-02-27 10:53 | Updated: Cursor rules, 水桥 platform integration, 卡木, operations hub workbench | Excluded (>20MB): 14 files | [Repository](http://open.quwanzhi.com:3000/fnvtk/karuo-ai) [Wiki](http://open.quwanzhi.com:3000/fnvtk/karuo-ai/wiki) |

@@ -9,9 +9,9 @@
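| Group | Purpose | Orchestration location |
|:---|:---|:---|
| **datacenter** | All **database-related** Docker services (MongoDB, Redis, MySQL, vector stores, etc.) | 卡若AI `01_卡资(金)/金仓_存储备份/datacenter/docker-compose.yml`, or the data platform (数据中台) system base |
| **website** | Website services (神射手, 玩值电竞 Web, etc.); no database is created in this group | `docker-compose.yml` in the 神射手 directory (project name: website) |
| **website** | Website services (神射手, 玩值电竞 Web, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.); no database is created in this group | `docker-compose.yml` in the 神射手 directory (main sites); for the rest see **`website分组清单.md`** |

From now on, every new database service goes into the **datacenter** group; new website services go into the **website** group and connect to the containers in datacenter over the external network `datacenter_network`.
From now on, every new database service goes into the **datacenter** group; new website services go into the **website** group (each compose file sets `name: website`) and connect to the containers in datacenter over the external network `datacenter_network`. For the **full list of website projects** see **`website分组清单.md`** in the workbench.

---
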
@@ -55,3 +55,4 @@
|:---|:---|
| 2026-02-26 | Initial convention; website contains only shensheshou + wanzhi-web, both connecting to datacenter_mongodb on 27017 |
| 2026-02-26 | Added the datacenter group convention; all database-related Docker projects go into datacenter, and website connects over datacenter_network |
| 2026-02-27 | website group extended: 卡若ai网站, 玩值大屏, and Soul 创业实验 now belong to website; see `website分组清单.md` |

@@ -28,7 +28,7 @@
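
- **Starting a project**: say "本地运行 玩值电竞App" (run 玩值电竞App locally), "启动玩值电竞" (start 玩值电竞), or similar → this triggers the "本地项目启动" (local project startup) Skill, which runs with the path and port from the table above.
- **Adding/changing a binding**: add or edit a row in this table, make the project's dev script use the corresponding port (e.g. Next.js: `next dev -p <port>`), then add a one-line note in the Skill.
- **Docker websites**: 玩值电竞 web has been merged into the **website** orchestration (same group as 神射手), container name `website-wanzhi-web`, port **3001**; the **single MongoDB** is datacenter_mongodb (27017), see **`唯一MongoDB约定.md`** in the workbench; no new MongoDB instances are created.
- **Docker websites**: 玩值电竞 web has been merged into the **website** orchestration (same group as 神射手), container name `website-wanzhi-web`, port **3001**; the **single MongoDB** is datacenter_mongodb (27017), see **`唯一MongoDB约定.md`** in the workbench; no new MongoDB instances are created. For **all Docker website projects in website** (神射手, 玩值电竞, 卡若ai网站, 玩值大屏, Soul 创业实验, etc.) see **`website分组清单.md`**.
- **When deploying with Docker**: the "single MongoDB" and "container grouping" conventions must be followed; before running, read the "Docker deployment conventions" section of the **本地项目启动** (local project startup) Skill.
- **Running the latest local code in Docker**: after every local update to code or content, run **`docker compose up -d --build`** in the corresponding orchestration directory (for website, the 神射手 directory) so that Docker runs the latest files. Otherwise the container still uses the old image. **This applies to all projects.**
- **玩值电竞 deploy/run/access**: **always access it through Docker, never through pnpm dev**. Use **http://localhost:3001**; the database is the single **wanzhi_esports** (datacenter_mongodb 27017); start and deploy from the 神射手 directory with `docker compose up -d --build` (after updates `--build` is required to pick up the latest local files). Answers to this kind of question **must use the 卡若 review (复盘) format**.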
@@ -44,3 +44,4 @@
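| 2026-02-26 | 玩值电竞 focus list and Pomodoro convention: when 卡若AI does development, working time is logged as Pomodoros in WebPomodoro |
| 2026-02-26 | Running the latest local code in Docker: `up -d --build` is required after updates; same for all projects; registry and Skill kept in sync |
| 2026-02-26 | **Convention**: whenever a project, port, startup, or deployment changes, this table must be updated so that this doc stays current |
| 2026-02-27 | Docker website projects are all placed in the website group; for the full list see `website分组清单.md` |
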