Notes on the pitfalls of writing the HackerNews Summarize Telegram Bot

Project repository: https://github.com/bdim404/HackerNews-Summarize-Telegram-Bot

PRs are very welcome, and I will keep learning more deeply through this project.

The first thing I learned was the most basic library for writing a Telegram bot in Python: python-telegram-bot

The most basic usage is:

# Import the parts of the telegram library we need when writing the bot
from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes

# Handler functions must be defined with "async def", and the call that sends the reply must be prefixed with "await"

$ # This installs the latest stable release
$ pip install python-telegram-bot --upgrade

This is the first and simplest template I used when learning to write a bot in Python, taken from https://python-telegram-bot.org/

from telegram import Update
from telegram.ext import ApplicationBuilder, CommandHandler, ContextTypes

async def hello(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text(f'Hello {update.effective_user.first_name}')

app = ApplicationBuilder().token("YOUR TOKEN HERE").build()
app.add_handler(CommandHandler("hello", hello))
app.run_polling()

The first example I borrowed from was timerbot: https://docs.python-telegram-bot.org/en/v20.6/examples.timerbot.html . In timerbot.py, I found that you can also put some logic inside the functions that handle Telegram commands:

async def unset(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Remove the job if the user changed their mind."""
    chat_id = update.message.chat_id
    job_removed = remove_job_if_exists(str(chat_id), context)
    text = "Timer successfully cancelled!" if job_removed else "You have no active timer."
    await update.message.reply_text(text)

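In addition, the telegram library is very powerful and has many uses, including "pre-processing" the messages the bot receives (at least that is how I understand it), ~~as well as async as the way to define telegram functions,~~ which passes arguments through await (~~I still don't know whether this kind of thing should be called a method or something else~~). Oh, now I get it: async and await are Python's own syntax for defining and calling asynchronous functions. await does not pass arguments; it suspends the current function until the awaited operation finishes. None of this is specific to the telegram library.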
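
To make that last point concrete, here is a minimal sketch that uses only the standard library, with no telegram import at all. The names `greet` and `main` are made up for illustration. It shows that calling an `async def` function only creates a coroutine object, and that `await` is what runs it and hands back its return value.

```python
import asyncio
import inspect


async def greet(name: str) -> str:
    """A coroutine function: arguments are passed as in any normal call."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return f"Hello {name}"


async def main() -> str:
    coro = greet("world")  # nothing has run yet; this is only a coroutine object
    assert inspect.iscoroutine(coro)
    return await coro  # await runs it and gives back its return value


result = asyncio.run(main())
print(result)
```

This is the same mechanism python-telegram-bot relies on when it awaits a handler such as `hello`.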

Echobot.py (https://docs.python-telegram-bot.org/en/v20.6/examples.echobot.html): it was only from this example that I learned the text a user sends to the bot is available as update.message.text.

async def echo(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    """Echo the user message."""
    await update.message.reply_text(update.message.text)

I also looked into the logging library (https://docs.python.org/zh-cn/3/howto/logging.html), which is mainly used to record errors and other log messages:

# myapp.py
import logging
import mylib

def main():
    logging.basicConfig(filename='myapp.log', level=logging.INFO)
    logging.info('Started')
    mylib.do_something()
    logging.info('Finished')

if __name__ == '__main__':
    main()

If you run myapp.py, you should see the following in myapp.log:

INFO:root:Started
INFO:root:Doing something
INFO:root:Finished
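
Since the main use here is recording errors, the following is a small sketch of logging an exception together with its traceback. The logger name `summarize_bot` and the helper `parse_count` are hypothetical, and the log goes to an in-memory buffer only so that the sketch leaves no file behind.

```python
import io
import logging

# Hypothetical logger name; any dotted name works.
logger = logging.getLogger("summarize_bot")
logger.setLevel(logging.INFO)

# Log to an in-memory buffer for this demo;
# logging.FileHandler("bot.log") would write to disk instead.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)
logger.propagate = False  # keep the demo records out of the root logger


def parse_count(raw: str) -> int:
    """Hypothetical helper that fails on bad input."""
    return int(raw)


logger.info("Started")
try:
    parse_count("not a number")
except ValueError:
    # logger.exception logs at ERROR level and appends the traceback
    logger.exception("Could not parse input")
logger.info("Finished")

log_text = buffer.getvalue()
print(log_text)
```

Using a named logger instead of the root logger makes it easier to tell the bot's own messages apart from those emitted by libraries.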

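I looked at a Telegram ChatGPT bot written by a more experienced developer. He put both the API key and the bot token in a .env configuration file, and used the dotenv library (the python-dotenv package) to load the settings from that file into the code that needs them.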

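import os
from dotenv import load_dotenv

# Read .env file
load_dotenv()
# This loads the contents of .env into the environment; afterwards each value
# can be read with os.environ.get() or os.environ[].
# ("config" is an illustrative name; the excerpt omits the enclosing dict.)
config = {
    'api_key': os.environ['OPENAI_API_KEY'],
    'show_usage': os.environ.get('SHOW_USAGE', 'false').lower() == 'true',
}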

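I did not load configuration from a .env file in summarize-bot 1.0, but for the sake of access control and easier management as the code evolves, I will switch to it later.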
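
The difference between the two lookup styles mentioned above can be sketched with the standard library alone. The `DEMO_*` variable names are hypothetical and are set inside the script only so that it runs on its own.

```python
import os

# Hypothetical variable names, prepared here so the sketch is self-contained.
os.environ["DEMO_BOT_TOKEN"] = "123:abc"
os.environ.pop("DEMO_SHOW_USAGE", None)   # make sure it is unset
os.environ.pop("DEMO_MISSING_KEY", None)  # make sure it is unset

# os.environ[...] suits required settings: a missing key raises KeyError.
token = os.environ["DEMO_BOT_TOKEN"]

# os.environ.get(...) suits optional settings: a missing key yields the default.
show_usage = os.environ.get("DEMO_SHOW_USAGE", "false").lower() == "true"

try:
    os.environ["DEMO_MISSING_KEY"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(token, show_usage, missing_raises)
```

Failing fast on a missing token is usually preferable to starting the bot with an empty one.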

Here is the html2text black magic that fernvenuevenue taught me:

import html2text

h2t = html2text.HTML2Text()
h2t.ignore_tables = True
h2t.ignore_images = True
h2t.google_doc = True
h2t.ignore_links = True

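Because my bot sends content to OpenAI for processing, I also learned the usage conventions of the OpenAI API. Note that this snippet uses the openai.ChatCompletion interface from versions of the openai package before 1.0: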

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")

completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
)
print(completion.choices[0].message)

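October 24 update: I learned to use the urllib.parse library to check whether a link comes from the readhacker.news domain, which prevents the bot from being abused.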

import urllib.parse

# Check whether the link's domain is readhacker.news
def is_valid_link(link):
    parsed_url = urllib.parse.urlparse(link)
    return parsed_url.netloc == "readhacker.news"
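
As a possible refinement, here is a sketch of a slightly stricter check. `is_valid_link_strict` and the sample paths are hypothetical and not part of the project. It requires an http(s) scheme and compares `hostname` rather than `netloc`, because `hostname` is lowercased and drops any port.

```python
import urllib.parse

ALLOWED_HOST = "readhacker.news"  # domain taken from the check above


def is_valid_link_strict(link: str) -> bool:
    """Hypothetical stricter variant: require http(s) and compare the hostname."""
    parsed = urllib.parse.urlparse(link)
    # .hostname is lowercased and excludes any port or userinfo, unlike .netloc
    return parsed.scheme in ("http", "https") and parsed.hostname == ALLOWED_HOST


samples = [
    "https://readhacker.news/s/abc",               # plain link
    "https://READHACKER.news:443/s/abc",           # upper case plus explicit port
    "https://readhacker.news.evil.example/s/abc",  # look-alike domain
    "readhacker.news/s/abc",                       # no scheme, so no host is parsed
]
results = [is_valid_link_strict(s) for s in samples]
print(results)
```

With the plain `netloc` comparison the second sample would be rejected, since its `netloc` is `READHACKER.news:443`.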

I also picked up pipreqs, a tool that collects the dependencies required by all the .py files in a project and writes them into requirements.txt.

Example

$ pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt

October 29 update

Studying AsyncIO & Asynchronous Programming in Python

import asyncio

async def main():
    # Schedule other_function() to run on the event loop
    task = asyncio.create_task(other_function())
    print("a")
    await asyncio.sleep(1)
    print("b")

async def other_function():
    print("1")
    await asyncio.sleep(2)
    print("2")

asyncio.run(main())

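In other words, with asynchronous code in Python, after creating a task with:

task = asyncio.create_task(other_function())

other_function() runs concurrently while main() is suspended at an await (here, asyncio.sleep), but it is cancelled when the first function finishes and the event loop shuts down. To wait for the second function to finish, add: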

await task

This waits for the task to finish before the function returns.
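
Putting the pieces together, a runnable sketch of the version that waits for the task could look like this. The shorter sleep times and the `events` list are my own additions, used to record the order of execution instead of printing each step.

```python
import asyncio

events = []  # records the order in which the steps run


async def other_function():
    events.append("1")
    await asyncio.sleep(0.2)
    events.append("2")


async def main():
    task = asyncio.create_task(other_function())
    events.append("a")
    await asyncio.sleep(0.1)  # main() is suspended here, so the task gets to run
    events.append("b")
    await task  # wait for other_function() before main() returns


asyncio.run(main())
print(events)
```

Without the final `await task`, "2" would never be recorded, because `asyncio.run()` cancels tasks that are still pending once `main()` returns.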