Blog posts

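State: the status of the agent with respect to the environment
State space: the set of all states
Action: a move the agent can make in a given state
Action space of a state: the set of actions available in that state; it depends on the state
State transition: when the agent takes an action, it may move from one state to another. This defines how the agent interacts with the environment. All transitions can be listed in a table, but a table can only represent the deterministic case
State transition probability: describes state transitions with probabilities, which also covers the stochastic case
Policy: usually denoted $\pi$; tells the agent which action to take in each state. A policy may be stochastic, and it can be represented as a table
Reward: a real number received after taking an action; a positive value can represent encouragement and a negative value punishment. Reward can be understood as a human-machine interface
Trajectory: a state-action-reward chain
Return: the sum of all rewards collected along a trajectory; comparing returns tells us which policy is better
Discounted return: the return with rewards weighted by the discount rate $\gamma$, which keeps the return from diverging and balances short-sighted against far-sighted behavior
Episode: a trajectory that is finite and stops at a terminal state; tasks of this kind are called episodic tasks. Some tasks have no terminal states, meaning the interaction between agent and environment never ends; these are called continuing tasks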
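
The definitions of return and discounted return above can be made concrete with a small sketch. It assumes the common convention that the reward at step $t$ is weighted by $\gamma^t$; the function name `discounted_return` and the example reward list are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * rewards[t] along one trajectory (assumed convention)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards through the trajectory: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]  # hypothetical rewards collected along a trajectory
    # A smaller gamma makes the agent short-sighted: the late reward counts less.
    print(discounted_return(rewards, 0.5))
    # gamma = 1 recovers the plain (undiscounted) return.
    print(discounted_return(rewards, 1.0))
```

With $\gamma$ close to 0 the early rewards dominate, while $\gamma$ close to 1 gives rewards far in the future nearly full weight.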

Gpt4

less than 1 minute read

Published: November 24, 2023

graph TD
	Toolformer_MetaAI_02/2023 --> LLaMA_MetaAI_02/2023 --> VisualChatGPT_Microsoft_08/03/2023 --> GigaGAN_Adobe_09/03/2023 --> Alpaca_Stanford_13/03/2023 --> GPT4_OpenAI_14/03/2023 --> PaLM_API_GoogleCloud_14/03/2023 --> Claude_Anthropic_14/03/2023 --> SeriesB_funding_350M_USD_Adapt.ai_14/03/2023 --> Gen5_text-to-image_model_midjourney_15/03/2023 --> Copilot_Microsoft_16/03/2023

Chatgpt&instructgpt

less than 1 minute read

Published: November 23, 2023

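1. ChatGPT has safety mechanisms
2. ChatGPT can understand context and can remember roughly 8,000 words (GPT-4 now reaches about 25,000 words)
3. ChatGPT can recognize its own limitations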

Gpt

less than 1 minute read

Published: November 22, 2023

A look back at the GPT series of papers.

Clip

less than 1 minute read

Published: November 21, 2023

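CLIP is a multimodal model that OpenAI released in January 2021, alongside another model, DALL-E. The two differ fundamentally: CLIP uses text as the supervision signal to train a transferable visual model, whereas DALL-E generates images from text.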

Leetcode1_twosum_2_twoadd

less than 1 minute read

Published: April 24, 2023

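For Two Sum, the first LeetCode problem, the reasoning goes as follows:
For an element nums[i], we need to know whether another element nums[j] with value target - nums[i] exists. We keep a hash table that maps each element's value to its index, so we can quickly check whether the array contains an element equal to target - nums[i].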
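
The hash-table idea above can be sketched in a few lines of Python. This is one common single-pass variant, not necessarily the post's own solution; the names `two_sum` and `seen` are illustrative, and returning an empty list when no pair exists is an assumption (the LeetCode problem guarantees exactly one answer).

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices [j, i] with nums[j] + nums[i] == target, or [] if none."""
    seen: Dict[int, int] = {}  # maps a value to the index where it was seen
    for i, value in enumerate(nums):
        complement = target - value
        # If the complement appeared earlier, the pair is found.
        if complement in seen:
            return [seen[complement], i]
        # Record the current value only after the lookup, so an element
        # is never paired with itself.
        seen[value] = i
    return []  # assumption: signal "no pair" with an empty list


if __name__ == "__main__":
    print(two_sum([2, 7, 11, 15], 9))  # prints [0, 1]
```

Each element is visited once and every dictionary lookup is O(1) on average, so the whole pass is O(n) time with O(n) extra space, compared with O(n^2) for checking every pair.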

Chatgpt&newbing

less than 1 minute read

Published: February 22, 2023

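1. ChatGPT sign-up process

First, make sure you can reach the service through a proxy
Next, register a Google account
You also need a foreign phone number: the process of registering one and receiving the verification code
Receiving the verification code once in Google Chrome is enough; after that you can log in repeatedly!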

2022

Simclr

5 minute read

Published: June 14, 2022

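Machine learning is usually divided into supervised learning, unsupervised learning, and reinforcement learning. Self-Supervised Learning is a form of unsupervised learning whose main goal is to learn a general feature representation that can be used for downstream tasks. It does so by having the data supervise itself. Kaiming He's MoCo first sparked a wave of discussion, and Yann LeCun also said at AAAI that Self-Supervised Learning is where the field is heading.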

2015

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

2014

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

2013

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

2012

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Wei Zheng

2199

2025

2024

Analysis of RFID signal reflection characteristics: combining RCS and tag impedance

2023

Reinforcement Learning course recommendations

A look back at the GPT series of papers.

1. ChatGPT sign-up process

2022

2015

2014

2013

2012