Wei Zheng
Master's student at Xidian University; Applied Cryptography, SE, PPFL
- Xi'an, China
- ResearchGate
- GitHub
- Google Scholar
- ORCID
You May Also Enjoy
LLM Inference Optimization
2 minute read
Published:
LLM inference optimization
Reinforcement Learning
less than 1 minute read
Published:
Reinforcement Learning course recommendations
Basic Concepts in RL
less than 1 minute read
Published:
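State: the status of the agent with respect to the environment.
State space: the set of all states.
Action: a move the agent can make in a given state.
Action space of a state: the set of actions available in that state; it depends on the state.
State transition: when the agent takes an action, it may move from one state to another. A state transition defines an interaction between the agent and the environment. All transitions can be listed in a table, but a table can only represent the deterministic case.
State transition probability: describes state transitions with probabilities, e.g. $p(s' \mid s, a)$, which also covers the stochastic case.
Policy: usually denoted $\pi$. A policy may be stochastic, and it can be represented as a table.
Reward: a real number received after taking an action. A positive reward represents encouragement and a negative reward represents punishment. Reward can be understood as a human-machine interface: it is how we tell the agent which behavior we want.
Trajectory: a state-action-reward chain.
Return: the sum of all rewards collected along a trajectory. Returns can be used to evaluate which policy is better.
Discounted return: introduces a discount rate $\gamma \in [0, 1)$, giving $G = r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots$. Discounting keeps the return from diverging and controls whether the agent is short-sighted (small $\gamma$) or far-sighted ($\gamma$ close to 1).
Episode: a finite trajectory that stops at a terminal state; such tasks are called episodic tasks. Some tasks have no terminal states, meaning the interaction between the agent and the environment continues forever; such tasks are called continuing tasks.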
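To tie these definitions together, here is a minimal Python sketch on a hypothetical three-state toy environment. The states, actions, probabilities, and reward values are all made up for illustration and are not taken from the post. Plain dictionaries store the state-dependent action spaces, the transition probabilities $p(s' \mid s, a)$, and a stochastic policy $\pi(a \mid s)$; `rollout` samples one trajectory until a terminal state ends the episode, and the two return functions compute the plain return and the discounted return.

```python
import random

# Hypothetical toy environment (illustration only): three states in a row,
# where s3 is the terminal target state.
STATES = ["s1", "s2", "s3"]  # state space
TERMINAL = {"s3"}

# Action space of each state: the available actions depend on the state.
ACTIONS = {
    "s1": ["stay", "right"],
    "s2": ["left", "right"],
    "s3": [],  # no actions in the terminal state
}

# State transition probabilities p(s' | s, a), stored as a table.
# Most transitions are deterministic (probability 1.0); "right" from s2
# is made stochastic purely for illustration.
TRANSITIONS = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "right"): {"s2": 1.0},
    ("s2", "left"): {"s1": 1.0},
    ("s2", "right"): {"s3": 0.9, "s2": 0.1},
}

# A stochastic policy pi(a | s), also stored as a table.
POLICY = {
    "s1": {"stay": 0.2, "right": 0.8},
    "s2": {"left": 0.1, "right": 0.9},
}


def reward(state, action, next_state):
    """Assumed reward: +1 for reaching the target, -1 for staying put, else 0."""
    if next_state in TERMINAL:
        return 1.0
    if action == "stay":
        return -1.0
    return 0.0


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def rollout(policy, start, rng, max_steps=100):
    """Generate a trajectory as a list of (state, action, reward) triples.

    Stops at a terminal state (end of the episode) or after max_steps.
    """
    trajectory = []
    state = start
    for _ in range(max_steps):
        if state in TERMINAL:
            break
        action = sample(policy[state], rng)
        next_state = sample(TRANSITIONS[(state, action)], rng)
        trajectory.append((state, action, reward(state, action, next_state)))
        state = next_state
    return trajectory


def undiscounted_return(rewards):
    """Plain return: the sum of all rewards along the trajectory."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return G = r_1 + gamma * r_2 + gamma^2 * r_3 + ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    traj = rollout(POLICY, "s1", rng)
    rewards = [r for _, _, r in traj]
    print(traj)
    print("return:", undiscounted_return(rewards))
    print("discounted return (gamma=0.9):", discounted_return(rewards, 0.9))
```

With bounded rewards $|r_t| \le r_{\max}$ and $0 \le \gamma < 1$, the discounted return is bounded by $r_{\max}/(1-\gamma)$. This is why discounting keeps the return of a continuing task from diverging: `discounted_return([1.0] * 1000, 0.9)` is essentially 10, while the undiscounted sum of the same rewards is 1000.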
GPT-4
less than 1 minute read
Published:
```mermaid
graph TD
    A["Toolformer (Meta AI, 02/2023)"] --> B["LLaMA (Meta AI, 02/2023)"]
    B --> C["Visual ChatGPT (Microsoft, 08/03/2023)"]
    C --> D["GigaGAN (Adobe, 09/03/2023)"]
    D --> E["Alpaca (Stanford, 13/03/2023)"]
    E --> F["GPT-4 (OpenAI, 14/03/2023)"]
    F --> G["PaLM API (Google Cloud, 14/03/2023)"]
    G --> H["Claude (Anthropic, 14/03/2023)"]
    H --> I["Series B funding of USD 350M (Adept, 14/03/2023)"]
    I --> J["5th-generation text-to-image model (Midjourney, 15/03/2023)"]
    J --> K["Copilot (Microsoft, 16/03/2023)"]
```