Monday, December 05, 2022

AI research laboratory OpenAI has announced ChatGPT, a conversational chat interface built on the GPT-3 family of large language models and designed to understand and respond to natural human language. GPT-3, the third-generation Generative Pre-trained Transformer, is a neural network machine learning model that specialises in producing human-like written text.

ChatGPT can generate detailed, human-like written text and can draw on earlier parts of a conversation for context. "The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests," OpenAI said in a blogpost.

ChatGPT is a sibling model to InstructGPT, which follows instructions and prompts and provides detailed responses. The model was trained with Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT but with slight differences in the data collection setup, said the blogpost.

RLHF trains an AI with a reward/punishment system in which the signal comes from human feedback. Each action the AI takes is judged as desirable or undesirable: a desired action is rewarded, while an undesired one is penalised. Over a large number of examples, the AI thus learns much as humans do, through trial and error.
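The reward/punishment idea can be illustrated with a toy sketch. This is not OpenAI's training procedure, which the blogpost does not detail here. The function `train_policy`, the reply labels and the +1/-1 feedback values are all hypothetical, chosen only to show how repeated rewards and penalties shift a system's preferences through trial and error.

```python
import random

def train_policy(rewards, steps=500, learning_rate=0.1, epsilon=0.1, seed=0):
    """Learn a preference score per action from +1/-1 feedback by trial and error."""
    rng = random.Random(seed)
    scores = {action: 0.0 for action in rewards}
    for _ in range(steps):
        # Mostly pick the best-scoring action so far; occasionally explore at random.
        if rng.random() < epsilon:
            action = rng.choice(list(scores))
        else:
            action = max(scores, key=scores.get)
        # Nudge the chosen action's score toward the feedback it received.
        scores[action] += learning_rate * (rewards[action] - scores[action])
    return scores

# Hypothetical feedback: +1 marks a desired reply, -1 an undesired one.
FEEDBACK = {
    "polite_answer": 1.0,
    "rude_answer": -1.0,
    "refuse_harmful_request": 1.0,
}

scores = train_policy(FEEDBACK)
best = max(scores, key=scores.get)
print(best, scores)
```

In this sketch a rewarded reply gains score and is chosen more often, while a penalised reply loses score and is avoided. A real system would learn from far richer feedback than a fixed table of values.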

For ChatGPT, OpenAI used "supervised fine-tuning", as per a report in Moneycontrol. Human AI trainers provided conversations in which they played both the user and an AI assistant, and the trainers were given suggestions to help them compose their responses.

However, the company said that the AI chat interface has many limitations and that it plans to make regular model updates to improve in those areas. "Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter which is also part of the interface," said OpenAI.

"We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations," it added.

Currently, OpenAI has opened ChatGPT to users in order to gather feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT will be free, according to the blogpost.

