The growing presence of AI models in everyday life has raised concerns about their limitations and reliability. AI models ship with built-in safety measures, but these are not foolproof, and models have repeatedly been coaxed past their guardrails. To address this, companies such as Anthropic and Google DeepMind are developing AI constitutions: sets of principles and values the model is trained to follow. Instead of relying on extensive human training, constitutional AI embeds rules that the model uses to critique and refine its own behavior. Even so, AI constitutions have flaws of their own, and training safe and ethical AI models remains an open problem. Other approaches, such as reinforcement learning from human feedback and red-teaming, are also being explored. Some critics dismiss the idea of overly sanitized AI, while others stress the need to account for human complexity in AI development. Ultimately, controlling AI as it evolves will become increasingly difficult, and some degree of divergence may be inevitable.
Can “constitutional AI” solve the issue of problematic AI behavior?
AI models like GPT-3.5/4/4V have guardrails and safety measures to prevent them from producing unwanted outputs, but these measures are not foolproof. Recently, developers have been working on “AI constitutions”: sets of principles that an AI model must follow, with Anthropic and Google DeepMind at the forefront of this work.

Instead of training the model on examples of right and wrong behavior, a constitution is embedded in the model to guide it. The model is presented with a situation, critiques its own response against the constitution, and fine-tunes its behavior on the revised answer. The approach also includes a reinforcement learning stage in which the AI itself assesses the quality of candidate answers and refines its behavior over time. Rather than simply refusing problematic queries, the AI addresses them head-on and explains why they might be problematic, which encourages transparency and accountability. (A minimal sketch of this critique-and-revise loop appears at the end of this section.)

However, AI constitutions have flaws of their own, and there is no universally accepted approach to training safe and ethical AI models. Some companies rely on “red-teaming,” hiring experts to probe models for weaknesses; ChatGPT, for example, often falls back on conservative responses to sensitive topics. Constitutional AI, by contrast, operates from predefined rules and engages in self-assessment and self-improvement, offering more transparency into its decision-making and reasoning.

There is no one-size-fits-all approach to developing safe AI, and some argue that generative AI systems must be treated as extensions of the humans who use them. AI will continue to evolve, and controlling it as a simple technical system may become increasingly challenging.
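To make the loop concrete, here is a minimal Python sketch of how a critique-and-revise pipeline of this kind might be wired together. This is an illustration under assumptions, not any company's actual implementation: the `generate` function is a placeholder for whatever language-model call is available, and the two-principle constitution is a hypothetical stand-in, not Anthropic's or DeepMind's real text.

```python
# Illustrative sketch of a constitutional-AI critique-and-revise loop.
# `generate` is a placeholder for any text-generation call (an LLM API or
# a local model); the constitution below is a made-up two-principle example.

CONSTITUTION = [
    "Choose the response that is least likely to help someone cause harm.",
    "Choose the response that explains its reasoning rather than refusing outright.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError("plug in a real model here")

def critique_and_revise(user_prompt: str) -> dict:
    # 1. Draft an initial answer with no special safeguards.
    draft = generate(user_prompt)

    # 2. Ask the model to critique its own draft against each principle.
    critique = generate(
        f"Response:\n{draft}\n\n"
        "Critique this response against these principles:\n"
        + "\n".join(f"- {p}" for p in CONSTITUTION)
    )

    # 3. Ask the model to rewrite the draft in light of its critique,
    #    explaining why the request might be problematic rather than
    #    refusing outright.
    revision = generate(
        f"Original response:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Rewrite the response so it satisfies the principles, and "
        "explain why the original request might be problematic if it is."
    )

    return {"prompt": user_prompt, "draft": draft, "revision": revision}
```

In a full pipeline of this kind, the loop would run over many prompts, and the resulting prompt-and-revision pairs would serve as fine-tuning data before the self-assessed reinforcement learning stage described above.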