Anthropic kills prompt engineering | Canada launches AI safety institute | (# 1)
ChatGPT struggles with memorization, plus other news and research updates from this week in AI advancement, safety, and policy.
Hey! Welcome to Newsletter #1. Please head over to the section you are interested in.
AI advancement
AI Governance and policy
AI safety
AI advancement
Product
Death of prompt engineering?
Anthropic added a Prompt Improver to the Anthropic Console that can refine your prompts using best-practice prompt engineering techniques, so you won't need to hire prompt engineers on $300k USD/year salaries anymore. It can add chain-of-thought reasoning, add new examples, and standardize formatting with XML tags. Read more here.
Earlier this month Anthropic introduced computer use in its updated models. The model can now perform tasks on your computer, so if you have a tiresome Excel task you have been postponing forever, just wait a little bit longer. Anthropic admits the model is far from reliable as of now, but they are making rapid progress. Read more here.
OpenAI entered the search engine market two weeks ago with ChatGPT search. The internet is now filled with the obvious comparisons between Perplexity and ChatGPT search. Which one do you think is better?
Research
The internet this week has been filled with debates over whether AI progress is slowing down and whether scaling has hit its limits as the key to AGI. At the centre of the discussion is an article published in The Information claiming that OpenAI's new model, Orion, is showing diminishing returns from scaling, forcing researchers to look to different architectures or post-training methods to stay on the path to AGI.
OpenAI open-sourced SimpleQA, a new dataset for benchmarking the factual correctness of LLMs. It is meant to challenge LLMs' tendency to hallucinate, and it is a hard dataset: even the best OpenAI models struggle, scoring below 40%. Frontier models have a new benchmark to claim fame on.
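To give a sense of what such an evaluation looks like mechanically, here is a toy sketch. The real SimpleQA harness uses a grader model to label each answer as correct, incorrect, or not attempted; this sketch substitutes case-insensitive exact match and invented questions purely to show the shape of the loop.

```python
# Toy sketch of a SimpleQA-style factuality evaluation.
# The real benchmark grades answers with an LLM grader; here we use
# exact match on made-up questions, purely for illustration.

def grade(predicted: str, gold: str) -> str:
    """Toy grader: exact match, case-insensitive."""
    if not predicted.strip():
        return "not_attempted"
    return "correct" if predicted.strip().lower() == gold.strip().lower() else "incorrect"

def evaluate(model, dataset):
    """Run the model over (question, gold_answer) pairs and tally grades."""
    counts = {"correct": 0, "incorrect": 0, "not_attempted": 0}
    for question, gold in dataset:
        counts[grade(model(question), gold)] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

toy_dataset = [
    ("Who proposed the imitation game?", "Alan Turing"),
    ("In what year was the transformer paper published?", "2017"),
]
# A hypothetical "model" that only knows the first answer.
answers = {"Who proposed the imitation game?": "Alan Turing"}
scores = evaluate(lambda q: answers.get(q, ""), toy_dataset)
print(scores)  # {'correct': 0.5, 'incorrect': 0.0, 'not_attempted': 0.5}
```

The "not attempted" category matters: SimpleQA rewards models that abstain when unsure instead of hallucinating an answer.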
AI Governance and policy
Canada launched a dedicated AI safety institute with a commitment to safe development and deployment, announcing an initial budget of $50 million CAD over five years. It will follow the roadmap set out in the International Scientific Report on the Safety of Advanced AI, which was chaired by Canada's own godfather of AI, Yoshua Bengio.
This follows the UK, US, and Japan, which have launched similar institutes, starting with the UK's AI Safety Institute in November 2023.
OpenAI comes to France. In addition to positioning itself as a first-choice AI provider in the European startup ecosystem and gaining access to European AI talent, this could very well be a strategic move to be close to the conversations on AI regulation in the European Union.
“Having a presence in France also allows us to strengthen our collaboration with the French government, ensuring that the benefits of AI are broadly and responsibly shared. In September, we signed the core commitments of the EU AI Pact, which are aligned with our mission to provide safe, cutting-edge technologies that provide value to everyone.”
The Existential Risk Observatory called for an international treaty on AI safety in an article published in TIME magazine. The approach echoes the conditional "if..., then..." approach that has been suggested as the best path forward in a policy paper whose co-authors include Nobel laureate Geoffrey Hinton and Turing Award winner Yoshua Bengio. See the full policy paper here.
Anthropic has adopted a similar approach internally:

If the model crosses a set capability threshold, then stop deployment.

This is a more balanced approach to AI risk than the Pause AI movement, which many believe is unrealistic.
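Stripped of operational detail, that kind of commitment is just a conditional gate over evaluation results. A minimal sketch, where the evaluation names and threshold values are invented for illustration:

```python
# Minimal sketch of an "if capability crosses a threshold, halt deployment"
# policy gate. The evaluation names and thresholds are hypothetical;
# real responsible-scaling policies define these far more carefully.

DANGER_THRESHOLDS = {
    "bio_uplift": 0.20,      # hypothetical eval scores in [0, 1]
    "cyber_offense": 0.35,
    "autonomy": 0.50,
}

def deployment_decision(eval_scores: dict) -> tuple:
    """Return (deploy?, list of tripped thresholds)."""
    tripped = [
        name for name, limit in DANGER_THRESHOLDS.items()
        if eval_scores.get(name, 0.0) >= limit
    ]
    return (len(tripped) == 0, tripped)

ok, tripped = deployment_decision({"bio_uplift": 0.05, "cyber_offense": 0.40})
print(ok, tripped)  # False ['cyber_offense']
```

The point of the conditional structure is that development continues unrestricted until a pre-committed red line is actually crossed, rather than pausing everything up front.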
AI safety
Memorization by language models continues to be a privacy nightmare for LLMs. New research shows that ChatGPT reproduces parts of articles from its training dataset when prompted with the beginnings of those articles. Some of these articles were behind paywalls, which may land OpenAI in trouble.
In earlier work, the same researchers introduced another attack, nicknamed the "poem" attack, in which they asked ChatGPT to repeat a word (such as "poem") forever; after some time the model started producing arbitrary text, and about 3% of that text was verbatim from its training data. That may not sound like a big number, but when models can reveal sensitive information this way, it could amount to huge damage.
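A rough way to quantify this kind of leakage is to count what fraction of the output's word n-grams appear verbatim in a reference corpus. The corpus and output below are toy strings; published extraction studies match much longer spans against web-scale text.

```python
# Rough sketch of quantifying verbatim memorization: what fraction of
# the output's word n-grams appear verbatim in a reference corpus?
# Toy strings only; real studies use longer matches and huge corpora.

def ngrams(text: str, n: int = 5):
    """Return the set of word n-grams in a string."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_fraction(output: str, corpus: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that also occur in the corpus."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(corpus, n)) / len(out)

corpus = "the quick brown fox jumps over the lazy dog"
output = "the quick brown fox jumps completely different words here"
print(verbatim_fraction(output, corpus))  # 0.2 (one of five 5-grams is copied)
```

By this kind of measure, the divergence attack's 3% verbatim rate means roughly one in thirty generated spans was lifted straight from training data.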
Listen to Neel Nanda giving an overview of Gemma Scope. It provides researchers with a suite of sparse autoencoders for examining the features and representations learned by Gemma 2.
See the full demo here. It is great to see mechanistic interpretability tools becoming more broadly accessible. We will discuss this more in a future post.
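To make the idea concrete, here is a toy sparse-autoencoder forward pass in NumPy. Gemma Scope's actual SAEs are trained on Gemma 2 activations and use a JumpReLU activation; this sketch uses random weights and a plain ReLU, so it illustrates only the shape of the computation, not the tool itself.

```python
# Toy sparse-autoencoder (SAE) forward pass. An SAE maps a dense model
# activation into a much wider, mostly-zero feature vector, then
# reconstructs the activation from those sparse features.
# Weights here are random; Gemma Scope's are trained so that active
# features line up with interpretable directions in Gemma 2.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 8, 32          # toy sizes; real SAEs are far wider

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU: many features are exactly 0
    x_hat = f @ W_dec + b_dec
    return f, x_hat

x = rng.normal(size=d_model)     # stand-in for a residual-stream activation
features, reconstruction = sae_forward(x)
print(int((features > 0).sum()), "of", d_sae, "features active")
```

Interpretability work then asks which inputs make a given feature fire, turning opaque activation vectors into a dictionary of candidate concepts.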