Link to original content:
https://dblp.org/pid/258/4701.rss
dblp: Newton Cheng
https://dblp.org/pid/258/4701.html
dblp person page RSS feed
Feed date: Thu, 25 Apr 2024 05:41:32 +0200
Language: en-US
Update schedule: daily (frequency 1)
License: released under the CC0 1.0 license
Contact: dblp@dagstuhl.de (dblp team)
Category: Computers/Computer_Science/Publications/Bibliographies
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
https://doi.org/10.48550/arXiv.2401.05566
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam S. Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul F. Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. CoRR abs/2401.05566 (2024)
https://dblp.org/rec/journals/corr/abs-2401-05566
Mon, 01 Jan 2024 00:00:00 +0100
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
https://doi.org/10.48550/arXiv.2307.11768
Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamile Lukosiute, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez: Question Decomposition Improves the Faithfulness of Model-Generated Reasoning. CoRR abs/2307.11768 (2023)
https://dblp.org/rec/journals/corr/abs-2307-11768
Sun, 01 Jan 2023 00:00:00 +0100
Measuring Faithfulness in Chain-of-Thought Reasoning.
https://doi.org/10.48550/arXiv.2307.13702
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamile Lukosiute, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez: Measuring Faithfulness in Chain-of-Thought Reasoning. CoRR abs/2307.13702 (2023)
https://dblp.org/rec/journals/corr/abs-2307-13702
Sun, 01 Jan 2023 00:00:00 +0100
Towards Understanding Sycophancy in Language Models.
https://doi.org/10.48550/arXiv.2310.13548
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez: Towards Understanding Sycophancy in Language Models. CoRR abs/2310.13548 (2023)
https://dblp.org/rec/journals/corr/abs-2310-13548
Sun, 01 Jan 2023 00:00:00 +0100
Specific versus General Principles for Constitutional AI.
https://doi.org/10.48550/arXiv.2310.13798
Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, Shannon Yang, Shauna Kravec, Timothy Telleen-Lawton, Thomas I. Liao, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds, Sören Mindermann, Nicholas Joseph, Sam McCandlish, Jared Kaplan: Specific versus General Principles for Constitutional AI. CoRR abs/2310.13798 (2023)
https://dblp.org/rec/journals/corr/abs-2310-13798
Sun, 01 Jan 2023 00:00:00 +0100
Topological Link Models of Multipartite Entanglement.
https://doi.org/10.22331/q-2022-06-20-741
Ning Bao, Newton Cheng, Sergio Hernández-Cuenca, Vincent Paul Su: Topological Link Models of Multipartite Entanglement. Quantum 6: 741 (2022)
https://dblp.org/rec/journals/quantum/BaoCHS22
Sat, 01 Jan 2022 00:00:00 +0100
The Quantum Entropy Cone of Hypergraphs.
https://arxiv.org/abs/2002.05317
Ning Bao, Newton Cheng, Sergio Hernández-Cuenca, Vincent Paul Su: The Quantum Entropy Cone of Hypergraphs. CoRR abs/2002.05317 (2020)
https://dblp.org/rec/journals/corr/abs-2002-05317
Wed, 01 Jan 2020 00:00:00 +0100