-
Characterisation of the Warm-Jupiter TOI-1130 system with CHEOPS and photo-dynamical approach
Authors:
L. Borsato,
D. Degen,
A. Leleu,
M. J. Hooton,
J. A. Egger,
A. Bekkelien,
A. Brandeker,
A. Collier Cameron,
M. N. Günther,
V. Nascimbeni,
C. M. Persson,
A. Bonfanti,
T. G. Wilson,
A. C. M. Correia,
T. Zingales,
T. Guillot,
A. H. M. J. Triaud,
G. Piotto,
D. Gandolfi,
L. Abe,
Y. Alibert,
R. Alonso,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros
, et al. (71 additional authors not shown)
Abstract:
Among the thousands of exoplanets discovered to date, a few hundred gas giants on short-period orbits are classified as "lonely", and only a few are in a multi-planet system with a smaller companion on a close orbit. The processes that formed multi-planet systems hosting gas giants on close orbits are poorly understood, and only a few examples of this kind of system have been observed and well characterised. In the context of multi-planet systems hosting gas giants on short orbits, we characterise the TOI-1130 system by measuring masses and orbital parameters. This is a system of two transiting planets, with a Jupiter-like planet (c) on an 8.35-day orbit and a Neptune-like planet (b) on an inner 4.07-day orbit. Both planets show strong anti-correlated transit timing variations (TTVs). Furthermore, radial velocity (RV) analysis showed an additional linear trend, a possible hint of a non-transiting candidate planet on a far outer orbit. Since 2019, extensive transit and radial velocity observations of TOI-1130 have been acquired using TESS and various ground-based facilities. We present a new photo-dynamical analysis of all available transit and RV data, with the addition of new CHEOPS and ASTEP+ data, which achieve the best precision to date on the planetary radii and masses and on the timings of each transit. We were able to model the interior structure of planet b, constraining the presence of a gaseous envelope of H/He, while it was not possible to assess the possible water content. Furthermore, we analysed the resonant state of the two transiting planets, and we found that they lie just outside the resonant region. This could be the result of the tidal evolution that the system underwent. We obtained the masses of both planets with a precision better than 1.5%, and radii with a precision of about 1% and 3% for planets b and c, respectively.
Submitted 8 July, 2024;
originally announced July 2024.
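The TOI-1130 abstract quotes orbital periods of 4.07 and 8.35 days and states that the planets lie just outside the resonant region. A minimal sketch of how far the quoted period ratio sits from exact commensurability follows. It assumes the relevant resonance is 2:1, which the abstract does not name; the function name `resonance_offset` is hypothetical. A period-ratio offset alone does not establish the resonant state, which the authors assess with a dynamical analysis.

```python
def resonance_offset(p_inner: float, p_outer: float, p: int, q: int) -> float:
    """Fractional offset of a period ratio from the p:q commensurability.

    Positive values mean the outer period is longer than exact resonance
    would require (the pair sits "wide" of the commensurability).
    """
    return (p_outer / p_inner) / (p / q) - 1.0


# Periods in days as quoted in the abstract (planet b inner, planet c outer).
P_B, P_C = 4.07, 8.35

# Assumption: 2:1 is the nearest first-order commensurability.
delta = resonance_offset(P_B, P_C, 2, 1)
print(f"P_c / P_b = {P_C / P_B:.4f}, offset from 2:1 = {delta:+.4f}")
```

With the rounded periods from the abstract, the offset is a few per cent; the precise value depends on the full-precision periods reported in the paper.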
-
Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations
Authors:
Hao Yang,
Hongyuan Lu,
Xinhua Zeng,
Yang Liu,
Xiang Zhang,
Haoran Yang,
Yumeng Zhang,
Shan Huang,
Yiran Wei,
Wai Lam
Abstract:
In the rapidly evolving field of natural language processing, dialogue systems primarily employ a single-step dialogue paradigm. Although this paradigm is efficient, it lacks the depth and fluidity of human interactions and does not appear natural. We introduce a novel Step-by-Step Dialogue Paradigm (Stephanie), designed to mimic the ongoing dynamic nature of human conversations. By employing a dual learning strategy and a further-split post-editing method, we generated and utilized a high-quality step-by-step dialogue dataset to fine-tune existing large language models, enabling them to perform step-by-step dialogues. We thoroughly present Stephanie. Tailored automatic and human evaluations are conducted to assess its effectiveness compared to the traditional single-step dialogue paradigm. We will release code, Stephanie datasets, and Stephanie LLMs to facilitate the future of chatbot eras.
Submitted 12 July, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Unveiling the internal structure and formation history of the three planets transiting HIP 29442 (TOI-469) with CHEOPS
Authors:
J. A. Egger,
H. P. Osborn,
D. Kubyshkina,
C. Mordasini,
Y. Alibert,
M. N. Günther,
M. Lendl,
A. Brandeker,
A. Heitzmann,
A. Leleu,
M. Damasso,
A. Bonfanti,
T. G. Wilson,
S. G. Sousa,
J. Haldemann,
L. Delrez,
M. J. Hooton,
T. Zingales,
R. Luque,
R. Alonso,
J. Asquier,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
W. Baumjohann
, et al. (69 additional authors not shown)
Abstract:
Multiplanetary systems spanning the radius valley are ideal testing grounds for exploring the proposed explanations for the observed bimodality in the radius distribution of close-in exoplanets. One such system is HIP 29442 (TOI-469), an evolved K0V star hosting two super-Earths and a sub-Neptune. We observe HIP 29442 with CHEOPS for a total of 9.6 days, which we model jointly with 2 sectors of TESS data to derive planetary radii of $3.410\pm0.046$, $1.551\pm0.045$ and $1.538\pm0.049$ R$_\oplus$ for planets b, c and d, which orbit HIP 29442 with periods of 13.6, 3.5 and 6.4 days. For planet d, this value deviates by more than 3 sigma from the median value reported in the discovery paper, leading us to conclude that caution is required when using TESS photometry to determine the radii of small planets with low per-transit S/N and large gaps between observations. Given the high precision of these new radii, combining them with published RVs from ESPRESSO and HIRES provides us with ideal conditions to investigate the internal structure and formation pathways of the planets in the system. We introduce the publicly available code plaNETic, a fast and robust neural network-based Bayesian internal structure modelling framework. We then apply hydrodynamic models to explore the upper atmospheric properties of these inferred structures. Finally, we identify planetary system analogues in a synthetic population generated with the Bern model for planet formation and evolution. Based on this analysis, we find that the planets likely formed on opposing sides of the water iceline from a protoplanetary disk with an intermediate solid mass. We finally report that the observed parameters of the HIP 29442 system are compatible with both a scenario where the second peak in the bimodal radius distribution corresponds to sub-Neptunes with a pure H/He envelope as well as a scenario with water-rich sub-Neptunes.
Submitted 26 June, 2024;
originally announced June 2024.
-
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Authors:
Sen Yang,
Leyang Cui,
Deng Cai,
Xinting Huang,
Shuming Shi,
Wai Lam
Abstract:
Iterative preference learning, though yielding superior performance, requires online annotated preference labels. In this work, we study strategies for selecting the response pairs most worth annotating, so that annotation is cost-efficient while performance remains competitive with, or even better than, the random selection baseline for iterative preference learning. Building on assumptions regarding uncertainty and distribution shifts, we propose a comparative view that ranks the implicit reward margins predicted by DPO to select the response pairs that yield more benefit. Through extensive experiments, we show that annotating response pairs with small margins is generally better than annotating pairs with large margins or randomly selected pairs, under both single- and multi-iteration scenarios. In addition, our empirical results suggest allocating more of the annotation budget to earlier iterations rather than later ones when training over multiple iterations.
Submitted 25 June, 2024;
originally announced June 2024.
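The preference-learning abstract describes ranking response pairs by the implicit reward margin predicted by DPO and annotating the pairs with small margins. A minimal sketch of that idea follows, using the standard DPO implicit reward, beta * (log pi(y|x) - log pi_ref(y|x)), whose prompt-dependent term cancels when two responses to the same prompt are compared. The function names, the input layout, and the value of `beta` are assumptions for illustration; the paper's exact selection rule may differ.

```python
from typing import List, Sequence, Tuple

# One response is summarised by (policy log-prob, reference log-prob).
LogProbs = Tuple[float, float]


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO implicit reward, up to a term that depends only on the prompt."""
    return beta * (logp_policy - logp_ref)


def select_small_margin_pairs(
    pairs: Sequence[Tuple[LogProbs, LogProbs]], budget: int, beta: float = 0.1
) -> List[int]:
    """Return indices of the `budget` pairs with the smallest absolute margin."""
    margins = []
    for idx, (resp_a, resp_b) in enumerate(pairs):
        margin = abs(
            implicit_reward(*resp_a, beta=beta) - implicit_reward(*resp_b, beta=beta)
        )
        margins.append((margin, idx))
    margins.sort()  # ascending: smallest margins first
    return [idx for _, idx in margins[:budget]]


# Hypothetical log-probabilities for three response pairs.
example_pairs = [
    ((-10.0, -12.0), (-11.0, -11.5)),
    ((-5.0, -9.0), (-8.0, -7.0)),
    ((-3.0, -3.2), (-4.0, -4.1)),
]
print(select_small_margin_pairs(example_pairs, budget=2))
```

Pairs with a small margin are those on which the current policy is least decided, which is the intuition the abstract attributes to uncertainty.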
-
New $^{63}$Ga(p,$γ$)$^{64}$Ge and $^{64}$Ge(p,$γ$)$^{65}$As reaction rates corresponding to the temperature regime of thermonuclear X-ray bursts
Authors:
Ning Lu,
Yi Hua Lam,
Alexander Heger,
Zi Xin Liu,
Hidetoshi Yamaguchi
Abstract:
We compute the $^{63}$Ga(p,$γ$)$^{64}$Ge and $^{64}$Ge(p,$γ$)$^{65}$As thermonuclear reaction rates using the latest experimental input supplemented with theoretical nuclear spectroscopic information. The experimental input consists of the latest proton thresholds of $^{64}$Ge and $^{65}$As, and the nuclear spectroscopic information of $^{65}$As, whereas the theoretical nuclear spectroscopic information for $^{64}$Ge and $^{65}$As is deduced from full pf-shell space configuration-interaction shell-model calculations with the GXPF1A Hamiltonian. Both thermonuclear reaction rates are determined with known uncertainties at the energies that correspond to the Gamow windows of the temperature regime relevant to Type I X-ray bursts, covering the typical temperature range of the thermonuclear runaway of the GS 1826$-$24 periodic bursts and SAX J1808.4$-$3658 photospheric radius expansion bursts.
Submitted 20 June, 2024;
originally announced June 2024.
-
On the Worst Prompt Performance of Large Language Models
Authors:
Bowen Cao,
Deng Cai,
Zhisong Zhang,
Yuexian Zou,
Wai Lam
Abstract:
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs and primarily focus on evaluating and improving robustness against variations in task-level instructions. However, this setup fails to fully address the diversity of real-world user queries and assumes the existence of task-specific datasets. To address these limitations, we introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries and emphasizes the importance of using the worst prompt performance to gauge the lower bound of model performance. Extensive experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance; for instance, a difference of 45.48% between the worst and best performance for the Llama-2-70B-chat model, with its worst performance dipping as low as 9.38%. We further illustrate the difficulty in identifying the worst prompt from both model-agnostic and model-dependent perspectives, emphasizing the absence of a shortcut to characterize the worst prompt. We also attempt to enhance the worst prompt performance using existing prompt engineering and prompt consistency methods, but find that their impact is limited. These findings underscore the need to create more resilient LLMs that can maintain high performance across diverse prompts. Data and code are available at https://github.com/cbwbuaa/On-the-Worst-Prompt-Performance-of-LLMs.
Submitted 21 June, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
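The RobustAlpacaEval abstract uses the worst prompt performance over semantically equivalent queries as a lower bound on model performance. A minimal sketch of how such a summary could be computed follows. It assumes that each case has one score per paraphrase and that the worst, best, and average figures are per-case minima, maxima, and means averaged over cases; this aggregation and the function name are assumptions for illustration, not the benchmark's confirmed definition.

```python
from statistics import mean
from typing import Dict, List


def prompt_performance_summary(scores: List[List[float]]) -> Dict[str, float]:
    """Summarise per-case scores, where scores[i][j] is the score of
    paraphrase j of case i (higher is better)."""
    return {
        # Lower bound: the model is judged by its weakest paraphrase per case.
        "worst": mean(min(case) for case in scores),
        # Upper bound: the model is judged by its strongest paraphrase per case.
        "best": mean(max(case) for case in scores),
        # Conventional figure: average over paraphrases, then over cases.
        "average": mean(mean(case) for case in scores),
    }


# Hypothetical binary scores for four cases with three paraphrases each.
example_scores = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
summary = prompt_performance_summary(example_scores)
print(summary)
```

The gap between the "best" and "worst" entries is the kind of spread the abstract reports, for example the 45.48% difference quoted for Llama-2-70B-chat.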
-
Space of circle patterns on tori and its symplectic form
Authors:
Wai Yeung Lam
Abstract:
We consider circle patterns on closed tori equipped with complex projective structures. There is an embedding of the space of circle patterns into the Teichmüller space of a punctured surface. Via the embedding, the Weil-Petersson symplectic form is pulled back to the space of circle patterns. We investigate its non-degeneracy. On the other hand, we also complete the proof of a conjecture that the space of circle patterns is homeomorphic to the Teichmüller space of the closed torus.
Submitted 10 June, 2024;
originally announced June 2024.
-
The PLATO Mission
Authors:
Heike Rauer,
Conny Aerts,
Juan Cabrera,
Magali Deleuil,
Anders Erikson,
Laurent Gizon,
Mariejo Goupil,
Ana Heras,
Jose Lorenzo-Alvarez,
Filippo Marliani,
Cesar Martin-Garcia,
J. Miguel Mas-Hesse,
Laurence O'Rourke,
Hugh Osborn,
Isabella Pagano,
Giampaolo Piotto,
Don Pollacco,
Roberto Ragazzoni,
Gavin Ramsay,
Stéphane Udry,
Thierry Appourchaux,
Willy Benz,
Alexis Brandeker,
Manuel Güdel,
Eduardo Janot-Pacheco
, et al. (801 additional authors not shown)
Abstract:
PLATO (PLAnetary Transits and Oscillations of stars) is ESA's M3 mission designed to detect and characterise extrasolar planets and perform asteroseismic monitoring of a large number of stars. PLATO will detect small planets (down to <2 R$_\oplus$) around bright stars (<11 mag), including terrestrial planets in the habitable zone of solar-like stars. With the complement of radial velocity observations from the ground, planets will be characterised for their radius, mass, and age with high accuracy (5%, 10%, and 10%, respectively, for an Earth-Sun combination). PLATO will provide us with a large-scale catalogue of well-characterised small planets up to intermediate orbital periods, relevant for a meaningful comparison to planet formation theories and to better understand planet evolution. It will make possible comparative exoplanetology to place our Solar System planets in a broader context. In parallel, PLATO will study (host) stars using asteroseismology, allowing us to determine the stellar properties with high accuracy, substantially enhancing our knowledge of stellar structure and evolution.
The payload instrument consists of 26 cameras, each with a 12 cm aperture. For at least four years, the mission will perform high-precision photometric measurements. Here we review the science objectives, present PLATO's target samples and fields, provide an overview of expected core science performance as well as a description of the instrument and the mission profile at the beginning of the serial production of the flight cameras. PLATO is scheduled for launch at the end of 2026. This overview therefore provides a summary of the mission to the community in preparation for the upcoming operational phases.
Submitted 8 June, 2024;
originally announced June 2024.
-
CHEOPS in-flight performance: A comprehensive look at the first 3.5 years of operations
Authors:
A. Fortier,
A. E. Simon,
C. Broeg,
G. Olofsson,
A. Deline,
T. G. Wilson,
P. F. L. Maxted,
A. Brandeker,
A. Collier Cameron,
M. Beck,
A. Bekkelien,
N. Billot,
A. Bonfanti,
G. Bruno,
J. Cabrera,
L. Delrez,
B. -O. Demory,
D. Futyan,
H. -G. Florén,
M. N. Günther,
A. Heitzmann,
S. Hoyer,
K. G. Isaak,
S. G. Sousa,
M. Stalport
, et al. (106 additional authors not shown)
Abstract:
CHEOPS is a space telescope specifically designed to monitor transiting exoplanets orbiting bright stars. In September 2023, CHEOPS completed its nominal mission and remains in excellent operational condition. The mission has been extended until the end of 2026. Scientific and instrumental data have been collected throughout in-orbit commissioning and nominal operations, enabling a comprehensive analysis of the mission's performance. In this article, we present the results of this analysis with a twofold goal. First, we aim to inform the scientific community about the present status of the mission and what can be expected as the instrument ages. Secondly, we intend for this publication to serve as a legacy document for future missions, providing insights and lessons learned from the successful operation of CHEOPS. To evaluate the instrument performance in flight, we developed a comprehensive monitoring and characterisation programme. It consists of dedicated observations that allow us to characterise the instrument's response. In addition to the standard collection of nominal science and housekeeping data, these observations provide input for detecting, modelling, and correcting instrument systematics, discovering and addressing anomalies, and comparing the instrument's actual performance with expectations. The precision of the CHEOPS measurements has enabled the mission objectives to be met and exceeded. Careful modelling of the instrumental systematics allows the data quality to be significantly improved during the light curve analysis phase, resulting in more precise scientific measurements. CHEOPS is compliant with the driving scientific requirements of the mission. Although visible, the ageing of the instrument has not affected the mission's performance.
Submitted 3 June, 2024;
originally announced June 2024.
-
HIP 41378 observed by CHEOPS: Where is planet d?
Authors:
S. Sulis,
L. Borsato,
S. Grouffal,
H. P. Osborn,
A. Santerne,
A. Brandeker,
M. N. Günther,
A. Heitzmann,
M. Lendl,
M. Fridlund,
D. Gandolfi,
Y. Alibert,
R. Alonso,
T. Bárczy,
D. Barrado Navascues,
S. C. Barros,
W. Baumjohann,
T. Beck,
W. Benz,
M. Bergomi,
N. Billot,
A. Bonfanti,
C. Broeg,
A. Collier Cameron,
C. Corral van Damme
, et al. (62 additional authors not shown)
Abstract:
HIP 41378 d is a long-period planet that has only been observed to transit twice, three years apart, with K2. According to stability considerations and a partial detection of the Rossiter-McLaughlin effect, $P_\mathrm{d} = 278.36$ d has been determined to be the most likely orbital period. We targeted HIP 41378 d with CHEOPS at the predicted transit timing based on $P_\mathrm{d}= 278.36$ d, but the observations show no transit. We find that large ($>22.4$ hours) transit timing variations (TTVs) could explain this non-detection during the CHEOPS observation window. We also investigated the possibility of an incorrect orbital solution, which would have major implications for our knowledge of this system. If $P_\mathrm{d} \neq 278.36$ d, the periods that minimize the eccentricity would be $101.22$ d and $371.14$ d. The shortest orbital period will be tested by TESS, which will observe HIP 41378 in Sector 88 starting in January 2025. Our study shows the importance of a mission like CHEOPS, which today is the only mission able to make long observations (i.e., from space) to track the ephemeris of long-period planets possibly affected by large TTVs.
Submitted 30 May, 2024;
originally announced May 2024.
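The HIP 41378 d abstract notes that only two transits have been observed, three years apart, and lists alternative periods of 101.22 d and 371.14 d alongside the favoured 278.36 d. When only two transits are known, the candidate periods are the baseline between them divided by a whole number of orbits. The sketch below illustrates this. It assumes that 278.36 d corresponds to four orbits between the two transits, so the baseline is taken as 4 × 278.36 d; the abstract does not state the baseline, and the function name is hypothetical.

```python
from typing import Dict


def alias_periods(baseline_days: float, max_cycles: int) -> Dict[int, float]:
    """Candidate periods baseline / n for n = 1..max_cycles orbits
    elapsed between two observed transits."""
    return {n: baseline_days / n for n in range(1, max_cycles + 1)}


# Assumption: the favoured period spans the baseline in exactly four orbits.
BASELINE_DAYS = 4 * 278.36

aliases = alias_periods(BASELINE_DAYS, max_cycles=12)
for n in (3, 4, 11):
    print(f"n = {n:2d}: P = {aliases[n]:.2f} d")
```

Under this assumption, n = 3 and n = 11 land within about 0.01 d of the two alternative periods quoted in the abstract; the small residual reflects the rounded baseline used here.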
-
Photo-dynamical characterisation of the TOI-178 resonant chain
Authors:
A. Leleu,
J. -B. Delisle,
L. Delrez,
E. M. Bryant,
A. Brandeker,
H. P. Osborn,
N. Hara,
T. G. Wilson,
N. Billot,
M. Lendl,
D. Ehrenreich,
H. Chakraborty,
M. N. Günther,
M. J. Hooton,
Y. Alibert,
R. Alonso,
D. R. Alves,
D. R. Anderson,
I. Apergis,
D. Armstrong,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
M. P. Battley,
W. Baumjohann
, et al. (82 additional authors not shown)
Abstract:
The TOI-178 system consists of a nearby late K-dwarf transited by six planets in the super-Earth to mini-Neptune regime, with radii ranging from 1.2 to 2.9 Earth radii and orbital periods between 1.9 and 20.7 days. All planets but the innermost one form a chain of Laplace resonances. The fine-tuning and fragility of such orbital configurations ensure that no significant scattering or collision event has taken place since the formation and migration of the planets in the protoplanetary disc, hence providing important anchors for planet formation models. We aim to improve the characterisation of the architecture of this key system, and in particular the masses and radii of its planets. In addition, since this system is one of the few resonant chains that can be characterised by both photometry and radial velocities, we aim to use it as a test bench for the robustness of the planetary mass determination with each technique. We perform a global analysis of all available photometry and radial velocity data. We also try different sets of priors on the masses and eccentricities, as well as different stellar activity models, to study their effects on the masses estimated by each method. We show how stellar activity prevents us from obtaining a robust mass estimate for the three outer planets using radial velocity data alone. We also show that our joint photo-dynamical and radial velocity analysis results in a robust mass determination for planets c to g, with a precision of 12% for the mass of planet c, and better than 10% for planets d to g. The new precisions on the radii range from 2 to 3%. The understanding of this synergy between photometric and radial velocity measurements will be valuable during the PLATO mission. We also show that TOI-178 is indeed currently locked in the resonant configuration, librating around an equilibrium of the chain.
Submitted 22 May, 2024;
originally announced May 2024.
-
Discrete harmonic maps between hyperbolic surfaces
Authors:
Wai Yeung Lam
Abstract:
Given a topological cell decomposition of a closed surface equipped with edge weights, we consider the Dirichlet energy of any geodesic realization of the 1-skeleton graph to a hyperbolic surface. By minimizing the energy over all possible hyperbolic structures and over all realizations within a fixed homotopy class, one obtains a discrete harmonic map into an optimal hyperbolic surface. We characterize the extremum by showing that at the optimal hyperbolic structure, the discrete harmonic map and the edge weights are induced from a weighted Delaunay decomposition.
Submitted 3 May, 2024;
originally announced May 2024.
-
Pullback of symplectic forms to the space of circle patterns
Authors:
Wai Yeung Lam
Abstract:
We consider circle patterns on surfaces with complex projective structures. We investigate two symplectic forms pulled back to the deformation space of circle patterns. The first one is Goldman's symplectic form on the space of complex projective structures on closed surfaces. The other is the Weil-Petersson symplectic form on the Teichmüller space of punctured surfaces. We show that their pullbacks to the space of circle patterns coincide. This result is applied to prove the smoothness of the deformation space, which is an essential step towards the conjecture that the space of circle patterns is homeomorphic to the Teichmüller space of the closed surface. We further conjecture that the pullback of the symplectic forms is non-degenerate and defines a symplectic structure on the space of circle patterns.
Submitted 26 April, 2024;
originally announced April 2024.
-
Characterisation of the TOI-421 planetary system using CHEOPS, TESS, and archival radial velocity data
Authors:
A. F. Krenn,
D. Kubyshkina,
L. Fossati,
J. A. Egger,
A. Bonfanti,
A. Deline,
D. Ehrenreich,
M. Beck,
W. Benz,
J. Cabrera,
T. G. Wilson,
A. Leleu,
S. G. Sousa,
V. Adibekyan,
A. C. M. Correira,
Y. Alibert,
L. Delrez,
M. Lendl,
J. A. Patel,
J. Venturini,
R. Alonso,
G. Anglada,
J. Asquier,
T. Bárczy,
D. Barrado Navascues
, et al. (66 additional authors not shown)
Abstract:
The TOI-421 planetary system contains two sub-Neptune-type planets and is a prime target to study the formation and evolution of planets and their atmospheres. The inner planet is especially interesting as the existence of a hydrogen-dominated atmosphere at its orbital separation cannot be explained by current formation models without previous orbital migration. We jointly analysed photometric data of three TESS sectors and six CHEOPS visits as well as 156 radial velocity data points to retrieve improved planetary parameters. We also searched for TTVs and modelled the interior structure of the planets. Finally, we simulated the evolution of the primordial H-He atmospheres of the planets using two different modelling frameworks. We determine the planetary radii and masses of TOI-421 b and c to be $R_{\rm b} = 2.64 \pm 0.08 \, R_{\oplus}$, $M_{\rm b} = 6.7 \pm 0.6 \, M_{\oplus}$, $R_{\rm c} = 5.09 \pm 0.07 \, R_{\oplus}$, and $M_{\rm c} = 14.1 \pm 1.4 \, M_{\oplus}$. We do not detect any statistically significant TTV signals. Assuming the presence of a hydrogen-dominated atmosphere, the interior structure modelling results in both planets having extensive envelopes. While the modelling of the atmospheric evolution predicts that TOI-421 b has lost any primordial atmosphere that it could have accreted at its current orbital position, TOI-421 c could have started out with an initial atmospheric mass fraction somewhere between 10 and 35%. We conclude that the low observed mean density of TOI-421 b can only be explained by a bias in the measured planetary parameters (e.g. driven by high-altitude clouds) and/or by orbital migration. We also find that the results of atmospheric evolution models are strongly dependent on the employed planetary structure model.
Submitted 17 April, 2024;
originally announced April 2024.
-
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Authors:
Junpeng Liu,
Yifan Song,
Bill Yuchen Lin,
Wai Lam,
Graham Neubig,
Yuanzhi Li,
Xiang Yue
Abstract:
Multimodal large language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks. Existing benchmarks are either designed for general multimodal tasks, failing to capture the unique characteristics of web pages, or focus on end-to-end web agent tasks, unable to measure fine-grained abilities such as OCR, understanding, and grounding. In this paper, we introduce VisualWebBench, a multimodal benchmark designed to assess the capabilities of MLLMs across a variety of web tasks. VisualWebBench consists of seven tasks, and comprises 1.5K human-curated instances from 139 real websites, covering 87 sub-domains. We evaluate 14 open-source MLLMs, Gemini Pro, Claude-3 series, and GPT-4V(ision) on VisualWebBench, revealing significant challenges and performance gaps. Further analysis highlights the limitations of current MLLMs, including inadequate grounding in text-rich environments and subpar performance with low-resolution image inputs. We believe VisualWebBench will serve as a valuable resource for the research community and contribute to the creation of more powerful and versatile MLLMs for web-related applications.
Submitted 8 April, 2024;
originally announced April 2024.
-
AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding
Authors:
Safwat Ali Khan,
Wenyu Wang,
Yiran Ren,
Bin Zhu,
Jiangfan Shi,
Alyssa McGowan,
Wing Lam,
Kevin Moran
Abstract:
Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected code coverage - particularly on sophisticated proprietary apps. Prior work has illustrated that a primary cause of these coverage deficiencies is related to so-called tarpits, or complex screens that are difficult to navigate.
In this paper, we take a critical step toward enabling AIG tools to effectively navigate tarpits during app exploration through a new form of automated semantic screen understanding. We introduce AURORA, a technique that learns from the visual and textual patterns that exist in mobile app UIs to automatically detect common screen designs and navigate them accordingly. The key idea of AURORA is that there are a finite number of mobile app screen designs, albeit with subtle variations, such that the general patterns of different categories of UI designs can be learned. As such, AURORA employs a multi-modal, neural screen classifier that is able to recognize the most common types of UI screen designs. After recognizing a given screen, it then applies a set of flexible and generalizable heuristics to properly navigate the screen. We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store. Our results indicate that AURORA is able to effectively navigate tarpit screens, outperforming prior approaches that avoid tarpits by 19.6% in terms of method coverage. The improvements can be attributed to AURORA's UI design classification and heuristic navigation techniques.
Submitted 1 April, 2024;
originally announced April 2024.
-
Detailed cool star flare morphology with CHEOPS and TESS
Authors:
G. Bruno,
I. Pagano,
G. Scandariato,
H. -G. Florén,
A. Brandeker,
G. Olofsson,
P. F. L. Maxted,
A. Fortier,
S. G. Sousa,
S. Sulis,
V. Van Grootel,
Z. Garai,
A. Boldog,
L. Kriskovics,
M. Gy. Szabó,
D. Gandolfi,
Y. Alibert,
R. Alonso,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
W. Baumjohann,
M. Beck,
T. Beck,
W. Benz
, et al. (57 additional authors not shown)
Abstract:
Context. White-light stellar flares are proxies for some of the most energetic types of flares, but their triggering mechanism is still poorly understood. As they are associated with strong X and UV emission, their study is particularly relevant to estimate the amount of high-energy irradiation onto the atmospheres of exoplanets, especially those in their stars' habitable zone. Aims. We used the high-cadence, high-photometric capabilities of the CHEOPS and TESS space telescopes to study the detailed morphology of white-light flares occurring in a sample of 130 late-K and M stars, and compared our findings with results obtained at a lower cadence. We developed dedicated software for this purpose. Results. Multi-peak flares represent a significant percentage ($\gtrsim 30$\%) of the detected outburst events. Our findings suggest that high-impulse flares are more frequent than suspected from lower-cadence data, so that the most impactful flux levels that hit close-in exoplanets might be more time-limited than expected. We found significant differences in the duration distributions of single-peak and complex flare components, but not in their peak luminosity. A statistical analysis of the flare parameter distributions provides marginal support for their description with a log-normal instead of a power-law function, leaving the door open to several flare formation scenarios. We tentatively confirmed previous results about quasi-periodic pulsations in high-cadence photometry, report the possible detection of a pre-flare dip, and did not find hints of photometric variability due to an undetected flare background. Conclusions. The high-cadence study of stellar hosts might be crucial to evaluate the impact of their flares on close-in exoplanets, as their impulsive phase emission might otherwise be incorrectly estimated. Future telescopes such as PLATO and Ariel will help in this respect.
Submitted 25 March, 2024;
originally announced March 2024.
-
Precise characterisation of HD 15337 with CHEOPS: a laboratory for planet formation and evolution
Authors:
N. M. Rosário,
O. D. S. Demangeon,
S. C. C. Barros,
D. Gandolfi,
J. A. Egger,
L. M. Serrano,
H. P. Osborn,
M. Beck,
W. Benz,
H. -G. Florén,
P. Guterman,
T. G. Wilson,
Y. Alibert,
L. Fossati,
M. J. Hooton,
L. Delrez,
N. C. Santos,
S. G. Sousa,
A. Bonfanti,
S. Salmon,
V. Adibekyan,
A. Nigioni,
J. Venturini,
R. Alonso,
G. Anglada
, et al. (68 additional authors not shown)
Abstract:
We aim to constrain the internal structure and composition of HD 15337 b and c, two short-period planets situated on opposite sides of the radius valley, using new transit photometry and radial velocity data. We acquire 6 new transit visits with the CHaracterising ExOPlanet Satellite (CHEOPS) and 32 new radial velocity measurements from the High Accuracy Radial Velocity Planet Searcher (HARPS) to improve the accuracy of the mass and radius estimates for both planets. We reanalyse light curves from TESS sectors 3 and 4 and analyse new data from sector 30, correcting for long-term stellar activity. Subsequently, we perform a joint fit of the TESS and CHEOPS light curves, and all available RV data from HARPS and the Planet Finder Spectrograph (PFS). Our model fits the planetary signals, the stellar activity signal and the instrumental decorrelation model for the CHEOPS data simultaneously. The stellar activity was modelled using a Gaussian-process regression on both the RV and activity indicators. We finally employ a Bayesian retrieval code to determine the internal composition and structure of the planets. We derive updated and highly precise parameters for the HD 15337 system. Our improved precision on the planetary parameters makes HD 15337 b one of the most precisely characterised rocky exoplanets, with radius and mass measurements achieving a precision better than 2\% and 7\%, respectively. We are able to improve the precision of the radius measurement of HD 15337 c to 3\%. Our results imply that the composition of HD 15337 b is predominantly rocky, while HD 15337 c exhibits a gas envelope with a mass of at least $0.01\ M_\oplus$. Our results lay the groundwork for future studies, which can further unravel the atmospheric evolution of these exoplanets and give new insights into their composition and formation history and the causes behind the radius gap.
Submitted 25 March, 2024;
originally announced March 2024.
-
CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite
Authors:
Yifei Yuan,
Chen Shi,
Runze Wang,
Liyi Chen,
Renjun Hu,
Zengming Zhang,
Feijun Jiang,
Wai Lam
Abstract:
Generative query rewrite generates reconstructed query rewrites using the conversation history but relies heavily on gold rewrite pairs that are expensive to obtain. Recently, few-shot learning has gained popularity for this task, but these methods are sensitive to the inherent noise due to limited data size. Moreover, both approaches suffer performance degradation when there is a language style shift between training and testing cases. To this end, we study low-resource generative conversational query rewrite that is robust to both noise and language style shift. The core idea is to utilize massive unlabeled data to make further improvements via a contrastive co-training paradigm. Specifically, we co-train two dual models (namely Rewriter and Simplifier) such that each of them provides extra guidance through pseudo-labeling for enhancing the other in an iterative manner. We also leverage contrastive learning with data augmentation, which enables our model to pay more attention to the truly valuable information than to the noise. Extensive experiments demonstrate the superiority of our model under both few-shot and zero-shot scenarios. We also verify the better generalization ability of our model when encountering language style shift.
Submitted 18 March, 2024;
originally announced March 2024.
-
Unveiling the Generalization Power of Fine-Tuned Large Language Models
Authors:
Haoran Yang,
Yumeng Zhang,
Jiaqi Xu,
Hongyuan Lu,
Pheng Ann Heng,
Wai Lam
Abstract:
While Large Language Models (LLMs) have demonstrated exceptional multitasking abilities, fine-tuning these models on downstream, domain-specific datasets is often necessary to yield superior performance on test sets compared to their counterparts without fine-tuning. However, the comprehensive effects of fine-tuning on the LLMs' generalization ability are not fully understood. This paper delves into the differences between original, unmodified LLMs and their fine-tuned variants. Our primary investigation centers on whether fine-tuning affects the generalization ability intrinsic to LLMs. To elaborate on this, we conduct extensive experiments across five distinct language tasks on various datasets. Our main findings reveal that models fine-tuned on generation and classification tasks exhibit dissimilar behaviors in generalizing to different domains and tasks. Intriguingly, we observe that integrating the in-context learning strategy during fine-tuning on generation tasks can enhance the model's generalization ability. Through this systematic investigation, we aim to contribute valuable insights into the evolving landscape of fine-tuning practices for LLMs.
Submitted 14 March, 2024;
originally announced March 2024.
-
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
Authors:
Qibing Ren,
Chang Gao,
Jing Shao,
Junchi Yan,
Xin Tan,
Wai Lam,
Lizhuang Ma
Abstract:
The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the safety generalization of LLMs. Our comprehensive studies on state-of-the-art LLMs including GPT-4, Claude-2, and Llama-2 series reveal a new and universal safety vulnerability of these models against code input: CodeAttack bypasses the safety guardrails of all models more than 80\% of the time. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization, such as encoding natural language input with data structures. Furthermore, we give our hypotheses about the success of CodeAttack: the misaligned bias acquired by LLMs during code training, prioritizing code completion over avoiding the potential safety risk. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms to match the code capabilities of LLMs.
Submitted 9 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Consecutive Model Editing with Batch alongside HooK Layers
Authors:
Shuaiyi Li,
Yang Deng,
Deng Cai,
Hongyuan Lu,
Liang Chen,
Wai Lam
Abstract:
As the typical retraining paradigm is unacceptably time- and resource-consuming, researchers are turning to model editing in order to seek an effective, consecutive, and batch-supportive way to edit the model behavior directly. Despite all these practical expectations, existing model editing methods fail to realize all of them. Furthermore, the memory demands for such succession-supportive model editing approaches tend to be prohibitive, frequently necessitating an external memory that grows incrementally over time. To cope with these challenges, we propose COMEBA-HK, a model editing method that is both consecutive and batch-supportive. COMEBA-HK is memory-friendly, as it needs only a small amount of memory to store several hook layers with updated weights. Experimental results demonstrate the superiority of our method over other batch-supportive model editing methods under both single-round and consecutive batch editing scenarios. Extensive analyses of COMEBA-HK have been conducted to verify the stability of our method over 1) the number of consecutive steps and 2) the number of editing instances.
Submitted 17 April, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Authors:
Haoran Li,
Qingxiu Dong,
Zhengyang Tang,
Chaojun Wang,
Xingxing Zhang,
Haoyang Huang,
Shaohan Huang,
Xiaolong Huang,
Zeqiang Huang,
Dongdong Zhang,
Yuxian Gu,
Xin Cheng,
Xun Wang,
Si-Qing Chen,
Li Dong,
Wei Lu,
Zhifang Sui,
Benyou Wang,
Wai Lam,
Furu Wei
Abstract:
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data, GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure of the human education system, we build the taxonomy by decomposing human knowledge and capabilities into various fields, sub-fields and, ultimately, distinct disciplines semi-automatically, facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and proceed to design a syllabus tailored to each subject, again utilizing LLMs. With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with broad coverage across the entire spectrum of human knowledge and skills. Extensive experiments on large language models (e.g., Mistral) demonstrate that GLAN excels in multiple dimensions, from mathematical reasoning, coding, academic exams, and logical reasoning to general instruction following, without using task-specific training data for these tasks. In addition, GLAN allows for easy customization, and new fields or skills can be added by simply incorporating a new node into our taxonomy.
Submitted 20 February, 2024;
originally announced February 2024.
-
The tidal deformation and atmosphere of WASP-12b from its phase curve
Authors:
B. Akinsanmi,
S. C. C. Barros,
M. Lendl,
L. Carone,
P. E. Cubillos,
A. Bekkelien,
A. Fortier,
H. -G. Florén,
A. Collier Cameron,
G. Boué,
G. Bruno,
B. -O. Demory,
A. Brandeker,
S. G. Sousa,
T. G. Wilson,
A. Deline,
A. Bonfanti,
G. Scandariato,
M. J. Hooton,
A. C. M. Correia,
O. D. S. Demangeon,
A. M. S. Smith,
V. Singh,
Y. Alibert,
R. Alonso
, et al. (63 additional authors not shown)
Abstract:
Ultra-hot Jupiters present a unique opportunity to understand the physics and chemistry of planets at extreme conditions. WASP-12b stands out as an archetype of this class of exoplanets. We performed comprehensive analyses of the transits, occultations, and phase curves of WASP-12b by combining new CHEOPS observations with previous TESS and Spitzer data to measure the planet's tidal deformation, atmospheric properties, and orbital decay rate. The planet was modeled as a triaxial ellipsoid parameterized by the second-order fluid Love number, $h_2$, which quantifies its radial deformation and provides insight into the interior structure. We measured the tidal deformation of WASP-12b and estimated a Love number of $h_2=1.55_{-0.49}^{+0.45}$ (at 3.2$σ$) from its phase curve. We measured occultation depths of $333\pm24$ ppm and $493\pm29$ ppm in the CHEOPS and TESS bands, respectively, while the dayside emission spectrum indicates that CHEOPS and TESS probe similar pressure levels in the atmosphere at a temperature of 2900 K. We also estimated low geometric albedos of $0.086\pm0.017$ and $0.01\pm0.023$ in the CHEOPS and TESS passbands, respectively, suggesting the absence of reflective clouds on the dayside of WASP-12b. The CHEOPS occultations do not show strong evidence for variability in the dayside atmosphere of the planet. Finally, we refine the orbital decay rate by 12% to a value of -30.23$\pm$0.82 ms/yr.
WASP-12b becomes the second exoplanet, after WASP-103b, for which the Love number has been measured (at 3$\sigma$) from the effect of tidal deformation in the light curve. However, constraining the core mass fraction of the planet requires measuring $h_2$ with a higher precision. This can be achieved with high signal-to-noise observations with JWST since the phase curve amplitude, and consequently the induced tidal deformation effect, is higher in the infrared.
Submitted 20 February, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
Authors:
Yifei Yuan,
Clemencia Siro,
Mohammad Aliannejadi,
Maarten de Rijke,
Wai Lam
Abstract:
In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query. These questions aim to uncover users' information needs and resolve query ambiguities. We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information. Therefore, we propose to add images to clarifying questions and formulate the novel task of asking multimodal clarifying questions in open-domain, mixed-initiative conversational search systems. To facilitate research into this task, we collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images. We also propose a multimodal query clarification model named Marto and adopt a prompt-based, generative fine-tuning strategy to train the different stages with different prompts. Several analyses are conducted to understand the importance of multimodal content during the query clarification phase. Experimental results indicate that the addition of images leads to significant improvements of up to 90% in retrieval performance when selecting the relevant images. Extensive analyses are also performed to show the superiority of Marto compared with discriminative baselines in terms of effectiveness and efficiency.
Submitted 12 February, 2024;
originally announced February 2024.
-
A Thorough Examination of Decoding Methods in the Era of LLMs
Authors:
Chufan Shi,
Haoran Yang,
Deng Cai,
Zhisong Zhang,
Yifan Wang,
Yujiu Yang,
Wai Lam
Abstract:
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speeds across a wide range of tasks, models, and deployment environments. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization. Intriguingly, sensitivity analysis exposes that certain methods achieve superior performance at the cost of extensive hyperparameter tuning, highlighting the trade-off between attaining optimal results and the practicality of implementation in varying contexts.
Submitted 17 June, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
High-precision mass measurements of neutron deficient silver isotopes probe the robustness of the $N$ = 50 shell closure
Authors:
Zhuang Ge,
Mikael Reponen,
Tommi Eronen,
Baishan Hu,
Markus Kortelainen,
Anu Kankainen,
Iain Moore,
Dmitrii Nesterenko,
Cenxi Yuan,
Olga Beliuskina,
Laetitia Cañete,
Ruben de Groote,
Celement Delafosse,
Pierre Delahaye,
Timo Dickel,
Antoine de Roubin,
Sarina Geldhof,
Wouter Gins,
Jason Holt,
Marjut Hukkanen,
Arthur Jaries,
Ari Jokinen,
Ágota Koszorús,
Gabriella Kripkó-Koncz,
Sonja Kujanpää
, et al. (14 additional authors not shown)
Abstract:
High-precision mass measurements of exotic $^{95-97}$Ag isotopes close to the $N = Z$ line have been conducted with the JYFLTRAP double Penning trap mass spectrometer, with the silver ions produced using the recently commissioned inductively-heated hot cavity catcher laser ion source at the Ion Guide Isotope Separator On-Line facility. The atomic mass of $^{95}$Ag was directly determined for the first time. In addition, the atomic masses of $β$-decaying 2$^+$ and 8$^+$ states in $^{96}$Ag have been identified and measured for the first time, and the precision of the $^{97}$Ag mass has been improved. The newly measured masses, with a precision of $\approx$ 1 keV/c$^2$, have been used to investigate the $N =$ 50 neutron shell closure, confirming it to be robust. Empirical shell-gap and pairing energies determined with the new ground-state mass data are compared with state-of-the-art \textit{ab initio} calculations with various chiral effective field theory Hamiltonians. The precise determination of the excitation energy of the $^{96m}$Ag isomer in particular serves as a benchmark for \textit{ab initio} predictions of nuclear properties beyond the ground state, specifically for odd-odd nuclei situated in proximity to the proton dripline below $^{100}$Sn. In addition, density functional theory (DFT) calculations and configuration-interaction shell-model (CISM) calculations are compared with the experimental results. All theoretical approaches face challenges in reproducing the trend of nuclear ground-state properties in the silver isotopic chain across the $N =$ 50 neutron shell and toward the proton dripline.
Submitted 14 June, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Reasons to Reject? Aligning Language Models with Judgments
Authors:
Weiwen Xu,
Deng Cai,
Zhisong Zhang,
Wai Lam,
Shuming Shi
Abstract:
As humans, we consistently interact with our peers and receive feedback in the form of natural language. This language feedback allows us to maintain appropriate behavior, and rectify potential errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with scalar rewards, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We start with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods cannot fully capitalize on judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 50.84 points on AlpacaEval. CUT (LLaMA2-chat-13b) can also align LLMs in an iterative fashion using up-to-date model-specific judgments, improving performance from 81.09 to 91.68 points on AlpacaEval. Further analysis suggests that judgments hold greater potential than rewards in LLM alignment.
Submitted 6 June, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
The EBLM Project XI. Mass, radius and effective temperature measurements for 23 M-dwarf companions to solar-type stars observed with CHEOPS
Authors:
M. I. Swayne,
P. F. L. Maxted,
A. H. M. J. Triaud,
S. G. Sousa,
A. Deline,
D. Ehrenreich,
S. Hoyer,
G. Olofsson,
I. Boisse,
A. Duck,
S. Gill,
D. Martin,
J. McCormac,
C. M. Persson,
A. Santerne,
D. Sebastian,
M. R. Standing,
L. Acuña,
Y. Alibert,
R. Alonso,
G. Anglada,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
W. Baumjohann
, et al. (82 additional authors not shown)
Abstract:
Observations of low-mass stars have frequently shown a disagreement between observed stellar radii and radii predicted by theoretical stellar structure models. This ``radius inflation'' problem could have an impact on both stellar and exoplanetary science. We present the final results of our observation programme with the CHEOPS satellite to obtain high-precision light curves of eclipsing binaries with low mass stellar companions (EBLMs). Combined with the spectroscopic orbits of the solar-type companion, we can derive the masses, radii and effective temperatures of 23 M-dwarf stars. We use the PYCHEOPS data analysis software to analyse their primary and secondary occultations. For all but one target, we also perform analyses with TESS light curves for comparison. We have assessed the impact of starspot-induced variation on our derived parameters and account for this in our radius and effective temperature uncertainties using simulated light curves. We observe trends for inflation with both metallicity and orbital separation. We also observe a strong trend in the difference between theoretical and observational effective temperatures with metallicity. There is no such trend with orbital separation. These results are not consistent with the idea that observed inflation in stellar radius combines with lower effective temperature to preserve the luminosity predicted by low-mass stellar models. Our EBLM systems are high-quality and homogeneous measurements that can be used in further studies into radius inflation.
Submitted 18 December, 2023;
originally announced December 2023.
-
A resonant sextuplet of sub-Neptunes transiting the bright star HD 110067
Authors:
R. Luque,
H. P. Osborn,
A. Leleu,
E. Pallé,
A. Bonfanti,
O. Barragán,
T. G. Wilson,
C. Broeg,
A. Collier Cameron,
M. Lendl,
P. F. L. Maxted,
Y. Alibert,
D. Gandolfi,
J. -B. Delisle,
M. J. Hooton,
J. A. Egger,
G. Nowak,
M. Lafarga,
D. Rapetti,
J. D. Twicken,
J. C. Morales,
I. Carleo,
J. Orell-Miquel,
V. Adibekyan,
R. Alonso
, et al. (127 additional authors not shown)
Abstract:
Planets with radii between that of the Earth and Neptune (hereafter referred to as sub-Neptunes) are found in close-in orbits around more than half of all Sun-like stars. Yet, their composition, formation, and evolution remain poorly understood. The study of multi-planetary systems offers an opportunity to investigate the outcomes of planet formation and evolution while controlling for initial conditions and environment. Those in resonance (with their orbital periods related by a ratio of small integers) are particularly valuable because they imply a system architecture practically unchanged since its birth. Here, we present the observations of six transiting planets around the bright nearby star HD 110067. We find that the planets follow a chain of resonant orbits. A dynamical study of the innermost planet triplet allowed the prediction and later confirmation of the orbits of the rest of the planets in the system. The six planets are found to be sub-Neptunes with radii ranging from 1.94 to 2.85 Re. Three of the planets have measured masses, yielding low bulk densities that suggest the presence of large hydrogen-dominated atmospheres.
Submitted 29 November, 2023;
originally announced November 2023.
-
Durable, ultrathin, and antifouling polymer brush coating for efficient condensation heat transfer
Authors:
Shuai Li,
Cheuk Wing Edmond Lam,
Matteo Donati,
Kartik Regulagadda,
Emre Yavuz,
Till Pfeiffer,
Panagiotis Sarkiris,
Evangelos Gogolides,
Athanasios Milionis,
Dimos Poulikakos,
Hans-Jürgen Butt,
Michael Kappl
Abstract:
Heat exchangers are made of metals because of their high heat conductivity and mechanical stability. Metal surfaces are inherently hydrophilic, leading to inefficient filmwise condensation. It is still a challenge to coat these metal surfaces with a durable, robust and thin hydrophobic layer, which is required for efficient dropwise condensation. Here, we report the non-structured and ultrathin (~6 nm) polydimethylsiloxane (PDMS) brushes on copper that sustain high-performing dropwise condensation in high supersaturation. Due to the flexible hydrophobic siloxane polymer chains, the coating has low resistance to drop sliding and excellent chemical stability. The PDMS brushes can sustain dropwise condensation for up to ~8 h during exposure to 111 °C saturated steam flowing at 3 m/s, with a 5-7 times higher heat transfer coefficient compared to filmwise condensation. The surface is self-cleaning and can reduce bacterial attachment by 99%. This low-cost, facile, fluorine-free, and scalable method is suitable for a great variety of condensation heat transfer applications.
Submitted 22 November, 2023;
originally announced November 2023.
-
Characterising TOI-732 b and c: New insights into the M-dwarf radius and density valley
Authors:
A. Bonfanti,
M. Brady,
T. G. Wilson,
J. Venturini,
J. A. Egger,
A. Brandeker,
S. G. Sousa,
M. Lendl,
A. E. Simon,
D. Queloz,
G. Olofsson,
V. Adibekyan,
Y. Alibert,
L. Fossati,
M. J. Hooton,
D. Kubyshkina,
R. Luque,
F. Murgas,
A. J. Mustill,
N. C. Santos,
V. Van Grootel,
R. Alonso,
J. Asquier,
T. Bandy,
T. Bárczy
, et al. (66 additional authors not shown)
Abstract:
TOI-732 is an M dwarf hosting two transiting planets that are located on the two opposite sides of the radius valley. By doubling the number of available space-based observations and increasing the number of radial velocity (RV) measurements, we aim at refining the parameters of TOI-732 b and c. We also use the results to study the slope of the radius valley and the density valley for a well-characterised sample of M-dwarf exoplanets. We performed a global MCMC analysis by jointly modelling ground-based light curves and CHEOPS and TESS observations, along with RV time series both taken from the literature and obtained with the MAROON-X spectrograph. The slopes of the M-dwarf valleys were quantified via a Support Vector Machine (SVM) procedure. TOI-732 b is an ultrashort-period planet ($P\sim0.77$ d) with a radius $R_b=1.325_{-0.058}^{+0.057}$ $R_{\oplus}$ and a mass $M_b=2.46\pm0.19$ $M_{\oplus}$ (mean density $\rho_b=5.8_{-0.8}^{+1.0}$ g cm$^{-3}$), while the outer planet at $P\sim12.25$ d has $R_c=2.39_{-0.11}^{+0.10}$ $R_{\oplus}$, $M_c=8.04_{-0.48}^{+0.50}$ $M_{\oplus}$, and thus $\rho_c=3.24_{-0.43}^{+0.55}$ g cm$^{-3}$. Also taking into account our interior structure calculations, TOI-732 b is a super-Earth and TOI-732 c is a mini-Neptune. Following the SVM approach, we quantified $\mathrm{d}\log{R_{p,{\mathrm{valley}}}}/\mathrm{d}\log{P}=-0.065_{-0.013}^{+0.024}$, which is flatter than for Sun-like stars. In line with former analyses, we note that the radius valley for M-dwarf planets is more densely populated, and we further quantify the slope of the density valley as $\mathrm{d}\log{\hat{\rho}_{\mathrm{valley}}}/\mathrm{d}\log{P}=-0.02_{-0.04}^{+0.12}$. Compared to FGK stars, the weaker dependence of the position of the radius valley on the orbital period might indicate that the formation shapes the radius valley around M dwarfs more strongly than the evolution mechanisms.
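As a rough consistency check on the quoted mean densities, the sketch below recomputes them from the central mass and radius values in the abstract. It is only an illustration, not the authors' procedure: the function name `bulk_density` and the Earth mean density of 5.514 g cm^-3 are assumptions introduced here, and the quoted uncertainties are ignored.

```python
# Assumed reference value (not from the abstract): Earth's mean density in g cm^-3.
EARTH_MEAN_DENSITY = 5.514


def bulk_density(mass_earth: float, radius_earth: float) -> float:
    """Mean density in g cm^-3 for a planet given in Earth masses and Earth radii.

    Density scales as M / R^3, so in Earth units it is Earth's density
    multiplied by mass / radius**3.
    """
    return EARTH_MEAN_DENSITY * mass_earth / radius_earth**3


# Central values quoted in the abstract for TOI-732 b and c.
rho_b = bulk_density(2.46, 1.325)
rho_c = bulk_density(8.04, 2.39)

print(f"rho_b = {rho_b:.2f} g/cm^3, rho_c = {rho_c:.2f} g/cm^3")
```

Both results fall inside the uncertainty ranges quoted for $\rho_b$ and $\rho_c$; small differences from the published central values are expected because the published numbers come from the full posterior rather than from point estimates.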
Submitted 30 November, 2023; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
Authors:
Sen Yang,
Xin Li,
Leyang Cui,
Lidong Bing,
Wai Lam
Abstract:
Though prompting LLMs with various reasoning structures produces reasoning proofs along with answers, these proofs are not ensured to be causal and reliable due to the inherent defects of LLMs. Tackling such deficiencies, we present a neuro-symbolic integration method, in which a neural LLM is used to represent the knowledge of the problem while an LLM-free symbolic solver is adopted to do deliberative reasoning using the knowledge. Specifically, our customized meta-interpreters allow the production of reasoning proofs and support flexible search strategies. These reasoning proofs are ensured to be causal and reliable because of the deterministic executing nature of the symbolic solvers. Empirically, on ProofWriter, our method surpasses the CoT baseline by nearly double in accuracy and more than triple in proof similarity. On GSM8K, our method also shows accuracy improvements and nearly doubled proof similarity. Our code is released at https://github.com/DAMO-NLP-SG/CaRing
Submitted 16 November, 2023;
originally announced November 2023.
-
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
Authors:
Chang Gao,
Haiyun Jiang,
Deng Cai,
Shuming Shi,
Wai Lam
Abstract:
Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2\% $\rightarrow$ 38.8\%), commonsense reasoning (70.3\% $\rightarrow$ 72.5\%), algorithmic reasoning (73.7\% $\rightarrow$ 85.0\%), and symbolic reasoning (30.0\% $\rightarrow$ 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.
Submitted 24 May, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Modelling the Light Curves of Transiting Exomoons: a Zero-order Photodynamic Agent Added to the Transit and Light Curve Modeller
Authors:
Sz. Kálmán,
Sz. Csizmadia,
A. E. Simon,
K. W. F. Lam,
A. Deline,
J. -V. Harre,
Gy. M. Szabó
Abstract:
Despite the ever-growing number of exoplanets discovered and the extensive analyses carried out to find their potential satellites, only two exomoon candidates, Kepler-1625b-i and Kepler-1708 b-i, have been discovered to date. A considerable amount of effort has been invested in the development of algorithms for modelling, searching, and detecting exomoons in exoplanetary light curves. In this work, we incorporate moon handling capabilities into the state-of-the-art and publicly available code, the Transit and Light Curve Modeller (TLCM). The code is designed for the analysis of transiting exoplanet systems with the inclusion of a wavelet-based noise handling algorithm. Here we present an updated version of TLCM that is capable of modelling a coplanar planet-moon system on an elliptical orbit around its host, accounting for mutual eclipses between the two bodies (and neglecting perturbative effects) -- a so-called photodynamic model. The key benefit of this framework is the ability for a joint analysis of multiple planet-moon transits. We demonstrate the necessity of this software on a case study of Kepler-1625b. Similarly to prior works, we conclude that there is no firm evidence of an exomoon in that system, by showing that temporally correlated noise can mimic apparent lunar transits.
Submitted 8 November, 2023;
originally announced November 2023.
-
CHEOPS observations of KELT-20 b/MASCARA-2 b: An aligned orbit and signs of variability from a reflective dayside
Authors:
V. Singh,
G. Scandariato,
A. M. S. Smith,
P. E. Cubillos,
M. Lendl,
N. Billot,
A. Fortier,
D. Queloz,
S. G. Sousa,
Sz. Csizmadia,
A. Brandeker,
L. Carone,
T. G. Wilson,
B. Akinsanmi,
J. A. Patel,
A. Krenn,
O. D. S. Demangeon,
G. Bruno,
I. Pagano,
M. J. Hooton,
J. Cabrera,
N. C. Santos,
Y. Alibert,
R. Alonso,
J. Asquier
, et al. (65 additional authors not shown)
Abstract:
Occultations are windows of opportunity to indirectly peek into the dayside atmosphere of exoplanets. High-precision transit events provide information on the spin-orbit alignment of exoplanets around fast-rotating hosts. We aim to precisely measure the planetary radius and geometric albedo of the ultra-hot Jupiter (UHJ) KELT-20 b as well as the system's spin-orbit alignment. We obtained optical high-precision transits and occultations of KELT-20 b using CHEOPS observations in conjunction with the simultaneous TESS observations. We interpreted the occultation measurements together with archival infrared observations to measure the planetary geometric albedo and dayside temperatures. We further used the host star's gravity-darkened nature to measure the system's obliquity. We present a time-averaged precise occultation depth of 82(6) ppm measured with seven CHEOPS visits and 131(+8/-7) ppm from the analysis of all available TESS photometry. Using these measurements, we precisely constrain the geometric albedo of KELT-20 b to 0.26(0.04) and the brightness temperature of the dayside hemisphere to 2566(+77/-80) K. Assuming Lambertian scattering law, we constrain the Bond albedo to 0.36(+0.04/-0.05) along with a minimal heat transfer to the night side. Furthermore, using five transit observations we provide stricter constraints of 3.9(1.1) degrees on the sky-projected obliquity of the system. The aligned orbit of KELT-20 b is in contrast to previous CHEOPS studies that have found strongly inclined orbits for planets orbiting other A-type stars. The comparably high planetary geometric albedo of KELT-20 b corroborates a known trend of strongly irradiated planets being more reflective. Finally, we tentatively detect signs of temporal variability in the occultation depths, which might indicate variable cloud cover advecting onto the planetary day side.
Submitted 29 November, 2023; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
Authors:
Yang Deng,
Wenxuan Zhang,
Wai Lam,
See-Kiong Ng,
Tat-Seng Chua
Abstract:
Proactive dialogues serve as a practical yet challenging dialogue problem in the era of large language models (LLMs), where the dialogue policy planning is the key to improving the proactivity of LLMs. Most existing studies enable the dialogue policy planning of LLMs using various prompting schemes or iteratively enhance this capability in handling the given case with verbal AI feedback. However, these approaches are either bounded by the policy planning capability of the frozen LLMs or hard to be transferred to new cases. In this work, we introduce a new dialogue policy planning paradigm to strategize LLMs for proactive dialogue problems with a tunable language model plug-in as a plug-and-play dialogue policy planner, named PPDPP. Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data as well as reinforcement learning from goal-oriented AI feedback with dynamic interaction data collected by the LLM-based self-play simulation. In this manner, the LLM-powered dialogue agent can not only be generalized to different cases after the training, but also be applicable to different applications by just substituting the learned plug-in. In addition, we propose to evaluate the policy planning capability of dialogue systems under the interactive setting. Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues.
Submitted 11 March, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
TOI-544 b: a potential water-world inside the radius valley in a two-planet system
Authors:
H. L. M. Osborne,
V. Van Eylen,
E. Goffo,
D. Gandolfi,
G. Nowak,
C. M. Persson,
J. Livingston,
A. Weeks,
E. Pallé,
R. Luque,
C. Hellier,
I. Carleo,
S. Redfield,
T. Hirano,
M. Garbaccio Gili,
J. Alarcon,
O. Barragán,
N. Casasayas-Barris,
M. R. Díaz,
M. Esposito,
J. S. Jenkins,
E. Knudstrup,
F. Murgas,
J. Orell-Miquel,
F. Rodler
, et al. (10 additional authors not shown)
Abstract:
We report on the precise radial velocity follow-up of TOI-544 (HD 290498), a bright K star (V=10.8), which hosts a small transiting planet recently discovered by the Transiting Exoplanet Survey Satellite (TESS). We collected 122 high-resolution HARPS and HARPS-N spectra to spectroscopically confirm the transiting planet and measure its mass. The nearly 3-year baseline of our follow-up allowed us to unveil the presence of an additional, non-transiting, longer-period companion planet. We derived a radius and mass for the inner planet, TOI-544b, of 2.018 $\pm$ 0.076 R$_{\oplus}$ and 2.89 $\pm$ 0.48 M$_{\oplus}$ respectively, which gives a bulk density of $1.93^{+0.30}_{-0.25}$ g cm$^{-3}$. TOI-544c has a minimum mass of 21.5 $\pm$ 2.0 M$_{\oplus}$ and orbital period of 50.1 $\pm$ 0.2 days. The low density of planet-b implies that it has either an Earth-like rocky core with a hydrogen atmosphere, or a composition which harbours a significant fraction of water. The composition interpretation is degenerate depending on the specific choice of planet interior models used. Additionally, TOI-544b has an orbital period of 1.55 days and equilibrium temperature of 999 $\pm$ 14 K, placing it within the predicted location of the radius valley, where few planets are expected. TOI-544b is a top target for future atmospheric observations, for example with JWST, which would enable better constraints of the planet composition.
Submitted 11 December, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Once Upon a $\textit{Time}$ in $\textit{Graph}$: Relative-Time Pretraining for Complex Temporal Reasoning
Authors:
Sen Yang,
Xin Li,
Lidong Bing,
Wai Lam
Abstract:
Our physical world is constantly evolving over time, rendering challenges for pre-trained language models to understand and reason over the temporal contexts of texts. Existing work focuses on strengthening the direct association between a piece of text and its time-stamp. However, the knowledge-time association is usually insufficient for the downstream tasks that require reasoning over temporal dependencies between knowledge. In this work, we make use of the underlying nature of time, namely that all temporally-scoped sentences are strung together through a one-dimensional time axis, and suggest creating a graph structure based on the relative placements of events along the time axis. Inspired by the graph view, we propose RemeMo ($\underline{Re}$lative Ti$\underline{me}$ $\underline{Mo}$deling), which explicitly connects all temporally-scoped facts by modeling the time relations between any two sentences. Experimental results show that RemeMo outperforms the baseline T5 on multiple temporal question answering datasets under various settings. Further analysis suggests that RemeMo is especially good at modeling long-range complex temporal dependencies. We release our code and pre-trained checkpoints at https://github.com/DAMO-NLP-SG/RemeMo.
Submitted 23 October, 2023;
originally announced October 2023.
-
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text
Authors:
Shuaiyi Li,
Yang Deng,
Wai Lam
Abstract:
Spatial reasoning in text plays a crucial role in various real-world applications. Existing approaches for spatial reasoning typically infer spatial relations from pure text, which overlooks the gap between natural language and symbolic structures. Graph neural networks (GNNs) have showcased exceptional proficiency in inducing and aggregating symbolic structures. However, classical GNNs face challenges in handling multi-hop spatial reasoning due to the over-smoothing issue, i.e., the performance decreases substantially as the number of graph layers increases. To cope with these challenges, we propose a novel Depth-Wise Graph Neural Network (DepWiGNN). Specifically, we design a novel node memory scheme and aggregate the information over the depth dimension instead of the breadth dimension of the graph, which empowers the ability to collect long dependencies without stacking multiple layers. Experimental results on two challenging multi-hop spatial reasoning datasets show that DepWiGNN outperforms existing spatial reasoning methods. The comparisons with the other three GNNs further demonstrate its superiority in capturing long dependency in the graph.
Submitted 8 March, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
The Effects of Computational Resources on Flaky Tests
Authors:
Denini Silva,
Martin Gruber,
Satyajit Gokhale,
Ellen Arteca,
Alexi Turcotte,
Marcelo d'Amorim,
Wing Lam,
Stefan Winter,
Jonathan Bell
Abstract:
Flaky tests are tests that nondeterministically pass and fail in unchanged code. These tests can be detrimental to developers' productivity. Particularly when tests run in continuous integration environments, the tests may be competing for access to limited computational resources (CPUs, memory etc.), and we hypothesize that resource (in)availability may be a significant factor in the failure rate of flaky tests. We present the first assessment of the impact that computational resources have on flaky tests, including a total of 52 projects written in Java, JavaScript and Python, and 27 different resource configurations. Using a rigorous statistical methodology, we determine which tests are RAFT (Resource-Affected Flaky Tests). We find that 46.5% of the flaky tests in our dataset are RAFT, indicating that a substantial proportion of flaky-test failures can be avoided by adjusting the resources available when running tests. We report RAFTs and configurations to avoid them to developers, and received interest to either fix the RAFTs or to improve the specifications of the projects so that tests would be run only in configurations that are unlikely to encounter RAFT failures. Our results also have implications for researchers attempting to detect flaky tests, e.g., reducing the resources available when running tests is a cost-effective approach to detect more flaky failures.
Submitted 18 October, 2023;
originally announced October 2023.
-
Condensate droplet roaming on nanostructured superhydrophobic surfaces
Authors:
Cheuk Wing Edmond Lam,
Kartik Regulagadda,
Matteo Donati,
Abinash Tripathy,
Gopal Chandra Pal,
Chander Shekhar Sharma,
Athanasios Milionis,
Dimos Poulikakos
Abstract:
Jumping of coalescing condensate droplets from superhydrophobic surfaces is an interesting phenomenon which yields marked heat transfer enhancement over the more explored gravity-driven droplet removal mode in surface condensation, a phase change process of central interest to applications ranging from energy to water harvesting. However, when condensate microdroplets coalesce, they can also spontaneously propel themselves omnidirectionally on the surface independent of gravity and grow by feeding from droplets they sweep along the way. Here we observe and explain the physics behind this phenomenon of roaming of coalescing condensate microdroplets on solely nanostructured superhydrophobic surfaces, where the microdroplets are orders of magnitude larger than the underlying surface nanotexture. We quantify and show that it is the inherent asymmetries in droplet adhesion during condensation, arising from the stochastic nature of nucleation within the nanostructures, that generates the tangential momentum driving the roaming motion. Subsequent dewetting during this conversion initiates a vivid roaming and successive coalescence process, preventing condensate flooding of the surface, and enhancing surface renewal. Finally, we show that the more efficient conversion process of roaming from excess surface energy to kinetic energy results in significantly improved heat transfer efficiency over condensate droplet jumping, the mechanism currently understood as maximum.
Submitted 17 October, 2023;
originally announced October 2023.
-
No random transits in CHEOPS observations of HD 139139
Authors:
R. Alonso,
S. Hoyer,
M. Deleuil,
A. E. Simon,
M. Beck,
W. Benz,
H. -G. Florén,
P. Guterman,
L. Borsato,
A. Brandeker,
D. Gandolfi,
T. G. Wilson,
T. Zingales,
Y. Alibert,
G. Anglada,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
W. Baumjohann,
T. Beck,
N. Billot,
X. Bonfils,
Ch. Broeg,
S. Charnoz,
A. Collier Cameron
, et al. (56 additional authors not shown)
Abstract:
HD 139139 (a.k.a. 'The Random Transiter') is a star that exhibited enigmatic transit-like features with no apparent periodicity in K2 data. The shallow depth of the events ($\sim$200 ppm -- equivalent to transiting objects with radii of $\sim$1.5 R$_\oplus$ in front of a Sun-like star) and their non-periodicity constitute a challenge for the photometric follow-up of this star. The goal of this study is to confirm with independent measurements the presence of shallow, non-periodic transit-like features on this object. We performed observations with CHEOPS, for a total accumulated time of 12.75 d, distributed in visits of roughly 20 h in two observing campaigns in years 2021 and 2022. The precision of the data is sufficient to detect 150 ppm features with durations longer than 1.5 h. We use the duration and times of the events seen in the K2 curve to estimate how many should have been detected in our campaigns, under the assumption that their behaviour during the CHEOPS observations would be the same as in the K2 data of 2017. We do not detect events with depths larger than 150 ppm in our data set. If the frequency, depth, and duration of the events were the same as in the K2 campaign, we estimate the probability of having missed all events due to our limited observing window would be 4.8 %. We suggest three different scenarios to explain our results: 1) Our observing window was not long enough, and the events were missed with the estimated 4.8 % probability. 2) The events recorded in the K2 observations were time critical, and the mechanism producing them was either not active in the 2021 and 2022 campaigns or created shallower events under our detectability level. 3) The enigmatic events in the K2 data are the result of an unidentified and infrequent instrumental noise in the original data set or its data treatment.
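The correspondence quoted above between a ~200 ppm depth and a ~1.5 R$_\oplus$ object can be reproduced with the simplest transit-depth relation, depth ≈ (R_p / R_star)^2. The sketch below is an illustration under stated assumptions, not the authors' analysis: it ignores limb darkening and grazing geometry, and the helper name `radius_from_depth` and the solar-to-Earth radius ratio of 109.2 are assumptions introduced here.

```python
import math

# Assumed conversion factor (not from the abstract): solar radius in Earth radii.
SOLAR_RADIUS_IN_EARTH_RADII = 109.2


def radius_from_depth(depth_ppm: float, stellar_radius_rsun: float = 1.0) -> float:
    """Radius (in Earth radii) of an opaque body producing a given transit depth.

    Uses depth = (R_p / R_star)^2 with a uniform stellar disc.
    """
    radius_ratio = math.sqrt(depth_ppm * 1e-6)
    return radius_ratio * stellar_radius_rsun * SOLAR_RADIUS_IN_EARTH_RADII


# Typical K2 event depth and the CHEOPS detection threshold quoted in the abstract.
r_200 = radius_from_depth(200.0)
r_150 = radius_from_depth(150.0)

print(f"200 ppm -> {r_200:.2f} R_earth, 150 ppm -> {r_150:.2f} R_earth")
```

Under these assumptions the 150 ppm detection threshold corresponds to a somewhat smaller body than the 200 ppm events, which is why events like those seen by K2 would have been within reach of the CHEOPS data.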
Submitted 25 October, 2023; v1 submitted 16 October, 2023;
originally announced October 2023.
-
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
Authors:
Chang Gao,
Wenxuan Zhang,
Guizhen Chen,
Wai Lam
Abstract:
Instruction tuning has become an essential process for optimizing the performance of large language models (LLMs). However, current text-to-text instruction tuning methods, referred to as TextTuning, exhibit significant limitations in terms of generalization, robustness, and controllability, primarily due to the absence of explicit task structures. In this paper, we introduce JsonTuning, a novel structure-to-structure approach for instruction tuning. By utilizing the versatile and structured format of JSON to represent tasks, JsonTuning enhances generalization by enabling the model to comprehend essential task elements and their interrelations, improves robustness by reducing ambiguity, and increases controllability by providing explicit control over the output. We conduct a comprehensive comparative analysis between JsonTuning and TextTuning using various language models and evaluation benchmarks. Our experimental results demonstrate that JsonTuning consistently outperforms TextTuning across a range of applications, showing marked improvements in performance, robustness, and controllability. By addressing the inherent limitations of TextTuning, JsonTuning reveals significant potential for developing more effective and reliable LLMs capable of managing diverse scenarios.
Submitted 24 May, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Social Media Fashion Knowledge Extraction as Captioning
Authors:
Yifei Yuan,
Wenxuan Zhang,
Yang Deng,
Wai Lam
Abstract:
Social media plays a significant role in boosting the fashion industry, where a massive number of fashion-related posts are generated every day. To obtain the rich fashion information in these posts, we study the task of social media fashion knowledge extraction. Fashion knowledge, which typically consists of the occasion, person attributes, and fashion item information, can be effectively represented as a set of tuples. Most previous studies on fashion knowledge extraction are based on fashion product images and do not consider the rich text information in social media posts. Existing work on fashion knowledge extraction in social media is classification-based and requires manually determining a set of fashion knowledge categories in advance. In our work, we propose to cast the task as a captioning problem to capture the interplay of the multimodal post information. Specifically, we transform the fashion knowledge tuples into a natural language caption with a sentence transformation method. Our framework then aims to generate the sentence-based fashion knowledge directly from the social media post. Inspired by the success of pre-trained models, we build our model on a multimodal pre-trained generative model and design several auxiliary tasks to enhance the knowledge extraction. Since no existing dataset can be directly used for our task, we introduce a dataset consisting of social media posts with manual fashion knowledge annotation. Extensive experiments are conducted to demonstrate the effectiveness of our model.
Submitted 28 September, 2023;
originally announced September 2023.
-
Constraining the reflective properties of WASP-178b using Cheops photometry
Authors:
I. Pagano,
G. Scandariato,
V. Singh,
M. Lendl,
D. Queloz,
A. E. Simon,
S. G. Sousa,
A. Brandeker,
A. Collier Cameron,
S. Sulis,
V. Van Grootel,
T. G. Wilson,
Y. Alibert,
R. Alonso,
G. Anglada,
T. Bárczy,
D. Barrado Navascues,
S. C. C. Barros,
W. Baumjohann,
M. Beck,
T. Beck,
W. Benz,
N. Billot,
X. Bonfils,
L. Borsato
, et al. (57 additional authors not shown)
Abstract:
Multiwavelength photometry of the secondary eclipses of extrasolar planets is able to disentangle the reflected and thermally emitted light radiated from the planetary dayside. This leads to the measurement of the planetary geometric albedo $A_g$, which is an indicator of the presence of clouds in the atmosphere, and the recirculation efficiency $ε$, which quantifies the energy transport within the atmosphere. In this work we aim to measure $A_g$ and $ε$ for the planet WASP-178 b, a highly irradiated giant planet with an estimated equilibrium temperature of 2450 K. We analyzed archival spectra and the light curves collected by Cheops and Tess to characterize the host WASP-178, refine the ephemeris of the system and measure the eclipse depth in the passbands of the two respective telescopes. We measured a marginally significant eclipse depth of 70$\pm$40 ppm in the Tess passband and a statistically significant depth of 70$\pm$20 ppm in the Cheops passband. Combining the eclipse depth measurements in the Cheops ($λ_{\rm eff}=6300$ Å) and Tess ($λ_{\rm eff}=8000$ Å) passbands, we constrained the dayside brightness temperature of WASP-178 b to the 2250-2800 K interval. The geometric albedo $0.1<A_g<0.35$ is in general agreement with the picture of poorly reflective giant planets, while the recirculation efficiency $ε>0.7$ makes WASP-178 b an interesting laboratory to test the current heat recirculation models.
Submitted 16 September, 2023;
originally announced September 2023.
-
Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics
Authors:
Chun Hei Lo,
Wai Lam,
Hong Cheng,
Guy Emerson
Abstract:
Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions. This provides a natural representation for hypernymy but no guarantee that it can be learnt when FDS models are trained on a corpus. In this paper, we probe into FDS models and study the representations learnt, drawing connections between quantifications, the Distributional Inclusion Hypothesis (DIH), and the variational-autoencoding objective of FDS model training. Using synthetic data sets, we reveal that FDS models learn hypernymy on a restricted class of corpora that strictly follow the DIH. We further introduce a training objective that both enables hypernymy learning under the reverse of the DIH and improves hypernymy detection from real corpora.
Submitted 10 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Nuclear ground-state properties probed by the relativistic Hartree-Bogoliubov approach
Authors:
Zi Xin Liu,
Yi Hua Lam,
Ning Lu,
Peter Ring
Abstract:
Using the relativistic Hartree-Bogoliubov framework with a separable pairing force coupled with the latest covariant density functionals, i.e., PC-L3R, PC-X, DD-PCX, and DD-MEX, we systematically explore the ground-state properties of all isotopes of Z=8-110. These properties consist of the binding energies, one- and two-neutron separation energies ($S_\mathrm{n}$ and $S_\mathrm{2n}$), root-mean-square radii of the matter, neutron, proton, and charge distributions, Fermi surfaces, and ground-state spins and parities. We then predict the edges of the nuclear landscape and the bound nuclei for the isotopic chains from oxygen (Z=8) to darmstadtium (Z=110) based on these latest covariant density functionals. The numbers of bound nuclei predicted by PC-L3R, PC-X, DD-PCX, and DD-MEX are 9004, 9162, 6799, and 7112, respectively. The root-mean-square deviations of $S_\mathrm{n}$ ($S_\mathrm{2n}$) obtained from PC-L3R, PC-X, DD-PCX, and DD-MEX are 0.962 (1.300) MeV, 0.920 (1.483) MeV, 0.993 (1.753) MeV, and 1.010 (1.544) MeV, respectively. The root-mean-square deviations of the charge radii, obtained by comparing the available experimental values with the theoretical counterparts from PC-L3R, PC-X, DD-PCX, and DD-MEX, are 0.035 fm, 0.037 fm, 0.035 fm, and 0.034 fm, respectively. We notice pronounced differences between the empirical and theoretical root-mean-square neutron radii for nuclei near the neutron drip line of the Mg, Ca, and Kr isotopic chains, suggesting the possible existence of halo or giant halo phenomena.
Submitted 18 January, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Authors:
Hongyuan Lu,
Wai Lam
Abstract:
Large language models (LLMs) have shown promising performance on various NLP tasks via task prompting. Their performance can be further improved by appending task demonstrations to the head of the prompt, and more demonstrations usually yield better performance. However, asking users to write the demonstrations can be cumbersome. As a simple yet cost-effective workaround, this paper proposes a novel method called EPA (Easy Prompt Augmentation) that effectively minimizes user effort in writing demonstrations while improving model performance at the same time. (While this paper considers augmenting prompts via demonstrations, we name it EPA because the name EDA is already taken by a well-known NLP method (Wei and Zou, 2019).) EPA achieves these goals by automatically augmenting the demonstrations with multiple sources/targets that paraphrase one another. This is well motivated, as augmenting data via paraphrasing effectively improves neural language models. EPA thus employs paraphrasing as an augmentation method for in-context learning. Extensive experiments indicate that EPA effectively improves both NLU and NLG tasks, ranging from natural language inference to machine translation across tens of languages. (Code and data will be released upon publication.)
Submitted 9 September, 2023;
originally announced September 2023.
-
Exceptional behavior in critical first-passage percolation and random sums
Authors:
Michael Damron,
Jack Hanson,
David Harper,
Wai-Kit Lam
Abstract:
We study first-passage percolation (FPP) on the square lattice. The model is defined using i.i.d. nonnegative random edge-weights $(t_e)$ associated to the nearest neighbor edges of $\mathbb{Z}^2$. The passage time between vertices $x$ and $y$, $T(x,y)$, is the minimal total weight of any lattice path from $x$ to $y$. The growth rate of $T(x,y)$ depends on the value of $F(0) = \mathbb{P}(t_e=0)$: if $F(0) < 1/2$ then $T(x,y)$ grows linearly in $|x-y|$, but if $F(0) > 1/2$ then it is stochastically bounded. In the critical case, where $F(0) = 1/2$, $T(x,y)$ can be bounded or unbounded depending on the behavior of the distribution function $F$ of $t_e$ near 0. In this paper, we consider the critical case in which $T(x,y)$ is unbounded and prove the existence of an incipient infinite cluster (IIC) type measure, constructed by conditioning the environment on the event that the passage time from $0$ to a far distance remains bounded. This IIC measure is a natural candidate for the distribution of the weights at a typical exceptional time in dynamical FPP. A major part of the analysis involves characterizing the limiting behavior of independent nonnegative random variables conditioned to have small sum. We give conditions on random variables that ensure that such limits are trivial, and several examples that exhibit nontrivial limits.
Submitted 19 August, 2023;
originally announced August 2023.