
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification

Yu-Yang Li (1, 2, 3), Yu Bai (1, 2, 3), Cunshi Wang (1, 2), Mengwei Qu (4, 5), Ziteng Lu (6), Roberto Soria (2, 7, 8), Jifeng Liu (1, 2, 3, 9)

(1) Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing 100101, People’s Republic of China
(2) College of Astronomy and Space Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
(3) Institute for Frontiers in Astronomy and Astrophysics, Beijing Normal University, Beijing 102206, China
(4) State Key Laboratory of Isotope Geochemistry, Guangzhou Institute of Geochemistry, Chinese Academy of Sciences, Guangzhou 510640, China
(5) College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
(6) School of Foreign Studies, Tongling University, Tongling, Anhui, 244061, People’s Republic of China
(7) INAF – Osservatorio Astrofisico di Torino, Strada Osservatorio 20, I-10025 Pino Torinese, Italy
(8) Sydney Institute for Astronomy, School of Physics A28, The University of Sydney, Sydney, NSW 2006, Australia
(9) New Cornerstone Science Laboratory, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, People’s Republic of China

Abstract

Light curves are a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be processed effectively to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM) based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, and we examine the influence of observational cadence and phase distribution on classification precision. Using AutoDL optimization, we achieve strong performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, which reach accuracies of 94% and 99%, respectively; the latter attains 83% accuracy on the elusive Type II Cepheids, which make up only 0.02% of the total dataset. We introduce StarWhisper LightCurve (LC), a series of three LLM-based models: an LLM, a multimodal large language model (MLLM), and a Large Audio Language Model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. The StarWhisper LC series achieves accuracies of around 90%, significantly reducing the need for explicit feature engineering and paving the way for streamlined parallel data processing and for multifaceted multimodal models in astronomical applications. The study provides two detailed catalogs illustrating the impact of phase and sampling interval on deep-learning classification accuracy, showing that observation duration can be reduced by 14% and the number of sampling points by 21% on average while keeping the change in accuracy within 10%.

1 Introduction

The phenomenon of light variation is a crucial aspect of astrophysics and has long been studied in the field of time-domain astronomy. Cepheid variables, a special kind of variable star, play a critical role as standard candles: their period-luminosity relation allows us to measure the distances to clusters and galaxies. Other types of variable stars, including RR Lyrae, $\delta$ Sct, and $\gamma$ Dor stars, are characterized by distinctive absolute magnitudes and pulsation patterns, and exhibit diverse period-luminosity relations. The classification of these pulsating stars has significantly enriched our understanding of their formation and evolution. Moreover, it has shed light on the structure and dynamics of binary and multiple-star systems and of our Galaxy [prsaKEPLERECLIPSINGBINARY2011, slawsonKEPLERECLIPSINGBINARY2011, matijevicKEPLERECLIPSINGBINARY2012, conroyKEPLERECLIPSINGBINARY2014a, conroyKeplerEclipsingBinary2014b, lacourseKeplerEclipsingBinary2015, kirkKEPLERECLIPSINGBINARY2016, abdul-masihKEPLERECLIPSINGBINARY2016a].

Long-term observations are crucial for understanding the nature of variable stars. To carry out such observations, telescope time, a valuable resource, must be used efficiently [garcia-piquerEfficientSchedulingAstronomical2017]. One strategy uses simulations to create schedules that optimally allocate the available telescope time, accounting for overheads and other factors. Another employs autonomous agents to optimize the observation of time-varying phenomena [saundersOptimalObservingAstronomical]. In this study, we aim to quantify the importance of different phases of variable stars and the classification accuracy achievable at different cadences. This can help establish robust guidelines for future observing strategies [wangTransferLearningApplied2023].

Understanding the significance of phase and cadence for different variable stars requires a substantial amount of labeled data. In recent years, astronomy has entered the era of big data. For example, the Zwicky Transient Facility (ZTF) [bellmZwickyTransientFacility2014] at Palomar Observatory scans the sky every two days with a 1.2 m telescope, generating 1 TB of raw data each night [mahabalMachineLearningZwicky2019a]. The upcoming SiTian survey [liuSiTianProject2021] aims to monitor over 1,000 deg$^{2}$ of sky every 30 minutes using fifty 1 m class Schmidt telescopes, and is expected to produce around 140 TB of processed data each night.

As data volumes grow, there is an increasing need for efficient, automated interpretation and analysis methods. Deep learning, a subfield of machine learning, has emerged as a powerful tool for image and signal processing [lecunDeepLearning2015]. It can learn and extract features directly from data [krizhevskyImageNetClassificationDeep2012a, lecunDeepLearning2015], making it well suited to managing large, complex datasets. Specifically, recurrent neural networks (RNNs) have proven effective in processing time-series data [liptonCriticalReviewRecurrent2015], while convolutional neural networks (CNNs) have demonstrated their superiority in image processing tasks [krizhevskyImageNetClassificationDeep2012a, heDeepResidualLearning2015]. Moreover, the transformer architecture, with its attention mechanism [vaswaniAttentionAllYou2017], has shown remarkable performance across various applications [devlinBERTPretrainingDeep2019a, radfordImprovingLanguageUnderstanding, radfordLanguageModelsAre, brownLanguageModelsAre2020]. RNNs, originally developed by [elmanFindingStructureTime1990], led to Simple RNNs, which were later enhanced by LSTMs [hochreiterLongShortTermMemory1997] to address long-term dependencies; these were further simplified by GRUs [choPropertiesNeuralMachine2014] for computational efficiency. CNNs were initially introduced for handwritten digit recognition [lecunBackpropagationAppliedHandwritten1989] and have since evolved through architectures such as AlexNet, VGGNet, and ResNet. Recent advancements include EfficientNet [tanEfficientNetRethinkingModel2020, tanEfficientNetV2SmallerModels2021], which jointly optimizes depth, width, and resolution for model accuracy and efficiency.

Initially introduced for sequential data processing tasks [vaswaniAttentionAllYou2017], Transformers use self-attention mechanisms to model relationships within sequences, enabling them to capture long-range dependencies between sequence elements. Their success in natural language processing has sparked interest in applying them to other domains, such as computer vision and speech recognition. One example of a vision Transformer is the Swin Transformer [liuSwinTransformerHierarchical2021, liuSwinTransformerV22022], which optimizes the computation of attention and has shown promising results on various benchmark datasets.
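To make the self-attention step concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention, $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$. The sequence length, embedding size, and random projection matrices are placeholders for learned weights; the block illustrates the mechanism only and is not the architecture used in this study.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) sequence of embeddings (e.g. one per flux sample).
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    Returns the attended sequence (seq_len, d_k) and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # similarity of every pair of time steps
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 8, 16, 4
x = rng.normal(size=(seq_len, d_model))  # placeholder embeddings
# Random stand-ins for learned projection weights (hypothetical values).
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (8, 4) (8, 8)
```

Because every time step attends to every other, the cost grows quadratically with sequence length, which is one motivation for the windowed attention of the Swin Transformer.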

Beyond the traditional reliance on extensive data for transfer learning, it is important to investigate the potential of LLMs, which combine large training datasets with large parameter counts. StarWhisper (https://github.com/Yu-Yang-Li/StarWhisper) is an LLM for astronomy with strong astronomical knowledge and instruction-following ability; it can perform a range of functions such as knowledge question answering, calling multimodal tools, and interfacing with telescope control systems. The StarWhisper LC series was initiated to build on the experience gained from training the StarWhisper language model, and seeks to explore how vast training data can give rise to emergent abilities in the analysis of light-curve data. LLMs, such as the Gemini 7B model, have shown promise in adapting to new data types through fine-tuning with specific prompt templates [zhou2023fits13]. MLLMs, such as deepseek-vl-7b-chat, are adept at image classification tasks owing to their extensive training on datasets containing chart data [tsai2019multimodal_6]. LALMs, such as Qwen-audio, which are trained on audio datasets, exhibit strong performance in audio classification [yang2022voice2series_4].

In this study, we perform a comprehensive evaluation of deep-learning and LLM-based models for the classification of variable star light curves, using data from the Kepler and K2 missions. The types of variable stars and the data pre-processing are presented in Section 2. In Section 3, we describe the classification models, including the LSTM, the GRU, the Transformer, LightGBM, EfficientNet, the Swin Transformer, and the StarWhisper LC series. Training methods are introduced in Section 4. Section 5 presents model performance and the catalogs of phase importance and sampling intervals, and Section 6 provides a discussion.

2 Data

2.1 Kepler and K2

The Kepler spacecraft was launched in 2009 with the aim of discovering Earth-like planets [doi:10.1126/science.1185402]. It was equipped with a 95 cm aperture optical telescope with a field of view of 115.6 deg$^{2}$. This allowed Kepler to record precise light curves of about 200,000 targets, leading to the discovery of over 2,000 planets. The photometric precision was 100 ppm (parts per million) for stars with V-band magnitudes between 13 and 14 mag, and 10 ppm for stars between 9 and 10 mag. After four years, two of Kepler’s four reaction wheels had failed, ending its primary mission; this, however, marked the beginning of the K2 mission. K2 observed fields along the ecliptic plane, using the transit method to detect brightness changes, and produced catalogs with photometric precision closely matching that of the original Kepler mission. Its pointing was controlled using the remaining reaction wheels and thrusters, and each campaign was limited to about 80 days.

Two types of light curves are available: Presearch Data Conditioning (PDC) and Simple Aperture Photometry (SAP) light curves [Cleve2016KeplerIH]. SAP light curves retain long-term instrumental trends, whereas PDC light curves, generated by the Kepler Science Operations Center, have most instrumental systematics removed. We therefore adopt PDC light curves for our analysis. Data are available at two time resolutions: long-cadence data, with a 30-minute sampling interval, and short-cadence data, with a one-minute sampling interval. Because short-cadence data exist for relatively few targets, we focus on the long-cadence light curves. For each light curve, we remove the quarter-to-quarter differences and convert the flux into relative flux, following the methods used in [yangFlareCatalogFlare2019, hanStellarActivityCycles2021].
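A minimal sketch of the quarter-stitching step, assuming that removing the quarter-to-quarter differences means dividing each quarter's PDC flux by its own median and subtracting one. This is a common choice, but the exact procedure of [yangFlareCatalogFlare2019, hanStellarActivityCycles2021] may differ; the function name and toy arrays are illustrative.

```python
import numpy as np


def to_relative_flux(flux, quarter):
    """Median-normalize each Kepler quarter and return relative flux.

    flux: PDC flux values (NaNs allowed, e.g. for flagged cadences).
    quarter: quarter number of every cadence.
    Returns flux / median(quarter) - 1, so every quarter is centred on zero.
    (Median normalization is an assumed choice, not necessarily the paper's.)
    """
    flux = np.asarray(flux, dtype=float)
    quarter = np.asarray(quarter)
    rel = np.full_like(flux, np.nan)
    for q in np.unique(quarter):
        mask = quarter == q
        # The spacecraft rolls every quarter, moving each star to a different
        # detector, which changes the absolute flux level between quarters.
        med = np.nanmedian(flux[mask])
        rel[mask] = flux[mask] / med - 1.0
    return rel


signal = np.array([1.00, 1.02, 0.98, 1.00])
flux = np.concatenate([1000 * signal, 1500 * signal])  # same star, two flux levels
quarter = np.array([1] * 4 + [2] * 4)
rel = to_relative_flux(flux, quarter)
print(np.allclose(rel[:4], rel[4:]))  # True: the quarter offset is removed
```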

2.2 Variable Stars

Our training sample is similar to that of [wangTransferLearningApplied2023], including eclipsing binaries, RR Lyrae, $\delta$ Scuti, $\gamma$ Dor, and $\delta$ Scuti/$\gamma$ Dor hybrid stars. Type II Cepheids are also included to make the sample more comprehensive. Table 1 lists our training sample and the corresponding references. The sample is highly imbalanced across variable types. Such imbalance often degrades performance, especially in few-shot or small-sample scenarios [kokolMachineLearningSmall2022, clemenconStatisticalLearningBiased2019]. However, [taniguchiMachineLearningModel2018] suggest that certain models may help overcome these challenges, although such cases seem to be the exception rather than the rule. This imbalanced sample offers an opportunity to study how strongly each algorithm depends on class balance and how this dependence affects overall performance and accuracy on astronomical data.

2.3 Pre-processing

We adopted the pre-processing method described by [wangTransferLearningApplied2023] to clean the light curves, enhance their features, and expand the training data. The light curves were segmented into 10-day intervals. Segments containing gaps longer than one day were removed, while shorter gaps were filled by interpolating onto a uniform 30-minute time grid.
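The segmentation rules above can be sketched as follows. Only the 10-day segment length, the 1-day gap threshold, and the 30-minute grid come from the text; using consecutive non-overlapping windows, treating gaps at the segment edges like internal gaps, and linear interpolation are assumptions of this sketch.

```python
import numpy as np

SEGMENT_DAYS = 10.0        # segment length (from the text)
MAX_GAP_DAYS = 1.0         # segments with a longer gap are discarded (from the text)
CADENCE_DAYS = 30 / 1440   # 30-minute interpolation grid (from the text)


def segment_light_curve(time, flux):
    """Cut a light curve into consecutive 10-day segments on a 30-minute grid.

    A segment is dropped if any gap, including the gaps to the segment edges
    (an assumption), exceeds MAX_GAP_DAYS; otherwise it is linearly
    interpolated onto the grid.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    good = np.isfinite(time) & np.isfinite(flux)
    time, flux = time[good], flux[good]
    n_grid = int(round(SEGMENT_DAYS / CADENCE_DAYS))  # 480 samples per segment
    segments = []
    start = time[0]
    while start + SEGMENT_DAYS <= time[-1]:
        stop = start + SEGMENT_DAYS
        inside = (time >= start) & (time < stop)
        t_seg, f_seg = time[inside], flux[inside]
        if t_seg.size >= 2:
            gaps = np.diff(np.concatenate(([start], t_seg, [stop])))
            if gaps.max() <= MAX_GAP_DAYS:
                grid = start + CADENCE_DAYS * np.arange(n_grid)
                segments.append(np.interp(grid, t_seg, f_seg))
        start = stop
    return segments


# Example: 35 days of 30-minute data with a 2-day gap between days 12 and 14.
t = np.arange(0.0, 35.0, CADENCE_DAYS)
t = t[(t < 12.0) | (t >= 14.0)]
f = np.sin(2 * np.pi * t / 0.5)
segments = segment_light_curve(t, f)
print(len(segments), segments[0].shape)  # 2 (480,)
```

The window spanning days 10 to 20 is discarded because of the 2-day gap, and the final partial window is not used.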

Table 1: Training Sample

Label	Input Catalog	Final Sample	References
$\delta$ Sct	1389	111528	(1), (2), (3), (4), (14), (15)
EB	2908	226937	(6), (7), (8), (9), (10), (11), (12), (13)
$\gamma$ Dor	941	65786	(14),(15)
HYB	1552	33751	(1), (2), (3), (4), (14), (15)
RR	482	9306	(5)
T2CEP	3	94	(1), (2), (3), (4), (5)
Total	7275	447402	(1)-(15)

Note

The Input Catalog column gives the number of stars drawn from the catalogs listed below. The Final Sample column gives the number of samples remaining after all pre-processing steps. References: (1)[kholopovCombinedGeneralCatalogue1998], (2)[durlevichListErrorsGCVS1994], (3)[artyukhinaVizieROnlineData1996], (4)[samus84thNameListVariable2021], (5)[molnarGaiaDataRelease2018], (6)[prsaKEPLERECLIPSINGBINARY2011], (7)[slawsonKEPLERECLIPSINGBINARY2011], (8)[matijevicKEPLERECLIPSINGBINARY2012], (9)[conroyKEPLERECLIPSINGBINARY2014a], (10)[conroyKeplerEclipsingBinary2014b], (11)[lacourseKeplerEclipsingBinary2015], (12)[kirkKEPLERECLIPSINGBINARY2016], (13)[abdul-masihKEPLERECLIPSINGBINARY2016], (14)[bradleyRESULTSSEARCHDOR2015], (15)[balonaGaiaLuminositiesPulsating2018].

Table 2: Period and Observation Time Saving

Star	Period (d)	$\Delta_{phase}$ (%)	$t_{phase}$ (%)	$\Delta_{sampling}$ (%)	$t_{sampling}$ (%)
RR_251457011	0.509	1.06	41	8.63	50
RR_251457012	0.529	3.48	71	0	0
RR_251457013	0.542	2.47	31	1.83	50
RR_251457014	0.593	9.99	29	9.32	50
RR_251457015	0.574	0.53	55	7.82	50
RR_251457016	0.510	1.30	83	1.38	50
RR_251457020	0.478	0	0	0.76	50
RR_251457021	0.546	0.70	74	4.48	80
RR_251457022	0.508	1.56	49	0	0
RR_251457024	0.553	0	0	0	0

Note

The table lists the period of each star (in days), together with the change in classification accuracy ($\Delta$) and the percentage of observation time saved ($t$) in the phase-importance and sampling experiments.

An alternative approach to time-series classification is to classify images generated from the light curves, using transfer learning. As highlighted in the results of [wangTransferLearningApplied2023], the continuous wavelet transform (CWT) gives superior results for converting light curves into images.

Figure 1: CWT images of different objects.

We chose the Morlet wavelet as the mother wavelet for the CWT because of its proven effectiveness in analyzing signals with amplitude variations. The resulting images capture the time-frequency characteristics of the signals and facilitate the identification of patterns, trends, and anomalies. Figure 1 shows examples of variable stars from the training sample.
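As an illustration of how such an image can be produced, the NumPy sketch below computes the modulus of a CWT with a complex Morlet wavelet by direct convolution. The central frequency $\omega_0 = 6$, the scale range, and the output size are assumptions, since the text does not specify them; libraries such as PyWavelets provide equivalent transforms.

```python
import numpy as np


def morlet_cwt(signal, scales, w0=6.0):
    """|CWT| of an evenly sampled signal using a complex Morlet mother wavelet.

    signal: 1-D array (e.g. a 10-day segment on a 30-minute grid).
    scales: wavelet scales, in units of samples (assumed range).
    w0: non-dimensional central frequency of the Morlet wavelet (assumed 6).
    Returns an (n_scales, n_samples) array that can be rendered as an image.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = x.size
    image = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))  # +/- 4 widths of the Gaussian envelope
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t - 0.5 * t**2) / np.sqrt(s)
        # Correlating with psi equals convolving with its conjugated time reverse;
        # slicing the full convolution keeps the n outputs aligned with the input.
        full = np.convolve(x, np.conj(psi[::-1]), mode="full")
        image[i] = np.abs(full[half:half + n])
    return image


# A sinusoid with a 24-sample period (12 h at 30-minute cadence).
n_samples = 480
signal = np.sin(2 * np.pi * np.arange(n_samples) / 24)
scales = np.arange(4, 60)
image = morlet_cwt(signal, scales)
power = image[:, 150:330].mean(axis=1)  # central columns, away from edge effects
print(image.shape, scales[np.argmax(power)])
```

For $\omega_0 = 6$ the strongest response to a sinusoid should appear near a scale of roughly the period (in samples) divided by 1.03, which makes the printed scale a quick sanity check.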

2.4 Lomb-Scargle Periodogram

We employ the Lomb-Scargle algorithm to extract the most significant periods from the complete light curves, which are then used to study the phase importance of periodic variables. The Lomb-Scargle algorithm, a variant of the Discrete Fourier Transform (DFT) developed by [1976Ap&SS..39..447L] and [scargleStudiesAstronomicalTime1982], is designed specifically for unevenly sampled time series. It fits the time series with sinusoidal components, converting it from the time domain to the frequency domain. We calculate the periods using the Lightkurve Collaboration’s Lomb-Scargle periodogram (LSP) method [vanderplasUnderstandingLombScarglePeriodogram2018]. Some examples are shown in Figure 2.

Specifically, we evaluate the periodogram on a uniformly spaced grid of frequencies within a chosen frequency range. At each frequency we compute the power spectral density (PSD); the resulting periodogram shows the strength of periodic signals across frequencies, and significant peaks indicate strong periodicities. The periods are obtained as the reciprocals of the peak frequencies. The normalization and the density of the frequency grid were chosen so that the periodogram computation is both precise and efficient. The results are shown in Table 2.
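The periodogram steps just described (a uniform frequency grid, the power at each frequency, and the period from the reciprocal of the highest peak) can be sketched in NumPy with the classical Lomb-Scargle formula. This is an independent illustration, not the Lightkurve implementation used in this work; the frequency limits (24 d$^{-1}$ is the Nyquist frequency of 30-minute sampling), the grid density, and the variance normalization are assumptions.

```python
import numpy as np


def lomb_scargle(t, y, freqs):
    """Classical Lomb-Scargle power at each trial frequency (cycles per day)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # The offset tau makes the sine and cosine terms orthogonal for uneven sampling.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power / y.var()  # normalize by the sample variance (one common convention)


def best_period(t, y, f_min=0.05, f_max=24.0, n_freq=10000):
    """Period (days) of the highest peak on a uniform frequency grid (assumed limits)."""
    freqs = np.linspace(f_min, f_max, n_freq)
    return 1.0 / freqs[np.argmax(lomb_scargle(t, y, freqs))]


rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 30.0, 800))  # uneven sampling over 30 d
y = np.sin(2 * np.pi * t / 0.55) + 0.1 * rng.normal(size=t.size)
period = best_period(t, y)
print(f"recovered period: {period:.4f} d")
```

In practice, `astropy.timeseries.LombScargle` offers a faster and more complete implementation of the same periodogram.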

3 Model Construction

In order to classify variable stars from their light curves, we have explored various deep learning models. The basic architectures of these models are shown in Figure 3.

3.1 CNN and RNN

We explore several LSTM-based architectures. The Conv1D + BiLSTM model combines one-dimensional convolutional layers with bidirectional LSTM layers: the convolutional layers extract local features from the data [kimConvolutionalNeuralNetworks2014], while the bidirectional LSTM layers capture contextual information from both past and future time steps [schusterBidirectionalRecurrentNeural1997]. The Conv1D + BiLSTM + Attention model adds an attention mechanism that lets the model focus on specific parts of the input, assigning higher weights to more relevant parts [vaswaniAttentionAllYou2017]. This can improve classification accuracy and facilitates further research on the importance of different phases for classification [salinasDistinguishingPlanetaryTransit2023].
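The text does not specify the form of the attention layer. One common choice is additive (Bahdanau-style) attention pooling over the BiLSTM outputs, sketched below in NumPy with random placeholder parameters; the per-time-step weights `alpha` are the quantities that could be mapped back to phases when studying phase importance.

```python
import numpy as np


def attention_pool(h, w, b, v):
    """Additive attention pooling over recurrent hidden states.

    h: (seq_len, hidden) outputs of a recurrent layer, e.g. concatenated
       forward/backward BiLSTM states.
    w: (hidden, att_dim), b: (att_dim,), v: (att_dim,) learned parameters.
    Returns a (hidden,) summary vector and the (seq_len,) attention weights.
    """
    scores = np.tanh(h @ w + b) @ v  # one relevance score per time step
    scores = scores - scores.max()   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha          # weighted average of the states


rng = np.random.default_rng(0)
seq_len, hidden, att_dim = 480, 64, 32  # e.g. a 10-day segment, 2 x 32 BiLSTM units
h = rng.normal(size=(seq_len, hidden))  # placeholder for real BiLSTM outputs
w = rng.normal(scale=0.1, size=(hidden, att_dim))
b = np.zeros(att_dim)
v = rng.normal(scale=0.1, size=att_dim)
context, alpha = attention_pool(h, w, b, v)
print(context.shape, int(np.argmax(alpha)))  # summary size, most-attended time step
```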

GRU models are a variation of LSTM that are computationally more efficient as they have fewer parameters, making them easier to train and less prone to overfitting [choPropertiesNeuralMachine2014, chungEmpiricalEvaluationGated2014]. Specifically, we use the Conv1D + BiGRU architecture.
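The parameter saving can be made concrete with the textbook counts: an LSTM layer has four gated transforms and a GRU three, each with input weights, recurrent weights, and one bias vector. The helper functions below are illustrative; frameworks differ slightly in bias handling (PyTorch, for example, keeps two bias vectors per gate), so exact counts vary, but in this textbook form the GRU needs three quarters of the LSTM's parameters.

```python
def lstm_params(input_dim, hidden):
    # Input, forget, cell-candidate and output transforms: each has input
    # weights, recurrent weights and one bias vector (textbook form).
    return 4 * (hidden * (input_dim + hidden) + hidden)


def gru_params(input_dim, hidden):
    # Reset, update and new-state transforms with the same structure.
    return 3 * (hidden * (input_dim + hidden) + hidden)


# One input feature (relative flux) per time step; a bidirectional layer doubles both.
for h in (32, 64, 128):
    print(h, lstm_params(1, h), gru_params(1, h))
# 32 4352 3264
# 64 16896 12672
# 128 66560 49920
```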

In addition to the deep learning approaches, it is also important to consider the performance of classic machine learning methods for comparison. One such method is LightGBM, a gradient-boosting decision tree (GBDT) framework that uses tree-based learning algorithms [keLightGBMHighlyEfficienta].

3.2 Transformer

Transformers are highly effective at modeling dependencies in sequential data, a critical aspect of time-series analysis, and have achieved strong results in long-sequence modeling [kitaevReformerEfficientTransformer2019, liuPYRAFORMERLOWCOMPLEXITYPYRAMIDAL2022]. Their multi-head attention mechanism is adept at identifying the most relevant parts of the input [vaswaniAttentionAllYou2017]. By combining a one-dimensional convolutional layer with a transformer encoder layer, we can capture both local interactions and global dependencies in the data, addressing the transformer architecture’s limited sensitivity to local details.

We adopted the Swin Transformer for few-shot classification tasks (i.e., the T2CEP class). Transformers excel at modeling long-range dependencies in computer vision tasks [dosovitskiyImageWorth16x162021a, wangPyramidVisionTransformer2021] by attending directly to the essential parts of an image [vaswaniAttentionAllYou2017]. The Swin Transformer goes a step further: it reduces computation by restricting attention to small local windows while maintaining effectiveness [liuSwinTransformerHierarchical2021, liuSwinTransformerV22022].
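The window mechanism can be illustrated with a NumPy sketch of the partitioning step, together with the cost expressions given in [liuSwinTransformerHierarchical2021] for global and window-based multi-head self-attention on an $h \times w$ map with $C$ channels and window size $M$: $4hwC^{2} + 2(hw)^{2}C$ and $4hwC^{2} + 2M^{2}hwC$. The shifted windows used between consecutive Swin blocks are omitted here.

```python
import numpy as np


def window_partition(x, m):
    """Split an (H, W, C) feature map into non-overlapping m x m windows.

    Returns an array of shape (num_windows, m * m, C); self-attention is then
    computed independently within each window.
    """
    h, w, c = x.shape
    if h % m or w % m:
        raise ValueError("H and W must be multiples of the window size m")
    x = x.reshape(h // m, m, w // m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (window row, window col, m, m, C)
    return x.reshape(-1, m * m, c)


def attention_flops(h, w, c, m=None):
    """Cost of global MSA (m=None) or window MSA with window size m."""
    if m is None:
        return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


x = np.arange(4 * 4).reshape(4, 4, 1)  # toy 4x4 map with one channel
print(window_partition(x, 2)[0, :, 0])  # [0 1 4 5]: the top-left window

# Stage 1 of Swin-T on a 224x224 image: 56x56 tokens, C = 96, M = 7.
ratio = attention_flops(56, 56, 96) / attention_flops(56, 56, 96, m=7)
print(f"global / window attention cost: {ratio:.1f}x")
```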

Table 3: Accuracy and Macro F1-Score

Input Format	Model	Accuracy (All/T2CEP)	Macro F1-Score
Time-Series	Conv1D+Transformer	85%/17%	0.7
	LightGBM	87%/25%	0.71
	BiLSTM+Attention	93%	0.76
	Conv1D+BiGRU	93%	0.76
	Conv1D+BiLSTM	94%	0.77
CWT Image	EfficientNet	99%	0.82
	Swin Transformer	99%/83%	0.96
Textual Time-Series	LLM-based model	89%	0.80
Lightcurve Image	MLLM-based model	95%	0.94
Transformed Audio	LALM-based model	93%	0.71

Note

The table presents the accuracy and macro F1-score for various models, including the accuracy of some models specifically on T2CEP.

3.3 EfficientNet

For a more thorough comparison, we also adopt a pretrained EfficientNet, a more recent CNN architecture. Unlike its predecessors, which optimized only one or two of network depth, width, and input resolution, EfficientNet balances all three within its design [tanEfficientNetRethinkingModel2020, tanEfficientNetV2SmallerModels2021]. It uses a range of architectural techniques, including depthwise separable convolutions, squeeze-and-excitation blocks, and dynamic image scaling, which have been shown to boost performance while reducing computational requirements.
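EfficientNet's compound scaling sets depth, width, and resolution multipliers to $\alpha^{\phi}$, $\beta^{\phi}$, and $\gamma^{\phi}$ with $\alpha\beta^{2}\gamma^{2} \approx 2$, so FLOPs grow roughly as $2^{\phi}$. The sketch below uses the constants reported in [tanEfficientNetRethinkingModel2020] ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$). The released models round these multipliers, so the printed values are only indicative, and the text does not state which EfficientNet variant was used here.

```python
# Constants reported for EfficientNet-B0 (Tan & Le); alpha * beta**2 * gamma**2 ~ 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scaling(phi):
    """Depth, width and resolution multipliers for compound coefficient phi."""
    depth = ALPHA**phi
    width = BETA**phi
    resolution = GAMMA**phi
    # FLOPs scale linearly with depth and quadratically with width and resolution.
    flops = depth * width**2 * resolution**2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```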

3.4 StarWhisper LC Series

3.4.1 LLM

Owing to their emergent capabilities, LLMs can learn new languages (broadly defined), such as the Lean 3 theorem-proving language, through few-shot learning or fine-tuning [ying2024internlmmath]. The Gemini 7B model (storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf), pre-trained on a vast text corpus, exhibits emergent behavior that enables it to perform tasks not explicitly covered during its initial training. We therefore use a specific prompt template (Table 4) to transform time-series data into a form of language in a broader sense, so that these models can be leveraged more effectively.

3.4.2 MLLM

Next, we introduce an MLLM built on deepseek-vl-7b-chat [lu2024deepseekvl]. MLLMs extend the capabilities of LLMs with the ability to process and interpret visual information alongside text. Because deepseek-vl-7b-chat has been trained extensively on chart data, it is particularly well suited to time-series image classification, making it a valuable addition to the StarWhisper LC series.

3.4.3 LALM

The final model in the series is based on audio processing and is built on Qwen-audio [chu2023qwenaudio]. This model has been specifically enhanced for audio classification tasks, offering a novel approach to time-series analysis: the normalized time-series data are converted into audio signals at a sampling frequency of 500 Hz, and the resulting signals possess distinct characteristics that aid classification.
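A minimal sketch of the time-series-to-audio step using the Python standard library and NumPy. The 500 Hz sampling frequency comes from the text; mapping one flux value to one audio sample, peak-normalizing to $[-1, 1]$, and writing 16-bit mono PCM are assumptions. With this mapping, a 480-point segment becomes a clip of 0.96 s.

```python
import io
import wave

import numpy as np

SAMPLE_RATE_HZ = 500  # sampling frequency given in the text


def flux_to_wav_bytes(flux):
    """Encode a flux series as an in-memory mono 16-bit PCM WAV file.

    One flux value becomes one audio sample (assumed), so N points last
    N / 500 s. Peak normalization to [-1, 1] is also an assumption.
    """
    flux = np.asarray(flux, dtype=float)
    centred = flux - flux.mean()
    peak = np.max(np.abs(centred))
    scaled = centred / peak if peak > 0 else centred
    pcm = np.round(scaled * 32767).astype("<i2")  # little-endian int16 samples
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()


flux = np.sin(2 * np.pi * np.arange(480) / 24)  # e.g. one 10-day, 30-minute segment
with wave.open(io.BytesIO(flux_to_wav_bytes(flux)), "rb") as wav:
    print(wav.getframerate(), wav.getnframes(), wav.getnframes() / wav.getframerate())
# 500 480 0.96
```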

Table 4: Prompt and Output Template

Instruction

Given the normalized flux data at 0.02 day intervals (delimited by the triple backticks), classify the TYPE of light curve. Return the answer in an Assignment Statement, containing ONE variable: TYPE. Only return the classification, not the Python code.

‘‘‘{flux}‘‘‘

Output

TYPE = 'Label'

Note

For other LLM-based models, replace “Python code” with “description,” and “flux” with “image path or audio path.”
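The template in Table 4 can be filled programmatically, as in the sketch below. Writing the flux as comma-separated values with five decimals (matching the precision of one part in a hundred thousand mentioned in Section 5.1) is an assumption, as is the regular expression used to read the `TYPE = 'Label'` answer back; both helper names are hypothetical.

```python
import re

import numpy as np

FENCE = "`" * 3  # the triple backticks that delimit the data in the prompt

TEMPLATE = (
    "Given the normalized flux data at 0.02 day intervals (delimited by the triple "
    "backticks), classify the TYPE of light curve. Return the answer in an Assignment "
    "Statement, containing ONE variable: TYPE. Only return the classification, not the "
    "Python code.\n\n" + FENCE + "{flux}" + FENCE
)


def build_prompt(flux, decimals=5):
    """Fill the template with comma-separated flux values rounded to `decimals` (assumed format)."""
    values = ", ".join(f"{v:.{decimals}f}" for v in np.asarray(flux, dtype=float))
    return TEMPLATE.format(flux=values)


def parse_label(reply):
    """Extract 'Label' from a reply such as TYPE = 'Label'; None if absent."""
    match = re.search(r"TYPE\s*=\s*['\"]([^'\"]+)['\"]", reply)
    return match.group(1) if match else None


print(build_prompt([0.123456, -0.5]).splitlines()[-1])
print(parse_label("TYPE = 'RR'"))
```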

4 Training

4.1 Auto deep learning methods

Auto deep learning methods are a set of techniques that aim to automate the process of designing, training, and evaluating neural network models. These methods typically use techniques such as Bayesian optimization [snoekPracticalBayesianOptimization2012], reinforcement learning [mnihHumanlevelControlDeep2015], or evolutionary algorithms [realLargeScaleEvolutionImage2017] to optimize the model architecture and hyperparameters for the best possible performance.

Bayesian optimization is a powerful approach to hyperparameter tuning and exploration [bergstraAlgorithmsHyperParameterOptimization2011]. It uses Gaussian Process Regression (GPR) or Tree-structured Parzen Estimators (TPE) to build a probabilistic model of the objective from the history of previous evaluations, and employs the Sequential Model-Based Optimization (SMBO) framework to select hyperparameters iteratively.

SMBO is an iterative hyperparameter optimization method consisting of four main steps [10.1007/978-3-642-25566-3_40]. First, a probabilistic surrogate model is built from the existing tuning history. Next, an acquisition function, such as Expected Improvement (EI), is used to select the next hyperparameter configuration. The selected configuration is then evaluated, and the new observation is added to the tuning history. The process repeats until a predefined maximum number of iterations is reached.
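The four SMBO steps can be sketched in NumPy with a Gaussian-process surrogate and the Expected Improvement acquisition function. This is a toy illustration on a hypothetical one-dimensional objective (a made-up validation loss versus log10 learning rate) with an assumed RBF kernel length scale and noise level; it is not the optimizer configuration used in this study, for which the text cites Optuna [akibaOptunaNextgenerationHyperparameter2019].

```python
import math

import numpy as np

erf = np.vectorize(math.erf)


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


def gp_posterior(x_obs, y_obs, x_new, length=1.0, noise=1e-4):
    """Step 1: GP surrogate; posterior mean and std at the candidate points."""
    mu_y, sd_y = y_obs.mean(), y_obs.std() + 1e-12
    y = (y_obs - mu_y) / sd_y  # standardize targets
    k = rbf(x_obs, x_obs, length) + noise * np.eye(len(x_obs))
    k_s = rbf(x_obs, x_new, length)
    mean = k_s.T @ np.linalg.solve(k, y)
    var = 1.0 - np.sum(k_s * np.linalg.solve(k, k_s), axis=0)
    std = np.sqrt(np.clip(var, 1e-12, None))
    return mean * sd_y + mu_y, std * sd_y


def expected_improvement(mean, std, best, xi=0.01):
    """Step 2: EI acquisition for minimization."""
    gain = best - mean - xi
    z = gain / std
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return gain * cdf + std * pdf


def smbo(objective, candidates, x_init, n_iter=20):
    """Run the SMBO loop over a fixed grid of candidate values."""
    x_obs = np.array(x_init, dtype=float)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):  # step 4: stop after a fixed budget
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.min())
        x_next = candidates[np.argmax(ei)]
        x_obs = np.append(x_obs, x_next)  # step 3: evaluate and update history
        y_obs = np.append(y_obs, objective(x_next))
    i_best = np.argmin(y_obs)
    return x_obs[i_best], y_obs[i_best]


def toy_loss(log_lr):
    # Hypothetical validation loss with its minimum at log10(lr) = -3.
    return (log_lr + 3.0) ** 2 + 0.1


candidates = np.linspace(-6.0, 0.0, 121)
best_x, best_y = smbo(toy_loss, candidates, x_init=[-5.5, -1.0], n_iter=20)
print(f"best log10(lr) = {best_x:.2f}, loss = {best_y:.3f}")
```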

In this study, we used auto deep learning to automate the training and evaluation of the deep learning models. Specifically, 80% of the data was allocated for training and the remaining 20% was used as a validation set. Bayesian optimization allows us to efficiently find the best set of hyperparameters [akibaOptunaNextgenerationHyperparameter2019].
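The text does not say whether the 80/20 split was stratified. Given the strong class imbalance in Table 1, one reasonable way to implement it is a per-class split, sketched below with a hypothetical helper; it guarantees that rare classes such as T2CEP appear in both subsets.

```python
import numpy as np


def stratified_split(labels, train_frac=0.8, seed=0):
    """Per-class shuffled split into training and validation indices.

    Each class is split separately, so rare classes such as T2CEP are
    represented in both subsets in roughly the same proportion.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_frac * idx.size))
        train_idx.append(idx[:n_train])
        val_idx.append(idx[n_train:])
    return np.concatenate(train_idx), np.concatenate(val_idx)


labels = np.array(["EB"] * 900 + ["RR"] * 90 + ["T2CEP"] * 10)  # toy labels
train_idx, val_idx = stratified_split(labels)
for cls in ("EB", "RR", "T2CEP"):
    print(cls, np.sum(labels[train_idx] == cls), np.sum(labels[val_idx] == cls))
# EB 720 180
# RR 72 18
# T2CEP 8 2
```

Because many segments are cut from the same star, splitting by star rather than by segment would be one possible refinement to keep near-duplicate segments out of both subsets.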

4.2 LLM-based models training

For the LLM, the light-curve data were trimmed to fit within the model’s context length and normalized. Fine-tuning covers all layers of the model using QLoRA [dettmers2023qlora], a highly efficient fine-tuning method that achieves near full fine-tuning performance with significantly reduced memory requirements.
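QLoRA combines low-rank adapters (LoRA) with a 4-bit quantized, frozen base model. The NumPy sketch below shows only the LoRA part for a single linear layer (no quantization and no training loop); the rank $r = 8$ and scaling $\alpha = 16$ are illustrative values, not those used for StarWhisper LC.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA).

    y = x W^T + (alpha / r) * x A^T B^T. Only A (r x d_in) and B (d_out x r)
    would be trained; B starts at zero, so the adapted layer initially
    reproduces the frozen layer exactly.
    """

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                             # frozen (d_out, d_in)
        self.a = rng.normal(scale=0.01, size=(r, d_in))  # trainable
        self.b = np.zeros((d_out, r))                    # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(1)
w_frozen = rng.normal(size=(1024, 1024))
layer = LoRALinear(w_frozen, r=8, alpha=16)
x = rng.normal(size=(2, 1024))
print(np.allclose(layer(x), x @ w_frozen.T))   # True: the update starts at zero
print(layer.trainable_params(), w_frozen.size)  # 16384 1048576
```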

The MLLM’s training involved a comprehensive fine-tuning process that adjusted both the textual and visual processing components of the model. This approach is necessary to effectively handle the multimodal nature of the task, leveraging the model’s pre-trained knowledge of visual patterns for time series image classification.

For the LALM, the success of the training phase hinges on the conversion of time-series data into audio signals. This approach allows the model to classify light curves transformed into audio signals, demonstrating the potential of audio-based analysis in scientific research.

5 Results

5.1 Performance

The confusion matrices (Appendix) serve as the basis for computing several performance metrics. The $\rm F_{1}$ score, which balances precision $P$ and recall $R$, is their harmonic mean, $\rm F_{1} = 2PR/(P+R)$. To evaluate overall performance, we calculate the macro-$\rm F_{1}$ score [DBLP:journals/corr/abs-1911-03347], which averages the $\rm F_{1}$ scores across all classes with equal weight, making it a vital measure when working with imbalanced data.
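The metric can be computed directly from a confusion matrix, as in the NumPy sketch below (a hypothetical helper, not the evaluation code of this study). The toy two-class matrix shows how a minority class with poor recall pulls macro-F1 well below overall accuracy.

```python
import numpy as np


def macro_f1(confusion):
    """Macro-averaged F1 from a confusion matrix (rows: true, columns: predicted).

    Returns the macro-F1 and the per-class F1 scores; classes with no
    predictions or no true samples get F1 = 0 instead of a division error.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    pred_totals, true_totals = cm.sum(axis=0), cm.sum(axis=1)
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean(), f1


cm = np.array([[90, 10],
               [5, 5]])  # toy matrix: a majority and a minority class
accuracy = np.trace(cm) / cm.sum()
m, per_class = macro_f1(cm)
print(f"accuracy {accuracy:.3f}, macro-F1 {m:.3f}")  # accuracy 0.864, macro-F1 0.662
```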

Both transfer-learning models (EfficientNet and the Swin Transformer) achieved an accuracy of 99%. The Swin Transformer achieved the highest macro-$\rm F_{1}$ score, 0.96, demonstrating high effectiveness even for classes with small sample sizes: it achieved 83% accuracy on T2CEP, a class comprising only 94 of the 447,402 samples. Among the non-pretrained models, the RNN-based models combined efficiency with high accuracy. The Conv1D+BiLSTM model performed best, with an accuracy of 94% and a macro-F1 score of 0.77, approaching the 0.82 of the pretrained EfficientNet. This indicates its ability to handle imbalanced data.

The good performance of transfer learning can be attributed to the CWT preprocessing, which simplifies initial feature extraction, and to the models’ pretraining on large numbers of images, which strengthens their feature extraction capability. However, the HYB category underperforms the other categories in classification accuracy. This may be due to the intrinsic complexity of the HYB category, whose features overlap with those of other classes and complicate the classification process [10.1093/mnras/sty1511, wangTransferLearningApplied2023].

The StarWhisper LC series also yielded impressive results in light-curve classification, showcasing the capabilities of LLM-based models in scientific data analysis. The LLM attained an accuracy of approximately 89%, despite the significant data reduction from trimming the time series to 0.2 d input samples and reducing the precision of the normalized data to one part in a hundred thousand. This demonstrates the model’s ability to process condensed time-series data without substantial loss of information. The MLLM, which did not use CWT, achieved a remarkable 95% accuracy, underscoring its inherent strength in handling multimodal data, including chart data akin to time-series image classification. It also performs well on small classes such as T2CEP, with an $\rm F_{1}$ score of 0.94, further supporting the suitability of image-based models for classes with few samples. The LALM’s approach of converting time-series data into audio signals for classification achieved an accuracy of 93%, highlighting the potential of audio-based analysis in scientific research. Together, these results emphasize the effectiveness and versatility of the StarWhisper LC series in leveraging transfer learning and LLM-based models for high-accuracy classification tasks in astrophysics.

5.2 Catalogs

5.2.1 Phase Importance

Using the Conv1D + BiLSTM model, we assess the impact on accuracy of obscuring the observation points in a specific phase interval, with zero phase set at the peak of the light curve. The resulting decrease in the model’s confidence in the correct class measures the importance of each phase. Table 5 lists the importance in phase intervals of 0.1. Figures 5, 6, and 7 show that the key features for GDOR, DSCT, and HYB are located primarily in the phase interval following the peak flux. For EB and RR (Figures 8 and 9), the important intervals are concentrated where the flux returns to its peak. According to the heatmaps, the significant (dark) intervals occur predominantly in the first half of the EB and RR phase curves, whereas DSCT and HYB each show two distinct, localized important phase intervals. We discuss the associated observation time savings and present a catalog in Section 6.
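The occlusion procedure can be sketched as below. The period, the reference epoch for phase zero, and the model are inputs; `predict_proba` is a hypothetical callable standing in for the trained Conv1D + BiLSTM's confidence in the true class, and filling occluded points with zero relative flux is an assumption, since the text does not say how the points were obscured.

```python
import numpy as np


def phase_importance(time, flux, period, t0, predict_proba, n_bins=10, fill=0.0):
    """Occlusion importance of each phase bin.

    time, flux: light-curve samples (relative flux).
    period, t0: period and the epoch assigned to phase zero (the text places
        phase zero at the light-curve peak).
    predict_proba: hypothetical callable returning the model's confidence in
        the true class for a flux array.
    fill: value written over occluded samples (assumed; 0 is the relative-flux baseline).
    Returns the confidence drop obtained by occluding each of the n_bins bins.
    """
    phase = np.mod((np.asarray(time, dtype=float) - t0) / period, 1.0)
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    baseline = predict_proba(np.asarray(flux, dtype=float))
    drops = np.empty(n_bins)
    for k in range(n_bins):
        occluded = np.array(flux, dtype=float, copy=True)
        occluded[bins == k] = fill
        drops[k] = baseline - predict_proba(occluded)
    return drops


# Toy check: a feature confined to phases 0.3-0.4 and a stand-in "model" whose
# confidence is simply the mean flux (hypothetical; a trained classifier goes here).
time = np.linspace(0.0, 10.0, 2000, endpoint=False)
phase = np.mod(time / 0.5, 1.0)
flux = np.where((phase >= 0.3) & (phase < 0.4), 1.0, 0.0)
drops = phase_importance(time, flux, period=0.5, t0=0.0,
                         predict_proba=lambda f: float(np.mean(f)))
print(np.round(drops, 3))
```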

5.2.2 Sampling

The observation schedule is strongly influenced by the sampling rate, which is itself limited by the telescope’s clear aperture, the CCD read-out speed, and the scientific objectives of the survey. We calculated the relation between sampling and classification accuracy for variables with a single dominant frequency, adjusting the sampling rate by changing the number of points per period. The effect on classification accuracy is shown in Figure 10.
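One way to emulate a reduced number of points per period while keeping the network's fixed input length is to sample the light curve sparsely and interpolate back onto the original time stamps, as sketched below. The text does not describe in detail how the sampling rate was varied, so the linear interpolation and the function name are assumptions.

```python
import numpy as np


def degrade_sampling(time, flux, period, points_per_period):
    """Emulate a sparser cadence while keeping the model's input length fixed.

    The light curve is sampled at `points_per_period` evenly spaced points per
    period (by linear interpolation) and then interpolated back onto the
    original time stamps, so the degraded series can be fed to the same network.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    step = period / points_per_period
    sparse_t = np.arange(time[0], time[-1] + step, step)
    sparse_f = np.interp(sparse_t, time, flux)
    return np.interp(time, sparse_t, sparse_f)


time = np.arange(0.0, 10.0, 1 / 48)   # 30-minute cadence over 10 days
flux = np.sin(2 * np.pi * time / 0.5)  # 0.5 d period = 24 points per period
for ppp in (24, 12, 6, 4):
    err = np.max(np.abs(degrade_sampling(time, flux, 0.5, ppp) - flux))
    print(ppp, round(float(err), 3))
```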

We found that for EB, accuracy remains above 75% even when the number of sampling points is reduced to one-tenth. For RR and DSCT stars, halving the sampling rate still maintains an accuracy of around 80%. This is consistent with the phase-importance results: except for EB, the key features are located in only a few specific phases. Beyond a certain sampling interval, the model's accuracy probably drops significantly, likely because the wider spacing between points causes the model to miss the features in those key phases. The agreement between the sampling and phase-importance results further supports the reliability of our method.

We also found that over a specific range of small sampling intervals, the accuracy of some variables decreases as sampling increases. Because accuracy generally increases with sampling rate outside this range, we suggest that instrument-specific noise at these sampling intervals may cause the decline. This points to a critical sampling threshold at which feature capture is optimal before noise becomes dominant. Table 6 lists the accuracies corresponding to different sampling rates.

6 Discussion

6.1 Observation time saving

Using the relation between classification accuracy and phase importance for each variable, we sequentially eliminate phase intervals in ascending order of importance. This process lets us estimate the maximum observation time that can be saved for each star type. The results suggest that, on average, observation time can be reduced by 14% across the different variable types while keeping the accuracy variation within 10% over a given observation period. Specifically, RR and EB stars can save on average 44% and 29% of observation time, respectively.
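The sequential elimination above can be written as a short greedy loop. `accuracy_with_masked` is a hypothetical evaluation function, the 10% tolerance is treated as an absolute drop in accuracy, and each phase bin is assumed to cost the same observing time; these choices are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np


def max_time_saving(importance, accuracy_with_masked, tolerance=0.10):
    """Greedy estimate of the fraction of observing time that can be skipped.

    importance: per-phase-bin importance scores (e.g. one row of Table 5).
    accuracy_with_masked: hypothetical callable returning accuracy when the
        given frozenset of phase bins is left unobserved.
    Bins are dropped in ascending order of importance until accuracy falls
    more than `tolerance` below the unmasked baseline.
    """
    order = np.argsort(importance)  # least important phase bin first
    baseline = accuracy_with_masked(frozenset())
    masked = frozenset()
    for b in order:
        trial = masked | {int(b)}
        if baseline - accuracy_with_masked(trial) > tolerance:
            break  # dropping this bin would cost too much accuracy
        masked = trial
    # Equal observing time per bin is assumed, so saving = dropped fraction.
    return len(masked) / len(importance), sorted(masked)


importance_row = [0.9, 0.1, 0.2, 1.0, 0.3]


def toy_accuracy(masked):
    """Toy model: each masked bin costs 0.1 x its importance in accuracy."""
    return 0.95 - 0.1 * sum(importance_row[b] for b in masked)


saving, dropped = max_time_saving(importance_row, toy_accuracy)
print(saving, dropped)
```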

In addition, we relate classification accuracy to the number of sampling points per period to assess how much observation time could be saved by reducing the sampling rate. Keeping the accuracy variation within a single observation period under 10%, our results indicate an average potential reduction of 21% in the number of sampling points. Among the classes, EB stars can benefit from an average reduction of 54%. The results are presented in Table 2.
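A similar sketch applies to the sampling-point reduction: given accuracies measured at several points-per-period values, find the sparsest sampling that stays within the tolerance. Scanning from the densest sampling and stopping at the first violation is a design choice made here because accuracy is not always monotonic in sampling; the function name and the absolute 10% tolerance are assumptions.

```python
def min_sampling_points(points, accuracies, tolerance=0.10):
    """Smallest points-per-period whose accuracy stays within `tolerance`.

    points / accuracies: paired values, e.g. a row of Table 6 after
    converting sampling intervals to points per period (an assumption).
    Scans from the densest sampling downwards and stops at the first value
    that violates the tolerance, so a later non-monotonic rebound is ignored.
    Returns (minimum points, fractional reduction relative to the densest).
    """
    pairs = sorted(zip(points, accuracies), reverse=True)
    n_full, acc_full = pairs[0]
    n_min = n_full
    for n, acc in pairs[1:]:
        if acc_full - acc > tolerance:
            break
        n_min = n
    return n_min, 1.0 - n_min / n_full


pts = [100, 80, 60, 40, 20, 10]
acc = [0.95, 0.94, 0.90, 0.87, 0.70, 0.80]
print(min_sampling_points(pts, acc))
```

In this toy row the accuracy rebounds at 10 points, but the scan stops at 40 points because the drop at 20 points already exceeds the tolerance.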

Our proposed RNN-based models perform well without requiring image preprocessing. Furthermore, automated deep learning enables us to identify suitable hyperparameters more efficiently. We also consider the impact of masked or unobserved points within the full phase on classification accuracy. For EB stars with sampling points reduced to just 10%, and for DSCT stars with sampling halved, the accuracy remains around 0.75, while RR stars maintain an accuracy above 0.85 even when their sampling is halved. Our method is therefore suitable for more efficient, and potentially real-time, astronomical time-series recognition, for example of exoplanets [salinasDistinguishingPlanetaryTransit2023] and transients [muthukrishnaRealTimeDetectionAnomalies2022].

6.2 Learning from imbalanced data

Imbalanced samples are common in astronomical data. In this study, we apply for the first time transfer learning models pretrained on large volumes of image data to improve the recognition of the less common variable-star classes. The Swin Transformer and the MLLM yield particularly good results.

Additionally, we find that the self-attention mechanism can significantly improve the recognition of small-sample classes. In future work, we will consider applying data augmentation, resampling, pre-training, and other methods to self-attention-based deep learning models to improve their performance on tasks involving imbalanced data, such as TESS variable-star classification.

6.3 Time series as language

Our research highlights the potential of leveraging the emergent capabilities of LLM-based models for processing light curves, a task that requires rapid convergence (as shown in 4), resistance to overfitting, and low susceptibility to variations in data quality [liuSiTianProject2021]. By integrating these models with additional capability modules, such as visual and audio encoders, we aim to improve their performance and applicability in astronomy. This approach not only introduces a novel method for analyzing astronomical data but also explores the development of multimodal models that can process a variety of astronomical inputs.

LLMs optimized for inference, together with their capacity for parallel and rapid data processing [kwon2023efficient], are well suited to the vast and complex datasets encountered in astronomy. A promising direction is to train specialized astronomical encoding modules built on the foundation of LLMs. Such modules could be tailored to interpret and analyze astronomical phenomena with high accuracy and efficiency.

In future work, we will refine the models by adjusting parameters such as data volume, number of sampling points, precision, and temporal length. This will allow us to investigate the feature-extraction thresholds of the models in more depth and further improve their performance on astronomy-specific tasks. Moreover, this methodology could be applied to other time-series tasks, offering a versatile path toward multi-task applications built on a multimodal basis.

6.4 SiTian project

The SiTian prototype, introduced in [liuSiTianProject2021], is scheduled to release its internal data. The data cover three bands; we aim to construct the corresponding light curves using numerical simulations and to classify them using deep learning techniques. Given the large data volume, balancing prediction time against accuracy is essential.

Highly accurate models that are particularly sensitive to small classes can significantly enhance SiTian's ability to identify variable stars. The phase-importance and sampling catalogs we have developed provide guidance on optimal observation intervals and sampling rates during monitoring. This enables a deliberate tradeoff between model accuracy and the time costs of training and prediction, supporting efficient and effective astronomical analyses.

Acknowledgments

The research presented in this paper was generously funded by the National Programs on Key Research and Development Project, with specific contributions from grant numbers 2019YFA0405504 and 2019YFA0405000. Additional support came from the National Natural Science Foundation of China (NSFC) under grants NSFC-11988101, 11973054, and 11933004. We also received backing from the Strategic Priority Program of the Chinese Academy of Sciences, granted under XDB41000000. Special acknowledgment goes to the China Manned Space Project for their science research grant, denoted by NO.CMS-CSST-2021-B07.

JFL acknowledges support from the New Cornerstone Science Foundation through the New Cornerstone Investigator Program and the XPLORER PRIZE.

This research includes data collected by the Kepler mission, funding for which is provided by the NASA Science Mission Directorate. All data used in this study were obtained from the Mikulski Archive for Space Telescopes (MAST). STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. Support for MAST for non-HST data is provided by the NASA Office of Space Science via grant NNX09AF08G and by other grants and contracts.

Appendix

Figures 11 through 17 present the confusion matrices of the different models. In each matrix, the shades of blue indicate the number of objects of each true class assigned to each predicted class.

\printbibliography

Table 5: Phase Importance Catalog

Star / Masked phase	0	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9
DSCT_001026294	0.9996	0.9934	1.0000	1.0000	0.9995	0.9986	0.9986	0.9692	0.6977	0.0000
DSCT_001162150	0.1585	0.1310	1.0000	0.7921	0.3214	0.1172	0.1245	0.0323	0.0184	0.0000
DSCT_001163943	0.3598	0.1016	1.0000	0.4261	0.0467	0.1701	0.0711	0.0031	0.0393	0.0188
DSCT_001294670	0.3018	0.3963	1.0000	0.6999	0.0631	0.0211	0.0327	0.0001	0.0034	0.0004
DSCT_001430590	0.1579	0.2669	1.0000	0.5738	0.4043	0.0964	0.0573	0.0266	0.0102	0.0041
DSCT_001434660	0.0923	0.1533	1.0000	0.6423	0.6384	0.1624	0.0591	0.0247	0.0037	0.0142
DSCT_001570023	0.0688	0.1608	1.0000	0.5841	0.4261	0.0822	0.0513	0.0040	0.0000	0.0027
DSCT_001571717	0.9996	0.9994	1.0000	0.9999	1.0000	0.9997	1.0000	0.9942	0.9730	0.0000
DSCT_001572768	0.0000	0.9147	0.9228	0.9773	0.9816	0.8840	0.8007	0.8885	0.4628	0.4773
DSCT_001575977	0.1454	0.6365	1.0000	0.8420	0.2347	0.3248	0.3394	0.0303	0.0057	0.0241


The phase-importance catalog shows the relation between importance and phase; only the first ten rows are shown.

Table 6: Sampling Catalog

Star / Sampling interval	0.02d	0.04d	0.06d	0.08d	0.1d	0.12d	0.14d	0.16d	0.18d	0.2d
DSCT_001026294	0.9288	0.4412	0.0002	0.0003	0.0000	0.0000	0.0434	0.0847	0.0998	0.0141
DSCT_001162150	0.9817	0.9598	0.1599	0.6392	0.3082	0.1531	0.4663	0.2866	0.1713	0.1649
DSCT_001163943	0.9891	0.7806	0.6678	0.5106	0.5077	0.6081	0.6654	0.6412	0.5840	0.4549
DSCT_001294670	0.9985	0.9980	0.9521	0.7222	0.3226	0.6149	0.5517	0.5106	0.5157	0.2001
DSCT_001430590	0.9935	0.9918	0.2952	0.6864	0.6574	0.2160	0.1984	0.5479	0.2953	0.3322
DSCT_001434660	0.9746	0.9851	0.8613	0.3703	0.4243	0.3810	0.5342	0.2130	0.3090	0.2803
DSCT_001570023	0.9944	0.9897	0.8082	0.8866	0.4413	0.2945	0.5183	0.4653	0.2904	0.1887
DSCT_001571717	0.9656	0.0958	0.0018	0.0159	0.0000	0.0001	0.3438	0.0925	0.0448	0.0198
DSCT_001572768	0.9253	0.3443	0.1441	0.1154	0.1095	0.0479	0.0768	0.0454	0.1152	0.1194
DSCT_001575977	0.9839	0.7853	0.6452	0.3186	0.3033	0.3250	0.2376	0.3029	0.2770	0.2040


The sampling catalog shows the relation between accuracy and sampling rate; only the first ten rows are shown here.