Guide for Fritz Scanners

This page is a guide the SCoPe classification process. It contains sections on the classification taxonomies we use, definitions of each classification that may be posted to Fritz, An explanation of the binary classifier algorithms we train and the workflow we run on transient candidates, and plots of each classifer’s current precision and recall scores.

Two classification taxonomies

The goal of SCoPe is to use machine learning algorithms to reliably classify each ZTF source with as much detail as possible. The level of classification detail will vary across the broad range of ZTF sources. Factors that can affect the level of source classification include the quantity and quality of the data, the similarity of the training set to the source in question, and the existence of new kinds of variable sources in the data. With this in mind, we adopt two kinds of taxonomies which contain the labels we use to classify ZTF sources.

Ontological classifications

The first taxonomy is ontological and contains specific kinds of astrophysical sources. On Fritz, this is called Sitewide Taxonomy. See the table below for the current ontological classifications, training set abbreviations and definitions, ordered by low to high detail:

classification

abbreviation

definition

pulsator

puls

Pulsating star

AGN

agn

Active Galactic Nucleus

YSO

yso

Young Stellar Object

CV

cv

Cataclysmic Variable

binary

bis

binary system

Cepheid

ceph

Cepheid variable star

Delta Scu

dscu

Delta Scu star

Pop II Cepheid

ceph2

Population II Cepheid variable star

RR Lyr

rrlyr

RR Lyr star

LPV

lpv

Long Period Variable star

MS-MS

emsms

Eclipsing MS-MS binary

W UMa

wuma

W UMa binary system

Beta Lyr

blyr

Beta Lyr binary

RS CVn

rscvn

RS CVn binary

BL Her

blher

BL Her-type Cepheid variable star

RRab

rrab

RR Lyr ab star

RRc

rrc

RR Lyr c star

RRd

rrd

RR Lyr d star

Mira

mir

Mira variable star

SRV

srv

Semi-regular variable star

OSARG

osarg

OGLE small-amplitude red giant star

W Vir

wvir

W Vir-type Cepheid variable star

Refer to the field guide for more information about these classes.

Phenomenological classifications

In consideration of the importance of having some information about a source (even if not a definitive ontological classification), we also employ a phenomenological taxonomy with labels that describe light curve-based features. This taxonomy is called SCoPe Phenomenological Taxonomy on Fritz. See the table below for the current phenomenological classifications, training set abbreviations and definitions:

classification

abbreviation

definition

variable

vnv

Light curve shows variability

periodic

pnp

periodic variability

irregular

i

irregular variability

eclipsing

e

eclipsing phenomenology

sinusoidal

sin

sinusoidal phenomenology

sawtooth

saw

sawtooth phenomenology

long timescale

longt

long timescale variability

flaring

fla

flaring phenomenology

EA

ea

EA eclipsing phenomenology

EB

eb

EB eclipsing phenomenology

EW

ew

EW eclipsing phenomenology

bogus

bogus

bogus variability

blend

blend

blended sources phenomenology

extended

ext

extended source

Refer to the field guide for more information about these classes.

Independent binary classifiers

We train a binary classifier for every label in these taxonomies. This choice allows more than one classification to be assigned to a source, often with varying levels of detail. This is important not only due to the practical challenges outlined above, but also because some sources merit more than one classification (e.g. an eclipsing binary system containing a flaring star). The independence of binary classifiers allows for future updates to the taxonomies without a revision of the current results from each existing classifier.

We classify each ZTF light curve separately in recognition of systematics that may exist between ZTF fields and bands. Before posting results to Fritz, we aggregate these classification results on a source-by-source basis. The details of this workflow are described in the next section.

Classification process

Machine learning algorithms/training

We currently employ a convolutional/dense neural network (DNN) and gradient-boosted decision trees (XGBoost, XGB) to perform classification. The process is initially a regression problem, with classifiers assigning a classification probability that ranges between 0 and 1 for each source. We then apply a probability threshold to determine whether to include each source as a positive or negative example when minimizing the binary cross-entropy loss function.

We trained each binary classifier algorithm using a training set containing ~80,000 sources labeled manually (~170,000 light curves). The training set is available on Fritz in group 1458 (Golden Dataset Unique Sources).

Repeated workflow for transients

The following SCoPe workflow currently runs every two hours as a cron job:

  • Query Fritz for GCN events within the last 7 days

  • For each event, query all candidates within the 95% confidence localization

  • For each candidate, query existing ZTF DR16 light curves within 0.5 arcsec

  • For ZTF light curves with 50 or more epochs of data, generate SCoPe features and run through all trained binary classifiers (DNN and XGB)

  • Consolidate light curve classifications by matching Gaia, AllWISE or Pan-STARRS1 IDs, computing the mean probabilities among all light curves for a source.

    • Each source will now have a set of classifications from both the DNN and XGB algorithms.

  • For each ZTF source, compute mean classification probabilities between DNN and XGB results.

  • For classifications having a mean probability ≥ 0.7, post to the candidate page.

    • SCoPe classifications will be color-coded with blue text (instead of the default black) and will be preceded by the ML: prefix.

    • Note that these classifications do not pertain to the candidate itself, but persistent ZTF sources within 0.5 arcsec.

    • The time series and phase-folded ZTF light curves used for classification are posted as comments on their associated candidate.

Classifier performance

The bar plots below show the precision and recall metrics for the DNN and XGB classifiers. ‘Missing’ bars indicate classifiers which did not have enough examples to train successfully. dnn classifier precision/recall xgb classifier precision/recall