An algorithm that labels galaxies

Classifying galaxies currently needs to be done manually, requiring a lot of time from astronomers and citizen scientists. But a team of Australian astrophysicists has now developed a machine-learning algorithm that should speed the process up considerably.

“Galaxies come in different shapes and sizes,” says Mitchell Cavanagh, a PhD candidate at the University of Western Australia branch of the International Centre for Radio Astronomy Research (ICRAR), and lead author on a paper describing the research, published in Monthly Notices of the Royal Astronomical Society.

“Classifying the shapes of galaxies is an important step in understanding their formation and evolution, and can even shed light on the nature of the Universe itself.”

Four galaxies. One is a tiny dot, two are disks and one is a crescent
Different shapes of galaxies, left to right: elliptical, lenticular, spiral, and irregular/miscellaneous. Credit: NASA/Hubble (elliptical galaxy M87), ESA/Hubble & NASA (lenticular galaxy NGC 6861 and the colliding Antennae galaxies), and David Dayag (the Andromeda spiral galaxy).

As telescopes improve, the volume of data on new galaxies is becoming too large for astronomers to handle by hand.

“We’re talking several million galaxies over the next few years. Sometimes citizen scientists are recruited to help classify galaxy shapes in projects like Galaxy Zoo, but this still takes time,” says Cavanagh.

Cavanagh and colleagues have addressed this by developing a program based on a convolutional neural network, or CNN. These neural networks are particularly useful for processing visual data, because of the way they layer information.

Lots of small galaxies in green and purple
The power of CNNs lies in their ability to extract features in images. Within the computer program, the convolutional layers are able to outline, trace and detect the presence of spiral arms or other features. Credit: Mitchell Cavanagh/ICRAR

“Each convolutional layer applies a variety of filters to the image to create feature maps,” says Cavanagh. “Think of Adobe Photoshop, where you might want to sharpen edges or apply a smooth blur.”

“What makes CNNs so versatile is that the filters used to extract these features are not hard-coded at all; in fact, they start off completely random!”
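To make the Photoshop comparison concrete, here is a minimal sketch in plain Python of what applying a filter to an image means. It is not the team's code: the tiny 5×5 "image", the blur and sharpen kernels, and the helper names are all illustrative assumptions. The last helper builds a random kernel, which is roughly where a CNN's filters begin before training adjusts them.

```python
import random


def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the feature map.

    This is the cross-correlation form that CNN libraries usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Two hand-written filters, similar in spirit to an image editor's blur and sharpen.
BLUR = [[1 / 9] * 3 for _ in range(3)]
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]


def random_kernel(size=3, seed=0):
    """An untrained filter: random weights (hypothetical initialisation range)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]


# Toy image: a single bright pixel in the centre of a dark 5x5 frame.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0

blurred = convolve2d(image, BLUR)        # brightness spread evenly over the map
sharpened = convolve2d(image, SHARPEN)   # centre boosted, neighbours pushed negative
untrained = convolve2d(image, random_kernel())
```

Each call returns one feature map. In a real network the kernel weights are not written by hand like `BLUR` and `SHARPEN`; they are learned from the training images.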

CNNs have previously been used by astronomers to classify galaxies, but only in binary cases – whether a galaxy is a spiral galaxy or not, for instance. This neural network will use multiclass classification, making it more accurate than existing networks.
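The difference between a binary and a multiclass classifier shows up in the final step. As a rough sketch (the four class names come from the galaxy types pictured above, and the scores are made up), a multiclass network turns one raw score per class into probabilities with a softmax and picks the largest:

```python
import math

# Assumed label set for illustration; the published network's classes may differ.
CLASSES = ["elliptical", "lenticular", "spiral", "irregular"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable class and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical raw scores from the last layer of a network for one galaxy.
label, probs = classify([0.2, 1.1, 3.0, -0.5])
```

A binary classifier would instead produce a single probability, such as "spiral or not", and a separate network would be needed for every other question.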

“The massive advantage of neural networks is speed,” says Cavanagh. “Using a standard graphics card, we can classify 14,000 galaxies in less than three seconds.”
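A quick back-of-the-envelope calculation shows what that rate implies. The figures of 14,000 galaxies and three seconds are from the quote above; the one-million-galaxy survey size is an assumed round number for illustration only.

```python
GALAXIES_PER_BATCH = 14_000
SECONDS_PER_BATCH = 3.0  # quoted as "less than three seconds", so an upper bound


def min_rate():
    """Lower bound on galaxies classified per second."""
    return GALAXIES_PER_BATCH / SECONDS_PER_BATCH


def max_seconds(n_galaxies):
    """Upper bound on the time needed to classify `n_galaxies` at that rate."""
    return n_galaxies / min_rate()


rate = min_rate()                      # about 4,667 galaxies per second
survey_time = max_seconds(1_000_000)   # about 214 seconds for an assumed 1M galaxies
```

At that pace, a survey of a million galaxies would take under four minutes on the same hardware.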

The network has been trained on galaxy classifications made by people. This means it will not necessarily be more accurate than humans (its overall accuracy is 80%), but it will be much faster.

“This inherent uncertainty is the limiting factor in any AI model trained on labelled data.”

Cavanagh adds that another limiting factor in the network’s accuracy is galaxies that don’t fit neatly into these categories. “There are many different types (and subtypes!) of galaxies, as the Hubble tuning fork will attest to. Even if we were to group them into overarching categories such as ‘Elliptical’, ‘Lenticular’ or ‘Spiral’, there will almost always be some overlap, and some disagreement.

“The biggest barrier with the CNN is accurately classifying irregular galaxies. As the name suggests, this category is necessarily broad, covering everything from odd-shaped clumps to galaxies undergoing massive tidal disruption. It’s no surprise then that the CNN misclassifies many irregulars.”

He points out that the data they’ve trained the network on under-represents these irregular galaxies, which may also affect the network’s accuracy.

Four small galaxies in green. They say initial, crop, mirror and rotate
Being able to distinguish a lenticular galaxy from the other types can be difficult for human eyes, but the convolutional layers look for features we can’t see. Also, a CNN never tires, and if the image is flipped or rotated, that won’t cause the CNN to make a mistake. Credit: Mitchell Cavanagh/ICRAR
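The crop, mirror and rotate steps named in the figure can be sketched in a few lines of plain Python. This is a toy illustration on a small grid of numbers standing in for pixels, not the team's pipeline; the function names and the crop size are assumptions. Feeding a network such altered copies during training is a common way to encourage it to give the same answer whichever way up a galaxy appears.

```python
def mirror(image):
    """Flip the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def centre_crop(image, size):
    """Keep the central `size` x `size` region."""
    top = (len(image) - size) // 2
    left = (len(image[0]) - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def augment(image, crop_size):
    """Return the cropped image plus mirrored and rotated copies of it."""
    cropped = centre_crop(image, crop_size)
    return [cropped, mirror(cropped), rotate90(cropped)]


# A 4x4 stand-in for a galaxy image, with pixel values 0 to 15.
initial = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped, mirrored, rotated = augment(initial, crop_size=2)
```

All three outputs carry the same label as the initial image, so one human classification yields several training examples.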

While the neural network can speed things up, it relies on data from citizen science astronomy projects.

“Citizen science initiatives are extremely useful for astronomers, as the success of Galaxy Zoo and its sequel Galaxy Zoo 2 have shown. The ICRAR-led AstroQuest citizen science project also aims to help inspect many tens of thousands of galaxies,” says Cavanagh.

“Another often-overlooked benefit of citizen science is the availability of large-population statistics. It’s then easy to see which galaxies contributors found easy to classify (nearly unanimous selections) and which were harder to classify (broad spread of selections). The harder-to-classify galaxies can then be selected for more targeted analysis. It’s highly likely that such initiatives will continue as more large-scale surveys go online.”
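The idea of separating easy galaxies from hard ones by how much volunteers agree can be sketched as follows. The galaxy names, the votes and the 80% agreement threshold are all invented for illustration; real projects may weight or combine votes differently.

```python
from collections import Counter


def agreement(votes):
    """Return the most popular label and the fraction of votes it received."""
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)


def needs_review(votes_by_galaxy, threshold=0.8):
    """List galaxies whose top label falls below the agreement threshold."""
    return [
        galaxy_id
        for galaxy_id, votes in votes_by_galaxy.items()
        if agreement(votes)[1] < threshold
    ]


# Hypothetical volunteer votes for two galaxies.
votes_by_galaxy = {
    "galaxy_a": ["spiral"] * 9 + ["lenticular"],                           # nearly unanimous
    "galaxy_b": ["elliptical"] * 4 + ["lenticular"] * 4 + ["irregular"] * 2,  # broad spread
}

hard_cases = needs_review(votes_by_galaxy)
```

Here `galaxy_a` passes with 90% agreement, while `galaxy_b` is flagged for closer inspection.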

CNNs could be used in other fields, if given big enough datasets to train with.

“CNNs need not just apply to optical images of galaxies, they can just as easily work with radio images too, which will be useful with the imminent arrival of the Square Kilometre Array (SKA),” says Cavanagh.

“CNNs will play an increasingly important role in the future of data processing, especially as fields like astronomy grapple with the challenges of big data.”

