Classifying galaxies currently needs to be done manually, requiring a lot of time from astronomers and citizen scientists. But a team of Australian astrophysicists has now developed a machine-learning algorithm that should speed the process up considerably.
“Galaxies come in different shapes and sizes,” says Mitchell Cavanagh, a PhD candidate at the University of Western Australia branch of the International Centre for Radio Astronomy Research (ICRAR), and lead author on a paper describing the research, published in Monthly Notices of the Royal Astronomical Society.
“Classifying the shapes of galaxies is an important step in understanding their formation and evolution, and can even shed light on the nature of the Universe itself.”
As telescopes improve, the volume of data on newly observed galaxies is becoming too large for astronomers to handle on their own.
“We’re talking several million galaxies over the next few years. Sometimes citizen scientists are recruited to help classify galaxy shapes in projects like Galaxy Zoo, but this still takes time,” says Cavanagh.
Cavanagh and colleagues have addressed this by developing a program based on a convolutional neural network, or CNN. These neural networks are particularly useful for processing visual data, because they pass each image through successive layers of filters, with each layer building on the features picked out by the one before.
“Each convolutional layer applies a variety of filters to the image to create feature maps,” says Cavanagh. “Think of Adobe Photoshop, where you might want to sharpen edges or apply a smooth blur.”
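To make the Photoshop analogy concrete, the sketch below shows what a single convolutional filter does: it slides a small grid of weights across an image and records a weighted sum at each position, producing a feature map. The blur and sharpen kernels are standard image-processing examples chosen for illustration, and the tiny "image" is hypothetical; neither comes from the team's network.

```python
# A minimal 2D convolution ("valid" mode, no padding) in plain Python.
# Like most CNN libraries, this actually computes cross-correlation
# (the kernel is not flipped).
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the kh x kw neighbourhood under the kernel.
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Two hand-picked 3x3 filters of the "Photoshop" kind (illustrative only).
BLUR = [[1 / 9] * 3 for _ in range(3)]            # box blur: neighbourhood average
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]   # boosts centre, subtracts neighbours

# Hypothetical 5x5 "image": one bright pixel on a dark background.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9

blurred = convolve2d(image, BLUR)       # the bright spot is spread evenly
sharpened = convolve2d(image, SHARPEN)  # the bright spot is accentuated
print(blurred)
print(sharpened)
```

In a real CNN, each layer applies many such filters at once, and the resulting feature maps become the input to the next layer.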
“What makes CNNs so versatile is that the filters used to extract these features are not hard-coded at all; in fact, they start off completely random!”
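The idea that filters "start off completely random" and are then learned can be illustrated with a deliberately simplified sketch. Here a random 3x3 filter is fitted by gradient descent until its output matches that of a blur filter. This is an assumption-laden toy: a real CNN learns its filters by backpropagating classification errors through many layers, not by matching a known target image, and the image, learning rate and step count below are all hypothetical.

```python
import random

def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(image[0]) - kw + 1)]
        for i in range(len(image) - kh + 1)
    ]

def mse(pred, target):
    """Mean squared error between two equally sized 2D lists."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)

def learn_filter(image, target, steps=300, lr=0.05, seed=0):
    """Start from a random 3x3 filter and fit it with plain gradient descent."""
    rng = random.Random(seed)
    kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    initial_loss = mse(convolve2d(image, kernel), target)
    n = len(target) * len(target[0])
    for _ in range(steps):
        pred = convolve2d(image, kernel)
        err = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(pred, target)]
        # dLoss/dkernel[a][b] = (2/n) * sum_ij err[i][j] * image[i+a][j+b]
        grad = [
            [2 / n * sum(err[i][j] * image[i + a][j + b]
                         for i in range(len(err)) for j in range(len(err[0])))
             for b in range(3)]
            for a in range(3)
        ]
        kernel = [[kernel[a][b] - lr * grad[a][b] for b in range(3)] for a in range(3)]
    final_loss = mse(convolve2d(image, kernel), target)
    return kernel, initial_loss, final_loss

# Hypothetical toy task: recover a blur filter from an input/output pair.
rng = random.Random(42)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
BLUR = [[1 / 9] * 3 for _ in range(3)]
target = convolve2d(image, BLUR)

kernel, before, after = learn_filter(image, target)
print(f"loss before training: {before:.4f}, after: {after:.6f}")
```

Nothing about the blur is written into the starting filter; it emerges from repeated small corrections, which is the sense in which CNN filters are "not hard-coded".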
CNNs have previously been used by astronomers to classify galaxies, but only in binary cases: whether a galaxy is a spiral or not, for instance. This neural network instead uses multiclass classification, sorting each galaxy into one of several categories at once, which makes it more informative than existing binary networks.
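The difference between binary and multiclass output can be sketched in a few lines. A binary classifier ends in a single sigmoid probability ("spiral or not"), whereas a multiclass classifier typically ends in a softmax that assigns a probability to every class. The class list below is a hypothetical one assembled from categories mentioned in this article, not necessarily the paper's exact scheme, and the raw scores (logits) are made up.

```python
import math

# Hypothetical class list drawn from categories named in this article.
CLASSES = ["Elliptical", "Lenticular", "Spiral", "Irregular"]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_binary(logit):
    """Binary output: is it a spiral? One probability, one yes/no answer."""
    p = sigmoid(logit)
    return ("Spiral" if p >= 0.5 else "Not spiral"), p

def classify_multiclass(logits):
    """Multiclass output: one probability per class; report the largest."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], dict(zip(CLASSES, probs))

# Made-up network scores for a single galaxy.
label, probs = classify_multiclass([0.2, 2.1, 1.3, -0.5])
print(label, probs)
```

A binary "not spiral" answer leaves open whether the galaxy is elliptical, lenticular or irregular; the multiclass probabilities answer that directly.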
“The massive advantage of neural networks is speed,” says Cavanagh. “Using a standard graphics card, we can classify 14,000 galaxies in less than three seconds.”
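As a back-of-the-envelope check on that figure: 14,000 galaxies in under three seconds is a rate of more than about 4,700 galaxies per second, so a million galaxies would take under four minutes. The sketch below does the arithmetic, assuming (for illustration only) that the rate stays constant at larger scales and ignoring time spent loading images.

```python
def throughput(n_galaxies, seconds):
    """Galaxies classified per second."""
    return n_galaxies / seconds

def time_to_classify(n_galaxies, rate):
    """Seconds needed at a constant rate (an idealising assumption)."""
    return n_galaxies / rate

# Quoted time is "less than three seconds", so this rate is a lower bound
# and the times derived from it are upper bounds.
rate = throughput(14_000, 3.0)
seconds_for_million = time_to_classify(1_000_000, rate)
print(f"{rate:.0f} galaxies/s; 1 million galaxies in about {seconds_for_million / 60:.1f} min")
```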
The network was trained on galaxies classified by people. This means it will not necessarily be more accurate than humans (its overall accuracy is 80%), but it is much faster.
“This inherent uncertainty is the limiting factor in any AI model trained on labelled data.”
Cavanagh adds that another factor limiting the network’s accuracy is galaxies that don’t fit neatly into these categories. “There are many different types (and subtypes!) of galaxies, as the Hubble tuning fork will attest to. Even if we were to group them into overarching categories such as ‘Elliptical’, ‘Lenticular’ or ‘Spiral’, there will almost always be some overlap, and some disagreement.
“The biggest barrier with the CNN is accurately classifying irregular galaxies. As the name suggests, this category is necessarily broad, covering everything from odd-shaped clumps to galaxies undergoing massive tidal disruption. It’s no surprise then that the CNN misclassifies many irregulars.”
He points out that the data they trained the network on under-represents these irregular galaxies, which may also affect the network’s accuracy.
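Why under-representation matters can be shown with a small sketch: when one class is rare, a respectable overall accuracy can hide poor performance on that class. The labels and predictions below are invented toy data, not results from the paper; per-class recall (the fraction of each true class that was labelled correctly) is one common way to expose the gap.

```python
from collections import Counter

def accuracy(true, pred):
    """Fraction of all galaxies labelled correctly."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def per_class_recall(true, pred):
    """For each true class, the fraction of its members labelled correctly."""
    totals = Counter(true)
    hits = Counter(t for t, p in zip(true, pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Invented toy data (NOT from the paper): irregulars are rare and often misread.
true = ["Spiral"] * 40 + ["Elliptical"] * 30 + ["Lenticular"] * 20 + ["Irregular"] * 10
pred = (["Spiral"] * 38 + ["Lenticular"] * 2        # true spirals
        + ["Elliptical"] * 27 + ["Lenticular"] * 3  # true ellipticals
        + ["Lenticular"] * 14 + ["Elliptical"] * 6  # true lenticulars
        + ["Irregular"] * 2 + ["Spiral"] * 8)       # true irregulars

print(f"overall accuracy: {accuracy(true, pred):.2f}")
for cls, recall in per_class_recall(true, pred).items():
    print(f"{cls:>10}: {recall:.2f}")
```

In this toy case the overall figure is 0.81, yet only two of the ten irregulars are identified, because they contribute so few examples to the total.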
While the neural network can speed things up, it relies on data from citizen science astronomy projects.
“Citizen science initiatives are extremely useful for astronomers, as the success of Galaxy Zoo and its sequel Galaxy Zoo 2 has shown. The ICRAR-led AstroQuest citizen science project also aims to help inspect many tens of thousands of galaxies,” says Cavanagh.
“Another often-overlooked benefit of citizen science is the availability of large-population statistics. It’s then easy to see which galaxies contributors found easy to classify (nearly unanimous selections) and which were harder to classify (broad spread of selections). The harder-to-classify galaxies can then be selected for more targeted analysis. It’s highly likely that such initiatives will continue as more large-scale surveys go online.”
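The selection Cavanagh describes can be sketched with a simple agreement score: the fraction of volunteers who chose the most popular answer. A score near 1 means a nearly unanimous vote; a low score means a broad spread of selections. The galaxy IDs, votes and the 0.6 cut-off below are hypothetical, and projects such as Galaxy Zoo may use more sophisticated weighting.

```python
from collections import Counter

def agreement(votes):
    """Fraction of volunteers who picked the most popular answer (1.0 = unanimous)."""
    counts = Counter(votes)
    return counts.most_common(1)[0][1] / len(votes)

def flag_hard_cases(catalogue, threshold=0.6):
    """IDs of galaxies whose agreement falls below a (hypothetical) threshold."""
    return [gid for gid, votes in catalogue.items() if agreement(votes) < threshold]

# Hypothetical volunteer votes for two galaxies.
catalogue = {
    "galaxy_A": ["Spiral"] * 19 + ["Lenticular"],                       # near-unanimous
    "galaxy_B": ["Spiral"] * 7 + ["Lenticular"] * 6 + ["Irregular"] * 7,  # broad spread
}

print(flag_hard_cases(catalogue))  # candidates for more targeted analysis
```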
CNNs could also be applied to other kinds of data, provided there are large enough datasets to train them on.
“CNNs need not just apply to optical images of galaxies, they can just as easily work with radio images too, which will be useful with the imminent arrival of the Square Kilometre Array (SKA),” says Cavanagh.
“CNNs will play an increasingly important role in the future of data processing, especially as fields like astronomy grapple with the challenges of big data.”