(date: 2024-11-10 09:48:04)
date: 2024-11-10, from: ETH Zurich, recently added
Müller, Lukas; Rothacher, Markus; Soja, Benedikt; Jäggi, Adrian; Arnold, Daniel
http://hdl.handle.net/20.500.11850/704477
date: 2024-11-09, from: ETH Zurich Research Archives
Visit or Coffee Lecture on 27 November. Read more
https://rc-blog.ethz.ch/en/coffee-lecture-protecting-yourself-from-fraudulent-publishers/
date: 2024-11-08, from: SciELO in Perspective
The SciELO Accessibility Interdisciplinary Working Group has been developing actions for the production and dissemination of open science with accessibility, making improvements to the sites that use the SciELO methodology, awareness-raising activities and partnerships with publishing teams. These and other practices are planned for the next four years. … Read More →
The post Accessibility in the SciELO Program: current status and future prospects – Part 1 first appeared on SciELO in Perspective.
https://blog.scielo.org/en/2024/11/08/accessibility-in-the-scielo-program-pt-1/
date: 2024-11-08, from: Scholarly Kitchen
In light of recent events, we revisit Karin Wulf’s 2022 post which declared that universities need democracy, and vice versa, and discussed an important book which shows the 20th century history of that relationship in the United States, and offers a prescription for what we do as both are imperiled.
The post It’s a New World? Revisiting What Universities — and Researchers, Libraries, and Publishers — Owe Democracy appeared first on The Scholarly Kitchen.
date: 2024-11-07, from: Scholarly Kitchen
Digital accessibility to the scholarly communications process is core to providing equitable access to the literature.
The post Paywalls are Not the Only Barriers to Access: Accessibility is Critical to Equitable Access appeared first on The Scholarly Kitchen.
https://scholarlykitchen.sspnet.org/2024/11/07/paywalls-are-not-the-only-barriers-to-access/
date: 2024-11-07, from: ROR Research ID Blog
The fourth blog post about metadata matching by ROR’s Adam Buttrick and Crossref’s Dominika Tkaczyk explains how to measure the quality of different matching strategies with an evaluation dataset and metrics.
https://ror.org/blog/2024-11-06-how-good-is-your-matching/
date: 2024-11-06, from: Scholarly Kitchen
A diverse panel of researchers shared their first-hand publishing experiences at the 2024 New Directions seminar.
The post First-Hand Publishing Experiences: Researcher Panel at SSP’s New Directions Seminar appeared first on The Scholarly Kitchen.
date: 2024-11-06, from: ETH Zurich, recently added
Reisach, Dominik
http://hdl.handle.net/20.500.11850/703142
date: 2024-11-06, from: Crossref Blog
In our previous blog post in this series, we explained why no metadata matching strategy can return perfect results. Thankfully, however, this does not mean that it’s impossible to know anything about the quality of matching. Indeed, we can (and should!) measure how close (or far) we are from achieving perfection with our matching. Read on to learn how this can be done!
How about we start with a quiz? Imagine a database of scholarly metadata that needs to be enriched with identifiers, such as ORCIDs or ROR IDs. Hopefully, by this point in our series this is recognizable as a classic matching problem. In searching for a solution, you identify an externally-developed matching tool that makes one of the claims below. Which of the following would demonstrate satisfactory performance?
Okay, okay, trick question. The correct answer here is to opt for secret answer #4: “I wouldn’t be satisfied by any of these claims!” Let’s dig in a bit more to why this is the correct response.
Before we decide to integrate a matching strategy, it is important to understand as much as possible about how it will perform. Whether it is used in a semi or fully automated fashion, metadata matching will result in the creation of new relationships between things like works, authors, funding sources, and institutions. Those relationships will then, in turn, be used by the consumers of this metadata to guide their understanding and perhaps even to make important decisions about those same entities. As organisations providing scholarly infrastructure, we must therefore take it as our paramount responsibility to understand any caveats or shortcomings of the scholarly metadata we make available, including that resulting from matching.
Proper evaluation is what allows us to do this, as it is impossible to know how well a given matching strategy will perform in its absence. This is true no matter how simple or complex a matching strategy may seem. Complex methods can be tailored to data with specific characteristics and might fail when faced with something different from this. Simple methods might be only appropriate for clean metadata or a narrow set of use cases.
Beyond complexity, matching strategies themselves vary widely in character, inheriting biases from their design, training data, or how a problem has been formulated. Some prioritise avoiding false negatives, while others focus on minimising false positives. Even a generally high-performing strategy might not be perfectly aligned with your specific needs or data. In some cases, the task itself might also be too challenging, or the available metadata too noisy, for any matching strategy to perform adequately.
Evaluation is, again, how we understand these nuances and make informed decisions about whether to implement matching or avoid it altogether. By now, it should also be clear that the notion “we don’t need to evaluate” is far from ideal! Given its importance, let’s explore how evaluation is actually done.
In general, a proper evaluation procedure should proceed through the following steps:
From this accounting, we can see that there are two primary components for the evaluation process: an evaluation dataset and metrics.
It’s useful to conceive of an evaluation dataset as the specification for an ideal matching strategy, describing what would be returned by our forever-elusive perfect matching. In practice, this means the dataset should contain a number of real-world example inputs, along with the corresponding ideal or expected outputs, and all data should be in the same format as the strategy is expected to process. The outputs should themselves also conform to the strategy’s overall requirements, for example, by being consistent with its cardinality, meaning whether zero, one, or multiple matches should be returned and under what circumstances. In terms of size, it’s generally useful to calculate the ideal number of evaluation examples using a sample size calculator or standardised measures, but as a quick rule of thumb: fewer than 100 examples is probably insufficient, while more than 1,000 or 2,000 is generally acceptable.
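The dataset structure described above can be sketched in code. This is only an illustration of the idea, not a format Crossref prescribes; the field names and the ROR URLs below are made-up placeholders:

```python
# A minimal sketch of an evaluation dataset for affiliation matching.
# Each example pairs a real-world input with the ideal (expected) output.
# Field names and ROR URLs are illustrative placeholders.
evaluation_dataset = [
    {"input": "Dept. of Physics, Example University", "expected": "https://ror.org/00example1"},
    {"input": "Example Institute of Technology", "expected": "https://ror.org/00example2"},
    # Cardinality matters: some inputs should legitimately yield no match.
    {"input": "Independent researcher", "expected": None},
]

# Quick sanity check on the dataset's shape (in practice, aim for
# at least a few hundred examples, ideally 1,000-2,000 or more).
assert all({"input", "expected"} <= set(ex) for ex in evaluation_dataset)
```

Keeping the expected output explicit for no-match cases (here `None`) is what lets the evaluation later distinguish a correct refusal from a missed match.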
It is also important that the evaluation dataset be representative of the data to be matched in order to ensure reliable results. Using unrepresentative data, even if convenient, can lead to biased or misleading evaluations. For example, if matching affiliations from various journals, it might be tempting to build an evaluation dataset solely from one journal that already assigns ROR IDs to authors’ affiliations. The data, having been already annotated, allow us to avoid the tedious work of labelling, and we might even know that it is produced by a high-quality source. This is still, unfortunately, a flawed approach. In practice, such datasets are unlikely to represent the entire range of affiliations to be matched, potentially leading to a significant discrepancy between the evaluated quality and the actual performance of the matching strategy when applied to the full dataset. To assess a matching strategy’s effectiveness, we have to resist shortcuts and instead do our best to create truly representative evaluation datasets, so that we can be confident we’ve accurately measured its performance.
Evaluation metrics are what allow us to summarise the results of the evaluation into a single number. Metrics give us a quick way to get an estimation of how close the strategy was to achieving perfect results. They are also useful if we want to compare different strategies with each other or decide whether the strategy is sufficient for our use case, removing the need to compare countless evaluation examples from different strategies against one another.
The simplest metric is accuracy, which can be calculated as the fraction of the dataset examples that were matched correctly. While a commonsense benchmark, accuracy can be misleading, and we generally do not recommend using it. To understand why, let’s consider the following small dataset and the responses from two strategies:
| Input    | Expected output | Strategy 1   | Strategy 2   |
|----------|-----------------|--------------|--------------|
| string 1 | ID 1            | ID 1         | ID 1         |
| string 2 | ID 2            | ID 3         | Empty output |
| string 3 | Empty output    | Empty output | Empty output |
Both strategies achieved the same accuracy, 0.67, making one mistake each on the second affiliation string. However, a closer examination reveals that these error types are distinct. The first strategy matched to an incorrect identifier, while the second refused to return any value, illustrating the limitation of accuracy as a measure: it generally fails to capture important nuances in strategy behaviour. In our example, the first strategy appears more permissive, returning matches even in unclear circumstances, while the second is more conservative, withholding them when uncertain. Although using such a small dataset would preclude drawing any definitive conclusions, it highlights how relying on accuracy alone can obscure differences in performance.
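The accuracy calculation for the toy example above can be reproduced in a few lines (variable names are ours, chosen for illustration):

```python
# Toy example from the table above; None stands for "empty output".
expected   = ["ID 1", "ID 2", None]
strategy_1 = ["ID 1", "ID 3", None]   # wrong match on the second string
strategy_2 = ["ID 1", None,   None]   # no match on the second string

def accuracy(predicted, expected):
    """Fraction of examples matched exactly as expected."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

print(round(accuracy(strategy_1, expected), 2))  # 0.67
print(round(accuracy(strategy_2, expected), 2))  # 0.67
```

Both strategies score identically, even though one introduced a wrong identifier and the other merely withheld a match, which is exactly the nuance accuracy fails to capture.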
For evaluating matching strategies, we instead recommend using two metrics: precision and recall. To recap from our previous blog post:
Applying these measures to our prior example, the strategies achieved the following results:
As we can see, while both strategies have the same accuracy, using precision and recall better describes the difference between the two sets of results. Strategy 1’s lower precision indicates it made false positive matches, while Strategy 2’s perfect precision shows that it made none. The identical recall scores show both identified half of the possible matches.
Of course, results calculated using such a small dataset are not very meaningful. If we obtained these scores from a large, representative evaluation dataset, it would indicate to us that Strategy 1 risks introducing many incorrect relationships, while Strategy 2 would be unlikely to do so. In both cases, we would still expect approximately half of the possible relationships to be missing from the strategies’ outputs.
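Precision and recall for the toy example can be computed directly. The sketch below uses the standard definitions, with the convention (our assumption about how to count the example) that an empty output is never a false positive:

```python
# Toy example from the table above; None stands for "empty output".
expected   = ["ID 1", "ID 2", None]
strategy_1 = ["ID 1", "ID 3", None]
strategy_2 = ["ID 1", None,   None]

def precision_recall(predicted, expected):
    # True positives: a match was returned and it was correct.
    tp = sum(p is not None and p == e for p, e in zip(predicted, expected))
    # False positives: a match was returned but it was wrong.
    fp = sum(p is not None and p != e for p, e in zip(predicted, expected))
    # False negatives: a correct match existed but was missed or mismatched.
    fn = sum(e is not None and p != e for p, e in zip(predicted, expected))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

print(precision_recall(strategy_1, expected))  # (0.5, 0.5)
print(precision_recall(strategy_2, expected))  # (1.0, 0.5)
```

The numbers reflect the discussion above: Strategy 1’s false positive lowers its precision to 0.5, Strategy 2’s refusal to guess keeps its precision at 1.0, and both recover only half of the possible matches.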
Which one is more important to prioritise, precision or recall? It depends on the use case. As a general rule, if you want to use the strategy in a fully automated way, without any form of manual review or correction of the results, we recommend paying more attention to precision. Privileging precision will allow you to better control the number of incorrect relationships added to your data. If you want to use the strategy in a semi-automated fashion, where there is a manual examination of and a chance to correct the results, pay more attention to recall. Doing so will guarantee that enough options are presented during the manual review stage and fewer relationships will be missed as a result.
To get a more balanced estimation of performance, we can also consider both precision and recall at the same time using a measure called F-score. F-score combines precision and recall into a single number, with variable weight given to either aspect. There are three commonly used types, each calculated as the weighted harmonic mean of precision and recall:
Each of these variants allows for fine-tuning the evaluation metric to align with your expectations for a specific matching task. Choose whichever reflects the relative importance of precision versus recall for your use case.
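The weighted harmonic mean described above is usually written as the F-beta score, where beta controls the relative weight of recall versus precision; our assumption here is that the three common variants referred to are F1 (balanced), F0.5 (precision-leaning), and F2 (recall-leaning):

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta > 1 weights recall more.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Using Strategy 2's scores from the earlier example (precision 1.0, recall 0.5):
print(round(f_score(1.0, 0.5, beta=1.0), 2))   # 0.67  (F1)
print(round(f_score(1.0, 0.5, beta=0.5), 2))   # 0.83  (F0.5 rewards the high precision)
print(round(f_score(1.0, 0.5, beta=2.0), 2))   # 0.56  (F2 penalises the low recall)
```

Because the strategy’s precision exceeds its recall, the precision-leaning F0.5 rates it highest and the recall-leaning F2 rates it lowest, which is exactly the fine-tuning lever the post describes.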
To summarise, to avoid falling prey to misleading sales pitches or silly quizzes, it is important to have a good understanding of the performance of any strategies you are building or integrating. With thorough evaluation, including a representative dataset and carefully considered metrics, we can estimate the quality of matching and, by extension, its resulting relationships.
Now that we’ve covered how to evaluate effectively, we can move on to some other aspects of metadata matching. Our next blog post will take a final, more holistic view of matching, exploring some complementary considerations to all of the preceding. Stay tuned for more!
https://www.crossref.org/blog/how-good-is-your-matching/
date: 2024-11-06, from: ETH Zurich, recently added
Walzer, Alexander N.
http://hdl.handle.net/20.500.11850/704476
date: 2024-11-05, from: Internet Archive Blog
The following guest post from Aaron O’Donovan (aodonovan@columbuslibrary.org), Columbus Metropolitan Library Special Collections Manager, is part of a series written by members of the Internet Archive’s Community Webs program. Community […]
date: 2024-11-05, from: Scholarly Kitchen
As artificial intelligence begins to play an ever-bigger role in the scholarly publishing landscape, how might it help solve some of the biggest challenges facing publishers?
The post The Top Ten Challenges, Needs, and Goals of Publishers – and How AI Can Help in Digital Transformation and the Open Science Movement appeared first on The Scholarly Kitchen.
date: 2024-11-04, from: Internet Archive Blog
The following Q&A between writer Caralee Adams and journalist Philip Bump of The Washington Post is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more […]
https://blog.archive.org/2024/11/04/vanishing-culture-qa-with-philip-bump-the-washington-post/
date: 2024-11-04, from: Standard Ebooks, new releases
An author struggles to complete his book without having any practical experience of Satanism.
https://standardebooks.org/ebooks/j-k-huysmans/la-bas/keene-wallace
date: 2024-11-04, from: Scholarly Kitchen
The Society for Scholarly Publishing is launching the Mental Health Awareness and Action Community of Interest (CoIN) Group.
The post Mental Health Awareness Mondays – SSP Launches the Mental Health Awareness and Action Community of Interest Group appeared first on The Scholarly Kitchen.