User talk:Citation bot

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least an hour.

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.
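
As a rough illustration of the DUPLICATE_xxx= behaviour described in the notice above, here is a minimal Python sketch. It is not the bot's code, and the function name `mark_duplicates` is hypothetical. The notice only says that one of the two identical parameters is renamed; that the later occurrence is the one renamed is an assumption made here.

```python
def mark_duplicates(params):
    """Rename repeated parameter names to DUPLICATE_<name>.

    `params` is a list of (name, value) pairs in template order.
    Assumption: the first occurrence keeps its name and any later
    occurrence is renamed; the notice does not say which one the bot picks.
    """
    seen = set()
    result = []
    for name, value in params:
        if name in seen:
            # A second (or later) copy of the same parameter: flag it for a human.
            result.append((f"DUPLICATE_{name}", value))
        else:
            seen.add(name)
            result.append((name, value))
    return result
```

An editor who then sees `DUPLICATE_title=` in a template knows to keep one of the two values and remove or rename the other.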

Automatic cite magazine conversions

new bug
Reported by
adamstom97 (talk) 03:35, 1 May 2022 (UTC)[reply]
What happens
Quite often this bot converts {{cite web}} into {{cite magazine}} just because the website that is being cited is associated with a magazine. I believe this is incorrect, {{cite magazine}} should only be used if an actual magazine with physical pages is being cited. I'm not sure in what circumstances a bot would be able to determine that.
Relevant diffs/links
I have been ignoring or reverting these changes for a long time so there are plenty of examples out there, here is one recent one: diff.
We can't proceed until
Feedback from maintainers

Online magazines are magazines. The bot's behaviour is correct. Headbomb {t · c · p · b} 03:38, 1 May 2022 (UTC)[reply]

They aren't magazines, they are websites. They don't have physical pages, they don't have physical publishers, they don't use identifiers such as isbn, etc. If someone cites a web source with all of the correct parameters according to {{cite web}} this should not be randomly changed to {{cite magazine}} by a bot, which should be fixing actual errors. - adamstom97 (talk) 03:49, 1 May 2022 (UTC)[reply]
"Online magazines". It's in the name. And they do have identifiers, which for magazines are ISSNs. In this case, 1049-0434 (print) and 2169-3188 (online) Headbomb {t · c · p · b} 03:53, 1 May 2022 (UTC)[reply]
A quick Google of "do websites count as online magazines" gave several explanations for why they are not, and our own article clearly describes something that is not a basic website. Entertainment Weekly may have an ISSN for its online magazine (which this page suggests is a digital version of the magazine that can be read like a normal magazine on various devices), but it looks like the website is a separate thing. Is there consensus somewhere that citing a website that happens to be associated with a magazine using {{cite web}} is incorrect? Or was it just decided by the bot people that it needed to change those refs? - adamstom97 (talk) 04:49, 1 May 2022 (UTC)
Comment: it's not just with {{Cite magazine}}, Citation Bot also converts {{Cite web}} templates to {{Cite newspaper}} and {{Cite news}} from time to time, which I feel is unnecessary. InfiniteNexus (talk) 21:19, 1 May 2022 (UTC)[reply]
Pinging @Gonnym here as they may have something to add. InfiniteNexus (talk) 21:23, 1 May 2022 (UTC)[reply]
Thanks for the ping. I also don't agree with these changes and I feel that if the bot wants to continue with them, it should actually see if there is consensus for it. Gonnym (talk) 23:40, 1 May 2022 (UTC)[reply]
{{Cite magazine}} states, "This Citation Style 1 template is used to create citations for articles in magazines and newsletters." So citations of articles from the website should remain web citations and not be converted to magazine citations like the bot currently does. -- Zoo (talk) 03:55, 2 May 2022 (UTC)[reply]
Just letting this page's watchers know that a simultaneous discussion is going on at WT:MCU § Entertainment Weekly citation type. Perhaps it would be better to keep things centralized. InfiniteNexus (talk) 04:06, 2 May 2022 (UTC)
I did not see an immediate issue with these changes, as they have occurred on MCU articles such as WandaVision for Rolling Stone, but they are widespread, and the templates are only intended to be used to cite the publications themselves, whereas cite web should remain just for the websites themselves. I'm unsure if there was any consensus to make this change to the bot, but I do not see a need for changing the cite web templates. Cite news and cite web are generally the same, although cite web should remain for sites like, which has also been changed. Trailblazer101 (talk) 13:55, 2 May 2022 (UTC)
It also converts Bleeding Cool from cite web to cite news. Trailblazer101 (talk) 17:49, 3 May 2022 (UTC)[reply]

I also agree that Entertainment Weekly and Rolling Stone, if using the |url= para and clearly pulling an article from their websites, should be using {{Cite web}}, not {{Cite magazine}}. Obviously, if you are citing a print article from either, then {{Cite magazine}} should be used then (and in many of those cases, the |url= parameter would not be used at all). - Favre1fan93 (talk) 16:11, 6 May 2022 (UTC)[reply]

Also, Entertainment Weekly was never affected by this tool until recently, so what changed? - Favre1fan93 (talk) 16:11, 6 May 2022 (UTC)[reply]
It appears it could have been from this request here back in April. In this instance, the only change should be if {{cite journal}} is being used to cite Entertainment Weekly, NOT {{Cite web}}. The publication shouldn't have a "catch all" adjustment. - Favre1fan93 (talk) 16:13, 6 May 2022 (UTC)[reply]

Ok so I'm pretty sure that this part of the code needs to be edited at line 581 to remove 'entertainment weekly' and 'rolling stone' from "ARE_MAGAZINES" and move those to line 590 for "ARE_MANY_THINGS". I don't know if any of the bot's other files need adjustment, but this seemed to be where the issue lies. And honestly, I feel that many more of the entries in "ARE_MAGAZINES" need to move too... - Favre1fan93 (talk) 23:51, 6 May 2022 (UTC)
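
To make the proposed move concrete, here is a minimal Python sketch of how two such lists might drive the choice of template. This is an assumption-laden illustration, not the bot's actual logic: the function `choose_template` and its behaviour are hypothetical, and only the list names "ARE_MAGAZINES" and "ARE_MANY_THINGS" and the two entries come from the comment above.

```python
def choose_template(current, work, are_magazines, are_many_things):
    """Return the template name to use for a citation of `work`.

    Hypothetical rule set, assumed for illustration only:
    - works listed in `are_many_things` are left as the editor wrote them;
    - works listed in `are_magazines` are converted from cite web to cite magazine;
    - everything else is left alone.
    """
    key = work.strip().lower()
    if key in are_many_things:
        return current
    if key in are_magazines and current == "cite web":
        return "cite magazine"
    return current


# Configuration as described in the comment above: both titles are magazine-only.
before = {"are_magazines": {"entertainment weekly", "rolling stone"},
          "are_many_things": set()}

# Proposed configuration: both titles moved to the "many things" list.
after = {"are_magazines": set(),
         "are_many_things": {"entertainment weekly", "rolling stone"}}
```

Under these assumptions, `choose_template("cite web", "Entertainment Weekly", **before)` gives `"cite magazine"`, while the same call with `**after` leaves the citation as `"cite web"`.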

@BrownHairedGirl: apologies for the ping, but I saw you as contributor on this part of the GitHub. Are you able to assist in these changes? - Favre1fan93 (talk) 23:54, 6 May 2022 (UTC)[reply]
No prob at all with the ping, @Favre1fan93. It's a helpful way of letting me know that I might be able to help.
I have a vague recollection of making a related change a few weeks ago, so I could probably make this one if I approved of it and if there was consensus to do it.
But I don't approve of this change; I oppose it. I agree 100% with @Headbomb's succinct comment that "Online magazines are magazines. The bot's behaviour is correct". BrownHairedGirl (talk) • (contribs) 00:08, 7 May 2022 (UTC)
@BrownHairedGirl: The documentation for Cite magazine says: "This Citation Style 1 template is used to create citations for articles in magazines and newsletters." I would not use this to cite an article appearing on Entertainment Weekly's website, nor Rolling Stone's; I'd use Cite web (or news). It also doesn't help that EW is ceasing print publication, which in my view makes Cite magazine even less appropriate. At the very least, I believe EW should be removed from what was done with the add requested back in April, because that was for a single-use instance where an article was citing the print magazine. I don't know how to check, but I'd gather the vast majority of EW cites on the project are from their online articles. - Favre1fan93 (talk) 00:18, 7 May 2022 (UTC)
To be honest, is there any visual difference between {{Cite web}}, {{Cite magazine}}, and {{Cite news}}? Or between |url= and |magazine= and |newspaper=? If not, what is the purpose/benefit of the bot changing the templates/parameters? InfiniteNexus (talk) 00:24, 7 May 2022 (UTC)[reply]
@InfiniteNexus: the diff is that {{Cite magazine}}, and {{Cite news}} support some parameters used only for paper publications, but {{Cite web}} does not support those parameters. BrownHairedGirl (talk) • (contribs) 00:33, 7 May 2022 (UTC)[reply]
Which is why making such a change to online magazine articles is pointless. They're not going to have page numbers or ISSNs and stuff like that. InfiniteNexus (talk) 00:35, 7 May 2022 (UTC)[reply]
@Favre1fan93: I get that. But still, I agree 100% with Headbomb's comment that "Online magazines are magazines. The bot's behaviour is correct", and you appear to have overlooked our views when interpreting the documentation for Cite magazine.
So for a ref to EW or Rolling Stone, I would prefer {{Cite magazine}}.
For me this is the same issue as using {{Cite news}} for a ref to The Guardian newspaper. Sure, most en.wp editors use the website rather than print, but for The Guardian and The Observer that's appropriate because both adopted a "Digital First" strategy in 2011.
"Digital First" did not make The Guardian cease to be a daily newspaper.
"Digital First" did not make The Observer cease to be a Sunday newspaper.
The same applies to Rolling Stone magazine and to EW. The move online does not make either of them cease to be a magazine. They are now online magazines, not ex-magazines. BrownHairedGirl (talk) • (contribs) 00:31, 7 May 2022 (UTC)[reply]
Well seeing as the change made a few weeks back in reference to this request was for a specific instance of the print EW, and sourcing that site had been uncontested until that point (hence this discussion), I think EW should be removed from that part of the code locking it in to just Cite magazine. Or at the very least move it to "ARE_MANY_THINGS" so hopefully instances of EW using Cite web won't be touched by this bot/tool as "incorrect" (if I'm understanding these distinctions correctly). - Favre1fan93 (talk) 02:35, 7 May 2022 (UTC)[reply]
BrownHairedGirl you say that you agree with Headbomb's statement that Online magazines are magazines. The bot's behaviour is correct, but I don't think either of you have explained why you believe this or if there is any consensus to support it. In my response to Headbomb above I gave what I thought was reasonable evidence that online magazines and magazines are not actually the same thing, are you able to refute that or have you guys just unilaterally decided to make these widespread changes that are clearly controversial? - adamstom97 (talk) 06:46, 7 May 2022 (UTC)[reply]
@Adamstom.97: sorry, but I did not find your comments to be either substantive or persuasive. BrownHairedGirl (talk) • (contribs) 11:11, 7 May 2022 (UTC)[reply]
So you guys have just decided that you want it to be like this with no evidence or reasoning to support your position, and it sounds like you are planning to ignore the fact that many editors disagree with you (including some who do have evidence and reasoning that goes against your position). It would be fine for you to continue to do whatever you want if this toy of yours wasn't changing many articles across Wikipedia without consensus, but unfortunately it is. - adamstom97 (talk) 11:20, 7 May 2022 (UTC)[reply]
@Adamstom.97: Your false claim that I have offered no reasoning is deeply uncivil conduct. BrownHairedGirl (talk) • (contribs) 14:51, 7 May 2022 (UTC)[reply]

I am curious, is there a location/past discussion that justifies which publications should be using which citation template? Has consensus been reached for any? Or have requests just come to this bot noting "issues" and then those that maintain it put the publications in certain categories? I am asking because I felt seeing the tool go through and adjust EW from Cite web to magazine as vandalism given what I felt was a stable status quo on the matter. That is why at the very least I feel, as I mentioned above, that moving it to "ARE_MANY_THINGS" would be acceptable, since the comment in the code states "These are things that are both websites and newspapers", which applies here. - Favre1fan93 (talk) 15:28, 7 May 2022 (UTC)

The tool is still being used on pages I watch and making these adjustments (obviously because nothing has changed), but I am viewing this now as WP:DISRUPTIVE. I continue to suggest my change be implemented, or a full removal of EW from the magazine-only list be made until some sort of great consensus can be reached on the matter. Though I have found the area of concern, I'm not confident in my own ability to make edits to the GitHub lest it causes greater issues. - Favre1fan93 (talk) 20:41, 7 May 2022 (UTC)
Is there a venue where we can get more voices on this? InfiniteNexus (talk) 20:56, 7 May 2022 (UTC)[reply]
I agree with @Favre1fan93 that at the very least, move EW to ARE_MANY_THINGS as it seems to fit what EW is more than just a website or a magazine. It's gotten tedious to constantly clean up after the bot and I've held off on running the bot myself, hoping for something to be changed first. -- Zoo (talk) 21:09, 7 May 2022 (UTC)[reply]
@InfiniteNexus: I have left a discussion notice at Help talk:Citation Style 1 (here) which seemed most appropriate. - Favre1fan93 (talk) 21:56, 7 May 2022 (UTC)[reply]
Add me to the list of believers in "Online magazines are magazines". Our article Entertainment Weekly clearly classifies this source as a magazine. I think {{cite magazine}} is an appropriate choice for this source. —David Eppstein (talk) 22:23, 7 May 2022 (UTC)[reply]
@David Eppstein: EW was a physical magazine that is now just an online magazine, but the sources that are being automatically changed to {{cite magazine}} are not for either, they are for the EW website which is a separate thing. - adamstom97 (talk) 23:16, 7 May 2022 (UTC)[reply]
Keeping in mind that the bot seems to be applying CS1 citation guidelines (although this is not explicit in the documentation):
If I remember correctly, {{cite web}} was originally implemented to cite websites as sources that cannot fit any other classification. In general {{cite xxx}} CS1 templates cite by work (source) type, regardless of the delivery medium or publishing platform. In this case the work type is a serial (magazine). The bot is correct in its application of the CS1 formatting guidelines. (talk) 00:59, 8 May 2022 (UTC)[reply]
But it isn't a magazine, that is the whole point of this discussion. It is related to a magazine, but these sources are for the website not magazine articles. - adamstom97 (talk) 01:05, 8 May 2022 (UTC)[reply]
I suppose a real-world example is needed. Unless I missed a diff posted somewhere of such presumably erroneous conversion. (talk) 01:13, 8 May 2022 (UTC)
Just one example from the diff included in the report summary at the start of this section: this source is an article on the EW website that was cited using {{cite web}}. It was then automatically converted into {{cite magazine}} because EW produces a magazine, but the source is for the website, not the magazine. Some users have claimed that because it comes from the magazine company's website it is actually an article from an online magazine, but that is not the case. EW's online magazine is literally a digital version of a physical magazine and is available from digital magazine provider websites. It is a separate thing from their website. Sometimes they may include articles (or partial articles) from the magazine in a web article, but that still does not count because the actual magazine has not been cited. - adamstom97 (talk) 01:31, 8 May 2022 (UTC)
This is not correct. Many print sources have digital facsimiles, and also web-delivered editions that may or may not have different (usually additional) content. The distinction regarding the medium is independent of the type of source cited. EW is still a magazine, that may have print/digital/audio or whatever editions published. (talk) 12:50, 8 May 2022 (UTC)
Regardless of "is it or isn't a magazine", using {{Cite web}} for such content on EW's website is not incorrect. I feel at this point the crux of my issue is that this bot is unilaterally putting content from this source in {{Cite magazine}}. As I've been pointing out, if it is put under a different classification within the bot, uses of Cite web should remain as they are (which is how they had been for this publication until only a few weeks ago, with no issues), but presumably if the bot finds a bare URL, it would then format it to Cite magazine. - Favre1fan93 (talk) 15:13, 8 May 2022 (UTC)
It is also not incorrect to use a hypothetical {{cite serial}} for any EW version or even the also hypothetical {{cite print}} for content on EW's paper version. But CS1 generally does not cite per medium, but per work (source) type/function. If the source is classifiable as a serial: subtype magazine, and CS1 provides a specific facility for the classification, then it is best to use that specific facility. It seems that the problem here is one of disputed classification. The bot apparently applies CS1 guidelines, as also noted above. To resolve the classification dispute, CS1 would perhaps be the proper forum. But I think the current CS1 format guideline is OK, and the bot is correct in applying it. (talk) 16:33, 8 May 2022 (UTC)[reply]
For clarification, the issue at hand is a change made to the GitHub code that put EW under such classification. Another question I have regarding that code is how publications were assigned to each heading. Just by the maintainers as "bugs" arose? This is the part of the source code in question, specifically this change that was made off of this bug request a few weeks back. Until that point, EW was not listed here and things were functioning "fine". Entertainment Weekly in my view should just be moved to "ARE_MANY_THINGS" given the comment for that classification is "These are things that are both websites and newspapers" which is 100% what EW is. - Favre1fan93 (talk) 17:18, 8 May 2022 (UTC)
The utility or applicability of the ARE_MANY_THINGS code may be ripe for questioning. Many of the items included could be characterized as one thing, or mainly one thing. The comment is comparing apples and oranges. There are also things that are both printed matter and magazines, but we don't use them interchangeably in CS1 citations. (talk) 20:45, 8 May 2022 (UTC)[reply]
{{cite serial}} is a real, though rarely used, template for episodic television, radio, web broadcast programs. Perhaps you meant "If the source is classifiable as a [periodical]..."
Trappist the monk (talk) 17:42, 8 May 2022 (UTC)[reply]
Correct. "Serial" was used in the bibliographic sense, which covers any periodically published item regardless of the medium. For some reason I thought {{cite serial}} was no longer around. (talk) 20:24, 8 May 2022 (UTC)[reply]
Add me to the list of those who consider that "online magazines are magazines". The medium is irrelevant. Do you categorise documents according to the writing implement used to write them? A magazine is a magazine if its publishers say it is. Even if it is posted on a wall as samizdat, it is still a magazine. --John Maynard Friedman (talk) 10:34, 8 May 2022 (UTC)
To pick an orthogonal example, consider The Economist. It is a weekly, printed on gloss paper in demitab format. Per WP:DUCK and the logic of some editors above, we should use cite magazine. But its publishers say it is a newspaper, so we use cite newspaper. --John Maynard Friedman (talk) 12:01, 8 May 2022 (UTC)[reply]
A magazine is a magazine is a magazine is a magazine. The bot's edit was semantically correct. This is not obvious to readers who consume cs1|2 citations visually, because the visual renderings of the example bot edit in both {{cite web}} and {{cite magazine}} are identical. For those who consume the citations using reference management software, there is a notable difference between the metadata emitted by {{cite web}} and the metadata emitted by {{cite magazine}}. The source is a magazine so it should be cited as such using the proper cs1|2 template, {{cite magazine}}.
Off-topic: Thor: Ragnarok has 190 cs1|2 templates that have |archive-url= parameters. Archived snapshots at are no longer available. Those who care about Thor: Ragnarok might want to start revising those cs1|2 templates so that the original sources are not permanently lost when they go 404 due to link rot.
Trappist the monk (talk) 17:42, 8 May 2022 (UTC)[reply]
No one is saying a magazine is not a magazine, I am saying that there is a difference between a magazine, a digital magazine, and a website, and no one who disagrees with me has provided any actual reasoning or proof to support them other than just wanting it to be that way. And the fact is that whatever you believe about digital magazines vs. websites, it is objectively correct to use {{cite web}} to cite a web article. Regardless of the digital magazine vs. website debate, there are clearly many editors who think this bot should not be unilaterally changing {{cite web}} to {{cite magazine}}, and since there apparently was never any consensus to do it in the first place I think it's clear that it at least needs to be paused until the people who want this to happen have gained consensus for it. - adamstom97 (talk) 05:36, 9 May 2022 (UTC)[reply]
@Trappist the monk: If a web article from EW/Rolling Stone etc. (that is not in a digital/print magazine at all) uses the parameters of {{Cite web}} correctly and as intended for that template, would a management software reading the metadata be confused by what it sees or expect something different? - Favre1fan93 (talk) 02:13, 10 May 2022 (UTC)[reply]
Rolling Stone and Entertainment Weekly are magazines so their articles, regardless of how they are distributed – paper-form, electronic facsimiles of the paper-form, online portals, or any other way – are cited as magazine articles because the sources (Rolling Stone and Entertainment Weekly) are magazines. When the source responsible for the article is a magazine, cite it as a magazine.
Trappist the monk (talk) 14:37, 10 May 2022 (UTC)[reply]
But that didn't answer my question and curiosity. Stepping back from the definition of these sites, will management software reading the metadata, which you pointed out, be confused by what it sees or expect something different if Cite web is used? - Favre1fan93 (talk) 15:49, 10 May 2022 (UTC)[reply]
cs1|2 classifies each citation into one of these genres:
article, book, bookitem, conference, preprint, report, unknown
{{cite magazine}} uses the genre article because all cs1|2 templates require |title= and because magazines contain articles. {{cite web}} uses genre unknown because cs1|2 cannot know, from available parameter values, what the editor is citing. For readers who consume the citations using reference management software, all {{cite web}} templates will be lumped together in the unknown genre. Misusing {{cite web}} to cite an article in a magazine, regardless of how that article is delivered from the publisher to the reader, is a disservice to the reader.
Trappist the monk (talk) 17:17, 10 May 2022 (UTC)[reply]
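
The effect described above, where reference management software lumps every {{cite web}} citation into the unknown genre, can be sketched in a few lines of Python. Only the two template-to-genre pairs stated in the comment are included; the function `group_by_genre` and the sample titles are hypothetical, and nothing here reflects how cs1|2 actually emits its metadata.

```python
from collections import defaultdict

# Only the two pairs stated in the comment above; other cs1|2 templates are omitted.
GENRE_BY_TEMPLATE = {
    "cite magazine": "article",
    "cite web": "unknown",
}


def group_by_genre(citations):
    """Group citation titles by the genre their template maps to.

    `citations` is a list of (template_name, title) pairs. A template that is
    not in GENRE_BY_TEMPLATE raises KeyError, since this sketch does not cover it.
    """
    groups = defaultdict(list)
    for template, title in citations:
        groups[GENRE_BY_TEMPLATE[template]].append(title)
    return dict(groups)
```

With this mapping, the same magazine article lands in the "article" group when cited with {{cite magazine}} and in the "unknown" group when cited with {{cite web}}.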
Then what's the point of the "ARE_MANY_THINGS" designation of this bot, if each publisher of citeable material should be fit, more or less, to one cite template? - Favre1fan93 (talk) 15:31, 16 May 2022 (UTC)[reply]
There is no point to it, and neither are the items (listed as falling under that designation) "many things". As far as I can tell they can be properly described as one thing without diminishing them. The routine seems like surplus code whose main function is to add complexity without any clear benefit. (talk) 15:10, 17 May 2022 (UTC)[reply]
Uh, how about only converting {cite web |url=<magazine, newspaper, or journal website>} ONLY IF it ALSO has one or more actual magazine, newspaper, or journal parameters, such as |page= |pages= |issn= (and maybe others that can be discovered)? ((Parameters that {cite web} does not have, such as |magazine= |volume= |issue= are possible, but less likely, because they trigger warnings.)) -A876 (talk) 23:37, 9 May 2022 (UTC)[reply]
I think that is logical and would be a good solution. - adamstom97 (talk) 23:50, 9 May 2022 (UTC)[reply]
If that can be coded into the bot, then I also agree with this option. - Favre1fan93 (talk) 02:13, 10 May 2022 (UTC)[reply]
Those parameters would be available very very rarely, if ever. So this proposal would almost entirely prevent the bot from using Cite news/newspaper. BrownHairedGirl (talk) • (contribs) 12:54, 11 May 2022 (UTC)[reply]
But it would ensure that when it does do that it is doing it correctly. - adamstom97 (talk) 06:03, 14 May 2022 (UTC)[reply]
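
A876's proposal above amounts to a guard on the conversion. A minimal Python sketch follows, assuming the citation's parameters are available as a dictionary; the function name `should_convert` and the flag `is_periodical_site` are hypothetical, and the parameter set contains only the three names given in the proposal.

```python
# Parameters named in the proposal as evidence of a print-style citation.
PERIODICAL_ONLY_PARAMS = ("page", "pages", "issn")


def should_convert(template, params, is_periodical_site):
    """Decide whether a cite web template should be converted.

    Convert only when all three conditions hold:
    - the template is cite web;
    - the URL belongs to a known magazine, newspaper, or journal site
      (decided elsewhere and passed in as `is_periodical_site`);
    - at least one periodical-only parameter is present and non-empty.
    """
    if template != "cite web" or not is_periodical_site:
        return False
    return any(params.get(name, "").strip() for name in PERIODICAL_ONLY_PARAMS)
```

As BrownHairedGirl notes, those parameters are rarely present on web citations, so a guard like this would let very few conversions through; whether that is a drawback or the point is what the thread is arguing about.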

Why is bot changing website → magazine? These are references to website, not magazine. It should be disabled. Eurohunter (talk) 19:39, 9 May 2022 (UTC)[reply]

Billboard is a magazine. Headbomb {t · c · p · b} 19:47, 9 May 2022 (UTC)[reply]

Slack in bot usage

It seems like demand for the bot has declined quite a bit, and there have been periods with no jobs running. There are often long stretches with only one job running. Perhaps the job size limit could be increased a bit, and see how it goes? Abductive (reasoning) 15:23, 13 May 2022 (UTC)[reply]

define("MAX_PAGES", 2850); now set. AManWithNoPlan (talk) 20:22, 13 May 2022 (UTC)[reply]
Part of the slack might be some significant speed-ups made to the code implemented recently. AManWithNoPlan (talk) 20:23, 13 May 2022 (UTC)[reply]
Isn't 2850 the current limit? Abductive (reasoning) 03:52, 14 May 2022 (UTC)[reply]
Added a thousand. Now 3850. AManWithNoPlan (talk) 11:43, 14 May 2022 (UTC)
Thanks! Abductive (reasoning) 19:01, 14 May 2022 (UTC)[reply]
Curious - why 3850, instead of a rounder number like 4000 or 4096? (Also, when'd the limit get raised from 2200 to 2850?) Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 19:56, 14 May 2022 (UTC)[reply]
So I tried a single-shot using the ☑ Citations button. It failed as usual. Quelle surprise. Not. If priority is to be given to these bottom-dragging trawls, please just stop wasting everybody else's time and withdraw the button. --John Maynard Friedman (talk) 20:13, 14 May 2022 (UTC)[reply]
Oftentimes the page gets processed eventually even if you see a 502 error. Also, your OAuth login to Meta expires in something like 24 hours, so if it has been more than a day since you tried to run the bot, a delay in Meta bringing up the login will look like a delay in the bot. These Meta delays are quite common. Abductive (reasoning) 00:01, 15 May 2022 (UTC)[reply]
That's activating through the 'expand citations' options. The ☑ Citations button times out. Headbomb {t · c · p · b} 02:15, 15 May 2022 (UTC)[reply]
Exactly. Like most users, I am not using a bot. --John Maynard Friedman (talk) 10:34, 15 May 2022 (UTC)[reply]

Put me down for a return to a ~1k page limit. Headbomb {t · c · p · b} 02:17, 15 May 2022 (UTC)[reply]

And me. 2850 was anti-social. 3850 looks like enemy action. Why are the very few (<5?) bot operators allowed to make this tool unusable for everyone else. Like I suspect most other 'single shot' users, I had given up on using the button because failure was the usual option. When any efficiency gain is immediately thrown to the wolves, that will persist. If the bot limit is to be increased rather than heavily reduced, let's stop pretending that there is a credible single shot option. --John Maynard Friedman (talk) 10:34, 15 May 2022 (UTC)[reply]
I agree with JMF.
If bot tasks are indeed making human submissions impossible, then some sort of queuing system needs to be fixed or placed to make sure the bot tasks go as planned while prioritizing human submissions to the bot. This should be of utmost importance as any other bug fixes are worthless if people can't even use the thing.
Even and, with a budget probably 1000x smaller than the WMF, have priority queues, so the human submitters get priority over what I presume are automated submitters. So it can be done; the question now is will the WMF put in the resources to do it? Rlink2 (talk) 13:00, 15 May 2022 (UTC)
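
The queuing idea suggested above can be sketched with a small priority queue in Python. This is only an illustration of the general technique, built on the standard `heapq` module; the class `JobQueue` and the priority names are hypothetical, and the thread does not describe how the bot actually schedules jobs.

```python
import heapq
import itertools

# Lower number = served first. Names are assumptions for this sketch.
HUMAN = 0   # interactive single-page request
BATCH = 1   # large category or list run


class JobQueue:
    """Priority queue: human requests first, first-come first-served within a class."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal-priority jobs keep arrival order.
        self._counter = itertools.count()

    def submit(self, job, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def next_job(self):
        _priority, _order, job = heapq.heappop(self._heap)
        return job

    def __len__(self):
        return len(self._heap)
```

If two category runs and two single-page requests are submitted in interleaved order, `next_job()` returns both single-page requests before either category run. A queue like this only reorders waiting jobs; it does not free a slot that a long batch job is already occupying.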
It's not really why the citation button chokes. That can be due to a ton of reasons. The issue with runs bigger than 1000 pages is that certain users run the bot against more-or-less random categories and it hogs the bot for a really long time, at the expense of other more targeted runs. Headbomb {t · c · p · b} 13:34, 15 May 2022 (UTC)[reply]
I agree with Headbomb.
Last year, I reported several times on how Abductive was abusing Citation bot by running huge, untargeted runs on categories, which generated very low returns. Abductive repeatedly argued verbosely that having the bot trawl through vast sets of pages where it had almost nothing to do was somehow an appropriate use of the bot. Many other editors pointed out how that was a waste of a precious resource, because the bot could otherwise be processing jobs with much higher rates of return, but Abductive just parroted slogans along the lines of "my job is as valid as anyone else's", completely ignoring the low productivity of his jobs.
Mercifully, in the last few months, Abductive stopped doing these low-return speculative trawls. But in the last week or so, they have resumed, with the same issues: most pages in Abductive's batches are not edited, and most of those which are edited have only trivial changes such as removing unused parameters or changing the style of quotation marks.
This is not a technical problem. It is a human problem of one editor who has a severe WP:IDHT issue, and the remedy is simply to ban Abductive from using Citation bot. BrownHairedGirl (talk) • (contribs) 14:44, 15 May 2022 (UTC)[reply]
Oh, ok so it's not a technical issue, but rather an issue with one person who is supposedly abusing the bot. I now understand. Well on Wikipedia, consensus is required, so if Abductive is using the bot in a way that is unfair to other editors, and other editors tell him to stop doing it, then he should change or explain his behavior.
I do recall reading about this behavior in the talk page archives a while ago. User_talk:Citation_bot/Archive_29#Bot_still_being_abused_by_Abductive - not sure if it is relevant. Regardless, this is a serious issue that needs to be discussed in its entirety. I think the previous thread was closed too early. Rlink2 (talk) 15:01, 15 May 2022 (UTC)
@Rlink2: there were at least half-a-dozen threads about Abductive's abuse of Citation bot, some of which were very lengthy. The issue has already been discussed in its entirety, many times ... and Abductive's tediously self-serving explanations have been heard and rejected many times.
The problem is well-documented, so there is no point in rehashing those discussions again.
Abductive is:
  1. running low-return speculative trawls of categories
  2. flooding the job queue with so many single-page requests that the other requests don't get seen, leading to available slots going unused.
The only issue now is: will Abductive voluntarily desist from these long-running abuses of Citation bot, or do we have to ban Abductive from abusing the bot? BrownHairedGirl (talk) • (contribs) 15:16, 15 May 2022 (UTC)[reply]
The definition of Abductive reasoning is to

start with an observation or set of observations and then seeks the simplest and most likely conclusion from the observations.

The observations are the following:
  • Abductive continues to push through even though other editors tell them to stop. Per Wikipedia guidelines: "Serious or repeated breaches or an unwillingness to accept feedback from the community (Wikipedia:I didn't hear that) may be grounds for sanction." If there are many threads regarding this, then it shows a repeated unwillingness to accept feedback, instead just pushing right through.
  • This comment from Abductive stood out to me: "The fundamental problem is that we have all been running the bot so much that it has fixed most articles by now, so that category runs aren't going to have a high enough rate of return." So he acknowledged that the jobs were not useful, yet when BrownHairedGirl asked why he is still wasting his time with the jobs, he basically had no response.
So the simplest conclusion is that either Abductive should stop the behavior or be banned from the bot.
Note that I am trying to stay neutral for now because I don't know the whole story. But it seems to be a "one vs many" situation, with Abductive being the "one". Rlink2 (talk) 15:33, 15 May 2022 (UTC)[reply]
@Rlink2: the whole story is in the talk page archives, if you have the time and energy to wade through it. It includes many occasions when I documented in detail how Abductive was abusing the bot, with links which make it verifiable.
One of Abductive's defences of this conduct is that they are using spare capacity in the bot, which would otherwise be wasted: better, says Abductive, to use it for low-return tasks than to have it unused.
That is superficially plausible, but only superficially. The flaw in this logic is that unless the bot has spare capacity at any given moment, it cannot respond promptly to a single-page request by an editor who is using the bot interactively to assist in their manual work on a page. It may mean that the bot does not ever process that single-page request, because it times out before a slot is available.
So ... trying to use all the bot's capacity is actually a fundamentally bad idea. Like spaces in a car park or beds in a hospital, 100% utilisation is a nightmare for users. Spare capacity is what makes the system usable.
It is this problem which @John Maynard Friedman (JMF) complains about, and rightly so. The bot is an invaluable tool for helping to build complex citations while writing an article. See User:Headbomb/Tips and Tricks for examples of how it can help.
This ability of the bot to take an isbn or a doi or a handle and build a full reference to a book or journal is a huge timesaver. I often use it myself when manually filling the refs on an article (as I did 3 times yesterday), and in my experience it can save 5-10 minutes for each reference to a scholarly journal. (The complexity of those refs needs a lot of careful checking, so 5-10 minutes each is a genuine number).
Abductive's approach of using-all-the-bot's capacity makes the bot unavailable for this sort of work, which actively sabotages the efforts of those who improve articles. I thoroughly understand why JMF is so angry, and I share that anger. BrownHairedGirl (talk) • (contribs) 16:17, 15 May 2022 (UTC)[reply]

All that argument has merit but let's not lose sight of the main issue. If bot runs are restricted to 1000 edits, no matter who is making them or how lucky they feel, at least requiring their operators to take manual intervention every ten minutes might (a) give the rest of us some oxygen and (b) give them pause to consider whether there is not something more useful to spend their (and our) time on. --John Maynard Friedman (talk) 16:02, 15 May 2022 (UTC)[reply]

@John Maynard Friedman: as above, I have huge sympathy for your frustration about the unavailability of the bot to assist in manual referencing.
However, I am exasperated by your continued refusal to distinguish between productive and unproductive batch jobs. Your desire to punish those using the bot for productive batches is a vindictive and disruptive case of WP:IDHT.
Lemme explain. For 10 months, I have used Citation bot to fill WP:Bare URLs. I use a variety of tools to build and maintain lists of bare URLs, which I feed to the bot in systematic ways designed to minimise re-processing of articles. I also tag bare URLs of types which the bot cannot fix (see e.g. the 38,000 articles which use {{Bare URL PDF}}). I also follow behind the bot to use other tools to fill refs which the bot fails to fix, and to identify websites where the bot can fill refs ... and I have developed a bunch of tools to identify and tag dead links: in February alone, my tools identified and tagged over 60,000 bare URL refs which were actually dead.
This work by me takes many hours per day of list-making, tagging and programming. I estimate that in the last ten months, I have put over 3,000 hours of work into this task.
I have put in that time because it is getting results. At the start of May 2021, there were ~470,000 en.wp articles with bare URL refs. Now there are about 140,000. That fall of ~330,000 masks the real progress, because new bare URLs are added at a rate of over 300 per day. So without this work, the tally would have grown by ~90,000, meaning that about 420,000 pages have been cleaned up.
Note that many of those pages contained multiple bare URLs, so the total number of bare URLs filled by this work is over a million.
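The arithmetic behind those figures can be checked with a short sketch. The counts are the approximate ones quoted above; the figure of 300 elapsed days is an assumption (roughly ten months), chosen because it reproduces the quoted ~90,000 inflow at 300 new pages per day.

```python
# Approximate figures quoted in the comment above.
START_COUNT = 470_000    # articles with bare URL refs at the start of May 2021
CURRENT_COUNT = 140_000  # articles with bare URL refs now
NEW_PER_DAY = 300        # quoted rate of newly added bare-URL articles ("over 300")
ASSUMED_DAYS = 300       # assumption: about ten months of elapsed time


def cleanup_estimate(start, current, new_per_day, days):
    """Return (net fall, estimated inflow, estimated pages cleaned up)."""
    net_fall = start - current      # visible drop in the tally
    inflow = new_per_day * days     # pages that gained bare URLs meanwhile
    return net_fall, inflow, net_fall + inflow


net_fall, inflow, cleaned = cleanup_estimate(
    START_COUNT, CURRENT_COUNT, NEW_PER_DAY, ASSUMED_DAYS
)
print(f"net fall: {net_fall:,}; inflow: {inflow:,}; cleaned up: {cleaned:,}")
```

Because the quoted inflow rate is a lower bound ("over 300 per day"), the cleaned-up total from this sketch is a lower bound too.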
I do not want any praise or thanks for this. But I am angry at your contemptuous desire to sabotage this work.
I try to keep Citation bot filling bare URLs 24 hours per day, and I do that by structuring my days so that I am available to start a new batch when one finishes. That means keeping my laptop open while I do other tasks, and it often means setting alarms so that I wake in the night to start a new batch.
Your call for me to manually intervene every ten minutes would allow me to run these big batches only if I never slept, or did anything which took me away from my desk for more than ten minutes. Since that is impossible, the effect of your proposal would be to allow me to set Citation bot to fix bare URLs for only about 4 or 5 hours per day. That would be possible only if I took on a big extra burden, and even so it would reduce the productivity of this task by about 80%. It's slow enough already, so I would simply stop this work.
If you genuinely think that me using the bot to fill about a million bare URLs is silly and that I should "find something more useful to spend my time on", then please say so directly, and we can discuss whether there is consensus for your view. Please note that both @Headbomb and @AManWithNoPlan also run a lot of targeted, productive batches, so your approach would also sabotage their work.
This is far from the first time when you have failed to acknowledge the distinction between productive and unproductive batch jobs, and have chosen instead to lash out indiscriminately, lumping those of us who target the bot efficiently into the same category as those who waste the bot's time. That distinction has been pointed out to you many times, and it is made clear in this thread, but yet again you choose to ignore it. Damn you. BrownHairedGirl (talk) • (contribs) 16:58, 15 May 2022 (UTC)[reply]
I intended no contempt nor, on re-reading, do I see any reasonable inference of such. Yes, I totally understand your frustration that the behaviour as you see it of one bot operator is bringing all operators into disrepute. Yes, I notice that your runs have generally brought improvement to pages I watch. But looked at from the perspective of the many who want to validate the citations in an article they have worked on, the fact that they cannot do so appears to be caused by a much smaller number of bot operators. It is impractical for me to make a value judgement between operators and I consciously choose not to try. It is not obvious to me why a limit of 1000 articles per run will bring your work to a halt, only that you will have to keep restarting it.
The other option of course is to formally request that Abductive's bot privileges be withdrawn but that will take time. John Maynard Friedman (talk) 17:49, 15 May 2022 (UTC)[reply]
Nonsense, @John Maynard Friedman. It is perfectly practical to make a distinction between bot operators, by analysing the number and value of the edits made in each batch. I set out that data in about half-a-dozen threads last year.
I understand your frustration. But you repeatedly choose to ignore the evidence of the cause of that problem, and instead you lash out indiscriminately, and advocate solutions which would sabotage most of the productive work of the bot. BrownHairedGirl (talk) • (contribs) 19:15, 15 May 2022 (UTC)[reply]

User:BrownHairedGirl running two jobs at once

I would like to know how it is possible that User:BrownHairedGirl has consistently been running two jobs at once for the past week or so? And then has the gall to complain about my past behavior? She needs to stop running two jobs at once, as it clogs the bot for other users—no matter that her jobs seem important to her, or make a lot of edits. Abductive (reasoning) 18:23, 15 May 2022 (UTC)[reply]

Sadly but unsurprisingly, Abductive continues to fail to distinguish between my high-return batch jobs which tackle the worst refs, and Abductive's own low-return speculative trawls which mostly make no edit at all to the listed pages.
If Abductive actually wants to unclog the bot, the remedy is for Abductive to stop running low-return speculative trawls. BrownHairedGirl (talk) • (contribs) 19:19, 15 May 2022 (UTC)[reply]
Well if Abductive's submissions aren't making any edits, and BHG's and others are making edits, then that's a clear-cut case IMO. It's going to be hard to convince anyone that we should stop a highly productive edit run for one that barely does anything. Rlink2 (talk) 19:47, 15 May 2022 (UTC)[reply]

I have just spent about 5 hours today doing what I have done every day this week: checking Citation bot's history to examine the diff of every edit where the edit summary says that a bare URL was filled, while the bot works its way through my regular lists of articles with bare URLs. (It is currently processing articles with bare URLs tagged in May 2022, as of 13 May)

In each case, I copy the domain name of the filled ref to a list in my text editor. Periodically I purge it to remove domains which were in previous lists. So far, this afternoon's list has 172 new domains. In a few hours, I will stop list-making, by which time the list will probably be over 200 domains.

With that list, I wrap each domain in a wiki-search regex. Then I take each line in turn, and use a wiki-text-search within AWB's listmaker to find articles with one or more bare URL refs to that domain.

When complete, I sort the list and remove duplicates.

That builds the list of articles which I feed to Citation bot.

By doing this, I create an article list which consists solely of articles where both the following conditions apply:

  • articles which currently have bare URLs (tagged or untagged), and
  • in every case at least one of those bare URLs is to a domain which should be fixable, 'cos Citation bot has filled a ref to that domain today.
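The list-building steps described above can be sketched roughly as follows. This is only an illustration: the actual regex and the AWB search syntax are not shown in this thread, so `bare_url_pattern` is a hypothetical Python pattern applied to wikitext, and the function names are invented for the example.

```python
import re


def new_domains(todays_domains, previously_listed):
    """Purge step: keep only domains not seen in earlier lists (sorted, unique)."""
    return sorted(set(todays_domains) - set(previously_listed))


def bare_url_pattern(domain):
    """Hypothetical regex for a bare URL ref pointing at `domain`.

    Matches <ref>URL</ref> or <ref>[URL]</ref> where the URL's host is the
    domain or one of its subdomains. It is an assumption, not the regex
    actually used for the wiki search.
    """
    return (
        r"<ref[^>]*>\s*\[?\s*https?://(?:[\w-]+\.)*"
        + re.escape(domain)
        + r"\b[^\s<\]]*\s*\]?\s*</ref>"
    )


def merge_article_lists(per_domain_results):
    """Final step: combine the per-domain search results, sort, drop duplicates."""
    return sorted({title for titles in per_domain_results for title in titles})


# Usage with made-up data: one search result list per new domain.
domains = new_domains(["example.com", "example.org", "example.com"], ["example.org"])
results = [["Article B", "Article A"], ["Article A", "Article C"]]
print(domains, merge_article_lists(results))
```

The search itself (finding articles whose wikitext matches each pattern) is left out, since in the workflow described above it is done by AWB's listmaker rather than by a script.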

That has produced very high returns, which is why I have been pursuing that approach this week. The result is that so far this month, the total number of articles with bare URLs has fallen by well over 10%. (The number of untagged bare URLs has fallen from ~116K to ~98K, the first time that the tally has been below 100K since I started this work with the tally at over 470K).

Abductive wants me to stop this productive work, in order to facilitate Abductive's low-return speculative trawls. My answer is a firm "no".

Obviously, if there is a consensus that the community would prefer the bot's resources to be used for yet more of Abductive's unproductive, no-effort-required clog-the-bot-with-categories speculative trawls, then I will abide by that decision. --BrownHairedGirl (talk) • (contribs) 20:50, 15 May 2022 (UTC)[reply]

It would be a different situation if the dispute was the type of edits being done (my batch is adding DOI information, but your batch is adding date information). However the dispute here is having more articles be edited vs less. I can't really come up with any plausible explanation to show how using the bot's capacity to focus on articles on which it will be less effective is a good idea.
When BHG makes a list, she puts a lot of time and effort into the list, and it shows. However, the lists Abductive seems to be putting in do not seem to be controlled for quality and are basically random, which results in a much lower efficiency rate. If Abductive wants to use the bot, then he should make sure his list meets a certain quality and results in articles actually being edited.
I see the consensus to be in BHG's favor. I think Abductive should stop using the bot until he's figured out how to make high-throughput lists. Rlink2 (talk) 21:17, 15 May 2022 (UTC)[reply]
The bot only has four channels. If User:BrownHairedGirl uses two of them, and two other users run batch jobs (note: all other users' batch and category jobs are as "inefficient" as mine) then nobody can run the bot on a single page. The bot is supposed to run through categories and do the tedious work of deciding which ones to make edits to—this is the bot functioning as intended, not some sort of misuse of the bot. There was a user who was using the bot on more than one job at a time; the bot was then reconfigured to not allow that. User:BrownHairedGirl has always called out other users when she discovered them running the bot on two jobs, even though they did it inadvertently. Now she has discovered a way to "game" the bot to run two jobs, and is doing it constantly. I barely use the bot anymore, and I always make sure that there are two channels open before running a batch. My use of the bot is not problematic, but running two huge jobs at the same time is. In fact, that was why I requested that the size limit of the runs be increased: so that BHG could run larger jobs, and not be tempted to run two at once. Abductive (reasoning) 21:42, 15 May 2022 (UTC)[reply]
Abductive writes The bot is supposed to run through categories and do the tedious work of deciding which ones to make edits.
That is utter bull. The purpose of the bot is to improve references. It has the ability to skip pages where it can make no improvements, but its purpose is not to spend its time skipping pages. (It takes just as long to find that no changes can be made as it does to make the changes).
The only purpose of that bogus assertion is to try to justify Abductive's longstanding habit of wasting the bot's time on low-value trawls. Every bot benefits from targeting, and nearly all bots use some form of targeting. This one is no exception.
Abductive writes My use of the bot is not problematic. Wow! This is the tactic known as the Big Lie: assert a known falsehood, and try to make it stick. After 8 months of explanation to Abductive about how their batch jobs are unproductive, it is extraordinary that Abductive is still in denial ... and even more bizarre that he still expects someone to believe him.
The issue here is very very simple. Abductive wants to displace highly productive jobs to make way for unproductive jobs. I have no idea what on earth motivates that desire, but it is clearly not the objective of someone who is here to improve the encyclopedia. BrownHairedGirl (talk) • (contribs) 22:09, 15 May 2022 (UTC)[reply]
The issue is you are taking up 50% of the bot's capacity for days at a time. The bot was reconfigured to stop users doing that. It is especially bad for people editing a single article, and can't get the bot to run. Abductive (reasoning) 23:28, 15 May 2022 (UTC)[reply]
BHG has been using Citation bot for longer than I can remember. If there was some sort of issue with her usage of the bot, it would have been raised earlier. Why is it only after Abductive picks up their usage of Citation bot that the complaints come rolling in? Rlink2 (talk) 23:57, 15 May 2022 (UTC)[reply]
This is a "where there's smoke there's fire" argument. But if you actually look through the bot contribs, you'll see that I am not abusing the bot, and she is. Abductive (reasoning) 00:09, 16 May 2022 (UTC)[reply]
So, in Abductive's alternate reality, it is disruptive for me to use the bot productively ... but not disruptive for Abductive to use it unproductively.
Bizarre. BrownHairedGirl (talk) • (contribs) 00:28, 16 May 2022 (UTC)[reply]
It has caused problems only when you decided to run some of your low-return speculative trawls at the same time.
The solution is simple: stop doing these low-return speculative trawls.
It is now nearly a year since you began expressing your persistent desire to displace productive uses of Citation bot, absurdly claiming that your abuse of the bot for unproductive purposes is "valid". Classic WP:NOTHERE stuff. BrownHairedGirl (talk) • (contribs) 23:57, 15 May 2022 (UTC)[reply]
What you call "speculative trawls" is an approved function of the bot. Any user who runs a category would be running a "speculative trawl". Abductive (reasoning) 00:10, 16 May 2022 (UTC)[reply]
Again, not true. And very simple.
As has been explained to Abductive many times over the past year, there are categories which concentrate problems which the bot can fix. Feeding those categories to the bot can be very productive ... but randomly-chosen content categories are not productive.
Note that Abductive is still wikilawyering (badly) in pursuit of his desire to displace highly-productive bot jobs, allowing Abductive to run unproductive tasks instead.
This desire to actively impede improvement of Wikipedia is classic WP:NOTHERE stuff. It's time for Abductive to explain what exactly motivates him to impede improvement of Wikipedia. Does he just like pressing the button on the bot? Does he actively want to stop more refs being improved? Is he somehow resentful that he is unable or unwilling to devise productive bot jobs?
Go on, Abductive. Please do tell why you are so keen to make the bot less productive. BrownHairedGirl (talk) • (contribs) 00:25, 16 May 2022 (UTC)[reply]
First off, this thread is about you using 2 of the 4 channels available to the bot. For those that don't know, the bot can only run four jobs at one time. Now, if these are singleton jobs, done by the hundreds (or thousands?) of users who have activated the gadget in their preferences, they should go fast. These uses of the bot do not appear on the bot's user contributions page. But the bot also allows any user to activate it on a category, a userpage, or on a manually entered list. There are a number of users who make a regular practice of activating the bot on such jobs (batches). Just looking over the past few days, there are about a dozen such users active. When there are three batches running, clicking the button while editing an article often times out. I know this because I use that button when creating an article all the time. When there are four batch jobs running, clicking the button always times out, leading to much frustration and complaints here on this talk page. When there are four batch jobs running for a couple of hours (or three?) the queue fills up and everybody gets a 503 error. Thus, it behooves us, the big users of the bot, to moderate our usage so as not to lock out the editors who are trying to create content from using the bot on the article they are working on. Now, if one user has discovered a way to trick the bot into running two jobs, and then runs jobs that take almost a day to complete, this greatly increases the chances of time-out failures and 503 errors. This is because all it takes is two more users to activate a batch while two jobs are already running. Abductive (reasoning) 03:38, 16 May 2022 (UTC)[reply]
I know all that. Most other users of CB know all that.
As already noted, the solution is for Abductive to not clog up the bot with speculative trawls. BrownHairedGirl (talk) • (contribs) 03:45, 16 May 2022 (UTC)[reply]
I use the bot one job at a time. Even if I stopped using the bot, if four jobs are run at the same time, nobody else can use the bot. This has already happened since you selfishly started running two jobs. It will happen again, and then what will you do? Tell the other users running batches that it's their fault? Abductive (reasoning) 06:41, 16 May 2022 (UTC)[reply]
First off, this thread is about you using 2 of the 4 channels available to the bot. Now you are trying to change the subject by completely brushing off anything BHG had to say about the quality of your lists. If there are 2 issues, we'll discuss the two issues at the same time. No need to ignore one of them.
What metric do you use to create your lists? What is your intent with the lists? BHG has shared her process, so it would be fair for you to share yours. I looked at the contrib list and I see that BHG is basing the jobs off her lists, while you are just basing it off random categories. So it would be interesting to see Abductive's rationale. Why those categories specifically? Rlink2 (talk) 14:56, 16 May 2022 (UTC)[reply]
How do I select them? First I look to see if the bot is overloaded, in which case I don't do a run. If there are zero jobs running I hastily choose a small category to give the bot something to do. I might select a category such as Garden Plants, an area in which I am an active editor. Or I might select a maintenance category, or something in Portal:Current events. If it is a large category, I check through a few histories to see if the bot has been run on members of that category recently.
All users of the bot should be treated equally. A user taking up 50% of the bot's capacity denies use of the bot to the entire rest of Wikipedia. Also, all other users who select categories have the same or lower rate of return as I do. I went through the last few days of the bot's contribs; here are the rates of return for categories—see if you can pick mine from the list: 6.7%, 10.2%, 12.2%, 13.1%, 16.3%, 17.3%, 22.7%, 28.2%, 30.6%, 31.1%, 43.8%, and 63.3% (hint; one of mine). Here are some of BrownHairedGirl's recent rates: 14.0% (399/2841, this batch took over 25 49 hours to run), 17.6%, 35.4%, and 47.0%. Abductive (reasoning) 17:09, 16 May 2022 (UTC)[reply]
A user taking up 50% of the bot's capacity denies use of the bot to the entire rest of Wikipedia. I don't know if BHG is using 2 slots or not. Maybe she is, maybe she isn't. But when she was using 2 slots no one was complaining. The complaining only started after you resumed your usage of the bot.
Assuming that your numbers were calculated with no error, the presumed low rate could be because maybe the bot is reaching diminishing returns at this point. BHG will continue to retarget and focus the lists she feeds to the bot.
The average return for categories (using your numbers) is 24%. The average return for BHG's batches is 33.3% (and that is just using 3 numbers, the real rate is probably higher). So clearly, BHG's approach of targeted lists is better than random category selection. Rlink2 (talk) 18:01, 16 May 2022 (UTC)[reply]
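For anyone wanting to check those averages, here is a minimal sketch using only the percentages quoted above. It computes plain unweighted means; the variable names are made up for the example, and nothing here is taken from the bot's own logs.

```python
from statistics import mean

# Rates of return (percent) for category runs, as quoted in the thread.
category_rates = [6.7, 10.2, 12.2, 13.1, 16.3, 17.3,
                  22.7, 28.2, 30.6, 31.1, 43.8, 63.3]

# BHG's quoted rates: the three used for the 33.3% average,
# and all four including the 14.0% figure.
bhg_rates_three = [17.6, 35.4, 47.0]
bhg_rates_all = [14.0] + bhg_rates_three

print(f"categories:        {mean(category_rates):.1f}%")
print(f"BHG (three rates): {mean(bhg_rates_three):.1f}%")
print(f"BHG (all four):    {mean(bhg_rates_all):.1f}%")
```

An unweighted mean of percentages treats a 300-article batch the same as a 3,000-article batch, so it is only a rough comparison. A pooled rate (total edits divided by total articles) would need the per-batch counts, which are not all given here.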
You can tell that she is using two channels by looking at the bot's contribs. She started complaining when I returned to the talk page to request an expansion of the size limit. I have been consistently using the bot without interruption—again, check the contribs. The categories are dinged on their rate of return because the bot counts subcategories in the total but doesn't edit them. Also, running a small category is less likely to lock people out of using the bot than a large job, and certainly less than two large jobs. Currently the bot is configured to allow category jobs 1/4 the size of list jobs because categories have a lower rate of return. Are you saying that the category function is somehow illegitimate? Are you saying the dozen or so people in the past few days who ran categories are breaking a rule? Should they be stopped? Abductive (reasoning) 18:33, 16 May 2022 (UTC)[reply]
It's long past time for Abductive to explain why they persist in clogging-up the bot with low-return tasks when there are higher-return tasks ready to roll. BrownHairedGirl (talk) • (contribs) 19:26, 16 May 2022 (UTC)[reply]
Why don't you explain your recent job of 2841 articles that only edited 399 of them? Enough with the deflecting. Abductive (reasoning) 02:09, 17 May 2022 (UTC)[reply]
If you pick categories at random then of course you will have some times where throughput is really high. And there will be times when throughput for the targeted lists is low. But we took the average of both, and BHG's approach works out better than the random approach by a significant amount. Rlink2 (talk) 02:52, 17 May 2022 (UTC)[reply]
My choices of how to use the bot are not random, but if randomness is bad, why don't you go to Wikipedia:Random page patrol and demand that they disband, and start a thread at the Village Pump demanding that the Random article link in the sidebar be removed? And answer the question; if my runs are above the average for category runs by other users, why don't you demand that they stop too? Abductive (reasoning) 05:57, 17 May 2022 (UTC)[reply]
@Abductive: I am not the one deflecting. You are engaged in a year-long process of trying to deflect attention from your abuse of the bot by vast numbers of speculative trawls.
Please identify your claim that I made a recent job of 2841 articles that only edited 399 of them.
I think that if the claim is true, I can guess which one that might be (and why) ... but if you are actually serious about having a meaningful dialogue rather than your usual deflection tactics, please stop the passive-aggressive games and provide the links which would allow me to verify your claim and identify the reason. BrownHairedGirl (talk) • (contribs) 03:29, 17 May 2022 (UTC)[reply]
Another way of interpreting these past events is that I have done nothing wrong or unusual, and that you have singled out me for unwarranted castigation for reasons unknown. You have started threads about me and other users accidentally running two jobs at once, and then when you do it and I complained, point to your alleged better use of the bot and refuse to stop. Moreover, the difference in bot output between my jobs and yours is underwhelming. It is especially strange that you keep pointing at me because as I have demonstrated, I do better than the average user when calling for runs of categories (and my occasional batch run) for a very simple reason; I know how to find articles that have more citations than the average article. No doubt you know that the search you conducted on your recent batch that underperformed was flawed in some way. I have no problem with that (but one could argue that in your haste and under pressure to create two lists to run the bot on, you are getting sloppy). The only problem is that you are running two jobs at once—even if they had a 100% edit rate instead of your usual rate (which I guess is around 50%) you are still hogging resources meant for everybody—when other people try to run the bot on two jobs, they get the message Run blocked by your existing big run. Abductive (reasoning) 05:57, 17 May 2022 (UTC)[reply]
Quit waffling. You made an allegation, so post the links to support it. BrownHairedGirl (talk) • (contribs) 06:43, 17 May 2022 (UTC)[reply]
Okay, if you look at the bot's contribs, right now you are running two jobs in violation of the bot's rules. One job has 1432 members and one has 3847. Abductive (reasoning) 07:33, 17 May 2022 (UTC)[reply]

Abductive's attrition strategy of deflect-and-counter-attack has reached a new low: a claim about my work which was unsupported by evidence, and after my 3 demands for the evidence, Abductive came to my talk to tell me to do my own research into his claim. I refused, and when he went to collect the evidence, he had to admit that it was bogus. See User talk:BrownHairedGirl#So_we_don't_get_distracted;_you_may_find_this_useful (permalink)

This time-wasting menace has had enough of my time (as well as enough of Citation bot's time), so I won't reply further in this thread.

It would be nice if Abductive made an unequivocal apology and struck out the paras of nonsense based on his bogus claim, but I don't expect it. --BrownHairedGirl (talk) • (contribs) 07:38, 17 May 2022 (UTC)[reply]

Yes, I screwed up the example of a job where I said User:BrownHairedGirl got a 14% rate of return, and I apologized for that. But all I had to do was look in the bot's contribs to find one where her rate of return was 10.9%. Interested readers can see it for themselves here: At 22:22, 15 May 2022, BHG starts a job with 304 articles. The bot made only 33 edits to that batch, which ended at 00:27, 16 May 2022. But running the bot on any list or category, even one with a low rate of return, is a perfectly legitimate use of the bot. No, the real problem is while that job was running, she was simultaneously running a job on 2061 articles, beginning at 14:46, 15 May 2022 and ending at 06:34, 16 May 2022. That batch saw 447 edits, or a rate of return of 21.7%. Running two jobs at the same time is an abuse of the bot. Abductive (reasoning) 19:15, 17 May 2022 (UTC)[reply]
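The rates of return quoted in this thread are simply edits made divided by articles submitted. A short sketch, using only the counts given above (the function name is made up for the example):

```python
def rate_of_return(edits, articles):
    """Percentage of submitted articles that the bot actually edited."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    return 100.0 * edits / articles


# Batch sizes and edit counts as quoted in the comments above.
small_batch = rate_of_return(33, 304)      # job started 22:22, 15 May 2022
large_batch = rate_of_return(447, 2061)    # job started 14:46, 15 May 2022
pooled = rate_of_return(33 + 447, 304 + 2061)  # both jobs taken together

print(f"{small_batch:.1f}%  {large_batch:.1f}%  pooled: {pooled:.1f}%")
```

The pooled figure weights each job by its size, which is why it sits much closer to the larger batch's rate than a simple average of the two percentages would.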
So: no proper apology, just another attempt to cherry-pick a batch to prove a point and try to deflect attention away from Abductive's own endless run of low-return speculative trawls.
However, Abductive fails to follow the evidence properly. Like all my main work with Citation bot over the last year, that job of 304 articles is fully documented, in this case at User:BrownHairedGirl/Articles with bare links (edit | talk | history | links | watch | logs). That particular list of 304 articles is at special:permalink/1088043457#Lists, which is the list titled "Lists Bare URLs tagged in May 2022, as of 13 May - part 3 of 3". That page:
  1. sets out the selection criteria
  2. lists all the articles which have been submitted to the bot in that batch
Note that in every case the articles have been chosen because every article on the list has a bot-fixable problem: a bare URL which should be filled. Note that to ensure that the bare URLs in these lists are actually fillable, I spent hundreds of hours tagging dead bare URLs with {{Dead links}} (including over 60,000 in February alone, using a suite of Perl scripts which I wrote for the purpose). For the same reason, I have also tagged bare URLs of types which the bot cannot fill: see the transclusions of {{Bare URL PDF}}, {{Bare URL image}}, {{Bare URL plain text}} and {{Bare URL spreadsheet}}. Bare URLs with those tags are ignored when making the list, adding a further level of targeting.
It's all documented, with full transparency. All the lists are in the page's history, with edit summaries identifying each list. There are well over 500 such lists, all documented at User:BrownHairedGirl/Articles with bare links (edit | talk | history | links | watch | logs) or User:BrownHairedGirl/Articles with new bare URL refs (edit | talk | history | links | watch | logs).
In the last year I have put over 3,000 hours of work into this project to tag and fill bare URLs, and it is well documented. It has reduced the total number of articles with bare URLs from over 470,000 in May 2021 to ~140,000 in May 2022, despite articles with new bare URLs being added at a rate of over 300 per day. Anyone can roughly verify those numbers by running this (slow) search for untagged Bare URLs, and by looking at the page count of Category:All articles with bare URLs for citations (currently 54,197). Note that the total of those two is an overestimate, because they overlap.
Yet despite all that, I am being attacked here by a menace of an editor who has hogged the bot for much of the year, but has never to my knowledge documented any of their uses of the bot. Abductive has never explained the basis for choosing a single one of the thousands of categories which they have fed to the bot, nor for a single one of the tens of thousands of individual pages with which they repeatedly clog the bot's queue by piling up so many requests that other requests time out.
Citation bot is an invaluable tool, which is at its most effective when used in a targeted way. But despite Abductive having been asked many times by many editors to stop their speculative trawls and stop clogging the bot queue, they come here with the utter hypocrisy to accuse me of "abusing" the bot.
This is Wikipedia at its worst. The editor running a long-term project, thoroughly-documented, with proven success is being attacked by a long-term menace who puts precisely zero verifiable effort into selecting the batches with which they hog the bot. And they are free to waste hours of my time in this vindictive effort to smear my work, whilst making zero contribution to assist it.
Abductive complains about me running two batches at once. But Abductive is interested only in attack, and in finding decontextualised nuggets in the hope that if they hurl enough muck some of it will stick ... and in the knowledge that whether or not it sticks, they can waste hours of my time in rebutting their smears. This is a classic attrition strategy, which sadly is usually tolerated on wiki, and worse still is actually empowered by the community's enthusiasm for yelling "uncivil" at anyone who replies harshly to this sustained goading.
In this case, the reason for my running two simultaneous jobs is already explained above, in my post of 20:50, 15 May 2022.[1] The batch which Abductive points to here was a batch of tagged bare URLs. Those batches have a much lower rate of return than my batches of untagged bare URLs, because many of those tagged bare URLs were tagged after failing to be filled by other tools. In other words, this is the return on the most problematic set.
To try to get added benefit from that low-return set, I have taken in the last ten days to leveraging those low-return batches to feed other higher-return jobs. I have done that by checking each edit in which the bot has filled a bare URL, finding domains where the bot has filled a ref, and building secondary lists which consist solely of bare URLs to fixable domains. This is very time-consuming work, but it has been very effective, reducing the total number of articles with bare URLs by almost 20,000 in the last 15 days (a fall of over 10%), and reducing the number of bare URLs on many of the remaining pages.
These two tasks have to work in parallel: I can build the lists of fixable bare URLs only by analysing the previous batches.
But Abductive objects, because me doing this reduces the number of bot slots available for Abductive's low-return, undocumented, unexplained speculative trawls. And Abductive calls my productive work an abuse of the bot. YCMTSU.
Now, having wasted yet another hour of my time on rebutting Abductive's malicious nonsense, I need to go cook supper for my hungry partner. BrownHairedGirl (talk) • (contribs) 20:57, 17 May 2022 (UTC)[reply]
Honestly I think things will be easier if Abductive stopped using the bot until he releases a detailed plan about how he plans to use the bot and people agree to the plan. To his credit, I think he has explained a little bit of his methodology, but not in a way that everyone can understand.
As for randomness, I think Abductive is missing the point that Citation bot has limited capacity. People can read articles and click the "random article" button at the same time. People can do random page patrol and write articles at the same time. But citation bot only has limited capacity, so it must be used in a way that maximizes its capacity. BHG is doing her best to maximize the output of the bot. Rlink2 (talk) 22:19, 17 May 2022 (UTC)[reply]
It's a double standard that BHG can constantly run huge jobs (sometimes two at a time) that get lower rates of return than my runs, then tell me that my above average, infrequent, small jobs are somehow anybody's business. Especially if the bot is not overloaded, which I try my best not to do. Abductive (reasoning) 02:08, 18 May 2022 (UTC)[reply]
Yet again, Abductive ignores nearly all that has been written and cherrypicks an edge-case job of mine to misrepresent it as typical of the whole of my work ... and misrepresents Abductive's own jobs as more productive, when most of their edits are trivial, and zero evidence is offered of their claimed effectiveness.
This is Abductive's attrition strategy: throw out masses of unevidenced false assertions, knowing that it will take others ages to document their falseness ... and then Abductive will ignore the rebuttals anyway. BrownHairedGirl (talk) • (contribs) 03:11, 18 May 2022 (UTC)[reply]
I invite interested readers to look through the bot's contribs and see for themselves what is really going on. And I always respond to fellow editors. Abductive (reasoning) 03:34, 18 May 2022 (UTC)[reply]
They could usefully start by looking at my latest completed batch of articles with known-fixable bare URLs. This list of 2800 bot edits shows the whole of my latest batch of 1,850 articles with known-fixable bare URLs. 607 of the 1,850 pages were edited; 56 of those edits filled at least one bare URL (look for "Changed bare reference to CS1/2" in the edit summary).
As explained above, that highly-productive run was possible because I found the fixable bare URLs in the less-productive run of tagged bare URLs.
But I am sure that this evidence of very effective targeting of a long-standing backlog will not deter Abductive from more sniping. BrownHairedGirl (talk) • (contribs) 06:46, 18 May 2022 (UTC)[reply]
You know what they'll notice? That four jobs are running right now, including two of yours, and that means nobody else can use the bot. Abductive (reasoning) 07:00, 18 May 2022 (UTC)[reply]
No, that's not what they will notice because most editors don't get to see the engine room. But what they do see is nearly cosmetic edits like this one of yours and want to know why it makes it so important that their one-shot job timed out yet again. And how many null evaluations were run against how many pages to deliver that one stunning edit? --John Maynard Friedman (talk) 07:17, 18 May 2022 (UTC)[reply]
Above, on this talk page, you complained that you got timed out. Now you are complaining again that your one-shot timed out. But here's the thing: because there are four batches running (none of them are mine) everyone else who tries to use the bot right now will get the time-out failure. Who ran the fourth job that just locked everybody else out? The user who was already running a job. Why did the bot do a small edit on swastika? I dunno, I ran the bot on it because it was on the Front Page, along with all the other articles that were on the Front Page at that moment. Abductive (reasoning) 07:48, 18 May 2022 (UTC)[reply]
My bad, it seems the dam has broken and the bot is now running requests I made 7 hours ago. Abductive (reasoning) 07:57, 18 May 2022 (UTC)[reply]
Indeed, @John Maynard Friedman, a high proportion of the bot edits triggered by Abductive are purely cosmetic. Hyphen-to-dash in that case, and lots of edits which just change a curly quote mark to a straight one, or remove redundant parameters or change the template type. The bot should not even be making those changes as standalone edits: such trivial tweaks should be done only as part of a more major edit.
Meanwhile my run of 1850 articles includes edits like this one[2], which filled 34 bare URL refs. It didn't happen by accident; it was part of my targeting of bare link Youtube refs, the tally of which has in the last few days been brought down from over 1,000 to under 100.
All those null evaluations in Abductive's speculative trawls take the same amount of bot time as evaluations which actually find a needed fix. Some bots can evaluate with trivial effort, and do the heavy lifting only if a problem is found; but with Citation bot every page gets the full check ... which is why abusing it for speculative trawls is so wasteful. BrownHairedGirl (talk) • (contribs) 07:39, 18 May 2022 (UTC)[reply]
You just ran some huge jobs that got 10.9% and 21.7% edits. Talk about wasted bot time. Abductive (reasoning) 07:48, 18 May 2022 (UTC)[reply]
Dash changes are not cosmetic nor are curly quote changes. It may be worthwhile to make these as minor edits, but they are not cosmetic. Izno (talk) 19:40, 18 May 2022 (UTC)[reply]
I said "almost cosmetic". The kind of change you might do in passing in the course of a significant edit. Yes, if there were no problems with capacity, it would be unremarkable. But while there are and the effect is to get in the way of more productive edits, then it is certainly not worthwhile to make these trivial edits. --John Maynard Friedman (talk) 20:02, 18 May 2022 (UTC)[reply]
While we can look at Special:Contributions/Citation bot to find the edits the bot has made, does the bot actually log anywhere edits made versus the batch size it was requested to search against? I've seen on this page and on BrownHairedGirl's talk page, Abductive pointing out what appear to be unsupported percentages relating to job success rates. Just a few hours ago Abductive said You [BHG] just ran some huge jobs that got 10.9% and 21.7% edits. How has this success/fail rate been calculated? It seems to me as though at least some of this conversation would be easier to agree/disagree with if we had verifiable numbers.
That said, while I can sympathise with Abductive's point to a degree, I also agree that BHG's use of the bot is very productive and certainly more so at a glance than Abductive's. If I check the most recent 50,000 edits by Citation bot, only 146 of those are by Abductive. What was the batch size that resulted in those 146 edits? And is that actually helping to address issues like link rot, bare or partial citations, or is it primarily cosmetic per the example given by John Maynard Friedman?
I definitely agree though, regardless of the answers above, that targeted use of the bot, as BHG is doing, is a far more effective use of its resources than running it against random categories. At the very least, running it against maintenance categories like Category:Articles with bare URLs for citations is a much better idea than running it against (picked at random) Category:Military history. Sideswipe9th (talk) 16:15, 18 May 2022 (UTC)[reply]
The batch sizes are given behind the slash. So right now a job is running that happens to have 639 members. Then I typically expand the number of contribs to 500 or more (for example 3500) and use 'Ctrl F' to have it count the number of instances that /639 appears (once the job is done, of course). Dividing gives a rate. Abductive (reasoning) 16:51, 18 May 2022 (UTC)[reply]
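The counting procedure described above can be sketched in Python. This is only an illustration, not part of Citation bot: the function name `batch_edit_rate` and the sample summaries are hypothetical, and it assumes that every edit made as part of a batch carries a counter of the form `n/batch_size` in its edit summary, as the `/639` example suggests.

```python
import re


def batch_edit_rate(edit_summaries, batch_size):
    """Return the fraction of a batch's pages that ended up being edited.

    Assumption: an edit belonging to the batch has "<n>/<batch_size>" in
    its edit summary (for example "17/639"). Single-page requests carry
    no such counter and are therefore not counted.
    """
    # \b stops "/639" from also matching a different batch such as "/6390".
    counter = re.compile(rf"\b\d+/{batch_size}\b")
    edits = sum(1 for summary in edit_summaries if counter.search(summary))
    return edits / batch_size


# Hypothetical summaries, for illustration only.
summaries = [
    "Add: s2cid. | Suggested by Example | #UCB_webform 3/639",
    "Alter: title. | Suggested by Example | #UCB_toolbar",
    "Add: doi. | Suggested by Example | #UCB_webform 17/639",
    "Add: pmid. | Suggested by Example | #UCB_webform 5/6390",
]
rate = batch_edit_rate(summaries, 639)
print(f"{rate:.1%} of the batch was edited")
```

Two limits of this approach are worth keeping in mind: two different batches that happen to have the same size cannot be told apart, and pages that were checked but not edited leave no trace in the contributions list, so the rate can only be computed once the batch size is known.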
So does that mean, to pick one of your edits as an example Add: s2cid. | Use this bot. Report bugs. | Suggested by Abductive | #UCB_toolbar then that this edit was not requested as part of a batch, but as a single page request?
If so, then how are you determining your success rate? And how can we verify that? While there were 146 edits at the time I made my reply, 157 now, all of them have UCB_toolbar and no number at the end of the edit log. If the contributions page for the bot only shows successful edits, how can we determine how many unsuccessful single page requests you have made? Sideswipe9th (talk) 17:00, 18 May 2022 (UTC)[reply]
With those I usually just do all the articles that appear on the Front Page every day, and the Recent Deaths, but I skip ones that I have done before, like World War II, so it would be difficult for an outside observer to see what the success rate was. But if an article has very few refs, it goes very fast, and if it has a lot of refs, it usually finds an error. I estimate around 40% for the ones I click. My success rate for categories is higher than the average user who selects a category, because I pre-screen those by looking at a sample of histories to see if it or a related category has been run on them lately. I also look for topics that have more citations, as they will have a higher success rate. But I must defend the rights of all users to run whatever they wish. For example, Category:Theoretical computer science stubs would have a poor success rate, but also be over very quickly since there aren't very many citations to check. Abductive (reasoning) 17:16, 18 May 2022 (UTC)[reply]
"But I must defend the rights of all users to run whatever they wish." If someone else is having an issue using the bot, they can speak up. If you are unable to use the bot, say so. Maybe it's working for everyone except you. It is best to speak for yourself. Rlink2 (talk) 19:32, 18 May 2022 (UTC)[reply]
Well it is time to challenge the "right of all users to run whatever they wish", when their justification is so thin that the effect of what they are doing is WP:DISRUPTIVE and one has to ask whether WP:NOTHERE applies. See also Tragedy of the commons --John Maynard Friedman 20:02, 18 May 2022 (UTC)[reply]
I agree with @John Maynard Friedman, and I find John's pointer to Tragedy of the commons very timely.
We have in Citation bot a very powerful tool, which can do great work to improve Wikipedia's compliance with the core policy of WP:Verifiability.
However, the bot has limited capacity. So we have a choice of approaches:
  1. Strive for efficiency: We try to regulate use of the bot to improve its efficiency, at least by eliminating the least efficient uses
  2. Free for all. We say "sod efficiency", and just leave editors to use the bot however their whimsies take them.
I am firmly in the "strive for efficiency" camp, and I agree with John that there are WP:NOTHERE issues with the other approach. Whatever You Want is great music, but a terrible way to allocate scarce resource. BrownHairedGirl (talk) • (contribs) 20:47, 18 May 2022 (UTC)[reply]
If we can't take a third option of adding another two or more channels to the bot, to increase the resources available to it, perhaps even "protected" in some way to only allow the types of use that Abductive is advocating for, then I would be in favour of striving for efficient use of resources over allowing a free-for-all. Sideswipe9th (talk) 21:42, 18 May 2022 (UTC)[reply]
Abductive is advocating for the free for all, not efficiency. Surely you mean BHG here. Headbomb {t · c · p · b} 21:51, 18 May 2022 (UTC)[reply]
Just a small technical point in reply to @Sideswipe9th's thoughtful comment.
"running it against maintenance categories like Category:Articles with bare URLs for citations" is of course right in spirit, but not quite complete in technical detail.
I know that Swipe's comment was intended as a suggested approach rather than a how-to, but I just thought I'd note the issues in case anyone is inclined to literally try that well-intentioned example.
That particular category is a container. For all articles with WP:Bare URLs, see Category:All articles with bare URLs for citations, or see the monthly subcats such as the current Category:Articles with bare URLs for citations from May 2022.
But ... beware.
Those categories include articles identified as having bare URLs which are not bot-fixable: Citation bot cannot get a title for a PDF file or an image or a spreadsheet. Most of those unfixable file types are tagged with specific templates such as {{Bare URL PDF}}, {{Bare URL image}}, {{Bare URL plain text}}, and {{Bare URL spreadsheet}}. Asking the bot to fill those refs will just waste the bot's time ... and about 75% of all the currently-tagged bare URLs are one of those bot-unfixable types.
So the method I developed is to select only articles tagged with {{Bare URL inline}} and/or the banner template {{Cleanup bare URLs}}. That means that the bot will not waste its time on articles where all the bare URLs have been identified as PDFs. For an example, see my current batch Bare URLs tagged before May 2022, as of 16 May - part 2 of 3, which was built using this Petscan search and then refined by an AWB pre-parse.
I doubt that anyone else will try to replicate that particular task. I am just using it to illustrate how a lot of care is needed in identifying which maintenance categories actually collect bot-fixable issues, and hence are a suitable basis for a CB batch job. BrownHairedGirl (talk) • (contribs) 21:30, 18 May 2022 (UTC)[reply]
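The selection rule described above (keep an article only if it carries {{Bare URL inline}} or {{Cleanup bare URLs}}, so that articles tagged solely with the unfixable file-type templates are left out) can be illustrated with a short Python sketch. It is not the Petscan and AWB workflow mentioned in the comment; the function names are hypothetical, the template matching is a rough regular expression, and template redirects are not handled.

```python
import re

# Tags that mark bare URLs which Citation bot may be able to fill.
FIXABLE_TAGS = {"bare url inline", "cleanup bare urls"}

# Rough match for a template name: the text between "{{" and the first "|" or "}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def template_names(wikitext):
    """Return the lower-cased names of the templates used in the wikitext."""
    return {match.group(1).lower() for match in TEMPLATE_NAME.finditer(wikitext)}


def is_batch_candidate(wikitext):
    """True if the page has at least one tag from FIXABLE_TAGS."""
    return not template_names(wikitext).isdisjoint(FIXABLE_TAGS)


# Hypothetical page snippets, for illustration only.
html_page = "<ref>https://example.org/story {{Bare URL inline|date=May 2022}}</ref>"
pdf_page = "<ref>https://example.org/report.pdf {{Bare URL PDF|date=May 2022}}</ref>"

print(is_batch_candidate(html_page))
print(is_batch_candidate(pdf_page))
```

A page that mixes fixable and unfixable tags still passes this filter, which matches the stated aim of skipping only those articles where all the bare URLs have been identified as unfixable types.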
I appreciate this clarification! The example categories I used were for illustrative examples towards a generalised approach, and not a specific how to as you've said. Thanks :) Sideswipe9th (talk) 21:38, 18 May 2022 (UTC)[reply]

State of the bot

The command line does work, if you have OAuth tokens. Extensive code coverage improvements have been implemented and a huge number of new test cases have been added. This has found a couple of small bugs and should prevent new bugs. The code base has been shrunk significantly by merging duplicate code functionality into functions - this should significantly speed up bug fixes and code development and reduce new bugs. AManWithNoPlan (talk) 01:29, 17 May 2022 (UTC)[reply]

That's great, @AManWithNoPlan. Huge thanks to you for all your hard work keeping the bot running, and continually improving it. BrownHairedGirl (talk) • (contribs) 03:33, 17 May 2022 (UTC)[reply]
The slowest step - by far - in citations is determining that DOIs are broken. That was a surprise. AManWithNoPlan (talk) 15:57, 17 May 2022 (UTC)[reply]
I am now monitoring the error logs on Toolforge. Found a bug in gadget mode. My apologies for that being highly unreliable. AManWithNoPlan (talk) 19:05, 17 May 2022 (UTC)[reply]
That's time-consuming work, @AManWithNoPlan, but very valuable in improving efficiency.
I hesitate to add to your high workload, but here's an if-you-have-time-and-inclination suggestion: does the bot log failures or successes of the zotero to return a page title?
I know that any such logs would be huge, but if they could be made even for a few days, then they would be invaluable for my work in bare URL cleanup. They would help identify websites where bare URLs rarely or never get filled by Citation bot, and that would allow more targeted processing.
They would be invaluable for @Rlink2's work on BareRefBot. If we could identify websites where Citation bot cannot fill the title, then articles with those URLs could be pre-processed by BareRefBot to cleanup those URLs. I know of some high-profile ones like and, but if we had a longer list then BareRefBot could make a huge dent in the bare URL backlog without burdening Citation bot for a first pass. It might also be feasible to tag such URLs as not-CB-fixable, and thereby exclude those bare URLs from my CB list-making. BrownHairedGirl (talk) • (contribs) 22:22, 17 May 2022 (UTC)[reply]
I have wondered the same thing over a year ago, and they don't. But, I will think about if it is possible to put stuff in the error log, other than crashes. AManWithNoPlan (talk) 23:26, 17 May 2022 (UTC)[reply]
I was thinking that it would need to be a pair of new logs (e.g. ZoteroNoTitle and ZoteroGotTitle), so that they would be free of other data and it would be simple to compare them to distinguish transient failures from persistent failures. BrownHairedGirl (talk) • (contribs) 23:48, 17 May 2022 (UTC)[reply]

My new best friend will be file_put_contents( $filename, $data, FILE_APPEND); AManWithNoPlan (talk) 00:14, 18 May 2022 (UTC)[reply]

The People Endorse Your Wise Choice Of Friend. By 102% of the votes in a 105% turnout. BrownHairedGirl (talk) • (contribs) 00:26, 18 May 2022 (UTC)[reply]
Also, tracking failed ones would allow the bot to skip them in the future and speed up runs. AManWithNoPlan (talk) 15:57, 18 May 2022 (UTC)[reply]
Tracking is now running. Existing runs will not track. Files are ZoteroWorked & ZoteroFailed. Note that this includes any failure, including page not found etc. AManWithNoPlan (talk) 17:36, 18 May 2022 (UTC)[reply]
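As a rough illustration of how the two logs could be compared to separate persistent failures from transient ones, here is a Python sketch. The structure of the ZoteroWorked and ZoteroFailed files is not stated in this thread, so the sketch assumes one URL per line; the function names and the `min_failures` threshold are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_counts(urls):
    """Count how many of the given URLs belong to each host name."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url.strip()).hostname
        if host:  # skips blank lines and entries without a scheme
            counts[host] += 1
    return counts


def persistent_failures(worked_urls, failed_urls, min_failures=3):
    """Hosts that failed at least min_failures times and never worked.

    A host that appears in both logs is treated as a transient failure
    and is left out.
    """
    worked = domain_counts(worked_urls)
    failed = domain_counts(failed_urls)
    return sorted(
        host for host, n in failed.items()
        if n >= min_failures and worked[host] == 0
    )


# Hypothetical log contents, for illustration only.
worked_log = ["https://good.example/a", "https://flaky.example/x"]
failed_log = [
    "https://bad.example/1",
    "https://bad.example/2",
    "https://bad.example/3",
    "https://flaky.example/y",
    "https://flaky.example/z",
    "https://flaky.example/w",
]
print(persistent_failures(worked_log, failed_log))
```

Because the failed log records any failure, including page not found, a dead link on an otherwise fillable site would count against that site; requiring several failures and no successes is one simple way to reduce that noise.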
Wow! @AManWithNoPlan, you are on the case as fast as a fit terrier. This is great work.
Can you give me any idea of the structure of the data files (or a few sample lines), so that I can begin thinking about how to analyse them? I want to be able to distinguish different types of failure.
If this allows CB to skip some sites, that will be a big help to CB's efficiency.
@Rlink2: this development is very significant for bare URL cleanup. It will be a big help for BareRefBot. If we can identify websites for which Citation bot cannot fill the ref, then there will be no need to run CB on those pages first: BareRefBot can just get to work on any links to those sites.
If BareRefBot builds its own list of websites for which it can never get a title, then we can intersect that with the CB-cannot-fix set to build a list of bot-unfixable websites. That would allow us to tag those bare refs as bot-unfixable, which would allow them to be excluded from bot list selections and possibly to display a note that they are bot-unfixable. BrownHairedGirl (talk) • (contribs) 20:23, 18 May 2022 (UTC)[reply]
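The intersection proposed above is a plain set operation. A minimal Python sketch follows, assuming each bot's list is a collection of host names; the function names and the example hosts are hypothetical, and neither list exists in this form yet.

```python
def normalise(host):
    """Lower-case a host name and drop a leading "www." so variants compare equal."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def bot_unfixable(cb_cannot_fix, barerefbot_cannot_fix):
    """Hosts that neither Citation bot nor BareRefBot can get a title for."""
    cb = {normalise(h) for h in cb_cannot_fix}
    brb = {normalise(h) for h in barerefbot_cannot_fix}
    return cb & brb


# Hypothetical lists, for illustration only.
cb_list = ["www.paywalled.example", "scripted.example", "pdfhost.example"]
brb_list = ["paywalled.example", "PDFhost.example", "other.example"]
print(sorted(bot_unfixable(cb_list, brb_list)))
```

Hosts that appear in only one list stay out of the result, since the other bot may still be able to fill them.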

Bot converts a citation of a web page to a citation of a book mentioned on that web page

new bug
Reported by
– Arms & Hearts (talk) 21:53, 17 May 2022 (UTC)[reply]
What happens
The bot converts {{cite web}} citations to catalogue entries for books to {{cite book}}
What should happen
The bot should leave these citations alone. If there was a good reason for the book to be cited instead of the web page (and if it was hypothetically possible for the bot to determine that), it should do so in the way books are usually cited, i.e. without the |website= or |access-date= parameters, rather than a muddled halfway approach (as a result of which it seems to tell us, for example, that the page on the Naval Marine Archive website was published in 1961).
Relevant diffs/links
We can't proceed until
Feedback from maintainers

Web->Book: I don't think that it was right in this case...

new bug
Reported by
Shaav (talk) 19:29, 18 May 2022 (UTC)[reply]
What happens
web citation converted to book
What should happen
shouldn't change
Relevant diffs/links
We can't proceed until
Feedback from maintainers

I suspect that, because the citation is for a webpage about a book, the 'web' citation was replaced with a 'book' citation... but it really was intended to be a citation for the website (I created it) because the citation is supporting the existence of the book and its properties. I'm not sure if there's something that can be included so that it doesn't get converted again in the future. — Preceding unsigned comment added by Shaav (talk • contribs) 19:29, 18 May 2022 (UTC)[reply]

That looks fine to me? The title field still gets transformed into a link in the reflist as before. The only difference to the reader is that the title is now italicised, which makes sense as the citation should be to the book based upon how it's being used in context? Sideswipe9th (talk) 20:23, 18 May 2022 (UTC)[reply]