iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: http://en.m.wiktionary.org/wiki/Wiktionary:Beer_parlour/2022/February
Wiktionary:Beer parlour/2022/February - Wiktionary, the free dictionary

Away for a week

I am going to be away on business for the next week or thereabouts. I expect everyone to be on their best behavior while I'm gone, and please try to finish writing the dictionary by the time I get back. Cheers! bd2412 T 07:07, 1 February 2022 (UTC)[reply]

I'm confused: I thought this dictionary was finished. Isn't this some alternate reality game at this point? —Justin (koavf)TCM 12:53, 1 February 2022 (UTC)[reply]
Rest assured we shall make good use of your absence. Teehee!  --Lambiam 17:04, 1 February 2022 (UTC)[reply]
Quick, while they're not looking! RF all their entries! Vininn126 (talk) 17:28, 1 February 2022 (UTC)[reply]
I'm back early, so I guess I have to excuse the fact that, contrary to Koavf's thinking, the dictionary is not finished. Perhaps even more disappointing, not a single one of my ~14,000 entries has been RF'ed. bd2412 T 04:52, 7 February 2022 (UTC)[reply]
I apologize. —Justin (koavf)TCM 05:04, 7 February 2022 (UTC)[reply]

Alternative forms

So recently me and @Max19582 have run into a problem where only one definition of a word seems to have alternative forms. @DTLHS has made inline alt forms as a potential solution, or we could talk about EL. Either way this issue needs to be addressed. Vininn126 (talk) 20:26, 6 February 2022 (UTC)[reply]

Couldn't {{sense}} work? AG202 (talk) 21:19, 6 February 2022 (UTC)[reply]
It could work, but it is obviously inferior and should be discouraged in favor of associating the information directly with the sense in question. DTLHS (talk) 21:50, 6 February 2022 (UTC)[reply]
So you prefer either the template or the alternative header layout? Vininn126 (talk) 22:26, 6 February 2022 (UTC)[reply]
If there is some sense-specific alternative form (I don't think that this is a common occurrence), then the template layout. DTLHS (talk) 01:30, 7 February 2022 (UTC)[reply]
This template is useful; thanks for making it; I've run into this issue myself. (I can't offhand recall an entry where all the senses are most common in one spelling, but one sense has alt forms, although I know it happens; a related phenomenon is e.g. bassinet being either a baby-bed or an alt form of bascinet, which itself has other spellings.) - -sche (discuss) 21:38, 13 February 2022 (UTC)[reply]
It happens a lot with initialisms, where they stand for something that doesn’t warrant an entry in its own right. See something like OC, where the current synonym template is less than ideal when OOC stands for exactly the same thing in the roleplaying sense. I can also see it being useful for words like judgment, where AmEng and BrEng only differ for certain senses (judgement is BrEng except when referring to legal judgments). Theknightwho (talk) 15:51, 21 February 2022 (UTC)[reply]

Updates on the Universal Code of Conduct Enforcement Guidelines Review /community calls

You can find this message translated into additional languages on Meta-wiki.

Hello everyone,

The Universal Code of Conduct (UCoC) Enforcement Guidelines were published 24 January 2022 as a proposed way to apply the Universal Code of Conduct across the movement. Comments about the guidelines can be shared here or the Meta-wiki talk page.

There will be conversations on Zoom on 25 February 2022 at 12:00 UTC, and 4 March 2022 at 15:00 UTC. Join the UCoC project team and drafting committee members to discuss the guidelines and voting process.

The timeline is available on Meta-wiki. The voting period is March 7 to 21. See the voting information page for more details.

Thank you to everyone who has participated so far.

Sincerely,

Movement Strategy and Governance
Wikimedia Foundation --Mervat (WMF) (talk) 09:35, 7 February 2022 (UTC)[reply]

Stripping Karakhanid diacritics

Karakhanid has page titles including ornaments on the letters, اِكّٖى instead of اكى and مَنْ instead of من. Was this a deliberate choice? It is inconsistent with Arabic, where vowel marks and shadda are stripped from links. Vox Sciurorum (talk) 20:47, 7 February 2022 (UTC)[reply]

@Vox Sciurorum: It should be اكى and the head template should be {{head|xqa|numeral|head=اِكّٖى|tr=ikkī}}. Allahverdi Verdizade (talk) 22:15, 7 February 2022 (UTC)[reply]
By the way, not all entries in Diwān should be included as Karakhanid. If it says "Oghuz" or something to that effect, it's not Karakhanid. Allahverdi Verdizade (talk) 22:18, 7 February 2022 (UTC)[reply]
‌ Thanks. Should Karakhanid use U+06a9 ARABIC LETTER KEHEH, U+0643 ARABIC LETTER KAF, or both? I'm looking at کَذْجْ (~ Turkish genç) which apparently has a typo, ذ for ن, so I am suspicious of the K as well. Vox Sciurorum (talk) 21:22, 8 February 2022 (UTC)[reply]
@Vox Sciurorum: U+0643. The category Karakhanid lemmas currently still has quite a small number of entries, so if you wish, you can adopt the language and reorganize it the way you see fit. It's convenient, since basically all vocabulary comes from Diwan, which is transcribed and translated in {{r:xqa:DLT}}. Allahverdi Verdizade (talk) 21:43, 8 February 2022 (UTC)[reply]
I found a copy of the reference on archive.org. Kaf is used rather than keheh. I don't know how to determine which words belong to Karakhanid as we know it. Every entry that does not mention Oğuz? Vox Sciurorum (talk) 13:16, 9 February 2022 (UTC)[reply]
If it says "In the language of Oghuz/Arghu/whatever other 'dialect'", then it's not Karakhanid. I think those can still be added under alternative forms/some sort of notes, it is going to be useful, but just not as main entries under ==Karakhanid==. Allahverdi Verdizade (talk) 14:01, 9 February 2022 (UTC)[reply]
Is everything in there presumptively Karakhanid? I thought the book was a compilation of what we would consider different Turkish languages. Page 4: "Türk, Türkmen, Oğuz, Çiğil, Yağma, Kırgız boylarının dillerini..." If it is all Karakhanid that also solves the problem of attributing Nişanyan's references to it; they can all be {{cog|xqa}}. Vox Sciurorum (talk) 17:33, 9 February 2022 (UTC)[reply]
None of that is Karakhanid, except for Türk, which is how Kashghari himself refers to Karakhanid. We don't always know which variety is meant by the different labels he uses. But despite the title of the dictionary, "Collection of Turkic languages", the vast majority of the material in it comes from the native language of Kashghari, which is, surprisingly, the variety of Kashghar. Allahverdi Verdizade (talk) 21:26, 9 February 2022 (UTC)[reply]
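For anyone wanting to check titles offline, the stripping discussed in this thread can be sketched in Python. This is only an illustration and not Wiktionary's own implementation; the function name `strip_marks` is hypothetical, and it assumes that every Unicode nonspacing mark (category Mn) should be dropped from the page title, which may be broader than what the site's rules actually strip. Folding U+06A9 KEHEH to U+0643 KAF is likewise an assumption based on the answer given above.

```python
import unicodedata

# Assumption: fold U+06A9 ARABIC LETTER KEHEH to U+0643 ARABIC LETTER KAF,
# since the thread settles on KAF for Karakhanid.
KEHEH_TO_KAF = {0x06A9: 0x0643}


def strip_marks(headword: str) -> str:
    """Return a page title with nonspacing marks (category Mn) removed."""
    # No normalization is applied, so precomposed letters are left intact.
    stripped = "".join(
        ch for ch in headword if unicodedata.category(ch) != "Mn"
    )
    return stripped.translate(KEHEH_TO_KAF)


# Headword with kasra (U+0650), shadda (U+0651) and subscript alef (U+0656).
headword = "\u0627\u0650\u0643\u0651\u0656\u0649"
title = strip_marks(headword)
print([f"U+{ord(ch):04X}" for ch in title])
```

The headword itself would still carry the marks through the `head=` parameter, as in the `{{head|xqa|numeral|...}}` call quoted earlier; only the page title is stripped.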

We should allow romanized Sanskrit (IAST) as entries

Sanskrit has a standard romanization scheme - the IAST. Currently, we allow Sanskrit entries in Devanagari as the main entry and those in the other Indic script as alternative forms (see Category:Sanskrit terms by script). This is done considering that Sanskrit "has no single script associated with it" (quoted from WT:About Sanskrit). I believe that romanized entries in IAST should be allowed as alternative forms as well for these reasons:

  1. IAST-romanized Sanskrit is used in many modern scholarly works.
  2. We currently allow romanized Pali (see WT:About Pali). It's inconsistent not to allow romanized Sanskrit.
  3. We currently allow romanized Mandarin and Japanese. This indicates that it's our tradition to allow romanized entries if there is a standard romanization scheme for that language in widespread use. Jonashtand (talk) 09:02, 8 February 2022 (UTC)[reply]
special:search/wt:vote sanskrit Fish bowl (talk) 09:06, 8 February 2022 (UTC)[reply]
  • Support Both for consistency with other romanization entries and the fact that I, as someone who is not knowledgeable of Sanskrit and has a passing interest in some topics related to India, have come across IAST in the wild. It would be very nice if I could look up these terms in a dictionary. —Justin (koavf)TCM 17:31, 8 February 2022 (UTC)[reply]
(Notifying AryamanA, Bhagadatta, Svartava2, JohnC5, Kutchkutch, Inqilābī, Getsnoopy): Pinging the Sanskrit working group. AG202 (talk) 17:40, 8 February 2022 (UTC)[reply]
My position is unchanged (i.e. oppose but I think a vote on this is silly) from the last time this was proposed: Wiktionary:Votes/pl-2018-12/Allowing_attested_romanizations_of_Sanskrit. Pali has liturgical books published in the Latin script, because many Buddhists preferred to standardise on one script (and I don't see any of the Brahmic scripts having a clear majority in usage for Pali). That standardised script for Sanskrit has been Devanagari in modern times, and romanisations never caught on as much. You can search for romanisations and find the Sanskrit entry anyways. I'm for entries in other Brahmic scripts for Sanskrit because there are native traditions of using such scripts for Sanskrit, but the use of Latin script for Sanskrit is limited to linguistic and pedagogical works.
It's actually Latin-centric to suggest this in my opinion. We have a principle of recording words as they are attested in usage (not mentions). There are tons of scripts that people do not know, making an entry for every language using a non-Latin script at best seems a hassle and at worst is weirdly biased, especially when we have a perfectly good search function. —AryamanA (मुझसे बात करेंयोगदान) 20:09, 8 February 2022 (UTC)[reply]
From an end-user perspective, I do not see how having automatically generated romanization pages, requiring near zero human effort, along the lines of the Chinese or Gothic ones, would hurt. These entries would not even have to be listed under alternative forms and would not show up in any lemma categories; they would be there solely for convenience.
There are certainly people who are interested in Sanskrit (or Indo-European comparative linguistics) who do not know the Devanagari script, and it's not uncommon to encounter romanized words, e.g. in etymological dictionaries. I would count myself among this group. Sure, the search engine will technically pick it up since Module:sa-headword adds the romanization automatically, but it's less user-friendly and there can be hits from other languages too. For a concrete example, let's say someone encounters "śvan" or "śván" and wants to look it up. Then they get sent to the page Special:Search/śvan (which redirects to another language entry). None of the "See also" links at the top has the Sanskrit entry. Or they land on Special:Search/śván, which doesn't have the Sanskrit term either. If you search for Special:Search/śván Sanskrit, then the top results are other language entries that mention Sanskrit in their etymology section, so it's still not an ideal experience.
That being said, if most of the Sanskrit editors are opposed to it, I will defer to them. I just think that usability for our reader base, who are not necessarily well-versed in Indic scripts, should be a primary consideration. 70.172.194.25 21:10, 8 February 2022 (UTC)[reply]
The arguments anon is making are difficult to disagree with. Allahverdi Verdizade (talk) 21:21, 8 February 2022 (UTC)[reply]
I don't care either way much but there's nothing specific to Sanskrit about this. Propose this for every non-Latin script language with a standard romanisation or none at all. —AryamanA (मुझसे बात करेंयोगदान) 00:56, 9 February 2022 (UTC)[reply]
I anticipated this response, and honestly, I wouldn't necessarily be against that. It would require more thought, though, since in my eyes Sanskrit meets some criteria that make supporting romanizations a particularly good idea:
  • One standard romanization scheme is widely adopted, instead of several competing standards from which we would have to pick one arbitrarily, or support multiple.
  • It is common to encounter excerpts of romanized Sanskrit "in the wild" in works of various forms. To quote CFI, "A term should be included if it's likely that someone would run across it and want to know what it means." If the romanization isn't likely to be encountered, then there is not much value in supporting it. (I don't think requiring attestation of specific romanizations is a good use of editor-time, though, when that work could instead go into actual entries.)
  • As a subpoint to the latter consideration, there is even a history of publishing works in the language entirely in romanized form.
For other languages in non-Latin scripts the case may be less clear, although as I said before, I'm not going to take a general stance on the matter.
I do view the romanizations for Sanskrit as a stopgap measure, though. An alternative would be improving the search engine so that it handles such cases as "śvan" better (and perhaps allowing searching within a specific language), but I do not know how hard that would be to accomplish. 70.172.194.25 01:09, 9 February 2022 (UTC)[reply]
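As a rough illustration of the "śvan" versus "śván" problem, here is a minimal Python sketch of an accent-insensitive lookup key. It is not how the site's search engine works; the function name `drop_pitch_accent` is hypothetical, and it assumes that the only difference to ignore is an acute accent on one of the plain vowels a, e, i, o, u. The acute that forms part of ś is kept, because it sits on a consonant.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent
VOWELS = set("aeiou")  # assumption: accents are dropped only from these bases


def drop_pitch_accent(term: str) -> str:
    """Remove acute accents from vowels, keeping consonant diacritics."""
    out = []
    base = ""  # most recent non-combining character, lowercased
    for ch in unicodedata.normalize("NFD", term):
        if unicodedata.combining(ch) == 0:
            base = ch.lower()
        elif ch == ACUTE and base in VOWELS:
            continue  # skip the accent mark that sits on a vowel
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


# Both spellings from the example above reduce to the same key.
print(drop_pitch_accent("\u015bv\u00e1n") == drop_pitch_accent("\u015bvan"))
```

A key like this could be used to group accented and unaccented spellings before matching, but whether that is desirable for other languages sharing the same titles is a separate question.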
Object to all romanizations, like Japanese, or none of them, is just as fair an argument. I'm generally for romanizations, especially where they're standardized. It is Latin script centric, but Wiktionaries for Latin-script-centric languages should be Latin script centric.--Prosfilaes (talk) 02:57, 9 February 2022 (UTC)[reply]
  Weak support if Devanagari remains the main script. As a Sanskrit editor, I've come across lots of works with most presenting terms and phrases in the Latin script. Most if not all Sanskrit-English dictionaries also present Romanisations. Some like [1] also give quotations in Latin script without presenting a Devanagari one. Typically, non-Indians who don't know Devanagari would find Sanskrit Romanisations useful. So, in all, I don't think it's that bad an idea, even if it is somewhat inconsistent with other languages. Svartava2 (talk) 02:55, 9 February 2022 (UTC)[reply]
My feeling is we should include romanizations for important words that do not qualify as English but are often quoted in italics in historical works. Maybe not a workable rule. Also, see Wiktionary:Votes/pl-2014-06/Allowing attested romanizations and Wiktionary:Votes/pl-2018-12/Allowing attested romanizations of Sanskrit (both failed). Vox Sciurorum (talk) 17:39, 9 February 2022 (UTC)[reply]
  Weak support keeping the IAST as redirects to Devanagari.--Rishabhbhat (talk) 05:12, 13 February 2022 (UTC)[reply]
Otherwise,   Lol, no as eloquently stated by JohnC5. --Rishabhbhat (talk) 04:08, 15 February 2022 (UTC)[reply]
  Oppose per AryamanA and the previous votes & discussions, unless this is proposed for every non-Latin script. Since the use of Latin script for Sanskrit is limited to linguistic and pedagogical works, it should not be given the same importance as the other scripts used for Sanskrit.
According to User:Atitarev in the previous votes:
[A]llowing romanised Sanskrit entries won't improve the state of Sanskrit contents at Wiktionary and [may] mislead users that it's [just as] OK to write Sanskrit in Roman [as it is to write Sanskrit in Brahmic scripts].
When Romanisations are allowed as entries, they are eventuall[y] treated as regular native scripts, native word, a replacement for difficult script.
Also, IAST does not have a one-to-one correspondence with Devanagari, which is why at Module:sa-utilities/documentation it says:
Although In the mainspace, only IAST is used,
In some modules, SLP1 is used for convenience of encoding because of its one-to-one encoding with Devanāgarī.
Kutchkutch (talk) 04:04, 15 February 2022 (UTC)[reply]
In response to it should not be given the same importance as the other scripts used for Sanskrit: there is a difference between allowing IAST as an alternative form and allowing non-lemma romanizations. If a secondary romanization scheme were adopted, the IAST form would not be listed under alternative forms and would not even have to be linked at all from the main lemmas (although I don't see much harm in doing so). The difference is subtle but it's there, and it also means that you'd have to click through to Devanagari to see the definitions, which may help readers to understand that the romanization is secondary. Anyway, I would guess that a Sanskrit word in IAST is significantly more likely to be encountered by a user of English Wiktionary than a Sanskrit word in most of the currently accepted scripts other than Devanagari, so if we're just trying to maximize our usefulness to end users, I don't see much reason to exclude this feature.
In response to mislead users that it's [just as] OK to write Sanskrit in Roman: We have pinyin romanizations entries but everyone still knows that you have to learn Hanzi if you want to be able to read and write Mandarin properly. If we really wanted, it would even be possible to add a ===Usage notes=== to every romanization saying "This form is not considered as valid as native scripts and is provided for convenience", etc., but I'd personally consider that to be bloat.
In response to IAST does not have a one-to-one correspondence with Devanagari: that is admittedly not ideal, but it's not a huge inconvenience because we can provide multiple lines (the way pinyin, much more ambiguous than IAST, is handled). It's also by far the most likely romanization to be encountered, so if we're going to go for one, we should choose IAST, despite its imperfections. 70.172.194.25 05:33, 15 February 2022 (UTC)[reply]
@Kutchkutch: Your argument would disallow Japanese entries in romaji, Mandarin entries in pinyin, Cantonese entries in Jyutping, and romanized Pali entries. Jonashtand (talk) 10:49, 16 February 2022 (UTC)[reply]
  • Support. Learned discussion is the place where most native English speakers will encounter Sanskrit. IAST does not have a one-to-one correspondence to Devanagari, but is just as faithful as SLP1. I would also remind you guys that it is the principal script for Sanskrit words in Monier-Williams' dictionary - derivative words are not given in Devanagari. As to Pali, Roman (IAST) is the dominant script for Pali on the Internet, with Thai in second (alphabetic) and third (abugidic) places. If the vernacular language of a book or pamphlet is English, its Pali will be in the Roman script. Our local temple prints Pali chanting books in the Roman script. Our current rule is that Sanskrit is only allowed in Indic scripts, which strikes me as particularly racist. It is, however, consistent with a policy of making Wiktionary hard to use.

There is only one slight disadvantage to IAST - it seems not to have a published definition! One was published, but practical IAST has changed since then! --RichardW57m (talk) 17:03, 20 April 2022 (UTC)[reply]

  •   Support. I understand the concern about lemmatization -- organizing our entries sanely and effectively is an ongoing challenge.
By way of a possible reference point for how this might be achieved, I suggest that our Sanskrit editors have a look at our Japanese entries. We include romanized Japanese entries as a kind of soft-redirect to the lemma forms. Due to the multi-script nature of Japanese (kana and kanji), sometimes this redirection takes two jumps, but for Sanskrit, only one should suffice (from the romanization to the Devanagari). Samples include Japanese sakura (sakura), koshō (koshō), taberu (taberu). Note the use of {{ja-romanization of}} to point users to the Japanese-script entries. The romanized entry might include multiple instances of {{ja-romanization of}}, in those cases where there are multiple possible Japanese entries corresponding to the given romanization. A similar approach might work for those cases where the IAST spellings correspond to multiple possible Devanagari spellings. ‑‑ Eiríkr Útlendi │Tala við mig 23:19, 31 May 2022 (UTC)[reply]
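The soft-redirect idea described above can be sketched in Python as a small generator of definition lines. This is an illustration only: the function name `romanization_lines` is hypothetical, the assumption that `{{ja-romanization of}}` takes the native-script spelling as its first argument is mine, and any Sanskrit equivalent of that template would have to be created or named by the editors.

```python
def romanization_lines(native_entries, template="ja-romanization of"):
    """Return one definition line per native-script entry.

    A romanization that corresponds to several native spellings simply
    gets several lines, one soft redirect each.
    """
    return ["# {{" + template + "|" + entry + "}}" for entry in native_entries]


# One romanization pointing at a single native-script entry.
for line in romanization_lines(["さくら"]):
    print(line)
```

Passing a list with more than one spelling covers the case where an IAST form corresponds to multiple possible Devanagari spellings.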
  Oppose Unchanged from my previous stance. There is still evidence that romanised entries are treated as regular words by editors. As for User:Eirikr's suggestion, Japanese entries can be and are better disambiguated at the kana entries, rather than rōmaji, although now, lemmas are shifting back and forth between kanji and kana, so they are no longer centralised around kana as before. No such issues (as with Japanese) with Sanskrit. --Anatoli T. (обсудить/вклад) 01:40, 1 June 2022 (UTC)[reply]

As that vote is soon to conclude positively, I hereby invite all the voters (@Numberguy6, Svartava, Hazarasp, This, that and the other, Imetsia, AG202, Vininn126) (and everybody else who cares about the phrasebook) to discuss the points brought up by @Lambiam and AG202 on the vote's talk page. In particular, where does everybody stand on restructuring the politeness distinction to "polite / normal" and "familiar" instead of "familiar / normal" and "polite"? Should we change "normal" to something else as AG202 has suggested? I'm very open to suggestions!

Procedurally, I'm going to implement the vote's exact contents after it has concluded but we may follow that up (before making changes to the actual phrasebook entries) by the amendments as discussed here. Note that {{Template:policy-VOTE}} states that a discussion and consensus is necessary, for which a BP discussion suffices as the amendments discussed don't constitute a substantial change. — Fytcha T | L | C 09:24, 10 February 2022 (UTC)[reply]

I think keeping the distinction as "familiar/polite" is best. It is the clearest, most widely understood terminology; "normal" definitely feels out of place. Vininn126 (talk) 09:40, 10 February 2022 (UTC)[reply]
The question is though, where do translations in languages that have no politeness distinction go? This is not obvious to me, which is why I chose to add "normal" in the first place. — Fytcha T | L | C 09:56, 10 February 2022 (UTC)[reply]
If there are separate tables because some languages make the distinction, languages that do not make the distinction should be listed in both tables. This is not different from how we treat the translations of terms that are polysemic in English, such as bread. French uses the same term pain for the uncountable sense (il y a du pain) and the countable sense (il y a deux autres pains), so the same French term is present in both the “baked dough made from cereals” table and the “countable: any variety of bread” table.  --Lambiam 12:59, 10 February 2022 (UTC)[reply]
This is almost exactly what I wanted to say. You can consider it both, since there's no distinction. Vininn126 (talk) 13:42, 10 February 2022 (UTC)[reply]
@Vininn126, Lambiam: I'm also fine with that, but in that case, I think it would be best to get a bot to synchronize the two boxes for a specified set of languages. Is everybody else @Numberguy6, Svartava, Hazarasp, This, that and the other, Imetsia, AG202 also on board with this solution? — Fytcha T | L | C 10:37, 13 February 2022 (UTC)[reply]
Yeah, it's kind of messy from a technical standpoint, but it makes a lot of sense from the reader's point of view. I'm on board. This, that and the other (talk) 11:09, 13 February 2022 (UTC)[reply]
This seems logical, I'm also on board. —Svārtava [tur] 11:17, 13 February 2022 (UTC)[reply]
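The synchronization agreed on above could look roughly like the following Python sketch. It is not an existing bot; the function name `sync_boxes`, the dictionary layout, and the placeholder language code `xx` are all hypothetical, and reading or writing the actual `{{trans-top}}` boxes is left out entirely.

```python
def sync_boxes(familiar, polite, no_distinction):
    """Keep both boxes identical for languages without a politeness distinction.

    familiar, polite: dict mapping language code -> list of translations.
    no_distinction: set of language codes that belong in both boxes.
    Returns two new dicts; the inputs are not modified.
    """
    new_familiar = {lang: list(items) for lang, items in familiar.items()}
    new_polite = {lang: list(items) for lang, items in polite.items()}
    for lang in no_distinction:
        merged = []
        # Union of both boxes, keeping first-seen order and dropping repeats.
        for item in familiar.get(lang, []) + polite.get(lang, []):
            if item not in merged:
                merged.append(item)
        if merged:
            new_familiar[lang] = list(merged)
            new_polite[lang] = list(merged)
    return new_familiar, new_polite


# Placeholder data: "xx" stands for a language with no distinction.
fam, pol = sync_boxes({"xx": ["A"]}, {"de": ["D"]}, {"xx"})
print(fam, pol)
```

Languages outside the specified set are left exactly as they are, so a familiar-only or polite-only translation is never copied by accident.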
@Numberguy6, Svartava, Hazarasp, This, that and the other, Imetsia, AG202, Vininn126, Lambiam: Please review the revamped article are you single. — Fytcha T | L | C 09:44, 16 February 2022 (UTC)[reply]
I think that looks good. Splitting it in two makes it easier to navigate here. Vininn126 (talk) 10:09, 16 February 2022 (UTC)[reply]
I like it. It's a shame that {{t-check}} is getting a new lease on life, but that's the way it has to be.
Why is Finnish "oletteko sinkku?" under "polite" when it is marked "colloquial"? Is this one of those subtle register distinctions in Finnish that has been much spoken about lately? This, that and the other (talk) 10:49, 16 February 2022 (UTC)[reply]
"oletteko sinkku" would be a polite way of asking despite using a colloquial term, so it's completely possible albeit not very likely. A version using a colloquial form of the verb would be "ootteko sinkku". Neither gets a significant number of ghits. — SURJECTION / T / C / L / 17:50, 23 February 2022 (UTC)[reply]
@Numberguy6, Svartava, Hazarasp, This, that and the other, Imetsia, AG202, Vininn126, Lambiam: I've changed the text: diff. Feel free to voice any objections. — Fytcha T | L | C 10:45, 3 March 2022 (UTC)[reply]
In case anybody wants to help me implement the changes, here's a good boilerplate:
====Translations====
{{trans-top|familiar}}
{{trans-mid}}
{{trans-bottom}}

{{trans-top|polite}}
{{trans-mid}}
{{trans-bottom}}

{{checktrans-top}}<!-- If the term can be used in both familiar as well as polite settings, please copy it to both boxes; otherwise, please move it to the appropriate box and supply the familiar/polite counterpart to the other box. -->
{{trans-mid}}
{{trans-bottom}}

Fytcha T | L | C 10:59, 3 March 2022 (UTC)[reply]

Leadership Development Task Force: Your feedback is appreciated!

(Read this message in other languages on Meta: العربية • Deutsch • español • français • Русский • 中文 • हिन्दी • বাংলা • Bahasa Indonesia • 日本語 • 한국어 • Yorùbá • Polski • Português • bosanski • hrvatski • српски / srpski)

Hello,

The Community Development team at the Wikimedia Foundation is supporting the creation of a global, community-driven Leadership Development Task Force. The purpose of the task force is to advise leadership development work.

The team is looking for feedback about the responsibilities of the Leadership Development Task Force. This Meta page shares the proposal for a Leadership Development Task Force and how you can help. Feedback on the proposal will be collected from 7 to 25 February 2022.

Thank you, --Mervat (WMF) (talk) 12:17, 8 February 2022 (UTC)[reply]

Quoting a quoted text

I have been attempting to add a quotation of a French-language book in an English-language book to the second-listed sense in our entry for the French word parachute. The French book is Guillaume Dustan (1996) Dans ma chambre (in French), Paris: POL, page 71. The English-language book is David Caron (2009) My Father and I: The Marais and the Queerness of Community, Ithaca, New York: Cornell University Press, →ISBN, page 106. The guidance provided in the documentation for {{quote-book}} is as follows:

 

If the quoted text is from book A which states that the text is from another book B, do the following:

  • Use title, edition, and others to provide information about book B. (As an example, others can be used like this: "others=1893, page 72".)
  • Use quoted_in (for the title of book A), location, publisher, year, page, oclc, and other standard parameters to provide information about book A.
     

Following that guidance, this is what I come up with:

* {{quote-book |fr |year=2009 |author=Guillaume Dustan |authorlink=Guillaume Dustan |title=Dans ma chambre |trans-title=In My Room |other=1996, page 71 |quoted_in=David Caron, ''My Father and I: The Marais and the Queerness of Community'' (in English) |location=Ithaca, New York |publisher=Cornell University Press |page=106 |isbn=978-0-8014-4773-0 |passage=En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un '''parachute''' pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite {{…}}}}
  • 2009, Guillaume Dustan, Dans ma chambre, quoted in David Caron, My Father and I: The Marais and the Queerness of Community (in English), Ithaca, New York: Cornell University Press, →ISBN, page 106, 1996, page 71:
    En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un parachute pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite []
    (please add an English translation of this quotation)

It's possible that I misinterpreted the documentation, but assuming I followed the instructions correctly, I think the documentation must be in error. It's not really clear which page number or year belongs to which book, and I think the original work's year should be at the beginning. It's also missing most of the publication information for the French book and it doesn't allow for a URL for the English book.

Now this is a bit hack-y, and it's still missing the French book's publication information and the English book's URL, but I came up with this as an alternative that still uses the quoted_in parameter:

* {{quote-book |fr |year=1996 |author=Guillaume Dustan |authorlink=Guillaume Dustan |title=Dans ma chambre {{noitalic|[''In My Room''], page 71}} |other=2009, page 106 |quoted_in=David Caron, ''My Father and I: The Marais and the Queerness of Community'' (in English) |location=Ithaca, New York |publisher=Cornell University Press |isbn=978-0-8014-4773-0 |passage=En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un '''parachute''' pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite {{…}}}}
  • 1996, Guillaume Dustan, Dans ma chambre [In My Room], page 71, quoted in David Caron, My Father and I: The Marais and the Queerness of Community (in English), Ithaca, New York: Cornell University Press, →ISBN, 2009, page 106:
    En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un parachute pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite []
    (please add an English translation of this quotation)

But I wonder if the quoted_in parameter would be best avoided altogether. The documentation mentions that while 2ndauthor, title2, location2, etc. are typically used for a new version of the book, that assumption can be "override[n] [with the newversion parameter] by indicating 'quoted in' or 'reprinted as'." There's just one brief mention of this in the documentation, which took me forever to find as it's listed under "Quoting a new version of a book". (I can't begin to tell you how much time I've sunk into trying to format this one single quotation, and while I'm fairly new to Wiktionary, I'm normally quite adept with wiki templates.) Using that parameter, this is what the same citation looks like:

* {{quote-book |fr |year=1996 |author=Guillaume Dustan |authorlink=Guillaume Dustan |title=Dans ma chambre |trans-title=In My Room |location=Paris |publisher=POL |page=71 |newversion=quoted in |2ndauthor=David Caron |year2=2009 |title2=My Father and I: The Marais and the Queerness of Community |worklang2=en |location2=Ithaca, New York |publisher2=Cornell University Press |isbn2=978-0-8014-4773-0 |page2=106 |passage=En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un '''parachute''' pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite {{…}}}}
  • 1996, Guillaume Dustan, Dans ma chambre, Paris: POL, page 71; quoted in David Caron, My Father and I: The Marais and the Queerness of Community, Ithaca, New York: Cornell University Press, 2009, →ISBN, page 106:
    En dessous il y a les godes et les plugs, rangés par taille sur deux étagères: deux gros plugs, quatre petits, quatre godes doubles, huit godes simples. En dessous il y a le petit matériel, accroché à des clous: cinq paires de pinces à seins différentes, des pinces à linge, un parachute pour les couilles, un collier de chien, deux cagoules, une en cuir, une en latex, six cockrings, en acier, en cuir, simples ou avec serre-couilles incorporé, deux étuis à bite []
    (please add an English translation of this quotation)

Given the limitations of the quoted_in parameter and the mental gymnastics required to make sense of which parameter is associated with which work, I would propose deprecating it in favour of newversion (which could possibly be renamed or restructured for clarity's sake). Alternatively, if there is no desire to deprecate the quoted_in parameter, the documentation should probably be changed to be more straightforward and to allow for a somewhat clearer format as I illustrated above.

Additionally, url2 and accessdate2 don't seem to work. Those should probably be added so as to allow for a link to the work being directly referenced ([2], in the case of my citation).

Any thoughts? Graham11 (talk) 03:50, 12 February 2022 (UTC)[reply]

@Benwing2, Sgconlaw. I have to say I've found this aspect of the templates rather unsatisfactory too. This, that and the other (talk) 05:53, 12 February 2022 (UTC)[reply]
Ditto here too. I've been really confused trying to use the quoted_in parameter as well. AG202 (talk) 05:58, 12 February 2022 (UTC)[reply]
I think |quoted_in= is an artefact left over from an older version of the template. Personally, I never use it. Instead, I use |newversion=, |2ndauthor=, |title2=, and so on. Perhaps we can deprecate |quoted_in= and have a bot replace all the uses. — SGconlaw (talk) 06:17, 12 February 2022 (UTC)[reply]
If we recommend using |newversion=, ... when copying a quote from a book quoting another book, what should be the replacement text for the guidance in the documentation for {{quote-book}}?  --Lambiam 11:08, 12 February 2022 (UTC)[reply]
@Sgconlaw Is there a programmatic way of replacing |quoted_in= with |newversion=, |2ndauthor=, |title2=, etc.? If so can you specify it? If there is such a way, I can do a bot run to eliminate |quoted_in=. Also can you update the docs in {{quote-book}} to remove the ref to |quoted_in=? Benwing2 (talk) 19:18, 12 February 2022 (UTC)[reply]
@Benwing2: hmmm, it seems like |quoted_in= allows editors to enter a freeform citation, which will probably make it difficult to use a bot to split up the various parts of the citation into |2ndauthor=, |title2=, and so on. Perhaps we can start with using your bot to compile a list of all the places where |quoted_in= has been used? Hopefully the list is short, in which case the changes can be made manually. — SGconlaw (talk) 19:51, 12 February 2022 (UTC)[reply]
@Sgconlaw No need for a bot here; I added a tracking category in the code, here: Category:Quotations using quoted-in parameter (it has a hyphen in it because you can't really enter a category name with an underscore param; it's converted to a space). The category will fill up gradually over the next few days. Benwing2 (talk) 20:27, 12 February 2022 (UTC)[reply]
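For anyone who would rather scan a page's wikitext directly than wait for the tracking category to fill, the listing step discussed above can be sketched roughly as follows. This is only a minimal illustration, not how the template or any existing bot works: the helper names `find_templates` and `uses_param` are made up for the example, and the brace matching ignores edge cases such as `<nowiki>` sections and triple-brace parameters.

```python
import re

def find_templates(wikitext, name):
    """Return the full text of each {{name ...}} call, matching nested braces.

    Hypothetical helper: a naive scanner, not a real wikitext parser.
    """
    results = []
    start_pat = re.compile(r"\{\{\s*" + re.escape(name) + r"\s*[|}]")
    for m in start_pat.finditer(wikitext):
        depth = 0
        i = m.start()
        while i < len(wikitext):
            if wikitext.startswith("{{", i):
                depth += 1
                i += 2
            elif wikitext.startswith("}}", i):
                depth -= 1
                i += 2
                if depth == 0:
                    results.append(wikitext[m.start():i])
                    break
            else:
                i += 1
    return results

def uses_param(template_text, param):
    """True if the template call sets |param= at its top level (depth 1)."""
    pat = re.compile(r"\|\s*" + re.escape(param) + r"\s*=")
    depth, i = 0, 0
    while i < len(template_text):
        if template_text.startswith("{{", i):
            depth += 1
            i += 2
        elif template_text.startswith("}}", i):
            depth -= 1
            i += 2
        else:
            if depth == 1 and pat.match(template_text, i):
                return True
            i += 1
    return False

if __name__ == "__main__":
    # Made-up sample wikitext for illustration only.
    sample = (
        "* {{quote-book|fr|year=1996|title=Dans ma chambre"
        "|quoted_in=David Caron, My Father and I|passage=un '''parachute''' {{…}}}}\n"
        "* {{quote-book|en|title=X|newversion=quoted in|title2=Y}}"
    )
    for call in find_templates(sample, "quote-book"):
        print(uses_param(call, "quoted_in"), call[:40])
```

Because |quoted_in= holds a freeform citation, a scan like this can only list the uses; splitting the value into |2ndauthor=, |title2= and so on would still have to be done by hand.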

Silesian Proper Vs Silesian Polish

edit
(Notifying BigDom, Hythonia, KamiruPL, Tashi, Luxtaythe2nd, Max19582, Hergilei, Shumkichi): as the main Polish editors, and also opening this conversation here. Following up on Talk:fanga, we should talk about how to handle words that are shared between Silesian Polish and Silesian Proper, namely, words that have been added to "Polish" with the Silesian tag (i.e. Polish with Silesianisms) but only exist in Silesian proper. Many were added, it seems, before Silesian was solidified as an L2 (and I don't care what certain Polish linguists have to say about that, we treat Silesian as a language, not a dialect). Furthermore, we need to discuss spelling - Silesian has no standardized alphabet (yet, however they are making a Silesian dictionary which will be completed in 10+ years' time, so we'll have to wait...), and it seems that Steuer is the "main" orthography in use now. Should we standardize that as the main spelling, while using other orthographies as "Alternative forms of"? If a word has been added to Polish but the spelling is different, should we just move the page to the new spelling, changing the L2 in the process? This is quite a sticky mess, and I am particularly interested in Kamiru's and Shumkichi's input (when they eventually get unbanned), as they are the most knowledgeable and engaged in Silesian as a language. Vininn126 (talk) 09:58, 12 February 2022 (UTC)[reply]
Regarding the Silesian orthography: I thought Steuer has been deprecated and supplanted with Ślabikŏrz, which seems to be the preferred alphabet among contemporary Silesian language activists, such as Grzegorz Kulik, whose Silesian book translations use Ślabikŏrz. KamiruPL (talk) 11:27, 12 February 2022 (UTC)[reply]
Oh, alright. If that is the case, then let's use Ślabikŏrz, possibly adding Steuer forms as deprecated spellings. Vininn126 (talk) 11:30, 12 February 2022 (UTC)[reply]

Law French

edit

Is there a process to go about adding an etymology-only language? I want to add Law French, which is a descendant of Old Norman and Anglo-Norman (with influence from metropolitan French). The reason for its name is that it was used for many years in the English and Irish legal systems, and quite a few legal terms have been borrowed into English. Theknightwho (talk) 17:49, 12 February 2022 (UTC)[reply]

@Theknightwho It's just a case of getting consensus that it should be added, and then adding it. Is Law French a complete language, though? The only snippet of Law French I know is the following famous quote, which mixes in a lot of English (and some Latin):
Richardson Chief Justice de Common Banc al assises al Salisbury in Summer 1631 fuit assault per prisoner la condemne pur felony, que puis son condemnation ject un brickbat a le dit justice, que narrowly mist, et pur ceo immediately fuit indictment drawn per Noy envers le prisoner, et son dexter manus ampute et fix al gibbet, sur que luy mesme immediatement hange in presence de court.
Benwing2 (talk) 19:23, 12 February 2022 (UTC)[reply]
@Benwing2, Theknightwho: I'm not a linguist, but it doesn't seem to me that Law French is a complete language in itself. (Happy to be corrected.) If it isn't, maybe it would be better added as a topical label at "Module:labels/data/topical"? — SGconlaw (talk) 19:55, 12 February 2022 (UTC)[reply]
I would say that it is a complete language: English law reports were (I think exclusively) written in Law French from the 14th to the 18th centuries, despite pleadings taking place in English. You're also absolutely right that there was a huge amount of Latin and English mixed in, but you have to be careful with interpreting English words in Law French, because many of them are phono-semantic matchings of the Anglo-Norman, even where the ordinary English word has significantly diverged in meaning. A good example of this is something like market overt, which is a matching of the Anglo-Norman term marché ouvert (open market). It has nothing to do with "overtness" in any conventional sense.
I have also found this entertaining excerpt: "Law French was so separate that a French person living in England during the time of Elizabeth I referred to it as a house that was so ruined you could barely tell there had ever been a house there at all." Theknightwho (talk) 20:09, 12 February 2022 (UTC)[reply]
@Theknightwho Can you give some examples of Law French? Benwing2 (talk) 20:16, 12 February 2022 (UTC)[reply]
Let me have a look for some old case reports. Theknightwho (talk) 20:24, 12 February 2022 (UTC)[reply]

Right - here's an example. The first column is the original text of part of the preamble to a 1429 statute of Henry VI, outlining the present law on dealing with those who forcibly enter onto land. This comes from vol. 2 of The Statutes of the Realm, so we know that it is (almost pedantically) accurate. The second column is that same passage as purportedly recorded in this 1608 collection of the statutes, while the third is the English translation given in Statutes of the Realm:

Leaving the abbreviations aside, you can see that the 15th c. Anglo-Norman already displays quite a bit of Anglicisation (e.g. convict, gaol), and the early 17th c. edition extends that further. Examples are:

Hopefully that's enough to demonstrate the point? Theknightwho (talk) 23:14, 12 February 2022 (UTC)[reply]

That looks to me like an Old French analog to Mediaeval Latin: a language that's basically extinct as a first language, but kept alive as a second language for continuity with old writings and for certain specific purposes. Chuck Entz (talk) 23:56, 12 February 2022 (UTC)[reply]
It’s probably an analogue to New Latin, with some kind of Anglo-Norman being the Medieval Latin analogue, but yes that is correct! Theknightwho (talk) 00:03, 13 February 2022 (UTC)[reply]
If the request is for it to be an etymology-only code, what terms were borrowed from Law French that weren't either already English or derived from Anglo-Norman or Old French (or in some cases Latin)? You note cases where terms were borrowed into Law French from English and are not good French, but in those cases, it seems backwards to say that e.g. market overt is a "learned borrowing" into English from Law French, if the term as found in Law French was taken from English; in English, it seems to be a calque of the Anglo-Norman / French phrase. - -sche (discuss) 15:43, 18 February 2022 (UTC)[reply]
Two points:
  1. I would disagree with this analysis. The term was not calqued from Anglo-Norman into English: it was phono-semantically matched from Anglo-Norman into Law French by Anglicising it. “Market overt” is not a term that was borrowed from English, and (importantly) would be a totally inaccurate translation anyway as it would have been completely meaningless; the genuine English translation is “open market”. Rather, Law French terms came about by taking English words that are phonetically similar cognates, but (as in the case of “overt”) have often significantly diverged in meaning, while also keeping the Anglo-Norman word order. This was not a learned borrowing either: it was a conventional one, occurring gradually over a period of time by exposure to English. However, the joining of the two Anglicised words only occurred in Law French, before it was then borrowed as a defined term into English when Law French was banned for court records in the 18th c. I would also disagree that it makes no sense for words to be borrowed and then borrowed back: such terms are called twice-borrowed (see Category:English twice-borrowed terms).
  2. Anglo-Norman terms such as profit à prendre were never used in English until Law French was banned. They are an example of Anglo-Norman terms that were simply wholesale moved into English at that point, but only via Law French first. Theknightwho (talk) 18:37, 18 February 2022 (UTC)[reply]

Just to bump this - am I okay to go ahead with this? I would like to propose the code fro-law. Theknightwho (talk) 09:52, 15 March 2022 (UTC)[reply]

Weird senses in Turkish entries by User:Sae1962

edit

@Lambiam, İtidal, BurakD53 (consider adding yourselves to Module:workgroup_ping/data)

The user in question has added tons of secondary senses titled "third-person singular present simple indicative positive degree of" to Turkish aorists, see e.g. öldürür and öldürmez. Verbs don't have degrees, though Turkish aorists can (sometimes?) be used adjectivally, so I'm not sure what their intention really was. It seems like they wanted to say that öldürür can be derived from both öldürmek as well as öldürmemek? If that's the case, I strongly disagree; not only would the semantics of both senses under this nonsensical analysis be perfectly identical, it is also not how any Turkish reference work (that I'm familiar with) analyzes these forms. The fact of the matter is that this simply adds nothing of value to the entry. I also don't think we should add these forms as antonyms of each other; this would be equivalent to adding gitmiyor and gidiyor as antonyms of each other, which is again nothing but useless, boring bloat and another place to make errors (see aşmaz). — Fytcha T | L | C 12:12, 13 February 2022 (UTC)[reply]

To be is not the “positive degree” of not to be. Whenever I encounter such nonsense I remove it (der, biter, duy, kıyma).  --Lambiam 12:26, 13 February 2022 (UTC)[reply]
We consider öldürmez as the negative of öldürmek in Turkish lessons in Turkey. Because we don't have -z as an aorist suffix, we claim -mez/-maz as the negation suffix of the aorist. But actually the -r simple present suffix and -z are the same suffix. Turkish is a Shaz Turkic language, and that's why -r became -z in the negative form. You can see that Chuvash вӗлер-ӗр (vӗler-ӗr) and вӗлер-ме-р (vӗler-me-r) are the cognates of Turkish öldürür and öldürmez. So, in reality, öldürmemek takes the -z(<*-r) suffix and öldürmez is the aorist of öldürmemek. BurakD53 (talk) 14:28, 13 February 2022 (UTC)[reply]
I'd say that öldürmemek is the negative of öldürmek, whereas öldürmez is the negative of öldürür. All are listed in the conjugation table of öldürmek.  --Lambiam 22:15, 13 February 2022 (UTC)[reply]
I think, in terms of organization, it is probably easiest to claim that -mez is a derived form of the -mek lemma rather than the -memek form. All grammar works that I'm aware of define -mez to be indivisible. This saves a click when looking up negative aorists and we currently treat negative infinitives as non-lemmas anyway. — Fytcha T | L | C 22:39, 13 February 2022 (UTC)[reply]
I have been tempted to remove a number of "second person singular negative imperative" forms of verbs ending in -memek, but left them alone. Vox Sciurorum (talk) 14:41, 13 February 2022 (UTC)[reply]
These are IMO rather harmless by themselves, but should either be given as a plain imperative of a -memek verb, or (I think preferably) as a negative imperative of a (positive) -mek verb. When they are homographs of a noun (often originally a verbal noun) (e.g. birleşme, evlenme, gözleme), I think that is by itself an argument for listing the verb form.  --Lambiam 22:08, 13 February 2022 (UTC)[reply]
Oh, I think you mean cases like geliştir: “second-person singular negative imperative of geliştirmemek”. Yes, please kill these when sighted, which I just did for this one.  --Lambiam 22:27, 13 February 2022 (UTC)[reply]
@Lambiam, Vox Sciurorum: Please also remove nyms in non-lemma entries (diff) in addition to the duplicate senses/derivations (diff). This is all just a huge load of hogwash. — Fytcha T | L | C 22:33, 13 February 2022 (UTC)[reply]
This user sometimes had ...odd... ideas of how things should be defined, also in English and German. - -sche (discuss) 21:50, 13 February 2022 (UTC)[reply]
There were a whole bunch of templates they created for such forms that ended up being deleted some years back. Chuck Entz (talk) 22:14, 13 February 2022 (UTC)[reply]

Forming a standardized process for discussions about online-only sources and attestation

edit

Recently, Wiktionary:Votes/pl-2022-01/Handling of citations that do not meet our current definition of permanently archived passed and amended Wiktionary:Criteria for inclusion, most notably to include the sentence "Other online-only sources [i.e., those other than Usenet groups] may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." Although this addition changes our rules regarding online-only sources and attestation, a number of details are left open. I believe it would benefit us all to further amend Wiktionary:Criteria for inclusion and specify more details about how this new type of discussion should operate. From what I can tell, the following are some of the details that would benefit from being specified about this new type of discussion:

  1. Where should online-only source acceptability discussions take place? Anywhere? Requests for verification? The Beer Parlour? A new page akin to English Wikipedia's Reliable Sources Noticeboard?
  2. How should the results of online-only source acceptability discussions be recorded or archived? Not at all? On the talk pages for entries? By month, like for the Information Desk? In a centralized location, like exists with English Wikipedia's Wikipedia:Reliable sources/Perennial sources or our Wiktionary:Idioms that survived RFD? At Wiktionary:Criteria for inclusion itself?
  3. How broad can these discussions be, and what can constitute an "online-only source" under discussion? Only a single webpage at a given URL? All pages originally published at a particular domain? All pages originally published by a particular organization? All pages hosted at a particular domain? All pages hosted by a particular class of websites? The entire web?
  4. Is there a particular set of criteria that editors should generally consider or apply when evaluating an online-only source's acceptability? None? Public accessibility? Likely permanence of the source? Degree of editing and editorial oversight?
  5. Should all quotes from online-only sources be counted towards our attestation requirement in a way on par with other media, e.g. print books, or should some online-only sources be allowed to count towards our attestation requirement under different rules, e.g. some number of quotes from Twitter counting as equal to a single quote from a print book?
  6. Should a discussion be able to override a previous one and change the acceptability or terms of acceptability of an online-only source for counting towards our attestation requirement? This might be relevant, for example, if all copies of a website are deleted, including those accessible through the Wayback Machine. In such a case, a question might arise: should a discussion be allowed to invalidate past quotations from that website from supporting a term's attestation? Or should it be the case that once an online-only source is considered acceptable, quotes from it will always count towards our attestation criteria? (See this lawsuit settlement announcement for details about why this hypothetical scenario is reasonable.)
  7. Should any online-only source acceptability discussions take place before these details are specified?

I ask these questions in hopes of starting a discussion that will set a clear path forward for our community and dictionary. I hope that the conversation and ideas can be used as the basis for a future vote that further develops our criteria for inclusion. —The Editor's Apprentice (talk) 20:21, 13 February 2022 (UTC)[reply]

  • On the location, Beer Parlour or a new place. RFV is too obscure.
  • On some of the other questions, I have repeatedly pointed out the need to separate permanence and editorial quality. Once upon a time a durably published book probably had decent editing. But the self-published crap that makes its way to Google books should not receive the same respect, nor should forum posts. For a highly precise technical term maybe the three independent uses are all there are, and that's fine. Street talk should be common to be included. (That is less demanding than "clearly in widespread use.")
  • On permanence, we should only be adding sources that are stable. If the site has a long history of stable URLs and is backed up elsewhere (including in paid services like Lexis-Nexis) I am not worried about durability. MySpace, Facebook, Twitter, and so on... no. I added a Donald Trump tweet or two with a comment that it was considered durable because his tweets as President were being deliberately archived off site. And then his account went poof, but the tweets should still be verifiable. If a quotation is no longer available then a term that previously passed RFV could fail when re-challenged.
  • On citation discounting, or in addition to it, if we allow user-generated content we should not let any single site with user-generated content count for more than one of the minimum three citations. We should not include Reddit slang, Twitter-only words, Farkisms, and so on unless they escape into the wild.
  • Outside of the scope of the recent vote, I would accept non-durable citations of online chatter combined with durable mentions describing the word as popular. Not merely saying it was used three times, but saying it was common enough to make major news sources. Vox Sciurorum (talk) 21:22, 13 February 2022 (UTC)[reply]
    @Vox Sciurorum "We should not include Reddit slang, Twitter-only words, Farkisms, and so on unless they escape into the wild." for Reddit slang & Twitter terms... why not? We allow fandom-created terms, but more importantly, in our current CFI, we allow any and all Usenet terms to stay provided that they have solely three cites, see the continuing discussion of D*rky C*ntinent for that one. I'd be fine with heavily upping the citation requirement and archiving tweets, but barring words just because they're used on Reddit or Twitter doesn't seem to fit the goals of modernizing Wiktionary to include current slang. I still think that Usenet should've been deprecated and I'm on the fence of creating a discussion to up the required Usenet cites to pass RFV and create an entry. AG202 (talk) 22:44, 13 February 2022 (UTC)[reply]
    We should revoke Usenet's privileged status as a "durable" source because the view of the community is if we allow one bad source we have to allow them all. All existing quotations from Usenet should become not countable towards CFI. All the precious tech terms, all the rare misspellings, and all the stupid insults gone, like tears in rain. Vox Sciurorum (talk) 23:56, 13 February 2022 (UTC)[reply]
    My apologies, I completely forgot that you were the one who authored the Deprecating Usenet post on Beer Parlour, so apologies for being rather direct on that part. Nonetheless, I still think that we could find a common ground for online forums in general. AG202 (talk) 00:24, 14 February 2022 (UTC)[reply]
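To make the counting rule suggested above concrete (no single user-generated site counting for more than one of the minimum three citations), here is a rough sketch. It illustrates one participant's proposal only, not current CFI, and the field names `site` and `user_generated` as well as the function names are invented for the example.

```python
from collections import Counter

def countable_citations(citations, cap_per_ugc_site=1):
    """Count citations, letting each user-generated site contribute at most
    cap_per_ugc_site of them. Other sources are counted without a cap.

    Each citation is a dict with hypothetical keys 'site' and 'user_generated'.
    """
    ugc_seen = Counter()
    total = 0
    for cite in citations:
        if cite["user_generated"]:
            if ugc_seen[cite["site"]] < cap_per_ugc_site:
                ugc_seen[cite["site"]] += 1
                total += 1
        else:
            total += 1
    return total

def meets_attestation(citations, minimum=3):
    """True if the capped count reaches the minimum number of citations."""
    return countable_citations(citations) >= minimum

if __name__ == "__main__":
    # Made-up citation lists for illustration only.
    only_tweets = [{"site": "twitter.com", "user_generated": True}] * 3
    mixed = [
        {"site": "twitter.com", "user_generated": True},
        {"site": "reddit.com", "user_generated": True},
        {"site": "print book", "user_generated": False},
    ]
    print(meets_attestation(only_tweets))  # three tweets count once in total
    print(meets_attestation(mixed))        # three different sources
```

Under this sketch a word found only on one site would never reach three countable citations, which is the "escape into the wild" condition in numeric form; it says nothing about whether the individual uses are independent of each other.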
  • I think a factor has been overlooked that needs to be addressed: independence. A lot of the online sites have tiny mutual-reinforcement bubbles scattered throughout. They have their own terminology that's meaningless to anyone outside those bubbles, and no one who isn't perceived by the site algorithms as potentially interested will even be aware that most of those bubbles exist.
Pre-internet, a term used by a small group of school buddies or members of some local club wouldn't be publicly accessible. Now, the equivalent can be easily found in online searches. How do we distinguish something used only by the same dozen people to discuss things that are specific to that group from something that pops up here and there because everybody knows it, even if they don't use it very often? Chuck Entz (talk) 23:41, 13 February 2022 (UTC)[reply]
  1. Perhaps on the citations page, allowing users to comment on certain citations and challenge them.
  2. I think archiving them on the discussions page makes sense.
  3. In response to Vox - how would we check whether a given link has been deleted or not?
  4. In response to Vox again, I agree, and perhaps one criterion should be diversity of sources - however many web-sources we end up having to provide as per Increasing the number of citations required for Usenet and updating CFI should represent a wide variety of sources, not all from the same website.
Vininn126 (talk) 08:49, 14 February 2022 (UTC)[reply]
For 1: Trying to find precedent on citations pages would be a nightmare. I think the noticeboard idea would a) allow the most transparency, b) encourage conversation and c) provide an easy-to-read archive. brittletheories (talk) 12:42, 15 February 2022 (UTC)[reply]
  • Re 1: How about having an official (2-week) vote that appears on WT:Votes/Active for whitelisting new online-only sources? Might at the start create a bit of a vote spam but who cares? Better than have those discussions fly under one's radar; I for one don't frequently monitor the BP.
Re 2: I think having a central location (that is being linked to by the newly-added sentence in WT:CFI) that allows quick and easy CTRL-F'ing is by far the best solution (I really like that Wikipedia page). It should also of course be split by language where applicable. — Fytcha T | L | C 12:27, 14 February 2022 (UTC)[reply]
1. Ideally, a new vote board not unlike en.wp's RSN, as mentioned. Sources would be voted on in much the same way. BP is also an alternative; RFV should stick mostly to specific words rather than sources in general.
2. They should absolutely be archived if by source. Ideally there should be a table summarizing allowed sources and also a separate one for sources that the community opposed accepting.
3. This will depend on the discussion. An idea for example would be to allow tweets only from verified accounts (those with a checkmark) to be citations, but not other tweets; this doesn't fall into any of the mentioned categories.
4. There should not be a particular set of criteria, but all of those should be expected to be considered by editors.
5. Leaning towards the latter. The most important thing is to have the word being used in multiple independent online sources, and perhaps the requirements for the number of cites should be higher than for print media.
6. Yes, discussions should always be able to overrule previous discussions, even if this means some words may stop being attestable.
7. Ideally not.
SURJECTION / T / C / L / 12:34, 15 February 2022 (UTC)[reply]
I'd really avoid only allowing verified accounts to be citations. First, it's actually decently easy to get verified on Twitter; second, there's the whole trope about blue check marks being removed from reality, so I wouldn't use them as the standard. Additionally, a lot of coinages on Twitter don't actually come from verified accounts, but rather from everyday accounts, see: yassification for an example, so excluding them would not be in our best interest for covering everyday slang. AG202 (talk) 15:44, 15 February 2022 (UTC)[reply]
I strongly believe allowing any tweets as sources will lead to ruin, but I'd rather have that argument when it actually matters. — SURJECTION / T / C / L / 10:26, 16 February 2022 (UTC)[reply]
@Surjection Has allowing Usenet for however long it's been allowed on Wiktionary led to ruin? If your answer is no, then you really need to assess what your rationale is behind being against any tweets being used for citations. AG202 (talk) 15:40, 16 February 2022 (UTC)[reply]
Nobody uses Usenet anymore and its heyday was long before Wiktionary was even a thing. Twitter users on the other hand are exactly the kind of people who'd be willing to play the long game just to get an entry here for a word that nobody uses and that even those who do use it never use seriously. The same applies to most social media. Creating Twitter or other social media accounts is also a lot easier, which makes it even harder to tell which uses are independent. The two are simply not comparable. — SURJECTION / T / C / L / 15:48, 16 February 2022 (UTC)[reply]
@Surjection that's not true actually. The below discussion has examples where Usenet is still used and has been used in the past decade in high numbers, and it's still being quoted in a number of Wiktionary entries with dates as recent as 2021 (hell, I myself quoted a Usenet citation a few hours ago with a 2021 date for theyfriend). Also as mentioned in the below discussion, the barrier of entry to Usenet is absolutely not higher than making a Twitter account and may actually even be easier. @Equinox, @Fytcha, @WordyAndNerdy I'm sure could give more info on that point. And actually I'd say that it is easier to tell which uses are independent on Twitter, being that you can see who's following whom, who interacts with whom on their timelines, and more, to see if people are just using a word within their own group, whereas with Usenet that information is not available. And why do you think that most social media users would do that to begin with? The vast majority of Twitter users don't even know what Wiktionary is to be quite frank. There's the same level of risk on Twitter that there is on Usenet, and folks have even been mentioning how the same could be done through research articles as well, so I'd suggest once again that you actually assess what it is you have against Twitter users. AG202 (talk) 15:54, 16 February 2022 (UTC)[reply]
It's not specifically against Twitter, but also applies to Reddit, Tumblr, and whatnot. The barrier to entry part is a definite lie; most people who have ever used an online service know what Twitter is, most don't know what Usenet is and that barrier is already too much for most of them to handle. The fact that many social media sites have a culture of "shitposting" should be more than enough evidence for the fact that people will definitely push a word just to get it on here. The independence part likewise seems to stem from ignorance; anyone who has ever used Usenet knows that people are identified with their email addresses, which especially in earlier messages are practically always either ISP messages with real names or in some cases anonymous remailers. Most of Twitter accounts are squarely in the latter group. But, since you insist on dragging this topic over to it, I will point out that I'd be completely fine with banning Usenet quotes past a certain date point. The most important thing is to not turn Wiktionary into another copy of Urban Dictionary by letting anyone enter basically any word they want; one of those burning trashpiles is enough. — SURJECTION / T / C / L / 16:02, 16 February 2022 (UTC)[reply]
@Surjection Fewer people knowing about Usenet does not mean that it's, quote, "a lot easier" to make an account on Twitter. I mean Google Groups allows me to respond directly to Usenet discussions with solely having a Google account, and I'm sure that if more people knew about it and cared about it and wanted to use it, they'd do so. Usenet not having a culture of shitposting? Now that's really funny to me because I've definitely seen some of the most vile shitposting on the platform that I've ever seen in my life, I mean just see the citations for D*rky Cuntinent, like there are whole newsgroups(?) dedicated to stuff like that and newsgroup is literally in the definition at shitpost. Re: independence, from the Usenet cites that I've seen and cited and read, ummmm I've definitely seen a ton more anonymous accounts than not, and in the below discussion @-sche literally provided an example where a Usenet user had multiple accounts which led a word to fail RFV. To call that ignorance is completely unfair, when there are literal examples that were brought up not even 48 hours ago. AG202 (talk) 16:12, 16 February 2022 (UTC)[reply]
These are more arguments against allowing Usenet and not for allowing any posts from social media sites. I definitely didn't say and perhaps mistakenly implied that Usenet doesn't have "shitposting", but it's definitely not the same kind - what you're describing seems to be edgy for the sake of being edgy, which is different than "let's get this word recognized for the lulz". Certainly the latter is more innocuous ultimately but neither should count in my opinion. I think the example that was mentioned below proves the opposite of what you're trying to; that it's possible, at least in some cases, to identify when two different "accounts" on Usenet are in fact the same person, but I have little faith of doing that in anything but the most obvious cases for social media sites. — SURJECTION / T / C / L / 16:33, 16 February 2022 (UTC)[reply]
@Surjection The reason why I brought them up was to show how Usenet is really not that much better on that front. The example from the below argument was to show how it's possible to have multiple accounts on Usenet as well, and if it came to it for Twitter, it'd be easy to do the same thing there too, with the reasons that I've already mentioned. It's just that we haven't had Twitter cites before so there's no direct example. Also, I think you're really underestimating how easy it can be to tell if an account is of the same person or group, at least on Twitter. I mean, there are whole recurring memes about it and plenty upon plenty of examples of folks calling out accounts for obviously being the same (see: the popular copy-paste meme accounts like @Dory that used to be around circa 2017), as an active Twitter user for the past several years. And with "let's get this word recognized for the lulz", I've still yet to see an example of that happening with Twitter, and the one example I saw with Reddit for Petersonian got in anyways with cites elsewhere. I just think that a lot of the issues with at least Twitter are misconceptions and that given more time, it'd be shown how useful Twitter can actually be for including the slang of today. I mean, even the OED uses and trusts Twitter for quotations, so it's definitely not farfetched for us to use it as well in this day and age. AG202 (talk) 16:56, 16 February 2022 (UTC)[reply]

To crosslink an important policy discussion that has arisen about an individual word: Wiktionary:Tea room/2022/February#sniddy. I stress my point that the interpretation of the new rulings, that certain sources were supposed to be “voted in”, is incorrect. The voters did not intend to hold such discussions or process filed applications for new words; instead the vote has given editors carte blanche to add words which the editors believe, from the kind of sources they appear in, would not fail an RFV discussion, RFV discussions now explicitly allowing consensus to keep words in consideration of “other (than Usenet) online-only sources”, for which consensus to become manifest two weeks have been deemed enough. I also stress that this has been my understanding of the vote from its inception. Fay Freak (talk) 16:04, 16 February 2022 (UTC)[reply]

Increasing the number of citations required for Usenet and updating CFI

Spurred by several discussions related to the recent vote on CFI, the RFD discussion for D*rky C*ntinent, the subsequent discussion in Beer Parlour, and the discussion over deprecating Usenet, I believe that it's time that we increase the number of citations required for a term to be created based on Usenet alone. There have been multiple instances where folks have stated that we should avoid including Twitter or Reddit because there's the chance that a niche community could create a word that never escapes those spaces and have it gain an entry on Wiktionary. There's also the claim that including those sources could lead to nonce words that target specific individuals or are otherwise offensive; however, in most, but not all, conversations that I've had, when I ask directly whether that also applies to Usenet, the question goes frustratingly ignored. Thus, I'm making this post to push for the same standards for Usenet that we give to other online forums and spaces. Its durability on Google Groups, which has a similar hypothetical issue of deletion that Internet Archive has, should not be the sole reason that any and all terms on it, provided that they have only three cites over a year, can be included on Wiktionary. Let alone the fact that it's a very niche community tailored to specific demographics, which brings up the issue of why it is preferred over other online communities. Additionally, as a secondary suggestion, if it's supposed to be the representative of online communities back in the day, then we should include a time restriction as well, since I've seen Usenet cites from last year be used for entries, when in that case, it really is just like any other online community. AG202 (talk) 00:18, 14 February 2022 (UTC)[reply]

Support. Fish bowl (talk) 02:16, 14 February 2022 (UTC)[reply]
Strong support. Svārtava [tur] 04:23, 14 February 2022 (UTC)[reply]
It'd undercut the idea that we're only concerned with how durable a source is, but I'd be fine with a cutoff (I think someone suggested not using Usenet from after 2005), so we wouldn't have to search for and delete a lot of old entries, but could reduce the amount of "how come I can't cite my blog, if Usenet is grandfathered in?". I support increasing the number of citations required of online sources, including but not limited to Usenet; it wouldn't actually prevent rare terms (slurs like those that have been discussed lately, etc) from squeaking in, since even if we e.g. quadruple our threshold, that's still letting in vanishingly rare things only twelve people (or, twelve usernames, possibly just two people) online have ever used, but ... that's better than allowing in things that only three people online have ever tweeted or newsgroup-ed ... - -sche (discuss) 04:20, 14 February 2022 (UTC)[reply]
  Support (I guess we're using icons in this thread?) I suggested a cutoff like 2005. Failing that, demote Usenet to blog status retroactive to the beginning of time, have a bot move all its quotations to the Citations namespace, and let the RFVs mow down precious slurs as users like. Let the Usenet envy end. "And the trees were all kept equal by hatchet, axe, and saw." Vox Sciurorum (talk) 20:19, 14 February 2022 (UTC)[reply]
I'm not quite as opposed to this idea as I was at the time of the previous discussion now that 1. we can allow WWW sources (details to be determined), which reduces the need to rely on Google Groups to cite common Internet lingo; and 2. after seeing the example of D*rky C*ntinent, which nobody would ever realistically encounter and want to look up nowadays, but which has sufficient usage on Usenet (possibly just the same user or two using a bunch of different names, but this is hard to prove). OTOH, downgrading Usenet might lead us to exclude some genuine old jargon, like all-elbows to take a recent RfV example, which would be a shame, so I'm not entirely ready to jump on the anti-Usenet bandwagon. 70.172.194.25 05:06, 14 February 2022 (UTC)[reply]
Wait @Fytcha, yeah this is where the concerns were brought up, with sche and the above user's comments. Apologies that it's not a direct example, but their concerns are valid that it's tough to prove whether or not they're actually three different users. AG202 (talk) 14:59, 14 February 2022 (UTC)[reply]
On second thought, I'm now leaning more toward agreeing with WordyAndNerdy on this one. I don't look forward to the removal of lots and lots of genuine terms for which Usenet is the main surviving record, nor to the added burden of citing regional or (Usenet era) slang. 70.172.194.25 16:59, 14 February 2022 (UTC)[reply]
I think someone asked for examples of multiple accounts being the same person; one is Talk:pukeskin, where from looking at how other users refer to one user of the term, it's clear he's the same person as another user of the term. Suppose he'd made one more post under a third account, and other users had just not pointed out it was the same guy: then you'd be left with a much more ambiguous situation where instead of being one guy's nonce (not inclusion-worthy) it might just be an extremely rare flash-in-the-pan racial slur from one early-2000s internet fandom/community (vital to Wiktionary's core mission to include, apparently). (Yes, as people have said, you could submit anonymous or pseudonymous articles that are meritorious enough that three different editors publish them in print newspapers, but it's astronomically easier online where you can just spend a few minutes creating three usernames and don't have to convince anyone to publish. I concede that it's not that much harder to create twelve usernames and so upping the number of required citations might not actually be a way of discouraging this, while it would hit some obscure but 'organic'/multiuser terms.) - -sche (discuss) 18:18, 14 February 2022 (UTC)[reply]
I would say that the solution to this is to say that if there is reasonable evidence that the term was gamed (e.g. all posted within hours of each other with no other uses in existence), then it is fair to override the usual RFV requirements. Theknightwho (talk) 18:34, 14 February 2022 (UTC)[reply]
@Fytcha AG202 (talk) 20:25, 14 February 2022 (UTC)[reply]
So the point remains that our Usenet attestation criteria have never been gamed, right? pukeskin doesn't exist after all. And as such, the point also remains that I don't want to delete an unforeseeable number of entries for a hypothetical possibility that has not been shown to have occurred ever. I do acknowledge the underlying issue but it seems to be independent of Usenet: A professor could instruct two of his PhD students to use a term he coined; I could just go and self-publish three books on Lulu under pseudonyms. And even acknowledging that this abuse is easiest done on Usenet, I agree with -sche that there's not much of a difference between using 3 or 12 different user names on Usenet if somebody really wanted to game it. The argument to deprecate Usenet because of a hypothetical attack vector is not convincing to me. — Fytcha T | L | C 09:12, 15 February 2022 (UTC)[reply]
While I see and agree with your concerns with respect to Darky Cuntinent, I am worried about other articles such as masktard that seem likely to fail RFV (despite obviously existing) if we decided to deprecate Usenet. As it stands, I am opposed to the deprecation of Usenet but my opinion might change depending on which new online sources will be permissible in the near future. — Fytcha T | L | C 08:38, 14 February 2022 (UTC)[reply]
Support with a caveat - I mentioned this in Forming a standardized process for discussions about online-only sources and attestation, but I think we should increase the count of internet uses as well as the DIVERSITY of our web citations. That is, we should be able to find this word on more than one website, so if someone presents only X Reddit quotes (or only X Twitter, only X Usenet, etc), then some of those should be disregarded, at least until quotes from other websites can be provided. I think we should still be allowed to use Usenet as a source of... sources, but not solely; it should be used in conjunction with other cites. Vininn126 (talk) 08:53, 14 February 2022 (UTC)[reply]
@Fytcha With updated CFI, words like maskt*rd should pass regardless for being a hotword and because, unlike D*rky C*ntinent, it's actually found outside of Usenet. Also all the quotes listed are from 2020~2021, which brings up my point that, at this rate, it's really like any other online forum from our current day and age, just with a weird special status. AG202 (talk) 11:03, 14 February 2022 (UTC)[reply]
@AG202: masktard only truly passes as soon as we have approved sources that make it pass, which we currently haven't. — Fytcha T | L | C 12:00, 14 February 2022 (UTC)[reply]
Strongest possible oppose. This is an arbitrary and punitive proposal that would absolutely decimate our coverage of many areas of language. Early Internet slang? Gone. Large swathes of fandom slang, especially early Internet-adjacent fandoms like Buffy, X-Files, and Xena? Gone. Early gaming and TCG slang? Gone. Obscure regional slang unlikely to be found in print? Poof. Google Groups has also been nerfed in the last few years. It's now impossible to search multiple newsgroups at once. I have to know exactly what group to search in order to find something. If I'm forced to find extra cites for no good reason, then I'm probably not going to bother searching at all. And I'm one of a handful of contributors willing to take the time to scour Usenet these days. Want to make it easier to attest current slang? Let's add Twitter as a corollary source. Want to safeguard against low-quality citations? Let's create a whitelist of acceptable sources like Wikipedia. But this proposal would fix nothing while breaking a lot. WordyAndNerdy (talk) 12:33, 14 February 2022 (UTC)[reply]
@WordyAndNerdy, early Internet slang could easily be included by using the pre-2005 criteria for Usenet quotes. Additionally, the proposal is not asking that Usenet be entirely removed as citable for RFVs, but that the number of citations needed be increased from solely 3. How many will be needed? Unsure. Re: Google Groups, I'm still able to search multiple Usenet groups very easily, so I'm a bit confused on that point (maybe it's because I have a Google account?). I'm also not sure how creating a whitelist of acceptable sources like Wikipedia would fix the issues that arise with words like D*rky C*ntinent? This proposal is being made to safeguard against nonce words being created on Wiktionary just because they have three cites across a forum, especially when it's not obvious that they're not made by the same person or group. Fandom slang, early gaming, & TCG slang would not be affected as they would obviously have more cites than solely 3 and would likely be found in other sources that'd fall under our new CFI. I do see the point about obscure regional slang, but I still think that they'd be found in other forums as well, and worst-case, those could be dealt with on a case-by-case basis. AG202 (talk) 12:42, 14 February 2022 (UTC)[reply]
@AG202: If there's any doubt that a term's citations are independent, it should be brought to RFV. Re "Fandom slang, early gaming, & TCG slang would not be affected as they would obviously have more cites than solely 3": Maybe more than 3 Usenet cites full stop, but more than 3 Usenet cites from different people on different sub-fora preferably from different years? That is far from a given. — Fytcha T | L | C 12:54, 14 February 2022 (UTC)[reply]
@Fytcha Currently there's no requirement for different sub-fora in CFI. To quote CFI:
"This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be "independent" if they are in different sentences by different people, and to be non-independent if:
  • one is a verbatim or near-verbatim quotation of the other; or
  • both are verbatim or near-verbatim quotations or translations of a single original source; or
  • both are by the same author."
If there's a de facto requirement that they be from different subfora, it's not mentioned in CFI, so I don't see that as a pressing issue. AG202 (talk) 12:58, 14 February 2022 (UTC)[reply]
@AG202: This would create a situation in which hundreds if not thousands of entries would need to be shored up to meet an arbitrary new standard or risk being deleted. As one of the more prolific Usenet attesters, I don't envy the prospect of having to spend time fireproofing a decade's worth of work, when my time/energy on-wiki could instead be invested in tasks that remain to be done. The creation of a Wikipedia-style whitelist would help preclude links to fringe sites or self-promotional material, a concern that has come up in previous discussions. WordyAndNerdy (talk) 13:11, 14 February 2022 (UTC)[reply]
@WordyAndNerdy That's assuming that all of them would actually be sent to RFV, but more importantly, I want to bring up the point again that if the concern is about early Internet slang, then we could set a year requirement, such as updating Usenet in CFI to "Usenet, 2005 and earlier", because I still don't understand why 2021 cites are being used for Usenet when it's just another everyday forum at that rate. Edit: Because in the current state we're in, someone could really go on Usenet now, get three of their acquaintances, create a word, use it over 2 years in different spaces, and suddenly it's a word on Wiktionary. Heck, just do it in a month, and it's suddenly a hot word. It's a lofty concern, yes, but it's the same concern that's been leveraged several times towards me against the usage of Twitter. (Edit 2: And at least with Twitter, we can actually verify and see people's circles to see if folks follow each other or interact in the same spaces, whereas Usenet is much more of a black box.) There needs to be some kind of check, let alone the fact that it's being used to cite some of the most vile words and spaces I've ever seen with solely three cites. The Wikipedia suggestion still doesn't address the issue with the aforementioned word. AG202 (talk) 13:17, 14 February 2022 (UTC)[reply]
Imposing a cut-off date for Usenet cites would also be arbitrary and create more problems than it would solve. It's not like everyone just stopped posting on Usenet in 2005. The Anonymous movement was active on Usenet from around 2008 to 2012. That has allowed us to document 4chan slang that we might not be able to attest otherwise because it never went mainstream (e.g. rickroll) or made it into print media. Fandom was also fairly active on Usenet during the same timeframe, probably because this was a period when the online fan community as a whole was homeless, having been driven off LiveJournal by the 2007 purges but a few years out from making a home on Tumblr and Reddit. I've long advocated allowing Twitter citations as an up-to-date corollary to Usenet. WordyAndNerdy (talk) 13:56, 14 February 2022 (UTC)[reply]
@WordyAndNerdy I see your point, and I wish that we could give Twitter the same leverage, but I've consistently received pushback on it (which really hurts the coverage of AAVE and everyday slang as a side note, as entries have been deleted for not being "durably archived"), so I really want to make sure that we don't give preference to any one forum at this rate. And my main point is that I really just don't want repeats of D*rky C*ntinent where any vile racist neologism can gain traction from being listed on Wiktionary from solely three cites, and folks at RFD keep quoting the Usenet line in CFI to vote to keep it. I wish we could have a stricter CFI for offensive terms but that also got significant pushback. Maybe having a stricter CFI for offensive terms found solely online? I'm not sure. AG202 (talk) 14:39, 14 February 2022 (UTC)[reply]
@AG202: I disagree that (2021) Usenet is just like any other forum. Firstly, the hurdle to be able to post something on Usenet is a lot higher (doesn't it require a paid subscription?) compared to, say, Twitter. Secondly, the amount of content moderation is almost zero on Usenet which also contributes to it being more permanent than most other fora. To the second point you might reply that we'd archive tweets anyway, but to that I say that we can do the same for Usenet so the point remains that Usenet is more durable, being durably archived by both Google as well as the archive website(s), whereas tweets are only durably archived by the archive website(s).
The other point you bring up, that we currently leave the door open for anyone (and their two friends) to post the same nonsense three times on Usenet: That first has to happen; I don't want to make sweeping changes to WT:CFI that will lead to the deletion of numerous entries just because of a hypothetical scenario that has never been shown to actually happen. — Fytcha T | L | C 13:57, 14 February 2022 (UTC)[reply]
You can't know much about Usenet, to be honest. Paid subscription?! It's the original, e-mail style forum used by academics, before social media or even the Web existed. Equinox 13:59, 14 February 2022 (UTC)[reply]
@Equinox: You and I remember a time when access to Usenet was included free in most ISPs' service packages and there was DejaNews for the unfortunate. But ISPs stopped offering default Usenet access well over a decade ago and DejaNews is long dead. Most Usenet readers are now private subscription-based services. WordyAndNerdy (talk) 14:14, 14 February 2022 (UTC)[reply]
It's also entirely feasible that a sufficiently motivated person could game print media to get their protologism on here. Self-publish a book featuring your self-coinage through Lulu and then twelve months later have a couple of friends submit articles to local papers, college papers, etc. CFI can never hope to entirely prevent such gamesmanship. (How would one tell organic linguistic development from an exceptionally patient person trying to make fetch happen?) It can only hope to prevent most gamesmanship by making it hard. WordyAndNerdy (talk) 14:36, 14 February 2022 (UTC)[reply]
You're not wrong here, but Usenet makes it a lot harder to actually connect the dots; whereas with Twitter or publishing books or articles under your name, it's easier to see one's circle and see if a word gained traction primarily in that circle. AG202 (talk) 14:49, 14 February 2022 (UTC)[reply]
@Fytcha "The other point you bring up, that we currently leave the door open for anyone (and their two friends) to post the same nonsense three times on Usenet: That first has to happen;" I'm pretty sure that this concern has been brought up by other folks that it's happened before (but unfortunately I don't remember exactly what it was). Re: the hurdle to be able to post on Usenet, it doesn't seem like it requires anything paid to post to Usenet newsgroups. In a cursory search I found at least two servers that allow you to post to Usenet newsgroups for free, so it doesn't really seem like the barrier of entry is that high like Equinox said. (Also, it seems like Google Groups allows me to reply to Usenet posts, though I'm not sure how that works @Equinox maybe you could illuminate on that). The issue about content moderation is, to me, actually a point against Usenet, as that means that spam & the issues brought up earlier can lead to more offensive and derogatory nonce terms gaining traction. Re: durability, I think I made my position clear on that, that just because Usenet is deemed more durably archived, doesn't mean that it deserves the pedestal that we give it right now. AG202 (talk) 14:30, 14 February 2022 (UTC)[reply]
@AG202: (This is only tangentially relevant to the topic at hand.) There has recently been a discussion (that I can't remember the details of unfortunately) on whether the uses of terms in scientific papers by two different (but closely associated) scholars are independent. I think the situation is similar here; if someone used a word in a thread and two more people replied to the topic while incorporating the word into their replies, that would sound pretty iffy to me and I wouldn't want to let that count as three independent citations, even if it passed by strict application of WT:CFI (which does however use the language "roughly speaking"; I don't understand the cited passage as a hard and fast rule). I for one always try to find Usenet cites that are maximally spread out in time, by different authors and on different boards for this exact reason. — Fytcha T | L | C 14:00, 14 February 2022 (UTC)[reply]
You have a point here, though it is easier to do that on an online forum. AG202 (talk) 14:46, 14 February 2022 (UTC)[reply]
@WordyAndNerdy: Good points, I agree with you. Add to this list early speedrunning vocabulary (from the time where speedrunning was mainly discussed on Usenet/mailing lists). Another point that I've almost forgotten is Alemannic, which, as I've recently learned, does exist on Usenet (see e.g. amigs). There is currently no Alemannic term that is only citable with Usenet but you never know; I'm thankful for every means to cite Alemannic. Your point about the time it takes also strongly resonates with me; searching for three truly independent and maximally spread-out (in terms of time) citations can be a challenge sometimes; if I had to instead search for, say, twelve, it would take multiple hours to just create a single entry. — Fytcha T | L | C 12:48, 14 February 2022 (UTC)[reply]
@Fytcha I do want to make it clear that I understand the plight of having to find sources, which is part of what makes this more frustrating for me. Before the CFI change (and still currently), trying to find "durably archived" sources for current slang only to be told that I had to find "newspapers with a print edition" or other issues actually did make me spend several hours to close to a day to find cites for words like Mickey Mouse ring for example, even though it has widespread usage on today's forums. So to see words like D*rky C*ntinent pass the mark with solely three cites just because they're from Usenet was a glaring discrepancy. Edit: Side note: it actually is a bit frustrating how much time it takes me to make entries while looking for cites, ex: ᄒᆞ다 (hawda) took me several days, so I really do understand how long it takes to make entries. Even still though, I think that Usenet needs to be on the same level as other online forums rather than being elevated on a platform, or at least Usenet before 2005 should be the standard if we're concerned about older slang being lost. AG202 (talk) 12:55, 14 February 2022 (UTC)[reply]
Additionally, Alemannic is a Limited Documentation Language, so the concern about finding sources for that isn't that big of a concern for me, as solely one Usenet citation would work anyways. AG202 (talk) 13:27, 14 February 2022 (UTC)[reply]
@AG202: If you propose Usenet citations to count less for WDLs (not counting as one full cite), then it will surely also count less for LDLs. It would be inconsistent to change their value for some languages but not the others. — Fytcha T | L | C 14:03, 14 February 2022 (UTC)[reply]
@Fytcha I'm honestly not sure how citing works for LDLs on that front when it comes to being durably archived, but I wouldn't be opposed to having one Usenet cite count for LDLs, being that the citation requirements are already different, see: dictionary mentions counting for LDLs, whereas they don't count for English. AG202 (talk) 14:33, 14 February 2022 (UTC)[reply]
@AG202: Okay, that's a fair point on the topic of LDLs. — Fytcha T | L | C 14:39, 14 February 2022 (UTC)[reply]
Oppose. A solution in search of a problem, that would also increase workload. You aren’t ever satisfied. Fakes are always possible with great enough effort. You would need a general clause about organicity use, perhaps vote for my proposal, where I stressed that a term shall not be a “protologism … that barely lives outside a familiar circle”, when it later turns out that the new one sucks too much because of participation needed and lacking substantial standard. Fay Freak (talk) 14:12, 14 February 2022 (UTC)[reply]
Please don't make assumptions about me or my satisfaction. I've told you to stop doing this before, otherwise as stated, I won't engage with you further. I voted for the new proposal and am still very much in support of it. I just think that Usenet should be on the same level as other online sources that we'd be individually reviewing. AG202 (talk) 14:25, 14 February 2022 (UTC)[reply]
@AG202: You have not or only contradictorily written what you think then. If Usenet should be on the same level as other online sources and can be so because of the new rule then what to do is just removing the Usenet clauses from the CFI, not increasing some threshold or implementing additional restrictions. Fay Freak (talk) 14:37, 14 February 2022 (UTC)[reply]
@Fay Freak That's one of the ideas that's been talked about actually, so yes, I would be fine with doing that. AG202 (talk) 14:41, 14 February 2022 (UTC)[reply]
On that point, @WordyAndNerdy, @Fytcha instead of increasing the cites required for Usenet, how about removing the Usenet line from CFI, and having it fall under the recently proposed online citation rules (that are still being fleshed out, I admit) so that there are more detailed standards for it. Not necessarily increasing cites, but making sure that any and all terms, especially offensive/derogatory ones, don't suddenly gain space on Wiktionary. Edit: Maybe something like WT:About Usenet that not only helps clear up the standards around citing from Usenet, but also maybe helps new users learn how to use it and how to interact with it, because I've sure had to learn how to interact with it, and a page would've been helpful to clear things up. AG202 (talk) 14:44, 14 February 2022 (UTC)[reply]
@AG202: Sorry, but I strongly oppose everything that makes a distinction based on offensiveness / vulgarity (or any other semantic property, for that matter). I stand by WT:NOT: "Wiktionary is not censored (nor is it content-rated)." Whether it's a rare biochemical term that barely scrapes by with 3 obscure papers or a rare racial slur that survives RFV with 3 independent Usenet citations, it makes no difference to me. — Fytcha T | L | C 14:57, 14 February 2022 (UTC)[reply]
I honestly can't personally understand how you'd see those as on the same level especially after the recent issue, but I don't think I'd be able to convince you otherwise. How about generally retiring Usenet from CFI and creating WT:About Usenet or something similar though? Minus the offensive/vulgarity part. AG202 (talk) 15:04, 14 February 2022 (UTC)[reply]
Given that one of the users in this very thread has a history of inserting links to a neo-Nazi site into entries, that's not a hill anyone with an interest in this site's long-term reliability should want to die on. WordyAndNerdy (talk) 15:08, 14 February 2022 (UTC)[reply]
@AG202: Tinkering with the current Usenet wording in CFI is likely to cause both foreseeable and unforeseeable problems. It would seem preferable to modify CFI to explicitly allow for other online sources like Twitter, rather than try to resolve the odd privileging of Usenet a decade too late. That bed was made a long time ago. Usenet is baked too deeply into Wiktionary now. WordyAndNerdy (talk) 15:28, 14 February 2022 (UTC)[reply]
@WordyAndNerdy That would be preferable, and I'm wondering how much "As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google." [emphasis provided by me] could be tinkered with. However, I highly doubt that would pass, just based on the very very strong opinions in regards to Twitter that I've seen here, so I'm really trying to find a compromise, which is part of why I've been trying to have this discussion, and I appreciate y'all for taking part in it. AG202 (talk) 15:32, 14 February 2022 (UTC)[reply]
@WordyAndNerdy, AG202: This comment resonates a lot with me; I agree that the way forward now is elevating other sources as opposed to pushing down Usenet, which I am personally also much more in favor of. Just a couple of days ago the vote passed that increases the number of citable sources, so I think we are currently making progress in that direction.
On the topic of AAVE that I've seen AG202 bring up a couple of times: Has it ever been proposed that AAVE terms be treated like an LDL with regard to attestation? That would allow a lot more AAVE terms to be cited. It is also undeniable that AAVE is much more sparsely documented than just plain English so having different attestation criteria seems to be logical; the classification of what constitutes a language and what a dialect is fuzzy, so whether a lect is well-documented or not should not be connected to which label we have applied to it, but rather to how well that specific lect is documented. — Fytcha T | L | C 09:29, 15 February 2022 (UTC)[reply]
(Unrelated, but one tip for citing AAVE under current policies is to search lyrics on Genius.com. A lot of phrases are used in rap and hip hop music, many of which are from durably archived albums, etc., even if you can't find uses in books.) 70.172.194.25 09:34, 15 February 2022 (UTC)[reply]
(I wish I could ping you but alas). Using songs & also TV & movies is a good point, though they're not as easy to cite as, say, tweets. @Fytcha I've actually thought about that (and a more radical policy of just separating AAVE out completely), but I'm not sure how best to go about it. AG202 (talk) 03:11, 16 February 2022 (UTC)[reply]
Don't forget that Usenet was employed as a stand-in for online forums in general, since none of them were allowed due to the durability requirement. Now that Usenet doesn't have to bear that burden alone, we can afford to be more critical of Usenet citations in RFV. I don't think absolute rules are the answer; I think we just need to get more sophisticated about assessing issues like independence. I can see how we might look at how diverse the places are that usage appears, and look for repetition of incidental wording that might indicate copying.
We need to remember that all the tests in CFI are for the purpose of determining whether terms have become part of the language, and attestation is just our tool for seeing if that's the case. Chuck Entz (talk) 15:36, 14 February 2022 (UTC)[reply]
@Chuck Entz I agree with you on your points. It's just difficult to see how that'll be applied when in RFV & RFD votes, folks point towards CFI as the rationale for keeping terms like the main one in question. Thus, that's why I wanted to change CFI directly. I would love for us to be more critical with how we do RFVs and RFDs and how we go about looking at cites, but as long as Usenet is in CFI as is, those questionable terms and cites will stay. RFV especially, because three Usenet cites is an almost automatic "Cited", though @Kiwima could maybe help out with elaborating on that, because I could be mistaken. AG202 (talk) 15:41, 14 February 2022 (UTC)[reply]
Could we have a rule like we added for websites, where we can evaluate individual Usenet quotations on a case-by-case basis? I feel like not all posts are equal. And maybe for some regional slang or old vocab, three comments is all you'll be able to find. But if they're high quality comments, perhaps that is sufficient. 70.172.194.25 16:45, 14 February 2022 (UTC)[reply]
"I want this word, so I want this post." I think that leads to much argument for little if any gain. Better to have a predefined standard where most everybody will agree that it is met or not, at the cost of having a few words be included or excluded that would be excluded or included with word-specific discussion. For example, "Usenet gets full credit for posts made before 2006 and none for posts made after 2005" or "Usenet can only attest terms with the (computing) label." Or the current "Any post, any topic, any time." Or "No posts from Usenet." Vox Sciurorum (talk) 20:46, 14 February 2022 (UTC)[reply]
  Strong oppose - this is holding one form of language to a different standard for an arbitrary reason, while placing an enormous burden on the community in order to save entries that already have three independent sources as per RFV. Theknightwho (talk) 18:17, 14 February 2022 (UTC)[reply]
It's already being held to a different standard for an arbitrary reason. I am searching for a solution that would put other online sources in line with it, either by limiting the pedestal that Usenet is on at the moment, or by elevating other online sources to be on the same level (preferred solution, but unfortunately very unlikely to pass). AG202 (talk) 20:23, 14 February 2022 (UTC)[reply]
  Oppose - I can understand why GoogleBooks and Usenet have a special status as they’re durably archived and I think that we should stick with the requirement that we need to produce 3 independent and durably archived cites that clearly use a word or phrase with its stated meaning, preferably at least a year apart, to verify an entry. I would suggest an improvement would be if we allowed some non-durably archived sources too but required more of them, perhaps upping the requirement in such cases to 5 cites? For terms where mentions are easier to find than usages then we could even allow 7 mentions instead. Overlordnat1 (talk) 01:04, 16 February 2022 (UTC)[reply]
@Overlordnat1 I would be in support of that, but I still think that Usenet being durably archived shouldn't give it the power that it has over other online sources. If Twitter were durably archived, I guarantee you, folks would not be pushing for it to be added to CFI. If I were to add more of my own personal experiences, it really just seems like some folks have favoritism towards Usenet (which is perfectly fine, especially if it was your primary circle and you grew up with it) and use the durably archived clause as an easy-to-defend rationale. I just wish that they'd be more open to other completely valid websites as well, especially in this day and age. AG202 (talk) 03:09, 16 February 2022 (UTC)[reply]
@AG202 I certainly wouldn’t oppose adding Twitter if it were durably archived (and tweets that can be found on the Wayback machine are durably archived in my book) but it never looks good when a dead link appears in an entry. If we allow 5 non-durably archived sources, whether Twitter or elsewhere, to count then even if one or two links go dead (or someone deletes their tweet or changes their settings so it can’t be seen) then we’ll still have 3 or 4 good cites. Overlordnat1 (talk) 04:27, 16 February 2022 (UTC)[reply]
@Overlordnat1 That'd sound good to me, hopefully more people can come around to that. As a side note though, I'm still on the fence about how I feel about archiving non-public figures' tweets as it's breaching an issue of privacy and unintended reach, but that's probably a convo for another time. AG202 (talk) 04:36, 16 February 2022 (UTC)[reply]
In another subthread on this page you declared an opposition to (the purely hypothetical example of) restricting Twitter citations to verified accounts, which is more or less a proxy for being a public figure. If you think archiving tweets of non-public figures on the Wayback Machine is a violation of privacy, I don't see how archiving them to public-facing dictionary entries or Citations namespace pages is any better (unlike the Wayback Machine, Wiktionary has mirrors and is indexed by Google). 70.172.194.25 07:48, 16 February 2022 (UTC)[reply]
I see your point, but imho archiving on Wayback Machine is very different from quoting on Wiktionary. Archiving on Wayback Machine saves the entire page, including the person's image, the entire tweet, the thread that it's in, and additional data that's in relation to the tweet, and it stays online even if said person deletes the tweet or goes private. Whereas, quoting the tweet on Wiktionary, just has the person's public display name, the snippet of the tweet (or whole tweet if actually needed), their username, and a link to the tweet and can be deleted off of Wiktionary (and hopefully the mirrors update as well, which the ones I've looked at seem to do so!) if the person deletes the tweets or goes private, much more easily than trying to delete something off of Wayback Machine. So yes, they are rather different cases to me personally as someone who uses Twitter and has thought about the implications of my own tweets being archived on Wayback Machine. I'm not sure I feel the most comfortable with archiving tweets to Wayback Machine with the express purpose of being used on Wiktionary. Though even when citing a tweet to Wiktionary, I'd still try to be careful about the implications. It's not a black-or-white issue. AG202 (talk) 08:14, 16 February 2022 (UTC)[reply]
My personal preference for a way to avoid privacy issues would be to go for high-profile posts when possible (verified user, large follower base, large number of replies/reactions, etc.). If we need to use a low-profile post, either because it is one of few options available, or because it is important in some other way (good example of usage, early example or potential original source, etc.) then I would prefer to archive it. I don't think we could even have a policy against submitting links to the Wayback Machine, because anyone can do it and there's no way to tell who did it (possibly not even a user of Wiktionary), only when. It's unenforceable. You could propose a policy against linking to Wayback Machine versions of posts... but I'd be against that too. That said, if someone deletes a post and we can substitute an alternative post, then I do think we should remove and replace it; but we cannot really guarantee that it doesn't still exist in a Wiktionary database dump somewhere. 70.172.194.25 08:51, 16 February 2022 (UTC)[reply]
To be clear, I'm not proposing a policy against archiving tweets; I'm just stating my own personal feelings on archiving low-profile tweets and that I personally don't feel comfortable archiving tweets like that. What other editors do on that front is up to them, I'd just suggest thinking about the implications of it. AG202 (talk) 09:05, 16 February 2022 (UTC)[reply]
Please let's not increase the requirements beyond 3. That just adds more work for ourselves. Please let's not cite sources that can't be verified later. That would just make the entire process questionable. I'm fine adding reasonable restrictions to Usenet so long as other durable sources open up (like verified Twitter perhaps). E.g. Two Usenet cites must be not only independent but also a year apart to both count. DAVilla 08:18, 21 February 2022 (UTC)[reply]
The tension between attestation policy before and after the new two week debate policy is at its heart a question of the legitimacy of Wiktionary as a dictionary enterprise rather than a practical issue of inclusion of particular words. The editors feel like they want more internet in their internet dictionary. Who doesn't? I opposed the policy because I didn't think Wiktionary needed it to be a more legitimate dictionary. But you do always get that feeling that it is just not right to demand three oclcs or usenet or similar- we're on the internet here, and there are legitimate words out there that seem they can kind of be documented, but maybe not perfectly according to a perhaps too demanding or perhaps too rigorous standard. Of this new two week debate policy, Lazarus would say: "I call it an alternative warp. It's sort of a negative magnetic corridor where the two parallel universes meet. It's sort of a safety valve. It keeps eternity from blowing up." --Geographyinitiative (talk) 02:09, 16 February 2022 (UTC)[reply]
  Oppose A durably archived source is a durably archived source, even when it is hard to search. As to "gaming" WT:ATTEST it is effectively done on Google Scholar when a professor and two former students use a term that nobody else does. I fail to see why UseNet should merit special punitive treatment. I do get a hint of indirect prescriptivism and possibly even censorship in this line of discussion. DCDuring (talk) 02:33, 16 February 2022 (UTC)[reply]
Prescriptivism? Special punitive treatment? I don't know how pushing for other modern online sources of slang to be included on Wiktionary and pushing for them to be on the same level as Usenet is prescriptivism... If you read the exchanges with other folks, the main issue that I had was that other online forums like Twitter & online news were being excluded for cites, and people kept telling me that the rationale for excluding Twitter was the worry about three people making up a word and gaming the system. So my question was, can't the same be done with Usenet? I wasn't trying to make Usenet have punitive special treatment, but actually put it on the same level as other online sources, whether by elevating them to the level of Usenet (which garnered significant pushback for the aforementioned issues) or deprecating Usenet to their level. I'm not even going to address the censorship issue because that's been addressed already. AG202 (talk) 03:05, 16 February 2022 (UTC)[reply]
  Oppose This would effectively kill any coverage of internet slang, especially stuff that's exclusive to certain subcultures or fandoms. It's already hard enough to cite terms that don't appear in books, this would just make it worse. If anything, I think we should open the floodgates to newer internet sources instead of treating Usenet as some sort of holy grail simply because of how old it is. Our job is to document all words in all languages, and prescriptivist policies only defeat that purpose. Binarystep (talk) 06:39, 21 February 2022 (UTC)[reply]
  • Late counting: I count 5-7 for support-oppose. I am not 100% clear about what the proposal is: the thread title asks for increasing the number of quotations while the text of the proposal also says: "I'm making this post to push for the same standards for Usenet that we give to other online forums and spaces": but that would mean Usenet is disallowed as much as other online forums and spaces. Did each of the opposers really oppose increase of quotations? Or did they only oppose wholesale ban of Usenet, as in "opposed to the deprecation of Usenet"? Increasing the number of quotations required from 3 to 6 (for instance) would not "effectively kill any coverage of internet slang", to use another quote. If I count DAVilla's non-iconed "Please let's not increase the requirements beyond 3", we get 5-8. --Dan Polansky (talk) 07:24, 31 October 2022 (UTC)[reply]
    To be quite fair, at that time, that wasn't an official proposal with the purpose of garnering votes, it was purely meant to be a discussion starter. I also feel this post-commentary is removed a bit from the context of that time. My proposal was not meant for Usenet to be banned at all. AG202 (talk) 14:25, 31 October 2022 (UTC)[reply]

Is Usenet a "durably archived" source?

DCDuring wrote, "A durably archived source is a durably archived source, even when it is hard to search." I think that this raises a more fundamental question: is Usenet a "durably archived" source, as the term has come to be used?

The other day, I tried to verify the first quotation at "cumsicle", which was from alt.sex.phone.ads. I followed the link to Google Groups only to find a notice stating "You don't have permission to access this content / For access, try joining the group or contacting the group's owners and managers". I tried joining the group and was met with an error message ("An error occurred while joining the group"). I'm not especially familiar with Usenet, but as best as I can tell, either the post is no longer able to be viewed or the only way of viewing it now is with the permission of the administrators of alt.sex.phone.ads.

The argument for Usenet qualifying as a "durably archived" source despite no other website so qualifying seems to be that we can be confident that the Google servers will be running forever. But, like any other website, we can't be certain that the publisher won't take the content down. Assuming I'm not missing something here (and I don't discount that possibility), this seems to be what happened in the case of alt.sex.phone.ads (provided that the content wasn't already limited to this closed group at the time of its citation). Graham11 (talk) 08:42, 17 February 2022 (UTC)[reply]

That's interesting, actually. @Fytcha, @WordyAndNerdy AG202 (talk) 13:29, 17 February 2022 (UTC)[reply]
Well, I guess we should go back to hard-copy only, because almost anything online can be taken down in our new censorious age, including from the Wayback Machine, which might lose its funding. Come to think of it, books etc can be burned, libraries may no longer be funded or be destroyed. Oh, dear, maybe enwikt won't last forever. DCDuring (talk) 15:53, 17 February 2022 (UTC)[reply]
? At this rate I don't think you know what "censor" means (or if you do, you're using it spuriously). And if you think that Usenet being randomly not archived on Google Groups is a marker of a new "censorious" age, then wow, I can only imagine what you'd think of actual censorship that happens in other countries and happened several decades ago, where people can actually be killed, vs the simple worry of not finding something online that is still accessible elsewhere. This isn't the first time this has been said to you either. The reason I tagged WordyAndNerdy & Fytcha actually is to see if they could provide more information on that happening and to talk about what that means in regards to being "durably archived", not to discuss whether or not this is "censorship". AG202 (talk) 16:09, 17 February 2022 (UTC)[reply]
I certainly wouldn't suggest that. But I think this case does suggest that our current rationale for allowing Usenet citations while disallowing other web citations that are reasonably securely preserved – i.e., that Usenet is "durably archived" in a way that sources preserved through the Internet Archive are not – doesn't hold water.
I don't presume to know what the answer is in terms of the broader debate around the attestation requirement. But the distinction being drawn between Usenet and the Internet Archive generally seems dubious and, in light of this new evidence, certainly cannot be drawn on the grounds of "durability". Graham11 (talk) 05:25, 18 February 2022 (UTC)[reply]
I have downloaded and grepped a relatively small number of raw mbox archives of giganews hosted by the Internet Archive, which seemed to be very complete (I noticed messages that were missing from Groups). If you're wondering why I went through this, it was to cite /thread since Google's own search ignores punctuation. So I can attest that at least anything prior to 2014, the date of the giganews dumps, seems to be preserved, even if Google Groups goes away entirely. There are a lot of items in the IA collection other than the 2014 giganews stuff, including more recent backups, but I have not looked into anything else in detail. 70.172.194.25 16:02, 17 February 2022 (UTC)[reply]
In this case, since we're technically relying on Internet Archive for Usenet backups, it's really really ironic that we're not allowing Internet Archive to be included in the "durably archived" clause. AG202 (talk) 16:12, 17 February 2022 (UTC)[reply]
I tried grepping the archive of that one group for the post Graham11 mentioned and I couldn't find it, so maybe the archive.org backups are not as complete as I thought. I also note that Giganews was only founded in 1994, so those particular items wouldn't have any older posts (although the "Usenet Historical Collection" may). 70.172.194.25 06:41, 18 February 2022 (UTC)[reply]
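(Illustrative aside on the grepping approach described above. This is a minimal sketch only, assuming the downloaded archive is a standard mbox file that Python's standard-library `mailbox` module can read. The function name `find_messages` and the file path in the usage note are hypothetical and not taken from any comment here. The phrase is matched literally, so punctuation such as the slash in "/thread" is preserved rather than ignored.)

```python
import mailbox
import re


def body_text(msg):
    """Return the decoded text of every non-multipart part of a message."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        raw = part.get_payload(decode=True)  # bytes, or None if nothing to decode
        if raw:
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def find_messages(mbox_path, phrase, group=None):
    """Yield (message_id, subject) for messages containing `phrase` literally.

    `group`, if given, restricts the search to messages whose Newsgroups
    header mentions that newsgroup. Matching is case-insensitive.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            if group and group not in str(msg.get("Newsgroups", "")):
                continue
            subject = str(msg.get("Subject", ""))
            if pattern.search(subject) or pattern.search(body_text(msg)):
                yield str(msg.get("Message-ID", "")), subject
    finally:
        box.close()
```

A call such as `list(find_messages("some-archive.mbox", "/thread"))` (hypothetical file name) would list candidate messages; an empty result only shows that the phrase is absent from that particular file, not that the post never existed.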
(Replying to @Graham11 without reading the interceding comments)
It's been a long time since I properly grokked the architecture, but from what I remember Usenet operates kinda like BitTorrent, or the blockchain/crypto. The data isn't hosted on one private server or network of private servers, but spread across a public, decentralized network of nodes. Anyone with sufficient resources (server space, electricity, etc.) could maintain an archive of Usenet. Google Groups is probably the best known extant Usenet archive. There's also Narkive and Usenet Archives. I believe Google Groups was deemed "durably archived" because Google is a large corporation that has the resources to conceivably maintain a Usenet archive for a long time. But Google still takes down Usenet posts for various reasons. They may have taken a page from Craigslist's book and summarily shut down adult services-related newsgroups like alt.sex.phone.ads after SESTA-FOSTA rather than try to navigate new legal complexities. But it may still be possible to locate copies of that post on other Usenet archives. It likely hasn't been deleted from Usenet itself - it's just been removed from the Google Groups back-up of Usenet. WordyAndNerdy (talk) 01:43, 23 February 2022 (UTC)[reply]
Thanks for the background, WordyAndNerdy. Like I said, I'm not especially familiar with Usenet, so that certainly helps.
Let's use the quotation at "cumsicle" as a case study. (It's unfortunate that the one convenient example we have is rather vulgar, but alas.) Usenet Archives appears to have only six records from alt.sex.phone.ads in 2004 and this isn't one of them. (Incidentally, I should note that I also tried searching for the first five words of the message's subject line directly at Usenet Archives but my search was still loading half an hour later.) At Narkive, no matter what I search, I get a 404 page. I'm not sure if either website is indexed by Google, but either way, I can't find an archive of the post via Google Search. Do you know of any other websites that might have a record of the message, WordyAndNerdy (or anyone else)?
If there is no reliable archive of Usenet that isn't liable to take down messages (as it appears Google is liable to do), Usenet is, at best, no worse a source for citations than websites archived by the Internet Archive. Surprisingly, there doesn't appear to be anything in WT:CFI explicitly prohibiting the citation of unpublished works like unpublished letters or emails, but there probably ought to be for the sake of verifiability. (At least in the case of books, for example, most countries appear to have legal deposit regulations.) Usenet messages that haven't been archived by Google or the Internet Archive, or at least a website like Narkive, would, to my mind, fall into the same category as a private email in terms of being able to be verified. Graham11 (talk) 08:03, 23 February 2022 (UTC)[reply]
It's possible to find raw Mbox files with messages from that newsgroup on the Internet Archive: Giganews Archive, Usenet Historical Collection (you have to search for the right group). But as far as I can see, the message is not in those files. Or if it is, it's not findable by grepping with the subject line or message given in the quotation. You can find other posts on the group using the word in question, but that's not the point. Now, given the distributed nature of the system, it's possible that someone somewhere retains a copy of that message, but that doesn't mean it can be tracked down. 70.172.194.25 08:21, 23 February 2022 (UTC)[reply]
If there could be a copy in private hands somewhere but there is no way to track it down, that would put it functionally in the same category as a private letter I received in the post, no? Graham11 (talk) 08:39, 23 February 2022 (UTC)[reply]
In the 2000s authorities noticed there was porn on Usenet and made threats. You know there's illegal and immoral stuff on Usenet and you carry it anyway. The reaction of some ISPs was to drop alt. entirely. I have never seen a discussion of which newsgroups are citable. The "big 7" were widespread, alt.* less so and inconsistent within the hierarchy, alt.binaries.* less carried and less likely to be archived, and regional newsgroups less than that. Vox Sciurorum (talk) 10:37, 23 February 2022 (UTC)[reply]

Concordance namespace?

I found out today that we have a dedicated Concordance: namespace, with a handful of stuff in it. As far as I can tell, these pages are linked from precisely nowhere within the main dictionary, so I suspect I might not be the only one who is learning of this namespace for the first time.

I also found Category:Concordances, which tries to provide a rationale for its existence: "Someone reading the stories of Sherlock Holmes, for example, may need to look up rare and obsolete words even if they are a native speaker". True, but I can just type the rare and obsolete words into the search bar as I run across them!

There are only 16 concordances in the namespace. They are a wild mix of stuff:

  • Concordance:Bible - an English concordance of "the Bible". We don't get to learn critical details such as which English translation it covers.
  • Classic novels like Concordance:Moby-Dick, inexplicably split up into 135 separate concordances, one for each chapter
  • Concordance:Engines - apparently from some kind of engineering podcast, put there to "test our coverage" - that is, editor-facing rather than reader-facing material.
  • A couple of foreign-language ones like Concordance:French New Testament, which doesn't give the number of times each word occurs, so isn't IMHO a true concordance.

Does anyone find these pages useful? If so, how can we make them easier to find? At minimum I would think the concordances could be moved to the Appendix; it seems weird to have a whole namespace for this stuff. This, that and the other (talk) 08:53, 14 February 2022 (UTC)[reply]

It is an odd duck and I'm not clear on why this space exists apart from appendix, but I think it's no harm to keep it as it is. As this becomes an all-purpose word-thing reference (a dictionary, a thesaurus, a rhyming dictionary, etc.), then maybe we'll expand to more concordances over time. —Justin (koavf)TCM 08:56, 14 February 2022 (UTC)[reply]
(copy pasting my comment, which got deleted for some reason) This feels like something that could be done on Wikisource. Having a corpus is crucial for dictionaries, and Wikisource (among other things) sorta fits that bill. They should be able to automatically create this sort of thing, I feel. Vininn126 (talk) 09:14, 14 February 2022 (UTC)[reply]
I agree that it seems like they could be automatically generated by a concordancer from Wikisource content. Not sure whether this hypothetical idea would be better implemented as an on-wiki Lua script or a separate tool on the WMF servers like PetScan, but the latter sounds more practical to me (either would probably have to be heavily cached to reduce load times).
Manual sifting might be better at filtering out terms of specific importance to the text than a simple comparison with base rate frequencies in the entire language corpus, but I don't know if doing such filtering is even desirable. Other than this, I don't see much need for human intervention.
One exception to the above idea is that, obviously, you can't use Wikisource to generate a concordance from a non-free text that is not allowed on Wikisource. See e.g. Concordance:Foucault's Pendulum. 70.172.194.25 02:54, 15 February 2022 (UTC)[reply]
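(Illustrative aside on the idea of generating concordances automatically. This is a minimal sketch, assuming the source text is already available as a plain string, for instance exported from Wikisource by some other means. The name `build_concordance`, the tokenizing pattern, and the `width` parameter are hypothetical choices, not an existing tool. Each occurrence is stored with its surrounding context, so the count for a word is simply the length of its list.)

```python
import re
from collections import defaultdict

# Runs of letters, optionally joined by an apostrophe or hyphen (e.g. "don't").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def build_concordance(text, width=30):
    """Map each lowercased word to a list of keyword-in-context snippets.

    `width` is the number of characters of context kept on each side.
    """
    entries = defaultdict(list)
    for match in WORD_RE.finditer(text):
        start, end = match.span()
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        snippet = f"{left}[{match.group()}]{right}".replace("\n", " ")
        entries[match.group().lower()].append(snippet)
    return dict(entries)
```

Whether to filter out very common words, or to compare against base-rate frequencies as discussed above, is left open here; the sketch keeps every word.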
Isn't a concordance supposed to include the surrounding context? These are more like word lists. 70.172.194.25 19:41, 14 February 2022 (UTC)[reply]
Yes, for sure, but this is also the start of making a concordance. The dictionary part isn't finished, nor is the thesaurus part of our work, and clearly, neither is the concordance. —Justin (koavf)TCM 20:19, 14 February 2022 (UTC)[reply]
I have no qualms with deleting the namespace and related pages now. If Wikisource wants to develop something, that is their call. Given you can search the dictionary, I don't see any real purpose for maintaining concordances on Wiktionary. —The Editor's Apprentice (talk) 07:31, 15 February 2022 (UTC)[reply]
Agreed. This feels far more suited to Wikisource, and the current layout is completely useless. Theknightwho (talk) 17:12, 16 February 2022 (UTC)[reply]

Non-lemma form entries are confusing to casual readers

Something I've noticed repeatedly is that many casual readers do not actually follow through to the lemma form. We are only able to observe the behavior of a small portion of readers, namely those who actually bother to leave comments about their experiences. Here are some examples of what I mean:

There are likely more, but these are just the ones I noticed and remembered enough to link. Is there anything we could do to nudge users to click through to the lemma forms for definitions? I do think providing glosses with |t= may be helpful (as Svartava did on catfishing), but it adds redundancy and makes it harder to keep things in sync. 70.172.194.25 19:28, 14 February 2022 (UTC)[reply]

[Edited. Some more examples I found (one search query): Talk:marmalise, Talk:dei, Talk:wergild, Talk:jiving, Talk:mattoids, Talk:hooligans, Talk:πρᾷον, etc. There are a lot of these. And for every reader who bothered to write a post, there's no knowing how many walked away confused without leaving a note. 70.172.194.25 22:56, 14 February 2022 (UTC)][reply]
We might be able to identify these automatically by looking at the pageview stats (find pages with many views which contain only non-lemma forms). – Jberkel 23:07, 14 February 2022 (UTC)[reply]
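(Illustrative aside on the pageview idea. This is a rough sketch under stated assumptions: it supposes that per-page view counts and a lemma/non-lemma classification of each entry on a page have already been collected somehow. The `PageInfo` record, the `confusing_candidates` function, and the `min_views` threshold are all hypothetical, and no real statistics are implied.)

```python
from dataclasses import dataclass, field


@dataclass
class PageInfo:
    title: str
    views: int  # view count over some chosen period (assumed input)
    # One label per entry on the page: "lemma" or "non-lemma" (assumed input).
    entry_kinds: list = field(default_factory=list)


def confusing_candidates(pages, min_views=1000):
    """Return heavily viewed pages that contain only non-lemma entries,
    most viewed first."""
    hits = [
        page for page in pages
        if page.views >= min_views
        and page.entry_kinds
        and all(kind == "non-lemma" for kind in page.entry_kinds)
    ]
    return sorted(hits, key=lambda page: page.views, reverse=True)
```

The output would only be a list of pages worth reviewing by hand; it says nothing about whether readers of those pages were actually confused.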
Something like jumping to (the appropriate definition at) the lemma after the three seconds during which the casual reader sits in stunned inactivity? DCDuring (talk) 19:41, 14 February 2022 (UTC)[reply]
Not sure about the hard redirect thing. How would we understand "inactivity"? Vininn126 (talk) 20:33, 14 February 2022 (UTC)[reply]
Also, wouldn't work if you have multiple entries on the page. Thadh (talk) 20:47, 14 February 2022 (UTC)[reply]
You could delete them and forcibly train readers to lemmatize their own words. Alternatively we could create some sort of Lua monstrosity that reads the lemma entries and copies the definitions. DTLHS (talk) 19:43, 14 February 2022 (UTC)[reply]
Underline links. Equinox 19:44, 14 February 2022 (UTC)[reply]
Might seem crazy, but add text next to the link that says "see there for more". I can't believe people sometimes... Vininn126 (talk) 19:59, 14 February 2022 (UTC)[reply]
That would be horrible clutter, wasting the time of people with basic computer literacy. Equinox 20:02, 14 February 2022 (UTC)[reply]
(e/c) I too have seen this. While ultimately we can't fix stupid, I agree we should consider ways of making things clearer. My most realistic suggestion is to have the form-of templates end their verbiage with something like "— please click through to entry for definitions"; Equinox's suggestion of underlining links is also a great easy-to-do idea; and we could let logged-in users opt out of seeing either the extra verbiage or the underlining, like there's that gadget that controls whether you see T:,. Since most form-of entries are the only thing on their pages, making them use more memory isn't as prohibitive an idea as for other kinds of entries, so another idea is to have the form-of templates extract and automatically supply glosses from the lemma so things don't fall out of sync, but this is probably a bad idea and runs into problems if you try to gloss (say) running by listing all the senses of run, or if a nonlemma page is already close to the memory limit, so we'd need to have an off-switch (or separate template) for such cases. A radical idea is wherever a string of letters is only a nonlemma form of only one other entry (and not where it's a nonlemma form of 2+ entries like perches, or is homographic to a lemma form in any language, etc), hard-redirect it to the lemma, thus at least sharply reducing how often this happens. For languages with few inflected forms this would also have other useful effects; for example, I really doubt most people think to click through to inflected form entries to check if they have unexpected pronunciations, so listing those pronunciations in the main entries would be more visible. (For Latin, we'd have to list the pronunciations in the inflection tables, because listing them all in the pronunciation section would be a mess.) But that'd be quite a change. - -sche (discuss) 20:04, 14 February 2022 (UTC)[reply]
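(Illustrative aside on the hard-redirect rule floated above. This is a sketch of one possible reading of that rule, assuming each page can be summarized as a list of entries, each either a lemma or a form of exactly one lemma page. The `Entry` record and `redirect_target` function are hypothetical names. Here "only one other entry" is interpreted as "exactly one distinct target page"; a stricter reading might also compare language or part of speech.)

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    language: str
    # Title of the lemma this entry is a form of, or None if the entry
    # is itself a lemma.
    form_of: Optional[str]


def redirect_target(entries):
    """Return the lemma title to redirect to, or None if no redirect applies.

    A redirect applies only when every entry on the page is a non-lemma
    form and all of them point at the same single lemma page.
    """
    if not entries:
        return None
    if any(entry.form_of is None for entry in entries):
        return None  # homographic with a lemma in some language
    targets = {entry.form_of for entry in entries}
    return next(iter(targets)) if len(targets) == 1 else None
```

Under this reading, a page whose entries point at two different lemma pages, or which shares its spelling with a lemma in any language, is left as it is.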
(after edit conflict)
We could limit whatever "dumbing down" we decide on to unregistered users. DCDuring (talk) 20:07, 14 February 2022 (UTC)[reply]
Is that technically possible? The main part of a page is generated once and cached. You would need CSS to detect whether a user is logged in. And then you annoy unregistered users who know to click through. Vox Sciurorum (talk) 20:39, 14 February 2022 (UTC)[reply]
We've effectively gone part way toward that kind of thing by making the default "dumbed down" and having others opt out. Unfortunately that means those "in the know" opt out and further distance themselves from the ordinary users' experience. I don't know whether we could do something directly by user class. Perhaps we could have a toggle that allowed switching between "dumbed-down" and "expert"/"insider" views. That might even lead more casual users to register. DCDuring (talk) 02:37, 15 February 2022 (UTC)[reply]
I cannot imagine that we split the plural noun hoods (now defined as “plural of hood”) into five senses (or eleven if subsenses are included). Adding noun senses to a Verb section at catfishing was obviously misguided; it is unclear that spelling out the different senses of this verb form would have helped. Somewhat paradoxically, I think the user who was confused by the plural noun jobsworths would have been instantly enlightened by a usage example (The city offices are staffed by unhelpful jobsworths), whereas the user having an issue at gaslighting may have been befuddled by the presence of the (single) usex. We do not have separate entries (in a Noun section) for the gerunds of many English verbs and I don’t quite see why we have one for gaslighting, but given that we do have one, I think we should list all senses; defining the noun setting as “gerund of set” would not do either.  --Lambiam 10:55, 15 February 2022 (UTC)[reply]
This is sometimes made even worse when a word is used in senses that are exclusively plural, but for whatever reason someone has listed all of them on the singular, with a note like "(in the plural)" next to those entries. As an example, see supply. It's total nonsense to lay things out like that with uncountable senses, really, which is why I didn't notice til quite a bit after I'd added a new plural-only definition to supplies. We're not just building a database here - it's got to be human-friendly too! Theknightwho (talk) 17:09, 16 February 2022 (UTC)[reply]
It should be human-friendly, and that's why definitions have often been centralized in the singular/lemma form—because even veteran editors of this very site don't necessarily think to look for the definition of a word at one inflected form of another; many people know enough to know what the lemma to look up is, and hiding some of the definitions in an inflected form results in them not being found. (It's basically a separate category of issue, alongside and in a contrary direction to the other people who only think to look up whatever inflected form they copypaste out of some text, and don't think to look for the lemma ... but putting some definitions in lemmas and some definitions in inflected forms might be like the worst solution to those two problems, relative to either putting all the definitions in the lemma or summarizing the definitions in all places.) If we hide any definitions on inflected forms, we need to make that apparent via a {{used in phrasal verbs}}-type pointer so people know to also look at the inflected form, like at message, megrim, peanut. - -sche (discuss) 20:51, 16 February 2022 (UTC)[reply]
The thing is, I'm not sure that I'd agree that supply is the lemma when it comes to senses that cannot be used in the singular. The same way that jean isn't the lemma of jeans. It's not like we'd be distributing the definitions at random, after all. There is also the relatively straightforward solution of signposting, too. Theknightwho (talk) 22:38, 16 February 2022 (UTC)[reply]
  • This is a great question to ask. I just want to note how awesome it feels to look up a verb on Word Hippo and see synonyms in the same tense. And for some words, there can be synonyms that don't take the same form! So sometimes I wonder if we might be approaching this completely wrong. If every gerund can be a noun, why don't we define it as such? The fundamental unit isn't a lemma entry, it's a definition that can tie to several forms. As soon as someone defined catfish as a verb in a slang sense, we should have gotten the act of catfishing in that sense for free. DAVilla 07:42, 21 February 2022 (UTC)[reply]

Separating sources by level of editorial control


Straw poll: should the weighting of attestations be different for:

  • Words appearing in professionally edited text, or written by professional writers in their professional capacity. This includes major newspapers, major journals, and books you would find in a regular book store if anybody went to book stores any more.
  • Words appearing in informal contexts which are more likely to have typos, misconstructions, misconceptions, nonce words, and in-jokes: blog posts, self-published books by people without a track record of paid writing, tweets, newspaper comments sections, etc.

This division correlates with durability and register, with the first category tending to be more durable and higher register. And it's not a strict dichotomy; a lot of low grade scientific journals don't correct ESL errors or unnecessary coinages but still beat the average newspaper comments section. Vox Sciurorum (talk) 20:37, 14 February 2022 (UTC)[reply]

I think distinguishing by professionality is very discriminating. Publishing work from a secondary occupation may be as good. And full-time journalism is abysmal as well these days, systematically so and necessarily, because of its dependent nature. Nowadays one may read only blogs, due to their providing the coverage of topics that the supposed professionals are destitute of. Blogs are an open genre that is not constrained to be not of high quality: mind also the many academic blogs. Apart from the fact that the genres often converge, so that you don’t know whether there is a collective of journalists behind a blog, we should consider the care usually taken by a specific author or editor, and his possible interests, rather than cubbyholing him into a genre. Fay Freak (talk) 22:31, 14 February 2022 (UTC)[reply]
I think Vox Sciurorum is referring to the "quality" of the language (i.e. how educated or professional it sounds), not whether the content is "fake news" or whatever! Equinox 22:41, 14 February 2022 (UTC)[reply]
I wasn’t talking about the content either. But he seemed to devise qualities or ranks of authors in certain “capacities”. Of course we would always consider whether a word may just be a typo, misconstruction and so on, by how the text is else and by how the author is else. To say, we are seeking to bypass fake language and undue descriptions of language rather than fake news. And mainstream journalists may be and are worse an echo chamber than 4chan is, in many respects that is and other respects than the confused internet scribbler. So there is no ranking and no weighing in a sense that would lead to it. Fay Freak (talk) 23:06, 14 February 2022 (UTC)[reply]
Interesting. It sounds like crowd-sourced prescriptivism. DCDuring (talk) 02:42, 15 February 2022 (UTC)[reply]
Of course there's a difference. If you go and buy a printed newspaper or a book, you can be fairly sure it will be properly spelled and capitalised. On a Web forum it might be all lower-case with no spellings checked and no punctuation. Equinox 02:46, 15 February 2022 (UTC)[reply]

weird category I encountered


https://en.wiktionary.org/wiki/translationese

There is a category in there with an "en:" prefix, which seems to me like a mistake someone made. Can anyone fix it?

User670839245 (talk) 04:29, 15 February 2022 (UTC)[reply]

@User670839245: It is fine. These are topical categories and the "en:" prefix is the language code for "English". —Svārtava [tur] 04:32, 15 February 2022 (UTC)[reply]
And to piggyback off of this comment, if you want to see terms related to translation studies in other languages, take a look at Category:Translation studies. —Justin (koavf)TCM 04:35, 15 February 2022 (UTC)[reply]

Changing the default citation form for Latin verbs


At the moment, we cite nearly all our Latin verbs in their first person singular present indicative forms (hereafter abbreviated to ‘FSPI’). For instance, the verb meaning ‘love’ is cited as amo, a form which literally means ‘I love’.

I submit that it would be far better to cite Latin verbs in their present active infinitive forms (hereafter abbreviated to ‘infinitives’). For the aforementioned verb, that would be amare, which means 'to love'.


The first point to consider is that there exist Latin verbs that cannot have an FSPI form in the first place. Examples of this type include impersonals such as grandinat (it is hailing), decet (it is suitable), and libet (it is agreeable).

Note how Wiktionary is forced, in each of these cases, to use the third-person singular as the citation form, given the lack of an FSPI counterpart.

Next, there exists a larger class of words which were always, or nearly always, impersonal in Classical Latin, but which may have had marginal personal usages, generally in other periods and rarely, if ever, in the first person singular.

On Wiktionary, each of these has been awkwardly split into two entries, one for the impersonal usages, the other for the personal ones. Examples of this type include paeniteo/paenitet, taedeo/taedet, ningit/ninguo (also ningo), pluit/pluo, oporteo/oportet, and pigeo/piget.

None of these splits would be necessary if we were to simply use infinitives as citation forms. To the best of my knowledge, every Latin verb has an infinitive, which presents an excellent opportunity for consistency across entries.


The second point to consider is that the practice of citing Latin verbs in their FSPI forms—when the latter actually exist—is at odds with nearly all of our Romance entries. We cite all French, Catalan, Spanish, Portuguese, Italian, and Romanian verbs, without exception, in their infinitive forms; the only Romance language for which that is not the case is Aromanian, and that is only because it has no infinitive in the first place.

Consequently, for perhaps tens of thousands of Romance etymologies, editors have felt the need to provide two Latin forms and explain their grammatical relation. Consider the etymology provided for Spanish cenar:

Everything after the third word is redundant; nothing is gained by providing the form cēnō and explaining that cēnāre is its present active infinitive. A reader unfamiliar with either language might erroneously conclude, from the way that this etymology is presented, that the Latin form cēnō does not actually survive in Spanish, or that some other drastic shift has taken place, since the Spanish verb is said to derive from a 'special' form of the Latin verb, rather than from the 'default' form cēnō. Anyone familiar with Romance will, of course, know that Spanish does retain ceno, which still means what it did in Latin, but not everyone consulting Spanish entries will have a background in the subject.

(Also note how, in the above etymology, Latin cēnō has been erroneously glossed as ‘to dine’, rather than the correct ‘I dine’, presumably by an editor more used to working with languages where the standard practice is, reasonably enough, to use infinitives as citation forms.)

That, however, was a fairly tame example. When a Romance etymology involves an intervening Vulgar Latin form, the result can be a labyrinthine nightmare. Here, for instance, was the etymology given for Spanish llover before I changed it:

Anyone could be forgiven for feeling bewildered by it. Following my proposals, we would instead have the following description:

That is, in fact, an accurate summary of what happened, and if a reader wants to see other inflexions of the Latin verb (or, for that matter, the Portuguese one), they may consult the respective entry.

Needless to say, were we to simply use Latin infinitives as citation forms, editors would not feel the need to unnecessarily clutter etymologies like this.


The third point to consider is that citing Latin verbs by their FSPI forms fails to reliably indicate a key piece of information: their class.

Latin verbs, at least the regular ones, fall into four main classes. One example from each is shown below:

Note how the ending -ō is shared by classes I and III. (Class II has -eō, which is different.) As a result, whenever one is confronted with a new verb ending with -ō, one cannot use its FSPI citation form to tell what class it belongs to.

Now compare their infinitives:

All four classes show distinct endings (in each case a different vowel followed by /-re/), such that one can always determine a regular verb’s class from its infinitive, which one cannot do with the FSPI forms.
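As a rough illustration of this point, the check can be sketched in a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, vowel length is assumed to be marked with macrons, and the sample verbs (amāre, monēre, regere, audīre) are standard textbook examples supplied here rather than taken from the examples referred to above. Irregular and deponent infinitives are deliberately left unclassified.

```python
import unicodedata

# Present active infinitive endings of the four regular classes,
# written with macrons so that long and short e stay distinct.
_ENDINGS = {"āre": "I", "ēre": "II", "ere": "III", "īre": "IV"}
INFINITIVE_ENDINGS = {
    unicodedata.normalize("NFC", ending): cls for ending, cls in _ENDINGS.items()
}


def verb_class(infinitive):
    """Return 'I'..'IV' for a regular infinitive, or None if unrecognised."""
    form = unicodedata.normalize("NFC", infinitive)
    return INFINITIVE_ENDINGS.get(form[-3:])


def fspi_ending(fspi):
    """Return the visible first-person ending: 'eō', 'iō', or plain 'ō'."""
    form = unicodedata.normalize("NFC", fspi)
    for ending in ("eō", "iō", "ō"):
        if form.endswith(unicodedata.normalize("NFC", ending)):
            return unicodedata.normalize("NFC", ending)
    return None


for infinitive in ("amāre", "monēre", "regere", "audīre"):
    print(infinitive, verb_class(infinitive))
```

With these sample verbs, the infinitives map to four different classes, whereas `fspi_ending` returns the same plain ending for amō (class I) and regō (class III).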


Finally, while the authors of the main Romance etymological dictionaries are for the most part deceased, and hence cannot be asked directly, it appears that they appreciated some or all of the above points, since they consistently cited Latin verbs in their infinitive forms. This can be seen throughout Meyer-Lübke’s Romanisches Etymologisches Wörterbuch, Von Wartburg’s Französisches Etymologisches Wörterbuch, Coromines & Pascual’s Diccionario crítico etimológico castellano e hispánico, Treccani’s Enciclopedia Italiana, Alcover’s Diccionari català-valencià-balear, the Trésor de la Langue Française informatisé, and DEX online. No doubt this is also the case for other sources that I have yet to peruse.


In sum, using the Latin infinitives as citation forms would offer the following advantages:

  • Internal consistency in Latin entries, as well as external consistency with the vast majority of Romance entries.
  • Decluttering Romance etymologies of redundant and potentially misleading information.
  • Reliably indicating Latin verb class while citing only a single form.
  • Consistency with all Romance etymological dictionaries (that I am aware of).

The only drawback that comes to mind is the amount of work that it will take to effect this change. Perhaps a program can be written to do it automatically; it would simply have to transfer most of the information from a given Latin verb's FSPI form to the entry for its infinitive. ('I' can be automatically replaced with 'to' in the definitions.) I do not expect that a program could be trusted to change the Romance etymology sections correctly, but that can be left for individual editors to carry out gradually as they encounter the relevant entries, much like the ongoing replacement of {{etyl}} with {{der}}. Nicodene (talk) 04:43, 15 February 2022 (UTC)[reply]
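To give a sense of how small the 'I' to 'to' step could be, here is a minimal Python sketch. It is an assumption-laden illustration, not an existing bot: the function name is hypothetical, it only handles glosses phrased as "I" plus a verb, and the single special case ("I am" becoming "to be") is an assumed example of the irregular glosses a real run would have to review by hand.

```python
import re

# Matches a standalone capital "I" followed by the verb it governs.
_FIRST_PERSON = re.compile(r"\bI\s+(\w+)")

# Assumed special cases where the bare verb differs from the "I" form.
_IRREGULAR = {"am": "be"}


def to_infinitive_gloss(gloss):
    """Rewrite first-person glosses such as 'I dine' as 'to dine'."""

    def replace(match):
        verb = match.group(1)
        return "to " + _IRREGULAR.get(verb, verb)

    return _FIRST_PERSON.sub(replace, gloss)


for sample in ("I dine", "I love, I like", "it is hailing"):
    print(sample, "->", to_infinitive_gloss(sample))
```

Glosses of impersonal verbs, which do not start with "I", pass through unchanged, so they would need no special handling in this step.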

As someone who is okay at a few Romance languages and not okay at Latin, this seems very sensible to me and thoroughly considered. —Justin (koavf)TCM 05:00, 15 February 2022 (UTC)[reply]
It should be noted that, while use of the infinitive would have the benefit of Consistency with all Romance etymological dictionaries, it would also be inconsistent with most dictionaries of Latin, including the ones we most commonly give as references, Lewis & Short and Félix. In fact, the tradition of lemmatizing verbs at the first-person singular goes back to the time of the ancient Romans themselves. For comparison, we also lemmatize Sanskrit verbs under the third-person singular form as is traditional, and Ancient Greek verbs under first-person singular (although this is also what modern Greek uses, unlike the descendants of Latin). I'm personally happy to part with ancient Roman tradition if it makes sense, and I think the benefits of switching to the infinitive for Latin may very well outweigh the costs. 70.172.194.25 06:03, 15 February 2022 (UTC)[reply]
During a discussion on Discord, two editors who deal with Ancient Greek said that they thought it would be a good idea to do the same with their verb entries. Perhaps they could join in here to elaborate on their reasons.
As for the Latin grammatical tradition, part of me does regret defying the likes of Lewis & Short on this matter, not to mention the Romans themselves. Still, progress is a matter of breaking with tradition. Nicodene (talk) 08:25, 15 February 2022 (UTC)[reply]
Not so simple. It would require editing every single Latin entry with a verb section, since all nonlemmas link to the lemma, and pretty much every link to any Latin verb in any etymology anywhere, and I would be surprised if it didn't require recoding of a number of templates and modules as well. What's more, you'd be moving descendants sections in every Latin verb entry that has them, which means that every Proto-Italic and Proto-Indo-European entry (and quite a few entries for terms in many other languages that were borrowed into Latin as well) with {{desctree}} pointing to a Latin verb entry will end up in CAT:E with a module error until that template is updated. Oh, and don't forget all the Latin non-verb entries that have verbs in the derived terms and see also sections. I think we're looking at in the neighborhood of half a million Latin entries at the very least, and given that the vast majority of verb entries in Romance languages have etymologies that link to Latin ancestral forms, and given the enormous number of direct or indirect borrowings in other languages, a million-plus affected entries sounds quite plausible. Chuck Entz (talk) 06:56, 15 February 2022 (UTC)[reply]
Would a multi-tiered process work? Something along the lines of:
  • 1- Bots copy relevant information from the current lemmas to the infinitives.
  • 2- Humans adjust anything important* to relate to the infinitives.
  • 3- Bots remove most information from the current lemmas, leaving the infinitives as the new lemmas.
* By 'anything important' I mean anything that, if left unchanged, would result in an error during step 3.
Needless to say, I'd help out whenever human intervention would be necessary, e.g. manually sorting out desctrees, if it comes to that. Nicodene (talk) 07:58, 15 February 2022 (UTC)[reply]
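The three steps above can be sketched on toy data as follows. This is a sketch only: pages are modelled as a plain in-memory dictionary, links as a mapping from linking page to target, and every name here is hypothetical. Nothing in it uses a real bot framework or the actual page format, and it ignores inflected-form pages entirely.

```python
import copy


def copy_to_infinitive(entries, lemma, infinitive):
    """Step 1: copy the current lemma's content to the infinitive page."""
    entries[infinitive] = copy.deepcopy(entries[lemma])


def unresolved_links(links, lemma):
    """Step 2 helper: pages that still point at the old lemma."""
    return sorted(page for page, target in links.items() if target == lemma)


def demote_old_lemma(entries, links, lemma, infinitive):
    """Step 3: turn the old lemma into a form-of stub, if nothing depends on it."""
    pending = unresolved_links(links, lemma)
    if pending:
        raise ValueError(f"still linking to {lemma}: {pending}")
    entries[lemma] = {"form_of": infinitive}


# Toy data, for illustration only.
entries = {"ceno": {"definitions": ["I dine"]}}
links = {"cenar": "ceno"}

copy_to_infinitive(entries, "ceno", "cenare")   # step 1 (bot)
links["cenar"] = "cenare"                       # step 2 (human edit)
demote_old_lemma(entries, links, "ceno", "cenare")  # step 3 (bot)
```

The guard in step 3 is the design point: it refuses to strip the old lemma while anything still depends on it, which is the kind of error the footnote above is concerned with.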
While I’m not opposed if this can be pulled off without too many accidents, I don’t see what stands in the way of already now making these simplifications in Romance etymology sections (like “From Vulgar Latin *plovere, an alteration of Latin pluere”).  --Lambiam 09:57, 15 February 2022 (UTC)[reply]
Yeah, I think we can (continue to) simplify the etymology sections whether or not we change which form is lemmatized (which is, as Chuck notes, a daunting task and one at odds with the Romans' own approach). Either link directly to the lemma/FSPI and accept the difference in ending, link to the infinitive and accept that people will have to make an extra click to reach the definitions, or link to the lemma/FSPI but pipe the link to display the infinitive; any of these works. (And ask the few users who add most of the unneeded verbiage to stop.) - -sche (discuss) 01:52, 16 February 2022 (UTC)[reply]
  Support Without wading too much into the details above, it would be good to get rid of this inconsistency. I know I'm just repeating what Nicodene has already said, but it's particularly annoying when verbs don't have an FSPI form (see Category:Latin third-person-only verbs), or when the FSPI is used for verbs that are only very rarely invoked in that sense, making the definitions read quite strangely. Theknightwho (talk) 07:42, 16 February 2022 (UTC)[reply]
I don't really buy this argument; this discrepancy exists on other Latin dictionaries too. If anything, it's a plus, as it shows instantly when a verb is impersonal. Having said that, I do agree that definitions like "I rain" are ridiculous, but the solution there would be to simply use infinitives in the definitions, like Sanskrit verbs do despite not being lemmatised under their infinitive. I also agree that the split of occasionally-personal-but-mostly-impersonal verbs across two entries is highly undesirable. This, that and the other (talk) 11:08, 16 February 2022 (UTC)[reply]
It does occur to me, while we're talking about this in general, that it would be good for the verb forms to have glosses. When you're dealing with the third-person plural future perfect active indicative, it can be quite difficult to keep track of all of those qualifiers, when all it means is "they will have X". I don't mean full entries, but something that intuitively communicates the tense with respect to that particular verb. I'm reasonably sure it's possible to do, but it would definitely need some thought behind it due to the obvious logistical issues. Theknightwho (talk) 17:02, 16 February 2022 (UTC)[reply]
  Support For all the above reasons and because none of the reasons given by people opposing this are real arguments. Sartma (talk) 17:10, 23 February 2022 (UTC)[reply]
  Oppose Boo, what inconsistency—how is that enough to discomfit you? You can already consistently cite Latin verbs in their infinitive forms in Romance entries, which reliably indicates Latin verb class while citing only a single form, that’s why the main Romance etymological dictionaries do so. I haven’t felt the need to provide two Latin forms and explain their grammatical relation in the fashion Latin cēnāre, present active infinitive of cēnō (to dine). If anyone needs such explanation then he has never cared about Latin details to the slightest anyway, and etymologies shan’t be only about lemma/citation forms, so of course feeling such a need was wrong, and I always simplified these mentions to the scheme {{m|la|ceno|cēnāre}}. Fay Freak (talk) 10:05, 16 February 2022 (UTC)[reply]
Two different kinds of inconsistency are noted- debatably, even three. Nicodene (talk) 10:35, 16 February 2022 (UTC)[reply]
None discomforting in the mere fact of Latin and Romance having different citation forms. One must have a goldplated aesthetic standard to reckon this inequality concerning. The other inconsistencies do not make the effort worthwhile. Pending complete execution of this proposal there would be even as great inconsistencies for years in other pages which cited Latin by the first person singular present. It is more effective to purge the obtuse linking and ignore this antinome that one only sees if one looks upon the whole dictionary from a bird's eye view. Romance editors also wrongly link Arabic forms with the article الـ (al--), it does not mean we should create Arabic noun inflections with it; editors should know, and there are not as many that one could not widely enough make them know. There is no natural likelihood of error from the current approach but merely a wrong adjustment of thought, to use a picture of programming: only a setting to be swapped, no need to recompile and reboot the whole program: if you get used to citing the forms of a language in a certain way once there is nothing to disturb anymore, citation forms are an obvious convention merely all the time. Fay Freak (talk) 11:56, 16 February 2022 (UTC)[reply]
I suppose two people can look at precisely the same information and still come to opposite conclusions.
On the other hand, it seems that virtually everyone is in favour of changing the cluttered Romance etymologies. One wonders, then, why tens of thousands of them were made in the first place. Perhaps everyone thought that most others approve of them- I certainly did. Nicodene (talk) 19:22, 16 February 2022 (UTC)[reply]
An interesting idea. I need to think about it some more, but I'm certainly open to it. It would likely cause difficulty for inquam alone, while improving the situation for impersonal verbs (and, of course, etymologies). We even have support from lemmings: the Medieval Latin dictionaries, Du Cange and DMLBS, lemmatize verbs under the infinitive. Given the scale of the change that would ensue, I think a formal vote is inevitable.
A prerequisite to any change in this space, in my view, would be a resolution to the issue discussed above: #Non-lemma form entries are confusing to casual readers. This, that and the other (talk) 11:08, 16 February 2022 (UTC)[reply]
  Oppose Accept it, different languages in the same family can have different citation forms. In Celtic, we live with this fact all the time; the modern Goidelic languages are in the imperative 2sg, but their ancestor Old Irish lemmatizes in the present 3sg and their close relatives the Brittonic languages lemmatize at the verbal noun. Also pinging @Mahagaja due to his experience in both Celtic and other classical languages. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:37, 20 February 2022 (UTC)[reply]
And how is that at all a good thing? If anything that could also benefit from a change. Nicodene (talk) 01:22, 21 February 2022 (UTC)[reply]
I’ll just add that stuff dealing with Classical Gaelic/Early Modern Irish often uses 1st sg., so in just Goidelic itself you go through different citation forms, from 3rd sg. pres. (7th–12th c. language) → 1st sg. pres. (late 12th–17th c.) → 2nd imperative (modern langs). And I don’t think it’s very problematic. (Well, there’s no separate Classical Gaelic on Wiktionary, but that’s another story and I’m considering starting a discussion about splitting Mod.Ir. into Mod.Ir. + Classical Gaelic again.) // Silmeth @talk 10:33, 21 February 2022 (UTC)[reply]
  Support To be honest I thought the infinitive was the only form ever used for referring to a Latin verb, as it is in all the Romance languages I know anything about. The arguments for consistency given above seem compelling. (Can anyone explain where the idea of using the 1st pers sing came from?) Imaginatorium (talk) 07:07, 2 March 2022 (UTC)[reply]
Re "Can anyone explain where the idea of using the 1st pers sing came from?", that was because the Romans (i.e. actual Latin speakers) themselves often did so, and Latinists thus followed their lead. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 04:25, 3 March 2022 (UTC)[reply]
Sure, but the crucial difference between Romans and anyone in the third millennium is that they were native speakers, and knew what the words were already. (Do we have any existing printed Roman dictionaries to show us an example?) Actually I spot that my tiny schoolboy Collins Latin dictionary has the headwords in the form "a'm/o -are avi atum". An electronic dictionary should present all four forms, instead of being hung up on what wikt would look like if it were not electronic. Imaginatorium (talk) 05:08, 10 March 2022 (UTC)[reply]
  Oppose Every language has its lexical habits. For Latin as evidenced in most dictionaries, that is to use the first-person present for verbs. This is a long-established practice and shared across dictionaries in different languages. (See, e.g., Georges' Ausführliches lateinisch-deutsches Handwörterbuch, Gaffiot's Dictionnaire illustré Latin-Français, and, as already noted, Lewis and Short.) That being the case, anyone even semi-serious about learning Latin is going to encounter verbs in the first-person present. Changing Wiktionary's default to the present infinitive would simply add one more step for any learners depending on it; at the same time, it would separate Wiktionary from established lexical practice. — This unsigned comment was added by 3charles3 (talkcontribs) at 16:33, 24 May 2022.
Plenty of Latin dictionaries use the present active infinitive form. It is old-fashioned to use the first person present active indicative. Theknightwho (talk) 21:13, 26 May 2022 (UTC)[reply]
Can you name some example dictionaries that do that please, so we can see them? I would be interested to take a look. The three I named are all freely available online. Only one of the dictionaries that I use lists verbs using the infinitive (viz. the DMLBS). The rest are, alas, "old-fashioned." 3charles3 (talk) 12:31, 27 May 2022 (UTC)[reply]

Validity of Categories


Justinrleung and Frigoris have advised me to ask you guys about the validity of two sub-categories I made within Category:English transliterations of Mandarin terms which are called Category:English terms derived from Tongyong Pinyin and Category:English terms derived from Wade-Giles. Is the naming consistent with the overall structure of Wiktionary and the policies of the site? Thanks for any comment. If there's no comment, I will assume it's pretty much okay for now. --Geographyinitiative (talk) 23:56, 16 February 2022 (UTC)[reply]

I'm not familiar with how these types of categories are usually named, but I do have one note. I think a name like Category:English terms derived from Wade-Giles transliterations would be clearer for the Wade–Giles category. I know Wade–Giles by itself can be used to refer to the system and its transliterations, but I think it would be helpful for general users to specify that. I don't think "transliterations" needs to be specified for the pinyin category because the word pinyin already refers to transliterations. —The Editor's Apprentice (talk) 20:32, 20 February 2022 (UTC)[reply]
Thanks for your feedback. I remember reading near the front of the 2012 edition of Chinese History: A New Manual that Wilkinson feels that using the Roman alphabet (and others?) to write Chinese character pronunciations is not an act of transliteration (to transliterate) but is actually an act of transcription (to transcribe). Hence at first glance I am leery of the change. Another similar alternative might be Category:English terms derived from Wade-Giles romanizations. --Geographyinitiative (talk) 01:45, 23 February 2022 (UTC)[reply]
@The Editor's Apprentice, Justinrleung, Frigoris, 70.172.194.25 & anyone interested: I reread EA's comment and I agree with the issue of helping general users with a clarifying word- I think I am going to change the name to Category:English terms derived from Wade-Giles romanizations unless there are any objections. I will do it in the coming week or two. More evidence that this might be an appropriate change [3] --Geographyinitiative (talk) 17:53, 25 February 2022 (UTC) (modified)[reply]
That seems like a reasonable change to me. (I'm okay with it either way, though.) 70.172.194.25 17:49, 2 March 2022 (UTC)[reply]
Personally speaking, I like these categories. I don't think there's an existing standard for terms by transliteration scheme, but the way you've named them seems consistent with how the language derivation categories work. And it seems like something that could be of interest. 70.172.194.25 01:48, 23 February 2022 (UTC)[reply]
Thanks, that's basically what I have been thinking. --Geographyinitiative (talk) 01:53, 23 February 2022 (UTC)[reply]

What shall we do about {{shi-IPA}}? Metaknowledge made it a redirect to {{zgh-IPA}} last July; as a result, shi (Tashelhit) entries are given zgh (Moroccan Amazigh) pronunciations. Maybe that in itself is OK; I don't know anything about either language, so maybe they do have identical spelling-to-pronunciation rules. But an undesirable side effect is that Tashelhit entries are categorized into CAT:Moroccan Amazigh terms with IPA pronunciation instead of CAT:Tashelhit terms with IPA pronunciation. Since {{shi-IPA}} is only used on four entries, I'm tempted to simply replace it with manual {{IPA}} calls (giving the transcription automatically generated by {{zgh-IPA}}) and then delete the template, but perhaps someone can duplicate Module:zgh-pronunciation as Module:shi-pronunciation and make the necessary changes. —Mahāgaja · talk 13:14, 19 February 2022 (UTC)[reply]

You don't even have to duplicate the module to fix the miscategorization. You can just copy the source of Template:zgh-IPA to Template:shi-IPA, and change it to use {{IPA|shi|...}} instead of {{IPA|zgh|...}}. 70.172.194.25 04:14, 20 February 2022 (UTC)[reply]
@Mahagaja: You could've just asked me! The short of it is that we aim to eliminate the zgh code, but I haven't gotten around to dealing with it. Please don't replace the transclusions with manual IPA calls, because I'm hoping that Malku will eventually add narrow IPA as well to the module. If you want to do something immediately, you can change the categorisation logic in the module. —Μετάknowledgediscuss/deeds 04:29, 20 February 2022 (UTC)[reply]
The module isn't what categorizes entries currently. The template does. So you can do what I mentioned previously. 70.172.194.25 04:43, 20 February 2022 (UTC)[reply]
I've followed 70.172's suggestion. Thanks for your help! —Mahāgaja · talk 09:11, 20 February 2022 (UTC)[reply]

Protected edit request: square meal


To be added:

  1. Used other than figuratively or idiomatically: see square,‎ meal.

Thanks. 70.172.194.25 05:07, 20 February 2022 (UTC)[reply]

Done. —Justin (koavf)TCM 05:28, 20 February 2022 (UTC)[reply]

Galician < Old Portuguese?


At the moment Wiktionary labels the ancestor of Galician as 'Old Portuguese', as can be seen here. Considering that the language originated in Galicia, and spread southward from there into Portugal, the labelling is clearly backwards: if anything, Portuguese derives from Old Galician. Modern scholarship, however, favours the neutral term Galician-Portuguese for the common ancestor of the two languages. See, for instance, The Oxford Guide to the Romance Languages or Harris & Vincent's The Romance Languages.

In an attempt to fix the ahistoricity of Wiktionary's deriving Galician from Old Portuguese, editors have manually changed 'Old Portuguese' to 'Old Galician and Old Portuguese' in hundreds, perhaps thousands, of Galician entries; cf. abella, chover, and sobre. That is understandable but inefficient. A better solution would be to rename the common ancestor to Galician-Portuguese, which would bring Wiktionary in line with Wikipedia and common sense. Nicodene (talk) 01:53, 21 February 2022 (UTC)[reply]

Another sensible suggestion. —Justin (koavf)TCM 18:35, 21 February 2022 (UTC)[reply]
Totally in favour, this name has nothing historical and is totally misleading. Oigolue (talk) 11:52, 2 March 2022 (UTC)[reply]
@Nicodene What do you think about adding "Old" to it? "Old Galician-Portuguese". Same question for "Old Navarro-Aragonese", currently simply named "Navarro-Aragonese". I also vote very much in favour of renaming Old Portuguese (and, unfortunately, systematically going through all those Galician entries that currently say "Old Galician and Old Portuguese"...)--Ser be être 是talk/stalk 18:56, 3 March 2022 (UTC)[reply]
Adding 'Old' is also an option, to make it clear that the chronolects in question are not modern or early-modern. On the other hand, that may inadvertently imply that there does exist a modern Galician-Portuguese (not that it would be entirely absurd to see things that way) or a modern Navarro-Aragonese. Nicodene (talk) 00:47, 4 March 2022 (UTC)[reply]
  Strong support for "Old Galician-Portuguese". I would just like to add that Oigolue's argument about historicity is not the best, since by that logic "(Old) Galician-Portuguese" couldn't be supported either. - Sarilho1 (talk) 09:22, 30 March 2022 (UTC)[reply]
I think that Old Galician-Portuguese could be a better long term solution, as Galician-Portuguese is sometimes used as the name of the macrolanguage/family. On the other hand, Galician-Portuguese is the usual designation for the common ancestor/medieval variety of Galician and Portuguese.--Froaringus (talk) 09:45, 30 August 2022 (UTC)[reply]

Wiki Loves Folklore is extended till 15th March

Greetings from Wiki Loves Folklore International Team,

We are pleased to inform you that Wiki Loves Folklore, an international photographic contest on Wikimedia Commons, has been extended till the 15th of March 2022. The scope of the contest is the folk culture of different regions, in categories such as, but not limited to, folk festivals, folk dances, folk music, folk activities, etc.

We would like to have your immense participation in the photographic contest to document your local Folk culture on Wikipedia. You can also help with the translation of project pages and share a word in your local language.

Best wishes,

International Team
Wiki Loves Folklore

MediaWiki message delivery (talk) 04:50, 22 February 2022 (UTC)[reply]

Anglo-French and Anglo-Latin

Should Anglo-French and Anglo-Latin be added as etymology-only languages? An example from the Middle English Dictionary (siclatǒun): “Etymology: OF [Old French] siglaton, ciclaton, segleton, singlaton, silaton, (chiefly) AF [Anglo-French] siclatun, ciclatun & AL [Anglo-Latin] cyclaton, ciclatoun, sicladon”. Pinging @Theknightwho. J3133 (talk) 02:32, 23 February 2022 (UTC)[reply]

No, better not start with this. There is such a thing as Category:German Latin but this is not a variety to be noted but the coincidental circumstance of a term being employed in a certain place, a circumstance which is not very regular in occurrence, not as systematically as “British English” and “US English”, and not as Anglo-Norman, as it was nobody’s native language: you see the difference? The MED used it for being short and analogous in style. Fay Freak (talk) 03:08, 23 February 2022 (UTC)[reply]
I've not looked into this properly yet, but I don't think native language has anything to do with it, really. Theknightwho (talk) 03:40, 23 February 2022 (UTC)[reply]
A language dying out naturally immunizes it, in a way, against change. This is mostly as fewer people ever altogether employ it and that more deliberately. What otherwise would appear as a regiolectal or sociolectal feature ends up an idiosyncrasy. Fay Freak (talk) 11:39, 24 February 2022 (UTC)[reply]
That's an interesting point, and I'm inclined to agree, though I'm not sure that it applies to Anglo-Latin. While it wasn't anyone's native language, Latin absolutely was the language of scholarship and law. If you look at the Appendix to the 1912 edition of Court-hand restored, which is a "Glossary of Latin words found in records and other English manuscripts, but not occurring in Classical authors.", it goes on for 68 pages. It's then followed by 35 pages of Latinised forms of English and Irish place names, bishoprics, surnames and first names. Judging by the number of entries per page, I'd say that there are around 4,000 words and 2,000 proper nouns - all of which are justifiably called Anglo-Latin. Theknightwho (talk) 12:43, 26 February 2022 (UTC)[reply]

All multi-word entries should be marked as idiomatic OwO

This entire discussion looks like a work of abstract art. bd2412 T 04:20, 24 February 2022 (UTC)[reply]

No doubt, but there's no reason to take up the whole page with it. I collapsed it. Chuck Entz (talk) 04:54, 24 February 2022 (UTC)[reply]

The Call for Feedback: Board of Trustees elections is now closed

You can find this message translated into additional languages on Meta-wiki.

The Call for Feedback: Board of Trustees elections is now closed. This Call ran from 10 January and closed on 16 February 2022. The Call focused on three key questions and received broad discussion on Meta-wiki, during meetings with affiliates, and in various community conversations. The community and affiliates provided many proposals and discussion points. The reports are on Meta-wiki.

This information will be shared with the Board of Trustees and Elections Committee so they can make informed decisions about the upcoming Board of Trustees election. The Board of Trustees will then follow with an announcement after they have discussed the information.

Thank you to everyone who participated in the Call for Feedback to help improve Board election processes.

Best, Movement Strategy and Governance
--Mervat (WMF) (discusscontribs) 14:57, 24 February 2022 (UTC)[reply]

How does everybody feel about that subcategory? I wanted to create it for German Bildungssprache, Burschensprache=Studentensprache, Ausländerdeutsch, Bergmannssprache etc. (cf. also Sprache#Derived_terms). @Mahagaja, Fay Freak: You could be interested in this. — Fytcha T | L | C 15:27, 25 February 2022 (UTC)[reply]

Support for all languages. This is also necessary not to clutter the language category, in this case Category:German language. And consequential, the regiolects already have categories, Category:Regional German, Category:Regional Ancient Greek, Category:Regional Arabic (don’t really care that it is not named regiolects, this would only be on Nicodene’s level of concern). Likewise chronolects should be added, useful for Category:Hebrew language, to group Biblical Hebrew (3 c, 108 e), Medieval Hebrew (11 e), Mishnaic Hebrew (16 e), Paleo-Hebrew (4 e).
The sociolectal categories are of course subsidiary to topical categories: Though there is a biology sociolect, having Category:Biology is enough; but it does not work for the said groups; bar perhaps miners’ language. Fay Freak (talk) 15:48, 25 February 2022 (UTC)[reply]
@Fay Freak: I've added it: diff — Fytcha T | L | C 17:47, 28 February 2022 (UTC)[reply]

Chronolects (e.g. Early Modern English)

I have added Category:Early Modern English and added the terms emblemishment and misseinterpretacion to it (and no doubt there are a few thousand more that would fit as well). How should this be categorised? At the moment it's just part of Category:English language, but it feels like there's scope for labels for other time periods as well - particularly when it comes to things that have a habit of changing every few years like slang (e.g. Victorian words like bag of mystery). Is there an agreed upon way of doing this? Theknightwho (talk) 02:25, 26 February 2022 (UTC)[reply]

In light of the suggestion about sociolects above, I have discovered that these are called chronolects. Theknightwho (talk) 02:53, 26 February 2022 (UTC)[reply]
@Theknightwho What exactly do you mean by "how should this be categorized?". If it's about the categorization of words into a time period with labels, for Early Modern Korean, prior to our recent change towards having it as a separate L2, we'd add it to {{lb}} as like {{lb|ko|Early Modern}}, and it'd auto-categorize to Category:Early Modern Korean. Korean isn't the only community that does this, and seeing that the EME label does exist, I don't see why that couldn't be applied to English as well. (Notifying Mahagaja, Metaknowledge): AG202 (talk) 04:02, 26 February 2022 (UTC)[reply]
I should clarify that I'm referring to categorisation of the category Category:Early Modern English. In other words, should we have Category:Chronolects for each language in the same fashion as the suggestion for sociolects above? Theknightwho (talk) 04:04, 26 February 2022 (UTC)[reply]
Ahaaa got it, thanks, in that case, I don't see why we couldn't create Chronolects? Though I know that some folks have the valid opinion that categories shouldn't be made if there aren't at least "10" entries in them, so that may be a concern. AG202 (talk) 04:07, 26 February 2022 (UTC)[reply]
This valid opinion refers to topical categories, by and large. And even in those, there is the objective of not having too large categories but outsourcing content in subcategories; here in particular, as said, we aspire to avoid clutter and maintain a reasonable subcategorization for any Category:langname language. However there is still a threshold, though lower: for English we probably get by well with “dated”, “archaic” and “obsolete”, and have even more specific date ranges for some words. Fay Freak (talk) 19:13, 26 February 2022 (UTC)[reply]
I think we could probably subdivide archaic and obsolete forms into Early Modern English (c. 1450 to c. 1750) and Late Modern English (c. 1750 to c. 1900), which are the two recognised transitional forms between Middle English and Modern English. I have to say that "Middle Modern English" would probably have been a better name for the later one, but I didn't get to make the decision, unfortunately. Theknightwho (talk) 23:35, 26 February 2022 (UTC)[reply]
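As a rough illustration of the label-based categorisation AG202 describes above, where {{lb|ko|Early Modern}} places an entry in Category:Early Modern Korean, here is a minimal Python sketch. It is not how Wiktionary's label modules actually work: the `LANGUAGE_NAMES` table, the `CHRONOLECT_LABELS` set and the `categories_for` function are hypothetical names invented for this example.

```python
import re

# Hypothetical lookup tables; the real label data lives in Wiktionary modules.
LANGUAGE_NAMES = {"ko": "Korean", "en": "English"}
CHRONOLECT_LABELS = {"Early Modern", "Late Modern"}

# Captures the language code and the pipe-separated labels of an {{lb}} call.
LB_CALL = re.compile(r"\{\{lb\|([^|}]+)\|([^}]*)\}\}")

def categories_for(wikitext):
    """Return the chronolect categories implied by {{lb}} calls in wikitext."""
    categories = []
    for code, labels in LB_CALL.findall(wikitext):
        for label in labels.split("|"):
            if label in CHRONOLECT_LABELS and code in LANGUAGE_NAMES:
                categories.append(f"Category:{label} {LANGUAGE_NAMES[code]}")
    return categories

print(categories_for("# {{lb|ko|Early Modern}} some definition"))
```

Under this scheme, a parent category such as the proposed Category:Chronolects would only need to collect the categories this function produces for each language.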

Request to bulk delete pages containing only incorrect Spanish forms

I've found ~1000 pages that contain nothing more than an incorrect Spanish form. Here's the list: User:JeffDoozan/lists/es_pages_to_delete

I've checked that the pages contain only a pronunciation section (optional), the POS declaration, and the (incorrect) form entries. There's no additional text on the sense lines or Usage Notes or anything mentioning that the form is an alternative spelling or whatever, just the sense (incorrectly) asserting it to be a form of a given lemma. Can an admin with a bot please delete these? Here's the same list in a more bot-friendly format. JeffDoozan (talk) 13:25, 26 February 2022 (UTC)[reply]

How did you identify these? I'd like to do the same for German, every now and then I stumble on bot-created forms which were based on an incorrect version of the declension table. Maybe it's enough to look for form-of entries without incoming links? – Jberkel 18:42, 26 February 2022 (UTC)[reply]
Looking for form-of entries without incoming links would be a very clever way to make a similar list. I made this list the hard way: I extracted all of the Spanish lemma headwords from the wiki dump and then generated all of the possible forms for them using custom Python reimplementations of the {{es-adj}}, {{es-noun}}, {{head}}, and {{es-conj}} templates to build a database of all the "declared" forms. Then I re-scanned the wiki dump looking for anything claiming to be a form that wasn't in the database and then filtered out any pages with text or templates outside the form of templates. JeffDoozan (talk) 21:07, 26 February 2022 (UTC)[reply]
The hard way, maybe, but still very clever. The “easy” way will not work anyway if the entries are created based on existing inflection tables and the bot links the terms in these tables.  --Lambiam 22:52, 26 February 2022 (UTC)[reply]
Beware many of the plurals identified in this list are plurals of recent borrowings that are in actual use in my experience. E.g. póster may claim its plural is pósteres (it sometimes is) but pósters is by all means an alternative plural in real use. The verbal forms seem to be all crap though, and I'd just delete them with prejudice. By the way, the "bot-friendly" page is the same page you gave first; I think you intended to give a different link.--Ser be être 是talk/stalk 05:39, 5 March 2022 (UTC)[reply]
Thanks for all the reviews, I removed the verb forms mentioned by @AG202 and all plurals, per @Ser be etre shi. I made the same changes to the bot-friendly pages and fixed the link so it now points to the correct page. JeffDoozan (talk) 17:38, 5 March 2022 (UTC)[reply]
Thank you !!! And especially thank you for the clean up, surprised that these have been entries for so long. AG202 (talk) 18:46, 5 March 2022 (UTC)[reply]
@Benwing2: what needs to be changed to make our conjugations for erguir match the RAE's entry on erguir? JeffDoozan (talk) 17:50, 5 March 2022 (UTC)[reply]
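For anyone wanting to try something similar for another language, the dump-based approach JeffDoozan describes above might be sketched roughly as follows. This is only an illustration under loud assumptions: the in-memory `pages` dictionary stands in for a parsed wiki dump, and `generate_forms` with its naive plural rule is a hypothetical placeholder for real reimplementations of the headword and conjugation templates. It is not the code that produced the list.

```python
import re

# Captures the lemma that a form-of template points to (only plurals here).
FORM_OF = re.compile(r"\{\{plural of\|es\|([^|}]+)")
# A line holding nothing but one template, optionally on a sense line.
BARE_TEMPLATE_LINE = re.compile(r"#?\s*\{\{[^{}]*\}\}")

def generate_forms(lemma):
    """Toy form generator; real code would mirror es-noun, es-conj, etc."""
    ending = "s" if lemma[-1] in "aeiou" else "es"
    return {lemma, lemma + ending}

def declared_forms(pages):
    """Build the database of forms that each lemma page declares."""
    return {title: generate_forms(title)
            for title, text in pages.items() if "{{es-noun" in text}

def has_only_form_lines(text):
    """True if the page has only headers and bare template lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("="):
            continue
        if not BARE_TEMPLATE_LINE.fullmatch(line):
            return False
    return True

def suspicious_pages(pages):
    """Pages claiming to be a form that the lemma does not declare."""
    declared = declared_forms(pages)
    flagged = set()
    for title, text in pages.items():
        for lemma in FORM_OF.findall(text):
            undeclared = lemma in declared and title not in declared[lemma]
            if undeclared and has_only_form_lines(text):
                flagged.add(title)
    return sorted(flagged)

# Hypothetical miniature "dump": page title -> wikitext.
pages = {
    "gato": "==Spanish==\n===Noun===\n{{es-noun|m}}\n\n# cat",
    "gatos": "==Spanish==\n===Noun===\n{{head|es|noun form}}\n\n# {{plural of|es|gato}}",
    "gatoes": "==Spanish==\n===Noun===\n{{head|es|noun form}}\n\n# {{plural of|es|gato}}",
    "gatox": "==Spanish==\n===Noun===\n{{head|es|noun form}}\n\n# {{plural of|es|gato}} (nonstandard)",
}
print(suspicious_pages(pages))
```

The final filter matters for the concern raised about pósters: a page carrying any extra text, such as a usage note, is left out of the list so that a human can review it.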

@DCDuring, Jberkel, Vininn126, Chuck Entz, -sche, General Vicinity, Allahverdi Verdizade: I have started to work on a new vote for collocations. The vote is not ready to go yet but before I spend more time polishing the wording and the specifics, I'd like to get some more input. — Fytcha T | L | C 14:53, 26 February 2022 (UTC)[reply]

Please make sure to include this option Allahverdi Verdizade (talk) 15:12, 26 February 2022 (UTC)[reply]
@Allahverdi Verdizade: Where exactly do you see the advantage in adding |colloc=1 as opposed to replacing {{ux}}/{{uxi}} with {{co}}/{{coi}}? — Fytcha T | L | C 15:31, 26 February 2022 (UTC)[reply]
Perhaps that should be part of the vote. Implementing them as a new template as opposed to integrating them with something existing. Vininn126 (talk) 20:04, 26 February 2022 (UTC)[reply]
I think it's a well-written proposal and you've collected a lot of good ideas, thank you for taking the time to write it. I think the instruction to put collocations after all nyms but before all example sentences could draw some complaints because 1) I don't think it's actually codified anywhere that nyms must be placed above example sentences and 2) it sounds like it's prioritizing collocations over existing example sentences, which may put off some editors who aren't excited about adding collocations. I might rewrite that to be "after any nyms or example sentences" or omit it entirely. Also, it's unclear whether collocations in a ====Collocations==== section must use the {{co}} templates or if they could be free-form. I would prefer that they be required to use the {{co}} templates, but I expect that might be a controversial opinion. JeffDoozan (talk) 22:28, 26 February 2022 (UTC)[reply]
You raise a lot of good points. Especially the last one. Vininn126 (talk) 22:47, 26 February 2022 (UTC)[reply]
@JeffDoozan: On the last point: This was an oversight on my part; I've quickly fixed it here. I strongly agree that they should be wrapped in these templates so that they can be categorized correctly etc.
Regarding the first point with the order: Yes, that's actually a trickier point than I initially thought. I personally always order uxes after nyms, but maybe some people see that differently. I actually thought about standardizing that some time back, i.e. get a bot to order everything below a sense as: 1. nyms (in the same order as their headers in WT:EL) 2. uxes 3. quotes. I think the safest bet would be to not specify any order at all in this vote and then to perhaps see where the community stands on my proposed order in a separate discussion/vote after collocations have (hopefully) passed. — Fytcha T | L | C 23:58, 26 February 2022 (UTC)[reply]
@Fytcha: WT:EL: “An alternative to listing synonyms in a separate section is their placement immediately under the corresponding definition lines with {{synonyms}}”. J3133 (talk) 00:37, 27 February 2022 (UTC)[reply]
@J3133: Good catch. Assuming nobody would be in favor of interspersing non-nyms between the synonyms and the other nyms, the only question that remains is whether collocations should always come before uxes. I'd personally say so because 1. that way uxes and quotes are grouped together as they should be 2. collocations (like nyms) feel like they are more inherent properties of a word/sense whereas examples/quotes merely demonstrate it. — Fytcha T | L | C 10:05, 27 February 2022 (UTC)[reply]
@DCDuring, Jberkel, Vininn126, Chuck Entz, -sche, General Vicinity, Allahverdi Verdizade, This, that and the other: Sorry for the mass ping again but I just created User:Fytcha/Collocations and amended the vote accordingly. Please tell me what you think because this is getting close to the version that I'd want to go to the ballot with; all that needs to be done from my POV is to flesh out point number 2 on the vote page to contain some nice verbiage similar to WT:EL#Synonyms. — Fytcha T | L | C 21:44, 2 March 2022 (UTC)[reply]
I've come around to DCDuring's suggestion of only doing nouns and verbs. It would be nice to be able to include adjectives/adverbs for a few corner cases, such as zdecydowana większość, unless we are counting that as a noun collocation. Aside from that, I'd be fairly happy with this. Perhaps it'll be best to introduce something smaller, so that the community would be less inclined to shoot it down (if we were to try and include everything possible right off the bat, I dunno how people would react). I guess we'll just have to put it out there and see! Vininn126 (talk) 22:22, 2 March 2022 (UTC)[reply]
@Vininn126: I agree that we should eventually also allow adjective / adverb collocations (I have a couple in mind that would make sense) but at the same time I think nouns and verbs are a safer bet. The list of admissible PoS can always easily be expanded given that we have consensus. — Fytcha T | L | C 23:31, 2 March 2022 (UTC)[reply]
Can you add to that page an example of a collocation that belongs at a verb entry? Perhaps my ignorance showing here, but I'm having trouble thinking of any. This, that and the other (talk) 22:52, 2 March 2022 (UTC)[reply]
@This, that and the other: German einjagen is a clear example. As for English, perhaps bode for which we even document the collocations in the label. — Fytcha T | L | C 23:29, 2 March 2022 (UTC)[reply]
We don't have good tools for collecting high-frequency collocations of any kind. KWIC displays from large corpora would be very useful. Perhaps we could do something with a Google N-Grams download.
There are many archaic/obsolete/dated English verbs that haven't quite fossilized, but only collocate with a small number of nouns. Heed and hark might be examples, but there will be many more. DCDuring (talk) 00:50, 3 March 2022 (UTC)[reply]
Good examples, thanks both. This, that and the other (talk) 01:27, 3 March 2022 (UTC)[reply]
I think we need to see this for a polysemic word, not any extreme case, but one with, say, 5 definitions. DCDuring (talk) 01:47, 3 March 2022 (UTC)[reply]
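As a rough illustration of the KWIC (keyword in context) displays DCDuring mentions above, here is a minimal Python sketch that prints a concordance for one word and counts the words that immediately follow it. The three-sentence `corpus` is made up for the example, and the function names are hypothetical. Useful frequency data would need a large corpus or an n-gram download rather than a toy string.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

def right_collocates(tokens, keyword):
    """Count the word immediately following each occurrence of keyword."""
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == keyword)

# Made-up sample text, used only to exercise the functions.
corpus = "This bodes well for us. It bodes ill for them. That bodes well indeed."
tokens = tokenize(corpus)

for left, keyword, right in kwic(tokens, "bodes"):
    print(f"{left:>20} [{keyword}] {right}")
print(right_collocates(tokens, "bodes").most_common(2))
```

Ranking collocates by raw count is the simplest possible choice. An association measure that discounts very common words might give better candidates, but that is beyond this sketch.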

Inaccurate Wikipedia article

The Wikipedia article "Proto-Sinaitic Script" has a table of letter lineages that is misleading. It portrays the Hebrew square-script letter θaw as being descended from whatever the proto-Sinaitic letter for the proto-Semitic phoneme /θ/ was. This is inconsistent with the table which has for example the Latin letter F descended from ו not פ etc. It even shows that /ʃ/, /ɬ/, and /θ/ were all written with the letter 𐤔 and then has it split. It also has proto-Semitic /x/ and /ɣ/ with ח and ע as they should be and not with כ and ג. Dngweh2s (talk) 18:02, 26 February 2022 (UTC)[reply]

@Dngweh2s: You are posting on Wiktionary, which is the wrong venue for this. Fay Freak (talk) 19:16, 26 February 2022 (UTC)[reply]

Request for interface administrator permissions

@Chuck Entz, Surjection: I kindly request to be granted interface administrator permissions. I want to fix and improve MediaWiki:Gadget-SpecialSearch.js. — Fytcha T | L | C 16:59, 27 February 2022 (UTC)[reply]

While I'm not opposed, I still don't know what the process for appointing interface administrators even is. — SURJECTION / T / C / L / 18:21, 27 February 2022 (UTC)[reply]
@Surjection: Me neither. Here are the precedents (picking the newest ones from Special:ListUsers/interface-admin): Special:UserRights/Ruakh seems to have self-assigned it out of necessity (GP), Special:UserRights/Surjection was after a BP discussion and consensus (BP), Special:UserRights/Gamren was because another sysop asked them to be assigned this right (GP), and Special:UserRights/Benwing was after a personal request (user page). — Fytcha T | L | C 19:01, 27 February 2022 (UTC)[reply]
We're going to need a full CV, every address where you've lived since achieving adulthood, a statement of net worth, and your credit rating to qualify you for the skills examination. By one hour before the skills exam, you should send me the account numbers and passwords for all your banking, investment, and retirement accounts. For security reasons you should use my Wiktionary e-mail. DCDuring (talk) 20:34, 27 February 2022 (UTC)[reply]
Since it appears it's just given out arbitrarily, I just went ahead and added it. Done — SURJECTION / T / C / L / 12:25, 28 February 2022 (UTC)[reply]
@Surjection: Thank you! — Fytcha T | L | C 12:30, 28 February 2022 (UTC)[reply]

Avogadro's number is given as a proper noun but the other two as uncountable common nouns. The fine-structure constant is given as a common noun but the synonymous Sommerfeld's constant is given as a proper noun. Planck's constant is given as a common noun but Einstein's constant as a proper noun. Which is it? IMO, these are proper nouns because they name unique entities. Thoughts? fine-structure constant is a bit tricky because it's not capitalized, but IMO proper noun vs. common noun is a semantic issue, not a spelling issue. Monday and Tuesday are (rightly IMO) given as common nouns despite the capital letter (whereas September and December are (wrongly IMO) given as proper nouns). Benwing2 (talk) 08:23, 28 February 2022 (UTC)[reply]

@Benwing2: Thank you. As Wiktionary knows, the misunderstanding of "uncountable" (most people think it just means "the plural is rarely used") is my personal bugbear. These are wrong and I will fix them. Equinox 19:12, 28 February 2022 (UTC)[reply]
Just a comment: "Most people" (or delightfully in Italian uncountable molta gente) cannot distinguish grammatical number from arithmetic number, claiming for example that if you say "Ten stone eight", it shows that "stone" has two plural forms, one identical to the singular. Imaginatorium (talk) 07:15, 2 March 2022 (UTC)[reply]
Question: we've discussed before that it may be suboptimal that the way we indicate an English word has no plural is to say it's uncountable ({{en-noun|-}})—and I'm not talking about cases where we think there is a plural that's just not attested, which gets "plural not attested", but where we want to say there is no plural—but there's been no change to the template, because they seem to always coincide, so it doesn't seem to be a problem. So, are any of these examples—or if not, are there examples—of common nouns where "uncountable" =/= "has no plural"? (Are there uncountable pluralia tantum?) If there are actual examples, maybe we could change the template; otherwise we're stuck at this just theoretically being an issue. - -sche (discuss) 04:02, 3 March 2022 (UTC)[reply]

Related thought: Speaking of most people and of proper nouns, in my experience, most people will not admit that the nature of a proper noun is that it is a name for a unique thing within an implicit context (an implicit cognitive framing). This is why "Building 9" is a proper noun within the implicit context of a particular campus or industrial site. Within that context, it is the only Building 9 that exists. Not that there are no other "Building 9"s in the universe (i.e., if the Coca-Cola plant on Maple Road has one, then the Pepsi plant on Pine Avenue might also have its homologue, named with the same proper name/proper noun), and not that people don't realize that that is possible (they certainly do; when they speak of "the President" (capped) or "the Cafeteria" (capped), they know it is so); but within the implicit context (implicit cognitive framing) of "the campus we are conversing in", the named thing is understood as implicitly unique. Quercus solaris (talk) 04:16, 3 March 2022 (UTC) Corollary: proper nouns naming things that are cosmically unique (i.e., the implicit context for the uniqueness is the entire known universe) are the names of things that have no plural form because there is only one of them. For example, for monotheists, Big Daddy in the Sky is Exemplar Numero Uno. But Avogadro's constant (the Avogadro constant) is another. Quercus solaris (talk) 04:20, 3 March 2022 (UTC)[reply]

Coming soon

- Johanna Strodt (WMDE) 12:38, 28 February 2022 (UTC)[reply]

I've recently removed the Spanish templates from the drop-down menu as the corresponding pages have been deleted 12 (Template:es-new intro, Template:es-new-verb) or 5 years (Template:es-new-noun) ago respectively, or have never existed in the first place (Template:es-new-adj). The other templates seem to exist for the most part, except for one that I had to restore. I also had to do a bit of cleanup in those templates. However, I conclude from this that these templates are used relatively seldom. Thus, the question is: Do we want to keep them? If we decide to keep them, I'd be strongly in favor of creating a new namespace for them; even sysops are regularly confused and delete them. That is not without reason, they are not templates in the sense of what we on Wiktionary usually mean by that word: They are not something that we include in our pages using {{}}. I'd suggest naming the new namespace PageStarter: or Model:. — Fytcha T | L | C 15:19, 28 February 2022 (UTC)[reply]

  Input needed
This discussion needs further input in order to be successfully closed. Please take a look!
I don't think we need these. —Justin (koavf)TCM 17:21, 19 March 2022 (UTC)[reply]

Edit request

To be added to daylight robbery:

  1. Used other than figuratively or idiomatically: see daylight,‎ robbery.
    • 2006, Sarah N. Welling, “Stop and Frisk”, in Paul Finkelman, editor, Encyclopedia of American Civil Liberties, volume 3, Routledge, published 2013, →ISBN, page 1570, column 1:
      A daylight robbery of a store carries with it a risk that the store clerk will be present and a confrontation will ensue, so grounds to fear a daylight robbery give rise to a reasonable inference that the defendant is armed and dangerous.

70.172.194.25 19:08, 28 February 2022 (UTC)[reply]

I hate &lits but it's legitimate and I've done it! Smile. Equinox 19:11, 28 February 2022 (UTC)[reply]

Remember to Participate in the UCoC Conversations and Ratification Vote!

You can find this message translated into additional languages on Meta-wiki.

Hello everyone, A vote in SecurePoll from 7 to 21 March 2022 is scheduled as part of the ratification process for the Universal Code of Conduct (UCoC) Enforcement guidelines. Eligible voters are invited to answer a poll question and share comments. Read voter information and eligibility details. During the poll, voters will be asked if they support the enforcement of the Universal Code of Conduct based on the proposed guidelines. The Universal Code of Conduct (UCoC) provides a baseline of acceptable behavior for the entire movement. The revised enforcement guidelines were published 24 January 2022 as a proposed way to apply the policy across the movement. A Wikimedia Foundation Board statement calls for a ratification process where eligible voters will have an opportunity to support or oppose the adoption of the UCoC Enforcement guidelines in a vote. Wikimedians are invited to translate and share important information. For more information about the UCoC, please see the project page and frequently asked questions on Meta-wiki.

The Movement Strategy and Governance (MSG) team is hosting a Conversation Hour on 4 March 2022 at 15:00 UTC. Please sign up for this conversation hour to interact with the project team and the drafting committee about the updated enforcement guidelines and the ratification process. See the Conversation Hour summaries for notes from 4 February 2022 and 25 February 2022.

You can comment on Meta-wiki talk pages in any language. You may also contact either team by email: msg@wikimedia.org or ucocproject@wikimedia.org

Sincerely, Movement Strategy and Governance
Wikimedia Foundation --Mervat (WMF) (talk) 19:42, 28 February 2022 (UTC)[reply]

Bad behaviour of "User:Equinox" and redirects

I was accused by User:Equinox of creating redirects needed for interwiki linking with some wiktionaries (particularly the Swedish one). "User:Equinox" deleted those redirects several times, and is unnecessarily rude (given that I obviously had not violated any policy). Most notably, besides considerably delayed answers, "User:Equinox" has not yet pointed to any policy that would prohibit such redirects. Page när katten är borta, dansar råttorna på bordet links to Swedish wiktionary, but the link back to English wiktionary vanished due to "User:Equinox"'s deletionism. Thoughts about the issue are welcome. Taylor 49 (talk) 20:36, 1 March 2022 (UTC)[reply]

Without getting into anyone's rudeness or incivility in particular, please let's all deescalate and make sure that we're being respectful. @Taylor 49: have you read d:Wikidata:Wiktionary and the unique role of interwiki links for Wiktionaries? —Justin (koavf)TCM 00:05, 2 March 2022 (UTC)[reply]
"considerably delayed answers"? --Geographyinitiative (talk) 00:45, 2 March 2022 (UTC)[reply]
@Taylor 49: I agree with User:Equinox's decision to delete these dummy redirects. I don't want en.wikt to be flooded with these redirects only because other Wiktionaries lemmatize incorrectly; compare also Leiche im Keller and de:eine Leiche im Keller haben (Talk:Leiche_im_Keller). — Fytcha T | L | C 12:50, 2 March 2022 (UTC)[reply]
Aren't manual interwiki links still valid for cases where the interwiki points to an entry with a different title? Why not use that feature? - TheDaveRoss 16:15, 2 March 2022 (UTC)[reply]
Does that apply to Wiktionary, which is handled as a special case? I know that Wiktionary's redirects are not handled by Wikidata, as they are for all other projects, and I understand that is because they link to the exact same page name on other Wikis. Theknightwho (talk) 17:28, 2 March 2022 (UTC)[reply]
Yes, you can add manual interwiki links like so: [4]. 70.172.194.25 17:32, 2 March 2022 (UTC)[reply]
Title "När katten är borta, dansar råttorna på bordet." is NOT wrong lemmatization of "när katten är borta, dansar råttorna på bordet". In both Swedish and English the base rule is to use a capital letter at the beginning of a sentence and a dot at the end. Note that the inner comma did NOT get sacrificed, as opposed to the final dot. Using explicit interwiki links is theoretically possible, but is this trick advised, and will they work forever? The risk is big that someone will remove them too.
> unique role of interwiki links for Wiktionaries
This case is barely touched on the pages linked. Ideally, Wikidata should connect the pages "När katten är borta, dansar råttorna på bordet.", "När katten är borta dansar råttorna på bordet.", "när katten är borta, dansar råttorna på bordet", "när katten är borta dansar råttorna på bordet". Until then, another solution is needed. Taylor 49 (talk) 19:16, 2 March 2022 (UTC)[reply]
@Taylor 49: (Disclaimer: I don't speak Swedish) A quick search on Google Books yields this result (quote: ” [] Ja, när katten är borta dansar råttorna på bordet ...”) on the first search page, which contains neither a capitalized När nor a full stop, which goes to show that this proverb can be part of bigger sentences and is thus incorrectly lemmatized at sv.wikt. — Fytcha T | L | C 19:57, 2 March 2022 (UTC)[reply]
Another one: »För du förstår, när katten är borta dansar råttorna på bordet!« — Fytcha T | L | C 20:01, 2 March 2022 (UTC)[reply]
"Ja" means "Yes", thus an interjection that more or less never can be a clause element. In the lower example, the "För du förstår" (means approximately "FYI") part is separated by a comma, and is a separate clause only loosely attached to the proverb. The upper example terminates the proverb by "!" and the lower one by "...". The variation between "." and "!" and "..." is equivalent to variation between inner comma and no inner comma. Lemmatization to "när katten är borta, dansar råttorna på bordet" is inconsequent at least. Lemmatization in this wiki is one question (that I do not want to push just now), possibility to create redirects (until there is a better solution) without having to feel like a vandal the other one that I would like to get resolved in this thread. Taylor 49 (talk) 21:29, 2 March 2022 (UTC)[reply]
Wikidata won't connect the pages - Wiktionary is explicitly excluded and has its own system. Theknightwho (talk) 06:17, 3 March 2022 (UTC)[reply]
To elaborate, as I wrote above, see d:Wikidata:Wiktionary. —Justin (koavf)TCM 06:36, 3 March 2022 (UTC)[reply]
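To make the title-matching problem discussed above concrete, here is a minimal Python sketch that groups the four title variants Taylor 49 lists under one normalised key. It only illustrates the idea of treating capitalisation, the inner comma and final punctuation as insignificant. `normalize_title` and `group_variants` are hypothetical names, and nothing here reflects how MediaWiki or Wikidata actually link Wiktionary pages.

```python
from collections import defaultdict

def normalize_title(title):
    """Collapse sentence-style variants of a proverb title into one key."""
    # Drop final sentence punctuation and inner commas.
    key = title.strip().rstrip(".!?…").replace(",", "")
    # Lowercase only the first letter, leaving later capitals untouched.
    return key[:1].lower() + key[1:]

def group_variants(titles):
    """Map each normalised key to the titles that share it."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return dict(groups)

titles = [
    "När katten är borta, dansar råttorna på bordet.",
    "När katten är borta dansar råttorna på bordet.",
    "när katten är borta, dansar råttorna på bordet",
    "när katten är borta dansar råttorna på bordet",
]
for key, variants in group_variants(titles).items():
    print(key, "<-", len(variants), "variants")
```

A rule this blunt would also merge titles that differ meaningfully in punctuation or initial capitalisation, so it could at most suggest candidate links for a human to confirm.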