Wikisource talk:Text quality

Text quality

I have tried to make a translation of Caton's work on text quality, see Wikisource:Text quality. I have taken the liberty of freely formulating what I believe is the general meaning of the content, but as my French is very poor I may have made some mistakes or left something out.
The general idea has been taken from Wikibooks, and I think we can use it here as well with good results - Caton has already started to implement the system for the French texts. Being open about the progress and reliability of the individual texts can make this project more trustworthy in general.
To ensure that text pages with a progress level of 75% (proofread and corrected by one user) and 100% (proofread and corrected by several users) stay in good shape (i.e. to prevent vandalism, ref. Wikisource:Scriptorium#vandalism), it may be a good idea to protect pages that have reached these levels. Any comments? Christian S 07:30, 1 Mar 2005 (UTC)

In principle this is a positive initiative. I even, with regret, admit that many pages will need to be protected. One interesting question will be whose work is reliable enough to have credit for the proofreading? The dedicated and trusted participants are still a very small group with limited available time. Much of what needs to be proofread involves entire books.
It would also be good to develop a practical system for encouraging systematic annotations and translations for protected pages, without getting them mixed in with the random rants that can often be found on talk pages. Eclecticology 10:13, 1 Mar 2005 (UTC)

The question of whose proofreading abilities we trust and don't trust is not only interesting, but also very difficult to answer. I would really hate to draw that line, and I don't think it's a good idea to try drawing it. The best answer I can give is this: If several independent users (no sockpuppetry allowed here...) who have proofread a given text agree that the text is authoritative and correct, then there is a good chance that the text actually is correct and authoritative. This question is most relevant to the protection issue, i.e. when do we have a sufficiently reliable version of a given text to merit protection. Maybe this line should be drawn at two independent proofreaders, or, in cases of pages that are often vandalised, one regular and trusted user. But, as you say, the number of dedicated and trusted users is still limited, and the number of texts that these users contribute and proofread continues to grow. These contributors will therefore need to spend more time checking a growing number of stable pages for edits, unless a page is protected after the first (trusted) proofreading, since the second independent proofreading may not happen in the near future (this is especially true when the original source is rare). That means a lot of time-consuming checking for large contributors if they have for some reason been unable to track the recent changes for just a few days. This workload could be an argument for allowing "proofread once" protections. Texts that have not been proofread yet should not be protected, unless they suffer from repeated vandalism. The protection system, if agreed on in some form, would probably need a place to advertise "protection candidates", more or less in the same way as we have the proposed deletions page for deletion candidates. I would like to see the protection issue discussed some more, and some consensus built about when and how, before we just start protecting pages.

The protection issue does not, however, affect the idea of the progress/quality icons and the suggested template for meta information. This system could be implemented right away, and I definitely support implementing it. Christian S 13:47, 1 Mar 2005 (UTC)

I support the idea of protecting texts that have reached the 3rd and 4th level. I think that protecting a text should be relatively easy. Most texts here are being copy/pasted from other sites, rather than scanned. In that case, "proofreading" is not really what we want. For example, if I copy/paste a text from a reliable site (Gutenberg, Gallica), I can reasonably assume that OCR errors have been fixed. In that case, it makes more sense to freeze the page immediately, without taking the time to read it. Reading whole books is too time-consuming, and it would slow down the process. Not protecting a text for too long exposes it not only to vandalism, but to "corrections" made by ignorant people. And those corrections are sometimes hard to detect (check here for an example of someone fixing a spelling error in a text, while at the same time changing the meaning of another sentence)
Concerning the question of who we trust to do that, I propose a liberal approach, where everyone is allowed to freeze texts. For that, I guess we should have a page where freeze requests are centralized. If somebody contributes a text and claims it contains no error, then this contributor would make a freeze request on a dedicated page. A sysop could then freeze the text after having had a quick look at it (no proofreading is necessary at this point). The name of the person who made the request would be mentioned on the talk page. In addition, the texts frozen by a given user could be listed on his/her user page.
There would be another page, where typos found in frozen texts are centralized. So if a reader finds a typo in a frozen text, this user would submit a typo request, where the proposed correction is described. Here we would have a way to know if the person who decided to freeze the page was right to do so. We could even measure how reliable a contributor is, and decide to deny him further freeze requests, if it turns out he is not reliable. Objective measures of reliability could be the number of typos found per kilobyte contributed, or the number of typos per page. --ThomasV 14:30, 1 Mar 2005 (UTC)
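
The two reliability measures proposed above (typos per kilobyte contributed and typos per page) can be sketched as follows. This is only an illustration: the function names are hypothetical, the typo count is assumed to be the number of typo requests accepted for a contributor's frozen texts, and size is assumed to be measured in UTF-8 bytes, none of which has been agreed on in this discussion.

```python
def typos_per_kilobyte(typo_count: int, text: str) -> float:
    """Reported typos divided by the size of the contributed text in kilobytes.

    Assumption: size is measured as UTF-8 bytes, with 1 kilobyte = 1024 bytes.
    """
    size_kb = len(text.encode("utf-8")) / 1024
    if size_kb == 0:
        raise ValueError("cannot compute a rate for an empty text")
    return typo_count / size_kb


def typos_per_page(typo_count: int, page_count: int) -> float:
    """Reported typos divided by the number of pages contributed."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return typo_count / page_count
```

A lower value would suggest a more reliable contributor. Either figure would still have to be read alongside the source of the text, since a high rate may reflect an unreliable source site and not the contributor.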

I agree that 3rd and 4th level texts should be frozen, and that freezing should be reasonably easy. Proofreading is, IMHO, the preferred thing to do, especially if the text is a new scan (as most of my own contributions have been), but alas, it is very time-consuming. If a text is directly copy/pasted from a reliable site I guess we can assume that the text has already been proofread by the editors of that site (if not, then the site is not reliable), and a full proofreading may not be necessary in these cases. However, the contributor should check that no part of the text has been lost in the copy/paste process, and the text should be properly (wiki)formatted before a page is frozen (Gutenberg texts often need at least some degree of formatting).
The liberal approach is fine with me - I tend to trust those who make serious contributions, even when they are newbies. A centralized place for freeze requests is definitely necessary, something like Wikisource:Freeze requests. A central page for typo requests may be more difficult to maintain - it will be necessary to advertise prominently that errors should be reported at the central error request page rather than at the talk page of the text. Perhaps a template on the talk page of frozen texts that informs the reader/user about where to report errors could do the job - adding such a template to the talk page of the text could be part of the freezing procedure. Keeping track of error reports is a good way to measure the reliability of contributors (most of them probably are reliable). However, before we blame a contributor for being unreliable, it should be checked whether the texts added by this user have been copy/pasted from another site, as it may be that site, and not the contributor, that is unreliable. In that case we should instead discourage copy/pasting from the unreliable site. Christian S 15:57, 1 Mar 2005 (UTC)

I don't think that we have any major differences in this. It's just a question of working out the details and refining them. As for the icons, I still find them small, and using graphics makes them fairly inflexible. There is a series of characters in Unicode's geometric shapes that could serve the same purpose: ● ◕ ◑ ◔ ○ . The template system could still allow these to be in different colours, and contain a suitable font size.
The freeze requests page seems workable. The freeze amounts to a page protection which admins can already do. (Thomas, I've often wondered why you've never requested to become an admin. I still disagree strongly with you on a number of issues, but it would be inappropriate if I let that stand in the way. It's easy to see who does the work.) Admins should feel free to protect a page within agreed parameters without seeking to open a discussion about it. If it becomes a problem or there are too many complaints the procedure can be reviewed then. A Category:Freeze requested could be easier to implement than a request page that would have its own need for maintenance.
I'm afraid that a "fix typos" page with a list of things to check and fix won't work. The requests for cleanup pages in various projects end up being lists of things that nobody wants to do. I'm inclined to support the use of a sub-page for this, and have something like [[Article title/Typos]] . Similarly a sub-page system could be used for annotations, translations, and other meta material. Perhaps we could begin to reserve the use of the "/" character for such purposes. I confess that I find the concept of typos per kilobyte amusing. If we've got time to identify and count them, we've got time to fix them. :-)
I think that being liberal with our trust is a workable approach to get us started. That being said, no text should be able to get past the 50% level unless the source is indicated. The cut and paste technique is worth something in the evaluation, but we also need to be aware of the limitations of those sites, including PG where being limited by what's available in ASCII characters imposes limitations like suppressing all italics. In the long run it may be necessary to have both graphic and OCR versions of a text. The graphic version would be available whenever the accuracy of a text is in question. Eclecticology 05:01, 2 Mar 2005 (UTC)
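
As a rough illustration of the Unicode alternative to graphic icons mentioned above, the five geometric shapes could be mapped to progress levels and wrapped in a coloured, resizable span. The mapping of shapes to percentages, the function names, and the default colour and font size are all assumptions made for this sketch. The discussion only names the 50%, 75% and 100% levels explicitly.

```python
# Assumed mapping: the emptier the circle, the lower the progress level.
PROGRESS_ICONS = {
    0: "○",
    25: "◔",
    50: "◑",
    75: "◕",
    100: "●",
}


def progress_icon(level: int) -> str:
    """Return the Unicode shape for a progress level (0, 25, 50, 75 or 100)."""
    if level not in PROGRESS_ICONS:
        raise ValueError(f"unknown progress level: {level}")
    return PROGRESS_ICONS[level]


def styled_icon(level: int, colour: str = "black", font_size: str = "120%") -> str:
    """Wrap the icon in an HTML span so colour and size can be varied."""
    icon = progress_icon(level)
    return f'<span style="color:{colour}; font-size:{font_size}">{icon}</span>'
```

A template could emit the same kind of span, so that colour and size stay adjustable in one place without regenerating any image.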
The reason why I did not request adminship so far is not boycott, but simply that I believe the current situation (i.e. no language subdomains) is temporary, so I did not bother. However, I guess I could be admin on both wikisource.org and fr.wikisource.org when it is created. So let me apply for adminship here.
Maybe you could actually post these unicode characters in colors, rather than just mentioning the fact that they could be in colors. For now, I prefer the small squares proposed by Caton, at least with regard to aesthetics. But using unicode characters would have the advantage of making lighter pages.
IMHO, a "fix typo" page will work. And I think that people who contributed pages will want to fix typos. But even if I am wrong, in the sense that nobody will want to do it, as you suggest, I do not understand why using a sub-pages system would work better. It seems to me that there is no logical link between people wanting or not wanting to fix typos, on the one side, and the choice between a subpage system or a single "fix typos" page, on the other.
When I proposed to count the number of typos in the texts contributed by a given person, I obviously was referring to the typos reported on that kind of page. I must have been very unclear, if you understood that I was suggesting to count them without fixing them.
--ThomasV 21:41, 2 Mar 2005 (UTC)

Which icons to use is not of much importance to me, as long as they are usable on almost any system, and some of the Unicode geometric shapes are not. The 2nd and 4th symbols suggested by Ec. are just plain squares when viewed on my PC, while the other three look fine (but they should be downsized a bit, IMO). I believe this has something to do with my system and not the characters, but I am probably not the only one who has this problem - and I find it very important that the symbols used for this purpose are compatible with as many PC systems as possible. The graphical solution is compatible with most systems. Creating templates like {{75%}} which only contain the icon image could make the images (or Unicode graphics with colours) easier to use.

Whether or not a "fix typos" page will work, I actually have no idea, but, as I mentioned earlier, serious advertising will probably be necessary to make it work. If we instead want to keep the typo fixing process attached to the individual text pages, I think that a "Typos to be fixed" section on the talk page would be more workable than a subpage.

The suppression of italics (as well as unavailable images/figures from the text) is definitely a drawback to PG, and such errors/omissions should be fixed here before a text is accepted as "genuine" (fixing such errors is easier than a full proofread, and should, IMHO, be a minimum). I agree that a text should not pass the 50% level without indication of the source.

I support that we grant adminship to ThomasV, as he is really doing a great job here. Christian S 06:45, 3 Mar 2005 (UTC)

I also support ThomasV's candidacy. Caton 15:53, 3 Mar 2005 (UTC)
