## A proposal for a pre-preprint archive

In recent days there has been a big public discussion in Germany about scientific publishing. The reason for the discussion was that the former German minister of defense, Karl-Theodor zu Guttenberg, had included texts in his Ph.D. thesis which were not his own and which were not appropriately marked as citations. The number of these plagiarized texts in his dissertation was so large that this “secretly pasting in” of texts by other authors had to be called systematic. As of March 1, 2011, the site GuttenPlag had found plagiarized texts on 324 of 393 pages of his dissertation. After a longer process, which included strong protests, especially from the academic community, as well as the revocation of his degree and a declaration by his doctoral advisor Peter Häberle, Mr. Guttenberg finally resigned from his post as minister of defense. Among other things, the declaration of his doctoral advisor Peter Häberle contained the sentence:

Die in der Promotionsschrift von Herrn zu Guttenberg entdeckten, mir unvorstellbaren Mängel sind schwerwiegend und nicht akzeptabel.

(translation without guarantee: The shortcomings found in Mr. zu Guttenberg’s dissertation, which are unimaginable to me, are severe and not acceptable.)

The investigations into this case have not been finished; in particular, the University of Bayreuth (which was in charge of the thesis) still seems to be investigating and, for example, checking whether the plagiarism was intentional fraud. There also seems to be the possibility of legal consequences (see e.g. here and here).

In short, the damage done by this affair is quite big, and thus a sensible question is “how could that happen and how could one prevent further cases like this?”

I hope that the future will bring more clarity to the case and will give an answer to the first question. Whatever the result will be, it is already clear that something went utterly wrong in how the dissertation was written and in how it was supervised.

In the randform post “Alexandre Grothendieck writes a letter” I had promised to speak about the proposal of a “closed pre-preprint section for the arXiv for timestamping works”, i.e. a kind of “safe” for work in progress. I think that such a section in an open access preprint archive could in particular be useful in the supervision process of a doctoral thesis and thus eventually help prevent cases similar to the above. It would also have other benefits. Let me explain this.

The archive arXiv.org is a twenty-year-old open access archive for scientific works from math, physics, computer science etc. It hosts preprints, that is, scientific works which eventually appear as articles in journals, as well as scientific works which do not appear in journals. For example, according to Wikipedia, Grigori Perelman’s proofs of the so-called geometrization conjecture, for which he was awarded a “mathematical Nobel prize”, namely the Fields Medal, appeared only on the arXiv. (If you want to get an idea of how the geometrization conjecture can contribute to our understanding of space, please read this randform post.) In short, the arXiv contains an extremely important cultural heritage and scientific resource.

Unfortunately, as I already wrote in the randform post “Alexandre Grothendieck writes a letter”, the arXiv seems to struggle.

On the one hand there is a rather tight budget. According to the article “Cornell Seeks Sustainable arXiv Support”, the annual budget for 2010 was 400,000 US\$. The article says that:

The budget is comparable to the Cornell University Library’s collection budget for physics and astronomy.

Yet the arXiv is a free service to the whole world! The high costs of libraries are mostly due to high journal prices; here is a comparison from 2008.

On the other hand, apart from big publishers, unfortunately even academic societies sometimes seem to be rather unsupportive when it comes to the idea of open access.

As a consequence one can infer that my proposal of a closed pre-preprint section is definitely not at the center of attention of the arXiv and far from something that could be realized. But nevertheless let’s proceed with the explanation.

What is a pre-preprint?

For simplicity, let’s call a work which may or may not have been peer-reviewed a preprint. That is, even copies of peer-reviewed journal articles which are freely available on the arXiv will be called preprints. A pre-preprint is then a preprint in progress, that is, a scientific work which is not yet in a publishable state. A doctoral thesis before release is such a case. As in a wiki, the development of a pre-preprint would be timestamped, that is, the various versions during development would be archived. In particular, with such a system the development of a work can be traced in retrospect.

What does closed section mean?

A closed section means that the degree of publicity of the pre-preprint is decided by the author(s). That is, the author(s) may wish to work alone on the pre-preprint, or include some readers, or make their pre-preprint fully public. One could also think about a comment section where the readers’ contributions are documented. Thus the readers of a doctoral thesis in progress could for example include the thesis advisors.
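
To make the three degrees of publicity a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not a description of anything the arXiv offers; the names `Visibility`, `PrePreprint`, `can_read` and `can_change_visibility` are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Visibility(Enum):
    """Hypothetical degrees of publicity, chosen by the author(s)."""
    PRIVATE = "private"  # only the authors can read
    SHARED = "shared"    # authors plus invited readers (e.g. thesis advisors)
    PUBLIC = "public"    # everybody can read


@dataclass
class PrePreprint:
    authors: Set[str]
    visibility: Visibility = Visibility.PRIVATE
    readers: Set[str] = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        # Authors can always read; everybody can read a public pre-preprint.
        if user in self.authors or self.visibility is Visibility.PUBLIC:
            return True
        # Invited readers only get access once the work is shared.
        return self.visibility is Visibility.SHARED and user in self.readers

    def can_change_visibility(self, user: str) -> bool:
        # Assumption of this sketch: only the authors decide on publicity.
        return user in self.authors
```

In this sketch a doctoral student would create a `PrePreprint` with themselves as author, add the advisor to `readers`, and switch `visibility` to `Visibility.SHARED` once they want the advisor to follow the work.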

What are time-stamps?

I haven’t yet seen notaries public offer an online repository for notarizing, for example, PDFs. Maybe I haven’t searched hard enough, but I think that at least in Germany this doesn’t exist, although it would be a useful notary service. (So if some high-ranking jurist should read this: please propagate that proposal!)

In any case, assume that such an online notarizing service existed and call what it issues a time-stamp. That is, a time-stamp means that a trusted institution authenticates that you handed in a certain document at a certain moment in time.

Keeping the different versions of a pre-preprint in a wiki at a respectable institution like the arXiv can thus be seen as putting a time-stamp on each version.
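
As a rough illustration of what “putting a time-stamp on each version” could mean technically, here is a minimal Python sketch. It is my own assumption of how such a log might look, not how the arXiv works: each submitted version is fingerprinted with SHA-256, the archive records the time of receipt, and each stamp also contains the digest of the previous stamp, so that the history cannot be rearranged afterwards without being noticed. The names `Stamp`, `VersionLog`, `submit` and `proves` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Stamp:
    version: int
    sha256: str       # fingerprint of the submitted document
    prev_stamp: str   # digest of the previous stamp ("" for the first version)
    received_at: str  # time recorded by the archive (ISO 8601, UTC)

    def digest(self) -> str:
        record = f"{self.version}|{self.sha256}|{self.prev_stamp}|{self.received_at}"
        return hashlib.sha256(record.encode("utf-8")).hexdigest()


class VersionLog:
    """Hypothetical append-only log of the versions of one pre-preprint."""

    def __init__(self) -> None:
        self.stamps: List[Stamp] = []

    def submit(self, document: bytes, now: Optional[datetime] = None) -> Stamp:
        # The archive, not the author, decides what time is recorded.
        now = now or datetime.now(timezone.utc)
        prev = self.stamps[-1].digest() if self.stamps else ""
        fingerprint = hashlib.sha256(document).hexdigest()
        stamp = Stamp(len(self.stamps) + 1, fingerprint, prev, now.isoformat())
        self.stamps.append(stamp)
        return stamp

    def proves(self, document: bytes, version: int) -> bool:
        """True if `document` is what was stamped as `version` and the chain up to it is intact."""
        if not 1 <= version <= len(self.stamps):
            return False
        prev = ""
        for stamp in self.stamps[:version]:
            if stamp.prev_stamp != prev:
                return False
            prev = stamp.digest()
        return self.stamps[version - 1].sha256 == hashlib.sha256(document).hexdigest()
```

An author would call `submit` with the bytes of each new draft and could later use `proves` to show that a given file is exactly what was handed in as a certain version. The trust still rests on the institution that keeps the log and records the time; the hashing only makes later changes detectable.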

Why should time-stamping be important?

originality

A creative work usually starts with one or a couple of ideas. Even if to some people an idea looks like “no work”, having an idea is work. That is, before an idea comes up, a longer process has usually taken place, which may look to an outsider as if you were lazing around. But in fact the brain is often working heavily, and at least for myself I can say that I need more proteins and sugar when I am thinking intensively. Some people even “go pregnant with an idea” for years.

However, even if coming up with your idea was a painful process, it may not be new. Given that we have an increasing world population and thus an increasing number of creative brains (you may also want to read the article rejected by the journal vectors in 2005), and given that humans have a limited brain size, it is to be expected that coming up with an “original idea” in the sense of “new” will become less and less probable. Thus the competition for an original idea is getting fiercer. But in science and technology, for example, an original idea may be quite crucial.

Unfortunately, as a result, I think one can observe a tendency, especially in science, for ideas to be put down hastily, often without checking their originality. This is because if an idea is original, it would be crazy to delay staking a claim (which in science is usually a publication), since at any second someone else might come up with the same idea. (It is also often the case that if you talk about work in progress, other people might of course get the idea too!) On the other hand, if the idea was not original, this often goes undetected. Last but not least, the increasing number of publications may in part be due to this.

So I think the scientific community really has to think about what to do about this problem, what is meant by “original”, and how much this has to mean “new”. This is especially important for coming generations, since the process of developing an idea is of course important (even if it turns out that the idea is not new). It is a major drive in research and other creative work and shouldn’t be suppressed. Developing an idea is like working on a piece of art. If you are hindered from finishing, or if someone takes your brush and finishes for you, then this may be very unpleasant. Moreover, a good teacher can of course detect whether a pupil is creative and comes up with a lot of ideas, and whether those ideas are “simple to come up with” or not.

This is by the way also a problem in technology and patenting. But I’ll try to speak about that later.

In short, I think one can say that for scientific success nowadays it is important to publish a developed idea (“a result”) as fast as possible. Unfortunately this may lead to a certain negligence: a negligence in presentation, clarity, relation to other works etc. This negligence may even be dangerous if, for example, the societal consequences haven’t been thought through appropriately. (Side remark: I do think that potentially dangerous information should not be made easily accessible and that its distribution should in some sense be controllable. Dangerous information may be seen as a kind of weapon. In Germany the handing out of weapons is restricted and limited, and I think this is good. But I know that this is seen differently in different parts of the world. Moreover, it is important to inform the public about possible dangers and how to protect themselves; see also the randform post “On high teach speed”.)

A time-stamp may mitigate the above-mentioned problems. Let me explain this using the concrete suggestion of a pre-preprint section for the arXiv. In some sense a time-stamp from the arXiv would allow for claim-staking; in other words, it allows your more or less half-baked idea or result to be documented and the development of your ideas to be assessed. Putting more emphasis on the development, on the overall creativity and on the standard of presentation rather than on “newness” could give scientific research a new quality.

It would allow a fairer treatment of part-time scientists such as researchers with a heavy teaching load, freelance researchers outside academia etc. The feeling that someone else has more time, capability and resources available to quickly put down something that is “in the air” may be frustrating and demotivating (see again the randform post “On high teach speed”).

Even if the development of a work is documented and you are almost finished with it, it is surely still not very pleasant to discover that someone else was faster than you in publishing a result. But if it then turned out that your work is almost the same or maybe even better (e.g. in terms of presentation, breadth of result, usefulness to the public good etc.) and you could prove that it is your original work by presenting its long development history on the arXiv, then you would still be in a better situation than in the current “who-is-first hunt” situation. This holds especially true for young researchers, who have to prove their capability to do independent original research.

One might infer that “taking the speed” out of scientific development would impair progress. But I don’t think that on average it would. On the contrary, a better presentation of scientific results and better accessibility may be more nourishing than speed. Finally, presenting a result fast would still be more advantageous than letting it decay in the pre-preprint section.

Another advantage is that, for example, the development of students can be better evaluated. Last but not least, such a system would make life harder for ghostwriters.

