e-mail is great
e-mail is a great form of communication.
For starters, it is asynchronous, which means that one can choose to deal with it when desired and not be the victim of unwanted interruptions (unlike phone calls).
e-mails can be of any length, from just one word in the subject line and an empty body (unlike a letter), up to several pages of arguments and ideas. They can be accompanied by attached documents and richly formatted[1] (unlike SMS or some «social media» platforms, with their limits).
e-mails have some kind of permanence, since they can be stored by the sender or the receiver as they want (unlike most modern «social media»).
e-mail is decentralized, in the sense that anybody can have their own server and send and receive e-mails without going through a central server or authority (unlike most of the means of electronic communications we use today, be it SMS, WhatsApp, etc.). e-mail can be considered as the first service in the Fediverse.
e-mail also lets one personalize how one appears on the internet, since an e-mail address can show membership of a group or a company, just one's own name or even a chosen nickname.
Of course, most people have learned to hate e-mail with an attitude which has become rather hipster.
- We get lots of e-mail (but much less than what we get on Twitter, Instagram, WhatsApp, etc.)
- Lots of e-mail is spam (not unlike the ads one gets on all other media)
- Long e-mail threads with many people in CC are close to spam (not unlike Twitter threads)
All these issues can be solved by using a good e-mail client or MUA which allows us to filter and archive messages to suit our preferences (unlike the algorithms implemented by Twitter, Facebook and even Gmail, which choose what one can see). We can also practise good electronic hygiene by unsubscribing from useless newsletters, implementing «Inbox zero» (if that works for us), shutting off e-mail notifications and using different e-mail addresses for different purposes[2].
A monopolistic view of e-mail
Many have predicted, wanted and even worked towards the end of e-mail. You may remember Wave, a Google project presented as the perfect replacement for e-mail. It was discontinued after a very short life, then the Apache Foundation tried to resuscitate it, but it was finally abandoned.
Google was criticized for trying to implement this e-mail substitute as a way to capture all those electronic communications which were happening between people not using Google services. Indeed, despite the popularity of Gmail, both as an e-mail provider (with lots of storage for free) and as an e-mail client (with a very good anti-spam filter, and other useful features to deal with lots of e-mail), this service uses the standard e-mail protocols, so it is difficult to have a monopoly on it.
Why would Google want to have all the e-mail on the internet go through their servers? Because being able to parse and inspect all these messages makes it possible to extract lots of information which is useful to train AI models, which in turn can be used to understand what people think, want and need[3]. Then one can select which ads to show them and how to keep them «engaged» in their platform.
I am picking on Google because it was the first player in this game, but Microsoft is doing the same with the Office365 platform, where Outlook is the Gmail equivalent.
The network effect and the silo
Just providing a complete platform with an online office suite, e-mail (server, storage and client) does not seem enough to force all users to stay permanently on the platform. What should Google and Microsoft do to get people to do all their electronic communications on their platforms?
Until now, they have succeeded in capturing lots of users: nearly everybody has a Google account because of Android and YouTube; and an increasing number of people, at least in the enterprise and academic sectors, are Office365 users because of the «need» of collaborative distributed editing (think SharePoint and the move of Microsoft Office itself to the cloud).
If a large number of businesses and universities have moved to cloud services like G-suite and Office365, this means that many e-mail messages have a receiver on these platforms. What can these platforms do to get more users? The easiest thing is to make communications with users outside these platforms difficult. For instance, redirect to spam all e-mails coming from domains which are not linked to the platforms. Google will accept messages from Microsoft servers and vice versa, but will likely flag messages coming from other sources as spam.
Spam is your friend
Fewer and fewer people maintain their own e-mail servers. It is not a matter of lack of technical skills. Easy solutions like Mail-in-a-Box exist, so nearly anybody can install their own server. The problem is that these servers are often blacklisted and the messages they send are trashed by the receivers[4]. This makes many sysadmins choose e-mail services hosted by cloud providers, like Gandi, OVH, or Ionos in Europe.
Unfortunately, some of those, like OVH, have chosen to replace the technology they used, which was based on Free Software (postfix or sendmail, etc.), with Exchange. Are they losing their technical skills and preferring to use a commercial product with commercial support? Or are they choosing a solution which is less likely to be flagged as spam? After all, a Russian spammer will use free software instead of buying a Microsoft license.
Regardless, these providers will have a hard time competing with complete integrated solutions such as those from Google or Microsoft[5].
Break the standards
Now that everybody uses Google or Microsoft servers, there is still one thing that bothers these companies. As e-mail uses a set of open protocols (SMTP for sending and IMAP or POP for getting the messages from the server), it is still possible to access Google and Microsoft servers with clients running locally on the users' computers. In that case, the provider cannot display ads or monitor how the user interacts with e-mail. This is a loss of potential revenue.
Both Google and Microsoft have announced that they will be shutting down the standard user authentication for SMTP and IMAP. The only way to send and receive e-mail will be to implement an authentication which needs to register the client application with a secret token which may change periodically. That means that only applications for which Google and Microsoft will have given their blessing will be able to communicate with Gmail and Outlook accounts.
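To make the change more concrete, here is a minimal sketch of how a token-based login differs from a password-based one. It assumes the SASL XOAUTH2 mechanism, which both Gmail and Outlook accept for IMAP and SMTP. The address and the token are made up for illustration, and obtaining a real token (the step which requires a registered client application) is not shown.

```python
import base64


def build_xoauth2_string(user: str, access_token: str) -> str:
    """Return the base64-encoded SASL XOAUTH2 initial client response.

    The raw string is: user=<address> ^A auth=Bearer <token> ^A ^A,
    where ^A is the control character 0x01.
    """
    raw = f"user={user}\x01auth=Bearer {access_token}\x01\x01"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


# Hypothetical address and token, for illustration only.
encoded = build_xoauth2_string("someone@example.org", "not-a-real-token")

# Decode it again to show what the server actually receives.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded.replace("\x01", "^A"))
```

The point is that the client never sends the user's password: it sends a short-lived token, and only an application known to the provider can ask for one.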
Of course, getting a Free Software e-mail client registered with Google or Microsoft should be possible, but their terms of service forbid making public the registration token, which means that it can not be embedded in the free software. Thunderbird and KMail seem to have gotten an exception, but for how long?
The end of e-mail?
So that is a nifty theory about evil corporations working to destroy our dear internet. I am probably wrong and even a little paranoid here, but the fact is that it is increasingly difficult to self-host an e-mail server and use local e-mail clients.
If you have a different point of view or any ideas on how to preserve e-mail as a decentralized and open means of electronic communication, please get in touch. Contact information is available at the bottom of the page.
The sad thing is that most people don't care about these issues. Most people think that the internet is the web. Of those, the majority thinks that the internet is Google or Facebook.
Even most people who are into politics don't care or don't understand. Left-wing anti-capitalists are on Facebook and use Gmail and then they complain when they are censored. Conservative patriotic nationalists in Europe live in Microsoft environments to write about and discuss sovereignty. What a joke!
[1] Although there are drawbacks to that: HTML e-mail is usually twice the volume of a plain text one and it is often used for phishing attacks.
[2] Personal and professional, of course, but also having a specific address to give to any commercial entity pretending to need our e-mail contact.
[3] "I think of Google as a set of overlapping things. It's a consumer platform, consumer phenomenon of which search is its fundamental activity, but there are many other things you can do than search… I think of Google as an advertising company who services the broader advertising industry in the ways that you know." Eric Schmidt
[4] If Gmail has the best anti-spam filter, how come there are so many false positives?
[5] Again, free software solutions exist, like those based on Nextcloud and the associated ecosystem of applications, but most businesses and universities choose to go with Google and Microsoft instead of fostering in-house skills or supporting local companies which can provide maintenance for these solutions.
As every 10 years, IGARSS will take place in Hawaii in 2020. This time it won’t be in Honolulu as in 2000 and 2010, but in Waikoloa, in the “Big Island”.
I went to Honolulu for the 2 previous events, and it would be nice to go there again, visit another place and meet with colleagues and friends that I don’t see often out of this kind of gathering.
But the issue is that, without falling victim to solastalgia, I find it difficult to justify flying for about 50h for a conference. Like most of my colleagues, I have done it plenty of times. Thanks to IGARSS and since 1998, I have been to a lot of interesting places and met brilliant people from the remote sensing community. But I find it ironic that people observing our planet from space and measuring how climate and biodiversity are going astray wouldn’t change their behaviour and reduce their impact.
Every IGARSS has a particular theme. Here are the ones for the previous 6:
- Global-Environment Observation and Disaster Mitigation
- Observing, Understanding And Forecasting The Dynamics Of Our Planet
- International Cooperation for Global Awareness
- Advancing the understanding of our living planet
- Understanding the Earth for a safer world
- Energy and our changing planet
In 2020, the theme is Remote Sensing: Global Perspectives for Local Solutions.
One can see that the environment, our living planet, energy etc. are among the main concerns of the community which attends these events. This is why the choice of a place which for most of the attendees will need between 12 and 50 hours of travel by plane is questionable. Some may try to get there by other means, but Hawaii is at least a 6h flight (one way) for everybody.
Let’s do the math. If we assume greenhouse gas emissions of 1/4 tonne CO2 equivalent per hour of flying, this is between 3 and about 12 tonnes per person (knowing that, in order to stop climate change, 0.6 tonnes is the maximum amount of CO2 that can be generated by a single person in a year). Let’s assume an average of 7 tonnes. IGARSS 2019 in Yokohama had 2600 attendees. We can imagine that at least the same number of people would want to go to Hawaii, although one could argue that Hawaii may attract more people. The calculator says that 18200 tonnes of CO2 would be emitted just by flying to IGARSS, that is, the maximum amount that about 30,000 people can produce in a year if we want to stop climate change.
Of course, this back-of-the-envelope calculation may not be very accurate, but I think that the orders of magnitude are good.
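For those who want to check the numbers or plug in their own assumptions, the calculation can be sketched in a few lines of Python. All the constants are the assumptions stated above, not measured values, and the variable names are mine.

```python
# Assumptions taken from the text above.
TONNES_PER_FLIGHT_HOUR = 0.25   # CO2 equivalent per person and per hour of flying
YEARLY_BUDGET_TONNES = 0.6      # maximum per person and per year
ATTENDEES = 2600                # attendance at IGARSS 2019
AVERAGE_TONNES_PER_PERSON = 7   # assumed average over all attendees


def flight_emissions(hours: float) -> float:
    """Tonnes of CO2 equivalent emitted by one person flying for `hours`."""
    return hours * TONNES_PER_FLIGHT_HOUR


shortest_trip = flight_emissions(12)   # tonnes for a 12h trip
longest_trip = flight_emissions(50)    # tonnes for a 50h trip

total = ATTENDEES * AVERAGE_TONNES_PER_PERSON
people_equivalent = total / YEARLY_BUDGET_TONNES

print(f"per person: between {shortest_trip} and {longest_trip} tonnes")
print(f"total: {total} tonnes")
print(f"yearly budget of about {round(people_equivalent)} people")
```

Changing the average or the attendance moves the total, but not the order of magnitude.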
I can only speak for myself, but I don’t think that my contribution to Earth observation that could potentially be used to mitigate climate change and biodiversity degradation is worth the emissions.
Meeting the remote sensing community is useful to advance science and technology, but other ways can be used. The GRSS society has started a new initiative, as announced by its president:
[…] in 2020 we are starting three regional conferences held in locations far from the IGARSS flagship conference. The idea is to help communities that cannot come to IGARSS because of distance, but also because of economic issues or other barriers, and organise dedicated events.
Let’s hope that these events replace the trips to distant venues and do not come on top of them!
Please allow me to introduce a couple of ideas which should help improve the user experience on the GEE platform. I know that Google, a company of wealth and taste, has an impressive record of providing services with outstanding features. They have the best search engine, the best web mail application and the best web browser[1].
But these services and tools are targeted at non-expert users. With GEE, Google is addressing a completely different audience: scientists, or I should say Scientists. These are clever people with PhDs! Therefore, in order to keep them satisfied, Google will have to make an extra effort. One could think that scientists can easily be fooled because, for instance, they agree with giving away to private companies the results of research funded with taxpayer money[2]. Or because they accept being evaluated by how many times their tweets are liked[3]. Seeing scientists like this would be a mistake. They are very demanding users who only want to use the best tools[4].
But Google has the technology needed to attract these smarter-than-average users. Here are some ideas which could make GEE the best platform for producing impactful research using remote sensing data.
I think that it would be nice to introduce some literate programming facilities in the code editor. This could be similar to what can be done with Emacs org-mode's Babel or Knitr for the R programming language. This would make it possible to write scientific papers directly in the GEE editor and keep together notes, formulas, code and charts. Of course, exporting to Google Docs would also be very useful so that results can be integrated in slides or spreadsheets.
The possibility of citing bibliographic references should also be integrated in the editor. I suppose that a Google Scholar search function would not be difficult to add. Oh, yes, and Google Books also, by the way. Actually, using the same technology Google uses to insert advertisements in search results or in Gmail, it would be possible to automatically suggest references based on what the user is writing.
In these suggestions, papers produced using GEE could come first, since they are better. Papers written by people in the author's Google contacts list could also be promoted: good friends cite friends and the content of e-mails should help the algorithms determine if they are collaborators or competitors. But let's trust Google to find the algorithm which will make the best suggestions.
Many software development environments have code completion. In the case of GEE the technology[5] would be much more powerful since all the code written by all scientists could be used to make suggestions. The same technology could be used to suggest completions for the text of the papers. We all know how boring it is to write the same "introduction" and "materials and methods" sections again and again. Google algorithms could introduce some randomness and even compute a plagiarism score to help us make sure that we comply with the scientific literature standards. Of course, the "Conclusions" section could be automatically produced from the results using Google's AI technology.
It would also be nice to have some kind of warning if the user was designing an experiment or a processing chain that somebody else had already done. So some kind of message like "this has already been done" together with the link to the corresponding paper would be great. Also, automatic checking for patent infringement would be useful. Again, Google has all we need. In this case, the warning message could be "I can't let you do that Dave".
Massive peer review
The executable paper written using what has been described above could be made available through Google Plus as a pre-print. Actually, nobody would call that a "pre-print", but rather a paper in beta. All people in the author's circles could be able to comment on it and, most importantly, give a +1 as a warrant of scientific quality. This approach could quickly be replaced by a more reliable one. Using deep learning (of course, what else?) applied to the training data base freely generated by GEE early adopters, Google could propose an unbiased system for paper review which would be much faster than the traditional peer review approach. The h-index should be abandoned and replaced by the paper-rank metric.
Thanks to GEE, doing remote sensing based science will become much cheaper. Universities and research centres won't need to buy expensive computers anymore. Instead, just one Chromebook per person will be enough. Actually, not even offices will be needed, since WiFi is free at Starbucks. Lab meetings can be cheaply replaced by Google Hangouts.
However, scientists will still need some funding, since they can't live on alphabet soup and coffee is still not free at Starbucks. Google has a grant programme for scientists, but this is somewhat old school: real people have to review proposals and even worse, scientists have to spend time writing them.
Again, Google has the technology to help here: "AdSense is a free, simple way to earn money by placing ads on your website." Scientists who allow ads on their papers could make some revenue.
I know that in this post I have given away many ideas which could be used to get venture capital for a start-up which could make lots of money, but this would be really unfair, because all this would not be possible without:
- Google Earth Engine
- Google Chrome
- Google Docs
- Google Scholar
- Google Books
- Google Patents
- Google Plus
- Google Starbucks
- Google Hangouts
- Google's Youtube
Don't forget that the mission statement of GEE is "developing and sharing new digital mapping technology to save the world". And anyway, section 4.3 of GEE Terms of Service says[6]:
Customer Feedback. If Customer provides Google Feedback about the Services, then Google may use that information without obligation to Customer, and Customer hereby irrevocably assigns to Google all right, title, and interest in that Feedback.
[1] They used to have the best RSS reader, but they killed it: http://chromespot.com/2013/06/06/google-reader-shutting-down/
More than for any other post in this blog, the usual disclaimer applies here.
Let's face it: what Google has implemented with the Earth Engine is very appealing since it is the first solution for Earth Observation data exploitation which concentrates all the open access EO data, the computing resources and the processing algorithms. This is the Remote Sensing Scientist dream. Or is it?
Talks and posters at ESA Living Planet Symposium this week show that an increasing number of people are using GEE to do science. One of the reasons put forward is the possibility of sharing the scripts, so that other people can reproduce the results. This is, in my opinion, an incorrect statement. Let's have a look at a definition of reproducible research:
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. —D. Donoho
One important term here is complete. When you use GEE, or any other non-free software like Matlab, even if you share your scripts, the core of the algorithms you are using is just a black box which can't be inspected. Actually, the case of GEE is even worse than that of non-free software running locally. Google could change the implementation of the algorithms and your scripts would yield different results without you being able to identify why. Do you remember "Climategate"? One of the main conclusions was:
… the reports called on the scientists to avoid any such allegations in the future by taking steps to regain public confidence in their work, for example by opening up access to their supporting data, processing methods and software, and by promptly honouring freedom of information requests.
During one of my presentations at the Living Planet Symposium I decided to warn my fellow remote sensers about the issues with GEE and I put a slide with a provocative title. The room was packed with more than 200 people and somebody tweeted this:
So it seems I was able to get some attention, but a 2-minute slide summarised in a 140-character tweet is not the best medium to start this discussion.
As I said during my presentation, I fully understand why scientists are migrating towards GEE and I don't blame them. Actually, there is nobody to blame here. Not even Google. But, just as we should after many years of scientists using non-free software and publishing in non-open-access journals, we should take a step back and reflect together about how we want to do Earth Observation Science in a sustainable (what is the long-term future of GEE?) and really open way.
What I was suggesting in the last 3 bullet points in my slide (which don't appear in the tweeted picture[1]) is that we should ask ESA, the European Commission and our national agencies to join efforts to implement the infrastructure where:
- all data is available;
- and every scientist can log in and build and share libre software for doing science.
And this is much cheaper than launching a satellite.
This is not to criticise what the agencies are doing. ESA's Thematic Exploitation Platforms are a good start. CNES is developing PEPS and Theia which together are a very nice step forward. But I think that a joint effort driven by users' needs coming from the EO Science community would help. So let's speak up and proceed in a constructive way.
In a previous post, we saw when and why feature normalisation before training a supervised classifier may be needed. The main point of the post was that distance-based classifiers need to operate on features which have similar dynamic ranges.
One thing we didn't discuss is why things often work better when the normalisation is done towards the [0,1] or the [-1,1] intervals rather than, for instance, the [0,100] range.
If you have used the SVM classifier, you may have noticed that, even with a linear kernel, using standardisation yields faster learning times and improved classification accuracy. Why is this the case?
SVM training usually uses optimisers (solvers) which are complex machines. Therefore, there may be several reasons for this behaviour, but one of them is the representation of floating point numbers in the computer. This representation is defined by an IEEE standard. In a nutshell, this representation gives different precisions to different ranges of values: the closer numbers are to 0, the higher the precision with which they are represented.
In other words, the absolute error with which a value is represented in the computer grows with the magnitude of the value itself: the gap between two consecutive representable numbers doubles each time the value crosses a power of two. It is therefore easy to understand that this representation error will make things difficult for optimisers even in the case where the cost function is quadratic, smooth and well conditioned.
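This is easy to check. The following sketch prints the gap between a value and the next representable number for a few magnitudes. It assumes double precision, which is what Python floats use, and needs Python 3.9 or later for math.ulp; the helper name is mine.

```python
import math


def spacing(value: float) -> float:
    """Gap between `value` and the next representable double above it."""
    return math.ulp(value)


# The larger the magnitude, the coarser the grid of representable numbers.
for value in (0.5, 1.0, 100.0, 1.0e6):
    print(f"{value:>10}: gap to next float = {spacing(value):.3e}")
```

With single precision, which many image processing chains use, the gaps are much larger for the same magnitudes.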
Therefore, in order to make sure that the optimisation procedure benefits from accurate computations, rescaling feature values close to zero is useful.
Classification algorithms which do not optimise a cost function can also benefit from this rescaling. KNN classifiers, Self Organising Maps and many other algorithms using distances to select and sort will produce more accurate results.
On the other hand, if you use a tree-based classifier with a Gini purity index (like the Random Forest canonical implementation), rescaling is not needed, since the purity is computed over fractions which are already small numbers.
However, bear in mind that some tree-based classifiers use entropy (information gain) or other purity measures, like variance reduction, which involve computations that may benefit from the increased precision obtained by rescaling.
As a rule of thumb, in case of doubt, rescaling data will do no harm.
If you use the ORFEO Toolbox (and why wouldn't you?), the TrainImagesClassifier and the ImageClassifier applications have the option to provide a statistics file with the mean and standard deviation of the features so that the samples can be standardised. This statistics file can be produced from your feature image by the ComputeImagesStatistics application. You can therefore easily compare the results with and without rescaling and decide what works best in your case.
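If you want to experiment outside of the toolbox, the two rescalings discussed in this post can be sketched in plain Python. This is only an illustration on a single made-up feature. It is not how the ORFEO Toolbox implements it, and the function names are mine.

```python
from statistics import mean, pstdev


def standardise(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information: map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rescale_to_unit_interval(values):
    """Map the values linearly onto the [0,1] interval."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# A made-up feature living in the [0,100] range.
feature = [0.0, 25.0, 50.0, 75.0, 100.0]

standardised = standardise(feature)
unit = rescale_to_unit_interval(feature)

print(standardised)
print(unit)
```

In a real classification chain, the mean and standard deviation have to be computed once on the training data and then reused when classifying, which is the role of the statistics file mentioned above.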