How can an independent researcher or small company make it in a world dominated by FAANG corporations? [TOP TECH TRENDS - Panel Discussion]

Karolina Zadroga

Anotator

December 15, 2021

We invite you to read the transcript of the panel discussion that took place during the Top Tech Trends 2021 conference. The discussion featured Lukasz Kobylinski, Danijel Korzinek, Patryk Pilarski, Norbert Ryciak and Ryszard Tuora.

The first part of the panel concerned the situation of people working alone, in startups, or in larger IT companies that are still not as huge as Facebook, Apple and Amazon. The speakers discussed, among other things, how to cope when competing with big players that have almost unlimited resources, how to find your place among them, and what you can achieve.

Lukasz Kobylinski: Perhaps some of you work independently or in a startup, or intend to work for a smaller or larger company. Would it make more sense, then, to look for a job at one of the major companies? I once heard a saying that you have to go where the data is, so maybe you should drop everything and look for a job at one of the big five: Facebook, Amazon, Apple, Netflix and Google? What do you think about this? Is their advantage really so huge that the smallest players are in a losing position from the start?

Danijel Korzinek: Let me start with an anecdote. I used to apply for funds to start various projects. I noticed that when we apply for money, appear in front of a committee and explain what we want to do and what we want to design, practically the same question is always asked: why do you even want to do this, when Google has already done it? This applied to speech recognition, in which Google is a leader. Defending against this type of argument can be stressful and frustrating, but it is doable, and my experience ended on a positive note. I'd be interested to hear what opinions others have on this.

Norbert Ryciak: How, then, do you defend yourself against this type of objection?

Danijel Korzinek: When we approach solving a problem consistently, we are better than Google. I know that a lot of people have made comparisons with Google in the context of machine translation, though I don't know whether there are other areas where Google is as vocal about its results and where those results are equally easy to test, because testing your own collection using their service is very easy. Every time we have a well-defined domain, we are able to achieve a result twice as good as Google, which works very well on general applications. It specializes in things that are profitable for it, among others internet search and speech recognition, but when it comes to specific applications of speech recognition, we are always able to achieve better results, because Google has no interest in dealing with every domain in the world and every specialized language within it, such as medical language. We are able to defend the argument that it is not always worthwhile to use things produced somewhere overseas, and that it is worth considering whether it can be done better here.

Norbert Ryciak: You said that Google has generic solutions that work well in many cases but are not tailored to a specific problem and domain. Relating this to natural language processing: the language itself can be such a niche. Poland, from Google's point of view, is not a particularly big country, and for Polish you can build better language processing algorithms than the generic ones that Google creates on a global scale. Google isn't interested in each country individually; it works en masse. English is the leading language and provides the most data, so for English Google will probably be better, but the solutions for Polish leave a lot of room for improvement, and if we add specialized fields, such as medicine, then in all likelihood our solutions will be better.

Patryk Pilarski: The truth is that big companies don't deal with absolutely everything, so it depends on what we deal with. This leaves us with a choice: do we compete with the giants, or do we fight in our own backyard? If we don't try to become another big online platform, the chances of finding our niche are pretty good.

Danijel Korzinek: It's also worth pointing out that what Google or Facebook is doing, however beautiful it looks in blog posts, is not magic from the point of view of science and of knowing how to solve these problems. Yes, they are able to bring very smart people on board, but the knowledge put into production there is the same as everywhere else. And just because Google is able to do something doesn't mean that someone else can't do the same thing, or do it better.

Lukasz Kobylinski: Some differences are visible to the naked eye, though. For example, the biggest companies have access to billions of images, often labeled by various users because they come with some description, or scraped from the Internet, which Google does anyway to create its index. They have access to billions of texts that are marked with thumbs up or down. In short, they have access to data that no one else has, at least not on this scale. Thanks to Android, they can learn from examples of misrecognized commands issued to the phone, so this data set keeps growing. Every other company has to spend very large sums to get similar data. This poses a potential problem in such a competition.

Patryk Pilarski: But every company and every industry has these characteristics. If Google suddenly wanted to enter the telecommunications market, it wouldn't have the data that T-Mobile or other such corporations have either.

Ryszard Tuora: Of course, Google has privileged access to data, but there are initiatives to create public data, for example top-down ones in state institutions and public oversight institutions. In institutions at the European level, care is being taken to make this data open and accessible. Corporations benefit from this too, for example when it comes to Google's machine translation capabilities. They benefit a lot from the fact that there is a European Parliament, where translations are produced non-stop by high-quality translators, which provides such data. And this is where the focus shifts from unequal access to data to unequal access to computing power. That is a problem that is much harder to overcome.

Danijel Korzinek: A very frequently cited example is GPT. When you type "GPT price" into Google, the amount quoted most often is several million dollars, and that leaves aside the fact that you had to have the data and train the model. On top of that, there are situations where a company builds such a model but states that it will not release it for ethical reasons. At this point things get very asymmetrical: some people have access to certain resources, others not necessarily. So the question is: how important is GPT for success in NLP? Is it possible to live without it?

Lukasz Kobylinski: It is probably possible to live without GPT, but the fact is that in many competitions the winning solutions are those based on the largest models, which take a long time to train. All this requires computing power. Access to such power certainly makes a real difference.

Patryk Pilarski: The question is whether competitions reflect reality, and whether we always need the very best model to deliver value to the customer, which is what business comes down to.

Norbert Ryciak: Exactly. If we have a model whose performance is within one per mille of the others but which is much easier to maintain, then we don't need to compete with the most powerful algorithms, because the model has done its job anyway. Personally, I've never felt the pull of big data, even though I've been in the data science and machine learning business for many years. It seems to me that talking about big data borders a little on engineering, because handling it all becomes a heavily technical side of data science and machine learning. I think not everyone may be interested in that, because you can also do very cool and useful things with small data.

Danijel Korzinek: All the same, we have talked a lot about data and about resources. I heard that infrastructure such as PL-Grid is supposedly still 50% unused. We really do have a lot of computing infrastructure in Poland. It seems to me that what we lack is not computing resources or data, but people who would be able to do something with them, such as scientists or commercial researchers. That seems to be the main limitation for us, because there will always be new fields and problems, and there will always be someone who will fund a study for us if we search a little. But we would still like to see more people working in these fields, so that we can discover new things.

Lukasz Kobylinski: I agree, there are still staff shortages. I'm not familiar with the PL-Grid topic; I can only say that as a scientific institution we have always had a problem getting access to computing infrastructure, though maybe that was due to our lack of knowledge of how to do it formally, or to too much formalization of the process. It's well known that if you have a credit card, you can click your way to resources at Google in five minutes, whereas this infrastructure is perhaps not advertised or made easy enough for researchers to actually access it.

The discussion took place during the first edition of the Top Tech Trends conference held under the auspices of the Sages Masterclass courses. The speakers are also authors of the Masterclass Natural Language Processing and Masterclass Machine Learning courses.