24. 04. 2022
Minimizing wear and tear and damage to large industrial machines, handling security problems in shopping malls, driving cars autonomously, responding immediately to medical incidents on the street... Computer vision can do all of this in real time, and it can even predict incidents. That saves companies a lot of money. "In addition, computer vision often reveals flaws that companies had no idea about," agree Nikola Pleska, digital transformer at Microsoft, and Jiří Čermák, AI project manager at Blindspot Solutions.
Ivana Karhanová: Computer vision: a technology that helps us monitor safety, pandemic compliance, and production quality, or even drive cars. I will talk about it with Jiří Čermák, AI project manager from Blindspot Solutions, and Nikola Pleska, a digital transformer from Microsoft. Hello.
Nikola Pleska: Good morning.
Jiří Čermák: Hello.
Ivana Karhanová: Computer vision. The name sounds high-tech, yet this technology has been with us for quite some time. Let's give some examples of how computer vision is used, Nikola.
Nikola Pleska: The applications can be quite simple. Long-time users probably remember document reading, which we called OCR (optical character recognition). That was one of the first computer vision applications: we started reading letters from scanned documents, then putting them together in machine-readable form and processing their content. Over time, other tasks were added on top of that as computing power grew, because reading and viewing reality through the lens of a computer requires considerable computing power. Jirka can probably speak very well about how we gradually gained that power and could read more things and understand reality better. That means we can see what is in a photograph and what objects are there. We can also talk about video analytics and the fact that, with well-placed sensors, we can, for example, drive a car, because we can see what is happening around it.
Ivana Karhanová: Jirka, driving cars is perhaps the advanced end of computer vision. What are the most common use cases these days? What are companies using?
Jiří Čermák: There's an awful lot of information in video in general, and every mall has dozens of its own cameras. Even production lines have dozens of cameras constantly scanning the space. Today, video is used very reactively: something happens, for example a robbery, and only then does someone look at the footage to document it.
Ivana Karhanová: That means we are looking back into the past.
Jiří Čermák: Exactly. Computer vision, however, allows us to capture and analyze footage almost continuously and react to events immediately or even predict them. And as Nikola was saying, there has been a massive evolution both in the quality of the models used to detect objects and situations in video and in the hardware. On the model side, the progress comes from deep learning; deep neural networks are the most common choice for video analysis. Hand in hand with that goes the development of specialized hardware, graphics cards and dedicated chips, which makes real-time video analysis very cheap, on a small computer or a small dedicated chip right at the camera. I can then pass the information contained in the video to people who can make decisions based on it. That could be a machine maintenance issue, a security problem in a mall, a medical event somewhere on the street, and so on.
Ivana Karhanová: Let's go through the use cases in detail so that listeners can get an idea of what the technology can do. For example, you deploy face mask detection in stores, shopping malls, and public spaces. How does this work?
Jiří Čermák: Typically at the entrances to enclosed areas where COVID restrictions are to be enforced, we automatically detect the human figure and the place where the face should be, and we analyze whether the critical points of the face are covered, which means whether the nose and mouth are covered by something with a good protection factor. So we can detect whether the person is covered, whether they are covered well, and whether they are using a suitable device. This is all done in real time. When we detect a person who is not covering their face sufficiently, we can alert security with a notification, and security can then approach the person and correct the situation.
Nikola Pleska: The applicability is broad, because the models are relatively easy to get hold of. We can detect anything from face masks to the distance between people. There are trained models for that which anyone can download today. So it's more or less democratizing access to something that used to be available only to big companies and cost enormous sums to develop. Today, you can download a trained machine learning model for recognizing people or estimating the distance between them and use it directly in your own applications.
Ivana Karhanová: You mentioned spacing between people. In security, however, computer vision is also used to detect breaches of restricted zones.
Jiří Čermák: Yes, and also, for example, to verify that the person who entered the zone has permission to do so, using a face scan, an iris scan, and so on. So it's not only about a notification that someone has entered but also about verifying whether it was a legitimate entry.
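The alerting step Čermák describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Blindspot Solutions' implementation: it assumes a hypothetical upstream detector that already reports, for each tracked person, whether the nose and mouth are covered and whether the covering looks like a suitable device. The `FaceObservation` class and its fields are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class FaceObservation:
    """Hypothetical per-person output of a face/keypoint detector for one frame."""
    track_id: int        # ID that follows the same person across frames
    nose_covered: bool   # nose keypoint hidden by a covering
    mouth_covered: bool  # mouth keypoint hidden by a covering
    covering_ok: bool    # covering classified as a suitable protective device


def is_compliant(obs: FaceObservation) -> bool:
    """Compliant only if both nose and mouth are covered by a suitable device."""
    return obs.nose_covered and obs.mouth_covered and obs.covering_ok


def new_alerts(observations: Iterable[FaceObservation], alerted: Set[int]) -> List[int]:
    """Return track IDs that should trigger a security notification now.

    `alerted` is updated in place so each person is reported once,
    not once per video frame.
    """
    result = []
    for obs in observations:
        if not is_compliant(obs) and obs.track_id not in alerted:
            alerted.add(obs.track_id)
            result.append(obs.track_id)
    return result


if __name__ == "__main__":
    alerted: Set[int] = set()
    frame = [
        FaceObservation(1, True, True, True),   # properly covered
        FaceObservation(2, False, True, True),  # nose uncovered
    ]
    print(new_alerts(frame, alerted))  # [2]
    print(new_alerts(frame, alerted))  # [] (person 2 was already reported)
```

Keeping track IDs in a set is what stops security from receiving a fresh notification for every frame in which the same person appears.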
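Estimating the distance between people usually needs one step beyond person detection: mapping image pixels to positions on the floor. The sketch below assumes the camera has been calibrated once with a 3x3 homography `H` (a common technique, not one named in the interview) and uses a hypothetical 1.5 m threshold; the function names are illustrative only.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_floor(H: Sequence[Sequence[float]], u: float, v: float) -> Point:
    """Map an image pixel (u, v) to floor coordinates in metres with a 3x3 homography H.

    H is assumed to come from a one-off calibration using four or more
    known floor points; the interview does not describe this step.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)


def too_close(positions: List[Point], min_distance_m: float = 1.5) -> List[Tuple[int, int]]:
    """Return index pairs of people standing closer than min_distance_m."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(positions), 2)
        if math.dist(p, q) < min_distance_m
    ]


if __name__ == "__main__":
    # Toy homography: 100 pixels = 1 metre, no perspective distortion.
    H = [[0.01, 0, 0], [0, 0.01, 0], [0, 0, 1]]
    feet_pixels = [(0, 0), (100, 0), (500, 500)]  # bottom-centre of each person box
    floor = [to_floor(H, u, v) for u, v in feet_pixels]
    print(too_close(floor))  # [(0, 1)]
```

Measuring on the floor plane rather than in pixels matters because people farther from the camera look closer together in the image than they really are.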
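Zone monitoring combined with identity verification can be sketched as two checks: is the detected person inside the restricted polygon, and if so, did a separate face or iris check confirm an authorized identity? This is an illustrative sketch only. The point-in-polygon test uses standard ray casting, and `verified_id` stands in for the output of a hypothetical verification step that is not modeled here.

```python
from typing import Iterable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: the point is inside if a ray to its right crosses the edges an odd number of times."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_entry(position: Point, zone: Sequence[Point],
                   verified_id: Optional[str], authorized: Iterable[str]) -> str:
    """Combine zone detection with identity verification.

    verified_id is the (hypothetical) result of a face or iris check,
    or None when nobody could be identified.
    """
    if not point_in_polygon(position[0], position[1], zone):
        return "outside"
    if verified_id is not None and verified_id in set(authorized):
        return "legitimate entry"
    return "alert"


if __name__ == "__main__":
    zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(classify_entry((5, 5), zone, "emp-17", {"emp-17"}))  # legitimate entry
    print(classify_entry((5, 5), zone, None, {"emp-17"}))      # alert
```

Only the "alert" outcome needs to reach security, which matches the point that detecting an entry is not the same as detecting an illegitimate one.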
Ivana Karhanová: So far, we've been talking about analyzing human behavior, whether people are covered, whether they're violating the protection zones. But computer vision is also used in manufacturing. How does it work there?
Nikola Pleska: It can be used in different places. For example, I can have a special camera that follows the manufacturing of a product while I look for anomalies in the product. A typical example would be tile production: if I'm tracking it with a camera at the right locations, I can see, perhaps much better than the human eye, that there is a micro-crack that could lead to a customer complaint. And I can stop production at the right time, before there are far-reaching problems. Alternatively, vision can be used in production from a safety perspective, meaning that nobody who lacks the proper credentials to operate the line can approach it. A good illustration is a video we made for the Microsoft Build conference back in 2019, which nicely demonstrated how we detect people around a robotic picker on the production floor. The picker is approached by a guy named Jirka who doesn't have the proper credentials, and the picker simply won't run. Its power is cut when a person reaches it, so there is no safety problem. So it's a combination of all of these: I combine vision of the production itself, or of its surroundings, to prevent any problems that might arise in production or processes.
Ivana Karhanová: Jirka, you mentioned damage to machinery. How does that work?
Jiří Čermák: It's part of predictive maintenance. Typically, on hard-to-reach parts of huge machines where there is frequent wear and tear due to load, we can continuously monitor how that wear develops. Instead of the machine wearing out to the point where it breaks down and stops all production, with the huge financial loss that comes with it, we can warn in time that damage is occurring and resolve it during a scheduled downtime for that machine. This means that the maintenance engineer, who has the expertise, receives warnings from the system that specific parts of the machine are wearing. He can do the final inspection himself, schedule maintenance for the next shutdown window, and eliminate the downtime that would otherwise occur.
Nikola Pleska: Or I can reduce the machine's power simply to maintain continuity of operation, even at the risk of a fault. With a smaller load, the machine will last a little longer, perhaps until the next planned downtime. So again it's a combination of a human decision and an automatic decision, and the automatic one is probably where we're heading: letting machines that can see decide what should happen, how, and why. That's our vision for eliminating typical human error, because humans make pretty bad decisions in these situations, often based on what we call common sense. Most of the time, common sense is wrong here, and that's what we want to avoid.
Ivana Karhanová: What do I need to do in practice to be able to detect, let's say, micro-cracks on tiles or wear and tear on machines, and then, of course, evaluate them?
Jiří Čermák: Of course, if I want to capture a phenomenon, I need an image detailed enough to recognize it.
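The maintenance decision can be illustrated with a simple trend extrapolation: fit a line to repeated wear measurements, estimate when a wear limit will be reached, and compare that with the next planned downtime. This is a minimal sketch, assuming the vision system reports a hypothetical wear value (for example in millimetres) against operating hours; the linear model is an illustrative choice, and real systems may use something more elaborate.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * t; returns (a, b).

    Needs at least two distinct time points.
    """
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    b = sxy / sxx
    return mean_y - b * mean_t, b


def hours_until_limit(ts, wear, limit) -> Optional[float]:
    """Extrapolate the wear trend from the last measurement; None if wear is not increasing."""
    a, b = fit_line(ts, wear)
    if b <= 0:
        return None
    return (limit - a) / b - ts[-1]


def maintenance_advice(ts, wear, limit, hours_to_next_downtime) -> str:
    """Decide whether the part can wait for the next planned shutdown window."""
    remaining = hours_until_limit(ts, wear, limit)
    if remaining is None or remaining > hours_to_next_downtime:
        return "schedule at next planned downtime"
    return "warn: limit likely reached before next downtime"


if __name__ == "__main__":
    hours = [0, 100, 200, 300]
    wear_mm = [0.0, 0.1, 0.2, 0.3]
    print(maintenance_advice(hours, wear_mm, limit=1.0, hours_to_next_downtime=500))
    print(maintenance_advice(hours, wear_mm, limit=1.0, hours_to_next_downtime=800))
```

The "warn" branch is where a maintenance engineer would inspect the part or, as Pleska suggests, reduce the machine's load so that it lasts until the planned downtime.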
Ivana Karhanová: By phenomenon, you mean the crack.
Jiří Čermák: For example. A good rule of thumb is that if the crack is so small that a human can't see it in the image, it will be very hard for the algorithm to detect the defect. So I need a good enough image of a product that might be cracked, and enough examples of what different cracks can look like. During a project, we typically start with an analysis to validate that something like this can be detected. The easiest way is to collect many examples of what the cracks might look like and do a feasibility study first, where we show the customer the detection accuracy that can be achieved and what they can expect from the system.
Ivana Karhanová: Maybe we started with quite a complicated example. When we talk about monitoring face masks or spacing, for example, we can get by with the ordinary cameras already deployed on the premises.
Jiří Čermák: Yes, these days practically every shopping center and shop has ordinary security cameras installed, and the camera doesn't need any special features. A standard camera is enough. What's needed is the infrastructure: the network the camera is connected to and the machine that processes the video. And that's it.
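Two quick calculations capture the spirit of the feasibility study: how many pixels a defect spans at a given camera resolution (the "human eye" rule of thumb), and what precision and recall a detector achieves on labeled example images. A minimal sketch; the field of view, resolution, and labels below are hypothetical numbers, and the minimum number of pixels a defect needs to span would have to be set case by case.

```python
from typing import Dict, Sequence


def defect_size_px(defect_mm: float, field_of_view_mm: float, image_width_px: int) -> float:
    """How many pixels a defect of the given width spans in the image."""
    return defect_mm * image_width_px / field_of_view_mm


def detection_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Precision and recall of a defect detector on labelled example images."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # A 0.1 mm crack on a 500 mm wide tile imaged at 4000 px spans under one pixel,
    # so by the rule of thumb it is unlikely to be detectable without better optics.
    print(defect_size_px(0.1, 500, 4000))  # 0.8
    print(detection_metrics([True, True, False, False, True],
                            [True, False, False, True, True]))
```

Precision tells the customer how many alarms will be false; recall tells them how many real cracks will slip through. Both are what "achievable accuracy" usually means in practice.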
Ivana Karhanová: So the actual computations or algorithms are done in the cloud because that allows for ever-increasing performance?
Nikola Pleska: I think the cloud has given us the power: we started to test these things there, train models, and recognize the world around us. But it's inconceivable that, for example, a self-driving Tesla would send everything to the cloud, because then I'd be relying on infrastructure somewhere that could interrupt my data flow and cause a crash. Today, we're moving toward running models with edge computing. I train the model in the cloud, but then, through containerization, I bring the intelligence to the end device, where the machine vision actually runs. The end device understands the scene and maybe just sends the resulting data back to the cloud, where it's used for other purposes. So there's a huge trend toward doing these things more on-premises while using the cloud for training and experimentation. Otherwise I might waste resources developing a specialized box for something that wouldn't find a use.
Ivana Karhanová: So edge computing can be thought of as an octopus that reaches down to the individual devices, and the evaluation itself takes place on those devices.
Nikola Pleska: Quite simply, think of it as a smarter camera that can record an image and process it directly in the camera. And the machine learning model in it has to be updatable to my needs over time; that's where I need the ability to bring that intelligence from the cloud. So it's a programmable camera, you could put it that way.
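The "programmable camera" idea can be illustrated with a toy class: inference runs locally, only compact event records go upstream, and the model can be swapped when a retrained version arrives from the cloud. This is a conceptual sketch only; `EdgeCamera`, the `outbox` list standing in for the uplink, and the model signature are all hypothetical and do not correspond to any particular Microsoft product or API.

```python
import json
from typing import Callable, List

Model = Callable[[bytes], List[str]]  # hypothetical: frame bytes -> detected labels


class EdgeCamera:
    """Toy 'programmable camera': inference runs on the device, only events go upstream."""

    def __init__(self, model: Model, model_version: str) -> None:
        self.model = model
        self.model_version = model_version
        self.outbox: List[str] = []  # stands in for the uplink to the cloud

    def update_model(self, model: Model, model_version: str) -> None:
        """Swap in a model that was retrained in the cloud and shipped to the device."""
        self.model = model
        self.model_version = model_version

    def process_frame(self, frame_id: int, frame: bytes) -> None:
        labels = self.model(frame)  # local inference, no round trip to the cloud
        if labels:                  # raw frames never leave the device
            self.outbox.append(json.dumps(
                {"frame": frame_id, "model": self.model_version, "labels": labels}))


if __name__ == "__main__":
    cam = EdgeCamera(lambda f: ["person"] if b"P" in f else [], "v1")
    for i, frame in enumerate([b"..P..", b".....", b"P"]):
        cam.process_frame(i, frame)
    print(cam.outbox)
```

The design choice illustrated is the one Pleska describes: if the network goes down, the device still decides locally, and the cloud only receives the results it needs for other purposes.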
Ivana Karhanová: You mentioned the democratization of technology as well. To put it simply, just as we can download apps from the App Store or Google Play, companies can do the same thing in an enterprise environment.
Nikola Pleska: That's right. As a supplier of the underlying technology, we at Microsoft have a few basic models that you can start using right away. Today, I can imagine simply reading a paper document with Power Apps and then running an approval process tied to the data I got from that document. Or we have models for facial demographics recognition, face mask detection, or distance estimation. That's how we develop some of these things. If a model doesn't exist because the problem is specific, partners like Blindspot Solutions come in and say: we just don't have a model for detecting anomalies on tiles, so we have to roll up our sleeves and build it separately. But overall, access to these technologies has become incredibly cheap. Today, the technology can be used by companies that would never have had the resources for it before. They buy it for a small sum, or perhaps pay for the expertise of data companies to get it moving. But it's a relatively cheap affair.
Ivana Karhanová: Jirka, when we talk about bespoke development, how much time and money do companies need to spend on it compared with just using an off-the-shelf solution?
Jiří Čermák: Even among ready-made solutions, the price range is quite wide. At one end are finished products, for example simple footfall statistics at shopping center entrances. These are ready-made cameras that you install and connect, and they calculate statistics on entries, exits, and so on. These solutions cost hundreds of euros per camera. At the other extreme, of course, are completely bespoke solutions for problems so specific that nobody has addressed them yet. Then you have to collect the data, develop the model, and so on, and the price can go into the hundreds of thousands of euros. Typically, we are somewhere in the middle, because you can't put one flat price tag on every task. Each task has its own complexity, and that complexity defines the price: how much data needs to be collected, how difficult the model is to train, and how accurate it can be. So if a company wants to automate or streamline a process, I would recommend reaching out at the analysis stage to one of the companies already working in this field, which can provide expertise and maybe help specify the idea. That way we don't end up with something extremely complex when there may be a simpler, more affordable, and more efficient approach. Such a firm has experience with this and can help.
Ivana Karhanová: Nikola, what is the biggest added value of computer vision from the perspective of, say, the company's customers: better quality or better reliability?
Nikola Pleska: I would say it's probably business continuity. In different industries, there can be different disruptions in processes, and I'm trying to find a working model that eliminates as many of those disruptions as possible so that I can operate relatively free of them. In retail, on the other hand, the added value can be a better customer experience. In short, I'm going to build a customer journey that feels natural to the customer, and I'm going to use the technology to do that. That means there's no gatekeeper coming up to me to say, "Wear a mask." It's handled in a way that fits the twenty-first century: I'm not bothering the customer with government regulations, which are exactly the part the customer finds a nuisance. We would likely find plenty of other cases where I'm building a unique customer experience, for example tracking shelf replenishment, because empty shelves are a common shortcoming. Again, I can use vision to find out whether a product is displayed or not, and whether the price tag matches the product displayed there. Again, I'm just putting two inputs together.
Ivana Karhanová: Does that mean, let's say, that the chain sells more, and the customer doesn't come to an empty shelf looking for an item that's supposed to be there but isn't?
Nikola Pleska: Or maybe customers don't buy expired goods. When I'm scanning the shelves with the camera, I check whether the label says the product is about to expire. If it does, I'd better pull it off the shelf, because if I accidentally leave it there, I'm setting myself up for potential trouble. If I deal with it automatically this way, most of the time the trouble doesn't arise.
Ivana Karhanová: We've been talking about event detection, where a human makes the decision afterward. So what's next? With full automation of processes, will we eliminate human decisions altogether?
Jiří Čermák: We need to remember that we are still talking about machine learning, and the unwritten rule is that there will always be corner cases that are extremely difficult to solve. Look at self-driving cars: they've been talked about for years, and still no one has gotten them into full operation. With the current state of knowledge, there will always be situations where the algorithm can't make a perfect decision. So for me, the answer is that there should always be a human factor, some expert who can decide these situations. The way to go is for the machine to solve more than 90, maybe 99 percent of the situations, the clear-cut ones, with the expert's knowledge reserved for the really difficult or critical ones. With their hands freed, the expert can handle perhaps a hundred times as many of the events where they are really needed.
Nikola Pleska: Well, we're probably heading toward machines being autonomous in highly repetitive activities. Where something is repetitive, I can teach the machine very well to recognize an anomaly or to repeat the process, but where I need expertise, I can't quite expect the machine to make that decision. And then, of course, there's the ethical use of machine learning, not to mention artificial intelligence. This brings another dimension, because companies should probably be a little more responsible about it, given the various machine learning models for facial recognition. There's a very fine line beyond which you can abuse it: you can start tracking people in an uncomfortable way and, for example, reconstruct their movements back and forth until you've pieced the whole story together. So it's very important that companies that provide this technology do so responsibly.
Ivana Karhanová: When is the most common aha moment when company executives decide to ask for a similar technology and try it out? When does that shift happen for them?
Nikola Pleska: I hope it's when they listen to this podcast.
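The shelf checks mentioned in this exchange (empty slots, price-tag mismatches, goods about to expire) reduce to simple rules once vision and OCR have produced per-slot observations. A minimal sketch, assuming a hypothetical `ShelfSlot` record produced by such a pipeline and a one-day warning window chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class ShelfSlot:
    """Hypothetical result of analysing one shelf position in a camera image."""
    slot_id: str
    product_seen: Optional[str]  # product recognised on the shelf, None if the slot is empty
    tag_product: str             # product named on the price tag (read with OCR)
    expiry: Optional[date]       # expiry date read from the label, if legible


def shelf_tasks(slots: List[ShelfSlot], today: date, warn_days: int = 1) -> List[Tuple[str, str]]:
    """Turn shelf observations into a list of (slot, task) pairs for store staff."""
    tasks = []
    for s in slots:
        if s.product_seen is None:
            tasks.append((s.slot_id, "restock"))
            continue
        if s.product_seen != s.tag_product:
            tasks.append((s.slot_id, "fix price tag"))
        if s.expiry is not None and s.expiry <= today + timedelta(days=warn_days):
            tasks.append((s.slot_id, "remove: about to expire"))
    return tasks


if __name__ == "__main__":
    slots = [
        ShelfSlot("A1", None, "milk", None),
        ShelfSlot("A2", "milk", "yogurt", None),
        ShelfSlot("A3", "bread", "bread", date(2022, 4, 25)),
    ]
    for task in shelf_tasks(slots, today=date(2022, 4, 24)):
        print(task)
```

This is the "two inputs put together" idea in code: what the camera sees on the shelf is compared with what the tag or label says should be there.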
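The split Čermák describes, where the machine resolves clear-cut cases and an expert handles the rest, is often implemented as a confidence threshold. A minimal sketch, assuming each detection comes with a model confidence score; the 0.95 threshold is a hypothetical value that would in practice be tuned on validation data until the automated share and the error rate are both acceptable.

```python
from typing import List, Sequence, Tuple


def route(detections: Sequence[Tuple[int, float]], auto_threshold: float = 0.95
          ) -> Tuple[List[int], List[int]]:
    """Split (id, confidence) detections into automatic decisions and expert review."""
    automatic, to_expert = [], []
    for det_id, confidence in detections:
        if confidence >= auto_threshold:
            automatic.append(det_id)
        else:
            to_expert.append(det_id)
    return automatic, to_expert


def automation_share(detections: Sequence[Tuple[int, float]], auto_threshold: float = 0.95) -> float:
    """Fraction of events resolved without a human (the interview's target is 90 to 99 percent)."""
    if not detections:
        return 0.0
    automatic, _ = route(detections, auto_threshold)
    return len(automatic) / len(detections)


if __name__ == "__main__":
    dets = [(1, 0.99), (2, 0.50), (3, 0.97), (4, 0.96)]
    print(route(dets))             # ([1, 3, 4], [2])
    print(automation_share(dets))  # 0.75
```

Raising the threshold sends more cases to the expert but reduces automated mistakes; the right setting depends on how costly each kind of error is.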
Ivana Karhanová: Right, and in practice? What have you seen so far?
Jiří Čermák: Typically, it's long-term observation of some inefficiency. People know it's there; for example, they can't find another expert they would need to increase production. They've heard about a competitor using vision, or they've read an article about how it can streamline processes. That's typically when they start researching the market, what's possible and what's not, and learning about the available products.
Ivana Karhanová: When they make real inquiries, aren't they intimidated when you show them what it all entails in practice?
Jiří Čermák: Some people get scared, some don't, but those who are interested are, of course, willing to work with us and analyze the situation. We are happy to explain the limits of these approaches, what can be achieved, and what it would mean for them. We try to split the project into phases. First we do an analysis with them, where we explain everything and form a hypothesis about what savings they could achieve. Then comes a feasibility study, where we inexpensively evaluate small examples to see whether that hypothesis holds. Only then do we recommend productization. What doesn't work well is the approach "I want to streamline my whole factory by deploying cameras, and I demand an end-to-end solution," where the company says: price it for me. That doesn't usually lead to a good result. We typically get much better results by breaking the project into parts and working incrementally with a company that can deliver the solution.
Ivana Karhanová: What are the requirements that you put on companies then?
Jiří Čermák: Above all, they have to be ready for very intensive cooperation with us. At the beginning, they have to explain the process to us almost perfectly. We often find that no single person understands the whole process. So at this stage, we often work with the client to write down the specifics and rules of the process, which we then try to optimize in some way, and that typically has a lot of added value for both parties. Then, of course, there's data collection and feedback on the outputs of that process. So it has to be a very intensive and close collaboration, where we talk almost every day about the quality of the current solution and our plans. These are the main prerequisites for the project to have any chance of success.
Ivana Karhanová: Nikola, can managers even imagine what they can want from computer vision?
Nikola Pleska: It's getting better, but I wouldn't say we've won. We still have a lot of evangelizing to do. We need to explain IT topics, especially to business people, so that they start to understand where the technology has an impact and when it brings them benefits. They're not used to that; they're used to looking only through the lens of the process they're responsible for. We are trying extremely hard to break down those boundaries. We can even help at the start of projects, for example by covering part of the upfront investment needed to get the technology up and running, so that companies can try it on a small scale first and then go big.
Ivana Karhanová: When a company decides to apply computer vision, I assume that the results should be translated into other processes, that it will affect not only one specific process, one specific production step, but a whole range of things in that company.
Jiří Čermák: That's right, it opens the way to data-driven decisions. For example, in a shopping mall, discount promotions can be based on footfall statistics, and so can layout changes or lease pricing for different spaces, depending on how often people go there. Then it's very much up to the client. We're happy to help with all of that, but they should be able to derive actionable insights from the data. It's not always trivial, but you need to start with statistics on how people visit the store on different days and then keep mining that data further.
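The first step Čermák mentions, statistics on how people visit the store on different days, can be sketched as a simple aggregation of entrance-counter events. The event format here (ISO timestamp plus "in" or "out") is an assumption for illustration; a real people-counting camera will have its own export format.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def footfall_by_weekday(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count entries per weekday from (ISO timestamp, direction) events.

    direction is "in" or "out"; only entries are counted as visits.
    """
    counts: Counter = Counter()
    for timestamp, direction in events:
        if direction == "in":
            counts[WEEKDAYS[datetime.fromisoformat(timestamp).weekday()]] += 1
    return counts


if __name__ == "__main__":
    events = [
        ("2022-04-23T10:15:00", "in"),   # Saturday
        ("2022-04-25T09:00:00", "in"),   # Monday
        ("2022-04-25T17:30:00", "in"),
        ("2022-04-25T18:00:00", "out"),
    ]
    stats = footfall_by_weekday(events)
    print(stats.most_common(1))  # [('Monday', 2)]
```

From counts like these, the mall can start comparing days, testing promotions, or relating footfall to lease pricing for particular spaces.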
Ivana Karhanová: What is the willingness of managers and people to become a data-driven company in general?
Nikola Pleska: Most of the time, that willingness is quite high when it comes to the person's own process. But the willingness to work across departments, which is necessary because everything needs to build on everything else, is often low, and that's a problem. Quite often, we see an idea from, say, the marketing department: we'd like to improve the customer experience, and we want these and those elements in there. Then we come up against another department, for example security, which says, "Well, we can't do that, we just can't, we don't want to, it won't do us any good. I've got my own KPIs (key performance indicators), and if the project doesn't deliver on our KPIs, we don't care." Then we could talk about other divisions that it doesn't concern at all. So it's looking in only one direction that often holds projects back, with individual managers unable to work together and agree on the common interest.
Ivana Karhanová: Do you think it's their inexperience or maybe their ego?
Nikola Pleska: It can be both. If I understood the issue, my ego would allow me to start the project. It might not allow me to do so precisely because I'd be stepping into the unknown and wouldn't know what to expect.
Ivana Karhanová: We are limited by the human factor. But do you see any limitations on the technology side, anything that has to move forward in computer vision for its real application in business to grow?
Nikola Pleska: I don't think we lack a technological foundation. On the contrary, we are brimming with it. We have things at our disposal that we never dreamed of. It's us, the ones developing the technology, standing there with our hands out saying: take it. But the uptake is quite slow compared with other parts of the world. When I compare how these technologies are adopted in the Czech Republic and in the more technologically developed world, the difference is like night and day. Here in our little Czech basin, things are simply slow and cumbersome. When something succeeds here, there's a person behind it with a strong vision and the drive to push things through, and then it's great. But quite often, those people are missing in companies. Ordinary middle management thinks they don't need to change anything; they're happy with the way the company works. This is, of course, due to the economic situation, the boom we've had in general, and the fact that there is a shortage of people on the market to replace them, so there is a certain comfort. We're just going round in circles in our own sour little pond.
Ivana Karhanová: Isn't it also a fear that the algorithms will replace them?
Nikola Pleska: It can be, but in the end, experience shows that it will happen anyway. So either I benefit from it, learn from it, accept it, and take the good from it, or I fight against it and get overwhelmed.
Jiří Čermák: And I'll be overwhelmed by the competition that uses it. I'd like to add some specifics about these projects. In manufacturing, I know how much scrap costs me per year if I can't address it. In retail, it's hard to attribute a return to a project because so many actions are tied to it, and customers simply can't evaluate the ROI. Hence, companies hesitate to do these projects, and they often miss the train while competitors don't hesitate to give it a try, knowing that it is sometimes a step into the unknown.
Ivana Karhanová: If I'm a manager who wants to make more use of data and bring in computer vision, what should I prepare my company for?
Jiří Čermák: As I said, it will require intensive cooperation with the supplier who will handle this. It will be necessary to uncover the processes happening there, open Pandora's box, and look for inefficiencies in those processes.
Ivana Karhanová: So I'm supposed to accept that I will find inefficiencies in a lot more places than I think there are?
Jiří Čermák: Exactly. That's not specific to computer vision; it's what process optimization generally brings with it. People should put their egos aside, drop the "but this is how it's been done for the last 40 years" argument, and be willing to dig into the processes themselves, because they are the ones with the huge amount of knowledge that they need to pass on to us so we can help them.
Ivana Karhanová: Does that mean I should prepare myself for a possible change in decision-making processes? Should I prepare myself for the fact that the data may reveal other unpleasant things that I might have wanted to ignore until now? But what about, for example, connecting to existing systems?
Jiří Čermák: There, of course, the requirements of the customer's security department come into play. We typically try to use existing camera systems because that brings down the price of the whole solution. Often, though, for example in the banking sector, they don't want to give external suppliers access to the cameras. Then, of course, new infrastructure has to be installed, which means intervening in those locations. How big that intervention is depends on the specific use case: monitoring entrances is a small thing, covering all premises is more. So it's good to consider at the outset whether the current infrastructure can be used for the project and to find out internally, because that's one of the first questions that comes up.
Nikola Pleska: Connecting to other systems from the business platforms I have is quite easy. I often use the analogy of the company's central brain. If I have the data in one place, ideally in the cloud, and can do overall analytics on top of it, I'm halfway there, because then I start putting the patterns together. The gaps, or the good things, start to come out, and suddenly I can see: this is how we should be doing it. I need to have the data on one plate so I can dissect it and then enjoy it as part of an analytical dinner.
Ivana Karhanová: How does the collaboration between business and IT teams actually work in companies?
Nikola Pleska: That's a very pertinent question for us. Of course, we would be happy if the cooperation were more intense, and I'll refer to my previous answer: it's getting better. Where educated people are already dealing with technology, collaboration is a little better. But quite often, it's still the case that a project is driven by IT without the business, or, vice versa, driven by the business without IT getting involved. That's possible today.
With democratization, business people can buy a service relatively unbeknownst to IT and suddenly start consuming it. This, of course, brings security risks, because suddenly you have what we call grey IT in your factory or organization, and you have no control over it. So the emphasis needs to be on IT people starting to play their role, because they are the internal supplier and have their internal customers. Those internal customers should be providing the assignments, and if I abstract it a lot, the IT of the future shouldn't be dealing with anything other than data governance: who gets access to what data, why, and how. It could be a human or a system, it doesn't matter, but I need to start orchestrating where my data is and who gets to it to create value. That's the future of IT, because everything else I can buy or outsource, or maybe the business will do it itself. And here I'm talking about, for example, low-code/no-code business application platforms, like Power Apps in our case. That means that today a business user can build their own business process without needing IT support. They connect a ready-made model, for example a computer vision model from some catalog, and get a beautiful application that takes their process, their activity, to a much higher level.
Ivana Karhanová: You described that very nicely, but what is the reality, Jirka?
Jiří Čermák: As Nikola said, it's getting better. But of course, we often run into situations where something is promoted by IT and doesn't have the support of the business. And vice versa: the business has a specific use case it wants to solve, and it doesn't have IT support, either because the department is overloaded or because they don't want to introduce new technology into their portfolio. That always complicates delivering the project, either because of the time involved or because it simply doesn't get done. So unless the two parties are aligned and pulling in the same direction, it's always a complication.
Ivana Karhanová: If we were to summarize and you both were to give one piece of advice to companies that want to move forward with digitalization and maybe be data-driven, what would it be?
Jiří Čermák: Definitely, at the beginning, bring someone into the discussion who can help, meaning a company that has experience with this, whether from competitors, other projects, and so on. They can help specify the problem in a way that's much more solvable, much cheaper, and much more beneficial, because they already have the experience. It's always terribly risky to do this as a greenfield project without expertise, because you can end up specifying a virtually unworkable project.
Nikola Pleska: I would add: just start with a project. Today it's extremely cheap to throw something away, which means that if you go the way Jirka described and don't spend billions on it, you can afford to fail. I can just fail, but besides that, I have five other ideas in my box that can catch on and be fabulous. Going in with this principle means I quickly test whether something works, then quickly discard it or develop it further, and keep generating ideas like on a treadmill.
Jiří Čermák: I'll add that this is what I mentioned about breaking a project down: don't do monolithic end-to-end projects, from idea to implementation, and tender them out; break them down into analysis and a feasibility study, and only roll out once you're sure it's worth investing big money.
Nikola Pleska: I guess the technical term is "prototyping": I prototype it fairly quickly, or throw it away because it's no good.
Ivana Karhanová: Thanks to both of you for coming to the studio. So that was Nikola Pleska, a digital transformer from Microsoft. Thanks for coming.
Nikola Pleska: Thank you.
Ivana Karhanová: And Jiří Čermák, the head of AI projects at Blindspot Solutions. Thank you as well.
Jiří Čermák: Thank you.