14: Edge Computer Vision with Karthik Kannan

Karthik Kannan

One of the really interesting things in this space, and I think one of the key moats that Envision has as a company, is that the data we have is all from a visually impaired person's perspective. That helps us build much, much superior AI compared to competitors in this space who might have to rely more on open datasets in the beginning, because these open datasets are all captured from the point of view of a sighted person, right? If you look at a dataset like ICDAR, which basically contains a lot of scanned documents, those documents are perfectly scanned: top-down, cropped, and properly perspective-transformed.

But if you look at how a visually impaired person captures a document with a phone, it's dramatically different. The angles are very skewed. Sometimes they only capture a portion of the document. Or sometimes their phone is in a completely different orientation, so you get documents in different orientations. So all those things matter a lot.

And initially we had to go through a lot to even build our own datasets in some of these areas, like, for example, document detection. We had to invite users to the office and collect data there. And eventually, when we started to work with user data, we had to work through a lot of stages before we could actually use the data coming in from users, or before we really had our own dataset to start working on bigger problems.

So what really helped Envision in the early days was definitely the availability of pretrained models, checkpoints, architectures, and even some of the papers people wrote that contained insights into how they do post-processing, for example, when they're trying to scan a table or a document. Those things really, really helped Envision in the early days as a startup, because of course we didn't have the bandwidth to do all that research ourselves. You know, initially it was just me trying to figure this stuff out, and over a period of time we've come to rely more on our own dataset, plus the advances that are happening in the open field, and combining them to actually solve problems.
