I'm still learning a lot every day about the best way to do these things, but the way I've begun to think about it is simply to care very deeply about reproducibility in every aspect of what I'm working on.
And that naturally pushes you toward a lot of the best practices that are being adopted in how you develop the code you use to perform your analyses: version control, and all the standard things we think about when we're starting a new project.
I think this idea of trying to be reproducible also trickles down into MLOps, because if you're worried about reproducibility, you start to worry about the history of the data: how the data got to be the way it is. Trying to streamline that whole approach is what pushes you toward the best practices.
That's still something I'm working on; the MLOps end especially is something I'm still very much trying to learn. There are a lot of new tools out there that try to make this process easy for you, and I'm always debating: oh gosh, should I learn this new tool? Or should I just have my code write out an output file that records the history of my data, because right now I don't want to climb that learning curve?
I've got a deadline to meet, you know? But in terms of best practices, for me it's about making reproducibility easy, not just for the code and running the code, but for the data you produce: always being able to track down, for a given data set, which scripts were used to generate it and what parameters were chosen, so that reproducing any phase is easy.
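The "just write out an output file" option mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's actual setup: the helper names `write_provenance` and `sha256_of`, the `.provenance.json` sidecar naming, and the record fields are all hypothetical choices. The idea is that every generated data file gets a small JSON record naming the script that produced it, a hash of that script, the parameters chosen, and hashes of the input files, so you can later trace a data set back to exactly what made it.

```python
import hashlib
import json
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_provenance(output_path, script_path, params, inputs=()):
    """Write a JSON sidecar next to output_path recording how it was produced.

    Hypothetical helper. Records the generating script plus a hash of its
    contents (so later edits to the script are detectable), the parameters
    used (must be JSON-serializable), hashes of the input files, the Python
    version, and a UTC timestamp.
    """
    output_path = Path(output_path)
    record = {
        "output": output_path.name,
        "output_sha256": sha256_of(output_path),
        "script": str(script_path),
        "script_sha256": sha256_of(script_path),
        "params": params,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "python": sys.version.split()[0],
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Assumed naming convention: clean.csv -> clean.csv.provenance.json
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


if __name__ == "__main__":
    # Self-contained demo using throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        script = tmp / "clean_data.py"  # stand-in for the generating script
        script.write_text("# keep rows with x >= 2\n")
        raw = tmp / "raw.csv"
        raw.write_text("x\n1\n2\n")
        out = tmp / "clean.csv"
        out.write_text("x\n2\n")
        sidecar = write_provenance(out, script, {"min_x": 2}, inputs=[raw])
        print(sidecar.read_text())
```

In a real pipeline script you would typically call this right after saving each output, passing `__file__` as `script_path` and the same parameter dictionary the script actually used. Tools built for data versioning handle this more thoroughly, but a sidecar like this is a low-effort middle ground when there's no time to learn a new tool.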