Happy Friday, dear reader! Another week down, and another chance to reflect and update here. In case you don’t feel like reading the past entries, let me bring you up to speed:
I’m Maddie, one of four students on the STEM Visuals team at Lehigh University’s Mountaintop Summer Experience program. Our current focus is developing a “Digital Lab Twin” module for use in a lab here at Lehigh, in tandem with our advisors Prof. Rangarajan, Dr. Menicucci, and Dr. Thiagarajan. The module aims to use machine learning to model the photobioreactors Dr. Menicucci’s class will build during their labs, giving students a virtual sandbox where they can find interesting parameters to experiment with physically. The project is web-based and open source; all past modules can be found here.
Now, away from the elevator pitch and onto this week’s progress! We’ve pulled together as a team quite nicely, if I may say so, and this marks the first week of what I’d call truly tangible results.
My week started by solving the problem that’d really been stressing me out for the last few weeks: getting our synthetic dataset ready to go. No machine learning is happening without that data, so it was a prerequisite to even the simplest of testing. If there’s one thing to take away from our experience, it’s this: whatever you think will be difficult will likely turn out simpler than you expected. Whatever you thought would be simple, however, will likely leave you feeling like this:
Seriously. I’d hoped this would take maybe a week at most, and here we are, finishing it at the start of week 3. All things said and done, though, it’s alright: the time spent researching ML technologies made implementation far, far easier. Once I had my data, it was a matter of a day’s work to pipe it into the test model I’d already built (for the nitty-gritty on that process, see last week’s post!). Once again, Python comes in clutch with a great community and plentiful documentation, so all I really had to do was pin down what my problem was in order to find a working solution.
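I won’t dump our whole pipeline here, but as a rough sketch of the kind of load-and-split step I mean (the file name and column names below are placeholders, not our real ones):

```python
# Minimal sketch (not our exact pipeline): load a synthetic dataset and
# split it into training and holdout sets. The CSV name and columns here
# are stand-ins for whatever the real dataset uses.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("synthetic_runs.csv")  # one row per timestep per experiment

feature_cols = ["time", "biomass", "nitrate", "light"]  # hypothetical input columns
target_col = "lutein_next"                              # concentration at the next timestep

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df[target_col],
    test_size=0.2,    # hold back a slice of points for testing later
    random_state=42,  # reproducible split
)
```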
Now, this is a visualization project, so I’d be remiss not to share the graphs output by this model, just like last week. As before, first up is a loss graph, showing the model’s prediction error against the number of epochs, or full passes through the training dataset.
That sharp decrease shows that our model is learning quickly! The fact that our MSE reached zero is something to keep an eye on, as it may be a sign of overfitting. For a first attempt, though, it’s promising to see!
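For the curious, here’s roughly how a loss curve like that gets made. This isn’t necessarily the model we’re using (scikit-learn’s MLPRegressor is just a convenient stand-in), but the idea is the same: train, record the loss each epoch, plot it. `X_train` and `y_train` carry over from the loading sketch above.

```python
# Hedged sketch of producing a loss-vs-epochs plot with a small neural net.
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X_train, y_train)   # X_train / y_train from the split above

plt.plot(model.loss_curve_)   # one recorded loss value per epoch
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.title("Loss vs. epochs")
plt.show()
```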
Next, here’s one of the actual output plots. The X axis represents time: the 150 hours the training experiments were run for. The Y axis shows the concentration of Lutein, a desired bioproduct synthesized in the photobioreactor. All 100 experiments are overlaid atop each other, creating the spread of blue and pink dots you see: blue is the data used to train the model, and pink is the holdout points reserved for testing. Green shows what the model thinks the Lutein concentration should be, given the conditions at the previous timestep!
Isn’t it pretty? Getting a cool-looking model like this is uber-satisfying, and part of the fun of a project like this. Our predictions aren’t too far off, either; that’s a relief! However, this isn’t the final plan for this model.
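If you’re wondering how a plot like that comes together, here’s a sketch with matplotlib. The colors and column names are stand-ins, and `model`, `df`, and the train/holdout splits continue from the earlier sketches.

```python
# Sketch of the overlay plot: training points, holdout points, and the
# model's one-step predictions, all against time.
import matplotlib.pyplot as plt

predictions = model.predict(df[feature_cols])  # predicted Lutein at every point

plt.scatter(X_train["time"], y_train, s=8, color="tab:blue", label="Training data")
plt.scatter(X_test["time"], y_test, s=8, color="tab:pink", label="Holdout data")
plt.scatter(df["time"], predictions, s=8, color="tab:green", label="Model prediction")
plt.xlabel("Time (hours)")
plt.ylabel("Lutein concentration")
plt.legend()
plt.show()
```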
See, what’s happening up above is the model predicting only one time step ahead of reality. It’s a one-step predictor in that sense: each set of inputs is tied to one set of outputs, representing the next timestep. Ideally, our model can be run repeatedly, feeding its own output back into itself to generate an entirely model-made set of data over time (there’s a rough sketch of the idea below). Beyond administrative work for the team, the rest of my week has focused on creating exactly that: a visual of what our model predicts over longer periods of time. Sadly, I’m not quite finished with it as of writing this post, but expect it to be the first thing I gush about next week!
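Here’s the rough shape of that rollout idea, just to make it concrete. The `make_next_input` helper is a placeholder for however we end up rebuilding the model’s input from its last prediction; none of this is finished code.

```python
# Sketch of an autoregressive rollout: repeatedly apply a one-step model
# to its own predictions to trace out a full model-made trajectory.
import numpy as np

def rollout(model, initial_state, n_steps, make_next_input):
    """Run a one-step model forward n_steps times, feeding outputs back in."""
    state = np.asarray(initial_state, dtype=float)
    trajectory = [state]
    for _ in range(n_steps):
        prediction = model.predict(state.reshape(1, -1))[0]  # one step ahead
        state = make_next_input(state, prediction)           # fold prediction back into the inputs
        trajectory.append(state)
    return np.array(trajectory)
```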
As for other strides forward this week, we were challenged to put together a presentation about our end users and really home in on the audience for our module. Alex designed this new logo for it, and it’s really cute!
Perhaps someday we’ll find a catchier team name than simply “STEM Visualization,” but that day isn’t today. At least our logo’s eye-catching now!
Further, allow me to share my favorite visual that I designed myself for the presentation:
These presentations offer a welcome change of pace from focusing solely on code, and creating bright visuals for them has become one of my favorite weekly tasks. I’m excited to make more as we move on!
Looking forward, it’s all about optimization on my end. I want to make the neural network more robust, and that starts with measuring how accurate it already is. Beyond that, we’ll want to explore other models, so once the neural net is finished, I’ll likely switch to helping my teammates, who are currently attempting a task similar to mine with a Gaussian Process model. It’s been a heck of a week, all the better for having been four days!
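The measuring part is nothing fancy: score the current network on the holdout points, then compare against alternatives. A sketch of what that baseline check might look like (the Gaussian Process line is only a pointer to the kind of model my teammates are exploring, not our actual setup):

```python
# Baseline accuracy check on the holdout set, continuing from the sketches above.
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.gaussian_process import GaussianProcessRegressor

holdout_pred = model.predict(X_test)
print("Holdout MSE:", mean_squared_error(y_test, holdout_pred))
print("Holdout R^2:", r2_score(y_test, holdout_pred))

# One possible alternative model for comparison:
gp = GaussianProcessRegressor().fit(X_train, y_train)
print("GP holdout R^2:", gp.score(X_test, y_test))
```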
Here’s to hoping progress remains this smooth. See you all next week!
-Maddie