NBIS
03-May-2026
The thought process should not be: “I have some data, why don’t we try neural networks?”
Instead, it should be: “Given the problem, does it make sense to use neural networks?”
Question from group leader: “I tried deep learning on my data and it didn’t perform better than this other simpler method”
… is Deep Learning the right choice?

source: datarobot
… you need a few more things:
And therein lies the main issue:
* Some think that DL is about a model that magically fixes your data
* Reality: your network will, at best, be as good as your data
(https://www.datarobot.com/blog/identifying-leakage-in-computer-vision-on-medical-images/)
Inspecting the dataset with image embeddings tells another story: can anyone tell what’s wrong?
Let’s look at the activation maps and see more detail
Jupyter notebook (good_practices/labs/target_leakage/investigating_target_leakage.ipynb)
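One way to do this kind of embedding inspection, sketched on synthetic data (this is not the notebook’s actual code): project the embeddings to 2D and check whether the labels separate along an axis that encodes an acquisition artifact (e.g. a scanner offset) rather than the pathology itself.

```python
# Sketch: spot target leakage by projecting image embeddings to 2D.
# The data here is fake; in practice the embeddings would come from a
# pretrained network applied to the real images.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 64))        # 100 images x 64-dim embeddings
labels = np.array([0] * 50 + [1] * 50)
# Simulate an acquisition artifact that correlates perfectly with the
# label: all images of class 1 share a constant offset.
emb[labels == 1] += 3.0

coords = PCA(n_components=2).fit_transform(emb)

# If the two label groups separate cleanly along a principal component,
# the network can "solve" the task from the artifact alone.
sep = coords[labels == 1, 0].mean() - coords[labels == 0, 0].mean()
print(coords.shape, abs(sep) > 1)
```

Plotting `coords` coloured by label (or by hospital/scanner, if known) makes the leaked structure visible at a glance.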
Visualize the layers of a NN for Natural Language Processing:
(2F08 “Fear of Flying”)
Train, development and test sets cannot be too similar to each other, or you will not be able to tell whether the network is generalizing or just memorizing
Jupyter notebook:
good_practices/labs/data_splits/rigorous_train_validation_splitting.ipynb
Two different strategies will be tested:
* Random split
* Split by alignment score
Which works best? Different groups test different networks on each strategy
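The two strategies can be sketched with scikit-learn (synthetic data; here a precomputed cluster id stands in for “similar by alignment score”, which is an assumption, not the notebook’s exact procedure):

```python
# Sketch: random split vs. similarity-aware (grouped) split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
# 20 "homology clusters" of 5 near-identical sequences each.
groups = np.repeat(np.arange(20), 5)

# Strategy 1: random split -- near-identical sequences can land on both
# sides, so test performance overestimates generalization.
Xtr, Xte = train_test_split(X, test_size=0.2, random_state=0)

# Strategy 2: group-aware split -- each cluster goes entirely to one
# side, so the test set is genuinely unseen.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
tr_idx, te_idx = next(splitter.split(X, groups=groups))
leaked = set(groups[tr_idx]) & set(groups[te_idx])
print(len(leaked))  # 0: no cluster is shared between train and test
```

The grouped split usually reports lower, but more honest, test scores.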
Reasons why one of my networks wouldn’t work:
Networks will “kind of” work even when some labels are incorrect, but it becomes very tricky to understand whether, and what, is wrong
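One common way to surface possibly mislabelled samples (a sketch of the general idea, not a method from this course): get cross-validated predictions and flag the samples where the model confidently disagrees with the given label.

```python
# Sketch: flag suspicious labels via cross-validated confidence.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)
y_noisy = y.copy()
y_noisy[:10] = 1 - y_noisy[:10]  # corrupt 10 labels on purpose

proba = cross_val_predict(LogisticRegression(max_iter=1000),
                          X, y_noisy, cv=5, method="predict_proba")
# Confidence the model assigns to the *given* label; low values are
# candidates for relabelling or manual inspection.
conf = proba[np.arange(len(y_noisy)), y_noisy]
suspects = np.argsort(conf)[:10]
print(conf[:10].mean() < conf[10:].mean())
```

On this toy data the flipped labels receive, on average, much lower confidence than the clean ones, which is what makes the ranking useful.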
Main avenues:
* Find more of it
* Make smaller models
* Cut down insignificant features
* Generate artificial samples: data augmentation
* Transfer learning (so, find more data, again)
* Think outside the (black) box
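The data-augmentation avenue can be sketched in a few lines of plain NumPy (in practice `torchvision.transforms` offers these operations off the shelf; the images here are fake):

```python
# Sketch: grow the training set by flipping and jittering images.
import numpy as np

rng = np.random.default_rng(0)

def augment(img, rng):
    """Return a randomly flipped, brightness-jittered copy of img."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    out = out + rng.normal(0, 0.05)   # small brightness shift
    return np.clip(out, 0.0, 1.0)

imgs = rng.random((4, 32, 32))  # 4 fake grayscale images in [0, 1]
extra = np.stack([augment(im, rng) for im in imgs for _ in range(3)])
print(extra.shape)  # (12, 32, 32): three augmented copies per image
```

Augmentations must preserve the label: a horizontal flip is fine for most natural images, but would be wrong for, say, chirality-sensitive data.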
You thought we were done with PyTorch API explanations, but we’re not!
Good Practices of Project Design