From Math to Code: Building GAM with Penalty Functions From Scratch
Enjoyed learning the penalized GAM math. Built penalty matrices, optimized λ using GCV, and implemented our own GAM function. Confusing? Yes! Rewarding? Oh yes!
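For anyone curious what that looks like in practice, here is a minimal sketch of the idea, assuming a B-spline basis from splines::bs(), a second-order difference penalty, and a grid search over λ by GCV (illustrative code, not the post’s actual implementation):

```r
library(splines)

set.seed(1)
n <- 200
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

B <- bs(x, df = 20, intercept = TRUE)      # B-spline basis
D <- diff(diag(ncol(B)), differences = 2)  # second-order difference matrix
S <- crossprod(D)                          # penalty matrix t(D) %*% D

gcv <- function(lambda) {
  A    <- crossprod(B) + lambda * S
  beta <- solve(A, crossprod(B, y))
  H    <- B %*% solve(A, t(B))             # hat matrix for this lambda
  rss  <- sum((y - B %*% beta)^2)
  n * rss / (n - sum(diag(H)))^2           # GCV criterion
}

lambdas <- 10^seq(-6, 2, length.out = 50)
lambdas[which.min(sapply(lambdas, gcv))]   # lambda chosen by GCV
```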
I finally understood B-splines by working through the Cox-de Boor algorithm step by step, discovering they’re just weighted combinations of basis functions that make non-linear regression linear. What surprised me is that going through Bayesian statistics really helped me understand the engine behind the model! Will try this again in the future!
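As a flavor of the recursion, here is a toy Cox-de Boor implementation in base R (my own illustrative version, not the post’s code), building one B-spline basis function at a time from a knot vector:

```r
# Cox-de Boor recursion: degree-k basis functions built from degree-0 indicators
bspline <- function(x, i, k, knots) {
  if (k == 0) {
    return(as.numeric(knots[i] <= x & x < knots[i + 1]))
  }
  d1 <- knots[i + k] - knots[i]
  d2 <- knots[i + k + 1] - knots[i + 1]
  w1 <- if (d1 > 0) (x - knots[i]) / d1 else 0
  w2 <- if (d2 > 0) (knots[i + k + 1] - x) / d2 else 0
  w1 * bspline(x, i, k - 1, knots) + w2 * bspline(x, i + 1, k - 1, knots)
}

knots <- c(0, 0, 0, 0.3, 0.6, 1, 1, 1)
x     <- seq(0, 0.99, length.out = 100)
basis <- sapply(1:5, function(i) bspline(x, i, 2, knots))  # quadratic basis
matplot(x, basis, type = "l", ylab = "B_i(x)")             # overlapping bumps
```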
We learned to derive the Newton-Raphson algorithm from a Taylor series approximation and implemented it for logistic regression in R. We’ll show how the second-order Taylor expansion leads to the Newton-Raphson update formula, then compare individual parameter updates with using the full Fisher information matrix for faster convergence.
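A compact sketch of that update (illustrative code, using the usual score t(X)(y − p) and Fisher information t(X) W X):

```r
set.seed(42)
n <- 500
X <- cbind(1, rnorm(n))                        # design matrix with intercept
y <- rbinom(n, 1, plogis(X %*% c(-0.5, 1.2)))  # simulated binary outcome

beta <- c(0, 0)
for (iter in 1:25) {
  p     <- as.vector(plogis(X %*% beta))
  score <- t(X) %*% (y - p)                    # gradient of the log-likelihood
  W     <- diag(p * (1 - p))                   # weights from the second derivative
  info  <- t(X) %*% W %*% X                    # Fisher information matrix
  step  <- solve(info, score)
  beta  <- beta + step                         # Newton-Raphson update
  if (max(abs(step)) < 1e-8) break
}
cbind(newton = beta, glm = coef(glm(y ~ X[, 2], family = binomial)))
```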
Refreshed my rusty calculus skills lately! 🤓 Finally understand what happens during complete separation and why those coefficient SEs get so extreme. The math behind maximum likelihood estimation makes more sense now! The chain rule, quotient rule, and matrix inversion are all crucial!
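A tiny illustration of the complete-separation problem (my own toy data, not from the post): glm() still returns numbers, but the estimates and standard errors explode.

```r
# Outcome is perfectly separated by x: the MLE does not exist, estimates diverge
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)

fit <- glm(y ~ x, family = binomial)  # expect a "fitted probabilities 0 or 1" warning
summary(fit)$coefficients             # huge estimates, enormous standard errors
```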
In my simulations of Response Adaptive Randomization, I discovered it performs comparably to fixed 50-50 allocation in identifying treatment effects. The adaptive approach does appear to work! However, with only 10 trials, I’ve merely scratched the surface. Important limitations exist - temporal bias risks, statistical inefficiency, and complex multiplicity adjustments in Bayesian frameworks.
RSQLite With DBI: A Note To Myself
I messed around with DBI and RSQLite and learned it’s actually pretty simple to use in R - just connect, write tables, and use SQL queries without all the complicated server stuff. Thanks to Alec Wong for suggesting this!
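The whole workflow really is just a few calls. A minimal sketch with an in-memory database (illustrative):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")   # no server needed

dbWriteTable(con, "mtcars", mtcars)               # write a data frame as a table
dbGetQuery(con, "SELECT cyl, AVG(mpg) AS mean_mpg
                 FROM mtcars GROUP BY cyl")       # plain SQL back as a data frame

dbDisconnect(con)
```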
Plumber and JavaScript
Tried out plumber and a bit of JavaScript to build a simple local API for logging migraine events 🧠💻. Just a quick tap on my phone now records the time to a CSV—pretty handy! 📱✅
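A rough sketch of what such an endpoint could look like (hypothetical file and path names, not the post’s actual API), appending a timestamp to a CSV on each POST:

```r
# api.R -- a hypothetical plumber endpoint for logging an event
#* Log a migraine event with the current time
#* @post /migraine
function() {
  entry <- data.frame(time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"))
  write.table(entry, "migraine_log.csv", sep = ",",
              append = file.exists("migraine_log.csv"),
              col.names = !file.exists("migraine_log.csv"), row.names = FALSE)
  list(status = "logged", time = entry$time)
}

# In a separate R session:
# plumber::pr("api.R") |> plumber::pr_run(port = 8000)
```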
🙈 Made a hilariously redundant R package for simple OpenAI calls, but the real win was finally learning how to build an R package! 🛠️ Is it efficient? Absolutely not! Was it worth the time and experience? Yes! Will I do it again? Yes! Will it break? Yes! 🤣
How do we identify relevant articles in our domains? This project takes example journal RSS feeds with abstracts, uses LLMs to extract points of interest, and shares the insights on Bluesky—stimulating curiosity.
I found Polars syntax is quite similar to dplyr’s. And the way that we can chain the functions makes it even more familiar! It was fun learning the nuances; now it’s time to put them into practice! Wish me luck! 🍀
"Fascinating" describes my journey with Stable Diffusion 3. It’s deepened my appreciation for original art and masterpieces. Understanding how to generate quality art is just the beginning—it drives me to explore the underlying structure. Join me in exploring SD3 in R!
Overall, I am quite impressed with the responses! With minimal prompt engineering and document cleaning, it was able to return accurate responses, and even separated different conditions and provided appropriate treatment options. It was also able to return the correct response for tricky questions that our RAG was not able to. It definitely has potential!
Wow, what a journey, and more to come! We learned how to perform simple RAG with an LLM and even ventured into LangChain territory. It wasn’t as scary as some people said! The documentation is fantastic. Best of all, we did it ALL in R with Reticulate, without leaving RStudio! Not only can we read the IDSA Guidelines, we can use an LLM to assist us with retrieving information!
MCAR, MAR, MNAR, all so confusing. But with DAG, oh so amusing! Many technical words, I don’t understand, but with simulation, I am a fan! Join me in exploring missing mechanisms, learn I will with great optimism.
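A small simulation sketch (mine, purely for illustration) of the three mechanisms: missingness that ignores the data (MCAR), depends on an observed variable (MAR), or depends on the unobserved value itself (MNAR):

```r
set.seed(7)
n   <- 1000
age <- rnorm(n, 50, 10)
bmi <- rnorm(n, 27, 4)

# MCAR: missingness is pure chance
miss_mcar <- rbinom(n, 1, 0.2)

# MAR: probability of missing BMI depends on observed age
miss_mar  <- rbinom(n, 1, plogis(-3 + 0.05 * age))

# MNAR: probability of missing BMI depends on the (unseen) BMI value itself
miss_mnar <- rbinom(n, 1, plogis(-6 + 0.2 * bmi))

bmi_mar <- ifelse(miss_mar == 1, NA, bmi)
c(complete = mean(bmi), observed_under_MAR = mean(bmi_mar, na.rm = TRUE))
```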
The SUTVA, Positivity, Identifiability, Consistency, and Exchangeability of Causal Inference: the essential ingredients that help us bring out the true flavor of the causal model. Here is my understanding of each assumption (main course) with examples (side dish), accompanied by simulation (paired with beverages). Bon Appétit!
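As a taste of the paired beverage, a toy simulation (illustrative only) of why exchangeability matters: randomized treatment recovers the true effect, confounded assignment does not.

```r
set.seed(123)
n <- 10000
severity <- rnorm(n)                   # a confounder
y0 <- 2 * severity + rnorm(n)          # potential outcome without treatment
y1 <- y0 + 1                           # true causal effect = 1

t_rand <- rbinom(n, 1, 0.5)                    # exchangeable: coin-flip treatment
t_conf <- rbinom(n, 1, plogis(2 * severity))   # not exchangeable: sicker -> more treated

y_rand <- ifelse(t_rand == 1, y1, y0)  # consistency links potential and observed outcomes
y_conf <- ifelse(t_conf == 1, y1, y0)

mean(y_rand[t_rand == 1]) - mean(y_rand[t_rand == 0])  # ~1, unbiased
mean(y_conf[t_conf == 1]) - mean(y_conf[t_conf == 0])  # biased away from 1
```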
I’ve struggled with differentiating between total, direct, and indirect effects, so this blog/note serves as a personal reference to solidify my understanding and make future amendments as needed. While there are comprehensive articles available, this is a simplified explanation for myself and potentially others.
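For future me, a linear toy example (illustrative) where the decomposition is transparent, with total = direct + indirect:

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)             # a path: x -> m
y <- 0.3 * x + 0.8 * m + rnorm(n)   # direct effect 0.3, b path 0.8

a      <- coef(lm(m ~ x))["x"]      # x -> m
b      <- coef(lm(y ~ x + m))["m"]  # m -> y, holding x fixed
total  <- coef(lm(y ~ x))["x"]      # ~0.3 + 0.5 * 0.8 = 0.7
direct <- coef(lm(y ~ x + m))["x"]  # ~0.3

c(total = unname(total), direct = unname(direct),
  indirect = unname(a * b), direct_plus_indirect = unname(direct + a * b))
```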
It was enjoyable to visualize the non-linear relationship with interaction and observe the corresponding changes in CATE. If one understands the underlying equation, it’s possible to easily obtain the ATE using calculus. Lastly, adopting Richard McElreath’s Owl framework as a documented procedure ensures quality assurance! 🙌
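A small illustration (mine, not the post’s code) of reading CATE and ATE straight off a model with an interaction: if y = β0 + β1·t + β2·w + β3·t·w, then CATE(w) = β1 + β3·w and ATE = β1 + β3·E[w].

```r
set.seed(2)
n <- 5000
w <- runif(n, 0, 10)                       # effect modifier
t <- rbinom(n, 1, 0.5)                     # randomized treatment
y <- 1 + 2 * t + 0.5 * w + 0.3 * t * w + rnorm(n)

b <- coef(lm(y ~ t * w))

cate <- function(w) b["t"] + b["t:w"] * w  # conditional average treatment effect
cate(c(0, 5, 10))                          # the effect grows with w
b["t"] + b["t:w"] * mean(w)                # ATE ~ 2 + 0.3 * 5 = 3.5
```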
I’m now more confident in my understanding of the 95% confidence interval, but less certain about confidence intervals in general, knowing that we can’t be sure if our current interval includes the true population parameter. On a brighter note, if we have the correct confidence interval, it could still encompass the true parameter even when it’s not statistically significant. I find that quite refreshing.
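A quick coverage simulation (illustrative) of that textbook statement: about 95% of such intervals catch the true mean across repeated sampling, but any single interval either does or does not.

```r
set.seed(99)
true_mean <- 10
covers <- replicate(10000, {
  x  <- rnorm(30, mean = true_mean, sd = 3)  # one study's sample
  ci <- t.test(x)$conf.int                   # its 95% confidence interval
  ci[1] <= true_mean && true_mean <= ci[2]
})
mean(covers)  # close to 0.95 over many studies, but unknowable for any single one
```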
We learned how to convert the pooled odds ratio from a random-effects model and subsequently calculate the number needed to treat (NNT) or harm (NNH). It’s important to understand that without knowing the event proportions in either the treatment or control groups, we cannot accurately estimate the absolute risk reduction for an individual study or for a meta-analysis. Fascinating indeed! Every day is a school day! 🙌
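The key back-calculation in a few lines (my sketch of the standard formulas, with a made-up pooled OR and control event proportion):

```r
# Assumed inputs, purely hypothetical
or <- 0.65   # pooled odds ratio (treatment vs control)
p0 <- 0.20   # control-group event proportion

odds1 <- or * p0 / (1 - p0)    # treatment odds = OR * control odds
p1    <- odds1 / (1 + odds1)   # convert back to a probability
arr   <- p0 - p1               # absolute risk reduction
nnt   <- 1 / arr               # number needed to treat

c(p1 = p1, arr = arr, nnt = nnt)
```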
Here, we have demonstrated three different methods for calculating NNT with meta-analysis data. I learned a lot from this experience, and I hope you find it enjoyable and informative as well. Thank you, @wwrighID, for initiating the discussion and providing a pivotal example by using the highest weight control event proportion to back-calculate ARR and, eventually, NNT. I also want to express my gratitude to @DrToddLee for contributing a brilliant method of pooling a single proportion from the control group for further estimation. Special thanks to @MatthewBJane, the meta-analysis maestro, for guiding me toward the correct equation to calculate event proportions, with weight estimated by the random effect model. 🙏
What an incredible journey it has been! I’m thoroughly enjoying working with Stan code, even though I don’t yet grasp all the intricacies. We’ve already tackled simple linear and logistic regressions and delved into the application of Bayes’ theorem. Now, let’s turn our attention to the fascinating world of Mixed-Effect Models, also known as Hierarchical Models.
Diving into this, we’re exploring how using numbers to express our certainty/uncertainty, especially with medical results, can help sharpen our estimated ‘posterior value’ and offer a solid base for learning and discussions. We often talk about specifics like sensitivity without the nitty-gritty math, but crafting our own priors and using a dash of Bayes and visuals can really spotlight how our initial guesses shift. Sure, learning this takes patience, but once it clicks, it’s a game-changer – continuous learning for the win!
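The core update in numbers (my own sketch, with hypothetical test characteristics): prior probability plus sensitivity and specificity in, posterior probability out.

```r
post_prob <- function(prior, sens, spec) {
  # P(disease | positive test) via Bayes' theorem
  (sens * prior) / (sens * prior + (1 - spec) * (1 - prior))
}

post_prob(prior = 0.10, sens = 0.90, spec = 0.80)  # a mediocre test: 10% -> ~33%

# How the posterior shifts as our initial guess (the prior) changes
curve(post_prob(x, sens = 0.90, spec = 0.80), from = 0, to = 1,
      xlab = "prior probability", ylab = "posterior after a positive test")
```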
I learned a great deal throughout this journey. In the second part, I gained knowledge about implementing logistic regression in Stan. I also learned the significance of data type declarations for obtaining accurate estimates, how to use the posterior to predict new data, and what the generated quantities block in Stan is for. Moreover, having a friend who is well-versed in Bayesian statistics proves invaluable when delving into the Bayesian realm! Very fun indeed!
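For the posterior-prediction piece, a generic R sketch (not the post’s Stan code), assuming we already have posterior draws of an intercept and slope from a logistic model:

```r
# Pretend these are posterior draws extracted from a fitted Stan logistic model
set.seed(3)
draws <- data.frame(alpha = rnorm(4000, -1.0, 0.1),
                    beta  = rnorm(4000,  0.8, 0.1))

x_new <- 1.5                                       # a new covariate value
p_new <- plogis(draws$alpha + draws$beta * x_new)  # one predicted probability per draw

mean(p_new)                      # posterior mean prediction
quantile(p_new, c(0.025, 0.975)) # 95% credible interval for the prediction
```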
There is a lot to learn about Bayesian statistics, but it’s fun, exciting, and flexible! I thoroughly enjoyed the beginning of this journey. There will be learning curves, but there are so many great people and resources out there to help us get closer to understanding the Bayesian way.
Sending key presses to another device using software that emulates a keyboard, but isn't a physical keyboard, is a fascinating concept. We understand that in the Linux/Unix environment and with Python, this can be accomplished through low-level programming. But can the R programming language achieve the same feat? If it can, then how does it work?
Interaction adventures through simulations and gradient boosting trees using the S-learner approach. I hadn’t realized that lightGBM and XGBoost could reveal interaction terms without explicit specification. Quite intriguing!
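A minimal S-learner sketch with the xgboost R package (illustrative, on hypothetical simulated data): treatment goes in as just another feature, and the fitted trees pick up the interaction on their own.

```r
library(xgboost)

set.seed(4)
n <- 4000
x <- rnorm(n)
t <- rbinom(n, 1, 0.5)
y <- 1 + 0.5 * x + 1.5 * t * (x > 0) + rnorm(n)   # treatment helps only when x > 0

# S-learner: one model, treatment included as an ordinary feature
fit <- xgboost(data = as.matrix(cbind(x, t)), label = y,
               nrounds = 200, max_depth = 3, eta = 0.1,
               objective = "reg:squarederror", verbose = 0)

# Predict under t = 1 and t = 0 for everyone; the difference estimates the effect
tau_hat <- predict(fit, as.matrix(cbind(x, t = 1))) -
           predict(fit, as.matrix(cbind(x, t = 0)))
tapply(tau_hat, x > 0, mean)   # near 0 when x <= 0, near 1.5 when x > 0
```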
I’m delighted that R users can have access to the incredible Hugging Face pre-trained models. In this demonstration, we provide a straightforward example of how to utilize them for sentiment analysis using GPT-generated synthetic data from evaluation comments. Let’s go!
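One possible route, sketched here with reticulate and the Python transformers pipeline (this assumes a Python environment with transformers installed; illustrative only, not the post’s exact code):

```r
library(reticulate)

# Assumes a Python environment with the 'transformers' package available
transformers <- import("transformers")
classifier   <- transformers$pipeline("sentiment-analysis")  # downloads a default model

comments <- c("The lecture was engaging and well organized.",
              "I was confused for most of the session.")
classifier(comments)   # returns a label and a score for each comment
```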
The PyWhy Causal-learn Discord community is fantastic! The package documentation is equally impressive, making experiential learning both fun and informative. Truly, it’s another exceptional tool for causal discovery at our fingertips!
Get ready for a thrill ride in causal discovery! We’re diving into gCastle, a Python package, right in R to amp up our skills. Let’s orchestrate our prior knowledge and nail that true DAG. 🔥
Simulating a binary dataset, coupled with an understanding of the logit link and the linear formula, is truly fascinating! However, we must exercise caution regarding our adjustments, as they can potentially divert us from the true findings. I advocate for transparency in Directed Acyclic Graphs (DAGs) and emphasize the sequence: causal model -> estimator -> estimand.
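A bare-bones version of that kind of simulation (illustrative): put the linear predictor on the log-odds scale, push it through the inverse logit, and draw Bernoulli outcomes.

```r
set.seed(5)
n <- 5000
age    <- rnorm(n, 60, 10)
smoker <- rbinom(n, 1, 0.3)

# True model on the log-odds scale
logit_p <- -8 + 0.1 * age + 0.9 * smoker
y <- rbinom(n, 1, plogis(logit_p))               # inverse logit, then Bernoulli draws

coef(glm(y ~ age + smoker, family = binomial))   # recovers roughly (-8, 0.1, 0.9)
```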
Saving can be enjoyable! If you’re planning to cut down on takeout orders, why not use past data to simulate your savings? Let it inspire and motivate your future dining-in decisions! 👍
Beware of what we adjust. As we have demonstrated, adjusting for a collider variable can lead to a false estimate in your analysis. If a collider is included in your model, relying solely on AIC/BIC for model selection may provide misleading results and give you a false sense of achievement.
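A compact demonstration of the trap (my own toy example): x and y are truly unrelated, but conditioning on their common effect manufactures an association, and AIC still prefers the wrong model.

```r
set.seed(6)
n <- 5000
x <- rnorm(n)
y <- rnorm(n)              # truly independent of x
c_ <- x + y + rnorm(n)     # collider: caused by both x and y

coef(lm(y ~ x))            # ~0, the truth
coef(lm(y ~ x + c_))       # x now looks strongly (and spuriously) related to y

AIC(lm(y ~ x), lm(y ~ x + c_))   # the collider model "wins" on AIC anyway
```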
Front-door adjustment: a superhero method for handling unobserved confounding by using mediators (if present) to estimate causal effects accurately.
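A linear toy example of the idea (illustrative): with U unobserved, the naive regression of Y on X is confounded, but the product of the X→M path and the M→Y path (adjusting for X) recovers the true effect.

```r
set.seed(8)
n <- 20000
u <- rnorm(n)                      # unobserved confounder
x <- 0.8 * u + rnorm(n)
m <- 0.6 * x + rnorm(n)            # mediator: the only path from x to y
y <- 0.9 * m + 1.2 * u + rnorm(n)  # true effect of x on y = 0.6 * 0.9 = 0.54

coef(lm(y ~ x))["x"]                             # biased by u
coef(lm(m ~ x))["x"] * coef(lm(y ~ m + x))["m"]  # front-door estimate, ~0.54
```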
I had the opportunity to share our journey to data science in medical education.
Which strategy is optimal for dollar cost averaging? Let’s play with data!
Bring a textbook to life by using a simple natural language processing method (Ngram) to guide focused reading and build a robust differential diagnosis.
I didn’t want to read the textbook in sequence. Hence, I figured that if I read a paragraph a day in a random chapter, I might be able to benefit from random learning!
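On the n-gram idea itself, a bare-bones bigram count in base R (toy text; the post may well use a proper NLP package):

```r
# Count the most frequent word pairs (bigrams) in a chunk of text, base R only
text  <- "chest pain and shortness of breath suggests acute coronary syndrome
          chest pain with fever suggests pneumonia"
words <- strsplit(tolower(text), "\\s+")[[1]]

bigrams <- paste(head(words, -1), tail(words, -1))   # consecutive word pairs
sort(table(bigrams), decreasing = TRUE)[1:5]         # "chest pain" tops this toy list
```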
R you doing it?
Just Because in a true sense :D
How to solve this… 2 ? 1 ? 6 ? 6 ? 200 ? 50 = 416.56
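One way to attack it: brute-force the operator slots in R. This sketch (mine) assumes standard operator precedence and tries +, -, *, /, and ^:

```r
ops  <- c("+", "-", "*", "/", "^")
nums <- c(2, 1, 6, 6, 200, 50)

# Every combination of five operators for the five slots
grid <- expand.grid(rep(list(ops), 5), stringsAsFactors = FALSE)

for (i in seq_len(nrow(grid))) {
  o <- unlist(grid[i, ])
  expr <- paste(nums[1], o[1], nums[2], o[2], nums[3], o[3],
                nums[4], o[4], nums[5], o[5], nums[6])
  if (isTRUE(all.equal(eval(parse(text = expr)), 416.56))) print(expr)
}
```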
Brief introduction: the 100 prisoners problem is a problem in probability theory and combinatorics. In this challenge, 100 numbered prisoners must each find their own number in one of 100 drawers in order to survive. Rules: we have 100 prisoners, labeled 1, 2, …, 100 on their clothes; a room filled with 100 boxes, labeled 1, 2, …, 100 on the outside; inside each box there is a number from 1, 2, …, 100; only one prisoner may enter the room at a time; each prisoner may open at most 50 boxes and cannot communicate with the other prisoners; if a prisoner finds his/her/their number, he/she/they will exit the room and not be able to talk to the other prisoners.
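For anyone who wants to poke at it numerically, a small simulation sketch (mine) comparing random guessing with the well-known cycle-following strategy:

```r
set.seed(10)

one_trial <- function(strategy = c("cycle", "random")) {
  strategy <- match.arg(strategy)
  boxes <- sample(100)                       # box i contains the number boxes[i]
  all(vapply(1:100, function(prisoner) {
    if (strategy == "random") {
      opened <- sample(100, 50)              # 50 boxes at random
    } else {
      opened <- integer(50)
      nxt <- prisoner                        # start at the box with your own number
      for (k in 1:50) {
        opened[k] <- boxes[nxt]              # open the box, then go to the box it names
        nxt <- boxes[nxt]
      }
    }
    prisoner %in% opened                     # did this prisoner find their number?
  }, logical(1)))                            # TRUE only if every prisoner succeeds
}

mean(replicate(2000, one_trial("cycle")))    # ~0.31: everyone survives surprisingly often
mean(replicate(2000, one_trial("random")))   # essentially never
```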