Welcome!
My name is Lucas.
I’m a data scientist who hails from the world of bioinformatics, and I have been working with data for over a decade. My career began with research in evolutionary biology, specifically molecular evolution using Bayesian inference and maximum likelihood methods. These were the subjects of my bachelor’s and master’s degrees, both earned at the Federal University of Rio de Janeiro (UFRJ). This background formed the basis of my knowledge of statistics and computer programming, since most of the tools in today’s data science stack were already part of my everyday life back then.
I also worked for many years at Brazil’s National Centre for Flora Conservation, doing data and environmental analysis to help my country’s Ministry of Environment consolidate public policies on threatened species and economic activities. This was a period of huge personal growth, since I worked on politically sensitive topics in partnership with several experts and national and international organizations.
Both of these experiences strengthened my storytelling and soft skills, since explaining technical topics to non-technical audiences was part of my daily activities, alongside analyzing data and writing reports. With the rise of data science and analytics, I naturally took the plunge into the job market, since most of my skills were highly transferable and I find helping people and enterprises solve problems greatly rewarding.
Most of my current work revolves around helping people create statistically sound metrics (such as churn or client activity rates), analyze customer behavior and segmentation, run experiments (A/B testing) and build predictive models. My statistical knowledge and analytical care are my most relied-upon skills.
Needless to say, I love technology and science, an environment in which I feel at home. My academic background made me very data-oriented, which I consider of the utmost importance if the goal is to make truly data-driven choices. Although statistical and machine learning modeling is a staple of most data-related work, I firmly believe in the importance of properly understanding and handling the data, as well as carefully designing experiments, as a means to create robust and trustworthy analyses.
These are the tools I use most often:
- R, Python, SQL and Spark: My heart lies with R, which I believe is amazing for statistical analyses and data visualization. I spend about the same amount of time programming in Python, though, and SQL’s efficiency at extracting data from databases is essential to most data-related work. I also use PySpark and sparklyr a lot. I like to see myself as language-agnostic.
- Statistics and Machine Learning: I use statistical tools to answer questions, test hypotheses and check for robustness. I also implement machine learning models, such as logistic regressions, decision trees and unsupervised models in general. A large portion of this process relies on finding the adequate tool for the task at hand and fine-tuning from that point. A statistical mindset is particularly important for experimental design (neglected more often than it should be) and for developing metrics that make sense and are not driven purely by beliefs or arbitrary thoughts.
- Data Viz: Essential for identifying patterns, conveying messages and generating insight, data visualization can make all the difference in understanding the information contained in data. I create customized plots for reports, projects and general purposes. I also have experience developing dashboards (mainly in Looker). And although it is not the main objective, I think good design makes everything more pleasant for the audience.
- Communication: Frequently, the main audience interested in the outcome of an analysis is not the one that understands what goes on under the hood. This information therefore needs to be translated so that complex topics become easy to understand. Scheduled reports are also a frequent activity for any data analysis team. Academia prepares us for both of these scenarios, so I help with scientific writing, presentation development and clear communication, occasionally making use of tools such as R Markdown for different types of publications.
A little bit of all that can be found on my blog!
Blog, portfolio & snippets
A history of Brazilian plant descriptions
A short temporal analysis of the rate at which Brazilian plants have been described. In it, I look at how the number of descriptions has behaved over the years and how it differs between centuries, taxonomic groups and Brazilian states. Read more