Temperature in Colombia: Understanding Factors & Predictive Model Proposal
Authors: Ana Cruz, Andrés Riaño, Gabriela Rincón, Oscar Rosero, and Elsa Quicazán
Climate change is affecting Colombia, and responding to it requires a deeper understanding of the country's temperature patterns and the factors that influence them. EcoTemp emerged as a tool for understanding climate change in Colombia. It collects information from the main national entities and offers:
- Three scenarios for predicting the average national temperature,
- A description of the annual behavior of temperature-affecting factors, and
- Interactive graphs of historical temperature, deforestation, and GDP data at the departmental level.
Our team combined its expertise in mathematics, economics, engineering, and biology with the Data Science training we received from DS4A.
The main findings that came from our data analysis and modeling were:
- The change in temperature, which is an indicator of climate change, was almost 1°C in Colombia between 1990 and 2019.
- The urban share of the population increased from 69% in 1990 to 81% in 2019, already well above the 68% urban share estimated globally for 2050.
- Forest cover decreased from 57% in 1990 to 52% in 2019. The department with the highest deforestation over the years is Caquetá, located in the Amazon region.
- The Gross Domestic Product (GDP) is concentrated in regions other than those with the highest deforestation, meaning that the exploitation of natural wealth is not reflected in the improvement of the local economy.
- The number of head of cattle follows a pattern similar to that of methane production from agriculture.
- Three possible scenarios for the temperature in 2040 and its increase relative to the 1990-2000 period are:
  - If energy consumption, gas emissions, GDP, population growth, and deforestation continue at their current rate: 25°C, an increase of 1.58°C.
  - If they decrease: 24.5°C, an increase of 1.13°C.
  - If they increase considerably (7% per year each): 27°C, an increase of 4°C.
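To give a sense of what "7% per year each" means in the third scenario, the short sketch below compounds a constant annual growth rate. The start year is an assumption (2019, the last year cited in the findings above); the text does not state the period over which the rate is applied.

```python
def compound_factor(annual_rate: float, years: int) -> float:
    """Return the multiplier reached after `years` of growth at `annual_rate` per year."""
    return (1.0 + annual_rate) ** years


# Assumption: the scenario runs from 2019 to 2040; the source does not say so explicitly.
START_YEAR, END_YEAR = 2019, 2040
years = END_YEAR - START_YEAR

factor = compound_factor(0.07, years)
print(f"After {years} years at 7% per year, a quantity is multiplied by {factor:.2f}")
```

Under that assumption, any quantity growing at 7% per year would roughly quadruple by 2040, which helps explain why this scenario is described as a considerable increase.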
The use case diagram (fig 1) shows the relationships between actors and use cases within our system.
The dashboard was created using Dash, HTML, CSS, Bootstrap, and the Plotly library. It includes line plots, bar plots, choropleth maps, and interactive line plots with sliders. The model predicts temperature from forest area, energy consumption, population, greenhouse gas emissions, and GDP. The datasets were obtained from sources such as IDEAM, Our World in Data, and the World Bank database. The project was implemented using Python libraries such as Matplotlib, Seaborn, Bokeh, Plotly, Pandas, NumPy, Scikit-learn, TensorFlow, and Statsmodels. Our application code and datasets are located in the GitHub repository https://github.com/Osc2405/DS4A_team38
Data analysis and computation
The goal of the study was to predict temperature in Colombia. We performed Exploratory Data Analysis (EDA) and observed the relationships between temperature and other variables such as population, forest area, energy consumption, greenhouse gas emissions, and GDP. Figure 2 presents some of the main trends observed in the data.
Some of the trends in the data may be linked to significant economic, social, and political events. For example, the 1999 economic crisis, the 2011 El Niño phenomenon, the 2014 Promotion Law 1715 (which eliminated import tariffs for solar panels), Colombia’s membership in the International Renewable Energy Agency (IRENA) in 2015, the signing of the Peace Agreement in 2016, and the 2018 emergency at Hidroituango may all have affected the trends. Understanding these events and their influence helps to paint a more complete picture, but analyzing them was beyond the scope of our study.
For the model, we chose variables based on how each independent variable interacts with temperature and with the other independent variables. The selected variables were forest area, fossil fuel consumption, renewables consumption, total population, total greenhouse gas emissions, and GDP. Existing models like En-ROADS predict global temperature and greenhouse gas emissions using data from social, economic, law enforcement, and production sources, but our aim was to create a model specifically focused on Colombia.
The data was split chronologically into a training set covering 1990-2015 and a test set covering 2016-2020. The majority of the data (86%) was used for training, and 4 entries were held out for testing. We first fitted an ordinary least squares (OLS) model, but multicollinearity among the predictors made its results unreliable.
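A chronological split and a multicollinearity check of this kind can be sketched as follows. This is an illustration only, not the project's code: the data is synthetic, the helper names are hypothetical, and the year range 1990-2019 is an assumption chosen so that the split yields 4 test entries. The check uses the variance inflation factor (VIF), a common diagnostic in which values far above 10 are usually read as a sign of strong multicollinearity.

```python
import numpy as np


def chronological_split(years, first_test_year):
    """Return boolean masks (train, test) that split rows by year, without shuffling."""
    years = np.asarray(years)
    test = years >= first_test_year
    return ~test, test


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic stand-in data (NOT the project's dataset): two predictors that both
# trend upward over time, plus one unrelated column for contrast.
rng = np.random.default_rng(0)
years = np.arange(1990, 2020)  # assumption: one row per year, 1990-2019
population = 33.0 + 0.55 * (years - 1990) + rng.normal(0, 0.1, years.size)
gdp = 50.0 + 9.0 * (years - 1990) + rng.normal(0, 3.0, years.size)
unrelated = rng.normal(0, 1.0, years.size)
X = np.column_stack([population, gdp, unrelated])

train, test = chronological_split(years, first_test_year=2016)
vifs = variance_inflation_factors(X[train])
print("train rows:", int(train.sum()), "test rows:", int(test.sum()))
print("VIF (population, gdp, unrelated):", vifs.round(1))
```

Predictors that all grow steadily over the same period, as population and GDP tend to do, are nearly linear functions of one another, which is what makes OLS coefficient estimates unstable.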
Chosen model: Bayesian Ridge Regression Model
The final model is a Bayesian ridge regression (fig 3), chosen because its ability to handle uncertainty and incorporate prior knowledge gives more accurate and robust results than the OLS model. It was implemented using the PyMC3 library in Python and predicted temperature in Colombia with a high degree of accuracy. The final application presents the model's results, showing how temperature in Colombia relates to environmental, economic, and social factors.
Fig 3. Bayesian Ridge Regression Model Results
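The idea behind a Bayesian ridge regression can be illustrated with a minimal NumPy sketch. This is not the project's PyMC3 implementation: it uses the closed-form posterior of a linear model with a zero-mean Gaussian prior on the weights, and it assumes the noise and weight precisions are fixed and known, whereas a full Bayesian ridge model would also infer them. The data is synthetic, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_bayesian_ridge(X, y, noise_precision, weight_precision):
    """Posterior mean and covariance of w for y = Xw + noise, prior w ~ N(0, I / weight_precision)."""
    n_features = X.shape[1]
    precision = weight_precision * np.eye(n_features) + noise_precision * X.T @ X
    cov = np.linalg.inv(precision)
    mean = noise_precision * cov @ X.T @ y
    return mean, cov


def predict(X_new, mean, cov, noise_precision):
    """Predictive mean and standard deviation for each row of X_new."""
    mu = X_new @ mean
    var = 1.0 / noise_precision + np.einsum("ij,jk,ik->i", X_new, cov, X_new)
    return mu, np.sqrt(var)


# Synthetic stand-in data (NOT the project's dataset): two nearly collinear predictors.
rng = np.random.default_rng(1)
t = np.arange(30.0)
X = np.column_stack([t + rng.normal(0, 0.5, t.size), 2 * t + rng.normal(0, 0.5, t.size)])
y = 23.0 + 0.03 * t + rng.normal(0, 0.1, t.size)

# Chronological split, then standardize with training statistics only.
train, test = slice(0, 26), slice(26, 30)
x_mean, x_std, y_mean = X[train].mean(axis=0), X[train].std(axis=0), y[train].mean()
Xs, yc = (X - x_mean) / x_std, y - y_mean

noise_precision = 1.0 / 0.1**2  # assumption: noise level treated as known
weight_precision = 1.0          # assumption: fixed prior strength

w_bayes, w_cov = fit_bayesian_ridge(Xs[train], yc[train], noise_precision, weight_precision)
w_ols, *_ = np.linalg.lstsq(Xs[train], yc[train], rcond=None)  # OLS for comparison

pred_mean, pred_std = predict(Xs[test], w_bayes, w_cov, noise_precision)
pred_mean = pred_mean + y_mean  # undo the centering of y

print("OLS weights:     ", w_ols.round(3))
print("Bayesian weights:", w_bayes.round(3))
print("predictions:", pred_mean.round(2), "+/-", pred_std.round(2))
```

The prior shrinks the weights toward zero, which stabilizes the estimates when predictors are correlated, and the posterior covariance gives each prediction an uncertainty estimate rather than a single point value.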
The dashboard was developed using Dash, HTML, CSS, Bootstrap, and the Plotly library. It showcases a variety of visualizations, such as line plots, bar plots, choropleth maps, and interactive line plots with sliders. The “Home” section (fig 4) gives an overview of the EcoTemp project’s motivation and key data. The “Prediction” section (fig 5) presents the predicted temperature change at the national level. The “Description” section visualizes temperature-related variables at the departmental (figs 6, 7) and national (fig 8) levels, including heat maps, indicators for the selected year, and a slider to compare trends in variables such as energy consumption and greenhouse gas emissions. The “About Us” section (fig 9) presents the project team, with personal information and social media links, and includes the team’s logo, which symbolizes the impact of climate change.
Fig 4. EcoTemp Home Section providing a brief overview of the project's motivation and presenting key data obtained from the information analysis.
Fig 5. EcoTemp Prediction Section showing the model's forecast of temperature at the national level with the best, medium, and worst options for future temperature change. The figure also displays the changes in variables such as forest area, fuel consumption, renewable energy consumption, population, greenhouse gases, and GDP in the 3 proposed scenarios.
Fig 6. EcoTemp Description Section displaying the visualization of variables related to temperature change at the national and departmental level. The section includes a heat map showcasing temperature, GDP, and deforestation data. The map is created using a GeoJSON map of Colombia divided by department, with a code assigned to each department. The forest data is extracted from a pandas Series and is assigned to each area using the department ID.
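The matching of map areas to data values by department ID, as described for Fig 6, can be sketched with the standard library alone. Everything here is illustrative: the GeoJSON is a two-feature toy with made-up triangle geometry, the property name `DPTO` and the helper function are hypothetical, and the values are placeholders rather than real forest data.

```python
import json

# Toy GeoJSON (assumption): real department polygons have many more vertices,
# and the actual property names in the project's file may differ.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"DPTO": "18", "NOMBRE": "Caquetá"},
   "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
  {"type": "Feature", "properties": {"DPTO": "05", "NOMBRE": "Antioquia"},
   "geometry": {"type": "Polygon", "coordinates": [[[1, 1], [2, 1], [2, 2], [1, 1]]]}}
]}
"""

# Placeholder values keyed by department code (NOT real data).
forest_by_code = {"18": 1.0, "05": 2.0}


def values_by_feature(geojson, values, id_property):
    """Map each feature's ID to its data value, or None when no value exists for that ID."""
    result = {}
    for feature in geojson["features"]:
        feature_id = feature["properties"][id_property]
        result[feature_id] = values.get(feature_id)
    return result


departments = json.loads(geojson_text)
joined = values_by_feature(departments, forest_by_code, id_property="DPTO")
print(joined)
```

Plotly's choropleth functions perform a similar join internally: they take the GeoJSON through a `geojson` argument and identify the matching property through `featureidkey`, so keeping department codes as consistently formatted strings on both sides avoids silent mismatches.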
Fig 7. EcoTemp Description Section featuring indicators for the last year in the selected range.
Fig 8. EcoTemp Description Section showcasing the slider used to change the year in the maps and compare trends in variables such as energy consumption, greenhouse gas emissions, population, and land use. The section also presents line charts contrasting pairs of variables, such as cattle vs. gas emissions, population vs. GDP, and GDP vs. CO2 emissions, with each pair showing a positive correlation.
Fig 9. EcoTemp About Us Section presenting information on the project team and its origin. The section includes the team's logo, symbolizing the impact of climate change on Colombian páramos, a source of life. The team is represented by a frailejón icon. Personal information, such as LinkedIn contacts and other social media links, is also displayed.
The solution was first deployed on a virtual machine on Google Cloud Platform (GCP), and later on the serverless services of GCP and Heroku. The serverless application deployed on Cloud Run has 2 GB of memory, 4 CPUs, a 300-second request timeout, and a maximum of 10 instances.
The study provided an understanding of temperature behavior in Colombia and the factors that influence it. The predictive model and interactive dashboard offered valuable insights and information for informed decision making. The deployment in serverless services on Google Cloud Platform offers scalability and accessibility to users.
This was the final project for the Data Science for All Certification. We would like to express our gratitude to the Ministerio de Tecnologías de la Información y Comunicaciones for providing us with this opportunity and to all the individuals who offered their advice and expertise. Special thanks to our TAs, Aura Forero and David Alfredo Quintero Olaya, and to the Data Science for All program by Correlation One, under the leadership of Oscar Adolfo Pérez Tuta, for their invaluable contributions. We also acknowledge the valuable insights from Luis Enrique Carreño Herreño.
Software: Python (pandas, numpy, scipy, sklearn, statsmodels, plotly, ipywidgets, cufflinks, matplotlib, seaborn), Dash, HTML, CSS.
- View our project and its datasets at https://github.com/Osc2405/DS4A_team38/
Category: Modeling, visualization