Discovering the Power of Data to Predict Forest Fires

by Diana Drake

Fire season on the West Coast of the U.S. has been nothing short of disastrous in 2020. The numbers are staggering. More than 5 million acres combined have burned in California, Oregon and Washington so far, in some of the largest fires ever recorded. By all accounts, 2020 is already the most active fire year on record for the West Coast.

And the flames rage on.

“The El Dorado Fire, which was caused by [pyrotechnics] from a gender-reveal party, is 15 miles from my house in California,” says Emily Fu, an undergraduate in business analytics at the Wharton School who is now back in Philadelphia for her senior year. “For us, it has become such a normal thing. Every couple of years the mountain catches fire and if it’s really bad, we’ll evacuate. I call home pretty often and my family didn’t even mention it. I asked my dad, ‘How close is the fire?’ He sent me a video and said, ‘Oh, you can see the flames.’”

A Little Modern Data Mining

While the wildfires of recent years may no longer stoke Emily’s anxiety, the numbers rising out of the smoke and ash have ignited her curiosity.

Emily and her classmates Zhun Yan Chang and Melisa Lee, also Wharton seniors, began digging deeply into wildfire data last year as students in Prof. Linda Zhao’s Modern Data Mining class at Wharton. Data mining is the practice of examining large databases – for instance “Wildfires in the U.S. for the Past 30 Years” – in order to generate new insights.

Inspired by Emily’s personal connection to the fires and their collective fascination with data analytics, the team set out to answer a specific question: “Given its features, can we predict the size of a fire?”

The team ultimately created a data model, a descriptive diagram of relationships between various types of information, for their class project that they hope may someday be used to prevent forest fires in California. Their work earned them a spot among the industry professionals, Ph.D. students and professors presenting at the Women in Data Science Conference at the University of Pennsylvania in February 2020.

They were excited to tell their fire-research story, inspired by weeks of data collecting and analysis. “We found out that a very small percentage of fires, less than 1%, cause 80% of the destruction. That told us if we can prevent that less than 1% and focus on stopping those fires, we can cut down on a big portion of the destruction,” notes Emily. “So, what’s causing those fires? Most of these are fires from lightning striking a tree or forest…When we ran the model, it spit out some specific vegetation types that were predictors in how big a fire was — like timber litter or dead branches that fall on the ground and are dry. They are causing these large fires. Our storyline: These lightning fires are the most destructive, they spike up in the summer, and are also linked with vegetation.”
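To make the "less than 1% of fires, 80% of the destruction" idea concrete, here is a minimal sketch of how such a concentration figure could be computed. It is not the students' model or data: the fire sizes are synthetic draws from a heavy-tailed distribution, and the function name `share_from_largest` is made up for illustration.

```python
import random


def share_from_largest(sizes, top_fraction=0.01):
    """Return the fraction of total burned area that comes from the largest fires."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    ranked = sorted(sizes, reverse=True)
    # Keep at least one fire so that a tiny dataset still gives an answer.
    n_top = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Synthetic, heavy-tailed fire sizes in acres (an assumption, NOT real wildfire data).
rng = random.Random(0)
fire_sizes = [rng.paretovariate(1.1) for _ in range(10_000)]

share = share_from_largest(fire_sizes, top_fraction=0.01)
print(f"Largest 1% of fires account for {share:.0%} of acres burned")
```

With real records, the same calculation would show how much of the total damage a small group of fires is responsible for, which is the kind of result that pointed the team toward the largest fires.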

Plans to share their findings with the California government have been delayed by the pandemic. Still, the three data scientists-in-training were eager to pass along some takeaways from their research process when Wharton Global Youth checked in with them last week.

Fruit Trees or Lightning?

First, data science demands that you extract the most compelling story behind the numbers. You can build models and encode things, and suddenly you’re left with 30 variables that, for example, influence fire size. Finding the commonalities between all these things? That’s up to you. And arriving at the best conclusions will take time. “Once you get the proper data, you shouldn’t straight out build a model,” says Zhun, a finance major from Malaysia. “You should look at the data, figure out how it is going to fit into the big picture and the impacts of each variable. Use data analysis, simple charts and visualizations to have an idea of what your final results should be.”
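Zhun's advice to look at the data before modeling might start with something as simple as a summary table. The sketch below groups a handful of fire records by cause and reports counts, total acres and median size. The records, the field names `cause` and `acres`, and the helper `summarize_by_cause` are all hypothetical and are not taken from the team's dataset.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records; field names and values are illustrative only.
fires = [
    {"cause": "lightning", "acres": 12000.0},
    {"cause": "lightning", "acres": 450.0},
    {"cause": "campfire", "acres": 3.5},
    {"cause": "campfire", "acres": 0.2},
    {"cause": "equipment", "acres": 40.0},
]


def summarize_by_cause(records):
    """Group fire sizes by cause and compute simple descriptive statistics."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["cause"]].append(record["acres"])
    return {
        cause: {
            "fires": len(acres),
            "total_acres": sum(acres),
            "median_acres": median(acres),
        }
        for cause, acres in grouped.items()
    }


summary = summarize_by_cause(fires)
# List causes from most to least total area burned.
for cause, stats in sorted(
    summary.items(), key=lambda item: item[1]["total_acres"], reverse=True
):
    print(f"{cause:<10} fires={stats['fires']} total={stats['total_acres']:.1f} "
          f"median={stats['median_acres']:.1f}")
```

A table like this gives a rough expectation of which variables matter, so that the output of a later model can be checked against it.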

Then you begin to shape the story that you want to tell. “Our team could have told many stories with our data,” notes Melisa, who is pursuing a career in marketing and business analytics. “An alternate story was that the vegetation type around orchards is also very flammable because of the fruits. But that story would not have driven as much impact or incited people to act as much as our lightning story. Social impact should be at the center of your data project. Building a model should be for some ultimate purpose to let people know that something is happening and you have the data to back it up.”

Emily, Melisa and Zhun are convinced that data has immense power (did you know it has replaced oil as the world’s most valuable resource?). They plan to spend part of their senior year working to get their project into the hands of decision makers who are confronting the worsening West Coast wildfires.

“Our model highlighted a lot of high-risk areas that we can target. We can target these by using prescribed burns, which are smaller controlled fires that will burn up all the timber litter by summer so that once fire season comes along, there’s not much to catch on fire,” says Emily. “We want our model to motivate the California State Department to focus on having more prescribed burns, but also to make sure the ones we do have are well targeted so we’re preventing the larger fires going forward.”

Conversation Starters

Why is the story behind the numbers so important to data analytics? How did Emily, Melisa and Zhun shape their findings into a story?

How do you describe the power of data? Why is it such a valuable and important resource worldwide?

Do you take statistics or data analytics classes in high school? Are they focused only on the numbers or do they provide context to the real world? What are some examples of how data projects have become more relevant for you? Describe one in the comment section of this article.

5 comments on “Discovering the Power of Data to Predict Forest Fires”

  1. Emily and her team inspired me to open my eyes and realize that data analysis can be applied to prevent not only forest fires, but also a variety of other critical environmental issues. Learning about the power of data that helped Emily and her team resolve the problem of forest fires provoked an interesting idea in my head. “Why don’t I use data analysis to solve the environmental issues in my own country, South Korea?”

    Though I was born in South Korea, I spent most of my childhood in Germany because of my dad’s business. When my father told me and my mom that we were finally heading back to Korea, I was eager to go back to my home country. From all the nostalgic memories of Korea, I especially missed going to the summer house that my family owned in the countryside of South Korea called Hongcheon. I still remember all the good memories I had there, such as catching fish in the clear lakes, stargazing at night, scavenging for wild fruits, and hiking up and down the hills. However, when I finally arrived in Hongcheon, it did not take me a long time to realize that many things in Hongcheon have changed since I have left Korea. The first thing that caught my eyes was the massive chemical factories lined up throughout Hongcheon lake. I couldn’t catch sight of any fish in the lakes, the night sky was foggy with no signs of stars, the places where I scavenged for wild fruits were replaced with construction sites.

    Yes, I have listened to Greta Thunberg’s inspiring speech at the UN Climate Action Summit and watched multiple documentaries promoting sustainable growth. However, I had always neglected environmental pollution around the world because I thought it would not affect my daily life. After witnessing the changes in Hongcheon around my weekend house in 9th grade, I learned how immature and ignorant I was about greater societal issues. After recognizing this shortcoming, I was motivated to take small steps and put them into action. After researching different environmental organizations, I started to donate $20 a month (money that my past self would have carelessly spent on new clothes) to the Korean Federation for Environmental Movements (KFEM), an organization that has adopted policies to lower GHG emission rates and to expand the use of renewable energy.

    However, I always felt that monthly donations are not enough to actually help sustain the environment, and the answer was in the article: use data analysis to make a sustainable world. Knowing how much Hongcheon has changed from my childhood memory, I wanted to alert people to what I have seen and noticed by presenting data as evidence. I started to collect climate information about Hongcheon from the Korean national weather center and calculated the change in average temperature in July from 2010 until now. I found that the average daytime temperature rose from 32.5° to 35.1° in just 10 years; if that pace of 2.6° per decade continues, the average daytime temperature of Hongcheon would reach 42.9° by 2050.

    After a few days, I conducted another data analysis experiment with a pH Sensor that I borrowed from Mr. Hershfield, my chemistry teacher. I used it to measure how polluted the Hongcheon lake is by detecting the acidity of the water at the top and the bottom of the stream. I first hiked up the hill to measure the acidity of the water at the high point of the stream and found out that the pH level was 6.8. Then, I hiked back down to measure the low end of the stream, and while descending, I saw various colossal garbage disposal sites and factories near the bottom stream of the lake. As expected, the resulting pH level of water at the bottom end of the stream was 5.5. With the series of data analysis that I conducted a week ago on how sustainability of Hongcheon is at stake, I emailed the Hongcheon County Office to alert the officials about this situation.

    Furthermore, with Alex, my geek friend who knows much more about data analysis, I decided to create a club called Green Up that focuses on saving our environment through promoting the use of more renewable energy, making donations to widely known environmental organizations, reducing the use of plastic, and much more. With the small but powerful steps that I will take from now on, I hope that I can inspire other members of Generation Z to use data analysis to tackle important social issues that are still overlooked.

  2. This article caught my eye when I was scrolling the website, as I had gone through something similar.

    In October 2020, I also was hit by the west coast fires, specifically the Silverado Fire, which was only a block away from my house. That was the first time I had experienced an evacuation and something that I will never forget: having to go to my school that day with the sky bright orange, evacuating from school in the afternoon, and staying at a hotel that night. I can understand how Emily and her peers were affected by these fires.

    It is amazing that these students went all the way to creating a data point and an analysis on how to prevent these fires from happening. They seem like such inspirations to look up to and allowed me to realize that even students can create things that no one has thought of. This article showed me that I can also learn to make a change even now as a high school student, with things that I am interested in (business and education).

    I hope to use this article as motivation to try something new and make a change within my community.

  3. These Wharton seniors really inspire me. I like how they took their technical skills of data analysis and applied it to a real-life relevant issue. Emily was able to take this threatening fire problem affecting her family and bring it to a bigger stage. If we can focus in on a few of these key factors that increase the chances of fires, then we can really act to minimize their impact on society. I remember how my brother worked with NASA Worldview reflectance images to track the regrowth of vegetation patches after these horrific fires. He was able to increase awareness of the devastation from campfires by showing how long it took for vegetation to recover. I think that Emily and her partners can extend their data mining process to this post-fire phase and come up with a model to help speed up this recovery process.

    My “innovation trigger” is also tingling in another direction. The data mining technique to detect forest fires can be used to curb spending in healthcare. As I shadow a healthcare provider this summer, I constantly think about the socioeconomic trends of the rising cost of medical care. I keep hearing about how just a small percentage of patients are really sick and how these individuals account for the bulk of the spending. An analysis of the 2019 Medical Expenditure Panel Survey data showed that just one percent of the population accounted for 24% of all out-of-pocket spending for health services. Patients in the top 5% made up almost half of all the out-of-pocket spending.

    Identifying factors that contribute to the lion’s share of expenditures by these particular patients could lead to a model to address these factors. What if most expenditures are related to advanced diseases that arise due to poor access? And what if these access problems were from systemic disparities due to race or gender? Addressing these potential socioeconomic issues would bring faces to the forefront of just numbers from statistical data. By telling their compelling stories, we as Gen Z innovators can ultimately reach policy makers who can reverse our current medical spending trend.

  4. Emily, what you did here is amazing! Like a city that has been fighting an enemy stronger than itself, we have needed Emily and her friends’ project for years now. According to the Insurance Information Institute, in 2021, more than 2 million properties were reported to be at high or extreme risk of wildfires in California alone. If that statistic is shocking to you, then it should be painfully obvious that we need a way to predict wildfires, which is exactly what Emily’s model offers. By mining data from many large databases, the group was able to compile the overall trajectory and impact of each variable, allowing them to reach an interesting conclusion: less than 1% of all the forest fires cause 80% of the havoc, and those particular fires are related to vegetation. I knew that a fire needs fuel to keep going, so the same principle should apply to wildfires, but to a larger extent. After doing some research, I found that, according to the National Fire Protection Association, vegetation is responsible for igniting an average of 6,200 homes and 130 million U.S. dollars in direct property damage. But just random tinder or grass would not be enough to cause a forest fire, otherwise there would be a dozen miniature fires blooming somewhere. No, according to NASA, those powerful fires are the results of drawn-out droughts. The drier the kindling, the more likely that events ranging from lightning strikes to ignition via power lines will be able to cause fires. What’s more terrifying is that there could be a feedback loop as a result of the increased fires: fire burns down a forest, which then releases stored carbon into the atmosphere, resulting in higher temperatures, which lead to a perfect situation for wildfires. Not only that, but wildfires cause snow-melt to occur earlier as their soot blackens the snow, causing it to absorb more heat and thus melt. Taken together, this gives wildfires a perfect situation to thrive in.

    So with this hopeless situation in mind, is there a way to fix this?

    Well, Emily proposed that by using the model, they are able to find areas that are prone to fires and start controlled fires to reduce the potential kindling. I think that’s a great idea, but I want to add some extensions to improve on the model.

    First, I believe that the model should also consider areas where there are frequent power outages. How do power outages relate to forest fires? When trees and other vegetation are in the way of infrastructure such as power lines, this could cause not only power outages, but also forest fires. In fact, according to “Reducing Service Interruptions and Wildfire Risks” by Corteva Agriscience, 23.2% of distribution outages are caused by vegetation. Using this knowledge, knowing where power outages are caused by vegetation could give us a precursor for places that are likely to ignite wildfires. So if you widen the scope from just areas with high vegetation to areas that have frequent power outages caused by vegetation, that can give you more leverage to deal with potential fires.

    Second, if the first suggestion is viable, then there must be an alteration in handling areas with high risks. You can’t just start a controlled fire in places with power lines—that would be crazy and dangerous. I suggest you follow in similar fashion to what the Pacific Gas and Electric Company has done: they have initiated the Enhanced Vegetation Management Program (not to be confused with Vegetation Management Program, which, by the way, is another great program you could take inspiration from), in which they not only follow state standards of keeping power lines 4 feet away from vegetation and removing dead or dying trees, but also going beyond the standards by cutting overhanging branches above the power lines. However, do not underestimate the foliage. If we use mechanical mowing as our solution, that can actually exacerbate the problem by inciting the plants to regrow at a faster rate. Many experts are promoting the use of selective herbicides to control the plants. By selectively controlling the plants, not only do you get to keep plants that you want, but the maintenance would be less costly and less frequent.

    Third, you can take inspiration from indigenous peoples for how to start a proper, controlled fire. Prior to the colonists expanding across the country, the indigenous people had an intimate relationship with the forests by using fire, exactly what you, Emily, are proposing. The Native Americans selectively domesticated, harvested via coppicing (a method of collecting food without cutting down the trees, thereby preserving the forest), and burned the forest periodically to promote the growth of desirable plants. They had even sown desirable plants that were edible after a fire to establish good germination. As a result, they were able to manage the American forests in a manner that made them productive, diverse, more resistant to diseases, and less fire-prone.

    And finally, when considering starting a premeditated fire, you should try to tinker with the factors because it could also provide an unexpected result: the production of charcoal. Why is charcoal so important to this conversation? According to “Understanding How Forest Fires Affect Carbon Emissions and Climate Change” by Swansea University, a study led by Stefan Doerr and Christina Santin found that during a fire, charcoal is produced that contains more carbon than dead vegetation. Since carbon is haphazardly released into the atmosphere in the midst of a fire, the fact that the same carbon can be trapped in charcoal is great news. This is because charcoal can act as a carbon sink that can last for centuries. This means that if you can, in theory, create a fire that produces more charcoal instead of other byproducts like soot, which, as mentioned before, could trigger earlier snow-melt, then we can actually add another benefit to the controlled fires! In combination with your data, this would allow for a better treatment of California’s wildfire problems because you not only have a dataset that shows the high-risk areas, including areas at high risk of wildfires and power outages, but now you have a methodology to prevent such fires, unlike your previous model.

    In any case, what Emily has initiated is fantastic because most of the involvement currently comes from major organizations like NASA or from government policies. The fact that there is such an effort by locals can encourage those who had felt discouraged to help, and I can’t tell you how important that rousing can be for a movement such as this one. Wildfires have taken ordinary people’s homes, loved ones, and money, and the effort to prevent them shouldn’t be limited to large government agencies. However, being hasty is never a good idea. Your idea is novel, but trying to solve America’s most destructive problem with a simple plan would only exacerbate the problem. I think what I wrote above is a great start to resolving the issue. If you carelessly use only prescribed burns like what you mentioned in the article, this project might fail. I think that by learning from Native Americans’ relationship with the forest and using a larger search area, your project may have a greater chance of succeeding. Good luck on your journey!

  5. Firstly, congratulations on the amazing and important job done by the three students, Emily, Melisa and Zhun, whose work uses data analysis to predict future forest fires. This work has impacted many people and showed the importance of protecting not only our forests, but also our planet. As a Brazilian student who really cares about the Amazon rainforest, the Pantanal and Cerrado areas, I strongly support the development of new technologies that can help us predict not just forest fires, but also other natural events with catastrophic ends.

    One of the most important things to be done by my generation is reducing the global warming effects, which are affecting not only our daily lives but our planet’s lifespan and with this new project we can finally have a chance of doing it. I believe that this type of analysis could be applied not only in the U.S.A but also in Brazil, especially for the Pantanal area which in the past years has suffered a lot with forest fires. Another great usability of this technology, also applied in Brazil, is to analyze and prevent deforestation and illegal cattle raising in the Amazon rainforest.

    A suggestion (an upgrade) for this amazing work is to try using drones to collect data and carry out a deeper analysis of the areas, providing more data during the dry season or periods of heavy rain. Another idea is to use small radars in the area, powered by solar energy, which would be able to detect and collect information and could be used to warn about a possible forest fire. Using those radars or satellites to analyze the sky on days when such incidents happened could be a good way of predicting them. As for the problem discovered about the fruit trees, a more ecological idea is to reintroduce native animals to the area to reduce the number of forest fires caused by lightning on fruit trees, and meanwhile gather data during the reintroduction process using some of the equipment suggested.

    Therefore, this text has helped me to understand more about how the collection of data can help a lot in environmental preservation.
