DOES THE SOUND ACTUALLY MATTER? WHAT I LEARNED FROM STUDYING AUDIO IN DESTINATION MARKETING VIDEOS 

By Aditya Aggarwal | Master’s in Tourism Marketing and Management, University of Eastern Finland 

When I started thinking about my thesis topic, I kept returning to something I had noticed almost instinctively while scrolling through travel content on Instagram and TikTok. Two videos of the same destination, with the same lighting, the same scenery, and similar editing, could feel completely different from each other, and much of the time the difference came down to sound. One had a trending pop audio clip that felt upbeat but oddly disconnected from the footage. The other had soft environmental sounds (wind, birdsong, the crunch of snow) that made me feel like I was actually there.

That observation became the foundation of my thesis: How does perceived audio-visual congruence in short-form social media destination marketing videos influence tourists’ visit intention and intention to recommend a destination? 

What the Study Looked At 

To answer that question, I ran an online experiment with 119 international participants, showing each of them the same 30-second video of the Finnish Lakeland region in winter: frozen lakes, snow-covered trees, pale winter light. The only thing that changed across the four groups was the audio:

  • Congruent soundscape — environmental winter sounds matching the footage (wind, snow, birds) 
  • Trending audio — a popular social media sound clip 
  • Background music — generic, non-trending instrumental music 
  • Incongruent audio — environmental sounds from a different season (autumn), mismatched to the winter visuals 

After watching, participants rated how much they felt the audio and visuals fit together, how emotionally engaged they felt, how much they felt transported into the destination, and whether they would want to visit or recommend Finnish Lakeland. 

The theoretical frame I used was the Stimulus–Organism–Response (S–O–R) framework: the idea that an external stimulus (in this case, audio-visual congruence) shapes internal psychological states (emotional engagement and sense of presence), which then drive behavioural responses (visit intention and recommendation intention).
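
For readers who like to see the chain written out, an S–O–R structure with two internal states is conventionally expressed as a parallel mediation model. What follows is generic textbook notation, not the exact specification estimated in the thesis. The symbols are assumptions introduced here for illustration: X is perceived audio-visual congruence, M_1 is emotional engagement, M_2 is sense of presence, and Y is an outcome (visit intention or recommendation intention, one set of equations per outcome).

```latex
\begin{align}
M_1 &= a_1 X + \varepsilon_1 \\
M_2 &= a_2 X + \varepsilon_2 \\
Y   &= c' X + b_1 M_1 + b_2 M_2 + \varepsilon_Y \\
\text{indirect effect} &= a_1 b_1 + a_2 b_2 \\
\text{total effect}    &= c' + a_1 b_1 + a_2 b_2
\end{align}
```

In this notation, each product a_i b_i is the part of the effect of congruence on the outcome that travels through one internal state, and c' is whatever direct effect remains.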

What I Found 

The most consistent result across the entire study was this: the type of audio paired with the visuals meaningfully shaped how viewers experienced the destination psychologically. 

The congruent soundscape group reported the strongest perceived fit between audio and visuals (mean score of 4.29 out of 5) and the highest emotional engagement (3.94). The background music group produced the lowest scores on both emotional engagement and sense of presence, consistently, across every analysis. 

But the most theoretically interesting finding came from the structural model. Sense of presence emerged as the dominant pathway connecting perceived audio-visual congruence to both visit intention and recommendation intention. In other words, when audio and visuals aligned well, viewers felt more mentally situated within the destination, and it was that feeling of psychological transportation, rather than general emotional response alone, that most strongly drove their intentions to visit and recommend. 

Emotional engagement also mattered. It predicted both outcomes directly, and it showed a particularly strong connection to recommendation intention, stronger than its connection to visit intention. That asymmetry makes intuitive sense: sharing a video or telling someone “you have to go here” is an affective act. You recommend something because it felt good, not just because you could imagine yourself standing in it. 

One additional finding worth mentioning: across the full sample, visit intention increased slightly but significantly after watching the video, from a mean of 3.39 before viewing to 3.56 after. It is a small shift, but it indicates that even a single 30-second exposure produced a real directional change, not just noise. 

What This Means for Destination Marketing 

This is where I think the findings move from academic to practically useful, and what I would want anyone working in destination marketing to take away from this research. 

Audio is a strategic decision, not a background detail. The most persistent finding in this study is that background music actively underperformed, even though it is the most common default in destination videos, partly because it is legally safe and easy to source. It was not neutral. It produced lower psychological immersion and lower emotional engagement than either the congruent soundscape or the trending audio. If a marketing team reaches for a royalty-free track because it is easy and inoffensive, the data from this study suggest they may be leaving meaningful engagement on the table.

For atmosphere-dependent destinations, congruent soundscapes do something that other audio types cannot. Finnish Lakeland’s identity is built around quietude, stillness, and sensory proximity to nature. A winter soundscape (wind moving across ice, distant bird calls, the particular silence of snow) allows viewers to mentally locate themselves in that environment. A trending pop clip can generate attention and platform engagement, but it does not produce that sense of place. The two audio types serve different goals, and understanding that distinction is important for anyone planning a content strategy.

A practical dual-track approach may be worth considering. Trending audio can be effective at generating immediate engagement and algorithmic reach, especially on TikTok and Instagram Reels, where platform-native sounds carry cultural familiarity. Congruent soundscapes are better suited to content specifically designed to build destination image and travel motivation. Rather than choosing one over the other, destination marketing organisations (DMOs) could use trending audio for awareness-stage content while reserving soundscape-matched video for deeper, consideration-stage content.

Emotional engagement is a driver of organic word-of-mouth. The stronger connection between emotional engagement and recommendation intention, compared to visit intention, points to something practically relevant for social media marketing. When a viewer feels genuinely pleasant and relaxed watching a destination video, they are more likely to share it, tag someone, or leave a comment encouraging a friend to go. Content that feels emotionally good to watch does not just inspire travel; it amplifies itself through the networks of the people watching it.

Audio testing could reasonably be incorporated into content production workflows. The experimental design of this study, holding visuals constant and varying only audio across conditions, is essentially a template that a marketing team could adapt informally. Before publishing, content creators could test two or three audio configurations with a small group of target-audience members and select based on which produces stronger reported engagement and sense of immersion. This does not require a formal research apparatus. Even a small, structured informal test could provide meaningful directional guidance. 
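
As a rough illustration of how such an informal test might be tallied, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the variant names, the two measures, the 1 to 5 rating scale, and above all the numbers, which are made up and are not data from this study. It averages each measure per audio variant and picks the variant with the highest overall average.

```python
from statistics import mean

# Hypothetical 1-5 ratings from a small informal test.
# These numbers are invented for illustration; they are NOT study data.
ratings = {
    "soundscape": {"engagement": [4, 5, 4, 4], "immersion": [5, 4, 4, 5]},
    "trending": {"engagement": [4, 3, 4, 3], "immersion": [3, 3, 2, 3]},
    "music": {"engagement": [3, 3, 2, 3], "immersion": [2, 3, 2, 2]},
}


def summarise(ratings):
    """Return the mean score per measure for each audio variant."""
    return {
        variant: {measure: mean(scores) for measure, scores in measures.items()}
        for variant, measures in ratings.items()
    }


def pick_variant(summary):
    """Pick the variant with the highest average across all measures."""
    return max(summary, key=lambda v: mean(list(summary[v].values())))


summary = summarise(ratings)
best = pick_variant(summary)

for variant, scores in summary.items():
    print(variant, scores)
print("Selected variant:", best)
```

With only a handful of raters, small gaps between variants should be read as directional hints rather than as evidence of a real difference.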

What This Study Cannot Tell You (And Why That Matters) 

I want to be honest about the limits of what this research can claim. The structural model showed poor fit by conventional standards, which means the path coefficients and mediation results should be treated as indicative and direction-setting rather than definitive. The sample of 119 participants fell below what is generally recommended for the type of modelling used, and it was drawn largely from student and personal networks rather than a representative international tourist population. 

This does not mean the findings are wrong, but it does mean they are preliminary. The patterns observed here need replication with a larger, more diverse sample before stronger claims about generalisability can be made. Finnish Lakeland in winter, with a 30-second video, is one context. Whether the same patterns emerge for a different destination, a different season, or a different video length is an open question. 

A Final Thought 

What I came away from this study believing is that sound in destination marketing videos deserves to be treated with the same deliberateness that visual aesthetics typically receive. The footage, the colour grading, the editing rhythm: these are all understood as craft decisions that shape how a destination is perceived. The audio layer shapes that perception too, and in ways that go beyond atmosphere or mood. It affects whether viewers feel present in the destination, and presence, as the findings suggest, is one of the most powerful psychological bridges between watching a video and wanting to be somewhere.

For destinations like Finnish Lakeland, where the draw is experiential, sensory, and difficult to convey through purely visual means, that bridge matters a great deal. 

Master’s thesis in Tourism Marketing and Management at the University of Eastern Finland Business School in 2026, supervised by Professor Juho Pesonen and Dr Kelsey M. Johansen. This research was supported by SUE