Automatic Translation in Meetings: How It Works

Real-time automatic translation converts speech into subtitles and translated audio during virtual meetings, eliminating language barriers. It relies on three key AI technologies: automatic speech recognition (ASR), neural machine translation (NMT), and text-to-speech synthesis (TTS). Current systems support over 100 languages for subtitles and 36 languages for audio translation.

Why it matters:

  • Less than 17% of the global population speaks English, and 70% of users engage more with content in their native language.

  • Improves workplace productivity by 80% and reduces lost business opportunities by 25%.

  • The market is projected to reach $10.1 billion by 2027, with an annual growth of 19.8%.

How it works:

  1. Speech recognition: Converts speech into text with high accuracy (WER below 4.5% in advanced systems).

  2. Machine translation: Analyzes and translates complete sentences considering context and grammatical structures.

  3. Audio generation: Transforms translated text into speech.
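The three stages above behave like a function pipeline: each stage's output feeds the next. The sketch below is a minimal illustration with stubbed stages; the function names, the toy dictionary, and the `AudioChunk` type are invented for the example, not a real ASR/NMT/TTS engine.

```python
from dataclasses import dataclass

@dataclass
class AudioChunk:
    samples: bytes
    language: str

def recognize_speech(audio: AudioChunk) -> str:
    """Stage 1 (ASR): convert audio to text. Stubbed for illustration."""
    return "hello everyone"  # a real system would decode the samples

def translate_text(text: str, target_lang: str) -> str:
    """Stage 2 (NMT): translate whole sentences. Stubbed dictionary lookup."""
    toy_dictionary = {("hello everyone", "es"): "hola a todos"}
    return toy_dictionary.get((text, target_lang), text)

def synthesize_speech(text: str) -> bytes:
    """Stage 3 (TTS): render translated text as audio. Stubbed encoding."""
    return text.encode("utf-8")

def translate_meeting_audio(audio: AudioChunk, target_lang: str) -> tuple[str, bytes]:
    """Run ASR -> NMT -> TTS and return (subtitle, translated audio)."""
    transcript = recognize_speech(audio)
    subtitle = translate_text(transcript, target_lang)
    return subtitle, synthesize_speech(subtitle)

subtitle, audio_out = translate_meeting_audio(AudioChunk(b"...", "en"), "es")
print(subtitle)  # hola a todos
```

Real systems stream these stages concurrently rather than running them sequentially per sentence, which is how they keep latency low.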

Compatible platforms:

  • Google Meet: Translation in 80+ languages and 4,600 language combinations.

  • Microsoft Teams: Subtitles and translations in 70+ languages.

  • Zoom: Live transcription and simultaneous interpretation.

Advantages:

  • Facilitates collaboration in international teams.

  • Improves accessibility for people with disabilities or language limitations.

  • Reduces costs by avoiding human interpreters.

Challenges:

  • Delays of 15-30 seconds in translations.

  • Challenges with dialects, local expressions, and complex structures.

  • Privacy and compliance risks (such as GDPR in Spain).

Real-time translation is transforming global communication, but it requires a balance between accuracy, speed, and security.

How real-time translation technology works

The process of automatic translation in meetings combines several interconnected stages designed to transform speech into clear and understandable translations.

Speech recognition and voice-to-text conversion

The first step is to convert spoken words into text using machine learning algorithms. These algorithms process audio with models trained on large volumes of data, identifying patterns in human language such as grammar, syntax, and context. The most advanced systems use encoder-decoder Transformer models, which allow for direct mapping of audio features to subtitles, eliminating intermediate steps like phonetic representations.

Speed is key at this stage. For example, Speechmatics reports 90% accuracy with under one second of latency, while Notta reports 98.86% accuracy on high-quality audio. The best systems achieve a WER (word error rate) of 4.5% or less.

"Astonishingly accurate ASR is here. In real-time." - Speechmatics
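The WER figures quoted above follow a standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system's output, divided by the number of reference words. A minimal implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with the standard Levenshtein edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

ref = "please join the meeting at noon"
hyp = "please join a meeting at noon"
print(round(word_error_rate(ref, hyp), 3))  # 0.167, i.e. ~16.7% WER
```

Note that WER can exceed 100% when the hypothesis inserts many spurious words, which is why it is reported alongside latency rather than alone.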

Once the text has been generated, the next step is translation using advanced technologies.

Neural machine translation (NMT) and natural language processing (NLP)

Neural machine translation (NMT) analyzes complete sentences, identifying relationships between words regardless of their order. Natural language processing (NLP) breaks down the structure and meaning of the original text, adapting it to the target language to generate a coherent result. This approach allows for the resolution of ambiguities by analyzing the context of surrounding words, handling conjugations, agreement, and idiomatic expressions.

AI-based systems have achieved accuracy levels of up to 90% in certain language pairs. For example, Google's NMT system reduced translation errors by 60% compared to its previous version. Similarly, Meta's No Language Left Behind (NLLB) initiative improved translation quality in underrepresented languages by up to 70%.

However, challenges remain. A notable instance occurred in 2024 when Meta's real-time translation system for Ray-Ban glasses failed to correctly interpret the Spanish expression "no manches," translating it literally as "no stain."

What affects accuracy and speed

Although the most advanced systems achieve between 80% and 90% accuracy, errors are still possible.

Factors influencing accuracy:

  • Structural differences between languages.

  • Idiomatic expressions and specialized vocabulary.

  • The ability to interpret sarcasm, humor, or specific references from each language.

  • The quality and diversity of the training data.

Factors affecting speed:

  • End-to-end system latency: each processing stage adds delay before the translation reaches participants.

  • On local devices, hardware performance is essential, while in cloud-based systems, connection stability can be a critical factor.

"Language is a complex and dynamic system that evolves over time, and it is influenced by cultural, historical, and social factors, making it difficult to capture its entirety through an automated process." - Spenser Mestel, The Atlantic

When selecting a voice-to-text model for real-time applications, it is important to evaluate factors such as WER, words per minute (WPM), cost, multilingual support, and streaming capabilities. In some cases, prioritizing higher WPM may be more practical, even if some accuracy is sacrificed.
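The trade-off described above can be made explicit by normalizing each metric to a common scale and weighting it. The sketch below is purely illustrative: the candidate model names and their WER/WPM/cost figures are invented for the example, and the weights are the knob that encodes whether you prioritize accuracy or throughput.

```python
# Invented candidates -- substitute real benchmark numbers.
candidates = {
    # name: {"wer": word error rate %, "wpm": words per minute, "cost": USD/audio hour}
    "model_a": {"wer": 4.5, "wpm": 160, "cost": 1.20},
    "model_b": {"wer": 6.0, "wpm": 220, "cost": 0.60},
    "model_c": {"wer": 9.0, "wpm": 240, "cost": 0.25},
}

def normalize(values, lower_is_better):
    """Scale a metric to [0, 1] so weights compare like with like."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    if lower_is_better:
        return [(hi - v) / span for v in values]
    return [(v - lo) / span for v in values]

def rank(weights):
    """weights = (accuracy, speed, cost); returns the best model under them."""
    names = list(candidates)
    wer = normalize([candidates[n]["wer"] for n in names], lower_is_better=True)
    wpm = normalize([candidates[n]["wpm"] for n in names], lower_is_better=False)
    cost = normalize([candidates[n]["cost"] for n in names], lower_is_better=True)
    w_acc, w_spd, w_cst = weights
    scores = {n: w_acc * wer[i] + w_spd * wpm[i] + w_cst * cost[i]
              for i, n in enumerate(names)}
    return max(scores, key=scores.get)

print(rank((0.8, 0.1, 0.1)))  # accuracy-first -> model_a (lowest WER)
print(rank((0.2, 0.6, 0.2)))  # speed-first    -> model_c (highest WPM)
```

The same weight profiles pick different winners, which is the point of the paragraph above: sacrificing some accuracy for WPM is a deliberate, tunable decision.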

How translation tools work with meeting platforms

The incorporation of real-time translation tools into videoconferencing platforms is changing how businesses and educational institutions communicate globally. These technologies eliminate language barriers, enabling people from different countries to collaborate through features like live subtitles, simultaneous translation, and professional interpretation services.

Compatible platforms and features

Today, major videoconferencing platforms have integrated automatic translation capabilities, each with its own level of development. For example:

  • Google Meet: With over 300 million monthly users, Google Meet launched its AI-based voice translation in May 2025 during Google I/O 2025. Since June 2025, it offers automatic subtitles in over 80 languages and translations covering more than 4,600 language combinations for Gemini for Google Workspace users.

  • Microsoft Teams: Integrates Microsoft Translator to provide live subtitles and translations in over 70 languages.

  • Zoom: Notable for its live transcription capability and channels for simultaneous interpretation, whether automated or with human interpreters.

A key difference is that while Zoom and Microsoft Teams offer native support for simultaneous interpretation, Google Meet relies on integrations with third parties for professional interpretation services.

Additionally, third-party tools expand possibilities. For example:

  • Talo AI: Compatible with Google Meet, Microsoft Teams, and Zoom, offers real-time translations in 60 languages.

  • DeepL Voice: Provides translated subtitles with AI in 60 languages, with high accuracy.

  • Interprefy: Connects with Google Meet to offer remote simultaneous interpretation and voice translation in over 6,000 language combinations.

| Platform | Integration | Main features |
| --- | --- | --- |
| Google Meet | Integrated AI, external integrations | Live subtitles, AI voice translation, Interprefy support |
| Zoom | Native features and plugins | Live transcription, interpretation channels, option for human interpreters |
| Microsoft Teams | Native features and integrations | Live subtitles and translations, integration with Microsoft Translator |

These options allow for customizable experiences for users, adapting to different needs.

User experience and configuration options

Configuring these tools is straightforward and allows for adjusting the multilingual experience according to user preferences. For example:

  • In Zoom, activating transcription is as easy as clicking the CC button in the navigation bar and selecting "View Full Transcript." This function is compatible with both integrated capabilities and external services.

  • Specialized services like Verbit require an initial installation. Afterwards, users can set up transcription or subtitles and schedule CART (Communication Access Realtime Translation) sessions, ensuring professional transcription during meetings.

A notable case occurred in March 2025 when the NYC Tenement Museum implemented Sorenson Forum, an AI-based multilingual accessibility solution. This tool provides real-time subtitles in 25 major languages and 43 dialects for its tours and programs.

"We wanted to provide translation services for our visitors who do not speak English and who otherwise cannot understand the content and history of the museum. We have some of our primary sources translated into other languages, but the tours are conducted in English, so the challenge was to get translation for the entire program." - Allegra Tribone, Visitor Services Manager, NYC Tenement Museum

Additionally, platforms like Jamy.ai allow switching between more than 50 languages, adjusting settings such as subtitle readability and notifications, significantly enhancing the multilingual experience.

When choosing real-time translation software, it is essential to evaluate factors such as accuracy, ease of use, compatibility with existing tools, and the ability to adapt to different contexts. Testing available options and comparing their features can make the difference between a smooth meeting and one filled with language obstacles.

Benefits of accessibility and inclusion

Real-time automatic translation is transforming accessibility and inclusion in meetings, eliminating language barriers and allowing people from diverse backgrounds to participate fully.

Live subtitles and voice readings: an open door for everyone

Live subtitles and voice read-aloud features have created a significant shift for people with hearing impairments or limited proficiency in the meeting language, making content genuinely accessible regardless of those limitations.

For instance, figures show that 1 in 3 adults aged 65 to 74, and half of those over 75, experience hearing loss. In Canada, over 22% of people speak a language at home other than English or French, while in the United States, over 67 million residents communicate in a language other than English.

A case illustrating this impact is Mercy Hill Church in Milwaukee, which uses an AI-based automatic translation solution to translate its sermons into Spanish. This has allowed them to replace volunteer interpreters with highly accurate real-time translations. Similarly, Kaufman Church in Texas has implemented this technology to support its growing Hispanic community, including refugee families, offering multilingual services that connect with their needs. The Stockbridge Community Church in Georgia has gone further, integrating AI voice translation with subtitles in both in-person and online services, benefiting its diverse congregation.

This technology is also making a mark outside the religious sphere. The Athens Solidarity Centre of Solidarity Now, for example, uses real-time translation to communicate with beneficiaries who do not speak Greek or English. This enables them to express their needs clearly and receive information they genuinely understand.

"Our beneficiaries don't necessarily speak Greek or English. They receive a lot of information, but they can't understand if it's accurate. By being able to speak their own language, they can express their needs and requests more accurately and also understand how things work." - Theo Bogeas, Coordinator of Athens Solidarity Centre, Solidarity Now

In addition to improving accessibility, these tools are revolutionizing the way multilingual teams work together.

Uniting international teams with automatic translation

Automatic translation is changing the game for international teams, allowing people from different countries to work together without language being a barrier. Now, communication flows more naturally and effectively.

Among the most useful functions are automatic language detection and multilingual transcriptions, allowing each user to review content in their preferred language. In a concrete example, the inclusion of customizable glossaries and precise translations has increased participation of Spanish speakers in municipal meetings by 300%, showing how this technology is democratizing communication.

"Webex's inclusive features help to create a level playing field for users, regardless of factors like language or geography. Enabling Real-Time Global Translations is another step toward driving an Inclusive Future, and an important component in fostering better communication and collaboration among teams." - Jeetu Patel, SVP and GM of Security and Applications, Cisco

Business data reinforces the importance of this technology: 75% of customers prefer to receive support in their native language, and businesses lose 29% of their customers due to a lack of multilingual support. Additionally, 76% of consumers choose brands that communicate in their language, and 75% are more likely to make repeat purchases when receiving attention in their native tongue.

Tools like Jamy.ai are taking this experience to another level, allowing switching between more than 50 languages during meetings. This not only enhances collaboration but also tailors the experience to the needs of each participant, creating a more inclusive and effective environment for international teams.

Challenges and limitations of real-time translation

Automatic translation has changed the way international meetings are conducted, but still faces challenges that can affect its effectiveness. Understanding these limitations helps better leverage its capabilities.

Common issues: delays, accuracy, and dialect recognition

One of the main drawbacks is delay. Capturing, transcribing, translating, and synthesizing audio introduces latency that can range from 15 to 30 seconds, even in well-optimized systems. Low-latency designs narrow this gap: Google's SimulTron, for example, processes audio in 40-millisecond chunks to keep output fluent.
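A simple way to reason about that delay is as a per-stage latency budget: the end-to-end figure is the sum of every buffering and processing step. The per-stage numbers below are hypothetical, chosen only to illustrate how quickly the stages add up.

```python
# Hypothetical latency budget for a streaming translation pipeline.
stage_latency_ms = {
    "audio capture buffer": 200,
    "speech recognition": 800,
    "segmentation wait": 1500,   # waiting for enough sentence context
    "machine translation": 400,
    "speech synthesis": 600,
    "network round trips": 300,
}

total_ms = sum(stage_latency_ms.values())
print(f"end-to-end delay: {total_ms / 1000:.1f} s")  # 3.8 s

# Find the dominant stage -- the usual first target when chasing a latency goal.
bottleneck = max(stage_latency_ms, key=stage_latency_ms.get)
print(f"largest contributor: {bottleneck}")  # segmentation wait
```

In this illustrative budget the wait for sentence context dominates, which matches why systems for verb-final languages (see the table below) tend to lag: they must buffer more of the sentence before translating.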

Another challenge is accuracy. If the initial transcription fails, the translation loses the original meaning. As explained by Caroline Dockes, a machine learning engineer at Speechmatics:

"Unsurprisingly, when transcription breaks down, it is impossible for translation to recover the meaning of the original sentence."

Additionally, structural differences between languages complicate the process. Languages with more complex structures, such as Japanese or Korean, require more time to be processed correctly:

| Language | Typical word order | Translation difficulty |
| --- | --- | --- |
| Japanese | Subject-Object-Verb (SOV) | High – requires buffering most of the sentence |
| Korean | Subject-Object-Verb (SOV) | High – similar challenges to Japanese |
| German | Verb-Second (V2) | Moderate – complex subordinate clauses |
| English | Subject-Verb-Object (SVO) | Low – common target language |
| Spanish | Subject-Verb-Object (SVO) | Low – minimal reordering |

Dialects and accents also pose difficulties. Regional expressions, local vocabulary, and strong accents can confuse translation systems, degrading quality. Background noise and overlapping speakers make accurate recognition even harder. To mitigate this, using dual microphones tuned to different frequencies is recommended.

Privacy and security concerns

Beyond technical challenges, data handling poses significant risks in terms of privacy. Translation systems process large volumes of information, exposing users to issues like data loss, unauthorized access, or privacy violations. In Spain, these concerns are especially relevant due to the GDPR and local regulations.

The Spanish Agency for Data Protection (AEPD) requires companies to notify any data breach within 72 hours. Fines for non-compliance can reach up to 20 million euros or 4% of the annual turnover.

A real example occurred in 2020 when an automatic translation service suffered an attack that exposed confidential government information. Additionally, data retention policies and lack of robust encryption during the translation process can jeopardize sensitive information. To minimize these risks, companies should choose services that offer end-to-end encryption and comply with GDPR. It is also advisable to limit the use of sensitive data and adopt internal security measures, such as protected networks and staff training.

Comparison of advantages and disadvantages

| Advantages | Disadvantages |
| --- | --- |
| Immediate translation: eliminates language barriers | Processing delays: between 15 and 30 seconds |
| Reduced costs: no need for human interpreters | Accuracy issues: errors with dialects and local expressions |
| Multilingual capability: supports many languages | Technological dependency: requires a stable connection |
| Continuous operation: available 24/7 | Privacy risks: sensitive data processed in the cloud |
| Easy integration: compatible with videoconferencing platforms | Contextual limitations: difficulty with technical jargon |
| Constant improvement: AI models evolve | Costs associated with GDPR compliance |

These points highlight the need to balance speed, accuracy, and security when implementing translation solutions. Localized AI technologies are gaining popularity for providing faster and more private translations, while hybrid models, which combine AI with human reviewers, are improving quality in sensitive sectors like law and medicine.

Understanding these limitations allows for better leverage of automatic translation tools in multilingual environments.

Conclusion

Automatic translation has transformed the way we communicate globally, breaking down linguistic barriers that once seemed insurmountable. Currently, translation systems achieve 80-90% accuracy, facilitating more natural conversations between teams speaking different languages and improving productivity in the business realm.

The economic impact of language barriers is hard to ignore. They can cost companies up to $500,000 a year in losses and hidden expenses, and 86% of professionals in the manufacturing sector acknowledge that these obstacles negatively affect their productivity. In response, tools like Microsoft Teams are rapidly adapting, with plans to support 51 spoken languages and 31 translation languages, demonstrating the tech sector's commitment to this need. This progress not only enhances business efficiency but also improves user experience.

The benefits of this technology are reflected in testimonials from professionals using it. Masato Esaka, Business Program Manager at Microsoft, states:

"I can think and speak at the speed of my first language... I can speak smoothly and articulate my thoughts clearly without worrying about what I sound like in English."

Petra Glattbach, Senior Business Program Manager at Microsoft Digital, highlights:

"Interpreter really is a game-changer... It fundamentally transforms how users participate and engage in Teams meetings where their preferred language isn't spoken. It democratizes language for all users at Microsoft and creates an inclusive and supportive meeting environment."

In addition to productivity, automatic translation has a direct impact on workplace well-being. According to data, 65% of employees would feel more satisfied if there were no language barriers, nearly half believe these obstacles contribute to high turnover rates, and one-fourth of workplace accidents are attributed to communication issues.

To maximize this technology, companies need to analyze their specific needs, train their employees on the use of these tools, and evaluate both costs and long-term benefits. In an increasingly globalized world, investing in automatic translation is not just an option but a key strategic decision.

With advancements in artificial intelligence, we can expect even more accurate, faster, and well-integrated translation systems with business platforms, paving the way for barrier-free communication.

FAQs

How does automatic translation safeguard privacy and data security in meetings?

Real-time automatic translation safeguards your privacy and security

Real-time automatic translation places special emphasis on maintaining privacy and security of data. This is achieved through several key strategies:

  • End-to-end encryption, ensuring that information is protected while in transit.

  • Real-time processing that avoids storing sensitive data, reducing risks.

  • Implementation of secure communication protocols, complying with current data protection regulations.

Thanks to these measures, your conversations and data remain protected, allowing you to enjoy smoother and more accessible meetings without compromising security.

How do automatic translations handle complex dialects and local expressions?

The limitations of automatic translations

Automatic translations often face challenges when interpreting complex dialects or local expressions. This is because these elements rely on very specific cultural and contextual nuances that algorithms do not always capture accurately. While these tools are advancing by leaps and bounds, their effectiveness can vary considerably depending on the language and context in which they are used.

To achieve more precise translations, it is essential to opt for tools that allow customization of settings or, better yet, combine their use with human reviews. This approach not only enhances translation quality but also ensures clearer and more effective communication, especially in situations where local nuances play a crucial role.

How does latency affect real-time automatic translation during meetings?

The importance of latency in real-time automatic translation

Latency in real-time automatic translation plays a crucial role in how users perceive the quality of service. When delays are apparent, the conversation loses fluency, complicating the exchange of ideas and generating interruptions that can disrupt the flow of a meeting.

In contrast, low latency allows interactions to be more natural and fluid, resulting in clear communication without awkward pauses. This is especially relevant in professional environments, where both accuracy and speed are essential to ensure productivity and smooth development of activities.
