Artificial Intelligence in Multimedia Content Generation: A Review of Audio and Video Synthesis Techniques
Charles Ding, Rohan Bhowmik
Recent breakthroughs in generative AI have markedly elevated the realism and controllability of synthetic media. In the visual modality, long-context attention mechanisms and diffusion-style refinements now deliver videos with superior temporal consistency, spatial coherence, and high-resolution detail. These techniques underpin an expanding set of applications ranging from text-guided storyboarding and animation to engineering visualization and virtual prototyping. In the audio modality, token-based representations combined with hierarchical decoding enable the direct production of faithful speech, music, and ambient sound from textual prompts, powering rapid voice-over creation, personalized music, and immersive soundscapes. The frontier is shifting toward unified audio–visual pipelines that synchronize imagery with dialog, sound effects, and ambience, promising end-to-end tooling for applications such as education, simulation, entertainment, and accessible content production. This review surveys these advances and outlines future research directions focused on improving the efficiency, coherence, and controllability of cross-modal generation.
{"title":"Artificial Intelligence in Multimedia Content Generation: A Review of Audio and Video Synthesis Techniques","authors":"Charles Ding, Rohan Bhowmik","doi":"10.1002/jsid.2111","DOIUrl":"https://doi.org/10.1002/jsid.2111","url":null,"abstract":"<p>Recent breakthroughs in generative AI have markedly elevated the realism and controllability of synthetic media. In the visual modality, long-context attention mechanisms and diffusion-style refinements now deliver videos with superior temporal consistency, spatial coherence, and high-resolution detail. These techniques underpin an expanding set of applications ranging from text-guided storyboarding and animation to engineering visualization and virtual prototyping. In the audio modality, token-based representations combined with hierarchical decoding enable the direct production of faithful speech, music, and ambient sound from textual prompts, powering rapid voice-over creation, personalized music, and immersive soundscapes. The frontier is shifting toward unified audio–visual pipelines that synchronize imagery with dialog, sound effects, and ambience, promising end-to-end tooling for a wide variety of applications such as education, simulation, entertainment, and accessible content production. This review surveys these advances across modalities and outlines future research directions focused on improving generation efficiency, coherence, and controllability across modalities.</p>","PeriodicalId":49979,"journal":{"name":"Journal of the Society for Information Display","volume":"34 2","pages":"49-67"},"PeriodicalIF":2.2,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sid.onlinelibrary.wiley.com/doi/epdf/10.1002/jsid.2111","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146139231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oxide TFT-Based Micro-LED Pixel Circuit With Piecewise Linear Sweep Slope Signal for Improved Low Gray-Level Expression
Jaybum Kim, Kyeong-Soo Kang, Ji-Hwan Park, Chanjin Park, Minji Kim, Soo-Yeon Lee
In this paper, we present an amorphous indium-gallium-zinc-oxide (a-IGZO) based pulse width modulation (PWM) pixel circuit for micro light-emitting diode (micro-LED) displays, designed to enhance low-gray-level expression. Conventional PWM circuits suffer from a long falling time, causing inaccurate gray-level representation. To address this, the proposed circuit employs a continuous piecewise linear sweep signal with two distinct slopes: a steeper slope for low gray levels and a shallower slope for mid-to-high gray levels. This approach reduces the falling time from 209 to 62 μs, enabling accurate gray-level expression down to gray level 41 (41 G) without distortion. To prevent falling-time distortion at sweep-signal slope transitions, a newly proposed separation part divides the emission period into low- and high-gray phases. By keeping the sweep-signal slope constant within each phase, the circuit ensures stable emission and eliminates distortion. HSPICE simulation verifies the circuit operation and confirms stable performance under threshold-voltage variations.
{"title":"Oxide TFT-Based Micro-LED Pixel Circuit With Piecewise Linear Sweep Slope Signal for Improved Low Gray-Level Expression","authors":"Jaybum Kim, Kyeong-Soo Kang, Ji-Hwan Park, Chanjin Park, Minji Kim, Soo-Yeon Lee","doi":"10.1002/jsid.2110","DOIUrl":"https://doi.org/10.1002/jsid.2110","url":null,"abstract":"<p>In this paper, we present an amorphous indium-gallium-zinc-oxide (a-IGZO) based pulse width modulation (PWM) pixel circuit for micro light-emitting diode (micro-LED) displays, designed to enhance low gray-level expression. Conventional PWM circuits suffer from a long falling time, causing inaccurate gray-level representation. To address this, the proposed circuit employs a continuous piecewise linear sweep signal with two distinct slopes: a steeper slope for low gray levels and a shallower slope for mid-to-high gray levels. This approach reduces the falling time from 209 to 62 μs, enabling accurate gray-level expression down to 41 G without distortion. To prevent falling time distortion from sweep signal slope transitions, a newly proposed separation part divides the emission period into low and high gray phases. By maintaining a constant sweep signal slope within each phase, the circuit ensures stable emission and eliminates distortion. HSPICE simulation verifies the circuit operation and confirms that it maintains stable performance under threshold voltage variations.</p>","PeriodicalId":49979,"journal":{"name":"Journal of the Society for Information Display","volume":"34 1","pages":"25-32"},"PeriodicalIF":2.2,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sid.onlinelibrary.wiley.com/doi/epdf/10.1002/jsid.2110","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145987197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Comprehensive VRR Dataset of Luminance Signals and Their Perceived Flicker Levels: Insights for Display and GPU Manufacturers
Hamid Reza Tohidypour, Frank Seto, Panos Nasiopoulos
The adoption of variable refresh rate (VRR) technology in displays, which reduces input lag, minimizes video stuttering, and improves power efficiency, has introduced an unforeseen challenge: flicker caused by minor luminance changes as frame duration varies. Existing industry flicker-measurement metrics are inadequate, often overly restrictive or reliant on impractical subjective evaluations. This highlights the need for an accurate, objective flicker metric specifically designed for VRR displays. Developing such a metric requires a comprehensive dataset that captures a wide range of flicker intensities across different display technologies and luminance conditions. To facilitate this, we compiled a unique VRR dataset of 160 signals, spanning 2 to 40 cd/m², along with perceived flicker levels obtained through extensive subjective testing following the standard protocol defined in ITU-R BT.500-15. This dataset serves as a critical resource for flicker assessment, provides valuable insights for display manufacturers, and is instrumental in advancing VRR technology. Our analysis revealed that JEITA, the most widely used flicker metric for VRR displays, achieves only 71.43% correlation with subjective flicker perception. This finding underscores the limitations of current metrics and the pressing need for a more reliable standard tailored to VRR technology.
{"title":"A Comprehensive VRR Dataset of Luminance Signals and Their Perceived Flicker Levels: Insights for Display and GPU Manufacturers","authors":"Hamid Reza Tohidypour, Frank Seto, Panos Nasiopoulos","doi":"10.1002/jsid.2112","DOIUrl":"https://doi.org/10.1002/jsid.2112","url":null,"abstract":"<p>The adoption of variable refresh rate (VRR) technology in displays—aimed at reducing input lag, minimizing video stuttering, and improving power efficiency—has introduced an unforeseen challenge: flicker caused by minor changes in luminance due to the varying duration of each frame. Existing industry flicker measuring metrics are inadequate, often overly restrictive or reliant on impractical subjective evaluations. This highlights the need for an accurate, objective flicker metric specifically designed for VRR displays. Developing such a metric requires a comprehensive dataset that captures a wide range of flicker intensities across different display technologies and luminance conditions. To facilitate this, we compiled a unique VRR dataset consisting of 160 signals, ranging from 2 to 40 cd/m<sup>2</sup>, along with perceived flicker levels obtained through extensive subjective testing, following a standard protocol defined in ITU-R BT.500-15. This dataset serves as a critical resource for flicker assessment, providing valuable insights for display manufacturers, and it is instrumental in advancing VRR technology. Our analysis revealed that JEITA, the most widely used flicker metric for VRR displays, correlates with subjective flicker perception at only 71.43%. This finding underscores the limitations of current metrics and the pressing need for a more reliable standard tailored to VRR technology.</p>","PeriodicalId":49979,"journal":{"name":"Journal of the Society for Information Display","volume":"34 1","pages":"12-24"},"PeriodicalIF":2.2,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sid.onlinelibrary.wiley.com/doi/epdf/10.1002/jsid.2112","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145993930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}