Image credit: Applied Intelligence

Existing scene text image super-resolution (STISR) methods primarily focus on restoring fixed-size English text images. Compared with English characters, Chinese characters span far more categories and have more intricate stroke structures. In recent years, Transformer-based methods have made significant progress in image super-resolution, but they face a trade-off between global modeling and computational efficiency. The emerging Receptance Weighted Key Value (RWKV) model is a promising alternative to the Transformer, enabling long-range modeling with linear computational complexity. In this paper, we propose a Hybrid CNN-RWKV with High-Frequency Enhancement (HCR-HFE) model for the STISR task. First, we design a recurrent bidirectional WKV (Re-Bi-WKV) attention, which integrates bidirectional WKV (Bi-WKV) attention with a recurrent mechanism. Bi-WKV achieves a global receptive field with linear complexity, while the recurrent mechanism establishes 2D image dependencies from different scanning directions. Additionally, a computationally efficient high-frequency enhancement module (HFEM) is incorporated to enhance high-frequency details, such as character edge information. Furthermore, we design a multi-scale large kernel convolutional (MLKC) block, which integrates large kernel decomposition, gated aggregation, and a multi-scale mechanism to capture dependencies at various ranges with reduced computational cost. Finally, we introduce a multi-frequency channel attention (MFCA), which extends channel attention to the frequency domain, enabling the model to focus on critical features. Extensive experiments on the real-world Chinese-English (Real-CE) dataset demonstrate that HCR-HFE outperforms previous methods in both quantitative metrics and visual results. HCR-HFE also achieves excellent performance on natural image datasets, demonstrating its broad applicability.
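
To illustrate how a bidirectional, decay-weighted attention can cover the whole sequence at linear cost, the following is a minimal NumPy sketch under simplifying assumptions. It is not the Bi-WKV or Re-Bi-WKV operator of this paper: it uses a single scalar decay `w`, a 1D token sequence, and no learned bonus term or numerical stabilisation. The function names `bi_decay_attention_quadratic` and `bi_decay_attention_linear` are hypothetical and chosen only for this example. The sketch shows that a weighted average with weights `exp(-w * |t - i| + k[i])` over all positions `i` can be computed either directly in O(T^2) or with one forward and one backward scan in O(T).

```python
import numpy as np


def bi_decay_attention_quadratic(k, v, w):
    """Reference O(T^2) version: each position t averages over all positions i."""
    T = k.shape[0]
    idx = np.arange(T)
    # weights[t, i] = exp(-w * |t - i| + k[i])
    weights = np.exp(-w * np.abs(idx[:, None] - idx[None, :]) + k[None, :])
    return (weights @ v) / weights.sum(axis=1)


def bi_decay_attention_linear(k, v, w):
    """Same output in O(T) using a forward scan and a backward scan."""
    T = k.shape[0]
    decay = np.exp(-w)   # per-step attenuation applied to the running sums
    ek = np.exp(k)       # unnormalised weight of each token
    ekv = ek * v         # weight times value

    fwd_num, fwd_den = np.zeros(T), np.zeros(T)
    bwd_num, bwd_den = np.zeros(T), np.zeros(T)

    # Forward scan: accumulates contributions from positions i <= t.
    num = den = 0.0
    for t in range(T):
        num = decay * num + ekv[t]
        den = decay * den + ek[t]
        fwd_num[t], fwd_den[t] = num, den

    # Backward scan: accumulates contributions from positions i >= t.
    num = den = 0.0
    for t in range(T - 1, -1, -1):
        num = decay * num + ekv[t]
        den = decay * den + ek[t]
        bwd_num[t], bwd_den[t] = num, den

    # Position t itself appears in both scans, so subtract it once.
    return (fwd_num + bwd_num - ekv) / (fwd_den + bwd_den - ek)
```

Both functions return the same values up to floating-point error, which can be checked with `np.allclose` on random inputs. A practical implementation would additionally track a running maximum of the exponents to avoid overflow, which this sketch omits for brevity.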