Academic Viewpoints - 中国科学院网信工作网


Date: 2024-10-10


Title: Exploring Fragment Adding Strategies to Enhance Molecule Pretraining in AI-Driven Drug Discovery

Authors: Z. Meng, C. Chen, X. Zhang, W. Zhao and X. Cui

Source: Big Data Mining and Analytics, vol. 7, no. 3, pp. 565-576.

Abstract: The effectiveness of AI-driven drug discovery can be enhanced by pretraining on small molecules. However, the conventional masked language model pretraining techniques are not suitable for molecule pretraining due to the limited vocabulary size and the non-sequential structure of molecules. To overcome these challenges, we propose FragAdd, a strategy that involves adding a chemically implausible molecular fragment to the input molecule. This approach allows for the incorporation of rich local information and the generation of a high-quality graph representation, which is advantageous for tasks like virtual screening. Consequently, we have developed a virtual screening protocol that focuses on identifying estrogen receptor alpha binders on a nucleus receptor. Our results demonstrate a significant improvement in the binding capacity of the retrieved molecules. Additionally, we demonstrate that the FragAdd strategy can be combined with other self-supervised methods to further expedite the drug discovery process.

Title: A Large-Scale Spatio-Temporal Multimodal Fusion Framework for Traffic Prediction

Authors: B. Zhou, J. Liu, S. Cui and Y. Zhao

Source: Big Data Mining and Analytics, vol. 7, no. 3, pp. 621-636.

Abstract: Traffic prediction is crucial for urban planning and transportation management, and deep learning techniques have emerged as effective tools for this task. While previous works have made advancements, they often overlook comprehensive analyses of spatio-temporal distributions and the integration of multimodal representations. Our research addresses these limitations by proposing a large-scale spatio-temporal multimodal fusion framework that enables accurate predictions based on location queries and seamlessly integrates various data sources. Specifically, we utilize Convolutional Neural Networks (CNNs) for spatial information processing and a combination of Recurrent Neural Networks (RNNs) for final spatio-temporal traffic prediction. This framework not only effectively reveals its ability to integrate various modal data in the spatio-temporal hyperspace, but has also been successfully implemented in a real-world large-scale map, showcasing its practical importance in tackling urban traffic challenges. The findings presented in this work contribute to the advancement of traffic prediction methods, offering valuable insights for further research and application in addressing real-world transportation challenges.

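Title: Practice of System Performance Optimization for Big Data Scenarios (面向大数据场景的系统性能优化实践)

Authors: 王冀彬, 杨海龙, 冯凯, 孙欣, 张敏达, 雷克伦, 肖智文, 张逸飞, 吴佳熙

Source: 大数据, 2024, 10(4): 21-33.

Abstract: In today's large-scale distributed environments, the performance and computational efficiency of big data applications still have considerable room for improvement. However, performance analysis and optimization in large-scale environments require a large number of domain experts. To address performance optimization in big data applications, the paper proposes a general workflow for detecting and optimizing inefficient query statements, summarizes four categories of inefficient behavior that significantly affect big data application performance, and proposes a specific optimization strategy for each category. Finally, experimental evaluation verifies the effectiveness of the proposed optimizations on a real large-scale cluster.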

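Title: Research on the Evolution of New-Generation Information Technology Industry Policy Based on Feature Extraction (基于特征抽取的新一代信息技术产业政策变迁研究)

Authors: 王凯利, 陶成煦, 吴江

Source: 图书情报知识, 2024, 41(4): 121-133.

Abstract: Studying the evolution of policy for the new-generation information technology industry helps clarify directions for policy optimization and promotes the industry's development. Policy search terms were determined by combining authoritative classification documents issued by official bodies, research reports published by industry, and related articles from academic publications. Using policy coding, text analysis, and BERT and its extended models, the study analyzes the policies' external attributes, the evolution of text topics, and the evolution of policy tool structure. New-generation information technology industry policy has passed through an initial stage, a rapid development stage, and a steady development stage. Policy content has shifted from basic supportive policies to high-end factor-oriented policies, and then to policies oriented toward transformation and quality improvement. The structure of policy tools is unevenly distributed: environment-type tools dominate, supply-type tools come second, and demand-type tools are the fewest. In the future, governments at all levels should coordinate the development of the various types of new-generation information technology, implement policy requirements, and increase support through demand-type policy tools. By conducting quantitative policy research along the dimensions of external attributes and content attributes, and by automatically extracting different policy features through multiple methods, the study clarifies the types that new-generation information technology industry policy focuses on, systematically traces the characteristics of the policy's evolution, and provides a reference for government policymaking.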

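Title: Identifying Technology Opportunities in High-Value University Patents: The Case of "Generative Artificial Intelligence" (高校高价值专利技术机会识别研究——以“生成式人工智能”领域为例)

Authors: 冉从敬, 李旺, 黄文俊

Source: 信息资源管理学报, 2024, 14(4): 103-116.

Abstract: The paper proposes a method for identifying technology opportunities in high-value university patents. Using topic modeling, the catastrophe progression method, machine learning, and outlier detection algorithms, it first evaluates which university patents are high-value and then identifies technology topics and patented technologies with potential technology opportunities. An empirical study of the "generative artificial intelligence" field shows that potential technology topics in this field are concentrated in frontier areas such as deep learning, neural networks, and machine learning; that technologies such as AI imaging and AI diagnosis and treatment are potential technology opportunities in the field; and that all of these technologies are strongly supported by relevant national policies. The method overcomes core problems of single-method technology opportunity identification, namely weakly targeted results, low value of the identified patents, and a limited variety of result forms. The identification results can support decision-making in university technology transfer, technology R&D, and technological innovation.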

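Title: Personal Information Protection Misconduct in the Digital-Intelligence Era: An AIGC-Empowered Grounded Theory Analysis (数智时代的个人信息保护失范行为：AIGC赋能的扎根理论分析)

Authors: 程文迪, 张晓, 潘兆辉, 赵友军, 孙晨光, 单学强, 金雨展, 赵晓南

Source: 大数据, 2024, 10(4): 3-20.

Abstract: With personal information being widely collected and deeply exploited in the digital-intelligence era, identifying personal information protection misconduct is important for advancing the protection of personal privacy and information security. Based on current laws and regulations on personal information protection, this study uses an AIGC-assisted grounded theory method to qualitatively analyze 2,100 cases of violations, identifies common information protection misconduct in personal information processing activities, and systematically organizes and classifies it. The study ultimately determines 11 major categories and 55 subcategories of personal information protection misconduct and analyzes the key issues of personal information protection in the digital-intelligence era. It not only provides a theoretical framework that related research on personal information protection can follow, but can also offer decision support to government departments and other organizations.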

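Title: Scientific Data Network: Concepts, Systems, and Applications (科学数据网络：概念、系统与应用)

Authors: 赵海平, 刘志鑫, 孙彦超, 江娜

Source: 信息资源管理学报, 2024, 14(4): 59-69.

Abstract: Scientific data is typically dispersed, heterogeneous, and siloed, so building infrastructure that can break down these silos and effectively integrate distributed scientific data resources is of great significance. This paper reviews progress in network-like scientific data platforms, technologies, and systems in China and abroad, clarifies the concept, characteristics, functions, and key technologies of the scientific data network, and, in response to the need for collaborative use of scientific data under new research paradigms, proposes and designs RDCN, a scientific data collaboration network. A scientific data network can effectively alleviate the dispersion, heterogeneity, and siloing of scientific data, and RDCN is expected to play a major role in convergence-science collaboration scenarios such as biodiversity research, field-station observation of ecosystems, and multi-messenger astronomy.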
