The preliminary version of the conference program is available.
SESSION 1. Data Integration. Room 125. Chair: Dmitrii Shaposhnikov
Sergey Stupnikov. Formal Semantics and Verification of Materialized Data Integration Programs in a Procedural SQL Dialect
(PDF)
Abstract
As the heterogeneity of data models and schemas grows, the need for data integration becomes ever more acute. Data integration programs can be very complex, so formal verification of their correctness becomes important. This paper proposes a method for verifying materialized data integration programs written in a procedural dialect of SQL, based on defining their semantics in a formal specification language supported by automated proving tools. The method is illustrated by verifying a materialized data integration program in the land-use management domain.
Nikolay Skvortsov. Data Quality Management for Problem Solving over Heterogeneous Data Sources
(PDF)
Abstract
Problem solving over available scientific data should follow principles that ensure extensive data reuse. Data quality indicators are important data characteristics that affect not only the accuracy of methods applied in research tasks, but also the fitness of data for specific tasks, the choice of methods for working with them, their compatibility with one another, and other aspects of their reuse. Different facets of data quality have to be assessed at different levels, from whole datasets down to individual values. This study presents an approach to comprehensive data quality management based on specifying quality as metadata. Different dimensions of the data quality assessment space are discussed, including accuracy, completeness, and provenance. The approach is applied to an example of problem solving over multiple data sources in stellar astronomy.
Andrey Shepelev and Sergey Stupnikov. Schema Matching Using Federated Learning on a Hybrid Feature Set
(PDF)
Abstract
To improve representativeness in data analysis, data often has to be extracted and integrated from various sources. This paper considers the application of machine learning to one of the main stages of data integration: schema matching. The result of this stage is a mapping between elements of the source schemas and elements of the target schema. A neural network model is proposed that combines long short-term memory networks, attention mechanisms, and a multilayer perceptron. The model is trained on a hybrid feature set that includes name similarity measures, data types, extracted tags, descriptive statistics, and correlation coefficients for numeric data. Experiments show that the proposed model outperforms a baseline neural network model and classical schema matching methods in quality. It is also shown that under federated learning, which preserves data confidentiality, model quality barely degrades compared to centralized training.
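The paper's model combines LSTMs, attention, and an MLP; as a much simpler illustration of matching on hybrid features (name similarity plus type agreement), candidate pairs can be scored directly. The schemas, weights, and helper below are invented for illustration, not taken from the paper:

```python
import difflib

def match_score(src, tgt):
    """Hybrid similarity of two schema elements: name similarity plus type agreement."""
    name_sim = difflib.SequenceMatcher(None, src["name"].lower(), tgt["name"].lower()).ratio()
    type_sim = 1.0 if src["type"] == tgt["type"] else 0.0
    return 0.7 * name_sim + 0.3 * type_sim  # illustrative weights

source = [{"name": "cust_name", "type": "str"}, {"name": "amount", "type": "float"}]
target = [{"name": "customer_name", "type": "str"}, {"name": "total_amount", "type": "float"}]

# Map each source element to the best-scoring target element
mapping = {s["name"]: max(target, key=lambda t: match_score(s, t))["name"] for s in source}
```

A learned model replaces the hand-set weights with a trained combination of many such features.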
SESSION 2. Data analysis in astronomy I. Room 126. Chair: Alexey Pozanenko
Ekaterina Malik, Dana Kovaleva, Oleg Malkov, Pavel Kaygorodov and Bernard Debray. Developing the optimal cross-matching algorithm for the Gaia and BDB data
(PDF)
Abstract
Binary and multiple stellar systems form an important part of the stellar population of our Galaxy. Depending on the method by which binarity was discovered, such stars are referred to in a number of datasets of observational parameters, dedicated catalogues, and systems of identifiers. The task of linking these identifiers to the proper entities within binary and multiple systems is solved by the index catalogue of the Binary star DataBase BDB (http://bdb.inasan.ru), named the Identification List of Binaries (ILB). This catalogue needs to be updated regularly, and recently a vast amount of results from the Gaia space mission has been published, containing, among other data, data on non-single stars. Similarly, more than a million wide binary stars have been identified based on Gaia data. We are developing an algorithm for cross-identification of the ILB catalogue, which comprises virtually all data on binary and multiple stars before Gaia, with the data on such stars derived from Gaia results.
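A minimal sketch of the positional cross-matching at the heart of such catalogue linking. The coordinates and the 1-arcsecond radius below are invented; a production algorithm would also use spatial indexing, proper motions, and astrometric error models:

```python
import numpy as np

def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2.0) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(a)))

def crossmatch(cat_a, cat_b, radius_deg):
    """Index of the nearest cat_b source within radius for each cat_a source, else -1."""
    matches = []
    for ra, dec in cat_a:
        seps = angular_sep(ra, dec, cat_b[:, 0], cat_b[:, 1])
        j = int(np.argmin(seps))
        matches.append(j if seps[j] <= radius_deg else -1)
    return matches

ilb = np.array([[10.0, 20.0], [150.5, -5.25]])            # (RA, Dec) in degrees, invented
gaia = np.array([[10.0002, 20.0001], [151.0, -5.0]])
matches = crossmatch(ilb, gaia, radius_deg=1.0 / 3600.0)  # 1-arcsecond radius
```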
Evgeny Shchekotikhin, Nikolay Pankov, Alexey Pozanenko, Sergey Belkin, Pavel Minaev and Alina Volnova. Using Neural Networks to Search for Optical Transients in Astronomical Images by Image Subtraction
(PDF)
Abstract
This study concerns methods for processing and subtracting astronomical images obtained in various optical surveys in order to search for optical transients using image subtraction. The paper presents a new method for transforming the template image to match the search image, based on training a convolutional neural network to perform that transformation on a pair of images within a training mask immediately before subtraction. On pairs of image fragments obtained at the Abastumani Astrophysical Observatory and fragments of the Pan-STARRS survey, the method is shown to be effective both for identifying transient sources and for estimating their flux. The relevance of the work lies in the fact that the method can be applied to arbitrary images, including pairs of images obtained with different telescopes, as well as to other tasks. We also plan to integrate the algorithm into an operational software pipeline for processing wide-field astronomical images and searching for optical transients within the observation program of cosmic gamma-ray bursts and sources of electromagnetic radiation associated with gravitational-wave events registered by the LIGO-Virgo-KAGRA detectors.
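The paper's contribution is the learned transformation applied before subtraction; the subtraction step itself can be illustrated with a toy direct difference on synthetic frames (no PSF matching, planted source, invented noise levels):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic reference and science frames of one field; the science frame hides a transient
ref = rng.normal(100.0, 5.0, size=(64, 64))
sci = ref + rng.normal(0.0, 5.0, size=(64, 64))
sci[30:33, 40:43] += 200.0             # planted transient source

diff = sci - ref                       # direct subtraction; no PSF matching in this toy
snr = (diff - diff.mean()) / diff.std()
ys, xs = np.where(snr > 5.0)           # candidate transient pixels
```

Real frames from different telescopes differ in PSF, scale, and background, which is exactly what the learned transformation must compensate before this step.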
Albert Khabibullin, Alexey Pozanenko and Vladimir Loznikov. Modeling Light Curves of Cosmic Gamma-Ray Bursts
(PDF)
Abstract
The work addresses the problem of determining the distribution parameters of pulse profiles in the light curves of cosmic gamma-ray bursts. The problem is solved by modeling synthetic light curves composed of a set of pulses. Each pulse is defined by an analytic form with four parameters: amplitude, position of the maximum in the light curve, duration, and an asymmetry parameter. Several analytic pulse shapes are considered: a two-sided exponential, a log-normal shape, and a continuously differentiable Fast Rise Exponential Decay (FRED) function. Light curves are generated under the assumption that each pulse parameter follows some distribution. The influence of the analytic pulse shape and of the parameter distributions on the power spectra of the ensemble of simulated light curves is investigated. It is shown that only the duration and asymmetry parameters affect the shape of the power spectrum of the ensemble of light curves composed of these pulses. Parameter distributions are found that produce an ensemble power spectrum consisting of three power-law segments, and it is shown that this requires introducing a dependence between the duration and asymmetry parameters of a pulse. The relationship between characteristic features of the ensemble power spectrum (the break frequencies between power-law segments and the power-law indices of the segments) and the pulse parameter distributions is investigated.
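A sketch of generating one such synthetic light curve from the two-sided exponential shape with the four parameters named above. The parameter distributions here are invented placeholders, not those found in the paper:

```python
import numpy as np

def pulse(t, amp, t_peak, width, asym):
    """Two-sided exponential pulse: rise/decay e-folding times split by the asymmetry."""
    rise, decay = width * asym, width * (1.0 - asym)
    return np.where(t < t_peak,
                    amp * np.exp((t - t_peak) / rise),
                    amp * np.exp(-(t - t_peak) / decay))

rng = np.random.default_rng(2)
t = np.linspace(0.0, 100.0, 2000)

# Sum pulses whose four parameters are drawn from assumed distributions
lc = np.zeros_like(t)
for _ in range(20):
    lc += pulse(t,
                amp=rng.lognormal(0.0, 0.5),
                t_peak=rng.uniform(10.0, 90.0),
                width=rng.lognormal(0.5, 0.3),
                asym=rng.uniform(0.1, 0.5))

# Power spectrum of one realization; the paper studies spectra averaged over an ensemble
power = np.abs(np.fft.rfft(lc - lc.mean())) ** 2
```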
SESSION 3. Conceptual Modeling and Ontologies. Room 125. Chair: Nikolay Skvortsov
Stepan Vinnikov, Anatoly Nardid and Yuriy Gapanyuk. Metagraph Operations using Bigraph Representation
(PDF)
Abstract
In this article, we propose an efficient implementation of operations on metagraphs using an alternative definition of a metagraph. Definitions of metagraph structures are given, the alternative definition of a metagraph is proposed, and metagraph operations based on this alternative definition are discussed. Operations on hierarchical metagraphs are proposed, and the nesting binary relation is discussed. Elementary and complex operations on metagraphs are proposed and discussed, and an example of applying operations to a metagraph is given.
Nikolay Kalinin and Nikolay Skvortsov. An Approach to Analyzing the Information Security Domain for Building Research Infrastructures
(PDF)
Abstract
Recent years have seen steady growth in the intensity of research related to information security. The increased interest of the scientific community has made the problems of availability and reuse of scientific data and results in this area more pressing. The article considers issues related to the development of research infrastructures in the field of information security intended to solve these problems and to enable more effective research.
[short] Olga Ataeva, Vladimir Serebryakov, Natalia Tuchkova and Ivan Strebkov. Ontology and Knowledge Graph of Mathematical Physics in a Semantic Library
(PDF)
Abstract
The paper discusses problems of constructing a semantic library for resources devoted to mathematics and mathematical physics on the basis of encyclopedias. It examines a mechanism for integrating encyclopedias into the content of the semantic library and for merging an encyclopedia of mathematics with an encyclopedia of mathematical physics. During integration, overlaps between the article sets of these encyclopedias are discovered, and their terms mutually enrich each other. As a result, the knowledge graph of the semantic library is populated with new nodes and links, which in turn enriches the subject domains of the semantic library itself and of the integrated scientific publications.
[short] Yury Zagorulko, Galina Zagorulko and Elena Sidorova. Approach to Developing a Machine Learning Ontology
(PDF)
Abstract
The paper describes an approach to developing a machine
learning ontology, based on the methodology for
constructing ontologies of scientific subject domains,
developed at the A.P. Ershov Institute of Informatics Systems.
A brief overview of the basic concepts and terms of machine
learning (ML) and known developed ontologies related to
this field is given. The paper also provides a brief
description of the methodology for constructing ontologies
of scientific subject domains, and describes the ontology
design patterns developed within the framework of this
methodology to represent the basic concepts of the ML
subject domain. The developed ML ontology will be used to
build an intelligent scientific Internet resource on
machine learning that will provide content-based access to
systematized knowledge and data in the field of ML, helping
users in choosing methods, models and data sets necessary
to solve their practical problems.
SESSION 4. Data analysis in astronomy II. Room 126. Chair: Alina Volnova
Matwey Kornilov, Vladimir Korolev, Konstantin Malanchev, Anastasia Lavrukhina, Etienne Russeil, Timofey Semenikhin, Emmanuel Gangler, Emille Ishida, Maria Pruzhinskaya, Alina Volnova and Sreevarsha Sreejith. Coniferest: an active anomaly detection framework
(PDF)
Abstract
We present coniferest, an open-source, general-purpose active anomaly detection framework written in Python. The package design and implemented algorithms are described. Currently, static outlier detection analysis is supported via the Isolation Forest algorithm. Moreover, Active
Anomaly Discovery (AAD) and Pineforest algorithms are
available to tackle active anomaly detection problems. The
algorithms and package performance are evaluated on a
series of synthetic datasets. We also describe a few
success cases which resulted from applying the package to
real astronomical data in active anomaly detection tasks
within the SNAD project.
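The isolation principle behind the static analysis can be sketched in a few lines. This toy reimplements the idea directly and is not the coniferest API; the data and depth limit are invented:

```python
import numpy as np

def isolation_depth(x, data, rng, max_depth=10):
    """Number of random axis-aligned splits needed to isolate point x within data."""
    depth = 0
    while depth < max_depth and len(data) > 1:
        j = int(rng.integers(x.shape[0]))            # pick a random feature
        lo, hi = data[:, j].min(), data[:, j].max()
        if lo == hi:
            break
        s = rng.uniform(lo, hi)                      # random split value
        data = data[data[:, j] < s] if x[j] < s else data[data[:, j] >= s]
        depth += 1
    return depth

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(256, 2))
X[0] = [8.0, 8.0]                                    # planted outlier, far from the bulk

# Average isolation depth over many random trees; outliers isolate in fewer splits
scores = [np.mean([isolation_depth(x, X, rng) for _ in range(50)]) for x in X]
```

Active anomaly discovery extends this static picture by letting an expert's labels reweight which isolated points are surfaced next.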
Timofey Semenikhin, Matwey Kornilov, Maria Pruzhinskaya, Anastasia Lavrukhina, Etienne Russeil, Emmanuel Gangler, Emille Ishida, Vladimir Korolev, Konstantin Malanchev, Alina Volnova and Sreevarsha Sreejith. Real-bogus classification for ZTF data releases: two approaches
(PDF)
Abstract
We compared two fundamentally different approaches to
real-bogus classification within the Zwicky Transient
Facility survey data. The first approach is based on neural
networks that take sequences of object images as input. The
second approach uses features extracted from light curves
and classical machine learning methods. Several models for
both approaches were tested. Quality metrics were evaluated
using k-fold cross-validation. We found that models based
on classical machine learning algorithms outperform the
neural network approach in both computational performance
and quality. The code written during the study is available
on GitHub.
Art Prosvetov, Sergey Grebenev, Sergey Belkin and Alexey Pozanenko. Correlations and Classification in Light Curves of Type Ic Supernovae with GRB Association
(PDF)
Abstract
In this study, we conducted an analysis of long-term light
curves for 56 type Ic supernovae, with a dedicated focus on
those accompanied by gamma-ray bursts (GRBs). Our results
did not confirm the correlations between the full-width at
half-maximum (FWHM) and peak bolometric luminosity
previously suggested in smaller samples. However, we
identified a relationship between the decay rate and growth
rate of luminosity. This relationship enables the
reconstruction of light curve profiles and the estimation
of peak bolometric luminosity. Additionally, we developed a
method for distinguishing between GRB supernovae and type
Ic supernovae based on light curve parameters. Future
research should aim to expand the sample size to refine and
validate the proposed methods.
SESSION 5. Machine learning methods. Room 125. Chair:
Dmitrii Khliustov and Dmitry Kovalev. Aggregation Of Regression Models For Variance Minimization
(PDF)
Abstract
In this work, a novel method of weighting regression models is presented. To minimize the variance of the residual between predictions and the true values of a given parameter, an optimization problem is posed. Its solution involves the covariance matrix of residuals, which can be reliably estimated from data. The suggested approach is tested on an open-source dataset containing information on the concentration of several chemical elements at various spatial locations. The performance of the algorithm under study is compared to that of other algorithms, including bootstrap aggregation (bagging), which is often considered a standard. It is shown that from a theoretical point of view the novel approach outperforms bagging, while in practice it gives better results only in some settings, which is attributed to numerical difficulties in matrix inversion.
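The variance-minimizing weights have a standard closed form: minimizing w'Σw subject to the weights summing to 1 gives w proportional to Σ⁻¹1, with Σ the residual covariance matrix. A sketch on synthetic data (the three models and their noise levels are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical regression models predicting the same target with different noise
true = rng.normal(0.0, 1.0, 500)
preds = np.stack([true + rng.normal(0.0, s, 500) for s in (0.5, 1.0, 2.0)])
resid = preds - true

sigma = np.cov(resid)                  # residual covariance matrix, estimated from data
ones = np.ones(sigma.shape[0])
w = np.linalg.solve(sigma, ones)
w /= w.sum()                           # minimum-variance weights, constrained to sum to 1

combined = w @ preds                   # weighted ensemble prediction
```

The matrix inversion here is exactly the numerically delicate step the abstract mentions: a near-singular residual covariance makes the solve ill-conditioned.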
Dmitrii Shaposhnikov. Minimax Approach for Using Qualitative Preferences in Multicriteria Evaluation
(PDF)
Abstract
One of the main tasks of big data processing and analysis
systems is to obtain an integral numerical multicriteria
evaluation of grouped or ungrouped objects of a population
for all or some subset of quality indicators. The article considers one of the approaches to solving this problem: forming a multicriteria (multiobject) evaluation of the quality of objects or groups using weighting coefficients of indicator preference. The proposed approach assumes that, when assigning exact values of weighting coefficients is difficult or impossible, qualitative (verbal) information on the relative importance of parameters can be used, presented by the analyst in the form of a preference graph, which may be incomplete. The minimax
method for assigning weighting coefficients of relative
preference is considered based on the fundamental principle
of different values of weighting coefficients for different
objects of a population while maintaining the preference
system of the entire set of objects. For each object or
group, weighting coefficients are calculated automatically
based on qualitative preferences according to the minimax
principle by solving an optimization problem using
generalized logical criteria of maximum risk and maximum
caution. For special cases of preference systems,
analytical relationships and algorithms for calculating
weighting coefficients are given.
[short] Oleg Senko, Alexander Dokukin and Fedor Melnik. Using Ensembles with Increased Divergence in the Prediction Space in Recommender Systems
(PDF)
Abstract
The paper presents the results of applying the divergent decision forest method to classification problems arising in the construction of recommender systems. The method is the authors' own development, based on achieving higher divergence in the prediction space compared to a standard random decision forest.
SESSION 6. Machine learning methods and applications II. Room 126. Chair:
Irina Dvoretskaya, Alexey Semenov and Alexander Uvarov. Exploring Patterns Of Information Literacy Development In Schools: Application Of Multilevel Latent Class Analysis To School Students Survey Data
(PDF)
Abstract
While the literature on digital transformation in education
has searched for evidence based practices to improve ICT
uptake in school settings, we know little about how schools
differ in their approaches. This study aims to overcome the
absence of standardised tools that could help to assess the
stages and progress of ICT integration in educational
settings. Using the example of information literacy development tasks assigned in the classroom, we applied latent class analysis to survey data obtained from monitoring the digital transformation of schools in the 2020-21 academic year.
Based on the survey data from monitoring the digital
transformation of schools, four types of students' patterns
were identified, depending on the information skills tasks
assigned to them by their teachers at school. Based on the
distribution of students' patterns of working with
information, three typical patterns of schools were
identified with the use of multilevel latent class analysis.
This study provides evidence for how the development of information literacy differs across school contexts. As education becomes a data-intensive domain with the advent of digital technologies, new approaches to big data analysis are encouraged, which can help educators and education policy makers improve decision-making.
Alexander Varnavsky. Regression models for the AI gaming chatbot for learning programming based on Wordle-type puzzles
(PDF)
Abstract
Programming is one of the most important skills of the 21st century. However, it is often a challenge to keep students interested and engaged when learning programming. It is believed that digital games can solve this problem. One type of game well suited to the field of computer science is the puzzle game, aimed, among other things, at developing working memory and thinking. The aim of this work is to create a gaming chatbot for learning programming based on a Wordle-type puzzle and to develop the regression models that make it intelligent. This type of game was chosen because of its worldwide popularity. Artificial intelligence in the chatbot is needed to control the expediency and appropriate time of its use, as well as to adaptively set the level of the puzzles. In this paper, the gaming chatbot was created and used by students. Based on the data collected from its use, models of the influence of factors on the level of interest and difficulty of tasks were built. The developed models formed the basis of a new version of the gaming chatbot with artificial intelligence. When using the chatbot with such models, it is possible to retrain the models and adjust the obtained coefficient values. Testing of the resulting chatbot has shown great interest in its use among students learning programming.
[short] Maxim Maron, Arkadiy Maron and Danila Tet'Kov. Determining Risk Realization Probabilities from Forecast and Actual Data
(PDF)
Abstract
The relevance of the problem under study stems from the fact that the activity of companies in today's rapidly changing world is inevitably associated with risks. These risks must be soundly assessed under a shortage of relevant statistical data. The purpose of the article is to develop a method for determining the probabilities of the risks that caused the company's key performance indicators to deviate from their target values. The maximum entropy method, Jaynes' information-theoretic method, is chosen as the approach to this problem. As a result, a method is proposed that makes it possible, for each of the company's target indicators and for each risk, to determine the probability that the realization of that particular risk caused the deviation from the target value. Forecast deviations and actual values are used as data. The materials of the article will be useful primarily to company management, and also to researchers and graduate students in mathematics and economics.
SESSION 7. Image Analysis I. Room 125. Chair:
Aleksei Samarin, Alexander Savelev, Aleksei Toropov, Artem Nazarenko, Alexandr Motyko, Elena Mikhailova, Egor Kotenko, Alina Dzestelova and Valentin Malykh. Modernized Non-Local Blocks for Infrared Camera Image Segmentation of the Human Eye
(PDF)
Abstract
This investigation delves into the integration of specialized self-attention modules within deep learning models, specifically focusing on the task of delineating human iris and pupil regions in infrared imagery. In this research endeavor, we introduce several adaptations of non-local blocks that imbue the essence of self-attentiveness while taking into account the unique attributes inherent in infrared image data. Employing these tailored enhancements, we have witnessed remarkable strides in the performance metrics of the underlying deep neural network framework, manifesting notable refinement of segmentation outcomes (advancing from 0.945 to 0.983 in mIoU and from 0.951 to 0.988 in mDice) upon rigorous evaluation against a representative subset of the infrared image dataset. This progress opens doors to diverse applications in the realm of infrared image analysis, promising novel avenues for research and innovation.
Aleksei Samarin, Alina Dzestelova, Egor Kotenko, Valentin Malykh, Elena Mikhailova, Alexandr Motyko, Artem Nazarenko, Alexander Savelev, Aleksei Toropov and Aleksandra Dozortseva. Automated Feature Engineering Based Approach for Micrococci Microscopic Image Classification and Taxonomic Characteristics Determination
(PDF)
Abstract
In this paper, we describe our research on creating classifiers for microbial images (micrococci microscopy images) obtained from images of unfixed microscopic scenes. We propose an AutoML approach based on automatic generation and analysis of the feature space to construct optimal descriptors of microorganism images for subsequent classification. This makes it possible to use interpretable taxonomic features based on the geometric features of the visual series of images of microorganisms of various species, which is important for the microbiology domain. To demonstrate the effectiveness of our method, we publish an annotated dataset we created consisting of microbial images of unfixed microscopic scenes. Using the presented dataset, we compare the classification efficiency of our method with that of various types of classifiers, including those based on deep neural network models. The proposed method demonstrated the best results among those studied (F1-score = 0.997).
Art Prosvetov, Alexandr Govorov, Maxim Pupkov, Alexandr Andreev and Vladimir Nazarov. Illuminating the Moon: Reconstruction of Lunar Terrain Using Photogrammetry, Neural Radiance Fields, and Gaussian Splatting
(PDF)
Abstract
Accurately reconstructing the lunar surface is critical for
scientific analysis and the planning of future lunar
missions. This study investigates the efficacy of three
advanced reconstruction techniques – photogrammetry, Neural
Radiance Fields, and Gaussian Splatting – applied to the
lunar surface imagery. The research emphasizes the
influence of varying illumination conditions and shadows,
crucial elements due to the Moon’s lack of atmosphere.
Extensive comparative analysis is conducted using a dataset
of lunar surface images captured under different lighting
scenarios. Our results demonstrate the strengths and
weaknesses of each method based on a pairwise comparison of
the obtained models with the original one. The results
indicate that using methods based on neural networks, it is
possible to complement the model obtained by classical
photogrammetry. These insights are invaluable for the
optimization of surface reconstruction algorithms,
promoting enhanced accuracy and reliability in the context
of upcoming lunar exploration missions.
SESSION 8. Data Analysis in Neurophysiology. Room 126. Chair: Mikhail Zymbler
Anastasiia Timofeeva, Tatiana V. Avdeenko and Sergei Alkov. Robust Partial Correlation between EEG Connectivity and Arithmetic Ability
(PDF)
Abstract
Numerous studies of brain function using EEG data indicate
conflicting results. Therefore, the analysis of methods
that make it possible to identify stable relationships
remains relevant. In this regard, the aim of the present
study is to search for features obtained from EEG data that
are robustly correlated with arithmetic ability. Connectivity
graph measures are extracted as features. The problem is
that the graph measures are highly correlated with each
other, so it is proposed to use partial correlation
coefficients based on the Spearman correlation coefficient.
The feature with the largest absolute partial correlation
coefficient is selected. To exclude the influence of other
features on it, a regression is built, based on the
coefficients of which the weights of the features are
calculated. The resulting linear combination of features is
considered as an extracted factor. This approach is
compared to the principal component analysis. The result
shows that using the partial correlation coefficient not only selects more significant connectivity metrics but also identifies new relationships that standard methods cannot detect due to spurious correlation.
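Partial correlations controlling for all remaining variables can be read off the inverse of the correlation matrix: for precision matrix P, the partial correlation of variables i and j is -P_ij / sqrt(P_ii * P_jj). A sketch using Spearman correlations on synthetic data, where a common driver creates a spurious pairwise correlation:

```python
import numpy as np

def spearman_corr(X):
    """Spearman correlation matrix: Pearson correlation of column-wise ranks."""
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def partial_corr(X):
    """Pairwise partial correlations controlling for all remaining variables."""
    P = np.linalg.inv(spearman_corr(X))
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

rng = np.random.default_rng(3)
z = rng.normal(size=1000)              # common driver inducing spurious correlation
x = z + 0.1 * rng.normal(size=1000)
y = z + 0.1 * rng.normal(size=1000)
X = np.column_stack([x, y, z])

raw = spearman_corr(X)[0, 1]           # large: x and y share the driver z
partial = partial_corr(X)[0, 1]        # near zero once z is controlled for
```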
Almaz Shangareev, Ivan Shanin and Sergey Stupnikov. Tracking Reading Progress Using Deep Learning on High-Noise Eye-Tracking Data
(PDF)
Abstract
The work investigates methods for tracking reading progress on eye-tracking data using deep neural networks. An autoencoder network architecture is developed, aimed at effective use of spatial and temporal information. A data augmentation method is proposed that generates data with a high noise level while preserving the correspondence of each gaze fixation to a particular word. An experimental evaluation of the neural model's quality on noisy data is carried out.
Margarita Samburova, Albina Lebedeva, Alexander Naumov, Vyacheslav Razin, Nikolay Gromov, Svetlana Gerasimova, Tatiana Levanova and Lev Smirnov. Using two interconnected reservoirs to predict mouse hippocampal local field potentials
(PDF)
Abstract
The hippocampus plays an important role in various
processes in the brain related to memory and information
processing. The aim of this paper is to propose and test an
approach based on reservoir computing for signal
prediction in the rodent hippocampus based on received
biological input. We compare the prediction results for two
reservoir architectures: a single reservoir and two
subsequently connected reservoirs. Obtained results can be
used in tasks of hippocampal activity restoration using
neurohybrid chips.
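The single-reservoir setting can be sketched as a classic echo state network: a fixed random recurrent network driven by the input, with only a linear readout trained. The signal below is a synthetic sine, not a hippocampal recording, and all sizes are invented:

```python
import numpy as np

def run_reservoir(u, n=100, rho=0.9, seed=0):
    """Drive a randomly connected tanh reservoir with input u; return the state history."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # fix the spectral radius
    w_in = rng.normal(size=n)
    x, states = np.zeros(n), []
    for ut in u:
        x = np.tanh(W @ x + w_in * ut)
        states.append(x.copy())
    return np.array(states)

t = np.arange(0.0, 40.0, 0.1)
u = np.sin(t)

# One-step-ahead prediction: ridge-regression readout from states to the next sample
S = run_reservoir(u[:-1])
w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(S.shape[1]), S.T @ u[1:])
pred = S @ w_out
mse = float(np.mean((pred - u[1:]) ** 2))
```

Chaining two such reservoirs, as the paper compares, amounts to feeding the first reservoir's states (or readout) as input to a second one before the final readout.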
SESSION 9. Image Analysis II. Room 125. Chair:
Aleksei Samarin, Alexander Savelev, Aleksei Toropov, Artem Nazarenko, Alexandr Motyko, Elena Mikhailova, Egor Kotenko, Alina Dzestelova and Valentin Malykh. ADSAR: Advanced Dual-Stream Attention and Reweighting for Small Object Detection
(PDF)
Abstract
This paper focuses on advancing the field of small object detection within complex visual environments, leveraging the latest in deep learning technologies. We introduce a novel approach characterized by a dual-stream self-attention mechanism integrated within a multi-head framework, and further refine detection accuracy through an innovative output reweighting technique. The core of our methodology, termed ADSAR (Advanced Dual-Stream Attention and Reweighting), is designed to tackle the challenges posed by small objects that often overlap multiple tokens in feature maps, a common issue in conventional detection models. By dynamically adjusting the scale of attention across different heads, ADSAR allows for detailed feature capture at multiple granularities, significantly enhancing the model's ability to detect and characterize small objects. The addition of a softmax-based reweighting function selectively emphasizes features crucial for object recognition, thereby suppressing irrelevant information and reducing noise. Our proposed model not only outperforms existing state-of-the-art solutions in accuracy but also demonstrates superior efficiency in processing and scalability. These advancements not only contribute to the theoretical understanding of attention mechanisms in deep neural networks but also offer practical improvements in real-world applications where small object detection is critical.
Anna Shiyan, Ivan Kozlov, Ildar Baimuratov and Nataly Zhukova. Plants and their Diseases Recognition: Multiclass and Multilabel Classification Benchmarks
(PDF)
Abstract
We present a comparison of MobileNetV3Small, EfficientNetB0 and DenseNet121 models pre-trained on ImageNet and fine-tuned on the Plant Village and PlantDoc datasets for multiclass and multilabel classification of plants and their diseases. The experiments found that the EfficientNetV2B0 model was the most effective for the plant disease recognition task, with accuracy 0.997 on the Plant Village dataset and 0.96 on the PlantDoc dataset.
[short] Pavel Arkhipov, Sergey Filippskikh and Maxim Tsukanov. Cluster Analysis of Data for Optimizing Anchor Box Parameters of Object Detection Algorithms in Mobile Neural Network Models
(PDF)
Abstract
The article considers the problem of choosing optimal shapes and numbers of anchor boxes for fine-tuning object detection and classification algorithms. The search for anchor boxes is reduced to a clustering problem that identifies groups of similar objects. Three clustering algorithms based on different principles were chosen: prototypes, hierarchical trees, and graphs. Two neural network models from the TensorFlow 2 Detection Model Zoo repository were selected for the anchor box experiments: SSD MobileNet V2 FPNLite 640x640 and SSD ResNet50 V1 FPN 640x640. These models were pre-trained on the Microsoft Common Objects in Context 2017 dataset and then fine-tuned on the VisDrone2022 dataset. Using cluster analysis, 15 sets of anchor box coefficients were computed. For each resulting set of coefficients, a separate SSD MobileNet V2 FPNLite 640x640 model was trained and compared with the baseline SSD MobileNet V2 FPNLite 640x640 and SSD ResNet50 V1 FPN 640x640 models with standard anchor box coefficients. As a result of parameter optimization, the object detection accuracy of the model based on the mobile MobileNet V2 architecture increased substantially. This result is of great practical importance for compact, energy-efficient systems with limited computing power and memory.
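The prototype-based branch of such anchor box clustering is commonly done as k-means/k-medians over box (width, height) pairs under a 1 - IoU distance, in the style popularized by YOLO. A sketch on invented box sizes (deterministic initialization by area; not the paper's exact algorithm):

```python
import numpy as np

def kmeans_anchors(wh, k, iters=50):
    """Cluster (width, height) pairs with k-medians under a 1 - IoU distance."""
    order = np.argsort(wh[:, 0] * wh[:, 1])
    centers = wh[order[np.linspace(0, len(wh) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        # IoU of every box against every center, assuming co-located top-left corners
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0])
                 * np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = wh[:, 0:1] * wh[:, 1:2] + centers[:, 0] * centers[:, 1] - inter
        assign = (1.0 - inter / union).argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = np.median(wh[assign == j], axis=0)
    return centers

rng = np.random.default_rng(4)
small = rng.uniform(8.0, 16.0, size=(200, 2))    # e.g. small objects in drone imagery
large = rng.uniform(100.0, 160.0, size=(200, 2))
anchors = kmeans_anchors(np.vstack([small, large]), k=2)
```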
[short] Sergey Stasenko, Andrey Lebedev, Olga Shemagina, Irina Nuidel, Andrey Kovalchuk and Vladimir Yakhno. Adaptive Correction of Multi-Cascade Detectors in Biomorphic AI for Pattern Recognition
(PDF)
Abstract
This paper investigates the adaptive correction of a
multistage detector in a biomorphic artificial intelligence
system for pattern recognition problems. A distinctive
feature of this system is its imitation of the hierarchical
information processing observed in living systems, such as
in the visual cortex of the brain. As part of this study,
an algorithm for correcting the results of a multistage
detector was tested, and conclusions about the outcomes of
its application were formulated.
SESSION 10. Data Analysis in Medicine. Room 126. Chair: Nikolay Zolotykh
Vladislav Kuznetsov, Victor Moskalenko and Nikolay Zolotykh. Diagnosis of cardiovascular diseases using recurrent and convolutional neural networks
(PDF)
Abstract
In this work, we explore methods for diagnosing
cardiovascular diseases based on electrocardiogram (ECG)
data, using neural networks, as well as machine learning
methods. We will use convolutional and recurrent neural
networks, as well as the XGBoost algorithm. Our goal is to
build a model that can identify a large set of
cardiovascular diseases using a signal and segmentation
built on that signal with a high degree of accuracy, as
well as additionally calculated features. We set the task
of determining heart rhythms, hypertrophies, extrasystoles,
AV blocks, bundle branch blocks and electrical axes. The
signal consists of waves with sampling rates of 100 and 500
Hz and waveform lengths of 10 seconds. The signal will be
processed and supplied to the input of the models in a
shortened form of 6 and 9 seconds. The metrics used are
precision, recall, specificity and F1-measure. Within the
framework of the problem under study, both binary and
multi-class problems of determining the presence of
pathologies will be posed. Some of the diagnoses show good
quality metrics on individual models. We will conduct a
comparative analysis of the results of each model.
Automatic diagnosis of cardiovascular diseases is intended
to reduce the workload of cardiologists and optimize their work.
Alexander Varnavsky. Model for assessing the need to involve users of social networks in a healthy lifestyle and giving up bad habits according to the data of a social network
(PDF)
Abstract
An urgent task is to preserve and maintain the health of
the country’s population, including through the promotion
of a healthy lifestyle. Since social networks are very
popular, especially among young people, it is possible to
promote a healthy lifestyle through them. Despite the
existing research on the influence of social networks on
user behaviour, especially on alcohol consumption and
smoking, no models provide personalized recommendations for
involving users in a healthy lifestyle and quitting bad
habits. The work aimed to study young people's social
network usage indicators and their behaviour with respect to
a healthy lifestyle, and to construct personalized models
for assessing the need to change user behaviour. To achieve
this aim, an experimental study was conducted based on a
survey of young people and an assessment of their profiles
in social networks. An assessment and analysis of the
relationships between indicators of self-assessed health,
the presence of diseases, behaviour with respect to a
healthy lifestyle, and the behaviour of users in social
networks were completed. It was found that self-assessment
of health and the presence of chronic diseases are not only
interconnected with indicators of healthy-lifestyle
behaviour but also interrelated with respondents' behaviour
indicators in social networks. The theory of cognitive
processes and cognitive load can explain these
relationships. Based on these interrelationships, regression
models were built to predict users' behaviour with respect
to a healthy lifestyle. Embedding such models in social
networks will make it possible to issue personalized
recommendations.
[short] Vyacheslav Razin and Alexander Krasnov. Deep learning to detect the presence of heart disease on the PTB-XL dataset
(PDF)
Abstract
This article describes the use of convolutional and
recurrent neural networks to solve the problem of
determining the presence of heart disease. Particular
attention is paid to constructing various ensembles of
trained deep learning models to improve the target metrics.
The work also proposes various modifications and methods
that may be useful for increasing the target metrics.
[short] Maxim Kostyukov, Lev Smirnov and Grigory Osipov. Neuromorphic reservoir computing for ECG heartbeats classification
(PDF)
Abstract
Brain-inspired reservoir computing methods have attracted
great attention due to their reduced computational
complexity, achieved by keeping the internal synaptic
strengths fixed. In our study we consider a quantized
reservoir neural network to be used on neuromorphic hardware
as a feature extraction model for solving time-series
classification tasks. We conducted experiments on ambulatory
ECG recordings. The considered approach demonstrates
competitive accuracy and robustness, so it can be considered
for use on wearable devices due to its energy efficiency.
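The fixed-synaptic-strengths idea can be sketched as an echo-state-style reservoir: a random recurrent network that is never trained maps a time series to a state vector, which is then handed to an ordinary classifier. This is a toy illustration of the general technique, not the authors' quantized model; the reservoir size and spectral radius are arbitrary choices:

```python
import numpy as np

def reservoir_features(signal, n=50, rho=0.9, seed=0):
    """Drive a fixed random recurrent network with the input signal
    and return the final state vector as a feature vector."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    w_in = rng.standard_normal(n)                    # fixed input weights
    x = np.zeros(n)
    for u in signal:                                 # leak-free ESN update
        x = np.tanh(W @ x + w_in * u)
    return x

# example: features for one synthetic "heartbeat-like" waveform
features = reservoir_features(np.sin(np.linspace(0, 10, 200)))
```

Only the readout classifier on top of such features needs training, which is what makes the scheme attractive for low-power hardware.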
SESSION 11. Information extraction from text I: Generative and Transformer-Based Models. Room 125. Chair: Boris Dobrov
Anna Glazkova and Dmitry Morozov. Exploring Fine-tuned Generative Models for Keyphrase Selection: A Case Study for Russian
(PDF)
Abstract
Keyphrase selection plays a pivotal role within the domain
of scholarly texts, facilitating efficient information
retrieval, summarization, and indexing. In this work, we
explored how to apply fine-tuned generative
transformer-based models to the specific task of keyphrase
selection within Russian scientific texts. We experimented
with four generative models, namely ruT5, ruGPT,
mT5, and mBART, and evaluated their performance in both
in-domain and cross-domain settings. The experiments were
conducted on Russian scientific abstract texts from four
domains: mathematics and computer science, history,
medicine, and linguistics. The use of generative models,
namely mBART, led to gains in in-domain performance (up to
4.87% in BERTScore, 8.96% in ROUGE-1, and 12.16% in
F1-score) over three keyphrase extraction baselines for the
Russian language. Although the results for cross-domain
usage were significantly lower, they still demonstrated the
capability to surpass baseline performances in several
cases, underscoring the promising potential for further
exploration and refinement in this domain.
Alexey Sery, Daria Ilina, Elena Sidorova and Yury Zagorulko. Applying Generative Neural Networks to Extract Argument Relations from Scientific Communication Texts
(PDF)
Abstract
The study explores methods for extracting argument
relations from texts using large generative language
models. Experiments were conducted on a Russian-language
corpus of texts related to the field of scientific
communication. Prompt-engineering methods were used, with
prompts developed using various tactics. Mistral-7B was
employed as the generative model. The task of extracting
argumentative links was formulated as a binary
classification problem: whether a link exists between two
statements. In constructing the dataset, the data were
balanced. Positive examples included statements that were
part of a single argument (premise, conclusion), while
negative examples were generated from statements in the same
paragraph for each positive example. Two methods of creating
instructions were considered: using ChatGPT and an expert
approach using the Chain-of-Thought tactic. The best results
were obtained with instructions composed by an expert that
included one paragraph of context for each statement.
Instructions generated by ChatGPT, while producing
comparable results, often returned incorrect responses. An
experimental study was also conducted on an approach in
which the argumentation scheme is predicted immediately,
allowing more precise information about the type of relation
to be included in the prompt. This task was also formulated
as a binary classification problem. The two most frequent
schemes in the examined corpora, "Expert Opinion" and
"Example," were explored.
Alisher Rogov and Natalia Loukachevitch. Explaining Transformer-Based Models: a Comparative Study of flan-T5 and BERT Using Post-Hoc Methods
(PDF)
Abstract
Neural networks have become an integral part of everyday
life, finding applications in various domestic and
industrial tasks. Generative models based on the
Transformer architecture play a particularly significant
role in natural language processing. These models have
achieved, and in some cases surpassed, human-level
performance in several tasks. However, despite their high
performance, generative models can sometimes produce
unexpected results. Understanding the principles behind the
decisions of such models is an important and relevant
challenge. In this article, we investigate how effectively
the T5 model explains its answers in classification tasks.
We also compare its interpretative capabilities with those
of the BERT model using well-known interpretation methods
such as SHAP, LIME, and the attention mechanism.
Elena Bolshakova and Vladislav Semak. An Experimental Study on Cross-domain Transformer-Based Term Recognition for Russian
(PDF)
Abstract
Terminologies of specialized problem domains constitute an
important part of the knowledge to be extracted for various
applications, such as the construction of thesauri,
ontologies, glossaries, and so on. Meanwhile, widely used
automatic term extraction (ATE) methods are mainly
statistics-based and show rather average quality, so ways to
leverage modern deep learning techniques are currently being
studied. The paper addresses the task of term recognition
based on a BERT classifier, considering cross-domain
settings for the experiments. The dataset constructed for
the experiments is presented; it contains samples taken from
scientific texts in Russian. The results of the cross-domain
term recognition experiments are described, demonstrating
comparable or slightly better quality than the best-known
ATE methods.
SESSION 12. Large Language Models and Applications. Room 126. Chair: Alexander Ponomarenko
Dmitry Namiot and Elena Zubareva. On Open Datasets for LLM Adversarial Testing
(PDF)
Abstract
This article discusses the issues of testing large language
models. Large language models are the most popular form of
generative machine learning models. The simple and clear
usage model has led to their enormous popularity. However,
like other machine learning models, large language models
are susceptible to adversarial attacks. One could even say
that the success of large language models has greatly
increased interest in the security of machine learning
models themselves. This direction immediately turned out to
affect all users of machine learning systems. This article
discusses the use of ready-made datasets for adversarial
testing of large language models.
Pujun Xie and Anton Khritankov. An LLM Approach to Fixing Common Code Issues in Machine Learning Projects
(PDF)
Abstract
Modern empirical research in machine learning largely
relies on developing custom software. Often such software
is written by researchers rather than professional software
engineers. As a result, source code issues and the
associated technical debt may accumulate and lead to higher
programming effort, obstacles to code reuse, and hidden
software defects affecting the quality of the research
itself. In this paper, we investigate whether automatic
tools can be applied to prevent or remove these source code
issues, thus alleviating the need for software engineers in
research projects. We analyze the source code of 24
open-source research projects in machine learning, identify
common issues, and propose practical techniques to prevent
these issues during coding. We also investigate whether an
LLM coding assistant can fix common code issues
automatically. We found that 1) frequent source code issues
are largely the same across different machine learning
frameworks, 2) most of the issues could be eliminated by
following simple coding practices, and 3) most of the issues
could be removed by applying an LLM coding assistant.
Vasily Kostyumov, Bulat Nutfullin and Oleg Pilipenko. Uncertainty-Aware Evaluation for Vision-Language Models
(PDF)
Abstract
Vision-Language Models like GPT-4, LLaVA, and CogVLM have
surged in popularity recently due to their impressive
performance in several vision-language tasks. Current
evaluation methods, however, overlook an essential
component: uncertainty, which is crucial for a
comprehensive assessment of VLMs. Addressing this
oversight, we present a benchmark incorporating uncertainty
quantification into evaluating VLMs.
Our analysis spans 20+ VLMs, focusing on the
multiple-choice Visual Question Answering (VQA) task. We
examine models on 5 datasets that evaluate various
vision-language capabilities.
Using conformal prediction as an uncertainty estimation
approach, we demonstrate that the models’ uncertainty is
not aligned with their accuracy. Specifically, we show that
models with the highest accuracy may also have the highest
uncertainty, which confirms the importance of measuring it
for VLMs.
Our empirical findings also reveal a correlation between a
model's uncertainty and its language-model component.
The code is available at
https://github.com/EnSec-AI/VLM-Uncertainty-Bench.
arXiv preprint: https://arxiv.org/pdf/2402.14418
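Split conformal prediction, the uncertainty estimation approach named above, turns per-option probabilities into prediction sets whose average size serves as an uncertainty measure. The following is a self-contained sketch of the general technique on invented toy data, not the benchmark's code:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for multiple-choice answers."""
    n = len(cal_labels)
    # nonconformity score: 1 - probability assigned to the true option
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # finite-sample-corrected (1 - alpha) quantile of calibration scores
    q = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))
    # each prediction set keeps every option whose score is within q
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# toy data over 4 answer options; the top option plays the true label
rng = np.random.default_rng(0)
cal = rng.dirichlet(np.ones(4) * 2, size=200)
labels = cal.argmax(axis=1)
test = rng.dirichlet(np.ones(4) * 2, size=5)
sets = conformal_sets(cal, labels, test)
avg_size = float(np.mean([len(s) for s in sets]))  # larger = more uncertain
```

A model whose average set size is large at a fixed coverage level is more uncertain, regardless of its top-1 accuracy.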
SESSION 13. Information extraction from text II: Topic models. Room 125. Chair: Natalia Loukachevitch
Julian Serdyuk, Konstantin Vorontsov and Murat Apishev. Hypergraph topic models of document collections
(PDF)
Abstract
In this paper, the problem of constructing hypergraph
(transactional) topic models on a collection of documents
is explored. Such a model makes it possible to improve the
standard bag-of-words approach and take into account the
semantic structure of the text: named entities, sentences,
or entire paragraphs. The transactional model assumes that a
document is compiled through transactions, each of which
adds some unit to the text, be it a word, phrase, sentence,
and so on. Experiments have shown that hypergraph topic
models achieve higher quality than classical topic models.
Bulat Gizatullin and Olga Nevzorova. Comparative analysis of methods for topic modeling of mathematical documents
(PDF)
Abstract
The comparison of three topic modeling methods - LDA, NMF,
and BERTopic - was conducted across diverse collections of
mathematical articles. In the initial experiment using
articles from "Izvestiya VUZov. Matematika," LDA exhibited
superior performance based on the CV Coherence metric,
although NMF also yielded commendable results. Conversely,
BERTopic's thematic classes proved less interpretable
compared to LDA and NMF.
In the subsequent experiment, employing articles from the
same journal but with vocabulary derived from OntoMathPRO
ontology concepts, LDA again demonstrated favorable metric
outcomes. However, BERTopic showcased enhanced
interpretability of thematic classes compared to LDA and
NMF.
The third experiment, conducted on a combined collection
from two journals with vocabulary compiled via frequency
truncation and OntoMathPRO ontology concepts, reaffirmed
LDA's superiority in terms of CV Coherence. Nevertheless,
NMF exhibited significantly greater interpretability.
Hence, it is evident that each topic modeling method
possesses distinct advantages and constraints depending on
the context and assumptions. Choosing the appropriate data
preprocessing method is pivotal, as it significantly
impacts modeling outcomes. Additionally, different data
preprocessing approaches influence the interpretability of
thematic classes for each method.
[short] Alexander Sychev. Analysis of a Topic Model for a Labelled Collection of Text Messages Based on the Word2Vec Approach
(PDF)
Abstract
The article considers the problem of topic modelling and of
evaluating the topic models represented in labelled
collections of text messages, based on the Word2vec word
embedding model. The clusters obtained by analyzing the word
vectors can be used for various tasks, including diagnosing
the topic model represented in a labelled collection of text
messages. To this end, it is proposed to compute an
intersection matrix between the clusters of the vocabulary
built for the whole text corpus and the vocabularies of the
topical subsets of the corpus. The paper presents and
discusses the results of a computational experiment on a
collection of news messages from a regional online outlet.
The results show that it is practically feasible to diagnose
the existing system of topical rubrics in a collection of
text messages and to identify directions for its possible
reorganization.
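The proposed cluster-vocabulary intersection matrix can be computed directly from word sets. A minimal sketch with invented toy vocabularies, shown only to fix the idea:

```python
import numpy as np

def intersection_matrix(clusters, topic_vocabs):
    """Rows: word clusters built over the whole corpus vocabulary;
    columns: vocabularies of the topical (labelled) subsets.
    Entry [i, j] is the share of cluster i covered by topic j."""
    m = np.zeros((len(clusters), len(topic_vocabs)))
    for i, cl in enumerate(clusters):
        for j, tv in enumerate(topic_vocabs):
            m[i, j] = len(cl & tv) / len(cl)
    return m

# toy clusters and topic vocabularies (hypothetical words)
clusters = [{"goal", "match", "team"}, {"bank", "rate", "loan", "tax"}]
topics = {"sport": {"goal", "match", "coach"}, "finance": {"bank", "loan"}}
M = intersection_matrix(clusters, list(topics.values()))
```

Rows that spread evenly across several topic columns flag rubrics whose vocabularies overlap and are therefore candidates for reorganization.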
SESSION 14. Data Extraction and Storage. Room 126. Chair: Roman Samarev
Alexey Shigarov. Regular Table Language for Data Extraction from Tables Presented in Electronic Documents
(PDF)
Abstract
The paper presents "Regular Table Language" (RTL), a novel
domain-specific language for extracting recordsets from
arbitrary tables represented as parts of electronic
documents in machine-readable formats (e.g. spreadsheets,
text processors, or HTML). It is based on the hypothesis
that any of such tables can be matched to a pattern
specifying its structure sufficient to extract the required
data from it. Moreover, a whole class of tables can be
successfully matched to such a pattern. We propose an
"Interpretable Table Model" (ITM) as a mediator between
source tables and target recordsets. It extends the table
structure found in widespread sources (spreadsheets, text
processors, or HTML) with the semantics needed to
automatically infer recordsets from tables. RTL
provides a way to formally express some patterns of table
structure in a declarative and laconic manner. The
semantics missing in a source table is recovered as a
result of matching the corresponding instance of ITM with
an appropriate RTL-pattern. ITM and RTL have been
implemented as main parts of Regtab, an open-source
software library that simplifies the development of custom
applications for data extraction from arbitrary tables
presented in electronic documents.
Evgenii Stepanov and Alexey Mitsyuk. bXES: a Binary Format For Storing and Transferring Software Event Logs
(PDF)
Abstract
Modern software produces a lot of events which can be
analyzed using process mining techniques. The first step in
any process mining pipeline is the collection of the event
logs. Then, those event logs need to be stored persistently
on the disk. The problem is that software event logs
usually consist of many events, each of which can specify
tens of attributes. In such a context, an event log stored
in the XML-based XES format consumes a tremendous amount of
memory. Moreover, the format is not read-friendly, i.e. it
gives tools no advantage when reading a XES
file. In this paper, we present bXES, a binary format for
storing and transferring event logs, especially software
event logs. We highlight the main characteristics of
software event logs that are utilized in the format scheme,
and then we describe the format. Finally, we conduct
experiments in order to demonstrate the compatibility of
bXES. The experiments are conducted with real-life
business-process and software event logs.
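The motivation above — many events repeating the same attribute names — is exactly what dictionary encoding in a binary layout addresses. The toy format below is an invented illustration of that principle only, not the actual bXES specification:

```python
import io
import struct

def write_binary_log(events):
    """Serialize a list of {attribute: value} events: the attribute-name
    table is written once up front, and every event stores only short
    (key_id, value) pairs, unlike XML where each tag repeats the name."""
    keys = sorted({k for e in events for k in e})
    buf = io.BytesIO()
    buf.write(struct.pack("<H", len(keys)))          # size of the key table
    for k in keys:
        kb = k.encode("utf-8")
        buf.write(struct.pack("<H", len(kb)) + kb)   # length-prefixed name
    buf.write(struct.pack("<I", len(events)))        # number of events
    for e in events:
        buf.write(struct.pack("<H", len(e)))         # attributes in event
        for k, v in e.items():
            vb = str(v).encode("utf-8")
            buf.write(struct.pack("<HH", keys.index(k), len(vb)) + vb)
    return buf.getvalue()

blob = write_binary_log([{"concept:name": "A", "time": "1"},
                         {"concept:name": "B", "time": "2"}])
```

Each attribute name is paid for once; after that every occurrence costs two bytes, which is where the memory savings over XML come from.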
[short] Evgeny Alexandrov, Igor Alexandrov, Dmitry Belyakov, Natalya Davydova, Petr Zrelov, Lidia Kalmykova, Maria Lyubimova, Tatyana Sapozhnikova, Tatyana Syresina and Alexander Yakovlev. Capabilities of the License Support Information System Developed at LIT
(PDF)
Abstract
The main goal of creating the license support information
system (ISSL) is to automate the management, acquisition,
maintenance, and use of licensed software products. The
paper presents the results of the system's development over
the last two years. A request approval mechanism has been
implemented, and various request types have been created on
its basis: user requests for a new license, requests to add
new software products to the catalogue of supported
software, and auditor requests for purchasing additional
licenses. Work has been carried out on populating the ISSL
database, on convenient presentation of license information
and various statistics, and on integration with other
services within the JINR Digital Ecosystem.
SESSION 15. Information extraction from text III. Room 125. Chair: Natalia Loukachevitch
Evgeny Volkov and Boris Dobrov. Automatic Transcription of OOV Words to Improve the Recognition of Domain-Specific Terms
(PDF)
Abstract
One popular method for improving the recognition of words
unknown to a model (OOV, out-of-vocabulary words) is
vocabulary expansion. OOV words are especially common in
specialized domains: they are terms and term-like phrases.
In Russian speech, many of them have recently been borrowed
from English and do not yet have an accepted written form in
Russian. This makes it difficult to add such words to the
vocabulary of non-multilingual models, which degrades their
quality on audio from specialized domains. This paper
presents a method for improving the quality of recognition
of speech containing such OOV terms, based on an algorithm
that automatically builds so-called Russian transcriptions
for an arbitrary English word. A Russian transcription is a
similar-sounding word written in Russian letters that could
replace the original term when added to the model's
vocabulary, thereby making its correct recognition possible.
The results of the conducted experiments suggest that the
described algorithm can be applied to improve the quality of
ASR systems.
Artem Prosvetov, Alexey Matveev and Alexandr Andreev. Decoding the Past: Building a Comprehensive Glagolitic Dataset for Historical Text Analysis
(PDF)
Abstract
The Glagolitic script, one of the oldest known Slavic
scripts, presents a substantial challenge for historical
manuscript decryption due to its intricate glyph forms and
limited existing digital resources. This paper introduces a
novel dataset of Glagolitic letters aimed at facilitating
the application of machine learning algorithms in the
decryption of historical documents. The dataset creation
process comprised several critical stages: collection of
raw data, preparation of images, application of neural
networks for letter extraction, clustering of images,
training of models to discern noise, and manual validation
and annotation of rare letters.
The resultant dataset stands as the first publicly
accessible Glagolitic script resource tailored for deep
learning applications in historical document analysis.
[short] Andrey Lovyagin and Boris Dobrov. Verifying Factographic Content in Narrative Texts
(PDF)
Abstract
This research thoroughly examines both traditional and
modern methods for automating information verification,
with a specific focus on analyzing essays and texts
containing dated content. It introduces three new
techniques — CHECK-S, CHECK-V, and CHECK-U — for analyzing
texts with various attributes, and develops a new data
storage architecture, the "Reverse Index Tree", to enhance
the efficiency of the CHECK-S method. Additionally, this
study presents a new approach to contrastive learning,
"Refinement Contrastive Learning", which has been tested in
competitive environments and has shown substantial
improvements over existing methods, setting new performance
standards.
The findings from this study indicate significant
enhancements in effectiveness compared to traditional
methods and those used by previous leaders in academic
competitions. The new method, along with its underlying
data architecture and training approaches, demonstrates
considerable advantages, affirming the effectiveness and
potential of the proposed solutions in improving automated
information verification for essays and historically dated
texts.
[short] Olga Gavenko and Sofia Obersht. Development and implementation of software application for comparative analysis of the estimates of the complexity of text data
(PDF)
Abstract
Text complexity is a composite concept comprising
difficulty, readability, and comprehensibility, and
describing the text and its structure in terms of how they
affect the processing of information. Determining text
complexity has applied significance in fields where the
understanding and processing of information and knowledge
are important: educational literature, legislative and other
documents, and journalistic writing. Subjective parameters
of a text include empirical data on the reader's perception
of the text, the physical and cognitive abilities,
knowledge, and education of an individual in a certain area,
and their experience in general. Objective parameters can be
divided into quantitative ones, such as length, frequency of
use, or the number of language tokens, and qualitative ones,
which are related to the analysis of the linguistic means of
the categorical language levels and their realization in a
particular text. The task becomes more complicated when the
complexity of large text data needs to be estimated.
Defining a text as a character sequence, a model for
estimating complexity can be developed whose parameters are
the objective parameters of the text; the choice of
parameters, as well as the methods of complexity estimation,
can vary depending on the task. Most readability formulas
are based on a linear-regression model. Since these formulas
are considered universal, the goal of this paper is the
development and implementation of a Python application able
to carry out a comparative analysis of existing readability
estimates for text data; the basic formulas for English and
those adapted for Russian are examined. School textbooks on
Social Studies (grades 5-11) included in the Russian
Readability Corpus and textbooks on History (grades 6-11)
make up the test sample. The experiments with the text
corpus data show a series of incorrect results, which can be
explained by the fact that the models were developed on
texts of different genres and styles with varying linguistic
means, terminology, and structure. The formulas developed
for English give less accurate results than the adapted
ones, which is due to the differences between the languages;
in addition, the fact that quantitative parameters may not
be sufficient to obtain reliable results should be taken
into account when expanding the corpus data.
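The linear-regression form of readability estimates is easy to illustrate with the classic Flesch Reading Ease formula. In this sketch the syllable counter is a crude vowel-count proxy, and the Russian weights quoted in the comment are one published adaptation, included only as an example:

```python
import re

VOWELS_EN = "aeiouy"
VOWELS_RU = "аеёиоуыэюя"

def counts(text, vowels):
    # crude approximations: a sentence ends at a punctuation run,
    # and a word has as many syllables as vowel letters (at least 1)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"\w+", text)
    syllables = sum(max(1, sum(ch in vowels for ch in w.lower()))
                    for w in words)
    return len(words), sentences, syllables

def flesch(text, vowels=VOWELS_EN, a=206.835, b=1.015, c=84.6):
    """Flesch Reading Ease: a linear-regression model over average
    sentence length and average syllables per word. For Russian,
    adapted weights (e.g. b=1.3, c=60.1) are commonly substituted."""
    w, s, syl = counts(text, vowels)
    return a - b * (w / s) - c * (syl / w)
```

Higher scores mean easier text; swapping in the adapted coefficients and the Russian vowel set yields the Russian variant of the estimate.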
SESSION 16. Data Analysis in Earth Sciences. Room 126. Chair: Nikolay Skvortsov
Evgeny Vyazilov and Natalia Puzova. Prospects for the Use of Artificial Intelligence Tools in Hydrometeorology
(PDF)
Abstract
Artificial intelligence tools include software robots that
can handle routine, processing-intensive tasks such as data
loading and quality control, analytics, the detection of
hazardous phenomena in streams of operational data, and
decision-making based on climate, forecast, and observation
data. Neural networks, machine learning, and deep learning
can be applied for a better understanding and forecasting of
the weather. The article presents a broad range of
application areas for artificial intelligence in
hydrometeorology, related to data collection, metadata-based
data search, the organization of data access through
interaction with a chatbot, forecasts of hydrometeorological
processes and of the possible impact of hazardous phenomena
on enterprises and the population, and training the
population and managers in how to behave during hazardous
phenomena.
[short] Nikolai Lavrentiev, Alexey Akhlyostin, Alexey Privezentsev and Alexander Fazliev. Data quality assessment in large spectral data collections. States and transitions
(PDF)
Abstract
Two groups of spectral data collections on molecular states
and transitions are considered in this work. The main goal
of the work is to improve the spectral data quality, to
find out into what parts these groups should be divided,
and how the
groups are related to each other when analyzing the quality
of states and transitions collected. The approach to the
formation of empirical states is clarified, and a specific
problem is considered using collections of data on the
basic isotopologue
of the water molecule as an example. This paper continues
the investigation of spectral data quality started in [1].
To improve data quality, this work applies filtering first
using empirical states and then the unique set of states
that are not identical to the empirical states. The
additional filtering allowed us to add about 500 correct
transitions to the empirical states.
[short] Vladimir Budzko and Victor Medennikov. An Ecosystem Approach to Strategic Management: the Case of Agriculture
(PDF)
Abstract
The paper considers the transformation of strategic
management methods and models on the basis of an ecosystem
approach, within the framework of building a unified digital
platform for managing the economy. The ecosystem approach to
the socio-economic development of society is gaining ever
greater popularity, driven by the worldwide social demand
for environmental protection and for the careful use of
limited natural resources. Environmental problems in Russian
agriculture are growing, in particular because of the active
formation of agro-industrial associations, mainly in the
form of agroholdings. This raises the problem of a systemic
approach to integrating all the kinds of resources involved
in production, taking into account the growing number and
importance of external environmental factors. Mathematical
modelling is proposed as the main method for studying
strategic management: unlike most of the models in use,
which are often merely iconographic, it makes it possible to
account for a considerably larger number of factors and to
simulate different development scenarios for the modelled
objects. As a result of the research, a mathematical model
of the strategic management of an agroholding was developed
with a view to its sustainable development. It is shown that
the development strategy of an agroholding should be worked
out in close connection with the introduction of a
corresponding automated holding management system; this will
lead to a fundamental change in the entire system of its
management and production, allowing the association, in its
strategic goal-setting, to follow global trends and to focus
primarily on quality, traceability, and the other components
of competitiveness. The original mathematical model
considered here provides a scientific justification of
unified digitalization methods in the long term, both for
large multi-sectoral agrarian associations, after testing on
some set of them, and for medium and small farms that will
be able to work with agroholdings on an outsourcing basis.
Submission deadline for papers | June 17, 2024 |
Submission deadline for tutorials | June 3, 2024 |
Notification for the first round | August 12, 2024 |
Deadline for revised versions of the papers forwarded to the second round of reviewing | September 2, 2024 |
Final notification of acceptance | September 16, 2024 |
Deadline for camera-ready versions of the accepted papers | September 16, 2024 |
Conference | October 23-25, 2024 |