Tomita, Hisako

Affiliation

Faculty of Science and Technology, Department of Information and Computer Science (Yagami)

Position

Assistant Professor (Non-tenured)/Research Associate (Non-tenured)/Instructor (Non-tenured)

 

Papers

  • An analysis on the effect of body tissues and surgical tools on workflow recognition in first person surgical videos

    Tomita H., Ienaga N., Kajita H., Hayashida T., Sugimoto M.

    International Journal of Computer Assisted Radiology and Surgery 19(11), 2195-2202, 2024.11

    ISSN 1861-6410

    Purpose: Analysis of operative fields is expected to aid in estimating procedural workflow and evaluating surgeons’ procedural skills by considering the temporal transitions during the progression of the surgery. This study proposes an automatic recognition system for the procedural workflow that uses machine learning to identify and distinguish elements in the operative field, including body tissues such as fat, muscle, and dermis, along with surgical tools.

    Methods: We annotated approximately 908 first-person-view images of breast surgery for segmentation. The annotated images were used to train a pixel-level classifier based on Mask R-CNN. To assess the impact on procedural workflow recognition, we annotated an additional 43,007 images. A network based on the Transformer architecture was then trained with surgical images incorporating masks for body tissues and surgical tools.

    Results: The instance segmentation of each body tissue in the segmentation phase provided insights into the trend of area transitions for each tissue. Simultaneously, the spatial features of the surgical tools were effectively captured. Regarding the accuracy of procedural workflow recognition, accounting for body tissues led to an average improvement of 3% over the baseline. Furthermore, the inclusion of surgical tools yielded an additional increase in accuracy of 4% compared to the baseline.

    Conclusion: In this study, we revealed the contribution of the temporal transition of body tissues and the spatial features of surgical tools to recognizing procedural workflow in first-person-view surgical videos. Body tissues, especially in open surgery, can be a crucial element. This study suggests that further improvements can be achieved by accurately identifying surgical tools specific to each procedural workflow step.

  • Spatiotemporal Video Highlight by Neural Network Considering Gaze and Hands of Surgeon in Egocentric Surgical Videos

    Yoshida K., Hachiuma R., Tomita H., Pan J., Kitani K., Kajita H., Hayashida T., Sugimoto M.

    Journal of Medical Robotics Research 7(1), 2022.03

    ISSN 2424-905X

    In the medical field, surgical videos can be used to introduce surgical skills. Medical students and residents watch the videos to study surgical skills and to learn faster, compensating for their lack of experience in operating rooms caused by limited opportunities to join surgeries. Recording egocentric surgical videos with a wearable camera is one way to capture a surgeon's skills in detail. However, most egocentric surgical videos are quite long. For example, in the case of tumor removal in breast surgery, the recording time often reaches 2 h. At that length, it is time-consuming to find important scenes in the video, particularly because many surgical videos include nonessential scenes such as sterilization and preparation of tools. To extract specific scenes from a long video, we can apply scene estimation by machine learning. Furthermore, it is important to know where the surgeon is looking in order to observe the area of the incision in detail. In particular, it is vital to be able to zoom in on key elements, allowing viewers to see the incision area and the fine details of the necessary surgical skills. In this study, we aimed to highlight incision scenes from egocentric surgical videos in the spatiotemporal domain by utilizing two neural networks for the temporal and spatial highlights. For the temporal highlights, we designed a neural network that estimates the incision scenes by learning gaze speed, hand movements, number of hands, and background movements in egocentric surgical videos. For the spatial highlights, in order to estimate the important area to zoom in on, we designed a neural network that learns the surgeon's gaze on natural features of surgical scenes to form a probability map as a representation of the estimated gaze area. The estimated gaze area was also used to calculate the appropriate zoom-in position and zoom-in ratio. To control the highlight parameters in accordance with user preferences, we also made a user interface that allows for the selection of playback speed gain and zoom ratio gain. For the evaluation, we verified the performance of the networks by a quantitative assessment and conducted a user study with medical doctors, showing them an actual surgical video to obtain a qualitative assessment of the proposed system.

 

Courses Taught

  • LABORATORIES IN SCIENCE AND TECHNOLOGY

    2026

  • INTRODUCTION TO COMPUTER PROGRAMMING (LECTURE AND EXERCISE)

    2026