Detection of Obfuscated Javascript Code Based on Abstract Syntax Trees Coloring

Mathematics and Mathematical Modeling. 2020; : 1-24

Ponomarenko G. S., Klyucharev P. G.

https://doi.org/10.24108/mathm.0220.0000218

Abstract

This paper analyzes a machine-learning method for detecting obfuscation, and for identifying the obfuscator used, from data on the coloring by vertex type of the abstract syntax tree (AST) of a JavaScript program. Vertex and edge colors are assigned according to the types of the AST vertices, which are in turn determined by the lexical and syntactic structure of the program and by the language standard. The study consisted of several stages. First, a set of non-obfuscated programs was collected. Then a set of obfuscated programs was created using eight open-source obfuscators. The classifiers were built with gradient boosting on decision trees. Two kinds of model were built: models that classify programs by the obfuscator used, and models that classify programs as obfuscated or not. The latter also detected samples produced by obfuscators whose output was absent from the training set. The quality of the resulting models is on par with results reported in the literature. The proposed method of extracting the features fed to the classifier requires no prior analysis of the obfuscators themselves and no knowledge of the obfuscating transformations. The final part of the paper analyzes the quality of the models and examines some statistical properties of the resulting set of obfuscated code samples. Analysis of the generated obfuscated samples showed that the proposed method has limitations: in particular, it has difficulty recognizing minifiers and other obfuscating programs that change the lexical structure more than the syntactic structure.
To improve the detection of such transformations, one can build combined classifiers that use both the AST-coloring method and additional information about lexemes and punctuation, such as entropy, the proportion of upper- and lower-case characters, and the frequency of particular characters.
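
The abstract does not spell out how the coloring is turned into classifier input, so the following Python sketch is only one plausible reading. It assumes that a vertex's color is its ESTree node type, that an edge's color is the pair (parent type, child type), and that the features are relative frequencies of these colors. The function names, the fixed vocabularies, and the hand-written example tree (standing in for parser output such as Acorn's) are all hypothetical.

```python
from collections import Counter


def iter_children(node):
    """Yield child AST nodes (dicts carrying a "type" key) of an ESTree-style node."""
    for value in node.values():
        if isinstance(value, dict) and "type" in value:
            yield value
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict) and "type" in item:
                    yield item


def color_counts(root):
    """Count vertex colors (node types) and edge colors (parent type, child type)."""
    vertex_colors = Counter()
    edge_colors = Counter()
    stack = [root]
    while stack:  # iterative traversal, safe for deeply nested trees
        node = stack.pop()
        vertex_colors[node["type"]] += 1
        for child in iter_children(node):
            edge_colors[(node["type"], child["type"])] += 1
            stack.append(child)
    return vertex_colors, edge_colors


def to_feature_vector(vertex_colors, edge_colors, vertex_vocab, edge_vocab):
    """Relative frequency of each color over a fixed (assumed) vocabulary."""
    n_vertices = sum(vertex_colors.values()) or 1
    n_edges = sum(edge_colors.values()) or 1
    features = [vertex_colors[c] / n_vertices for c in vertex_vocab]
    features += [edge_colors[c] / n_edges for c in edge_vocab]
    return features


# Hand-written ESTree-style tree for the statement `x = 1 + 2;`
EXAMPLE_AST = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "AssignmentExpression",
            "operator": "=",
            "left": {"type": "Identifier", "name": "x"},
            "right": {
                "type": "BinaryExpression",
                "operator": "+",
                "left": {"type": "Literal", "value": 1},
                "right": {"type": "Literal", "value": 2},
            },
        },
    }],
}

vertex_colors, edge_colors = color_counts(EXAMPLE_AST)
```

Vectors of this kind could then be passed to any gradient-boosting implementation. Normalizing by tree size is a design choice made here so that long and short programs stay comparable; the paper itself may use raw counts or a different encoding.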

Mathematics and Mathematical Modeling. 2020; : 1-24

Detection of Obfuscated Javascript Code Based on Abstract Syntax Trees Coloring

Ponomarenko G. S., Klyucharev P. G.

https://doi.org/10.24108/mathm.0220.0000218

Abstract

The paper addresses the detection and classification of obfuscated JavaScript code based on the coloring of abstract syntax trees (ASTs). Colors of the AST vertices and edges are assigned according to the types of the AST vertices, which are determined by the lexical and syntactic structure of the program and by the programming language standard. The research involved several stages. First, a dataset of non-obfuscated JavaScript programs was collected from public repositories. Second, obfuscated samples were created using eight open-source obfuscators. Classifier models were built using gradient boosting on decision trees (GBDT). We built two types of classifier. The first classifies a program by the obfuscator that produced the sample. The second detects whether a program is obfuscated, including samples produced by obfuscators whose output was not seen during training. The quality of the obtained models is on par with published results. The feature engineering method proposed in the paper does not require a preliminary analysis of the obfuscators or of the obfuscating transformations. In the final part of the paper we analyze the quality of the models and discuss certain statistical properties of the obfuscated and non-obfuscated samples and of the corresponding colored ASTs. Analysis of the generated obfuscated samples has shown that the proposed method has some limitations. In particular, it is difficult to recognize minifiers and other obfuscating programs that change the lexical structure to a greater extent and the syntax to a lesser extent.
To improve the detection of such obfuscating transformations, one can build combined classifiers that use both the method based on AST coloring and additional information about lexemes and punctuation, for example the entropy of identifiers and strings, the proportion of upper- and lower-case characters, and the usage frequency of certain characters.
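
The lexical features named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' feature set: entropy is computed here over the whole source text rather than per identifier or string, the function names are hypothetical, and the tracked punctuation characters are an arbitrary choice.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(source, tracked_chars="[]()!+"):
    """Entropy, letter-case proportions, and frequencies of selected characters.

    tracked_chars is an assumed, illustrative set of punctuation characters.
    """
    letters = [ch for ch in source if ch.isalpha()]
    n_letters = len(letters) or 1  # avoid division by zero on letter-free input
    n_chars = len(source) or 1
    features = {
        "entropy": shannon_entropy(source),
        "upper_ratio": sum(ch.isupper() for ch in letters) / n_letters,
        "lower_ratio": sum(ch.islower() for ch in letters) / n_letters,
    }
    for ch in tracked_chars:
        features["freq_" + ch] = source.count(ch) / n_chars
    return features


features = lexical_features("var a = [1, 2].map(function (X) { return !X; });")
```

In a combined classifier, these values would simply be appended to the AST-coloring feature vector before training.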
